Ducky Thesis Proposal Notes

Problem Statement

I propose to study which techniques developers use when developing code, how those techniques vary from person to person, and how success at programming tasks correlates with the choice of techniques. I will do so by replicating part or all of Robillard et al.'s work.

Problems:

  • P1: We (software engineering researchers) do not know what different low-level techniques developers use when developing code using an IDE.
    • P1.1: We do not have a shared vocabulary for discussing different techniques that developers use when developing code with IDEs. (?)
  • P2: We do not know which techniques are the most productive.
  • P3: We do not know how to teach/train developers how to be more productive.
  • P4: Coding video is expensive and annoying, and it is difficult to code consistently (especially when one doesn't know what one is looking for).

Givens (right word?):

Robillard et al. showed:
  • G1: Different people use different techniques for locating relevant pieces of code.
  • G2: Characteristic interaction patterns reflecting those techniques can be discovered by analyzing coded transcripts of video of users navigating code.
  • G3: Success at finding relevant pieces of code correlates with what technique(s) the developer uses.

Hypotheses:

  • H1: These characteristic interaction patterns can be discovered by analyzing interaction telemetry of navigation tasks.
  • H2: Software can recognize those patterns in navigation tasks.
  • H3: Data mining software can discover interesting interaction patterns in navigation tasks.
  • H4: Data mining software can discover interesting interaction patterns in more general code-development tasks.
  • H5: Success in coding tasks correlates with which interaction patterns the developer uses.
  • H6: Patterns are similar across tools.
  • H7: Patterns are similar across languages.

I plan to test H1, H2, and H3. I hope to also test H4.

Literature Review

  • Robillard et al.
  • Murphy/Kersten/Findlater
  • BSD et al (unpublished)
  • Jonathan Sillito's questions
  • Andrew Ko ICSE05

Proposed data-gathering methods

I will use data collected by BSD that contains a replication of the first part of Robillard et al.'s study, where professional programmers search for specific interesting methods in the code.

For further work, I have access to

  • from the Mylar bugzilla: many individual compressed logs of traces of a small number of developers either fixing one well-described bug or adding a well-defined feature, with the context
  • from the glob: many individual logs of traces of a large number of developers working on unknown material, without the code available

We do not have data corresponding to the second part of Robillard et al.'s study, where a small number of professional programmers all do the same task. I might need to run a user study replicating that part.

Proposed analysis methods

I will use three techniques to analyze the data, with a fourth as a fallback:

  1. eyeballs (better word?) -- I will examine the data visually, with filters as appropriate to change how the data is visualized.
  2. protein motif-finding algorithm -- I will use a modified protein motif-finding algorithm to search for common patterns, and judgment to select the interesting ones.
  3. data-visualization and mining tools, e.g. YALE -- I will use data mining and visualization tools to search through the patterns.
  4. (potentially, though hopefully not) my own algorithm -- I will write my own algorithm and/or modify an existing one.
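
To make the idea of searching for common patterns concrete, here is a minimal sketch. It is not the protein motif-finding algorithm named above; it swaps in a much simpler stand-in that counts contiguous event n-grams and keeps those that appear in enough sessions. The event names and the frequent_ngrams helper are hypothetical, and real telemetry would define its own vocabulary.

```python
from collections import Counter


def frequent_ngrams(sessions, n, min_sessions):
    """Return the n-grams of events that occur in at least min_sessions sessions."""
    support = Counter()
    for events in sessions:
        # Count each n-gram once per session so one long session cannot dominate.
        seen = {tuple(events[i:i + n]) for i in range(len(events) - n + 1)}
        support.update(seen)
    return {gram: count for gram, count in support.items() if count >= min_sessions}


# Hypothetical event names, invented for illustration only.
sessions = [
    ["search", "open", "scroll", "search", "open", "edit"],
    ["open", "scroll", "search", "open", "edit"],
    ["browse", "open", "scroll", "edit"],
]

patterns = frequent_ngrams(sessions, n=2, min_sessions=3)
print(patterns)
```

Counting support per session rather than per occurrence is one possible design choice; deciding which of the surviving candidates are interesting would still be a matter of judgment.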

Having found patterns, I will write code to

  • recognize those patterns
  • (optionally?) generate reports with
    • what patterns were found
    • a description of the developer's pattern distribution compared with the broader population's distribution
    • ?prescriptive advice on more effective techniques for coding?
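
As a rough illustration of the recognition and reporting steps listed above, the sketch below counts occurrences of known patterns in one developer's event stream and prints each pattern's share next to a population share. The pattern names, event names, and population figures are all placeholders invented for the example, not results; the helper functions are hypothetical.

```python
def count_pattern(events, pattern):
    """Count (possibly overlapping) occurrences of a contiguous pattern."""
    n = len(pattern)
    target = tuple(pattern)
    return sum(1 for i in range(len(events) - n + 1) if tuple(events[i:i + n]) == target)


def pattern_shares(events, patterns):
    """Map each pattern name to its share of all pattern occurrences found."""
    counts = {name: count_pattern(events, p) for name, p in patterns.items()}
    total = sum(counts.values())
    return {name: (c / total if total else 0.0) for name, c in counts.items()}


def report(events, patterns, population_shares):
    """Build one report line per pattern, sorted by pattern name."""
    shares = pattern_shares(events, patterns)
    return [
        f"{name}: developer {shares[name]:.2f}, "
        f"population {population_shares.get(name, 0.0):.2f}"
        for name in sorted(patterns)
    ]


# Placeholder patterns and population shares, invented for illustration only.
patterns = {
    "search-then-open": ("search", "open"),
    "open-then-scroll": ("open", "scroll"),
}
population_shares = {"search-then-open": 0.5, "open-then-scroll": 0.5}
events = ["search", "open", "scroll", "search", "open", "edit"]

lines = report(events, patterns, population_shares)
for line in lines:
    print(line)
```

Any advice on more effective techniques would have to sit on top of a report like this and would depend on first establishing which patterns correlate with success.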
Topic revision: r17 - 2006-11-21 - DuckySherwood