Ducky Thesis Proposal Notes
Problem Statement
I propose to study which techniques developers use when developing code, how those techniques vary from person to person, and how success at programming tasks correlates with choice of techniques. I will do so via a user study that captures the questions developers ask about the code and the IDE interactions they perform to answer them.
Problems:
- P1: We (software engineering researchers) do not know what different low-level techniques developers use when developing code using an IDE.
- P1.1: We do not have a shared vocabulary for discussing different techniques that developers use when developing code with IDEs. (?)
- P2: We do not know which techniques are the most productive.
- P3: We do not know how to teach/train developers how to be more productive.
- P4: Coding video is expensive and annoying, and it is difficult to code consistently (especially when one doesn't know what one is looking for).
Givens (right word?):
Robillard et al. showed:
- G1: Different people use different techniques for locating relevant pieces of code.
- G2: Characteristic interaction patterns reflecting those techniques can be discovered by analyzing coded transcripts of video of users navigating code.
- G3: Success at finding relevant pieces of code correlates with what technique(s) the developer uses.
Hypotheses:
- H1: These characteristic interaction patterns can be discovered by analyzing interaction telemetry of navigation tasks.
- H2: Software can recognize those patterns in navigation tasks.
- H3: Data mining software can discover interesting interaction patterns in navigation tasks.
- H4: Data mining software can discover interesting interaction patterns in more general code-development tasks.
- H5: Success in coding tasks correlates with which interaction patterns the developer uses.
- H6: Patterns are similar across tools.
- H7: Patterns are similar across languages.
I plan to test H1, H2, and H3. I hope to also test H4.
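To make H2 concrete, here is a minimal sketch of how software might recognize one pattern in a navigation trace. It assumes the telemetry can be reduced to a sequence of event kinds; the event names, the one-letter codes, and the "search then browse" pattern are all hypothetical illustrations, not patterns taken from Robillard et al. or from the collected data.

```python
import re

# Hypothetical event kinds mapped to one-letter codes; real telemetry will differ.
EVENT_CODES = {
    "search": "S",
    "open_file": "O",
    "scroll": "C",
    "follow_reference": "R",
    "edit": "E",
}

def encode(events):
    """Turn a list of event-kind names into a string, one letter per event."""
    return "".join(EVENT_CODES[kind] for kind in events)

# Hypothetical pattern: a search followed by two or more file openings,
# with optional scrolling before each opening.
SEARCH_THEN_BROWSE = re.compile(r"S(?:C*O){2,}")

def find_pattern(events, pattern=SEARCH_THEN_BROWSE):
    """Return (start, end) event-index pairs for each non-overlapping match."""
    return [m.span() for m in pattern.finditer(encode(events))]

trace = ["search", "open_file", "scroll", "open_file", "edit", "search", "open_file"]
matches = find_pattern(trace)
```

Because each event becomes exactly one character, match offsets in the string are also indices into the original event list, so a match can be traced back to the raw telemetry.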
Literature Review
- Robillard et al.
- Murphy/Kersten/Findlater
- BSD et al. (unpublished)
- Jonathan Sillito's questions
- Andrew Ko ICSE05
Proposed data-gathering methods
I will use data collected by BSD, which contains a replication of the first part of Robillard et al.'s study, where professional programmers search for specific interesting methods in the code.
For further work, I have access to
- from the Mylar bugzilla: many individual compressed logs of traces of a small number of developers either fixing one well-described bug or adding a well-defined feature, with the context
- from the glob: many individual logs of traces of a large number of developers working on unknown material, without the code available
We do not have data corresponding to the second part of Robillard et al.'s study, where a small number of professional programmers all do the same task. I might need to run a user study replicating that part.
Proposed analysis methods
I will use three techniques to analyze the data, with a possible fourth as a fallback:
- eyeballs (better word?) -- I will examine the data visually, with filters as appropriate to change how the data is visualized
- protein motif-finding algorithm -- I will use a modified protein motif-finding algorithm to search for common patterns, and my own judgement to select the interesting ones.
- data-visualization and mining tools, e.g. YALE -- I will use data mining and visualization tools to search through the patterns.
- (potentially, though hopefully not) writing my own algorithm and/or modifying an existing algorithm
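As a rough illustration of what searching for common patterns could look like, the sketch below counts contiguous event n-grams that recur across sessions. This is a deliberately simple stand-in, not the modified protein motif-finding algorithm (which would also tolerate gaps and substitutions); the session data and event names are invented for the example.

```python
from collections import Counter

def frequent_ngrams(sessions, n, min_support):
    """Map each contiguous n-gram to the number of sessions containing it,
    keeping only n-grams found in at least min_support sessions."""
    support = Counter()
    for events in sessions:
        # Use a set so an n-gram counts once per session, however often it repeats.
        seen = {tuple(events[i:i + n]) for i in range(len(events) - n + 1)}
        support.update(seen)
    return {gram: count for gram, count in support.items() if count >= min_support}

# Invented sessions, one list of event kinds per developer session.
sessions = [
    ["search", "open", "scroll", "open", "edit"],
    ["open", "scroll", "open", "scroll"],
    ["search", "open", "scroll", "edit"],
]
motifs = frequent_ngrams(sessions, n=2, min_support=2)
```

Counting support per session rather than raw occurrences keeps one long, repetitive session from dominating the results; the candidates that survive would still need human judgement to decide which are interesting.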
Having found patterns, I will write code to
- recognize those patterns
- (optionally?) generate reports with
  - what patterns were found
  - a description of the pattern distribution compared with the broader population's distribution
  - ?prescriptive advice on more effective techniques for coding?
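The report's comparison of one developer's pattern distribution with the broader population's could be sketched as follows. It assumes each pattern has already been recognized and counted; the pattern names and counts are hypothetical, and a plain difference of relative frequencies is only one of several possible comparisons.

```python
def distribution(pattern_counts):
    """Convert raw pattern counts into relative frequencies."""
    total = sum(pattern_counts.values())
    if total == 0:
        return {}
    return {name: count / total for name, count in pattern_counts.items()}

def compare_to_population(dev_counts, population_counts):
    """Return (pattern, developer share, population share, difference) rows,
    sorted by pattern name."""
    dev = distribution(dev_counts)
    pop = distribution(population_counts)
    rows = []
    for name in sorted(set(dev) | set(pop)):
        d, p = dev.get(name, 0.0), pop.get(name, 0.0)
        rows.append((name, d, p, d - p))
    return rows

def format_report(rows):
    """Render the comparison rows as a fixed-width plain-text table."""
    lines = [f"{'pattern':<22}{'developer':>10}{'population':>12}{'diff':>8}"]
    for name, d, p, diff in rows:
        lines.append(f"{name:<22}{d:>10.2f}{p:>12.2f}{diff:>+8.2f}")
    return "\n".join(lines)

# Hypothetical counts of recognized patterns.
dev_counts = {"search_then_browse": 3, "follow_references": 1}
population_counts = {"search_then_browse": 20, "follow_references": 60, "scroll_scan": 20}
rows = compare_to_population(dev_counts, population_counts)
report = format_report(rows)
```

Patterns that appear in only one of the two distributions get a share of zero on the other side, so a technique the developer never uses still shows up in the report.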