Difference: DuckyThesisProposalNotes (1 vs. 25)

Revision 25 (2007-03-21) - DuckySherwood

Line: 1 to 1
 
META TOPICPARENT name="DuckyHomework"

Ducky Thesis Proposal Notes

Line: 62 to 62
 
  • Group A will be collaborators -- graduate students in CS specializing in software practices or human-computer interfaces.
  • Group B will be novice programmers -- undergraduate students in CS.
  • Group C will be experienced programmers -- professional code developers.
Changed:
<
<
Group A will have nine participants; Groups B and C will have six to ten participants each.
>
>
Group A will have four participants; Groups B and C will have six to ten participants each.
  For each study, I will observe two people from a given cohort pair-programming -- working (and talking) together -- using the Eclipse Integrated Development Environment (IDE) to perform three to five different (small) programming assignments on a large code base.

Revision 24 (2007-03-07) - DuckySherwood

Line: 1 to 1
 
META TOPICPARENT name="DuckyHomework"

Ducky Thesis Proposal Notes

Line: 64 to 64
 
  • Group C will be experienced programmers -- professional code developers.
Group A will have nine participants; Groups B and C will have six to ten participants each.
Changed:
<
<
Each of the groups will be videotaped while they use Eclipse to perform three to five different (small) programming assignments on a large code base. Their interactions with Eclipse will be logged using the Mylar Monitor, and the Mylar Monitor logs and the videotape will be synchronized.
>
>
For each study, I will observe two people from a given cohort pair-programming -- working (and talking) together -- using the Eclipse Integrated Development Environment (IDE) to perform three to five different (small) programming assignments on a large code base.
 
Changed:
<
<
The tasks will be designed to attempt to force the users to ask specific, relatively difficult types of questions. We expect that they will also ask a number of "easier" questions in the course of answering the more complex questions.
>
>
Before the pair-programming starts, I will give the programmers a very brief survey to determine how much experience they have with programming in general, how much experience they have with Java, and how much experience they have with the Eclipse IDE.
 
Changed:
<
<
I will run pilots of the study with people in cohort A. Three pilots will be talk-aloud sessions done by individual collaborators (who are sophisticated enough to give good talk-aloud data). In addition to the videotaping/Mylar Monitor logs, I will perform a semi-structured interview with these three people. These three tests and interviews will be used to iteratively improve the study design.
>
>
I will videotape their interactions with each other. Their interactions with Eclipse will be logged using the Mylar Monitor, and the Mylar Monitor logs and the videotape will be synchronized.
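A rough sketch of how the log/video synchronization could work, assuming the Mylar Monitor events have already been parsed into timestamped tuples and that the wall-clock time at which the camera started recording is known (all names and times below are made up):

```python
from datetime import datetime

# Hypothetical, already-parsed Mylar Monitor events: (wall-clock time, event kind).
events = [
    (datetime(2007, 3, 21, 14, 0, 12), "selection"),
    (datetime(2007, 3, 21, 14, 0, 40), "edit"),
]

# Wall-clock time at which the camera started recording (e.g. read off a clock shown on tape).
video_start = datetime(2007, 3, 21, 14, 0, 0)

# Map each interaction event onto an offset into the videotape.
for when, kind in events:
    offset = when - video_start
    print(offset, kind)   # e.g. "0:00:12 selection"
```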
 
Changed:
<
<
After the talk-aloud pilots, I will run two pilots with people drawn from the Group A pool, where they do the same tasks but as a pair-programming team. In all five cases, I will do a semi-structured interview with the participants to gauge the effectiveness of the study. (In addition to the data gathered from the team, this ensures that I will be competent to run the subsequent study.)
>
>
(Note to Gail: I sort of think I'd also like to do a screen capture. Is there any downside to doing that?)
 
Changed:
<
<
Finally, I will run the study with at least three pair-programming teams from cohort B and three from cohort C.
>
>
I will keep the code that they develop. While I do not foresee using the code, keeping it is easy, minimally invasive, and potentially useful in unforeseen ways.

The tasks will be designed to attempt to force the users to ask specific, relatively difficult types of questions. We expect that they will also ask a number of "easier" questions in the course of answering the more complex questions.

I will pilot the study with two pair-programming teams from cohort A. In addition to the videotaping and Mylar Monitor logging, I will do a semi-structured interview with the participants to gauge the effectiveness of the study.

 
Added:
>
>
Finally, I will run the study with at least three pair-programming teams from cohort B and three from cohort C.
 

Revision 23 (2007-03-06) - DuckySherwood

Line: 1 to 1
 
META TOPICPARENT name="DuckyHomework"

Ducky Thesis Proposal Notes

Line: 36 to 36
 
  • H1: There are a few (and only a few) different techniques for pulling answers to a given question out of a code base via an IDE.
  • H2: Which technique people use to pull information out of the source code via the IDE correlates with how fast they are at answering the question.
  • H3: Which technique people use to pull information out of the source code via the IDE correlates with the quality of their solution.
Changed:
<
<
  • H4: We can figure out when people get stuck by the patterns of their interactions with the IDE.
>
>
  • H4: Which technique people use to pull information out of the source code via the IDE correlates with how much experience they have programming.
  • H5: Which technique people use to pull information out of the source code via the IDE correlates with how much experience they have with the IDE.
  • H6: We can figure out when people get stuck by the patterns of their interactions with the IDE.
 
Line: 50 to 53
 
  • Anneliese von Mayrhauser/Andrews tbd
  • Robert DeLine tbd
  • Gina Venolia (at least check out again) Note to Gail: Gina has done some interesting email stuff AND a friend at Microsoft insists that I should meet her because I would like her; I can easily imagine going down to Seattle sometime and having coffee with her.
Added:
>
>
  • PPIG (NB: good stuff, fertile ground)
 
  • (programmer variability studies: Sackman/Erikson/Grant, Dickey, Curtis, DeMarco and Lister)

Proposed data-gathering methods

Changed:
<
<
10-15 pairs of programmers will be videotaped using Eclipse to perform three to five different (small) programming assignments. Their interactions with Eclipse will be logged using the Mylar Monitor.
>
>
Three cohorts of programmers will be observed:
  • Group A will be collaborators -- graduate students in CS specializing in software practices or human-computer interfaces.
  • Group B will be novice programmers -- undergraduate students in CS.
  • Group C will be experienced programmers -- professional code developers.
Group A will have nine participants; Groups B and C will have six to ten participants each.

Each of the groups will be videotaped while they use Eclipse to perform three to five different (small) programming assignments on a large code base. Their interactions with Eclipse will be logged using the Mylar Monitor, and the Mylar Monitor logs and the videotape will be synchronized.

  The tasks will be designed to attempt to force the users to ask specific, relatively difficult types of questions. We expect that they will also ask a number of "easier" questions in the course of answering the more complex questions.
Changed:
<
<
Prior to the large study, I will have casual discussions with a number of my collaborators in the Software Practices Lab to assess likely candidates for tasks that will elicit specific questions.
>
>
I will run pilots of the study with people in cohort A. Three pilots will be talk-aloud sessions done by individual collaborators (who are sophisticated enough to give good talk-aloud data). In addition to the videotaping/Mylar Monitor logs, I will perform a semi-structured interview with these three people. These three tests and interviews will be used to iteratively improve the study design.

After the talk-aloud pilots, I will run two pilots with people drawn from the Group A pool, where they do the same tasks but as a pair-programming team. In all five cases, I will do a semi-structured interview with the participants to gauge the effectiveness of the study. (In addition to the data gathered from the team, this ensures that I will be competent to run the subsequent study.)

Finally, I will run the study with at least three pair-programming teams from cohort B and three from cohort C.

 
Deleted:
<
<
I will run pilots of the study with collaborators in Software Practices and HCI. Three pilots will be talk-aloud sessions done by individual collaborators (who are sophisticated enough to give good talk-aloud data). Two will be pair-programming. In all five cases, I will do a semi-structured interview with the participants to gauge the effectiveness of the study.
 
Deleted:
<
<
I will then run three pilots with people who are not familiar with this project and its goals.
 

Proposed analysis methods

Changed:
<
<
I will watch the video specifically to look for which questions people ask, and I will put those questions into Sillito's taxonomy.
>
>
I will watch the video specifically to look for which questions people ask, and I will put those questions into Sillito's taxonomy. I will record the questions and the times they were asked.
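A minimal sketch of how I might record the coded questions, assuming one row per question with a video timestamp and a taxonomy label (the category names below are placeholders, not necessarily Sillito's exact labels):

```python
import csv
from collections import Counter

# Hypothetical coding sheet built while watching the video: one row per question asked.
rows = [
    {"video_time": "00:03:10", "speaker": "P1", "question": "Where is this method called from?",
     "category": "finding focus points"},
    {"video_time": "00:07:42", "speaker": "P2", "question": "What does this type depend on?",
     "category": "expanding focus points"},
]

with open("questions_pair01.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["video_time", "speaker", "question", "category"])
    writer.writeheader()
    writer.writerows(rows)

# Quick tally of questions per category for the session.
print(Counter(r["category"] for r in rows))
```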
 
Changed:
<
<
I will examine the Mylar Monitor logs "by hand" to see which techniques people use to answer the questions that arise. Both successful techniques -- those which lead to an answer to the question -- and unsuccessful techniques will be noted.
>
>
I will do a qualitative analysis of the logs and videotapes to see which techniques people perform after asking the questions in the service of answering those questions. I will analyze both successful techniques -- those which lead to an answer to the question -- and unsuccessful techniques.
 
Changed:
<
<
I will also note points where the participants are "stuck".
>
>
I will also note points where the participants are "stuck", and examine the logs in an attempt to discover usage patterns that indicate when someone is stuck.
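One illustrative (unvalidated) heuristic for flagging candidate "stuck" periods from the logs: windows that contain plenty of navigation/selection activity but no edits. The window length, threshold, and event kinds below are assumptions:

```python
# Illustrative heuristic only: flag windows with lots of looking around but no edits.
def stuck_windows(events, window_secs=120, min_events=10):
    """events: list of (seconds_from_start, kind) tuples, kind in {'selection', 'edit', ...}."""
    flagged = []
    if not events:
        return flagged
    end = events[-1][0]
    start = 0
    while start < end:
        in_window = [k for t, k in events if start <= t < start + window_secs]
        if len(in_window) >= min_events and "edit" not in in_window:
            flagged.append((start, start + window_secs))
        start += window_secs
    return flagged

print(stuck_windows([(5, "selection"), (30, "selection"), (50, "selection"),
                     (70, "selection"), (90, "selection"), (100, "selection"),
                     (105, "selection"), (110, "selection"), (115, "selection"),
                     (118, "selection"), (150, "edit")]))   # -> [(0, 120)]
```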
 
Deleted:
<
<
I will also use data mining/machine learning techniques to find patterns that correlate with
  • speed
  • code quality
  • being "stuck"
 

Revision 22 (2007-03-06) - DuckySherwood

Line: 1 to 1
 
META TOPICPARENT name="DuckyHomework"

Ducky Thesis Proposal Notes

Line: 47 to 47
 
  • Jonathan Sillito's questions
  • Andrew Ko ICSE05, ICSE07-tbd, CHI07-tbd, other Ko papers
  • Emerson Murphy Hill tbd
Changed:
<
<
  • Analise Von Meyerhauser/Andrews tbd
>
>
  • Anneliese von Mayrhauser/Andrews tbd
 
  • Robert DeLine tbd
  • Gina Venolia (at least check out again) Note to Gail: Gina has done some interesting email stuff AND a friend at Microsoft insists that I should meet her because I would like her; I can easily imagine going down to Seattle sometime and having coffee with her.
  • (programmer variability studies: Sackman/Erikson/Grant, Dickey, Curtis, DeMarco and Lister)

Revision 21 (2007-03-06) - DuckySherwood

Line: 1 to 1
 
META TOPICPARENT name="DuckyHomework"

Ducky Thesis Proposal Notes

Line: 10 to 10
 I will do so via a user study that captures information about several assigned tasks:
  • the questions that developers ask about the code for that task
  • which IDE interactions they do to answer them
Deleted:
<
<
  • the source code that they add/modify/delete
 
  • the time that it takes to do the task
  • the quality of their code, as measured by the number of unit tests which pass
Changed:
<
<
From that, I will examine which techniques they use to answer which questions, and correlate the choice of technique with how fast and how well they complete the task.
>
>
From that, I will examine which techniques they use to answer which questions, and correlate the choice of technique with how fast and how well they complete the task.
  • the source code that they add/modify/delete
 

Problems:

  • P1: We (software engineering researchers) do not know what different low-level techniques developers use when developing code using an IDE.
    • P1.1: We do not have a shared vocabulary for discussing different techniques that developers use when developing code with IDEs. (?)
Changed:
<
<
  • P2: We do not know which techniques are the most productive.
  • P3: We do not know how to teach/train developers how to be more productive.
  • P4: Coding video is expensive, annoying, and it is difficult to be consistent in coding (especially when one doesn't know what one is looking for)
>
>
  • P2: We do not know how to recognize when people are stuck.
  • P3: We do not know which techniques are the most productive.
  • P4: We do not know how to teach/train developers how to be more productive.
 

Givens:

Line: 31 to 32
 

Hypotheses:

  • H0.0: We can design a study that will force people to ask (generally) the same questions.
  • H0.1: We can figure out what questions they are asking.
Changed:
<
<
  • H0.2: When guiding people to ask difficult questions, they will also by
>
>
  • H0.2: Developers will ask a lot of easier questions in the pursuit of answering more difficult questions. (This means that we don't have to put effort into guiding the coders to ask those easier questions.)
 
  • H1: There are a few (and only a few) different techniques for pulling answers to a given question out of a code base via an IDE.
  • H2: Which technique people use to pull information out of the source code via the IDE correlates with how fast they are at answering the question.
  • H3: Which technique people use to pull information out of the source code via the IDE correlates with the quality of their solution.
Line: 44 to 45
 
  • Robillard et al
  • BSD et al (unpublished)
  • Jonathan Sillito's questions
Changed:
<
<
  • Andrew Ko ICSE05
>
>
  • Andrew Ko ICSE05, ICSE07-tbd, CHI07-tbd, other Ko papers
  • Emerson Murphy Hill tbd
  • Analise Von Meyerhauser/Andrews tbd
  • Robert DeLine tbd
  • Gina Venolia (at least check out again) Note to Gail: Gina has done some interesting email stuff AND a friend at Microsoft insists that I should meet her because I would like her; I can easily imagine going down to Seattle sometime and having coffee with her.
 
  • (programmer variability studies: Sackman/Erikson/Grant, Dickey, Curtis, DeMarco and Lister)

Proposed data-gathering methods

Revision 20 (2007-03-02) - DuckySherwood

Line: 1 to 1
 
META TOPICPARENT name="DuckyHomework"

Ducky Thesis Proposal Notes

Problem Statement

Changed:
<
<
I propose to study which techniques developers use when developing code, how that varies from person to person, and how success at programming tasks correlates with choice of techniques. I will do so via a user study that captures the questions that developers ask about the code and what IDE interactions they take to answer them.
>
>
I propose to study which techniques developers use when developing code, how that varies from person to person, and how success at programming tasks correlates with choice of techniques.

I will do so via a user study that captures information about several assigned tasks:

  • the questions that developers ask about the code for that task
  • which IDE interactions they do to answer them
  • the source code that they add/modify/delete
  • the time that it takes to do the task
  • the quality of their code, as measured by the number of unit tests which pass
From that, I will examine which techniques they use to answer which questions, and correlate the choice of technique with how fast and how well they complete the task.
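A minimal sketch of that correlation step, assuming the observations have already been reduced to one row per task attempt (technique used, time taken, unit tests passed); the technique names and numbers are made up:

```python
from statistics import mean

# Hypothetical per-task observations; one row per participant/task attempt.
observations = [
    {"technique": "search-first", "minutes": 22, "tests_passed": 9},
    {"technique": "search-first", "minutes": 30, "tests_passed": 8},
    {"technique": "browse-hierarchy", "minutes": 41, "tests_passed": 6},
    {"technique": "browse-hierarchy", "minutes": 35, "tests_passed": 7},
]

# Group by technique and compare mean completion time and mean tests passed.
by_technique = {}
for row in observations:
    by_technique.setdefault(row["technique"], []).append(row)

for technique, rows in by_technique.items():
    print(technique,
          "mean minutes:", mean(r["minutes"] for r in rows),
          "mean tests passed:", mean(r["tests_passed"] for r in rows))
```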
 

Problems:

Line: 15 to 23
 
  • P3: We do not know how to teach/train developers how to be more productive.
  • P4: Coding video is expensive, annoying, and it is difficult to be consistent in coding (especially when one doesn't know what one is looking for)
Changed:
<
<

Givens (right word?):

Robillard et al. showed:
  • G1: Different people use different techniques for locating relevant pieces of code.
  • G2: Characteristic interaction patterns reflecting those techniques can be discovered by analyzing coded transcripts of video of users navigating code.
  • G3: Success at finding relevant pieces of code correlates with what technique(s) the developer uses.
>
>

Givens:

Jonathan Sillito showed:

  • G1: Questions that developers ask in the pursuit of code can be classified.
 

Hypotheses:

Changed:
<
<
  • H1: These characteristic interaction patterns can be discovered by analyzing interaction telemetry of navigation tasks.
  • H2: Software can recognize those patterns in navigation tasks.
  • H3: Data mining software can discover interesting interaction patterns in navigation tasks.
  • H4: Data mining software can discover interesting interaction patterns in more general code-development tasks.
  • H5: Success in coding tasks correlates with which interaction patterns the developer uses.
  • H6: Patterns are similar across tools.
  • H7: Patterns are similar across languages.
>
>
  • H0.0: We can design a study that will force people to ask (generally) the same questions.
  • H0.1: We can figure out what questions they are asking.
  • H0.2: When guiding people to ask difficult questions, they will also by
  • H1: There are a few (and only a few) different techniques for pulling answers to a given question out of a code base via an IDE.
  • H2: Which technique people use to pull information out of the source code via the IDE correlates with how fast they are at answering the question.
  • H3: Which technique people use to pull information out of the source code via the IDE correlates with the quality of their solution.
  • H4: We can figure out when people get stuck by the patterns of their interactions with the IDE.
 
Deleted:
<
<
I plan to do H1, H2, and H3. I hope to also do H4.
 
Deleted:
<
<

Literature Review

 
Changed:
<
<
  • Robillard et al
>
>

Literature Review

 
  • Murphy/Kersten/Findlater
Added:
>
>
  • Robillard et al
 
  • BSD et al (unpublished)
  • Jonathan Sillito's questions
  • Andrew Ko ICSE05
Added:
>
>
  • (programmer variability studies: Sackman/Erikson/Grant, Dickey, Curtis, DeMarco and Lister)
 

Proposed data-gathering methods

Changed:
<
<
I will use data collected by BSD which contains a replication of the first part of Robillard et al's study, where professional programmers search for specific interesting methods in the code.
>
>
10-15 pairs of programmers will be videotaped using Eclipse to perform three to five different (small) programming assignments. Their interactions with Eclipse will be logged using the Mylar Monitor.
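A rough sketch of turning a Mylar Monitor log into a time-ordered event list for later analysis. The element and attribute names below are guesses at the XML serialization, not the documented schema, and would need to be adjusted to the real log format:

```python
import xml.etree.ElementTree as ET
from datetime import datetime

def load_events(path):
    # NOTE: "InteractionEvent", "StartDate", "Kind", and "StructureHandle" are assumed names.
    events = []
    for elem in ET.parse(path).getroot().iter("InteractionEvent"):
        when = datetime.strptime(elem.get("StartDate"), "%Y-%m-%d %H:%M:%S")
        events.append((when, elem.get("Kind"), elem.get("StructureHandle")))
    events.sort(key=lambda e: e[0])
    return events

# events = load_events("pair01-interaction-history.xml")
```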
 
Changed:
<
<
For further work, I have access to
  • from the Mylar bugzilla: many individual compressed logs of traces of a small number of developers either fixing one well-described bug or adding a well-defined feature, with the context
  • from the glob: many individual logs of traces of a large number of developers working on unknown material, without the code available
>
>
The tasks will be designed to attempt to force the users to ask specific, relatively difficult types of questions. We expect that they will also ask a number of "easier" questions in the course of answering the more complex questions.
 
Changed:
<
<
We do not have data corresponding to the second part of Robillard et al's study, where a small number of professional programmers all do the same task. I might need to run a user study replicating that part.
>
>
Prior to the large study, I will have casual discussions with a number of my collaborators in the Software Practices Lab to assess likely candidates for tasks that will elicit specific questions.

I will run pilots of the study with collaborators in Software Practices and HCI. Three pilots will be talk-aloud done by individual collaborators (who are sophisticated enough to give good talk-aloud data). Two will be pair-programming. In all five cases, I will do a semi-structured interview with the participants to gauge the effectiveness of the study.

I will then run three pilots with people who are not familiar with this project and its goals.

 

Proposed analysis methods

Changed:
<
<
I will use three techniques to analyze the data:
  1. eyeballs (better word?) -- I will examine the data visually, with filters as appropriate to change how the data is visualized
  2. protein-motif finding algorithm -- I will use a modified protein motif-finding algorithm to search for common patterns, and judgement to select interesting ones.
  3. data-visualization and mining tools, e.g. YALE -- I will use data mining and visualization tools to search through the patterns.
  4. (potentially, tho hopefully not) write my own algorithm and/or modify an existing algorithm

Having found patterns, I will write code to

  • recognize those patterns
  • (optionally?) to generate reports with
    • what patterns were found
    • description of the pattern distribution compared with the broader population's distribution
    • ?prescriptive advice on more effective techniques for coding?
>
>
I will watch the video specifically to look for which questions people ask, and I will put those questions into Sillito's taxonomy.

I will examine the Mylar Monitor logs "by hand" to see which techniques people use to answer the questions that arise. Both successful techniques -- those which lead to an answer to the question -- and unsuccessful techniques will be noted.

I will also note points where the participants are "stuck".

I will also use data mining/machine learning techniques to find patterns that correlate with

  • speed
  • code quality
  • being "stuck"
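A minimal sketch of what that mining step could look like, here using a decision tree from scikit-learn (one possible tool, not a commitment), assuming each observation window has already been reduced to counts of event kinds and hand-labelled as stuck or not stuck from the video; all numbers are made up:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# One row per window: [selection count, edit count, search count] -- illustrative data only.
X = [[12, 0, 1], [4, 6, 0], [15, 1, 4], [3, 5, 1], [20, 0, 6], [5, 7, 2]]
y = [1, 0, 1, 0, 1, 0]   # 1 = window coded as "stuck" when watching the video

clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(clf, feature_names=["selections", "edits", "searches"]))
```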
 

Revision 19 (2007-03-02) - DuckySherwood

Line: 1 to 1
 
META TOPICPARENT name="DuckyHomework"

Ducky Thesis Proposal Notes

Line: 66 to 66
 
    • description of the pattern distribution compared with the broader population's distribution
    • ?prescriptive advice on more effective techniques for coding?
Added:
>
>

Revision 18 (2007-03-02) - TWikiGuest

Line: 1 to 1
 
META TOPICPARENT name="DuckyHomework"

Ducky Thesis Proposal Notes

Problem Statement

Changed:
<
<
I propose to study which techniques use when developing code, how that varies from person to person, and how success at programming tasks correlates with choice of techniques. I will do so by replicating part or all of Robillard et al's work.
>
>
I propose to study which techniques developers use when developing code, how that varies from person to person, and how success at programming tasks correlates with choice of techniques. I will do so via a user study that captures the questions that developers ask about the code and what IDE interactions they take to answer them.
 

Problems:

Revision 17 (2006-11-21) - DuckySherwood

Line: 1 to 1
 
META TOPICPARENT name="DuckyHomework"

Ducky Thesis Proposal Notes

Line: 13 to 13
 
    • P1.1: We do not have a shared vocabulary for discussing different techniques that developers use when developing code with IDEs. (?)
  • P2: We do not know which techniques are the most productive.
  • P3: We do not know how to teach/train developers how to be more productive.
Added:
>
>
  • P4: Coding video is expensive, annoying, and it is difficult to be consistent in coding (especially when one doesn't know what one is looking for)
 

Givens (right word?):

Robillard et al. showed:
Line: 26 to 27
 
  • H3: Data mining software can discover interesting interaction patterns in navigation tasks.
  • H4: Data mining software can discover interesting interaction patterns in more general code-development tasks.
  • H5: Success in coding tasks correlates with which interaction patterns the developer uses.
Added:
>
>
  • H6: Patterns are similar across tools.
  • H7: Patterns are similar across languages.
 
Added:
>
>
I plan to do H1, H2, and H3. I hope to also do H4.
 

Literature Review

Deleted:
<
<
@@@ A presentation of the relevant literature and the theoretical framework.
 
  • Robillard et al
  • Murphy/Kersten/Findlater
  • BSD et al (unpublished)
Changed:
<
<
  • ?
>
>
  • Jonathan Sillito's questions
  • Andrew Ko ICSE05
 

Proposed data-gathering methods

I will use data collected by BSD which contains a replication of the first part of Robillard et al's study, where professional programmers search for specific interesting methods in the code.

For further work, I have access to

Changed:
<
<
  • many individual logs of traces of a small number of developers either fixing one well-described bug or adding a well-defined feature, with the code available
  • many individual logs of traces of a large number of developers working on unknown material, without the code available
>
>
  • from the Mylar bugzilla: many individual compressed logs of traces of a small number of developers either fixing one well-described bug or adding a well-defined feature, with the context
  • from the glob: many individual logs of traces of a large number of developers working on unknown material, without the code available
 
Changed:
<
<
We do not have data corresponding to the second part of Robillard et al's study, where professional programmers attempt to add a feature. I might need to run a user study replicating that part.
>
>
We do not have data corresponding to the second part of Robillard et al's study, where a small number of professional programmers all do the same task. I might need to run a user study replicating that part.
 

Proposed analysis methods

Line: 53 to 57
 
  1. eyeballs (better word?) -- I will examine the data visually, with filters as appropriate to change how the data is visualized
  2. protein-motif finding algorithm -- I will use a modified protein motif-finding algorithm to search for common patterns, and judgement to select interesting ones.
  3. data-visualization and mining tools, e.g. YALE -- I will use data mining and visualization tools to search through the patterns.
Added:
>
>
  1. (potentially, tho hopefully not) write my own algorithm and/or modify an existing algorithm
 
Changed:
<
<
Having found patterns, I will write code to recognize those patterns.

@@@ not sure what to put for what statistical tests I will use

>
>
Having found patterns, I will write code to
  • recognize those patterns
  • (optionally?) to generate reports with
    • what patterns were found
    • description of the pattern distribution compared with the broader population's distribution
    • ?prescriptive advice on more effective techniques for coding?
 

Revision 16 (2006-11-20) - DuckySherwood

Line: 1 to 1
 
META TOPICPARENT name="DuckyHomework"

Ducky Thesis Proposal Notes

Line: 37 to 37
 
  • ?

Proposed data-gathering methods

Deleted:
<
<
@@@ A description of the research design and instruments and data gathering methods.
  I will use data collected by BSD which contains a replication of the first part of Robillard et al's study, where professional programmers search for specific interesting methods in the code.
Line: 45 to 44
 
  • many individual logs of traces of a small number of developers either fixing one well-described bug or adding a well-defined feature, with the code available
  • many individual logs of traces of a large number of developers working on unknown material, without the code available

Changed:
<
<
We do not have data corresponding to the second part of Robillard et al's study, where professional programmers attempt to add a feature. I might need to run a study replicating that part.
>
>
We do not have data corresponding to the second part of Robillard et al's study, where professional programmers attempt to add a feature. I might need to run a user study replicating that part.
 

Proposed analysis methods

Deleted:
<
<
@@@ An outline of the plan for data analysis and the rationale for the level and method chosen, applicable statistical tests and computer programs.
  I will use three techniques to analyze the data:
  1. eyeballs (better word?) -- I will examine the data visually, with filters as appropriate to change how the data is visualized
Line: 59 to 56
  Having found patterns, I will write code to recognize those patterns.
Added:
>
>
@@@ not sure what to put for what statistical tests I will use

Revision 15 (2006-11-20) - TWikiGuest

Line: 1 to 1
 
META TOPICPARENT name="DuckyHomework"

Ducky Thesis Proposal Notes

Problem Statement

Deleted:
<
<
@@@ A clear statement of the problem and the research question.

The differences in productivity between programmers are very high (cite @@@).

We want to investigate work practices of highly productive programmers and less-productive programmers. To do so, we will

  • Recruit test subjects from students in computer science classes where all students tackle the same assignments.
  • Have the students install logging software.
  • Log interactions that developers have with a Java integrated development environment called Eclipse.
  • Have the students submit the logs with the submitted assignments.
  • Have the instructor deliver the logs, the submissions, and the grade for the coding portion of the assignment.
  • Assign a score to each submission based on both mechanically-derived metrics (like how tangled@@@ the code is or how many unit tests it passed) and the grade.
  • Use data mining techniques to look for patterns in the data that correlate with the quality of the submissions.
 
Added:
>
>
I propose to study which techniques use when developing code, how that varies from person to person, and how success at programming tasks correlates with choice of techniques. I will do so by replicating part or all of Robillard et al's work.
 
Added:
>
>

Problems:

  • P1: We (software engineering researchers) do not know what different low-level techniques developers use when developing code using an IDE.
    • P1.1: We do not have a shared vocabulary for discussing different techniques that developers use when developing code with IDEs. (?)
  • P2: We do not know which techniques are the most productive.
  • P3: We do not know how to teach/train developers how to be more productive.

Givens (right word?):

Robillard et al. showed:
  • G1: Different people use different techniques for locating relevant pieces of code.
  • G2: Characteristic interaction patterns reflecting those techniques can be discovered by analyzing coded transcripts of video of users navigating code.
  • G3: Success at finding relevant pieces of code correlates with what technique(s) the developer uses.

Hypotheses:

  • H1: These characteristic interaction patterns can be discovered by analyzing interaction telemetry of navigation tasks.
  • H2: Software can recognize those patterns in navigation tasks.
  • H3: Data mining software can discover interesting interaction patterns in navigation tasks.
  • H4: Data mining software can discover interesting interaction patterns in more general code-development tasks.
  • H5: Success in coding tasks correlates with which interaction patterns the developer uses.
 

Literature Review

@@@ A presentation of the relevant literature and the theoretical framework.
Added:
>
>
  • Robillard et al
  • Murphy/Kersten/Findlater
  • BSD et al (unpublished)
  • ?
 

Proposed data-gathering methods

@@@ A description of the research design and instruments and data gathering methods.
Changed:
<
<

Proposed analysis methods

@@@ An outline of the plan for data analysis and the rationale for the level and method chosen, applicable statistical tests and computer programs.
>
>
I will use data collected by BSD which contains a replication of the first part of Robillard et al's study, where professional programmers search for specific interesting methods in the code.
 
Changed:
<
<

Unsorted junk

Publishable papers

  • time spent vs. grade vs. metrics -- whole boatload of papers possible from that!

How evaluate

Follow-ons

  • early students vs. later students
  • students vs. professionals
  • single vs pair-programming
  • Eclipse vs other IDEs
  • Java vs other languages

Tools needed

  • Logging sw
  • visualization sw
    • something that replays the session
    • Mylog
  • data mining sw
  • something that checks that the trace is complete -- replays the session and makes sure that replaying the trace creates the handin
  • sw for doing acceptance tests on traces
  • some tool/mechanism for organizing/collecting all the user data

Need academic ref

Interesting references for me to chase down

  • Cross, E. The behavioral styles of computer programmers. in Proc 8th Annual SIGCPR Conference. 1970. Maryland, WA, USA.

  • Mayer, D.B. and A.W. Stalnaker. Selection and Evaluation of Computer Personnel – the Research History of SIG/CPR. in Proc 1968 23rd ACM National Conference, 1968. Las Vegas, NV, USA.
>
>
For further work, I have access to
  • many individual logs of traces of a small number of developers either fixing one well-described bug or adding a well-defined feature, with the code available
  • many individual logs of traces of a large number of developers working on unknown material, without the code available
 
Changed:
<
<
  • Michael McCracken, Vicki Almstrum, Danny Diaz, Mark Guzdial, Dianne Hagan, Yifat Ben-David Kolikant, Cary Laxer, Lynda Thomas, Ian Utting, and Tadeusz Wilusz. A multinational, multi-institutional study of assessment of programming skills of first-year CS students. In Working group reports from ITiCSE on Innovation and technology in computer science education, Canterbury, UK, 2001. ACM Press.
>
>
We do not have data corresponding to the second part of Robillard et al's study, where professional programmers attempt to add a feature. I might need to run a study replicating that part.

Proposed analysis methods

@@@ An outline of the plan for data analysis and the rationale for the level and method chosen, applicable statistical tests and computer programs.
 
Deleted:
<
<
  • B Adelson and E Soloway. The role of domain experience in software design. IEEE Transactions on Software Engineering, 11(November):1351–1360, 1985.
  • Jeffrey Bonar and Elliot Soloway. Uncovering principles of novice programming. In 10th ACM POPL, pages 10–13, 1983.
 
Changed:
<
<
and other references from This Camel Has Two Humps and Testing Programming Aptitude
>
>
I will use three techniques to analyze the data:
  1. eyeballs (better word?) -- I will examine the data visually, with filters as appropriate to change how the data is visualized
  2. protein-motif finding algorithm -- I will use a modified protein motif-finding algorithm to search for common patterns, and judgement to select interesting ones.
  3. data-visualization and mining tools, e.g. YALE -- I will use data mining and visualization tools to search through the patterns.
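A toy version of the motif-finding idea in item 2: treat each session as a string of event-kind codes and count recurring fixed-length subsequences, by analogy with searching for short motifs in protein sequences. A real protein motif-finding algorithm would allow gaps and substitutions; this sketch does not, and the sequences are made up:

```python
from collections import Counter

# Each session as a string of event-kind codes: S=selection, E=edit, Q=query (made-up data).
sessions = [
    "SSESSSEQSSE",
    "QSSESSEQSSS",
]

def motifs(seqs, k=3):
    """Count every length-k subsequence across all sessions and return the most common."""
    counts = Counter()
    for seq in seqs:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts.most_common(5)

print(motifs(sessions))
```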
 
Changed:
<
<
follow-on to the camel
>
>
Having found patterns, I will write code to recognize those patterns.
 

Revision 14 (2006-11-09) - TWikiGuest

Line: 1 to 1
 
META TOPICPARENT name="DuckyHomework"

Ducky Thesis Proposal Notes

Line: 58 to 58
 
Added:
>
>
  • V.R. Basili, R.W. Selby, and D.H. Hutchens, “Experimentation in Software Engineering,” IEEE Trans. Software Eng., vol. 12, no. 7, pp. 733-743, July 1986. Martin says it's a good foundational doc.
 

Follow-ons

  • early students vs. later students

Revision 13 (2006-11-08) - TWikiGuest

Line: 1 to 1
 
META TOPICPARENT name="DuckyHomework"

Ducky Thesis Proposal Notes

Line: 32 to 32
 

Unsorted junk

Added:
>
>
 

Publishable papers

  • time spent vs. grade vs. metrics -- whole boatload of papers possible from that!

How evaluate

Changed:
<
<
  • Grade
>
>
  • Grades
 

Revision 12 (2006-11-08) - DuckySherwood

Line: 1 to 1
 
META TOPICPARENT name="DuckyHomework"

Ducky Thesis Proposal Notes

Line: 49 to 50
 
    • Table 2 lists the metrics evaluated in the study, including a short description and a reference to the definition of the metric. All of the metrics are proposed by Chidamber and Kimerer [CK94] or by Lorenz & Kidd [LK94] (?) However, we rule out some of the proposed metrics because they received serious critique in the literature (LCOM and RFC [CK94]), because the definition isn’t clear (MCX, CCO, CCP, CRE [LK94]; LCOM [CK94, EDL98]), because the lack of static typing in Smalltalk prohibits the computation of the metric (CBO [CK94]), because the metric is too similar with another metric included in the list (NIM, NCM and PIM in [LK94] resemble WMC-NOM in [CK94]), or simply because the metric is deemed inappropriate (NAC, SIX, MUI, FFU, FOC, CLM, PCM, PRC [LK94])

Added:
>
>
 

Follow-ons

  • early students vs. later students

Revision 11 (2006-11-08) - DuckySherwood

Line: 1 to 1
 
META TOPICPARENT name="DuckyHomework"

Ducky Thesis Proposal Notes

Line: 48 to 48
 
    • taxonomy for programming style prolly not useful
    • Table 2 lists the metrics evaluated in the study, including a short description and a reference to the definition of the metric. All of the metrics are proposed by Chidamber and Kimerer [CK94] or by Lorenz & Kidd [LK94] (?) However, we rule out some of the proposed metrics because they received serious critique in the literature (LCOM and RFC [CK94]), because the definition isn’t clear (MCX, CCO, CCP, CRE [LK94]; LCOM [CK94, EDL98]), because the lack of static typing in Smalltalk prohibits the computation of the metric (CBO [CK94]), because the metric is too similar with another metric included in the list (NIM, NCM and PIM in [LK94] resemble WMC-NOM in [CK94]), or simply because the metric is deemed inappropriate (NAC, SIX, MUI, FFU, FOC, CLM, PCM, PRC [LK94])
Added:
>
>
 

Follow-ons

  • early students vs. later students
  • students vs. professionals

Revision 10 (2006-11-06) - DuckySherwood

Line: 1 to 1
 
META TOPICPARENT name="DuckyHomework"

Ducky Thesis Proposal Notes

Line: 46 to 46
 
Added:
>
>
    • Table 2 lists the metrics evaluated in the study, including a short description and a reference to the definition of the metric. All of the metrics are proposed by Chidamber and Kimerer [CK94] or by Lorenz & Kidd [LK94] (?) However, we rule out some of the proposed metrics because they received serious critique in the literature (LCOM and RFC [CK94]), because the definition isn’t clear (MCX, CCO, CCP, CRE [LK94]; LCOM [CK94, EDL98]), because the lack of static typing in Smalltalk prohibits the computation of the metric (CBO [CK94]), because the metric is too similar with another metric included in the list (NIM, NCM and PIM in [LK94] resemble WMC-NOM in [CK94]), or simply because the metric is deemed inappropriate (NAC, SIX, MUI, FFU, FOC, CLM, PCM, PRC [LK94])
 
Added:
>
>

Follow-ons

  • early students vs. later students
  • students vs. professionals
  • single vs pair-programming
  • Eclipse vs other IDEs
  • Java vs other languages
 

Tools needed

  • Logging sw

Revision 8 (2006-11-04) - DuckySherwood

Line: 1 to 1
 
META TOPICPARENT name="DuckyHomework"

Ducky Thesis Proposal Notes

Line: 42 to 42
 
Added:
>
>
 

Tools needed

  • Logging sw

Revision 7 (2006-11-03) - TWikiGuest

Line: 1 to 1
 
META TOPICPARENT name="DuckyHomework"

Ducky Thesis Proposal Notes

Added:
>
>
 

Problem Statement

@@@ A clear statement of the problem and the research question.
Line: 31 to 32
 

Unsorted junk

Added:
>
>

Publishable papers

  • time spent vs. grade vs. metrics -- whole boatload of papers possible from that!

How evaluate

 

Tools needed

  • Logging sw
  • visualization sw

Revision 6 (2006-11-03) - DuckySherwood

Line: 1 to 1
 
META TOPICPARENT name="DuckyHomework"

Ducky Thesis Proposal Notes

Revision 5 (2006-11-02) - DuckySherwood

Line: 1 to 1
 
META TOPICPARENT name="DuckyHomework"

Ducky Thesis Proposal Notes

Problem Statement

@@@ A clear statement of the problem and the research question.
Added:
>
>
The differences in productivity between programmers are very high (cite @@@).

We want to investigate work practices of highly productive programmers and less-productive programmers. To do so, we will

  • Recruit test subjects from students in computer science classes where all students tackle the same assignments.
  • Have the students install logging software.
  • Log interactions that developers have with a Java integrated development environment called Eclipse.
  • Have the students submit the logs with the submitted assignments.
  • Have the instructor deliver the logs, the submissions, and the grade for the coding portion of the assignment.
  • Assign a score to each submission based on both mechanically-derived metrics (like how tangled@@@ the code is or how many unit tests it passed) and the grade.
  • Use data mining techniques to look for patterns in the data that correlate with the quality of the submissions.
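A minimal sketch of the scoring step described above, assuming each submission already has a unit-test pass count and an instructor grade; the 50/50 weighting is an arbitrary placeholder:

```python
def quality_score(tests_passed, tests_total, grade, grade_max=100, w_tests=0.5):
    """Blend a mechanically derived metric (unit-test pass rate) with the instructor's grade."""
    test_rate = tests_passed / tests_total if tests_total else 0.0
    return w_tests * test_rate + (1 - w_tests) * (grade / grade_max)

print(quality_score(tests_passed=8, tests_total=10, grade=85))   # 0.825
```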

 

Literature Review

@@@ A presentation of the relevant literature and the theoretical framework.
Line: 24 to 38
 
    • Mylog
  • data mining sw
  • something that checks that the trace is complete -- replays the session and makes sure that replaying the trace creates the handin
Added:
>
>
  • sw for doing acceptance tests on traces
  • some tool/mechanism for organizing/collecting all the user data
 

Need academic ref

Revision 4 (2006-11-02) - DuckySherwood

Line: 1 to 1
 
META TOPICPARENT name="DuckyHomework"

Ducky Thesis Proposal Notes

Line: 17 to 17
 

Unsorted junk

Added:
>
>

Tools needed

  • Logging sw
  • visualization sw
    • something that replays the session
    • Mylog
  • data mining sw
  • something that checks that the trace is complete -- replays the session and makes sure that replaying the trace creates the handin
 

Need academic ref

Revision 3 (2006-11-02) - DuckySherwood

Line: 1 to 1
 
META TOPICPARENT name="DuckyHomework"

Ducky Thesis Proposal Notes

Added:
>
>

Problem Statement

@@@ A clear statement of the problem and the research question.

Literature Review

@@@ A presentation of the relevant literature and the theoretical framework.

Proposed data-gathering methods

@@@ A description of the research design and instruments and data gathering methods.

Proposed analysis methods

@@@ An outline of the plan for data analysis and the rationale for the level and method chosen, applicable statistical tests and computer programs.


Unsorted junk

 

Need academic ref

Revision 2 (2006-10-31) - DuckySherwood

Line: 1 to 1
 
META TOPICPARENT name="DuckyHomework"

Ducky Thesis Proposal Notes

Changed:
<
<

Interesting references to chase

>
>

Need academic ref

Interesting references for me to chase down

 
  • Cross, E. The behavioral styles of computer programmers. in Proc 8th Annual SIGCPR Conference. 1970. Maryland, WA, USA.

  • Mayer, D.B. and A.W. Stalnaker. Selection and Evaluation of Computer Personnel – the Research History of SIG/CPR. in Proc 1968 23rd ACM National Conference, 1968. Las Vegas, NV, USA.
Line: 10 to 14
 
  • Michael McCracken, Vicki Almstrum, Danny Diaz, Mark Guzdial, Dianne Hagan, Yifat Ben-David Kolikant, Cary Laxer, Lynda Thomas, Ian Utting, and Tadeusz Wilusz. A multinational, multi-institutional study of assessment of programming skills of first-year CS students. In Working group reports from ITiCSE on Innovation and technology in computer science education, Canterbury, UK, 2001. ACM Press.

  • B Adelson and E Soloway. The role of domain experience in software design. IEEE Transactions on Software Engineering, 11(November):1351–1360, 1985.
Changed:
<
<
  • Jeffrey Bonar and Elliot Soloway. Uncovering principles of novice programming. In 10th ACM
POPL, pages 10–13, 1983.
>
>
  • Jeffrey Bonar and Elliot Soloway. Uncovering principles of novice programming. In 10th ACM POPL, pages 10–13, 1983.
  and other references from This Camel Has Two Humps and Testing Programming Aptitude

Revision 1 (2006-10-31) - DuckySherwood

Line: 1 to 1
Added:
>
>
META TOPICPARENT name="DuckyHomework"

Ducky Thesis Proposal Notes

Interesting references to chase

  • Cross, E. The behavioral styles of computer programmers. in Proc 8th Annual SIGCPR Conference. 1970. Maryland, WA, USA.

  • Mayer, D.B. and A.W. Stalnaker. Selection and Evaluation of Computer Personnel – the Research History of SIG/CPR. in Proc 1968 23rd ACM National Conference, 1968. Las Vegas, NV, USA.

  • Michael McCracken, Vicki Almstrum, Danny Diaz, Mark Guzdial, Dianne Hagan, Yifat Ben-David Kolikant, Cary Laxer, Lynda Thomas, Ian Utting, and Tadeusz Wilusz. A multinational, multi-institutional study of assessment of programming skills of first-year CS students. In Working group reports from ITiCSE on Innovation and technology in computer science education, Canterbury, UK, 2001. ACM Press.

  • B Adelson and E Soloway. The role of domain experience in software design. IEEE Transactions on Software Engineering, 11(November):1351–1360, 1985.
  • Jeffrey Bonar and Elliot Soloway. Uncovering principles of novice programming. In 10th ACM
POPL, pages 10–13, 1983.

and other references from This Camel Has Two Humps and Testing Programming Aptitude

follow-on to the camel

 