Sara's Review
Problem
- Navigating and understanding source code to perform a modification task is difficult, especially when the system is complex and the developer is unfamiliar with it
- Methodologies are needed to help the developer find the parts of the code related to the concept to be modified
Contributions
- The paper offers a methodology, based on the topology of structural dependencies in a program, for helping the developer navigate to the parts of the code that are related to a concept
- Their algorithm takes as input a fuzzy set describing methods or fields of interest to a developer, and produces a fuzzy set containing methods and fields that are of potential interest. To calculate the degree of interest, they use two characteristics of a dependency: specificity and reinforcement. An element that is related to few other elements is more specific, and an element that is related to more elements of interest is reinforced (a rough sketch of this intuition follows the list).
- The proposed methodology also works on incomplete code or code that cannot be executed
- The evaluation results from the two case studies are interesting: the resulting suggestion sets point to elements that are useful for performing the modification or for understanding the part of the system related to the specified concepts
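To make the specificity/reinforcement intuition above concrete, here is a minimal Python sketch. It only illustrates the idea as described in this review, not the paper's equation (2); the function name, the toy element names, and the way the two characteristics are averaged are all assumptions.

```python
# Illustrative sketch only -- NOT the paper's equation (2).  It assumes that
# specificity shrinks as a candidate is related to more elements, that
# reinforcement grows with the fraction of those related elements already of
# interest, and that the two are simply averaged.

def degree_of_interest(candidate, interest, related):
    """Hypothetical degree-of-interest score in [0, 1] for `candidate`.

    candidate -- a program element (e.g., a method or field name)
    interest  -- set of elements the developer marked as interesting
    related   -- maps each element to the set of elements it is structurally
                 related to (e.g., via the calls or accesses relations)
    """
    neighbours = related.get(candidate, set())
    if not neighbours:
        return 0.0
    specificity = 1.0 / len(neighbours)                           # few relations -> more specific
    reinforcement = len(neighbours & interest) / len(neighbours)  # more interesting relations -> reinforced
    return (specificity + reinforcement) / 2.0


# Toy usage with made-up element names: "draw" is related to two elements,
# one of which is in the set of interest.
related = {"draw": {"paint", "repaint"}, "paint": {"draw"}}
interest = {"paint"}
print(degree_of_interest("draw", interest, related))  # 0.5
```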
Weaknesses
- In Section 5.3, to show the stability of the ordering of elements, they only track the element with the highest degree in the suggestion set. I wonder whether this result holds for all elements, because the first element may on occasion be distinguishable from the others (it could have a very different degree of interest)
- I found Section 3.3 a bit difficult to understand, and one reason could be the notation that is used. For example, in equation (2) for calculating the degree of interest, x, which denotes an element, is summed with the size of a set in the numerator.
Questions
- I would have liked to see at least an estimate of recall in addition to precision; however, as the authors say, the algorithm is designed to help developers navigate through the code. But does this algorithm reliably return the most important elements related to the set of interest? Isn't it possible that they will be filtered out? Are they necessarily specific and reinforced? (Or should we only ask whether they are specific, since the number of elements in the input set is small in these examples, so specificity has the greater influence on the degree of interest.)
- Is finding the initial set of interest always as easy as shown in the case studies? It seems that the quality of the suggestion set is highly dependent on how well the initial set is chosen.
Brian's Review
Summary
This paper describes an algorithm that, provided a set of program elements (the set of interest), uses two heuristics to suggest relevant program elements for investigation. Specificity emphasizes elements that are related only to the elements of interest. Reinforcement emphasizes elements that are related to many of the elements of interest. These two heuristics can use any type of program relation, called relatedness; the paper's investigations are limited to the calls and accesses relations.
Preliminary empirical validation shows the approach has promise. Experiments performed on two small codebases show that, when the set of interest is a single element, the suggestion sets are of reasonable size. There has been some investigation into useful bounds on the single tuning parameter, alpha. Two small case studies provide some confirmation of the utility of the results in development.
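For concreteness, here is a minimal sketch of how a single threshold like alpha could prune a fuzzy suggestion set, assuming it acts as a plain alpha-cut on the membership degrees; whether this matches the paper's actual use of alpha, and the element names below, are assumptions.

```python
# Hypothetical alpha-cut: keep only suggestions whose degree of interest
# reaches the threshold.  That the paper's alpha works exactly this way is
# an assumption of this sketch; the element names are made up.

def alpha_cut(suggestions, alpha):
    """suggestions -- dict mapping program element -> degree of interest in [0, 1]."""
    return {elem: deg for elem, deg in suggestions.items() if deg >= alpha}


suggestions = {"Figure.draw()": 0.8, "Figure.invalidate()": 0.35, "Handle.owner": 0.1}
print(alpha_cut(suggestions, alpha=0.3))  # keeps Figure.draw() and Figure.invalidate()
```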
Contributions
- a novel algorithm, independent of any particular type of program relation, and able to combine different relation types to produce an estimate of relatedness
- a simple implementation of the algorithm, able to be used on real programs
- three study designs for validation of the algorithm, able to be reused for other feature location algorithms
Weaknesses
- no analysis of why relevant elements were either not identified or given low scores by the algorithm (Sections 5.2, 6.1) -- are improvements to the heuristics necessary?
- Section 5.3 is confusing as written. I believe it means stability when using static dependencies vs CHA, but I'm uncertain: this would only seem to matter if the relation types were being changed dynamically. Is this likely? It wouldn't seem to be given the tool implementation.
Questions
- Is incorporating further relation types likely to improve the results? Should there be a per-relation balancing constant?
- Why was it necessary to redefine the fuzzy union operation for the merge operation? (p 3/4; used in Fig 2, line 7) A sketch of the standard definition follows for reference.
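As a reference point for the question above, here is the standard (Zadeh) fuzzy union, which takes the maximum membership degree of an element across the two sets; how the paper's redefined merge differs from this is exactly what the question asks, and the element names here are made up.

```python
# Standard (Zadeh) fuzzy union: an element's membership degree in the union
# is the maximum of its degrees in the two fuzzy sets.

def fuzzy_union(a, b):
    """a, b -- dicts mapping program element -> membership degree in [0, 1]."""
    return {x: max(a.get(x, 0.0), b.get(x, 0.0)) for x in a.keys() | b.keys()}


print(fuzzy_union({"draw": 0.6}, {"draw": 0.4, "paint": 0.7}))
# degrees: draw -> 0.6, paint -> 0.7
```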
Trevor's Review
Summary
This paper presents a technique (and tool) to automatically suggest elements of potential interest to a developer involved in a program investigation task. The inputs are a set of interesting (source code) elements that the developer chooses, and the output is a larger set of elements that are related to the initial set of interest. These recommendations are based on structural dependencies in the existing code base. A qualitative analysis of the results of two case studies is presented and discussed.
Contributions
A great first step in terms of quantitative analysis. The paper seems to come up with some novel ways of calculating useful statistics on which to base its recommendations.
I like the idea of using 'fuzzy' sets, particularly in this area, where you can never really be certain what the ideal solution set should be, if one even exists. Using the term 'fuzzy' drives home the point that finding the solution to this problem is not an 'exact science'; the goal is really to aid the developer in going down the right path.
I'm not an expert in this field, but it seems like a fairly unique algorithm, backed by sound mathematical equations.
Nice job of justifying all the (math) groundwork with detailed explanations and reasoning.
Weaknesses
In the qualitative analysis section, I would have liked to know more about how this technique compares to some of the related work. It's not clear whether the evaluation technique is similar to techniques used in related work, but if so, it would be nice to say 'technique X discovered this many related elements, compared to our technique'.
I thought the explanation of the 'direct and transpose (inverse) relations' (sec 3, last paragraph) was a bit confusing with the examples. It seems like a concept with a simple plain-English explanation, but the examples lost me a bit (and made me think I was missing something) just because there were lots of elements 'A', 'B', '1', '2', etc. I had to read it twice, going back and forth to the diagram, to make sure I understood it.
I didn't quite understand the effect that the 'alpha' parameter had on the analysis. A reminder throughout the document of what this parameter was for would also have helped: it is mentioned extensively in the analysis section, but I had to continually go back and remind myself what it was for and how it changed things.
I think the example could have been explained in better context (i.e., described as a real scenario, such as a change or modification task for JHotDraw). Simply choosing a method and a field makes it hard to picture how the developer might have chosen these in a real situation.
It's really hard to take away any concrete conclusions about the evaluation and the effectiveness of this technique, since there is so much variance in the implementation of large systems (as well as in developers' work habits and techniques). The qualitative analysis is a great first step, but it seems that solutions to this problem mainly rely on numerous user studies and testing in real situations(?)
Questions
In a typical scenario, how are the initial 'sets of interest' determined? That is, how do you foresee developers choosing these sets? In the paper, the researcher always selected these sets, based on classes that were modified many times in the repository. But there are many situations where change tasks involve code that may not have been modified before.
Is the evaluation method used here similar to what other researchers (from related work) use? It seems that you would want to evaluate this by observing users with this tool while performing various change tasks. Is this future work? Or is there a reason why that was not done?
Does this algorithm rely (or work better) on systems that are 'well designed' and follow standard OO practices? Would it work just as well on a poorly written system?
What are the major 'architectural' limitations of using this technique? For instance, client/server systems that try to decouple components via middleware won't always reveal direct call relationships (e.g., lookup for distributed components is done through a naming service such as JNDI). How about design patterns where direct calls are also abstracted or decoupled?
What other relations do you think would be useful to investigate (besides the calls relation)?