Sept222010Review

Revision 3, 2010-09-21 - robertob

Kim, S. and Ernst, M. D. 2007. Which warnings should I fix first? In Proceedings of the 6th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (Dubrovnik, Croatia, September 03-07, 2007). ESEC-FSE '07. ACM, New York, NY, 45-54. DOI= http://doi.acm.org/10.1145/1287624.1287633

Roberto's Review

The paper addresses two drawbacks of static analysis tools: the high rate of false positives (warnings that do not correspond to bugs) and the ineffectiveness of the warning prioritization these tools implement.

The proposed solution is a history-based warning prioritization (HWP) technique. The basic idea is to mine software changes and the warning-removal history in order to prioritize warning categories, treating warning instances as bug predictors. To do so, the prioritization algorithm increases the weight of a warning category whenever its warning instances are removed by software changes; when the change fixes a bug, the weight increases more than for a regular change.
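
To make the idea concrete, here is a minimal Python sketch of such a weight-update heuristic. The function names, the (category, removed_by_fix) event format, and the way alpha splits the increment between fix-change and regular-change removals are assumptions of this illustration, not the authors' exact algorithm.

    from collections import defaultdict

    def train_category_weights(removal_events, alpha=0.9):
        """Accumulate a weight per warning category from warning-removal history.

        removal_events: iterable of (category, removed_by_fix) pairs mined
        from the project's history (hypothetical representation).
        alpha: increment for removals that occur in bug-fix changes; with
        0.5 < alpha <= 1, fix-change removals count more than regular ones.
        """
        weights = defaultdict(float)
        for category, removed_by_fix in removal_events:
            # Any removal raises the category's weight; removals in a
            # fix-change raise it more, as stronger evidence that warnings
            # of this category point at real bugs.
            weights[category] += alpha if removed_by_fix else (1.0 - alpha)
        return weights

    def prioritize(warnings, weights):
        """Order (category, location) warning tuples by learned category weight."""
        return sorted(warnings, key=lambda w: weights.get(w[0], 0.0), reverse=True)

A higher learned weight then moves every instance of that category earlier in the warning list reported to the developer.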

The main contributions of the paper are the following. It exposes the limitations of the tools' built-in warning prioritization using false-positive rates extracted from software history data. It derives a warning prioritization algorithm, inspired by machine learning techniques, that takes fix-changes and the warning-removal history as input. Unlike previous work, the algorithm is generic, i.e., applicable to any warning category, and it operates at a fine level of granularity, identifying true and false positives at the level of individual lines of code.

On the other hand, although it is interesting to derive false-positive rates from history data, this does not replace the expert as the authority on which warnings are real bugs and which are not. As a result, the evaluation seems somewhat biased, since a technique based on software history is evaluated only against software history. Another issue is that the false-positive rate is measured only at revision n/2; although all subsequent revisions are used to mark buggy lines of code, it might be more appropriate to measure it at other revisions as well. Finally, the choice of alpha, the parameter of the prioritization training algorithm, was somewhat arbitrary.
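
As a rough illustration of the history-based measurement criticized here, the sketch below treats a warning raised at revision n/2 as a true positive only if the flagged line is later modified by a bug-fix change. The data representation and helper names are hypothetical and are not the paper's tooling.

    def classify_warnings(warnings, later_fix_changed_lines):
        """Split warnings raised at revision n/2 into true/false positives.

        warnings: iterable of (path, line, category) tuples flagged at revision n/2.
        later_fix_changed_lines: set of (path, line) pairs modified or deleted by
        bug-fix changes in the subsequent revisions (the lines marked as buggy).
        """
        true_pos, false_pos = [], []
        for path, line, category in warnings:
            bucket = true_pos if (path, line) in later_fix_changed_lines else false_pos
            bucket.append((path, line, category))
        return true_pos, false_pos

    def false_positive_rate(true_pos, false_pos):
        total = len(true_pos) + len(false_pos)
        return len(false_pos) / total if total else 0.0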

I believe this paper is interesting because it shows that software history data can be an important factor in prioritizing bug warnings. It demonstrates the limitations of the tools' built-in priorities and opens a line of research into software history as a means of improving warning priorities in static analysis tools. I also think that other data sources, such as source-code complexity, could be used to infer the priority of such warnings.

Some questions that might be interesting to discuss:

  • Which factors would play a role in determining the priority of warnings in static analysis tools?
  • Is the indirect measure of true and false positive warning rates based on software history appropriate?
  • How would you design an experiment to use software history to measure false positive rates of warning messages?
  • Do you think the training algorithm provides a good heuristic to change warning priority weights? How would you do it differently?
 

Alex's Review

 