Bug Triage
General Overview
Most open source software projects maintain an open bug repository in which both developers and users can report problems encountered with the software, suggest possible enhancements, and comment on existing bug reports.
One potential advantage of an open bug repository is that it may allow more bugs to be identified and solved, improving the quality of the software produced.
However, this potential advantage comes with a significant cost. Each reported bug must be triaged to determine whether it describes a meaningful new problem or enhancement and, if it does, it must be assigned to an appropriate developer for further handling.
Consider the case of the Eclipse Platform open source project over a four-month period (January 1, 2005 to April 30, 2005), during which 3426 reports were filed, an average of 29 reports per day. Assuming that a triager takes approximately five minutes to read and handle each report, over two person-hours per day are spent on this activity. If all of these reports led to improvements in the code, this might be an acceptable cost to the project. However, since many of the reports are duplicates of existing reports or are not valid reports, much of this work does not improve the product. For instance, of the 3426 reports for Eclipse, 1190 (36%) were marked as invalid, as a duplicate, as a bug that could not be replicated, or as one that will not be fixed.
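The workload estimate above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not part of the original study: the function name triage_load is hypothetical, the report count and dates are the figures quoted in the text, and the five minutes per report is the assumption stated there.

```python
from datetime import date

# Figures quoted in the text for the Eclipse Platform project.
REPORTS_FILED = 3426
PERIOD_START = date(2005, 1, 1)
PERIOD_END = date(2005, 4, 30)

# Assumption stated in the text: about five minutes of triage per report.
MINUTES_PER_REPORT = 5


def triage_load(reports, start, end, minutes_per_report):
    """Return (reports per day, person-hours per day) for an inclusive date range.

    Hypothetical helper for illustration only.
    """
    days = (end - start).days + 1  # count both the first and the last day
    reports_per_day = reports / days
    hours_per_day = reports_per_day * minutes_per_report / 60
    return reports_per_day, hours_per_day


reports_per_day, hours_per_day = triage_load(
    REPORTS_FILED, PERIOD_START, PERIOD_END, MINUTES_PER_REPORT
)
print(f"{reports_per_day:.2f} reports/day, {hours_per_day:.2f} person-hours/day")
```

The period is counted inclusively, which gives 120 days. Changing MINUTES_PER_REPORT shows how sensitive the daily cost is to the assumed handling time per report.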
We seek to improve the bug triage process by understanding how triage is carried out and by creating tools that support triagers in their decisions about bug reports.
The subprojects that we are working on are described in the sections below.
Our work has been published in a variety of venues.
Bug Report Assignment
Overview
to be written - John
Duplicate Detection
Overview
to be written - Lyndon
Publications