Shawn's Review

Problem Addressed


This paper addresses the problem of bugs occurring in GUI applications. It is hard for people to remember which bugs exist in one system so that they can work around them, and working with multiple systems makes this almost impossible. Even when bugs are reported to the software provider, users will not memorize every bug that exists in the version of the software they are using just to avoid putting the application into a corrupted or failure state.

To assist with this, the authors introduce a tool called Stabilizer. The tool watches the command history as well as the program context so that when a bug occurs (the user manually reports it), it can save this information for later analysis. When a user attempts to perform a command that may have triggered a bug before, they are prompted whether to continue, helping them avoid a potential bug in the system.
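
A rough sketch of this record-and-warn flow is below. This is a hypothetical illustration, not the authors' actual implementation; all class and method names are made up:

```python
# Minimal sketch of the record-and-warn behaviour described above.
# All class and method names are hypothetical, not the paper's actual API.

class BugReport:
    """A user-submitted report: the command history leading up to the bug."""
    def __init__(self, history):
        self.history = list(history)

class StabilizerSketch:
    def __init__(self, prompt_user):
        self.known_bugs = []    # reports collected locally or from a server
        self.history = []       # running command history for this session
        self.prompt_user = prompt_user  # callback returning True to continue

    def record_command(self, command):
        self.history.append(command)

    def report_bug(self):
        """Called when the user manually says a bug just occurred."""
        self.known_bugs.append(BugReport(self.history))

    def before_execute(self, command):
        """Return True if the command should be executed."""
        for report in self.known_bugs:
            # Naive match: warn if this command ended a history that the
            # user previously flagged as buggy.
            if report.history and report.history[-1] == command:
                return self.prompt_user(
                    f"'{command}' previously led to a reported bug. Continue?")
        return True
```

A real implementation would also capture the screenshots and program context mentioned in the contributions below, and would use the distance-based matching discussed in the other reviews rather than this naive final-command check.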

Contributions


  • Simple way to help users avoid potential bugs
  • Screenshots taken before and after the command execution help explain the bug
  • If a potential bug is encountered, the system can be rolled back to a stable state from before the command execution

Weaknesses


  • Study did not involve users to see how they would react to the system
  • Users may have a tendency to report too many bugs
  • The idea of a 'not bug' is confusing
  • The Stabilizer cannot detect nondeterministic bugs
  • The initial training after a bug is reported, needed to narrow down where the bug exists, is tedious

Questions


  • What is the scalability to a large system? Will collecting the bug information as well as querying it severely slow the system down?
  • Since a check is performed on every method call, will there be a significant performance loss?
  • Will users properly submit 'bug' and 'not bug' reports?
  • Since the command history kept is just a set, can this system detect bugs where a command needs to be repeated multiple times (e.g., a memory leak)?

Andrew's Review

Problem


Bugs can range from merely annoying to the cause of a serious loss of data. It is often easier to identify the actions that caused the bug than it is to fix it.

The authors introduce a technique whereby a user of a GUI application can avoid known bugs in the user interface. Furthermore, a user can submit a bug to a server, adding it to the database of known bugs. A prototype tool is described that uses machine learning techniques to determine whether a user's action is likely to result in one of the known bugs. The user is given the option of proceeding with or aborting the action.
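
The 'machine learning techniques' mentioned here appear, from the later reviews, to centre on a k-nearest-neighbour classifier. A minimal sketch of how such a decision could look is given below; the function name, the distance argument, and the voting threshold are assumptions for illustration only:

```python
# Hypothetical sketch: decide whether to warn before an action by comparing
# the current event history against stored reports with k-nearest-neighbour.

def should_warn(current_history, reports, distance, k=3, threshold=0.5):
    """reports: list of (history, is_bug) pairs, where is_bug distinguishes
    the 'bug' and 'not bug' reports users can submit.
    distance: a function giving the dissimilarity of two event histories."""
    if not reports:
        return False
    neighbours = sorted(reports,
                        key=lambda r: distance(current_history, r[0]))[:k]
    bug_votes = sum(1 for _, is_bug in neighbours if is_bug)
    return bug_votes / len(neighbours) >= threshold
```

If `should_warn` returns true, the user would be given the option of proceeding with or aborting the action, as described above.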

The technique is validated using an automated version of the tool on a set of programs with bugs inserted into them.

Contributions


  • Interesting idea that tackles a real problem and a good start at a solution.
  • Working implementation of the technique and good motivating example
  • Good experimental design (but more work needs to be done to see if this technique could work with actual users). Research questions are well-formed and answered by the experiment. However, the experiment would be more convincing if larger programs were used.

Weaknesses


  • It seems that only certain types of bugs can be avoided using this technique. The authors are never explicit as to which kinds of bugs can be avoided and which kinds cannot. For example, can bugs that arise due to events not generated by the user be avoided?
  • There is a lot of calculation running behind the scenes (to create the histories and to calculate distances at each event). There is no discussion of whether this would cause a noticeable slowdown.
  • There is no discussion of what types of actions can be aborted and whether or not that would cause system instability
  • The description of the technique provided too much detail on the distance measurement (could have been put in an appendix) and not enough characterizing the situations and types of bugs it can be used for.
  • It is unclear exactly how many tests were run during the experiment.

Questions


  1. What would it take to make a tool like this usable and non-intrusive for a non-technical person?
  2. What kinds of bugs are most easily avoided? What kinds cannot be avoided?
  3. How come all the 'whiskers' in Figures 4-7 span the entire length of the graph?
  4. Who would most benefit from and be most likely to use this tool (e.g., end users, beta testers, in-house testers)?

John's Review

Problem


All software contains some form of bug. These range from bugs that crash the program, to bugs that cause incorrect behaviour, to bugs that cause unexpected behaviour. These bugs may exist in a system for a long time before they get fixed, and in the meantime the software will still be used. Users of buggy software often develop ‘workaround’ solutions to these bugs so that they can continue with their work. However, these workaround solutions can easily be forgotten, are rarely communicated to the user group as a whole, and may not accurately reflect the true cause of the problem.

Contributions


  • Proposal of a tool for GUI applications that:
    • Can be trained by a user to recognize when buggy behaviour is about to happen
    • Can warn the user when buggy behaviour is about to happen
    • Can communicate buggy behaviour to other users so that they are warned about it
    • Can provide a developer with the information that caused a bug so that they can reproduce it
  • A distance metric for determining the ‘closeness’ of bugs based on the events and/or code executed leading to the behaviour (a simple stand-in is sketched after this list).
  • Analytical evaluation of the technique showing the effect of changing the size of the collected history and/or the use of code history.
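
The paper's actual metric is not reproduced here. As a simple stand-in (and since one review notes the command history is kept as a set), the sketch below measures the dissimilarity of two event histories by the overlap of the events they contain, which is one plausible shape for such a metric:

```python
# Illustrative stand-in for a distance metric between two event histories.
# This is a simple Jaccard-style distance over the sets of events, not the
# metric defined in the paper.

def event_distance(history_a, history_b):
    """Return a value in [0, 1]: 0 for identical event sets, 1 for disjoint."""
    events_a, events_b = set(history_a), set(history_b)
    if not events_a and not events_b:
        return 0.0
    overlap = len(events_a & events_b)
    union = len(events_a | events_b)
    return 1.0 - overlap / union
```

A metric like this could be plugged into the k-nearest-neighbour decision sketched in Andrew's review; per the contribution above, the paper's metric also takes the executed code into account.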

Weaknesses


  • The approach makes some strong assumptions:
    • The user can avoid the bug behaviour altogether and still complete their task
    • Users can identify truly buggy behaviour. (They punted on this one)
    • Users are willing to fill in bug reports.
    • Users will fill out bug reports immediately when the bug occurs.
  • The approach appears limited to VM-style execution environments where the system can intercept callbacks.
  • The lack of user testing significantly weakens their argument of the usefulness of this technique.

Questions


  1. What happens when bugs are fixed?
  2. How is intuitive behaviour a bug?
  3. How can one determine “users with similar bug reporting history”?
  4. Wouldn’t taking a screenshot every ‘delta’ seconds cause significant performance problems for the user’s application?
  5. Why would a user want to specify the value of k for the k nearest-neighbour algorithm?
  6. Can the ‘always warn’ and ‘never warn’ flags be turned off when there is enough evidence for the system to make a good decision?
  7. How is forking an experimental child process going to help in detecting GUI bugs when (from the examples given) the user is required to make a decision about whether or not there is an actual bug?

Belief


I am skeptical that this tool would be used in practice. It seems to me that the user would be interrupted too frequently during the initial use of the tool, while the system collects enough evidence for each bug to narrow down its specific cause. I also think that they are making too big an assumption about a user’s motivation to report bugs and to do so in a timely fashion.