Sept262005Reviews < SPL

Ko AJ, Myers BA (2004). Designing the Whyline: a debugging interface for asking questions about program behavior

Brian's Review

PROBLEM ADDRESSED

In previous studies, the authors found that developers generally asked questions along the lines of "why did {something} happen" (an unexpected runtime action) and "why didn't {something} happen" (the absence of an expected runtime action). These types of questions have three possible answers:

false propositions (or false assumptions): e.g., an action did occur, but no effect was seen, and so it was assumed not to have occurred;
invariants: the action always / never happens;
data and flow control: a chain of actions led to the effect.
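
The three answer types can be read as a small decision procedure. The sketch below is illustrative only: the function name `classify_answer`, the action names, and the idea of passing in precomputed sets of executed, reachable, and unconditional actions are assumptions made for this example, not the Whyline's actual implementation.

```python
from enum import Enum


class Answer(Enum):
    FALSE_PROPOSITION = "false proposition"
    INVARIANT = "invariant"
    DATA_AND_CONTROL_FLOW = "data and control flow"


def classify_answer(question, action, executed, reachable, unconditional):
    """Classify the answer to a "why did" / "why didn't" question.

    question      -- "did" or "didn't"
    action        -- hypothetical name of the runtime action asked about
    executed      -- actions observed in the execution trace
    reachable     -- actions that some execution could perform
    unconditional -- actions that every execution performs
    """
    if question not in ("did", "didn't"):
        raise ValueError("question must be \"did\" or \"didn't\"")
    happened = action in executed
    if question == "didn't":
        if happened:
            # The action did occur; the programmer's assumption was wrong.
            return Answer.FALSE_PROPOSITION
        if action not in reachable:
            # No execution can ever perform this action.
            return Answer.INVARIANT
        return Answer.DATA_AND_CONTROL_FLOW
    if not happened:
        # Asked why something happened that never actually happened.
        return Answer.FALSE_PROPOSITION
    if action in unconditional:
        # The action happens on every execution.
        return Answer.INVARIANT
    return Answer.DATA_AND_CONTROL_FLOW
```

Only the third outcome requires showing the programmer a chain of runtime actions; the first two can be answered with a single statement.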

Their previous work, involving developers using Alice, found that 50% of the errors in debugging were due to false assumptions in the developers' hypotheses. These unchecked assumptions held by developers lead to longer times for investigating failures. They also identified three trends in the questions: (i) 68% of the questions were why-didn't questions, as opposed to why-did questions; (ii) programmers only asked why-didn't questions about code that plausibly might have executed; (iii) 85% of questions were about a single object, with the remainder about interactions.

The authors claim that none of the current debugging tools (directly) support developers in this hypothesis-making and evaluation form of reasoning about runtime actions. The authors propose a debugging approach they call Interrogative Debugging (ID) that supports the explicit phrasing and answering of these questions. The paper describes the Whyline, an ID addition to the Alice programming environment.

APPROACH

The authors enhance Alice with a new button labeled `Why?'. During debugging, this button lets the developer select among the objects in view and ask canned questions about why some effect occurred or didn't occur. The available questions are determined by performing static and dynamic analysis around the current expression, so as to identify the possible objects and determine what code has executed and what code is unlikely to execute. The Whyline tracks changes to output (e.g., animation actions and changes to visible properties) to generate the why-did menu. Data flow graphs (DFGs) are used to create the why-didn't questions; these are coupled with dynamic slices to identify invariants. Slices are used to build condition trees leading to effects. A timeline lets users rewind to investigate past effects.
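
As a rough illustration of the mechanics described above, the sketch below records runtime actions in a trace, offers only output-affecting actions as why-did questions, and walks dependencies backwards to collect the chain of actions behind an effect. The `Event` type, its fields, and the sample trace are hypothetical; the paper's own data structures are not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """One recorded runtime action (all fields are hypothetical)."""
    event_id: int
    description: str
    affects_output: bool = False
    # ids of earlier events this event depended on (data or control flow)
    depends_on: list = field(default_factory=list)


def why_did_menu(trace):
    """Only output-affecting events become why-did questions."""
    return [e.description for e in trace if e.affects_output]


def backward_slice(trace, event_id):
    """Return the events that led to `event_id`, in execution order."""
    by_id = {e.event_id: e for e in trace}
    seen = set()
    stack = [event_id]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(by_id[current].depends_on)
    return [e for e in trace if e.event_id in seen]


# Hypothetical trace: event 3 is unrelated to the visible effect.
trace = [
    Event(1, "key pressed"),
    Event(2, "is_eaten = true", depends_on=[1]),
    Event(3, "unrelated counter += 1"),
    Event(4, "if is_eaten: true branch taken", depends_on=[2]),
    Event(5, "Pac resize 0.5", affects_output=True, depends_on=[4]),
]
```

Calling `backward_slice(trace, 5)` returns events 1, 2, 4, and 5 and leaves out the unrelated event 3, which is the filtering effect a slice is meant to provide.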

Also described are some complementary features in the UI that draw the user's attention to relevant elements (such as code fragments) to expose otherwise-hidden dependencies.

The authors also report on some of the tuning measures necessary to support the users. Interestingly, they use the latest execution of a statement when answering questions, even if the time is rewound to a previous execution, as this better matched what the user appeared to expect.
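
The latest-execution policy can be made concrete with a small sketch. The function name, the `(time, statement)` trace format, and the `use_latest` switch are assumptions for illustration; they contrast the policy the authors settled on with the alternative of stopping at the rewound time.

```python
def execution_to_explain(executions, statement, cursor_time, use_latest=True):
    """Pick which execution of `statement` an answer should describe.

    executions  -- list of (time, statement) pairs in execution order
    cursor_time -- the time the programmer has rewound to
    use_latest  -- True: ignore the cursor and use the latest execution
                   False: use the latest execution at or before the cursor
    """
    times = [t for t, s in executions if s == statement]
    if not use_latest:
        # Alternative policy: only consider executions up to the cursor.
        times = [t for t in times if t <= cursor_time]
    return max(times) if times else None


# Hypothetical trace: "resize" ran at times 3 and 7.
executions = [(1, "move"), (3, "resize"), (7, "resize")]
```

With the cursor rewound to time 4, the default policy explains the execution at time 7, while `use_latest=False` would explain the one at time 3.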

EVALUATION

To evaluate, the authors undertook a comparative user study of 9 developers using the Alice system on a preset task, either with (5) or without (4) the Whyline. Although all were HCI students, they had varied backgrounds. The authors wished to assess whether the Whyline was useful, whether developers were able to determine answers in less time, and whether the developers would complete more tasks.

It appears the developers may have been interrupted during the experiment for questioning. Developers using the Whyline completed more tasks and, for identical scenarios, were able to arrive at the solution much more quickly than the developers without the Whyline. Although the paper states that developers were able to accomplish 40% more tasks, the numbers appear to say that Whyline developers were able to accomplish one additional task.

The Whyline was enhanced over the course of the sessions with the With group; thus the With results should be a lower bound on the tool's benefit.

CONTRIBUTIONS

A novel solution for aiding developers in debugging.
A quantitative and qualitative evaluation of the types of questions posed.
An evaluation of trade-offs in the user interface.

WEAKNESSES

Is the experiment credible?
Does this claim seem believable: "By restricting the programmer's ability to make assumptions about what did and did not happen, we enabled them to observe and explore the runtime actions that most likely caused failures." (p157)

QUESTIONS/COMMENTS

How much of this is feasible because of the domain (of manipulating 3D objects)?
Does this scale? What if there are many different variables? Are they scoped by the frame? Are there features of industrial-strength languages that will make this more difficult, and are these features necessary?
Is it necessary to have explicit (direct) support for interrogative debugging?

Sara's Review

PROBLEM ADDRESSED

Debugging is a common and costly activity in programming.
There are not enough tools supporting this activity. Programmers still resort to breakpoints, code-stepping, and print statements.

CONTRIBUTIONS

The paper presents a new debugging paradigm named Interrogative Debugging, where programmers can ask why-did and why-didn't questions about program behavior and failures.
A debugging interface named Whyline (Workspace that Helps You Link Instructions to Numbers and Events) is introduced and prototyped in Alice, an event-based language for creating interactive 3D worlds. During execution, menus and sub-menus help the programmer choose the right question about program behavior, such as Why didn't -> Pac -> resize 0.5? In answer to such a question, the part of the code that caused the event is highlighted. The two case studies, one with and one without the Whyline, show that this debugging tool can reduce debugging time by nearly a factor of 8.
Nice job of visualizing runtime execution using a graph
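
The question menus mentioned above form a three-level hierarchy: question kind, then object, then action. The sketch below shows one way such a hierarchy could be assembled from a flat list of candidate questions. The function names and all entries other than the "resize 0.5" example quoted in this review are hypothetical.

```python
def build_question_menu(questions):
    """Group (kind, object, action) triples into nested menus."""
    menu = {}
    for kind, obj, action in questions:
        menu.setdefault(kind, {}).setdefault(obj, []).append(action)
    return menu


def menu_paths(menu):
    """Flatten the menu into "Why didn't -> Pac -> resize 0.5" strings."""
    return [
        f"{kind} -> {obj} -> {action}"
        for kind, objects in menu.items()
        for obj, actions in objects.items()
        for action in actions
    ]


questions = [
    ("Why did", "Pac", "move forward"),      # hypothetical entry
    ("Why didn't", "Pac", "resize 0.5"),     # example quoted in this review
    ("Why didn't", "Ghost", "turn around"),  # hypothetical entry
]
menu = build_question_menu(questions)
```

The number of leaf entries grows with the number of objects times the number of actions per object, so menu navigation becomes harder as programs grow.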

WEAKNESSES

The examples shown in the paper are somewhat simple. I would have liked to see what happens in a more complicated project; I think the question menus and the execution graph would be more complicated, and navigating the questions through the menus might not be easy.
I was a bit confused by the timing issue in the execution graph; it is not clear to me how far it goes back in time. If a series of events caused this new event, how many of them are shown?

QUESTIONS/COMMENTS

It is not clear to me how they produce some of the questions. For example, for "why didn't Pac resize 0.5", the case is easy because there is a resize method which has not been called. But in Figure 3, how have they produced "Why didn't Pac pointOfView change to something else"?
Could this paradigm be used for all types of applications, such as middleware, where the output is not as obvious as the Alice environment?
I think the usefulness of this tool greatly depends on the quality of the question menus it provides; but the general idea is interesting

Ed's Review

PROBLEM ADDRESSED

Debugging always begins with a question, and to use existing tools programmers must struggle to map strategies for answering their question to the tools' limited capabilities.
The paper introduces a debugging paradigm called Interrogative Debugging which allows programmers to ask "why did" and "why didn't" questions regarding an object's behavior at runtime. An ID tool called WHYLINE is introduced, along with the results of a user study in which a group of M.Sc. students used WHYLINE to perform some simple debugging tasks.

CONTRIBUTIONS

WHYLINE: a prototype created to illustrate the concepts of interrogative debugging. Built on Alice, a simple event-based language for creating 3D worlds.
The WHYLINE visualization was highly effective at linking and filtering information shown in multiple views in the workspace – addressing both the “hidden dependencies” and “visibility” design constraints.
There was a great deal of qualitative and quantitative user feedback explored throughout the paper. This was very helpful and effective in backing some of their claims and design choices.

WEAKNESSES

WHYLINE is a very domain-specific tool. It is unclear how useful ID would be outside of Alice.
Kind of a follow up: WHYLINE shows ID in a nice light because the user is able to query directly on the buggy object. Since this isn't the case in most real-world examples, it appears that a better secondary notation is needed to keep track of an exploration history. (something better than a “Questions I've Asked” button)
The iterative design process was unclear. Were the developers changing the implementation of WHYLINE in between each user?

QUESTIONS

The authors' observations confirmed that asking questions in terms of program output rather than code or runtime actions made it easier to map questions to related code. How does this carry over to non-graphics programs?
Do we consider WHYLINE to be a crosscutting-effective view?
What does it mean to 'scrub' the execution history?

Topic revision: r3 - 2010-09-14 - NimaKaviani