We would like to thank the reviewers for their thorough reviews, and the meta-reviewer for his/her summary. The structure of this rebuttal follows the ordering of the main points of criticism in the meta-reviewer's summary. While we do not have the opportunity to address all points made by individual reviewers in great detail, we include a bullet list of planned revisions for the submission at the end of this rebuttal. Before addressing the points of criticism, we briefly highlight reviewers' positive responses to the paper. Our motivation for conducting this research was viewed as important: we have identified a "clear need for this line of research" [R2], especially as unobserved medical assessment moves into the home. Our study methodology and results were seen as interesting and as presenting ideas for future studies. Finally, our design implications for detecting and mitigating interruptions were also seen as having value.
Generalizability & Related Work: It is clear that our experimental methodology was centred around the C-TOC tool. We maintain that our focus was not on "whether C-TOC works" [R1] (the focus of ongoing parallel studies). Our motivation was primarily to understand the impact of interruptions on unsupervised computerized test taking. The tasks we selected were unique in that they are open-ended: they have several correct responses and allow for partially correct solutions. This is opposed to closed tasks with binary correct/incorrect responses (see, e.g., the methodologies in [4,10]). We therefore required a novel methodology to coordinate primary and interrupting tasks, and to capture the range of responses in these tasks.
Methodology: Defining the cost of interruption (COI) was not the contribution we intended to make. The lack of consensus in the literature on its definition is a parallel problem that emerged during the design of our study, and it is clearly an open research problem for the CHI community.
Regarding R1's comments on the coarse level of our measurements, the COI has historically been measured as coarsely as failure to resume a task at all (a binary response, [20]). We had three measurements at different scales, wherein task completion time is coarser than task resumption lag time. Accuracy was also a fine measurement (we accounted for partially correct solutions; a binary correct/incorrect response is the coarser alternative). We would be grateful for reviewers' recommendations on finer measurements of the COI, as we will continue to measure the COI in future planned studies. We also measured other variables (participant sex and age, time of day, which experimenter was present, and location: university clinic or community centre), none of which interacted significantly with the results. We admit the possibility that individuals, when they are alone, deal quite differently with interruptions. This must be taken into account when considering the generalizability of our findings, and will be the focus of future studies. The purpose of the 2s interruption lag was to add external validity, reflecting the observation that switching to an interrupting task in the real world is seldom instantaneous. Usually an individual has some warning of an imminent interruption; this was also the basis for the other works we cited [1,16,26].
Participants: Regarding why individuals with cognitive impairment were excluded from participation, we emphasize that this study is the first of several ongoing and planned studies to examine interruptions and older adults' C-TOC performance. The effect of interruptions needs to be gauged first when cognition is not impaired. We are currently designing a study that will repeat this methodology with different clinical groups. Regarding the choice of three age groups, we sought a finer distinction than two groups (young/old). [6] highlights normal but pronounced cognitive changes occurring at 55 and 70.
Alternatively, a single age cutoff, for instance at the average age of retirement (~65), does not correspond to marked changes in cognition. In response to R2's comment, we recruited an equal number of participants in each age group for the purposes of a controlled mixed-factor experimental design, as opposed to a regression-based classification of participants. When we plotted our dependent measures vs. age of participant, we verified that OLD and YOUNG participants form distinct clusters for the measures we reported. PRE-OLD adults did not form a cohesive cluster, as reflected in our results. Regarding the education levels of our participants, these were collected during the MOCA screening test, and indirectly through the NAART. All participants were familiar with the use of a mouse (C-TOC does not require more than this). Additionally, we asked participants about the clarity of our instructions in a semi-structured interview at the end of the session. In one case, a participant's data was excluded due to their admitted misunderstanding of the instructions.
Empirical Results: Regarding why our results surprised us: when we considered both the cognitive aging and interruptions literature, we expected an interaction, a compounding effect of age and interruption demand. The fact that this was not universally found surprised us.
Additional Comments & Planned Revisions:
- Reviewers requested a figure to clarify the coordination of primary and interrupting tasks. This figure appears in the 1st author's thesis: http://goo.gl/W3bAJ . We considered including this figure; however, this would cut the paper's content significantly. We seek reviewers' recommendations, both on whether the figure is useful and on what to cut if they recommend that we include it in the submission.
- The title of the submission will change to reflect R1's comments.
- "OLD" in caps refers to our group of study participants, while "older" adults mentioned in the discussion section refers to the general population of older adults.
- We will clarify Table 1 and the reporting of results as outlined by the meta-reviewer, as well as correct the scale axis in Figure 3 (bottom) [R1].