We thank the reviewers for their insightful critiques and are glad they found our work unique, sound, and well written. Below, we address the critical issues.

Contribution: All reviewers found the topic interesting (1AC) and of critical importance (R3). 1AC, R3, and R4 praised our well-thought-out methodology as a model for others. Yet, we see that our contribution to the under-researched topic of user upgrades needs clarification: we are the first to use observations and diaries in a longitudinal study. We complement existing work and extend knowledge by contributing important _new_ insights and findings:
- Delay in installing Mac OS upgrades has increased with each of the last 3 releases
- Users do not notice security improvements after upgrading
- Users (even advanced ones) do not prepare for upgrades
- Progress feedback during the installation is a source of frustration
- Users notice most changes in the first week
- Generally, the costs of an upgrade outnumber the benefits
- Users have partial recollections of the installation (even if it was negative)
- Only a slim majority are favourable towards the upgrade one month later
- The duration of the installation process influences future decisions to upgrade

Further, we offer novel design recommendations with potential for significant impact (R3). While seemingly obvious (2AC), these indicate how upgrades break basic design principles. An interview study alone is unlikely to have uncovered most of these (2AC), especially since several participants misremembered their experiences. Vaniea’s [50, 51] earlier studies offer only broad implications for security, resources, and documentation. Our recommendations, only two of which overlap with earlier studies, open a design space on upgrades.

Sample bias: We recognise issues in the sample, but argue that it is less problematic than the reviewers thought (1AC, R3, R4, 2AC). We will explain the factors that limit the bias.
While Study 2 is weighted towards experts (5 experts, 4 above avg, 5 avg), previous work suggests that expertise does not affect the issues experienced during an upgrade. [50] found that “People with more technical experience had similar issues to those with less experience.” Also, our results align with that work, which had broader samples, suggesting that expertise plays a limited role during an upgrade. Expertise only seems to influence installation rates: experts install updates faster and more frequently than non-experts [26]. Data from Dumitras' group [a] on 8.4M hosts supports this (R4): technical users are 50% faster than common users in installing application patches. The median installation time for common users is 45 days, consistent with Study 1 (median delay of OS upgrades: 46 days), which we consider conservative but valid. As to compensation bias, our research institute (where hundreds of user studies have been conducted) does not allow paying participants (R4).

Study 2 sample size: We believe the long recruitment is a result in and of itself. We did not intend to imply that it led to too small a sample: 14 participants is not unusual for a rich observation study with a longitudinal diary component, e.g. [b, c].

Upgrade variety: Unlike previous broader studies, we focused only on OS upgrades. This allowed us to reach targeted conclusions, though not as specific as if we had studied a single OS. We agree that our claim on individual differences is too strong (R3) and needs softening.

Peak-end: We see that our treatment of peak-end effects, and their relevance to upgrades, was not clear and needs adjustment (R2, 2AC). Our goal was to investigate the link between actual experience and future upgrade decisions. We found that duration plays a bigger role than peak-end moments. We did not intend to frame this as a major theoretical claim (2AC).
Rather, it is a new but narrow finding (an instance seemingly contrary to the peak-end rule), which some of our recommendations address.

Methodology: We will better justify some choices in Study 2. Our lightweight grounded theory approach is more in line with thematic analysis, where intercoder reliability (2AC) is uncommon. Our goal was not to make any statistical claims. As for the counts of valence/emotions (R2), we were careful not to over-interpret them. They give an idea of relative frequency, but we make no further claims. Finally, we will spell out that many references in the introduction are news articles (R4). Most of what we know about upgrades does not come from academic work: this highlights the need for our contribution.

We will address all of the above in a revised version of the paper.

a. Nappa et al. The attack of the clones: a study of the impact of shared code on vulnerability patching. SP 2015.
b. Rieman. A field study of exploratory learning strategies. TOCHI 1996.
c. Jokela et al. A Diary Study on Combining Multiple Information Devices in Everyday Activities and Tasks. CHI 2015.
Numbered references are in the paper.