HCI Experiments

  • t-Test

Running Repeated Measures ANOVA in R

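Suppose your data is in a file all_results.csv with one row per participant: a Participant column plus slow_userCorrect, med_userCorrect, and fast_userCorrect columns holding that participant's score in each condition.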


You could run an ANOVA and post-hoc tests comparing slow vs. medium vs. fast (within subjects/repeated measures) across the participants as follows:

all_results <- read.csv("all_results.csv")

# Take a subset of the data: only the slow, medium, and fast userCorrect columns
smf <- all_results[c("slow_userCorrect","med_userCorrect","fast_userCorrect")]

# Create the participant column for each of the 3 conditions (used when stacked).
# It must be a factor so that Error() treats participants as blocks, not as a number.
participant <- factor(rep(all_results$Participant, 3))

# Stack the data for repeated-measures ANOVA (one row per participant per condition)
smf_stack <- stack(smf)
smf_stack[3] <- participant

# Name the data
colnames(smf_stack) <- c("numCorrect", "condition", "participant")

writeLines("\nSummary of Slow/Medium/Fast\n----------------------------------------")
print(summary(smf))

# Run the ANOVA and print its summary table
aov.out <- aov(numCorrect ~ condition + Error(participant/condition), data=smf_stack)
writeLines("\n\nANOVA Results\n----------------------------------------")
print(summary(aov.out))

# Run the post-hoc tests (t-Test with Holm correction)
writeLines("\n\nPost-hoc Test Results (Pairwise t-Test with Holm correction)\n----------------------------------------")
print(with(smf_stack, pairwise.t.test(numCorrect, condition, p.adjust.method="holm", paired=TRUE)))
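
If you want to try these steps without your own CSV, the sketch below runs them on a small made-up data set and then pulls the F statistic and p-value for condition out of the within-subjects stratum. The numbers are synthetic and only for illustration, and the names within_tab, f_value, and p_value are not part of the script above.

```r
# Synthetic data for illustration only (not real results):
# 6 participants, one score per speed condition.
all_results <- data.frame(
  Participant      = 1:6,
  slow_userCorrect = c(9, 8, 10, 7, 9, 8),
  med_userCorrect  = c(7, 7, 8, 6, 8, 6),
  fast_userCorrect = c(4, 5, 6, 3, 5, 4)
)

# Stack to long format: one row per participant per condition
smf <- all_results[c("slow_userCorrect", "med_userCorrect", "fast_userCorrect")]
smf_stack <- stack(smf)
colnames(smf_stack) <- c("numCorrect", "condition")
smf_stack$participant <- factor(rep(all_results$Participant, 3))

# Repeated-measures ANOVA with participant as the blocking factor
aov.out <- aov(numCorrect ~ condition + Error(participant/condition),
               data = smf_stack)
aov_summary <- summary(aov.out)
print(aov_summary)

# The effect of condition is tested in the participant:condition stratum
within_tab <- aov_summary[["Error: participant:condition"]][[1]]
f_value <- within_tab[1, "F value"]
p_value <- within_tab[1, "Pr(>F)"]
cat("F =", f_value, " p =", p_value, "\n")
```

With 3 conditions and 6 participants, the condition row has 2 degrees of freedom and the residual row has 10, which is a quick check that participants were treated as blocks.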

Non-parametric Tests

For analyzing data that do not meet the assumptions of parametric tests, such as responses to Likert-formatted items, consider using the tests listed below. It is also worthwhile to read through an email from Brian Gleeson on the topic.

  • Friedman Test
  • Wilcoxon Signed-Rank Test

Running Friedman and Wilcoxon in R

Suppose your data is in a file all_results.csv with one row per participant: a Participant column plus slow_avatarLetter, med_avatarLetter, and fast_avatarLetter columns.


Here slow_avatarLetter, med_avatarLetter, and fast_avatarLetter represent responses to the same Likert-formatted question, asked after the slow, medium, and fast conditions respectively (non-parametric repeated measures). You could run a Friedman test and Wilcoxon post-hoc tests to determine whether there are significant differences between the responses (slow vs. medium vs. fast) as follows:

all_results <- read.csv("all_results.csv")

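# Return the most frequent value in x (the first one encountered if several tie)
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Take a subset of the data: only the slow, medium, and fast avatarLetter responses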
diff_smf <- all_results[c("slow_avatarLetter","med_avatarLetter","fast_avatarLetter")]

# Create the participant column for each of the 3 conditions (used when stacked)
participant <- rep(all_results$Participant, 3)

# Stack the data for the repeated-measures tests (one row per participant per condition)
diff_smf_stack <- stack(diff_smf)
diff_smf_stack[3] <- participant

# Name the data
colnames(diff_smf_stack) <- c("response", "condition", "participant")

writeLines("\nSummary of Slow/Medium/Fast\n----------------------------------------")
diff_smf_modes <- c(Mode(diff_smf[,1]), Mode(diff_smf[,2]), Mode(diff_smf[,3]))
cat(c("Modes: ", diff_smf_modes, "\n"))

# Run the Friedman test and print its result
writeLines("\n\nFriedman Rank Sum Test Results\n----------------------------------------")
diff_smf_results <- friedman.test(response ~ condition | participant, data=diff_smf_stack)
print(diff_smf_results)

# Run the post-hoc tests (Wilcoxon with Holm correction)
writeLines("\n\nPost-hoc Test Results (Pairwise Wilcoxon Test with Holm correction)\n----------------------------------------")
print(with(diff_smf_stack, pairwise.wilcox.test(response, condition, p.adjust.method="holm", paired=TRUE)))

Note that running this code will often produce warnings from R that ties or zeroes prevent exact computation of a p-value. The tie warning appears when two or more of the paired differences have the same absolute value, which is common with Likert responses. The zero warning appears when a participant gives the same response in two conditions (e.g., P1 slow = 4, P1 med = 4), so that the paired difference is zero. In both cases R falls back to a normal approximation for the p-value. It seems many people ignore these warnings and still use the results.
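
If you would rather request the normal approximation explicitly than rely on the fallback, one option is to pass exact=FALSE, which pairwise.wilcox.test forwards to wilcox.test. The sketch below uses made-up Likert responses (synthetic, for illustration only); the names likert, stacked, and posthoc are not part of the script above.

```r
# Synthetic Likert responses (1-5) for illustration only (not real results).
likert <- data.frame(
  Participant       = 1:8,
  slow_avatarLetter = c(4, 5, 4, 3, 5, 4, 4, 5),
  med_avatarLetter  = c(4, 3, 3, 3, 4, 2, 4, 3),
  fast_avatarLetter = c(2, 2, 1, 3, 2, 1, 3, 2)
)

# Stack to long format; rows stay in participant order within each condition,
# which is what the paired test relies on.
stacked <- stack(likert[c("slow_avatarLetter", "med_avatarLetter", "fast_avatarLetter")])
colnames(stacked) <- c("response", "condition")

# exact = FALSE asks wilcox.test for the normal approximation up front
posthoc <- pairwise.wilcox.test(stacked$response, stacked$condition,
                                p.adjust.method = "holm", paired = TRUE,
                                exact = FALSE)
print(posthoc)
```

The p-values are the same ones the approximation would give after a warning; the only difference is that the choice is stated in the code.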


Between Subjects

  • Kruskal-Wallis
  • Mann-Whitney U

Running the analysis in R

For examples of how to use R to analyze your data, see https://github.com/pbeshai/stats. Feel free to contribute to the repository.
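
As a rough idea of what a between-subjects analysis could look like, here is a minimal sketch using base R's kruskal.test followed by pairwise Mann-Whitney U tests (wilcox.test without pairing) with Holm correction. The data frame between and its columns are made up for illustration and are not taken from the repository above.

```r
# Synthetic between-subjects data for illustration only (not real results):
# each of the 15 participants experienced exactly one condition.
between <- data.frame(
  condition = factor(rep(c("slow", "med", "fast"), each = 5),
                     levels = c("slow", "med", "fast")),
  rating    = c(5, 4, 4, 5, 3,
                3, 4, 3, 2, 3,
                2, 1, 2, 3, 1)
)

# Omnibus test across the three independent groups
kw <- kruskal.test(rating ~ condition, data = between)
print(kw)

# Post-hoc pairwise Mann-Whitney U tests with Holm correction.
# paired is left at its default (FALSE) because the groups are independent.
mw <- pairwise.wilcox.test(between$rating, between$condition,
                           p.adjust.method = "holm", exact = FALSE)
print(mw)
```

Unlike the repeated-measures examples, no participant column is needed here, because every observation comes from a different person.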


Reporting the data

TODO: Give examples of how to report the typical results. P-value and effect size. Examples for: ANOVA, t-test, Friedman, Wilcoxon, Kruskal-Wallis, and Mann-Whitney U.
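
Until those examples are written, the sketch below shows one possible way to turn a paired t-test into a report string with the test statistic, degrees of freedom, p-value, and an effect size. The helper report_paired_t is hypothetical, the data are synthetic, and the effect size used (mean difference divided by the standard deviation of the differences, often written d_z) is only one of several conventions, so check what your venue expects.

```r
# Hypothetical helper: formats a paired t-test as "t(df) = ..., p ..., d = ...".
report_paired_t <- function(x, y) {
  tt <- t.test(x, y, paired = TRUE)
  diffs <- x - y
  d_z <- mean(diffs) / sd(diffs)  # effect size for paired samples (d_z)
  # Very small p-values are conventionally reported as "p < .001"
  p_txt <- if (tt$p.value < 0.001) "p < .001" else sprintf("p = %.3f", tt$p.value)
  sprintf("t(%d) = %.2f, %s, d = %.2f",
          as.integer(tt$parameter), tt$statistic, p_txt, d_z)
}

# Synthetic scores for illustration only (not real results)
slow <- c(9, 8, 10, 7, 9, 8)
fast <- c(4, 5, 6, 3, 5, 4)
report <- report_paired_t(slow, fast)
cat(report, "\n")
```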

