One-Way ANOVA

Custom Function

The following function, one_way_anova, takes a response variable and a grouping variable, computes the necessary sums of squares, degrees of freedom, and the F-statistic, and finally returns the p-value along with other ANOVA metrics.

# Function to perform one-way ANOVA
one_way_anova <- function(response, group) {
  # Data preparation: create a data frame with the response and grouping variables.
  data <- data.frame(response, group)
  unique_groups <- unique(group)
  
  # Calculate the overall mean of the response variable.
  overall_mean <- mean(response)
  
  # Calculate sum of squares between groups (SSB)
  SSB <- sum(table(data$group) * (tapply(data$response, data$group, mean) - overall_mean)^2)
  
  # Calculate sum of squares within groups (SSW)
  SSW <- sum(tapply(data$response, data$group, function(x) sum((x - mean(x))^2)))
  
  # Degrees of freedom
  df_between <- length(unique_groups) - 1
  df_within <- length(response) - length(unique_groups)
  
  # Mean squares for between and within groups
  MS_between <- SSB / df_between
  MS_within <- SSW / df_within
  
  # Compute the F-statistic
  F <- MS_between / MS_within
  
  # Calculate the p-value from the F-distribution.
  p_value <- pf(F, df_between, df_within, lower.tail = FALSE)
  
  # Return the results as a list.
  return(list(F_statistic = F, 
              p_value = p_value, 
              df_between = df_between, 
              df_within = df_within))
}

Analysis Using the mtcars Dataset

We now load the built-in mtcars dataset and apply our custom one-way ANOVA function to analyze the relationship between miles per gallon (MPG) and the number of gears.

# Load the mtcars dataset
data(mtcars)

# Apply the custom one-way ANOVA function on mpg and gear
result_custom_anova <- one_way_anova(mtcars$mpg, mtcars$gear)
print(result_custom_anova)
## $F_statistic
## [1] 10.90072
## 
## $p_value
## [1] 0.000294828
## 
## $df_between
## [1] 2
## 
## $df_within
## [1] 29

Comparison with Built-In aov Function

We perform the same ANOVA using R’s built-in aov function and compare the results.

# Perform one-way ANOVA using the aov function
result_2 <- aov(mpg ~ factor(gear), data = mtcars)

# Display a summary of the ANOVA results
summary(result_2)
##              Df Sum Sq Mean Sq F value   Pr(>F)    
## factor(gear)  2  483.2  241.62    10.9 0.000295 ***
## Residuals    29  642.8   22.17                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The results are identical.

BONUS: Groups Visualization with ggplot2

We use the ggplot2 package to create a boxplot that displays the distribution of MPG across different gear levels. Different colors are assigned to each level, and jittered data points are added for clarity.

library(ggplot2)

# Create a boxplot with jittered points for each gear level
ggplot(mtcars, aes(x = factor(gear), y = mpg, fill = factor(gear))) +
  geom_boxplot() +
  geom_jitter(position = position_jitter(width = 0.2), alpha = 0.5) +
  # Add data points with slight horizontal jitter
  scale_fill_manual(values = c("red", "green", "blue")) +
  # Specify colors for each gear level
  labs(title = "Boxplot of MPG at Different Gears",
       x = "Gears",
       y = "MPG") +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5)  # Center the title
  )

What do you observe?

Two-Way ANOVA with Interaction

Custom Function

The following function, two_way_anova, computes the various components of a two-way ANOVA with an interaction term from first principles. Such approach provides an insight into the mechanics inherent in ANOVA computations.

# Two-way ANOVA function with interaction
two_way_anova <- function(data, response_col, factor1_col, factor2_col) {
  # Extract data columns
  response <- data[[response_col]]
  factor1 <- data[[factor1_col]]
  factor2 <- data[[factor2_col]]
  
  # Identify unique levels for each factor
  levels_factor1 <- unique(factor1)
  levels_factor2 <- unique(factor2)
  
  # Compute overall (grand) mean and group means
  grand_mean <- mean(response)
  means_factor1 <- tapply(response, factor1, mean)
  means_factor2 <- tapply(response, factor2, mean)
  
  # Preallocate a matrix for the interaction means
  means_interaction <- matrix(0, nrow = length(levels_factor1), 
                                 ncol = length(levels_factor2))
  
  # Calculate cell means for the interaction between factor1 and factor2
  for (i in 1:length(levels_factor1)) {
    for (j in 1:length(levels_factor2)) {
      means_interaction[i, j] <- mean(response[factor1 == levels_factor1[i] & factor2 == levels_factor2[j]])
    }
  }
  
  # Calculate sums of squares
  ss_total <- sum((response - grand_mean)^2)
  ss_factor1 <- sum((means_factor1 - grand_mean)^2 * table(factor1))
  ss_factor2 <- sum((means_factor2 - grand_mean)^2 * table(factor2))
  
  # Preallocate matrix for the interaction sum of squares
  ss_interaction_mat <- matrix(0, nrow = length(levels_factor1), 
                                  ncol = length(levels_factor2))
  
  # Compute interaction sum of squares for each cell
  for (i in 1:length(levels_factor1)) {
    for (j in 1:length(levels_factor2)) {
      ss_interaction_mat[i, j] <- table(factor1, factor2)[i, j] * 
        (means_interaction[i, j] - means_factor1[i] - means_factor2[j] + grand_mean)^2
    }
  }
  
  ss_interaction <- sum(ss_interaction_mat)
  
  # Compute error sum of squares
  ss_error <- ss_total - ss_factor1 - ss_factor2 - ss_interaction
  
  # Calculate degrees of freedom for each component
  df_factor1 <- length(levels_factor1) - 1
  df_factor2 <- length(levels_factor2) - 1
  df_interaction <- df_factor1 * df_factor2
  df_error <- length(response) - (length(levels_factor1) * length(levels_factor2))
  
  # Compute mean squares
  ms_factor1 <- ss_factor1 / df_factor1
  ms_factor2 <- ss_factor2 / df_factor2
  ms_interaction <- ss_interaction / df_interaction
  ms_error <- ss_error / df_error
  
  # Calculate F-statistics for each source of variation
  f_factor1 <- ms_factor1 / ms_error
  f_factor2 <- ms_factor2 / ms_error
  f_interaction <- ms_interaction / ms_error
  
  # Calculate corresponding p-values
  p_factor1 <- 1 - pf(f_factor1, df_factor1, df_error)
  p_factor2 <- 1 - pf(f_factor2, df_factor2, df_error)
  p_interaction <- 1 - pf(f_interaction, df_interaction, df_error)
  
  # Create and return a summary data frame (ANOVA table)
  results <- data.frame(
    Factor = c(factor1_col, factor2_col, "Interaction", "Error"),
    Df = c(df_factor1, df_factor2, df_interaction, df_error),
    SumSq = c(ss_factor1, ss_factor2, ss_interaction, ss_error),
    MeanSq = c(ms_factor1, ms_factor2, ms_interaction, ms_error),
    Fvalue = c(f_factor1, f_factor2, f_interaction, ""),
    Pval = c(p_factor1, p_factor2, p_interaction, "")
  )
  
  return(results)
}

Applying the Custom Two-Way ANOVA Function

In this section, we load the CO2 dataset and apply the custom two-way ANOVA function. We fit a two-way ANOVA model with interaction term to estimate how the mean of a uptake changes according to the levels of the factors Type and Treatment. If the model is additive (i.e.  the effects of the factors combine independently), interaction term will not be statistically significant.

# Load the built-in CO2 dataset
data(CO2)

# Apply the custom two-way ANOVA function to the CO2 data
result <- two_way_anova(CO2, "uptake", "Type", "Treatment")
print(result)
##        Factor Df     SumSq     MeanSq           Fvalue                 Pval
## 1        Type  1 3365.5344 3365.53440 52.5085619793696 2.37768027488983e-10
## 2   Treatment  1  988.1144  988.11440 15.4164124400977 0.000181707966298905
## 3 Interaction  1  225.7296  225.72964 3.52179996311277   0.0642128296415815
## 4       Error 80 5127.5971   64.09496

Comparison with the Built-in ANOVA Function

We now compare the results from our custom function with those from R’s built-in aov function. The summary output is printed with high precision (8 digits) for a detailed comparison.

# Perform two-way ANOVA using R's built-in aov function
anova_model <- aov(uptake ~ Type * Treatment, data = CO2)
print(summary(anova_model), digits = 8)
##                Df    Sum Sq   Mean Sq  F value     Pr(>F)    
## Type            1 3365.5344 3365.5344 52.50856 2.3777e-10 ***
## Treatment       1  988.1144  988.1144 15.41641 0.00018171 ***
## Type:Treatment  1  225.7296  225.7296  3.52180 0.06421283 .  
## Residuals      80 5127.5971   64.0950                        
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The results are identical.

BONUS: Visualizing Factors with ggpubr

The final section creates a visual summary of the data using the ggpubr package. The plot displays the mean and standard error of uptake for each level of Treatment, differentiated by Type. A dot plot overlay shows individual data points. Custom colors (red and green) are used to clearly distinguish between the groups.

library(ggpubr)

ggline(CO2, x = "Treatment", y = "uptake", color = "Type",
       add = c("mean_se", "dotplot"),
       palette = c("red", "green"))

If the underlying model was strictly additive, what would one expect to see here? Does the plot above reflect this expectation?

Multiple Comparisons and Tukey’s HSD

Type I Error in Pairwise Comparisons

In multiple testing scenarios, it is essential to account for the inflation of Type I error. First, we compute the number of pairwise comparisons among 6 groups using the combination formula, \(C(6,2)\). Then, we calculate the probability of encountering at least one false positive when performing 15 independent tests at a significance level of 0.05.

# Compute the number of pairwise comparisons among 6 groups: C(6,2)
num_comparisons <- factorial(6) / (factorial(2) * factorial(4))
num_comparisons
## [1] 15
# Compute the overall probability of no false positives for 15 tests
p_avoid_type1_error <- (1 - 0.05)^15
p_avoid_type1_error
## [1] 0.4632912
# Compute the probability of at least one false positive
p_atleast1_type1_error <- 1 - p_avoid_type1_error
p_atleast1_type1_error
## [1] 0.5367088

Simulation of Multiple Pairwise Comparisons

We now simulate a scenario where 6 independent samples (each of size 50) are drawn from the same population. For every possible pair, a two-sample t-test is conducted. We then calculate the proportion of tests that yield a p-value below the 0.05 significance threshold.

# Set seed for reproducibility
set.seed(1945)

# Preallocate a matrix to store 6 samples, each of size 50
r_matrix <- matrix(NA, nrow = 50, ncol = 6)

# Generate 6 samples drawn from a Normal distribution (mean = 5, sd = 3)
r_matrix <- apply(r_matrix, 2, function(x){ rnorm(50, mean = 5, sd = 3) })

# Generate all possible pairs of sample indices (i.e., pairwise comparisons)
pairwise_comparisons <- combn(1:ncol(r_matrix), 2)

# Conduct t-tests for each pair and extract the p-values
multiple_comparisons <- apply(pairwise_comparisons, 2, function(indexes) {
  group1 <- r_matrix[, indexes[1]]
  group2 <- r_matrix[, indexes[2]]
  t_test_result <- t.test(group1, group2)
  return(t_test_result$p.value)
})

# Identify significant comparisons at alpha = 0.05
significant_results <- multiple_comparisons < 0.05

# Calculate the proportion of significant results
proportion_significant <- sum(significant_results) / length(multiple_comparisons)
proportion_significant
## [1] 0.2

Analysis of the ChickWeight Dataset

The ChickWeight dataset provides a valuable framework for studying growth patterns under different dietary conditions. Here, we extract the final weights for each chick and examine the distribution of observations across the diet groups.

# Load the ChickWeight dataset
data(ChickWeight)

# Subset to obtain the final weights for each chick (i.e., at the maximum recorded time)
final_weights <- ChickWeight[ChickWeight$Time == max(ChickWeight$Time), ]

# Remove the unique identifier column (assumed to be the third column)
final_weights <- final_weights[, -3]

# Preview the structure of the dataset
head(final_weights)
##    weight Time Diet
## 12    205   21    1
## 24    215   21    1
## 36    202   21    1
## 48    157   21    1
## 60    223   21    1
## 72    157   21    1
# Determine the total number of observations (N) and the distribution per diet (n)
total_observations <- dim(final_weights)[1]
group_counts <- table(final_weights$Diet)
total_observations
## [1] 45
group_counts
## 
##  1  2  3  4 
## 16 10 10  9

Balancing the ChickWeight Dataset

For demonstration purposes, we address the issue of unequal group sizes by balancing the dataset through random sampling, ensuring that each diet group contributes an equal number of observations. Specifically, we use the minimum group size as the target for this balancing procedure. This approach facilitates the application of classical ANOVA methods, which assume a balanced design. However, in practical settings—where rigorous data analysis and nuanced interpretation are essential—it is advisable to employ alternative statistical techniques (such as Welch’s ANOVA, Generalized Linear Models, or linear mixed-effects models) that are capable of handling unequal group sizes.

# Function to balance the dataset based on a specified factor variable and target size
balance_data <- function(data, factor_var_name, min_size) {
  split_data <- split(data, data[[factor_var_name]])
  balanced_data <- lapply(split_data, function(sub_data) {
    if (nrow(sub_data) > min_size) {
      sub_data[sample(nrow(sub_data), min_size), ]
    } else {
      sub_data
    }
  })
  do.call(rbind, balanced_data)
}

# Determine the minimum group size across diets
min_size <- min(table(final_weights$Diet))

# Set seed for reproducibility
set.seed(1913)

# Generate a balanced dataset using the defined function
final_weights_balanced <- balance_data(final_weights, "Diet", min_size)

One-Way ANOVA Analysis of the ChickWeight Dataset

With a balanced dataset, we conduct a one-way ANOVA to evaluate the effect of the diet on chick weight. The ANOVA summary is stored for subsequent calculations.

# Run ANOVA to assess the effect of Diet on weight
anova_result <- aov(weight ~ Diet, data = final_weights_balanced)

# Save the ANOVA summary to extract the Mean Square Error (MSE) later
anova_summary <- summary(anova_result)
anova_summary
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## Diet         3  80959   26986   7.778 0.000488 ***
## Residuals   32 111023    3469                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Manual Implementation of Tukey’s HSD Test

We manually implement Tukey’s Honest Significant Difference (HSD) test. This involves calculating the Mean Square Error (MSE), group means, and pairwise differences. These values are used to compute the test statistic (q value) and compare it with the critical value obtained from the Studentized range distribution.

# Extract the Mean Square Error (MSE) from the ANOVA summary
MSE <- anova_summary[[1]][["Mean Sq"]][2]

# Define variables necessary for further calculations:
# Total number of observations (N), sample size per group (n), and number of groups (k)
N <- dim(final_weights_balanced)[1]
n <- as.numeric(table(final_weights_balanced$Diet)[1])
k <- length(unique(final_weights_balanced$Diet))

# Calculate the mean weight for each diet group
group_means <- tapply(final_weights_balanced$weight, final_weights_balanced$Diet, mean)

# Generate all possible pairwise comparisons among the diet groups
comparisons <- combn(levels(final_weights_balanced$Diet), 2)

# Manually compute Tukey's HSD test for each pairwise comparison
tukey_test <- apply(comparisons, 2, function(pair) {
  x <- group_means[pair[1]]
  y <- group_means[pair[2]]
  mean_diff <- abs(x - y)
  
  # Compute the test statistic (q value)
  q_value <- mean_diff / sqrt(MSE / n)
  
  # Determine the critical value from the Studentized range distribution
  critical_value <- qtukey(p = 0.95, nmeans = k, df = N - k)
  
  # Compute the HSD (minimum difference required for significance)
  hsd <- critical_value * sqrt(MSE / n)
  
  # Assess whether the observed difference is statistically significant
  significant <- as.numeric(q_value >= critical_value)
  
  return(c(pair, round(mean_diff, 5), round(hsd, 5), round(q_value, 5), 
           round(critical_value, 5), significant))
})

# Transpose the result for improved readability and assign column names
tukey_test <- t(tukey_test)
colnames(tukey_test) <- c("Group 1", "Group 2", "Mean Abs Diff", "HSD", 
                          "Q Value", "Critical Value", "Significant")
tukey_test
##      Group 1 Group 2 Mean Abs Diff HSD        Q Value   Critical Value
## [1,] "1"     "2"     "67.88889"    "75.23024" "3.45771" "3.83162"     
## [2,] "1"     "3"     "131.88889"   "75.23024" "6.71735" "3.83162"     
## [3,] "1"     "4"     "86.44444"    "75.23024" "4.40278" "3.83162"     
## [4,] "2"     "3"     "64"          "75.23024" "3.25964" "3.83162"     
## [5,] "2"     "4"     "18.55556"    "75.23024" "0.94507" "3.83162"     
## [6,] "3"     "4"     "45.44444"    "75.23024" "2.31457" "3.83162"     
##      Significant
## [1,] "0"        
## [2,] "1"        
## [3,] "1"        
## [4,] "0"        
## [5,] "0"        
## [6,] "0"

HSD Test and Visualization

For comparative purposes, we also perform Tukey’s HSD test using R’s built-in function and visualize the results. This provides a useful benchmark against our manual implementation.

# Perform Tukey's HSD test using the built-in function
tukey_builtin <- TukeyHSD(anova_result)
tukey_builtin
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = weight ~ Diet, data = final_weights_balanced)
## 
## $Diet
##          diff         lwr       upr     p adj
## 2-1  67.88889   -7.341349 143.11913 0.0887713
## 3-1 131.88889   56.658651 207.11913 0.0002304
## 4-1  86.44444   11.214207 161.67468 0.0192921
## 3-2  64.00000  -11.230237 139.23024 0.1181152
## 4-2  18.55556  -56.674682  93.78579 0.9082494
## 4-3 -45.44444 -120.674682  29.78579 0.3732078
# Plot the Tukey HSD results for visual inspection
plot(tukey_builtin)

It is important to emphasize that the primary objective of this demonstration is not the data analysis per se, but rather to illustrate the inner workings and practical applications of Tukey’s HSD test. Notably, the random sampling procedure employed to balance the groups for subsequent ANOVA analysis can influence the test outcomes. Specifically, if the procedure is repeated using a different random seed, the resulting balanced dataset may differ, potentially leading to variations in the results—where differences deemed non-significant in one instance might become significant in another.