Power and Sample Size in R

A website for all things power and sample size in R

Chapter 3 exercises

3: Comparing One or Two Means

Published

January 25, 2025

Exercise 1 A systolic blood pressure reading of 140 mm Hg or higher is considered high blood pressure (hypertension). Suppose that systolic blood pressure measurements in a population are normally distributed.

  a. We expect that the mean systolic blood pressure in a population is 145 mm Hg and that the standard deviation is 10 mm Hg. If we take a sample of 25 individuals from the population, what is the power of a one-sample \(t\) test to reject the null hypothesis \(H_0 \colon \mu \leq 140\) and conclude \(H_A \colon \mu > 140\) at a significance level of 0.025?
  b. What sample size is needed to reject \(H_0 \colon \mu \leq 140\) and conclude \(H_A \colon \mu > 140\) with 80% power at a significance level of 0.025?
  c. Because the standard deviation is not known in advance and has a strong effect on power, it is a good parameter to target for a sensitivity analysis. Produce a table showing how the sample size requirement changes as the true standard deviation varies from 6 to 20 mm Hg.
  d. Suppose that in part (b) we expect that only 75% of the individuals we recruit will provide a valid measurement of systolic blood pressure. How many individuals should we recruit to achieve the desired level of power? Round your answer up to the next whole number.
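
One possible starting point for Exercise 1 uses base R's `power.t.test()`. This is a sketch rather than an official solution: the variable names are illustrative, the grid of standard deviations (steps of 2 mm Hg) is an assumption, and part (d) here inflates the rounded-up answer from part (b).

```r
# Sketch for Exercise 1 using stats::power.t.test (base R).
# Inputs taken from the exercise: null value 140, expected mean 145, sd 10.
delta <- 145 - 140

# (a) Power of a one-sided, one-sample t test with n = 25
pow_a <- power.t.test(n = 25, delta = delta, sd = 10, sig.level = 0.025,
                      type = "one.sample", alternative = "one.sided")$power

# (b) Sample size for 80% power (returned as a non-integer, so round up)
n_b <- power.t.test(delta = delta, sd = 10, sig.level = 0.025, power = 0.80,
                    type = "one.sample", alternative = "one.sided")$n

# (c) Sensitivity of the required sample size to the standard deviation
#     (the step size of 2 is an assumption, not part of the exercise)
sds <- seq(6, 20, by = 2)
n_c <- sapply(sds, function(s) {
  power.t.test(delta = delta, sd = s, sig.level = 0.025, power = 0.80,
               type = "one.sample", alternative = "one.sided")$n
})
sens <- data.frame(sd = sds, n = ceiling(n_c))

# (d) Inflate the part (b) sample size for an expected 75% valid-measurement rate
n_recruit <- ceiling(ceiling(n_b) / 0.75)

print(pow_a)
print(ceiling(n_b))
print(sens)
print(n_recruit)
```

Setting `type = "one.sample"` and `alternative = "one.sided"` matches the hypotheses in the exercise; the defaults in `power.t.test()` are a two-sample, two-sided test.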

Exercise 2 A randomized trial will use a two-sided, two-sample \(t\) test with a significance level of 0.05 to compare systolic blood pressure outcomes in a treatment condition and a usual care condition.

  a. When using equal allocation with a total \(N\) of 100, what is the power to detect standardized effect sizes \(d\) ranging from 0.4 to 0.6?
  b. The investigators are considering a 3:1 allocation to the treatment and usual care conditions. If the allocation ratio \(r = n_T/n_{UC}\) is equal to 3, what proportion of the sample will be allocated to each condition?
  c. Find the power to detect \(d = 0.6\) when total \(N\) is 100 and allocation ratio \(r = n_T/n_{UC} = 3\). How does this compare to the power for \(d = 0.6\) in part (a)?
  d. Suppose that the variance of the outcome is expected to be twice as high in the treatment group as in the usual care group. The investigators prefer the simplicity of equal allocation, but they are concerned that equal allocation could result in a loss of power. What allocation ratio will provide the highest power? What would be the proportions of the sample allocated to each group? How does power for this allocation compare to power for equal allocation? What would you advise in this situation?
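
Base R's `power.t.test()` assumes equal group sizes, so parts (a) to (c) of Exercise 2 can instead be sketched directly from the noncentral \(t\) distribution. The helper `power_two_sample()` below is hypothetical (it is not from any package), it assumes equal variances, and the grid of effect sizes is an assumption. Part (d), with unequal variances, is not covered by this sketch.

```r
# Power of a two-sided, two-sample t test with equal variances,
# computed from the noncentral t distribution.
# `power_two_sample` is a hypothetical helper written for this sketch.
power_two_sample <- function(n1, n2, d, alpha = 0.05) {
  df    <- n1 + n2 - 2                    # degrees of freedom
  ncp   <- d / sqrt(1 / n1 + 1 / n2)      # noncentrality parameter
  tcrit <- qt(1 - alpha / 2, df)          # two-sided critical value
  # Probability of rejecting in either tail
  pt(tcrit, df, ncp, lower.tail = FALSE) + pt(-tcrit, df, ncp)
}

# (a) Equal allocation with total N = 100 (grid step of 0.05 is an assumption)
d_vals    <- seq(0.4, 0.6, by = 0.05)
pow_equal <- sapply(d_vals, function(d) power_two_sample(50, 50, d))

# (b) Proportions implied by allocation ratio r = n_T / n_UC = 3
r     <- 3
props <- c(treatment = r / (1 + r), usual_care = 1 / (1 + r))

# (c) 3:1 allocation with total N = 100
pow_3to1 <- power_two_sample(75, 25, d = 0.6)

print(data.frame(d = d_vals, power = pow_equal))
print(props)
print(pow_3to1)
```

For equal group sizes, the helper can be checked against `power.t.test(n = 50, delta = 0.6, sd = 1, strict = TRUE)`, which also counts rejections in both tails.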

Exercise 3 The POINTER trial was a two-arm randomized controlled trial comparing immediate to postponed catheter drainage in patients with infected necrotizing pancreatitis (Grinsven et al. Trials 20, 239 (2019). https://doi.org/10.1186/s13063-019-3315-6). The primary endpoint was the Comprehensive Complications Index (CCI), including all complications other than pre-existent complications occurring after randomization until 6 months after randomization. Replicate the sample size calculation described in the following excerpt:

“The sample size was calculated based on the primary endpoint, the CCI. A mean CCI score of 40 (with standard deviation of 27) for postponed catheter drainage is based on the number of complications identified in the step-up arm of the PANTER trial [3] and TENSION trial [9]. Analysis by Student’s t test will have 80% power to detect a clinically relevant reduction of 15 to a CCI score of 25 [21] at a significance level of 0.05; for a sample size that equals 2 x 51, this will result in 102 evaluable patients. Assuming a dropout rate of about 2%, then 104 patients need to be included.”
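
A minimal sketch of the replication, assuming a two-sided test and taking the inputs from the excerpt. The excerpt does not say which software or formula the authors used, so the sketch computes the per-group sample size twice: with the noncentral \(t\) calculation in `power.t.test()` and with the usual normal approximation. The two results can differ slightly after rounding up.

```r
# Inputs taken from the excerpt
delta <- 40 - 25   # clinically relevant reduction in mean CCI
sd    <- 27        # standard deviation of CCI

# Per-group sample size from the noncentral t distribution (base R)
n_t <- power.t.test(delta = delta, sd = sd, sig.level = 0.05, power = 0.80,
                    type = "two.sample", alternative = "two.sided")$n

# Per-group sample size from the normal approximation, for comparison
n_z <- 2 * (qnorm(0.975) + qnorm(0.80))^2 * (sd / delta)^2

print(c(t_based = ceiling(n_t), normal_approx = ceiling(n_z)))
```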

Exercise 4 Investigators are planning a single-arm pre-post study in which participants will be assessed for the outcome variable (\(Y_1\)), an intervention will be applied, and then the outcome variable will be assessed again (\(Y_2\)). They will conduct a one-sided test of \(H_0 \colon \mu_{d} \leq 0\) versus \(H_A \colon \mu_{d}>0\) at \(\alpha=0.025\), where \(\mu_{d} = \mu_1-\mu_2\). Suppose that the true mean difference is 4 and the variance of the outcome variable is expected to be \(100\). The correlation between measurements within participants is expected to be in the range of 0.4 to 0.6.

  a. How many participants are needed to have \(80\%\) power to reject \(H_0\)? Compute the required sample size assuming values of 0.4, 0.5 and 0.6 for the correlation. Comment on how the value of the correlation affects the sample size required.
  b. Suppose that a total of 40 participants are available for the study. What is the smallest mean difference that can be detected with \(80\%\) power?
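
One way to approach Exercise 4 is to treat the paired test as a one-sample \(t\) test on the within-participant differences. The sketch below assumes the variance of 100 applies at both time points, so that the standard deviation of the differences is \(\sqrt{2 \cdot 100 \cdot (1-\rho)}\); the variable names are illustrative.

```r
# Assumption: the outcome variance is 100 at both time points.
sigma2  <- 100
rho     <- c(0.4, 0.5, 0.6)
sd_diff <- sqrt(2 * sigma2 * (1 - rho))   # SD of within-participant differences

# (a) Participants needed for 80% power to detect a mean difference of 4
n_a <- sapply(sd_diff, function(s) {
  power.t.test(delta = 4, sd = s, sig.level = 0.025, power = 0.80,
               type = "one.sample", alternative = "one.sided")$n
})

# (b) Smallest mean difference detectable with 80% power when n = 40
delta_b <- sapply(sd_diff, function(s) {
  power.t.test(n = 40, sd = s, sig.level = 0.025, power = 0.80,
               type = "one.sample", alternative = "one.sided")$delta
})

print(data.frame(rho = rho, sd_diff = sd_diff,
                 n = ceiling(n_a), detectable_delta = delta_b))
```

Leaving `delta` unspecified while supplying `n` and `power` makes `power.t.test()` solve for the detectable difference.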

Exercise 5 For \(t\) tests, sample size impacts power through two routes. Explain these two routes.

Exercise 6 For a two-sample \(t\) test with equal group variances, show that power for a \(k:1\) allocation to groups 1 and 2 is equal to power for a \(1:k\) allocation to groups 1 and 2. (Hint: Show that the noncentrality parameters are equal and the degrees of freedom for the test statistic are equal.)
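
As a possible starting point for the hint, the noncentrality parameter and degrees of freedom for the equal-variance two-sample \(t\) test can be written as follows. The notation (\(\lambda\), \(\nu\), \(N = n_1 + n_2\)) is an assumption and may differ from the chapter's.

\[
\lambda = \frac{\mu_1 - \mu_2}{\sigma \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}, \qquad \nu = n_1 + n_2 - 2 .
\]

With a \(k:1\) allocation, \(n_1 = kN/(k+1)\) and \(n_2 = N/(k+1)\), so

\[
\frac{1}{n_1} + \frac{1}{n_2} = \frac{k+1}{kN} + \frac{k+1}{N} = \frac{(k+1)^2}{kN} .
\]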

Exercise 7 Sometimes, publications do not directly provide information that we would like, but we can calculate or estimate the quantity we are interested in using the information that is provided. Robinson et al. (2016) <https://doi.org/10.1111/eip.12137> report results of a pilot study to assess the efficacy of a suicide prevention program among secondary school students. The study had a single-arm, pre-post design. The analysis of responses on the Suicide Ideation Questionnaire (SIQ) used a paired \(t\) test. The paper reports a \(T\) test statistic of 6.2 with 20 df. Using this information, obtain an estimate of the correlation between the pre-test and post-test measurements. (Hint: What is the formula for the \(T\) statistic for a paired \(t\) test?)
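
The relationship behind the hint can be sketched as follows. The symbols \(\bar{d}\) (mean pre-post difference), \(s_1\) and \(s_2\) (pre-test and post-test sample standard deviations), and \(\hat{\rho}\) are notation assumed here; their values would have to be taken from the paper, and \(n = 21\) follows from the 20 df.

\[
T = \frac{\bar{d}}{s_d/\sqrt{n}}, \qquad s_d^2 = s_1^2 + s_2^2 - 2\hat{\rho}\, s_1 s_2 .
\]

Solving the first expression for \(s_d^2 = n\bar{d}^2/T^2\) and substituting into the second gives

\[
\hat{\rho} = \frac{s_1^2 + s_2^2 - n\bar{d}^2/T^2}{2 s_1 s_2} .
\]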

Exercise 8 Show that for an independent two-sample \(t\) test with equal variances, maximal power occurs when we have equal sample sizes in each group, \(n_1 = n_2\), i.e., allocation ratio \(r=1\). (Hint: Power is maximized when the noncentrality parameter is maximized.)

Exercise 9 In a pharmacokinetic study, investigators want to compare two formulations of a drug with respect to an outcome variable called the area under the curve (AUC). They have 60 participants available and will use equal allocation, with half of the participants getting formulation 1 and half getting formulation 2. AUC is highly skewed and the data will be log-transformed prior to analysis, which will be conducted using a two-sample \(t\) test.

Let \(\gamma_1\) and \(\gamma_2\) represent the medians for the two formulations on the original scale. The study plans to test \(H_0 \colon \frac{\gamma_1}{\gamma_2} \leq 1\) versus \(H_A \colon \frac{\gamma_1}{\gamma_2} > 1\). Suppose that the true median for formulation 1 is 3 and the true ratio of medians is 1.5.

  a. What is the true median for formulation 2? Assuming lognormal data, what are the means for formulations 1 and 2 on the log-transformed scale?
  b. Suppose that the common coefficient of variation (CV) is 1.8. What is the common \(\sigma\) on the log-transformed scale?
  c. What is the power to reject the null with one-sided \(\alpha=0.025\)?
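
A minimal sketch for Exercise 9, using two standard properties of the lognormal distribution: the median on the original scale is \(\exp(\mu)\), and the CV on the original scale satisfies \(\text{CV}^2 = \exp(\sigma^2) - 1\). It assumes the stated CV refers to the original scale; the variable names are illustrative.

```r
# (a) Medians on the original scale and means on the log scale
median1 <- 3
ratio   <- 1.5                 # gamma_1 / gamma_2
median2 <- median1 / ratio
mu1     <- log(median1)        # lognormal: median = exp(mu)
mu2     <- log(median2)

# (b) SD on the log scale from the CV (assumed to be on the original scale)
cv        <- 1.8
sigma_log <- sqrt(log(1 + cv^2))

# (c) Power of a one-sided, two-sample t test on the log scale, 30 per group
pow <- power.t.test(n = 30, delta = mu1 - mu2, sd = sigma_log,
                    sig.level = 0.025, type = "two.sample",
                    alternative = "one.sided")$power

print(c(median2 = median2, mu1 = mu1, mu2 = mu2,
        sigma_log = sigma_log, power = pow))
```

Note that `n` in `power.t.test()` is the number of participants per group, so the 60 available participants under equal allocation correspond to `n = 30`.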