Chapter 10 exercises - Power and Sample Size in R

10: Correlation and Linear Regression

Published

January 25, 2025

Exercise 1 One of the objectives of a planned study is to estimate the correlation between two inflammatory markers, CRP and IL-6, in blood serum. Each participant will provide a blood sample and both CRP and IL-6 will be measured. Suppose that the true correlation between CRP and IL-6 is expected to be 0.4.

What sample size is required to achieve 80% power to reject the null hypothesis that \(\rho = 0\) at a two-sided \(\alpha\) of 0.05?
What sample size is required to achieve 80% power to reject the null hypothesis that \(\rho \le 0.2\) at a one-sided \(\alpha\) of 0.025?
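As a rough cross-check on both parts, the sketch below uses the Fisher z approximation in base R. The helper name `fisher_n` is hypothetical and is not part of the powertools package. The formula is a large-sample approximation that ignores the far tail in the two-sided case, so the result may differ slightly from what dedicated software reports.

```r
# Approximate N for a one-sample correlation test using Fisher's z transform.
# fisher_n is a hypothetical helper written for illustration only.
fisher_n <- function(rhoA, rho0 = 0, alpha = 0.05, power = 0.80, sides = 2) {
  z_alpha <- qnorm(1 - alpha / sides)     # critical value of the test
  z_beta  <- qnorm(power)                 # quantile matching the target power
  delta   <- atanh(rhoA) - atanh(rho0)    # effect size on the Fisher z scale
  # Var(z) is approximately 1 / (N - 3), hence the "+ 3"
  ceiling(((z_alpha + z_beta) / delta)^2 + 3)
}

# Part 1: H0 rho = 0, two-sided alpha = 0.05
n_a <- fisher_n(rhoA = 0.4, rho0 = 0, alpha = 0.05, sides = 2)
# Part 2: H0 rho <= 0.2, one-sided alpha = 0.025
n_b <- fisher_n(rhoA = 0.4, rho0 = 0.2, alpha = 0.025, sides = 1)

print(c(part_1 = n_a, part_2 = n_b))
```

Moving the null value from 0 to 0.2 shrinks the distance between null and alternative on the z scale, which is why the second calculation asks for a much larger sample.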

Exercise 2 Investigators hypothesize that a certain genotype is associated with insulin resistance in older adults. They are planning a study that will collect cross-sectional data from a sample of adults aged 50 years and older. To test their hypothesis, they will use linear regression to regress a measure of insulin resistance (insulinres) on genotype (1=yes, 0=no) and the potential confounders age, race category, and education category. To help them plan the sample size, they have a data set (“insulinresist”, available in the powertools package) with similar variables collected from 60 individuals. They wish to know what sample size the planned study needs in order to have 80% probability of rejecting the null hypothesis that there is no association between the genotype and insulin resistance.

Conduct a sample size calculation for the planned study. Use the dataset to estimate the quantities that you need for the calculation. Include some sensitivity analysis to see how your required sample size varies as you change important assumptions. Write a short description of your calculation.
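One possible way to organize the calculation is sketched below in base R. It assumes the noncentrality parameter for a single coefficient in a multiple regression can be approximated by \(\beta \sqrt{N}\, \sigma_x \sqrt{1 - R^2_x} / \sigma_{\epsilon}\), where \(R^2_x\) is the squared multiple correlation of genotype with the other predictors. The function names `mlr_power` and `mlr_n` are hypothetical, and every numeric input is a placeholder rather than an estimate from the insulinresist data; in practice these would be replaced by quantities estimated from the pilot data set.

```r
# Power of the two-sided t test for one coefficient in a multiple regression.
# p = number of predictor columns excluding the intercept (an assumption here,
# since categorical confounders contribute several dummy variables).
mlr_power <- function(N, beta, sd_x, sd_eps, r2_x, p, alpha = 0.05) {
  df     <- N - p - 1
  ncp    <- beta * sqrt(N) * sd_x * sqrt(1 - r2_x) / sd_eps
  t_crit <- qt(1 - alpha / 2, df)
  (1 - pt(t_crit, df, ncp)) + pt(-t_crit, df, ncp)
}

# Smallest N whose power reaches the target, found by stepping N upward.
mlr_n <- function(beta, sd_x, sd_eps, r2_x, p, alpha = 0.05, target = 0.80) {
  N <- p + 3
  while (mlr_power(N, beta, sd_x, sd_eps, r2_x, p, alpha) < target) N <- N + 1
  N
}

# Placeholder inputs (NOT estimated from the data set).
n_base <- mlr_n(beta = 0.5, sd_x = 0.5, sd_eps = 1, r2_x = 0.1, p = 6)

# Sensitivity analysis: vary the assumed coefficient.
betas     <- c(0.4, 0.5, 0.6)
n_by_beta <- sapply(betas, function(b)
  mlr_n(beta = b, sd_x = 0.5, sd_eps = 1, r2_x = 0.1, p = 6))

print(n_base)
print(data.frame(beta = betas, N = n_by_beta))
```

The same loop can be repeated over the error standard deviation or over \(R^2_x\) to see which assumption drives the required sample size most strongly.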

Exercise 3 The formula for the noncentrality parameter (NCP) for the two-sample \(t\) test of \(H_0 \colon \mu_T = \mu_C\) versus \(H_A \colon \mu_T \neq \mu_C\) with equal allocation and assuming equal variances is \(\lambda = \sqrt{\frac{n}{2}} \frac{(\mu_T-\mu_C)}{\sigma}\), where \(n\) is the sample size per group and \(\sigma\) is the standard deviation of the outcome variable. Suppose that you plan to analyze the data from a planned two-group trial with equal allocation using a linear regression model. The model will include a dummy variable \(X_i\) equal to 1 for treatment group and 0 for control group. Recall that for a simple linear regression model, the NCP for the test of \(H_0 \colon \beta_{1} = \beta_{10}\) is

\[ \frac{(\beta_{1A}- \beta_{10}) \sqrt{\sum_{i=1}^N (x_i - \bar x)^2}}{\sigma_{\epsilon}} \]

where \(\beta_{1A}\) is the assumed true value of the regression coefficient, \(N\) is the total sample size, and \(\sigma_{\epsilon}\) is the error standard deviation. Show that this quantity is mathematically equivalent to the NCP for the two-sample \(t\) test.
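Before attempting the algebra, it can help to confirm the claim numerically. The base R sketch below uses arbitrary illustrative values and assumes that \(\beta_{1A}\) corresponds to \(\mu_T - \mu_C\), that \(\beta_{10} = 0\), and that \(\sigma_{\epsilon}\) equals \(\sigma\). It is a sanity check, not a proof.

```r
# Arbitrary illustrative values (not taken from any study).
n     <- 20        # sample size per group
mu_T  <- 12        # treatment group mean
mu_C  <- 10        # control group mean
sigma <- 3         # common outcome standard deviation

# NCP from the two-sample t test formula.
ncp_t <- sqrt(n / 2) * (mu_T - mu_C) / sigma

# NCP from the simple linear regression formula with a 0/1 dummy variable.
x       <- rep(c(1, 0), each = n)       # N = 2n observations, equal allocation
beta_1A <- mu_T - mu_C                  # assumed slope under the alternative
beta_10 <- 0                            # null value of the slope
ssx     <- sum((x - mean(x))^2)         # sum of squared deviations of x
ncp_reg <- (beta_1A - beta_10) * sqrt(ssx) / sigma

print(c(t_test = ncp_t, regression = ncp_reg, ssx = ssx))
```

Inspecting the printed sum of squares for a few choices of `n` suggests the closed form that the proof needs.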