R Packages

Here is a compilation of R packages with functions to perform or facilitate power and sample size calculations. To search this page, we recommend using the Ctrl+F shortcut to open a search box.

Multipurpose packages

powertools: Power and sample size calculation for a range of different study designs and outcomes. This package is a companion to the book Crespi (2025) Power and Sample Size in R. Topics include: ANOVA, chi-square tests, means, proportions, correlation coefficients, cluster randomized trials, individually randomized group treatment trials, multisite trials, linear, logistic and Poisson regression, nonparametric tests of location, and multiple primary endpoints.
PSS.Health: Power and Sample Size for Health Researchers is a Shiny app with functions related to sample size and power calculations common in the health care field. Means and proportions (including tests for correlated groups, equivalence, non-inferiority and superiority), association, correlation coefficients, regression coefficients (linear, logistic, gamma, and Cox), linear mixed model, Cronbach’s alpha, interobserver agreement, intraclass correlation coefficients, limits of agreement on Bland-Altman plots, area under the curve, sensitivity and specificity. Online version at <https://hcpa-unidade-bioestatistica.shinyapps.io/PSS_Health/>.
pwr2ppl: Power analysis for designs including t-tests, correlations, multiple regression, ANOVA, mediation, and logistic regression. Functions accompany Aberson (2019) <doi:10.4324/9781315171500>.
pwrss: Power and sample size calculations for proportion (one-sample), mean (one-sample), difference between two proportions (independent samples), difference between two means or groups, correlation (one-sample), difference between two correlations, single coefficient in multiple linear regression, logistic regression, and Poisson regression, an indirect effect in mediation analysis, linear regression, analysis of variance, and goodness-of-fit or independence for contingency tables. Bulus and Polat (2023) <https://osf.io/ua5fc>.
TrialSize: A wide range of functions for power and sample size calculation based on the book Chow SC, Shao J, Wang H. Sample Size Calculation in Clinical Research, 3rd edition. New York: Marcel Dekker, 2018. designsize has functions for sample size calculation based on the earlier 2nd edition of the book.
WebPower: Functions for conducting both basic and advanced statistical power analysis including correlation, proportions, t tests, one-way ANOVA, two-way ANOVA, linear regression, logistic regression, Poisson regression, mediation analysis, longitudinal data analysis, structural equation modeling (SEM) and multilevel modeling. It also serves as the engine for conducting power analysis online at <https://webpower.psychstat.org>.

ANOVA/factorial designs

BDEsize: Calculates sample size required to detect a certain standardized effect size for balanced design of factorial experiments. Also provides plots.
easypower: Facilitates user input for power calculations from the ‘pwr’ package, with n.oneway for one-way ANOVA and n.multiway for factorial ANOVA.
powerbydesign: Functions for bootstrapping the power of ANOVA designs based on estimated means and standard deviations of the conditions.
pwr2: Power and sample size for one-way ANOVA and two-way ANOVA. Balanced designs only.
pwr4exp: Tools for calculating power and sample size for a variety of experimental designs used in agricultural and biological research, including completely randomized, block and split-plot designs.
Superpower: Analytic and simulation-based power analysis for factorial designs. See Lakens, D., & Caldwell, A. R. (2021). “Simulation-Based Power Analysis for Factorial Analysis of Variance Designs”. <doi:10.1177/2515245920951503>.

Bayesian methods

bayescount: Power calculations and Bayesian analysis of count distributions.
BayesianPower: A collection of methods to determine the sample size for the evaluation of inequality constrained hypotheses by means of a Bayes factor.
BayesPPD: Bayesian power prior design. Bayesian power/type I error calculation and model fitting using the power prior and the normalized power prior for generalized linear models. Examples of applying the package are available at <doi:10.32614/RJ-2023-016>.
BayesPPDSurv: Bayesian power/type I error calculation and model fitting using the power prior and the normalized power prior for proportional hazards models with piecewise constant hazard.
bfpwr: Implements z test, t test, and normal moment prior Bayes factors based on summary statistics, with functionality to perform corresponding power and sample size calculations as described in Pawel and Held (2024) <doi:10.48550/arXiv.2406.19940>.
SampleSizeMeans: Sample size calculation using three different Bayesian criteria in the context of designing an experiment to estimate a normal mean or the difference between two normal means: Average Length Criterion, the Average Coverage Criterion and the Worst Outcome Criterion. Joseph L. and Bélisle P. (1997) <https://www.jstor.org/stable/2988525>.
SampleSizeProportions: Sample size requirement calculation using three different Bayesian criteria in the context of designing an experiment to estimate the difference between two binomial proportions. Functions for calculation of required sample sizes for the Average Length Criterion, the Average Coverage Criterion and the Worst Outcome Criterion. Joseph L., du Berger R. and Bélisle P. (1997) <doi:10.1002/(sici)1097-0258(19970415)16:7%3C769::aid-sim495%3E3.0.co;2-v>.

Causal inference

PSpower: Sample size calculations in causal inference with observational data. This package is a tool to calculate sample size under prespecified power with minimal (two) summary quantities needed. Three key components for the sample size calculation are the propensity score distribution, potential outcome distribution, and their correlation. Manuscript.

Clinical trials

blindrecalc: Computation of key characteristics and plots for blinded sample size recalculation. Continuous as well as binary endpoints are supported in superiority and non-inferiority trials. See Baumann, Pilz, Kieser (2022) <doi:10.32614/RJ-2022-001> for detailed description.
clinfun: Functions for designing studies as well as an assortment of other functions not related to power and sample size. Power/sample size topics include: Fisher exact test, group sequential designs, Simon two-stage designs, exact single stage designs and Kruskal-Wallis rank test.
CoRpower: Calculates power for assessment of intermediate biomarker responses as correlates of risk in the active treatment group in clinical efficacy trials, as described in Gilbert, Janes, and Huang, Power/Sample Size Calculations for Assessing Correlates of Risk in Clinical Efficacy Trials (2016, Statistics in Medicine).
esDesign: Adaptive enrichment designs with sample size re-estimation as presented in Lin et al. (2021) <doi:10.1016/j.cct.2020.106216>. Also contains several widely used adaptive designs, such as the Marker Sequential Test (MaST) design proposed by Freidlin et al. (2014) <doi:10.1177/1740774513503739>, adaptive enrichment designs without early stopping, and the sample size re-estimation procedure based on conditional power proposed by Proschan and Hunsberger (1995).
eselect: Adaptive clinical trial designs with endpoint selection and sample size reassessment for multiple binary endpoints based on blinded and/or unblinded data. Trial design that allows an adaptive modification of the primary endpoint based on blinded information obtained at an interim analysis. The implemented design is proposed in Bofill Roig, M., Gómez Melis, G., Posch, M., and Koenig, F. (2022). <doi:10.48550/arXiv.2206.09639>.
GenTwoArmsTrialSize: Generalized two-arm clinical trial sample size calculation. Based on type of endpoints (continuous/binary/time-to-event/ordinal), design (parallel/crossover), hypothesis tests (equality/noninferiority/superiority/equivalence), trial arms' noncompliance rates and expected loss to follow-up.
HCT: Single arm trial comparing results to historical controls; given a database of previous treatment/placebo estimates, their standard errors and sample sizes, calculates a significance criterion and power estimate that take into account the among-trial variation.
MedianaDesigner: Simulation-based power and sample size calculations for a broad class of late-stage clinical trials. Modules include: Adaptive designs with data-driven sample size or event count re-estimation, Adaptive designs with data-driven treatment selection, Adaptive designs with data-driven population selection, Optimal selection of a futility stopping rule, Event prediction in event-driven trials, Adaptive trials with response-adaptive randomization (experimental module), Traditional trials with multiple objectives (experimental module), Traditional trials with cluster-randomized designs (experimental module).
NBDesign: Functions for designing and monitoring clinical trials with a negative binomial endpoint and variable follow-up.
Power2Stage: Power and sample size for two-stage bioequivalence studies.
powerLATE: Implementation of the generalized power analysis for the local average treatment effect (LATE) proposed by Bansak (2020) <doi:10.1214/19-STS732>. LATE is also known as the complier average causal effect or CACE.
PowerTOST: Power and sample size for study designs used in bioequivalence studies, prominently the TOST procedure (two one-sided t-tests).
SampleSize4ClinicalTrials: Sample size calculations for means and proportions in two-group clinical trials, including equality, superiority, non-inferiority and equivalence. Offers two main functions, one for means and one for proportions.
spass: Study Planning and Adaptation of Sample Size. Blinded sample size reestimation in adaptive study design.
WRestimates: Calculates non-parametric estimates of the sample size, power and confidence intervals for a win-ratio composite endpoint. For more detail on the methodologies, see Yu, R. X. and Ganju, J. (2022) <doi:10.1002/sim.9297>.

Cluster randomized trials including stepped wedge

clusterPower: Power for cluster randomized trials including normal, binary and count outcomes. Includes functions for individually randomized group treatment trials, multiarm cluster randomized trials and classic stepped wedge designs. Some functions implement analytic solutions and others are based on simulation. As of November 1, 2024, this package is no longer on CRAN but is available on an archive page.
crt2power: Methods for powering cluster randomized trials with two co-primary outcomes using five different design techniques.
CRTpowerdist: Calculates attained power and constructs power distributions for unequal cluster size, cross-sectional stepped wedge and parallel cluster randomized trials, with or without stratification. Allowed outcome types are continuous, binary and count. Available at the author’s GitHub page.
CRTSize: Sample size estimation in cluster randomized trials. Contains traditional power-based methods, empirical smoothing (Rotondi and Donner, 2009), and updated meta-analysis techniques (Rotondi and Donner, 2012).
H2x2Factorial: Sample size methods for cluster randomized 2×2 factorial trials. Supports unequal cluster sizes.
PowerUpR: Tools to calculate power, minimum detectable effect size (MDES), MDES difference, and minimum required sample size for various multilevel randomized experiments (MRE) with continuous outcomes. Accommodates 14 types of MRE designs to detect main treatment effect, seven types of MRE designs to detect moderated treatment effect, five types of MRE designs to detect mediated treatment effects, four types of partially nested (PN) designs to detect main treatment effect, and three types of PN designs to detect mediated treatment effects. See ‘PowerUp!’ <https://www.causalevaluation.org/>.
SteppedPower: Tools for power and sample size calculation as well as design diagnostics for longitudinal mixed model settings, with a focus on stepped wedge designs.
swCRTdesign: A comprehensive set of tools for examining the design and analysis aspects of stepped wedge cluster randomized trials. Supports a wide variety of designs.
swdpwr: Functions for power calculation for stepped wedge cluster randomized trials, including cross-sectional and cohort designs, binary and continuous outcomes, marginal (GEE) and conditional models (mixed effects model), three link functions (identity, log, logit links), with and without time effects under exchangeable, nested exchangeable and block exchangeable correlation structures.
SWSamp: Functions for sample size calculation and power analysis in a stepped wedge trial. Closed-form and simulation-based procedures. Not on CRAN. Available at the author’s GitHub page.

Epidemiology

epiR: Collection of tools for analysis of epidemiological and surveillance studies. Includes functions for sample size calculation for cross-sectional, case-control and cohort studies.
osDesign: Functions for design of case-control and two-phase studies, and the analysis of data that arise from them. Functions provide Monte Carlo based evaluation of operating characteristics such as powers for estimators of the components of a logistic regression model. See: Haneuse, Saegusa and Lumley (2011) <doi:10.18637/jss.v043.i11>.
powerSurvEpi: Power and sample size for testing main effect or interaction effect in the survival analysis of epidemiological (non-randomized) studies, taking into account the correlation between the covariate of the interest and other covariates. Some calculations also take into account the competing risks and stratified analysis. Also includes functions to calculate power and sample size for testing main effect in the survival analysis of randomized clinical trials and conditional logistic regression for nested case-control study.
precisely: Precision (confidence interval width)-based sample size calculation. Focuses on epidemiological studies. Based on Rothman and Greenland (2018). Supports calculations for risk differences and ratios, rate differences and ratios, and odds ratios. A companion Shiny app is available.
samplesizelogisticcasecontrol: Determine sample size or power for case-control studies to be analyzed using logistic regression.
TrendInTrend: Odds ratio estimation and power calculation for the trend in trend model. Estimation of causal odds ratio and power calculation given trends in exposure prevalence and outcome frequencies of stratified data.

Genetics/omics/bioinformatics

DBpower: Finite sample power calculations for detection boundary tests (e.g. Berk-Jones, Generalized Berk-Jones, innovated Berk-Jones) used in set-based inference studies. These detection boundary tests are described in Sun et al. (2019) <doi:10.1080/01621459.2019.1660170>.
genpwr: Power and sample size calculations for genetic association studies allowing for misspecification of the model of genetic susceptibility. Logistic (case/control study design) and linear (continuous phenotype) regression models, using additive, dominant, recessive or degree of freedom coding of the genetic covariate while assuming a true dominant, recessive or additive genetic effect. Gene by environment interactions.
HMP: Hypothesis testing and power calculations for comparing metagenomic samples for human microbiome experiments.
MetSizeR: Shiny app to estimate sample size for a metabolomic experiment to achieve a desired statistical power.
mthapower: Sample size and power for association studies involving mitochondrial DNA haplogroups. Based on Samuels et al. <doi:10.1086/502682>.
pbatR: Pedigree/family-based genetic association tests analysis and power. Power calculations via simulation methods. Also provides a frontend to the now abandoned PBAT program.
phylosamp: Tools for estimating sample sizes for phylogenetic studies, including studies focused on estimating the probability of true pathogen transmission between two cases given phylogenetic linkage and studies focused on tracking pathogen variants at a population level. Methods described in Wohl, Giles, and Lessler (2021) and in Wohl, Lee, DiPrete, and Lessler (2023).
poweRbal: Phylogenetic tree models and the power of tree shape statistics. Comparison of tree shape statistics by estimating their power to differentiate between different tree models. Kersting et al. (2024) <doi:10.48550/arXiv.2406.05185>
powerEQTL: Power and sample size calculation for bulk tissue and single-cell eQTL analysis based on ANOVA, simple linear regression or linear mixed effects model. Can also calculate power/sample size for testing the association of a SNP to a continuous type phenotype. Dong X et al. (2021) <doi:10.1093/bioinformatics/btab385>.
PoweREST: Power estimation and sample size calculation for 10X Visium Spatial Transcriptomics data to detect differential expressed genes between two conditions based on bootstrap resampling. Shui et al. (2024) <doi:10.1101/2024.08.30.610564>
powerGWASinteraction: Power calculations for gene-environment and gene-gene interactions for case-control studies of candidate genes and genome-wide association studies (GWAS).
powerpkg: Estimate power of test for linkage using an affected sib pair design, as a function of the recurrence risk ratios. Examine how power of the transmission disequilibrium test (TDT) depends on disease allele frequency, marker allele frequency, strength of the linkage disequilibrium, and magnitude of the genetic effect.
SPCompute: Computation of sample size or power for GWAS studies with different types of covariate effects and different types of covariate-gene dependency structure. See Zhang (2022) <doi:10.48550/arXiv.2203.15641>.
ssize.fdr: Sample size calculations for microarray experiments based on desired power while controlling for false discovery rates.
ssizeRNA: Sample size calculation for RNA-seq experimental design while controlling false discovery rate. Based on Law et al. (2014) <doi:10.1186/gb-2014-15-2-r29> and the sample size calculation method proposed for microarray experiments by Liu and Hwang (2007) <doi:10.1093/bioinformatics/btl664>.

Group sequential and multistage designs

ASSISTant: Adaptive Subgroup Selection in Group Sequential Trials. Clinical trial design for subgroup selection in three-stage group sequential trial as described in Lai, Lavori and Liao (2014, <doi:10.1016/j.cct.2014.09.001>). Includes facilities for design, exploration and analysis of such trials.
BinGSD: Supports the computation of boundaries and conditional power for single-arm group sequential test with binary endpoint, via either asymptotic or exact test. Also provides functions to obtain boundary crossing probabilities given the design.
gsDesign: Derives group sequential clinical trial designs focusing on time-to-event, binary, and continuous outcomes. Largely based on methods described in Jennison and Turnbull, 2000, “Group Sequential Methods with Applications to Clinical Trials” ISBN: 0-8493-0316-8.
gsDesign2: Group Sequential Design with Non-Constant Effect. Enables fixed or group sequential design under non-proportional hazards. Offers piecewise constant enrollment, failure rates, and dropout rates for a stratified population. This package includes three methods for designs: average hazard ratio, weighted logrank tests in Yung and Liu (2019) <doi:10.1111/biom.13196>, and MaxCombo tests.
HCTDesign: Functions to design historical controlled trials with survival outcome by group sequential method. Based on Jianrong and Xiaoping (2016) <doi:10.1002/pst.1756> and Jianrong and Yime (2020) <doi:10.1080/10543406.2019.1684305>.
MAMS: Designing multi-arm multi-stage studies with (asymptotically) normal endpoints and known variance.
PhIIdesign: Sample size calculations for Phase II clinical trials. Functionalities for Fleming 1-stage, Sargent 1-stage, Simon 2-stage, Fleming 2-stage and Sargent 2-stage.
POSSA: Power simulation for sequential analysis and multiple hypotheses. Lukács (2022) <doi:10.21105/joss.04643>.
PwrGSD: Tools for evaluation of interim analysis plans for sequentially monitored trials on a survival endpoint; tools to construct efficacy and futility boundaries, for deriving power of a sequential design at a specified alternative, template for evaluating the performance of candidate plans. See Izmirlian, G. (2014) <doi:10.4310/SII.2014.v7.n1.a4>.
RCTdesign: Comprehensive package for evaluating, analyzing, and reporting group sequential and adaptive clinical trial designs. Extensive website and tutorials.
rpact: Comprehensive package for design and analysis of confirmatory adaptive clinical trials with continuous, binary, and survival endpoints according to the methods described in Wassmer and Brannath (2016). Includes classical group sequential as well as multi-stage adaptive hypotheses tests based on the combination testing principle. The companion website has a Shiny app and other resources.
seqDesign: Simulation and group sequential monitoring of randomized two-stage treatment efficacy trials with time-to-event endpoints. A modification of the preventive vaccine efficacy trial design of Gilbert, Grove et al. (2011, Statistical Communications in Infectious Diseases) is implemented, with application generally to individual-randomized clinical trials with multiple active treatment groups and a shared control group, and a study endpoint that is a time-to-event endpoint subject to right-censoring. The design divides the trial into two stages of time periods, where each treatment is first evaluated for efficacy in the first stage of follow-up, and, if and only if it shows significant treatment efficacy in stage one, it is evaluated for longer-term durability of efficacy in stage two. The code can be used for a single active treatment versus control design and for a single-stage design.

Meta-analysis

metapower: Statistical power for meta-analysis, including power for main effects (Jackson & Turner, 2017) <doi:10.1002/jrsm.1240>, test of homogeneity (Pigott, 2012) <doi:10.1007/978-1-4614-2278-5>, subgroup analysis, and categorical moderator analysis (Hedges & Pigott, 2004) <doi:10.1037/1082-989X.9.4.426>.
OssaNMA: Optimal sample size and allocation with a network meta-analysis.
POMADE: Functions to compute and plot power levels, minimum detectable effect sizes, and minimum required sample sizes for test of the overall average effect size in meta-analysis of dependent effect sizes.

Multilevel and longitudinal data

cosa: Implements bound constrained optimal sample size allocation framework described in Bulus & Dong (2021) <doi:10.1080/00220973.2019.1636197> for power analysis of multilevel regression discontinuity designs (MRDDs) and multilevel randomized trials with continuous outcomes. See Bulus (2021) <doi:10.1080/19345747.2021.1947425>.
fPASS: Computes power and sample size to test for the difference in the mean function between two groups under a repeatedly measured longitudinal or sparse functional design. Koner and Luo (2023) <https://arxiv.org/abs/2302.05612>.
JMdesign: Power calculations for joint modeling of longitudinal and survival data with k-th order trajectories when the variance-covariance matrix is unknown.
longpower: Power and sample size for linear models of longitudinal data. Supported models include mixed-effects models and models fit by generalized least squares and generalized estimating equations. Package is described in Iddi and Donohue (2022) <doi:10.32614/RJ-2022-022>.
LPower: Calculates power, sample size or detectable effect for longitudinal analysis (a repeated measures model with attrition). Requires the variance covariance matrix of the observations but can compute this matrix for several common random effects models. See Diggle, Liang and Zeger (1994, ISBN:9780198522843).
mlmpower: Power analysis and data simulation for multilevel models. A declarative language for specifying multilevel models, solving for population parameters based on specified variance-explained effect size measures, generating data, and conducting power analyses to determine sample size recommendations. Allows for any number of within-cluster effects, between-cluster effects, covariate effects at either level, and random coefficients. The models do not assume orthogonal effects, and predictors can correlate at either level and accommodate models with multiple interaction effects.
MultiRR: Calculates bias, precision, and power for multilevel random regressions. Random regressions are types of hierarchical models in which data are structured in groups and (regression) coefficients can vary by groups. Provides simulation and analytical tools (based on “lme4”) to study model performance for random regressions that vary at more than one level (multilevel random regressions), allowing researchers to determine optimal sampling designs.
odr: Optimal design and power for experimental studies investigating main, mediation and moderation effects. Calculate optimal sample size allocation under a budget constraint, and perform power analyses with and without accommodating cost structures of sampling. Designs cover single-level and multilevel experiments detecting main, mediation, and moderation effects. References include: Shen, Z., & Kelcey, B. (2020). Optimal sample allocation under unequal costs in cluster-randomized trials. <doi:10.3102/1076998620912418>. Shen, Z., & Kelcey, B. (2022b). Optimal sample allocation for three-level multisite cluster-randomized trials. <doi:10.1080/19345747.2021.1953200>. Shen, Z., & Kelcey, B. (2022a). Optimal sample allocation in multisite randomized trials. <doi:10.1080/00220973.2020.1830361>.
pamm: Power analysis for random effects in mixed models. Simulation functions to assess or explore the power of a dataset to estimate significant random effects (intercept or slope) in a mixed model. The functions are based on the “lme4” and “lmerTest” packages.
pass.lme: Power and sample size calculation for testing fixed effect coefficients in multilevel linear mixed effect models with one or more independent populations. Laird and Ware (1982) <doi:10.2307/2529876>.
rmass2: Calculation of sample size or power in a two-group repeated measures design, accounting for attrition and accommodating a variety of correlation structures for the repeated measures; details in: Hedeker, Gibbons and Waternaux (1999) <doi:10.3102/10769986024001070>.
simr: Calculate power for generalised linear mixed models, using simulation. Designed to work with models fit using the “lme4” package. Described in Green and MacLeod, 2016 <doi:10.1111/2041-210X.12504>.
ssrm.logmer: Sample size for a longitudinal study with a binary outcome. Kapur, et al. (2014) <doi:10.1002/sim.6203>.

Multiple testing

FDRsamplesize2: Functions to compute average power and sample size for studies that use the false discovery rate (FDR) as the measure of statistical significance. FDRsampsize is an older version of the package.
gMCP: Functions and a graphical user interface for graph-based multiple testing procedures with power and sample size functions.
gMCPLite: A “lightweight” version of gMCP.
graposas: Sample size optimization using graphical approaches for multiplicity adjustment in clinical trials with multiple endpoints.
PUMP: Power Under Multiplicity Project. Estimates power, minimum detectable effect size and sample size requirements for multilevel randomized experiments with multiple outcomes using multiple testing procedures. For a full package description, see <doi:10.18637/jss.v108.i06>.
pwrFDR: Computing average and TPX power under various Benjamini–Hochberg (BH) false discovery rate (FDR) type sequential procedures. All of these procedures involve control of some summary of the distribution of the FDP, e.g. the proportion of discoveries which are false in a given experiment. TPX power is the probability that the true positive proportion (TPP) exceeds a given value. This package introduces a procedure for controlling the FDX called the BH-FDX procedure. The theoretical results are described in Izmirlian, G (2020) <doi:10.1016/j.spl.2020.108713>.
rPowerSampleSize: Sample size computations controlling the Type II generalized familywise error rate. The significance of mean difference tests in clinical trials is established if at least r null hypotheses are rejected among m that are simultaneously tested. Enables one to compute necessary sample sizes for single-step (Bonferroni) and step-wise procedures (Holm and Hochberg).
Superpower: Analytic and simulation-based power analysis for factorial designs. Includes options for controlling familywise error rate. See Lakens, D., & Caldwell, A. R. (2021). “Simulation-Based Power Analysis for Factorial Analysis of Variance Designs”. <doi:10.1177/2515245920951503>.

Precision (confidence interval width)

precisely: Precision (confidence interval width)-based sample size calculation. Focuses on epidemiological studies. Based on Rothman and Greenland (2018). Supports calculations for risk differences and ratios, rate differences and ratios, and odds ratios. A companion Shiny app is available.
presize: Provides functions for precision-based (confidence interval-based) sample size calculations. Topics include: AUC, correlation, Cronbach’s alpha, intraclass correlation coefficient, kappa, limits of agreement, likelihood ratio tests, means, odds ratio, proportions, rates, rate ratio, risk ratio and sensitivity.

Prediction modeling

HDDesign: High dimensional classification studies. Determine the sample size to achieve the target probability of correct classification (PCC) for studies employing high-dimensional features. The package implements functions to 1) determine the asymptotic feasibility of the classification problem; 2) compute the upper bounds of the PCC for any linear classifier; 3) estimate the PCC of three design methods given design assumptions; 4) determine the sample size requirement to achieve the target PCC for three design methods.
planningML: Machine-learning based high-dimensional classification analysis with imbalanced data and correlated features; sample size formulas for performance metrics that are sensitive to class imbalance such as Area Under the receiver operating characteristic Curve (AUC) and Matthews correlation coefficient (MCC). Uses a two-step approach involving feature selection using the innovated Higher Criticism thresholding method (Hall and Jin (2010) <doi:10.1214/09-AOS764>), then determining the sample size by optimizing the two performance metrics.
pmsampsize: Sample size for development of multivariable prediction models using the criteria proposed by Riley et al. (2018) <doi:10.1002/sim.7992>. Supports continuous, binary or survival (time-to-event) outcomes.
pmvalsampsize: Sample size required for the external validation of an existing multivariable prediction model using the criteria proposed by Archer (2020) <doi:10.1002/sim.8766> and Riley (2021) <doi:10.1002/sim.9025>.
sampsizeval: Estimation of required sample size to validate a risk model for binary outcomes, based on Pavlou et al. (2021) <doi:10.1177/09622802211007522>. For precision-based sample size calculations, the user is required to enter the anticipated values of the C-statistic and outcome prevalence. The user also needs to specify the required precision (standard error) for the C-statistic, the calibration slope and the calibration in the large.

Psychometrics and SEM

irtpwr: Power analysis for item response theory (IRT) models. Zimmer et al. (2022) <doi:10.1007/s11336-022-09883-5>
powerNLSEM: Simulation-based power estimation for nonlinear and linear structural equation models (SEM), path analysis and regression analysis. Irmer et al. (2024a) <doi:10.31219/osf.io/pe5bj>, Irmer et al. (2024b) <doi:10.3758/s13428-024-02476-3>.
powRICLPM: Power analyses for the random intercept cross-lagged panel model (RI-CLPM) and the bivariate stable trait autoregressive trait state (STARTS) model. Strategy as proposed by Mulder (2023) <doi:10.1080/10705511.2022.2122467>
pwrRasch: Power simulation for testing the Rasch model based on a three-way analysis of variance design with mixed classification.
semPower: Power analyses for structural equation models (SEM).

Randomized trial focus

RCT: Assists in the process of designing and evaluating randomized controlled trials. Robust treatment assignment by strata/blocks that handles misfits; power calculations of the minimum detectable treatment effect or minimum populations; balance tables with t-tests of covariates; balance regression: (treatment ~ all x variables). Athey, Susan, and Guido W. Imbens (2017) <doi:10.48550/arXiv.1607.00698>.
RCT2: Methods for designing and analyzing two-stage randomized controlled trials using the methods developed by Imai et al. (2021) <doi:10.1080/01621459.2020.1775612> and (2022+) <doi:10.48550/arXiv.2011.07677>. Enables estimation of direct and spillover effects, hypothesis tests, and sample size calculation for two-stage randomized controlled trials.
rerandPower: Computes the power and sample size for completely randomized and rerandomized experiments with two groups.
SMARTbayesR: Permits determination of a set of optimal dynamic treatment regimes and sample size for a SMART design in the Bayesian setting with binary outcomes. Artman (2020) <doi:10.48550/arXiv.2008.02341>.
smartDesign: SMART trial design as described by He, McClish and Sabo (2021) <doi:10.1080/19466315.2021.1883472>.
SMARTp: Sample size calculation for SMART design to detect dynamic treatment regime effects based on change in clinical attachment level outcomes from a non-surgical chronic periodontitis treatment study. The clustered tooth (sub-unit) level outcomes are skewed, spatially-referenced, and non-randomly missing. Xu et al. (2019+) <doi:10.48550/arXiv.1902.09386>.
smartsizer: Tools for determining the necessary sample size in order to identify the optimal dynamic treatment regime in an arbitrary SMART design. Uses the multiple comparisons with the best (MCB) methodology to adjust for multiple comparisons. Artman (2018) <doi:10.1093/biostatistics/kxy064>.

Regression-specific packages

BetaPASS: Power and sample size for beta regression.
InteractionPoweR: Power analysis for regression models that test the interaction of two or three independent variables on a single dependent variable. Includes options for correlated interacting variables and specifying variable reliability. Two-way interactions can include continuous, binary, or ordinal variables. Power analyses can be done either analytically or via simulation. See Baranger et al. (2023) <doi:10.1177/25152459231187531>.
nRegression: Simulation-based calculations of sample size for linear and logistic regression.

Reliability/agreement

ICC.Sample.Size: Functions to calculate sample size or power for studies where the intraclass correlation coefficient (ICC) is the primary outcome, such as a reliability study. Calculations are based on the probability of achieving a prespecified width or lower limit of a confidence interval, following Zou (2012) <https://onlinelibrary.wiley.com/doi/10.1002/sim.5466>.
intrinsicKappa: Sample size planning based on intrinsic kappa value.
kappaSize: Sample size estimation in studies of interobserver/interrater agreement (reliability). Functions for both the power-based and confidence interval-based methods, with binary or multinomial outcomes and two through six raters.

Survey sampling

minsample1: Determine the minimum sample size required to attain the pre-fixed precision level by minimizing the difference between the sample mean and population mean.
minsample2: Determine the minimum sample size required so that the mean square error of the sample mean and the population mean of a distribution becomes less than some pre-determined epsilon.
PracTools: Functions and datasets to support Valliant, Dever, and Kreuter (2018), <doi:10.1007/978-3-319-93632-1>, “Practical Tools for Designing and Weighting Survey Samples”. Contains functions for sample size calculation for survey samples using stratified or clustered one-, two-, and three-stage sample designs, and single-stage audit sample designs. Functions are included that will group geographic units accounting for distances apart and measures of size. Other functions compute variance components for multistage designs and sample sizes in two-phase designs.
RDSsamplesize: Sample size estimation and power calculation in respondent-driven sampling.
samplingbook: Survey sampling procedures from the book ‘Stichproben - Methoden und praktische Umsetzung mit R’ by Goeran Kauermann and Helmut Kuechenhoff (2010). Includes several sample size calculation functions for population surveys.
samplesize4surveys: Required sample size for estimation of totals, means and proportions under complex sampling designs.
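
As a point of reference for the precision-based calculations that several of these survey packages perform, here is a minimal base R sketch of the textbook sample size for estimating a proportion under simple random sampling, with an optional finite population correction. The function name `n_for_proportion` and its arguments are hypothetical and are not taken from any package listed here; complex designs (stratified, clustered, multistage) need the dedicated packages above.

```r
# Generic textbook formula, not the API of any package on this page.
# n0 = z^2 * p * (1 - p) / moe^2, then a finite population correction
# n = n0 / (1 + (n0 - 1) / N) when the population size N is finite.
n_for_proportion <- function(p = 0.5, moe = 0.05, conf_level = 0.95, N = Inf) {
  z  <- qnorm(1 - (1 - conf_level) / 2)   # two-sided normal quantile
  n0 <- z^2 * p * (1 - p) / moe^2          # infinite-population sample size
  n  <- if (is.finite(N)) n0 / (1 + (n0 - 1) / N) else n0
  ceiling(n)                               # round up to whole respondents
}

n_for_proportion()           # 385: p = 0.5, margin of error 0.05, 95% confidence
n_for_proportion(N = 2000)   # 323: same target in a population of 2000
```

Using p = 0.5 is the conservative choice because it maximizes p(1 - p), and therefore the required sample size.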

Survival (time to event) endpoints

CP: Functions for calculating the conditional power for different models in survival time analysis within randomized clinical trials with two different treatments to be compared and survival as an endpoint.
DelayedEffect.Design: Sample size and power calculation using the piecewise weighted log-rank test to incorporate a delayed effect into the study design. The methods are described in Xu, Zhen, Park & Zhu (2017) <doi:10.1002/sim.7157>.
lrstat: Power and sample size calculation for non-proportional hazards model using Fleming-Harrington family of weighted log-rank tests. The sequentially calculated log-rank test score statistics are assumed to have independent increments as in Tsiatis (1982) <doi:10.1080/01621459.1982.10477898>. Mean and variance of log-rank test score statistics are calculated based on Lu (2021) <doi:10.1002/pst.2069>. The boundary crossing probabilities are calculated using the recursive integration algorithm described in Jennison and Turnbull (2000, ISBN:0849303168). The package can also be used for continuous, binary, and count data. For binary data, it can design Simon’s 2-stage, modified toxicity probability-2 (mTPI-2), and Bayesian optimal interval (BOIN) trials. For count data, it can design group sequential trials for negative binomial endpoints with censoring. Also facilitates group sequential equivalence trials for all supported data types.
NPHMC: Sample size calculation for proportional hazards mixture cure model.
nphPower: Sample size calculation under nonproportional hazards. Performs combination tests and sample size calculation for fixed design with survival endpoints using combination tests under either proportional or non-proportional hazards. The combination tests include maximum weighted log-rank test and projection test. The sample size calculation also applies to various cure models. Trial simulation function is also provided to facilitate the empirical power calculation.
npsurvSS: Sample size and power calculation for common non-parametric tests in survival analysis. The companion R package to the paper by Yung and Liu (2020) <doi:10.1111/biom.13196>.
PDXpower: Time to event outcome in experimental designs of pre-clinical studies. Conduct simulation-based customized power calculation for clustered time to event data in a mixed crossed/nested design, where a number of cell lines and a number of mice within each cell line are considered to achieve a desired statistical power, motivated by Eckel-Passow et al. (2021) <doi:10.1093/neuonc/noab137> and Li et al. (2024) <doi:10.48550/arXiv.2404.08927>. Provides two commonly used models for powering a design, linear mixed effects and Cox frailty model.
powerCompRisk: Power analysis tool for jointly testing the cause-1 cause-specific hazard and the any-cause hazard with competing risks data.
PWEALL: Functions for design and monitoring survival trials accounting for complex situations such as delayed treatment effect, treatment crossover, non-uniform accrual, and different censoring distributions between groups. The event time distribution is assumed to be piecewise exponential distribution and the entry time is assumed to be piecewise uniform distribution.
SSRMST: Power and sample size based on the difference in restricted mean survival time.
survSNP: Power and sample size calculations for single-nucleotide polymorphism (SNP) association studies with right censored time to event outcomes.
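
For orientation, the simplest calculation in this area is the required number of events for a two-arm log-rank test under proportional hazards, using Schoenfeld's approximation. The base R sketch below is a generic illustration under that assumption; the function name `schoenfeld_events` and its arguments are hypothetical and do not belong to any package listed here. The packages above cover the settings where this approximation does not apply (non-proportional hazards, cure models, delayed effects, competing risks).

```r
# Schoenfeld approximation for the total number of events:
# d = (z_{1 - alpha/2} + z_{power})^2 / (p1 * p2 * log(HR)^2),
# where p1 and p2 = 1 - p1 are the allocation proportions.
schoenfeld_events <- function(hr, alpha = 0.05, power = 0.80, alloc = 0.5) {
  z_alpha <- qnorm(1 - alpha / 2)   # two-sided type I error quantile
  z_beta  <- qnorm(power)           # quantile for the target power
  ceiling((z_alpha + z_beta)^2 / (alloc * (1 - alloc) * log(hr)^2))
}

schoenfeld_events(hr = 0.7)   # 247 events for 80% power at two-sided alpha = 0.05
```

This returns events, not participants; converting to a sample size requires assumptions about accrual, follow-up, and censoring.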

Additional topics

BUCSS: Bias- and Uncertainty-Corrected Sample Size. Implements a method of correcting for publication bias and uncertainty when planning sample sizes in a future study from an original study. See Anderson, Kelley, & Maxwell (2017; Psychological Science, 28, 1547-1562).
CAISEr: Functions for performing experimental comparisons of algorithms using adequate sample sizes for power and accuracy. Implements the methodology presented in Campelo and Takahashi (2019) <doi:10.1007/s10732-018-9396-7> and Campelo and Wanner (Submitted, 2019) <doi:10.48550/arXiv.1908.01720>.
cbcTools: Design and evaluate choice-based conjoint survey experiments. Generate a variety of survey designs, including random, full factorial, orthogonal, D-optimal and Bayesian D-efficient designs as well as designs with “no choice” options and “labeled” designs. Conduct a power analysis for a given survey design.
ecopower: Estimates power by simulation for multivariate abundance data. Multivariate equivalence testing by simulation from a Gaussian copula model. Functions for parameterising multivariate effect sizes and simulating multivariate abundance data jointly. The discrete Gaussian copula approach is described in Popovic et al. (2018) <doi:10.1016/j.jmva.2017.12.002>.
Exact: Power calculations for 2×2 contingency tables.
exact2x2: Calculates conditional exact tests (Fisher’s exact test, Blaker’s exact test, or exact McNemar’s test) and unconditional exact tests and provides power and sample size calculations.
GenBinomApp: Clopper-Pearson Confidence Interval and Generalized Binomial Distribution. Includes a function to compute required sample size with respect to the extended upper Clopper-Pearson limit for a failure model, where countermeasures are introduced.
gridsampler: Simulation tool to facilitate determination of required sample size to achieve category saturation for studies using multiple repertory grids in conjunction with content analysis.
Hmisc: Contains a wide variety of functions, mostly for data analysis. Power and sample size functions for two-sample binomial test, ordinal responses, and some time-to-event study designs.
MESS: Miscellaneous functions including a few that compute power. The power functions include binomial exact test, McNemar test, t tests, and two-sample proportions tests.
MIDN: Nearly exact sample size calculation for exact powerful nonrandomized tests for differences between binomial proportions.
MKpower: Power and sample size calculation for Welch and Hsu t-tests, Wilcoxon rank sum and signed rank tests via Monte-Carlo simulations, evaluation of a diagnostic test as well as for a single proportion, comparing two negative binomial rates, ANCOVA, reference ranges, multiple primary endpoints and AUC.
mlpwr: Power analysis tools for finding cost-efficient study designs. Implements a surrogate modeling algorithm to guide simulation-based sample size planning. Method described in Zimmer & Debelak (2023) <doi:10.1037/met0000611>. Supports multiple study design parameters and optimization with respect to a cost function. Can find optimal designs that correspond to a desired statistical power or that fulfill a cost constraint. Tutorial paper: Zimmer et al. (2023) <doi:10.3758/s13428-023-02269-0>.
mpower: Framework for power analysis using Monte Carlo simulation for settings in which considerations of the correlations between predictors are important. Users can set up a data generative model that preserves dependence structures among predictors given existing data (continuous, binary, or ordinal). Package includes several statistical models common in environmental mixtures studies. See Nguyen et al. (2022) <doi:10.48550/arXiv.2209.08036>.
MRMCsamplesize: Sample size for planning multi-reader multi-case (MRMC) studies. Calculate sample sizes where the endpoint of interest in the study is ROC AUC (Area-Under-the-Receiver-Operating-Characteristics-Curve) or sensitivity. Supports studies expected to have a clustering effect. Can also estimate sample sizes for standalone studies where sensitivity or AUC are the primary endpoints. Based on methods described in Zhou et al. (2011) <doi:10.1002/9780470906514> and Obuchowski (2000) <doi:10.1097/EDE.0b013e3181a663cc>.
MRTSampleSize: Sample size for micro-randomized trials in mHealth. In a micro-randomized trial, treatments are sequentially randomized throughout the conduct of the study, with the result that each participant may be randomized at the hundreds or thousands of occasions at which a treatment might be provided. Liao et al. (2016) <doi:10.1002/sim.6847>.
MRTSampleSizeBinary: Sample size calculator for micro-randomized trials with binary outcomes based on Cohn et al. (2023) <doi:10.1002/sim.9748>.
msamp: Estimates sample size needed to detect microbial contamination in a lot with a user-specified detection probability and sensitivity. Various patterns of microbial contamination are accounted for: homogeneous (Poisson), heterogeneous (Poisson-Gamma) or localized (zero-inflated Poisson). Jongenburger et al. (2010) <doi:10.1016/j.foodcont.2012.02.004>. Simon (1963) <doi:10.1017/S0515036100001975> “The Negative Binomial and Poisson Distributions Compared”.
neuroUp: Plan sample size for task functional MRI using Bayesian updating. Method described in Klapwijk, Jongerling, Hoijtink and Crone (2024) <doi:10.31234/osf.io/cz32t>.
PASSED: Power and sample size calculations for test of two-sample means or ratios with data following beta, gamma (Chang et al. (2011), <doi:10.1007/s00180-010-0209-1>), normal, Poisson (Gu et al. (2008), <doi:10.1002/bimj.200710403>), binomial, geometric, and negative binomial (Zhu and Lakkis (2014), <doi:10.1002/sim.5947>) distributions.
ph2mult: Phase II clinical trial design for multinomial endpoints. Provides multinomial design methods under the intersection-union test (IUT) and union-intersection test (UIT) schemes for Phase II trials. Design types include: minimax (minimize the maximum sample size), optimal (minimize the expected sample size), admissible (minimize the Bayesian risk) and maxpower (maximize the exact power level).
powerly: Sample size for psychological networks and more. Implements the sample size computation method for network models proposed by Constantin et al. (2021) <doi:10.31234/osf.io/j5v7u>.
powerMediation: Power and sample size for testing mediation effects; slope in a simple linear regression; odds ratio in a simple logistic regression; mean change for longitudinal study with 2 time points; interaction effect in 2-way ANOVA; and slope in a simple Poisson regression.
powerPLS: Power and sample size for partial least squares-based methods described in Andreella et al. (2024) <doi:10.48550/arXiv.2403.10289>.
pwr: Power analysis functions based on Cohen (1988).
qcapower: Power for qualitative comparative analysis (QCA).
rdpower: Tools to perform power, sample size and MDE calculations in regression discontinuity designs. Cattaneo, Titiunik and Vazquez-Bare (2019) <https://rdpackages.github.io/references/Cattaneo-Titiunik-VazquezBare_2019_Stata.pdf>.
samplesize: Sample size for Student’s t-test and for the Wilcoxon-Mann-Whitney test for categorical data.
samplesizeCMH: Power and sample size for Cochran-Mantel-Haenszel tests. Also several helper functions for working with probability, odds, relative risk, and odds ratio values.
SampleSizeDiagnostics: Calculates sample size needed for evaluating a diagnostic test based on sensitivity, specificity, prevalence and desired precision. Based on Buderer (1996) <doi:10.1111/j.1553-2712.1996.tb03538.x>.
samplesizeestimator: Calculates sample size for various scenarios, such as estimating or testing population proportion, population mean, unpaired or independent means, two paired means, case control studies, estimating or testing odds ratio or relative risk, correlation coefficient.
skewsamp: Estimate sample sizes for comparing the location of data from two groups or categories when the distribution of the data is skewed. Offers a non-parametric method for a Wilcoxon Mann-Whitney test of location shift as well as methods for several generalized linear models, for instance, gamma regression.
skpr: Generates and evaluates D, I, A, Alias, E, T, and G optimal designs. Supports generation and evaluation of blocked and split/split-split/…/N-split plot designs. Includes parametric and Monte Carlo power evaluation functions, and supports calculating power for censored responses. Includes a Shiny graphical user interface. Morgan-Wall et al. (2021) <doi:10.18637/jss.v099.i01>.
sparrpowR: Power to detect clusters using kernel-based spatial relative risk functions that are estimated using the “sparr” package.
ssanv: Sample Size Adjusted for Nonadherence or Variability of Input Parameters. Functions to calculate sample size for two-sample difference in means tests. Does adjustments for either nonadherence or variability that comes from using data to estimate parameters.
sse: Functions to facilitate sensitivity analyses, providing plots for a range of parameters.
ssev: Sample Size Computation for Fixed N with Optimal Reward. Computes the optimal sample size for various 2-group designs when the aim is to maximize the rewards over the full decision procedure of running a trial (with the computed sample size), and subsequently administering the winning treatment to the remaining N-n units in the population.
statmod: Has a power function for a Fisher exact test. Includes a wide assortment of other functions not related to power/sample size.
ThermalSampleR: Simulations to aid in determining appropriate sample sizes when performing critical thermal limits studies (e.g. CTmin/CTmax experiments).
ThreeArmedTrials: Available from the CRAN archive.
wmwpow: Power for the two-sample Wilcoxon-Mann-Whitney rank-sum test for a continuous outcome.
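
Several packages on this page estimate power by Monte Carlo simulation rather than by a closed-form formula. The general idea can be sketched in base R as follows: simulate many datasets under the assumed effect, run the planned test on each, and take the rejection rate as the power estimate. The function `sim_power_t` and its arguments are hypothetical and only illustrate the approach for a two-sample t-test; they are not the interface of any package listed here.

```r
# Simulation-based power for a two-sample t-test with equal variances.
# Assumes normal outcomes with a common standard deviation `sd` and a
# true mean difference `delta` between the two groups.
sim_power_t <- function(n_per_group, delta, sd = 1, alpha = 0.05,
                        n_sim = 2000, seed = 1) {
  set.seed(seed)  # fixed seed so the estimate is reproducible
  rejections <- replicate(n_sim, {
    x <- rnorm(n_per_group, mean = 0, sd = sd)
    y <- rnorm(n_per_group, mean = delta, sd = sd)
    t.test(x, y, var.equal = TRUE)$p.value < alpha
  })
  mean(rejections)  # proportion of simulated trials that reject H0
}

est <- sim_power_t(n_per_group = 64, delta = 0.5)

# Analytic check with stats::power.t.test (two-sample, two-sided by default)
analytic <- power.t.test(n = 64, delta = 0.5, sd = 1, sig.level = 0.05)$power

c(simulated = est, analytic = analytic)
```

The two values should agree up to Monte Carlo error, which shrinks as `n_sim` grows. Simulation becomes the practical option when no analytic formula exists for the planned design or analysis.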

Did we omit a package?