Preprint has been published in a journal as an article
DOI of the published article: 10.7717/peerj.10314
Preprint / Version 1

Does One Effect Size Fit All? The Case Against Default Effect Sizes for Sport and Exercise Science

Authors

  • Aaron Caldwell
  • Andrew Vigotsky

DOI:

https://doi.org/10.31236/osf.io/tfx95

Keywords:

applied statistics, exercise, kinesiology, physiology, sport, statistics

Abstract

Recent discussions in the sport and exercise science community have focused on the appropriate use and reporting of effect sizes. Sport and exercise scientists often analyze repeated-measures data, from which mean differences are reported. To aid the interpretation of these data, standardized mean differences (SMDs) are commonly reported as a description of effect size. In this manuscript, we hope to alleviate some of the confusion surrounding their use. First, we provide a philosophical framework for conceptualizing SMDs by dichotomizing them into two groups: magnitude-based and signal-to-noise-based SMDs. Second, we describe the statistical properties of SMDs and their implications. Finally, we provide high-level recommendations for how sport and exercise scientists can thoughtfully report raw effect sizes, SMDs, or other effect sizes for their own studies. This conceptual framework provides sport and exercise scientists with the background necessary to make and justify their choice of an SMD. The code to reproduce all analyses and figures within the manuscript can be found at the following link: https://www.doi.org/10.17605/OSF.IO/FC5XW.
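
The distinction between the two groups of SMDs can be illustrated with a minimal sketch. It is not the authors' analysis code (that is available at the OSF link above). The pre/post scores and the function names are hypothetical, and pairing "magnitude-based" with the pre-test SD and "signal-to-noise" with the change-score SD is an assumption about how the dichotomy applies to paired data.

```python
from statistics import mean, stdev

# Hypothetical pre/post scores for six participants (illustrative only).
pre = [100.0, 110.0, 95.0, 120.0, 105.0, 115.0]
post = [104.0, 115.0, 101.0, 123.0, 111.0, 118.0]


def smd_pre_sd(pre_scores, post_scores):
    """Mean change divided by the sample SD of the pre-test scores.

    Assumed here to be an example of a magnitude-based SMD: the change
    is expressed relative to between-participant spread at baseline.
    """
    change = [b - a for a, b in zip(pre_scores, post_scores)]
    return mean(change) / stdev(pre_scores)


def smd_change_sd(pre_scores, post_scores):
    """Mean change divided by the sample SD of the change scores.

    Assumed here to be an example of a signal-to-noise SMD: the change
    is expressed relative to how variable the change itself is.
    """
    change = [b - a for a, b in zip(pre_scores, post_scores)]
    return mean(change) / stdev(change)


magnitude_based = smd_pre_sd(pre, post)
signal_to_noise = smd_change_sd(pre, post)

print(f"SMD using pre-test SD:     {magnitude_based:.2f}")
print(f"SMD using change-score SD: {signal_to_noise:.2f}")
```

With these made-up data the same raw mean change yields two very different standardized values, because participants differ considerably at baseline but respond fairly consistently. This is one reason the choice of standardizer needs to be stated and justified.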

Posted

2021-10-29