key: cord-0812546-2qqaisix
authors: Tarantola, Arnaud; Gautier, Laurent
title: Biostatistics to better detect fishy findings
date: 2020-06-29
journal: Lancet Infect Dis
DOI: 10.1016/s1473-3099(20)30557-0
sha: 553f6cd64013a79b58d9f2b52b48949200eebda4
doc_id: 812546
cord_uid: 2qqaisix

We commend Srinivas Mantha 1 for the much-needed clarification of the differences between risks, ratios, and rates, and of the latter's underlying notion of time. There is, however, an additional and important difference. The main scientific basis for epidemiology is biostatistics, 2 which applies rigorous mathematical laws of probability and statistics to the fascinating but unpredictable diversity of living organisms. This is done by accepting some measure of uncertainty. If the sample in which we document data is large enough and representative of the population from which the sample is selected, then we can be confident, at a usually chosen 5% risk of being wrong, that the measure in the population is close to that found in the sample and situated within a range of values called the confidence interval (CI). The CI is a fundamental statistical tool for estimating values and comparing them between groups. Upper and lower bounds of the CI of a risk or ratio computed using a normal or a binomial distribution are equally distant from the estimated value. Unlike risks and ratios, however, rates are usually very small numbers: their numerator can vary but their denominator is usually much larger, especially when composed of a number of people exposed multiplied by a number of days, weeks, or months of exposure. 3 CIs for rates, especially for rates of repeatable events, are computed using a Poisson distribution and can be substantially skewed towards the upper bound.
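The asymmetry can be illustrated with a minimal Python sketch. It is not drawn from the cited references: the figures (5 events over 10 000 person-days) are purely illustrative assumptions, and the helper names `poisson_cdf`, `solve_lambda`, `exact_poisson_ci`, and `normal_ci` are hypothetical. The exact interval is obtained by numerically inverting the Poisson cumulative distribution, one common way of computing such a CI, and is compared with the symmetric normal approximation.

```python
import math
from statistics import NormalDist


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam); adequate for small counts."""
    if k < 0:
        return 0.0
    term = math.exp(-lam)  # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= lam / i    # P(X = i) derived from P(X = i - 1)
        total += term
    return min(total, 1.0)


def solve_lambda(k, p):
    """Bisection for the mean lam at which poisson_cdf(k, lam) equals p.

    The CDF decreases as lam grows, so the root lies in [0, hi].
    """
    lo, hi = 0.0, k + 10.0
    while poisson_cdf(k, hi) > p:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if poisson_cdf(k, mid) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def exact_poisson_ci(events, person_time, level=0.95):
    """Exact (CDF-inversion) CI for an incidence rate events / person_time."""
    alpha = 1.0 - level
    lower = 0.0 if events == 0 else solve_lambda(events - 1, 1.0 - alpha / 2.0)
    upper = solve_lambda(events, alpha / 2.0)
    return lower / person_time, upper / person_time


def normal_ci(events, person_time, level=0.95):
    """Symmetric normal-approximation CI for the same rate."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    half_width = z * math.sqrt(events)
    return (events - half_width) / person_time, (events + half_width) / person_time


# Illustrative (assumed) data: 5 events over 10 000 person-days.
events, person_days = 5, 10_000
print("rate             :", events / person_days)
print("exact Poisson CI :", exact_poisson_ci(events, person_days))
print("normal approx CI :", normal_ci(events, person_days))
```

With these assumed figures, the exact interval runs from roughly 1.6 to 11.7 events per 10 000 person-days, whereas the normal approximation gives roughly 0.6 to 9.4: the approximation is symmetric around the estimate of 5 and stops short of the exact upper bound.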
This skew has important consequences: when calculating incidence rates of COVID-19 endpoints to compare them between different populations or groups (especially repeatable events such as hospital admissions or repeat clusters over a time period), computing their CIs using a normal instead of a Poisson distribution would wrongly cut them short on the right. This might result in a statistically significant difference between groups' incidence rates when there would not be any under a Poisson distribution. This also has consequences when estimating the sample size needed to achieve desired power before comparing incidence rates between samples. 4 The emergence and rapid global expansion of COVID-19 within weeks and implementation of lockdowns worldwide have made epidemiology a household word. 5 We enthusiastically welcome increased awareness among clinicians, researchers, and indeed the general public of the importance of epidemiology and biostatistics. As we progress from computing percentages in observational studies to comparing rates and CIs within or among groups, clinicians and researchers must be aware that, unlike risks or ratios, incidence rates follow a Poisson distribution.

We declare no competing interests.

On the convergence of epidemiology, biostatistics, and data science
A 6-months descriptive study of dog bites in rural Cambodia
Power computations for designing comparative Poisson trials
Someone had a worse response to coronavirus than Boris