key: cord-0745098-qs9t9ado
authors: Davies, Gareth; Mazess, Richard; Benskin, Linda L.
title: Letter to the editor in response to the article: “Vitamin D concentrations and COVID-19 infection in UK Biobank” (Hastie et al)
date: 2021-02-09
journal: Diabetes Metab Syndr
DOI: 10.1016/j.dsx.2021.02.016
sha: 65e0e1d479158be9d71bfbe6658200d5b02d6b94
doc_id: 745098
cord_uid: qs9t9ado

UK Biobank analyses concluded that neither COVID-19 risk nor the higher rates in ethnic minorities were explained by vitamin D. [1-10] Hastie et al. dismissed previous critiques, responding that their analyses were "…as powerful as any to date". [8] However, the reported statistical significance and high precision are illusory; these papers used unreliable data and contained grave errors, mislabelled data, flawed models, low power and high bias.

Only 449 COVID-19 test-positive cases were available, containing just 31 Black and 19 Asian individuals, plus 1,025 test negatives. [1] The COVID-19-negative (control) set was artificially inflated by adding all 347,124 untested individuals. [1] At that time, only those hospitalised (~8.2% of cases) were tested. [10, 11] Therefore, the "COVID-19-negative" control set likely contained nine times as many positives as the "test-positive" set, including pre-hospitalisation cases, some in care homes, and milder cases. [11] Moreover, because COVID-19 risk is zero in the absence of SARS-CoV-2 exposure, the vast majority of the control set data was meaningless noise. [12]

This data inflation led to serious errors: overfitted, over-adjusted, and unnecessarily adjusted models. [13, 14] Too many model variables in logistic regressions introduce bias, obscure effects and reduce precision. [15, 16] Estimation efficiency deteriorates with each added covariate, reducing statistical significance, which can lead to important associations being declared insignificant. [17] Controlling for more variables does not necessarily reduce confounding; in fact, adding variables amplifies bias faster than it reduces confounding. [17] Selection criteria based on a priori theoretical or biological relationships should have been used to construct the models judiciously. [18]

These mistakes were compounded by using data on vitamin D levels and confounder variables (including self-reported subjective indexes) measured 10-14 years earlier. [1, 9, 10] The authors claimed vitamin D levels remain stable over time, appearing to confuse the correlation coefficient, R, with explained variance, R². [1] Indeed, the studies they cite demonstrate that levels are not stable over many years, particularly among 25(OH)D-deficient individuals, [1, 5, 19, 20] nor are blood pressure, pulse, and body mass index. [19] Biobank data explain only approximately 16% of the variance in 2020 vitamin D values. [19] Categorising continuous variables is inadvisable in regressions, even for precise measures; categorising unreliable data amplifies errors by up to ten times. [21] A much larger sample size could increase power, [22] but inestimably large and insurmountable biases would remain. [23, 24]

The reported conclusions were unjustified. The effective data set comprised 1,474 individuals, not 348,598; misused statistical methods led to misleading results; and the UK Biobank data are too old to be appropriate for investigating this subject. A more detailed critique is available: "Serious Statistical Flaws in Hastie, et al., Vitamin D concentrations and COVID-19 infection in UK Biobank Analysis v2.0". [25]

Funding & Declarations: The authors received no funding for this work. The authors have no conflicts of interest to disclose.
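As a quick arithmetic check on two figures quoted in the letter, the following minimal Python sketch recomputes the sample sizes and the correlation implied by the stated explained variance. The counts and the 16% figure come from the letter itself; the variable names are illustrative assumptions, and the conversion from R² to R assumes a simple linear association between baseline and 2020 values.

```python
import math

# Counts quoted in the letter (attributed there to reference [1]).
test_positive = 449
test_negative = 1_025
untested = 347_124

# Individuals with an actual COVID-19 test result.
tested_total = test_positive + test_negative
# Size of the analysis set once all untested individuals are added as controls.
analysed_total = tested_total + untested

# Share of the analysed set that had any test result at all.
tested_share = tested_total / analysed_total

# The letter states that baseline data explain about 16% of the variance (R^2)
# in 2020 vitamin D values. Assumption: simple linear association, so R = sqrt(R^2).
r_squared = 0.16
implied_r = math.sqrt(r_squared)

print(f"tested: {tested_total}, analysed: {analysed_total}")
print(f"share of analysed set with a test result: {tested_share:.2%}")
print(f"correlation implied by R^2 = {r_squared}: {implied_r:.2f}")
```

Under these assumptions the tested individuals make up well under 1% of the analysed set, and a correlation of 0.4 corresponds to only 16% of variance explained, which is the R versus R² distinction the letter draws.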
Vitamin D concentrations and COVID-19 infection in UK Biobank
Vitamin D concentrations and COVID-19 infection in UK Biobank
Letter in response to the article: Vitamin D concentrations and COVID-19 infection in UK biobank
Response to 'Vitamin D concentrations and COVID-19 infection in UK Biobank'
A Basic Review of the Preliminary Evidence That COVID-19 Risk and Severity Is Increased in Vitamin D Deficiency
Vitamin D status as a predictor of Covid-19 risk in Black, Asian and other ethnic minority groups in the UK
Prognostic implications of vitamin D in patients with COVID-19
Reply to: Prognostic implications of vitamin D in patients with COVID-19
Vitamin D and COVID-19 infection and mortality in UK Biobank
Greater risk of severe COVID-19 in non-White ethnicities is not explained by cardiometabolic, socioeconomic, or behavioural factors, or by 25(OH)-vitamin D status: study of 1,326 cases from the UK Biobank
LANCET: Comprehensive COVID-19 Hospitalization and Death Rate Estimates. Today's Practitioner 2020
Statistical Foundations for Model-Based Adjustments
Some Surprising Results about Covariate Adjustment in Logistic Regression Models
Overadjustment Bias and Unnecessary Adjustment in Epidemiologic Studies
Invited Commentary: Understanding Bias Amplification
Principles of confounder selection
Tracking of Serum 25-Hydroxyvitamin D Levels During 14 Years in a Population-based Study and During 12 Months in an Intervention Study
Intraindividual Variation in Plasma 25-Hydroxyvitamin D Measures 5 Years Apart among Postmenopausal Women
A method to automate probabilistic sensitivity analyses of misclassified binary variables
Implications of Measurement Error in Exposure for the Sample Sizes of Case-Control Studies
Random measurement error and regression dilution bias
Reflection on modern methods: five myths about measurement error in epidemiological research
Serious Statistical Flaws in Biobank Analyses