key: cord-0994262-0474ggrt
authors: Savaris, R. F.; Pumi, G.; Dalzochio, J.; Kunst, R.
title: Stay-at-home policy is a case of exception fallacy: an internet-based ecological study
date: 2021-03-05
journal: Sci Rep
DOI: 10.1038/s41598-021-84092-1
sha: 4125d57d6ec0d4286b92856c59a35c603fafc79b
doc_id: 994262
cord_uid: 0474ggrt

A recent mathematical model has suggested that staying at home did not play a dominant role in reducing COVID-19 transmission. The second wave of cases in Europe, in regions that were considered as COVID-19 controlled, may raise some concerns. Our objective was to assess the association between staying at home (%) and the reduction/increase in the number of deaths due to COVID-19 in several regions in the world. In this ecological study, data from www.google.com/covid19/mobility/, ourworldindata.org and covid.saude.gov.br were combined. Countries with > 100 deaths and with a Healthcare Access and Quality Index of ≥ 67 were included. Data were preprocessed and analyzed using the difference between number of deaths/million between 2 regions and the difference between the percentage of staying at home. The analysis was performed using linear regression with special attention to residual analysis. After preprocessing the data, 87 regions around the world were included, yielding 3741 pairwise comparisons for linear regression analysis. Only 63 (1.6%) comparisons were significant. With our results, we were not able to explain if COVID-19 mortality is reduced by staying at home in ~ 98% of the comparisons after epidemiological weeks 9 to 34.

suppl). After performing the residual analysis, by testing for cointegration between response and covariate, normality of the residuals, presence of residual autocorrelation, homoscedasticity, and functional specification, only 63 (1.6%) of models passed all tests (Table S2 -suppl). Closer inspection of several cases where the model did not pass all the tests revealed a common factor: the presence of outliers, mostly due to differences in the epidemiological week in which deaths started to be reported. A heat map showing the comparison between the 87 regions is presented in Fig. 2.

We were not able to explain the variation of deaths/million in different regions in the world by social isolation, herein analyzed as differences in staying at home, compared to baseline. In the restrictive and global comparisons, only 3% and 1.6% of the comparisons were significantly different, respectively. These findings are in accordance with those found by Klein et al. 46 . These authors explain why lockdown was the least probable cause for Sweden's high death rate from COVID-19 46 www.nature.com/scientificreports/ week onwards in 2020) we included regions and countries with a "plateau" and a downslope phase in their epidemiological curves. Our findings are in accordance with the dataset of daily confirmed COVID-19 deaths/million in the UK. Pubs, restaurants, and barbershops were open in Ireland on June 29th and masks were not mandatory 48 ; after more than 2 months, no spike was observed; indeed, death rates kept falling 49 . Peru has been considered to have the strictest lockdown in the world 30 ; nevertheless, by September 20th, it had the highest number of deaths/million 50 . Of note, differences were also observed between regions that were considered to be COVID-19 controlled, e.g., Sweden versus Macedonia. Possible explanations for these significant differences may be related to the magnitude of deaths in these countries. After October 2020, when our study was published in a preprint server for Health Sciences, new articles were published with similar results 51-54 .

Table 2. Comparisons using the 4-point criteria. Comparability was considered if at least 3 out of 4 of the following conditions were similar: a) population density, b) percentage of the urban population, c) Human Development Index and d) total area of the region. Similarity was considered adequate when a variation in conditions a), b) and c) was within 30%, while, for condition d), a variation of 50% was considered adequate (Further details are in Auxiliary Supplementary Material-4 point criteria). *Linear regression.

Our results are different from those published by Flaxman et al. The authors used a very complex calculation to estimate that NPIs would prevent 3.1 million deaths across 11 European countries 44 . The discrepant results can be explained by different approaches to the data. While Flaxman et al. assumed a constant reproduction number (R t ) to calculate the total number of deaths, which eventually did not occur, we calculated the difference between the actual number of deaths between 2 countries/regions. The projections published by Flaxman et al. 44 have been disputed by other authors. Kuhbandner and Homburg described the circular logic that this study involved. Flaxman et al. estimated the R t from daily deaths associated with SARS-CoV-2 using an a priori restriction that R t may only change on those dates when interventions become effective. However, in the case of a finite population, the effective reproduction number falls automatically and necessarily over time since the number of infections would otherwise diverge 55 . A recent preprint report from Chin et al. 56 explored the two models proposed by the Imperial College 44 by expanding the scope to 14 European countries from the 11 countries studied in the original paper. They added a third model that considered banning public events as the only covariate. 
The authors concluded that the claimed benefits of lockdown appear grossly exaggerated since inferences drawn from effects of NPIs are non-robust and highly sensitive to model specification 56 . The same explanation for the discrepancy can be applied to other publications where mathematical models were created to predict outcomes 14-18 . Most of these studies dealt with COVID-19 cases 33, 34 and not observed deaths. Despite their limitations, reported deaths are likely to be more reliable than new case data. Further explanations for different results in the literature, besides methodological aspects, could be justified by the complexity of the virus dynamics, by its interaction with the environment, or they may be related to a seasonal pattern that was, by coincidence, established at the same time when infection rates started to decrease due to seasonal dynamics 57 . It is unwise to try to explain a complex and multifactorial condition, with its inherent constant changes, using a single variable. An initial approach would employ a linear regression to verify the influence of one factor over an outcome. Herein we were not able to identify this association. Our study was not designed to explain why the stay-at-home measures do not contain the spread of the virus SARS-CoV-2. However, possible explanations that need further analysis may involve genetic factors 58 , the increment of viral load, and transmission in households and in close quarters where ventilation is reduced.

This study has a few limitations. Unlike the established paradigm of the randomized clinical trial, this is an ecological study. An ecological study observes findings at the population level and generates hypotheses 59 . Population-level studies play an essential part in defining the most important public health problems to be tackled 59 , which is the case here. Another limitation was the use of Google Community Mobility Reports as a surrogate marker for staying at home. This may underestimate the real value: for instance, if a user's cell phone is switched off while at home, the observation will be absent from the database. Furthermore, the sample does not represent 100% of the population. This tool, nevertheless, has been used by other authors to demonstrate the efficacy in reducing the number of new cases after NPI 60, 61 . Using different methodologies for measuring mobility may introduce bias and would prevent comparisons between different countries. The number of deaths may be another issue. Death figures may be underestimated; however, reported deaths may be more relevant than new case data. The arbitrary criteria used for including countries and regions, the restrictive comparisons, and our definition of an area as COVID-19 controlled are open for criticism. Nonetheless, these arbitrary criteria were established a priori, before the selection of the countries. With these criteria, we expected to obtain representative regions of the world, compare similar regions, and obtain accurate data. By using a HAQI of ≥ 67, we assumed that data from these countries would be accurate and reliable, and that health conditions were generally good. Nevertheless, the global analysis of the regions (n = 3741 comparisons) overcame any issue of the restrictive comparison. Indeed, the global comparison confirmed the results found in the restrictive one; only 1.6% of the death rates could be explained by staying at home. 
Also, our effective sample size in all studies is only 25 epidemiological weeks, which is a very small sample size for a time series regression. The small sample size and the non-stationary nature of COVID-19 data are challenges for statistical models, but our analysis, with 25 epidemiological weeks, covers a longer period than previous publications, which used only 7 weeks 62 . A short interval of observation between the introduction of an NPI and the observed effect on death rates yields no sound conclusion, and is a case where the follow-up period is not long enough to capture the outcome, as seen in previous publications 44, 45 . The effects of small samples in this case are related to possible large type II errors and also affect the consistency of the ordinary least squares estimates. Nevertheless, given the importance of social isolation promoted by world authorities 63 , we expected a higher incidence of significant comparisons, even though it could be an ecological fallacy. The low number of significant associations between regions for mortality rate and the percentage of staying at home may be a case of exception fallacy, which is a generalization of individual characteristics applied at the group-level characteristics 64 .

There are strengths to highlight. Inclusion criteria and the Healthcare Access and Quality Index were incorporated. We obtained representative regions throughout the world, including major cities from 4 different continents. Special attention was given to compiling and analyzing the dataset. We also devised a tailored approach to deal with challenges presented by the data. To our knowledge, our modeling approach is unique in pooling information from multiple countries all at once using up-to-date data. Some criteria, such as population density, percentage of urban population, HDI, and HAQI, were established to compare similar regions. Finally, we gave special attention to the residual analysis in the linear regression, an absolutely essential aspect of studies using small samples.

In conclusion, using this methodology and current data, in ~ 98% of the comparisons using 87 different regions of the world we found no evidence that the number of deaths/million is reduced by staying at home. Regional differences in treatment methods and the natural course of the virus may also be major factors in this pandemic, and further studies are necessary to better understand it.

Rationale and approach for analyzing the time series data. The proposed approach was tailored to present a way to evaluate the influence of time spent at home and the number of deaths between two countries/ regions while avoiding common problems of other models presented in the literature. We focused on detecting the variation of the differences between the number of deaths and how much people followed stay-at-home orders in two regions in each epidemiological week.

For instance, let us consider two similar regions we shall call 'Stay In county' and 'Go Out county' . Both regions started with the same number of cases. After the first 1000 cases were recorded, Stay In county declared that all people should stay at home, while Go Out county allowed people to circulate freely. After a few epidemiological weeks, we examine the data collected on the number of deaths in both counties and how much time people stayed at home by using geolocation software. If the difference between the number of deaths in Stay In county and Go Out county (variable A) is affected by the difference of the percentage of time people stayed at home in these two areas (variable B), then we can consider that the difference in the number of deaths by COVID-19 is influenced by the difference in the percentage of time people stayed at home. Both effects can be detected using linear regression and careful examination of the problem.

Time series on COVID-19 mortality (deaths/million) display a non-stationary pattern. The daily data present a very distinct seasonal behavior on the weekends, with valleys on Saturdays and Sundays followed by peaks on Mondays (Figure S1). To account for seasonality, one may introduce dummy variables for Saturdays, Sundays, and Mondays, regress the number of deaths on these dummy variables, and then analyze the residuals. However, in most cases, the residuals are still non-stationary, and special treatment would be required in each case. Although this approach may be feasible for a few series, we are interested in analyzing hundreds of time series from different countries and regions. Hence, we need a more efficient way to deal with this amount of data. The covariates present another issue in regressing the daily time series of deaths on staying at home. The covariates are typically correlated with error terms due to public policies adopted by regions/countries. Mechanisms controlling social isolation are intrinsically related to the number of deaths/cases in each location. An increase in the death rate may cause more stringent policies to be adopted, which increases the percentage of people staying at home. This change causes an imbalance between the observed number of deaths and staying at home levels. In a regression model, this discrepancy is accounted for in the error term. Hence, the error term will change in accordance with staying at home levels.
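The dummy-variable adjustment mentioned above can be illustrated with a minimal sketch. It relies on a standard property of ordinary least squares: when the regressors are an intercept plus dummies for disjoint groups, the fitted value of each observation is simply the mean of its group, so the residuals are deviations from group means. The function name and the daily series below are hypothetical and are not taken from the study's scripts.

```python
from datetime import date, timedelta
from statistics import mean


def remove_weekday_effect(dates, deaths):
    """Residuals of an OLS regression of deaths on an intercept plus
    Saturday, Sunday and Monday dummies.

    The dummies mark disjoint groups, so each fitted value equals the
    mean of the group the observation belongs to.
    """
    def group(d):
        wd = d.weekday()  # Monday=0 ... Saturday=5, Sunday=6
        return wd if wd in (0, 5, 6) else "other"

    groups = {}
    for d, y in zip(dates, deaths):
        groups.setdefault(group(d), []).append(y)
    fitted = {g: mean(values) for g, values in groups.items()}
    return [y - fitted[group(d)] for d, y in zip(dates, deaths)]


# Hypothetical daily series: a flat level of 10 deaths with valleys on
# Saturdays and Sundays and peaks on Mondays.
start = date(2020, 3, 2)  # a Monday
dates = [start + timedelta(days=i) for i in range(28)]
pattern = {0: 14, 5: 6, 6: 5}
deaths = [pattern.get(d.weekday(), 10) for d in dates]
residuals = remove_weekday_effect(dates, deaths)
```

In this artificial series the weekday pattern is the only source of variation, so the residuals vanish. In real data any remaining trend stays in the residuals, which is the non-stationarity problem the text describes.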

Data aggregation by epidemiological week is a plausible alternative (Figure S2). In this way, the artificial seasonality imposed by work schedules during weekends and the effect of governmental control over social interaction are mitigated in a regression framework. The drawback is that the sample size is significantly reduced from 187 days (Figure S1) to 26 epidemiological weeks (Figure S2).
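A minimal sketch of the weekly aggregation follows, using the rule stated elsewhere in the Methods (sum of deaths/million and mean of staying at home per week). As an assumption made only for this illustration, ISO calendar weeks (Monday to Sunday) stand in for epidemiological weeks; the definition used in the study's script may differ. The function name and the data are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean


def aggregate_by_week(dates, deaths_per_million, stay_home_pct):
    """Return {(iso_year, iso_week): (sum of deaths, mean stay-at-home %)}."""
    buckets = defaultdict(lambda: ([], []))
    for d, y, x in zip(dates, deaths_per_million, stay_home_pct):
        iso = d.isocalendar()
        key = (iso[0], iso[1])  # ISO year and ISO week number
        buckets[key][0].append(y)
        buckets[key][1].append(x)
    return {k: (sum(ys), mean(xs)) for k, (ys, xs) in sorted(buckets.items())}


# Hypothetical data: two full weeks starting on Monday 2020-03-02.
start = date(2020, 3, 2)
dates = [start + timedelta(days=i) for i in range(14)]
deaths = [1.0] * 7 + [2.0] * 7
stay_home = [10.0] * 7 + [20.0] * 7
weekly = aggregate_by_week(dates, deaths, stay_home)
```

Each week collapses to a single pair of values, which removes the within-week pattern at the cost of a much shorter series.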

Aggregation by epidemiological week, however, still yields non-stationary time series in most cases. To overcome this problem, we differenced each time series. Recall that if $Z_t$ denotes the number of deaths in the t-th epidemiological week, we define the first difference of $Z_t$ as

$$\Delta Z_t = Z_t - Z_{t-1}.$$

Intuitively, $\Delta Z_t$ denotes the variation of deaths between weeks t and t-1, also known as the flux of deaths. The same is valid for the staying at home time series. This simple operation yielded, in most cases, stationary time series, verified with the Phillips-Perron test 65 . In the few cases where the resulting time series did not reject the null hypothesis of non-stationarity (technically, the existence of a unit root in the time series characteristic polynomial), this was due to the presence of one or two outliers combined with the small sample size. These outliers were usually related to the very low incidence of COVID-19 deaths by the 9th epidemiological week when paired with countries with a significant number of deaths in that same week, thus resulting in an outlier which cannot be accounted for by linear regression.
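The first difference is a one-line operation. The sketch below uses a hypothetical weekly series and a hypothetical function name; it only shows the arithmetic and does not run any stationarity test.

```python
def first_difference(series):
    """Return the list of Z_t - Z_{t-1}; the result is one element shorter."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


# Hypothetical weekly deaths/million with an upward trend.
weekly_deaths = [0.0, 1.5, 4.0, 9.0, 12.0, 13.0]
flux = first_difference(weekly_deaths)
```

Differencing costs one observation, so a series of 26 weekly values yields 25 differences, which would be consistent with the effective sample size of 25 epidemiological weeks mentioned in the limitations.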

To investigate pairwise behavior, we propose a method to assess the relationship between deaths and staying at home data between various countries and regions. For two countries/regions, say A and B, let $Y^A_t$ and $Y^B_t$ denote the number of deaths per million at epidemiological week t for A and B, respectively, while $X^A_t$ and $X^B_t$ denote the staying at home at epidemiological week t for A and B, respectively. The idea is to regress the difference $\Delta Y^A_t - \Delta Y^B_t$ on the difference $\Delta X^A_t - \Delta X^B_t$, where $\Delta$ denotes the first difference between weeks t and t-1.

Formally, we perform the regression

$$\Delta Y^A_t - \Delta Y^B_t = \beta_0 + \beta_1 \left( \Delta X^A_t - \Delta X^B_t \right) + \varepsilon_t,$$

where $\beta_0$ and $\beta_1$ are unknown coefficients and $\varepsilon_t$ denotes an error term. Estimation of $\beta_0$ and $\beta_1$ is carried out through ordinary least squares. The interpretation of the model is important. We are regressing the difference in the variation of deaths between locations A and B on the difference in the variation of staying at home values between the same locations. If the number of deaths in locations A and B have a similar functional behavior over time, then $Y^A_t - Y^B_t$ tends to be near-constant, and $\Delta Y^A_t - \Delta Y^B_t$ tends to oscillate around zero. If the same applies to $X^A_t - X^B_t$, then we expect $\beta_1 \neq 0$; consequently, we conclude that the behavior, between A and B, is similar and the number of deaths and the percentage of staying at home are associated in these regions. The other non-spurious situation implying $\beta_1 \neq 0$ occurs when the variation in the number of deaths in locations A and B increases/decreases over time following a certain pattern, while the variation in the percentage of "staying at home" values also increases/decreases following the same pattern (apart from the direction). In this situation, the epidemiological patterns differ: the variation in the number of deaths and the variation in the staying at home values in locations A and B are on opposite trends. However, if these patterns are similar (proportional), this is captured in the difference and, as a consequence, in the regression. This means that the different trends are near proportional and, hence, the variation in staying at home is associated with the variation in deaths.
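The pairwise regression can be sketched in a few lines of plain Python, assuming the model regresses the difference of the differenced death series on the difference of the differenced staying-at-home series, as defined in the variables section. The study itself used statsmodels and R; the closed-form simple OLS below is only a stand-in for illustration, and all function names and numbers are hypothetical.

```python
def first_difference(series):
    """Z_t - Z_{t-1} for consecutive weeks."""
    return [b - a for a, b in zip(series, series[1:])]


def ols_simple(x, y):
    """Ordinary least squares for y = b0 + b1 * x.

    Returns (b0, b1, residuals) using the closed-form estimates.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    return b0, b1, residuals


def pairwise_regression(deaths_a, deaths_b, home_a, home_b):
    """Regress (dY_A - dY_B) on (dX_A - dX_B) for regions A and B."""
    dy = [a - b for a, b in
          zip(first_difference(deaths_a), first_difference(deaths_b))]
    dx = [a - b for a, b in
          zip(first_difference(home_a), first_difference(home_b))]
    return ols_simple(dx, dy)


# Hypothetical weekly series for two regions (5 weeks each).
deaths_a = [20.0, 18.5, 15.0, 9.5, 2.0]
deaths_b = [0.0, 0.0, 0.0, 0.0, 0.0]
home_a = [0.0, 1.0, 3.0, 6.0, 10.0]
home_b = [0.0, 0.0, 0.0, 0.0, 0.0]
b0, b1, resid = pairwise_regression(deaths_a, deaths_b, home_a, home_b)
```

The residuals returned here are the quantity on which the stationarity and diagnostic tests described in the statistical analysis would be run; those tests are not reproduced in this sketch.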

In the section below "Definition of areas with and without controlled cases of COVID-19", each country/region was classified into a binary class: either controlled or not controlled areas for COVID-19. The proposed method allows for insights regarding the association of the number of deaths and staying at home levels between countries/regions with similar/different degrees of COVID-19 control. Assumptions related to consistency, efficiency, and asymptotic normality of the ordinary least squares, in the context of time series regression, can be found in 66 . Since we are comparing many time series, to avoid any problem with spurious regression, we performed a cointegration test between the response and covariates. In this context, this is equivalent to testing the stationarity of ε t , which was done by performing the Phillips-Perron test. Residual analysis is of utmost importance in linear regression, especially in the context of small samples. The steps and tests performed in the residual analysis are described in the statistical analysis section.

Study design. This is an ecological study using data available on the Internet.

Setting-data collection on mobility. Community Mobility Reports 31 provided data on mobility from 138 countries 67, 68 and regions between February 15th and August 21st, 2020. Data on the average time spent at home were expressed in comparison to the baseline. Baseline was considered to be the median value between January 3rd and February 6th, 2020. Data obtained between February 15th and August 21st, 2020 were divided into epidemiological weeks (epi-weeks) and the mean percentage of time spent staying at home per week was obtained.

Inclusion criteria for analysis. Only regions with mobility data and with more than 100 deaths, by August 26th, 2020, were included in this study. This criterion was chosen since the majority of epidemiological studies start when 100 cases are reached 69, 70 . For data quality, only countries with a Healthcare Access and Quality Index (HAQI) of ≥ 67 71 were included. The HAQI has been divided into 10 subgroups. The median class is 63.4-69.7. The average in this median class is 66.55 (rounding up to 67). By choosing a HAQI of ≥ 67, we assumed that data from these countries were reliable and healthcare was of high quality. For Brazilian regions, the HAQI was replaced by the Human Development Index (HDI), and those with an HDI < 0.549 (low) were excluded.
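The inclusion rule can be written as a small predicate. This is a sketch under stated assumptions: the `Region` record, its field names, and the example regions are hypothetical, and the fallback to the HDI whenever no HAQI is given is a simplification of the rule stated for Brazilian regions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Region:
    name: str
    deaths: int
    has_mobility_data: bool
    haqi: Optional[float] = None  # Healthcare Access and Quality Index
    hdi: Optional[float] = None   # used in place of the HAQI (Brazilian regions)


def is_included(region: Region) -> bool:
    """Apply the stated thresholds: > 100 deaths, HAQI >= 67 or HDI >= 0.549."""
    if not region.has_mobility_data or region.deaths <= 100:
        return False
    if region.haqi is not None:
        return region.haqi >= 67
    # No HAQI available: fall back to the HDI cut-off (assumption of this sketch).
    return region.hdi is not None and region.hdi >= 0.549


# Hypothetical regions, not taken from the study's dataset.
candidates = [
    Region("Country X", deaths=5000, has_mobility_data=True, haqi=80.0),
    Region("Country Y", deaths=90, has_mobility_data=True, haqi=85.0),
    Region("Country Z", deaths=3000, has_mobility_data=True, haqi=60.0),
    Region("Brazilian state W", deaths=800, has_mobility_data=True, hdi=0.75),
]
included = [r.name for r in candidates if is_included(r)]
```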

Three major cities with > 100 deaths and well-established results (Tokyo, Japan; Berlin, Germany, and New York, USA) were selected as controlled areas.

Dataset of COVID-19 cases and associated data to reduce bias. After inclusion of the countries/regions, further data were obtained to reduce comparison bias, including population density (people/km²), percentage of the urban population, HDI, and the total area of the region in square kilometers. All data were obtained from open databases 72-74 .

Definition of areas with and without controlled cases of COVID-19. Regions were classified as controlled for cases of COVID-19 if they presented at least 2 out of the 3 following conditions: a) type of transmission classified as "clusters of cases", b) a downward curve of newly reported deaths in the last 7 days, and c) a flat curve in the cumulative total number of deaths in the last 7 days (variation of 5%) according to the World Health Organization 75 . An example is shown in Figure S3. Data from the cities (Tokyo, Berlin, New York, Fortaleza, Belo Horizonte, Manaus, Rio de Janeiro, São Paulo, and Porto Alegre) were obtained from official government sites 76-79 . Tokyo, Berlin and New York were chosen for having controlled the COVID-19 dissemination, for representing 3 different continents, and for similarity to major Brazilian cities (Fortaleza, Belo Horizonte, Manaus, Rio de Janeiro, São Paulo, and Porto Alegre).
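The 2-out-of-3 rule for classifying a region as controlled can be sketched as follows. The text does not say how "downward" and "flat" were operationalized, so two assumptions are made here and should be read as such: a curve is downward when the last value in the 7-day window is lower than the first, and flat when the cumulative total grows by at most 5% over the window. The function name and the inputs are hypothetical.

```python
def is_controlled(transmission_type, new_deaths_7d, cumulative_deaths_7d):
    """Return True when at least 2 of the 3 stated conditions hold."""
    # a) transmission classified as "clusters of cases"
    clusters = transmission_type.strip().lower() == "clusters of cases"
    # b) downward curve of newly reported deaths (assumed: last < first)
    downward = new_deaths_7d[-1] < new_deaths_7d[0]
    # c) flat cumulative curve (assumed: relative growth of at most 5%)
    first, last = cumulative_deaths_7d[0], cumulative_deaths_7d[-1]
    flat = first > 0 and (last - first) / first <= 0.05
    return sum([clusters, downward, flat]) >= 2


# Hypothetical 7-day windows.
example_controlled = is_controlled(
    "Clusters of cases",
    [5, 4, 4, 3, 2, 2, 1],
    [1000, 1004, 1008, 1011, 1013, 1015, 1016],
)
example_not_controlled = is_controlled(
    "Community transmission",
    [5, 6, 7, 7, 8, 9, 9],
    [1000, 1030, 1060, 1100, 1140, 1170, 1200],
)
```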

Merged database. Different databases from the sites mentioned above were merged using Microsoft Excel Power Query (Microsoft Office 2010 for Windows Version 14.0.7232.5000) and manually inspected for consistency.

Processing the data-cleaning. Data collected from multiple regions were processed using Python 3.7.3 in the Jupyter Notebook 80 environment through the use of the Python Data Analysis Library in Google Colab Research 81 . Details of preprocessing are described in the Python script (Supplement). Briefly, after taking the sum of deaths/million per epi-week, and the average of the variable "staying at home" per epi-week, non-stationary patterns were mitigated by subtracting the value of week t-1 from the value of week t.

Time series data setup and variables. Details regarding the pre-processing and methodology were presented in the "Rationale and approach for analyzing the time series data" section. Our variables were the difference in the variation of deaths between locations A and B (dependent variable-outcome), and the difference in the variation of staying at home values between the same locations (independent variable).

Comparison between areas. Direct comparison, between regions with and without controlled COVID-19 cases, was considered in two scenarios: 1) Restrictive: at least 3 out of 4 of the following conditions were similar: a) population density, b) percentage of the urban population, c) HDI and d) total area of the region. Similarity was considered adequate when a variation in conditions a), b), and c) was within 30%, while, for condition d), a variation of 50% was considered adequate. 2) Global: all regions and countries were compared to each other.
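The two comparison scenarios can be sketched as below. How the "variation" between two regions was computed is not spelled out in the text, so this sketch assumes the absolute difference relative to the larger of the two values; the dictionary keys and the example regions are hypothetical. The global scenario is simply every unordered pair of regions.

```python
from itertools import combinations

# Tolerances stated in the text: 30% for a), b), c) and 50% for d).
LIMITS = {"density": 0.30, "urban_pct": 0.30, "hdi": 0.30, "area_km2": 0.50}


def relative_variation(a, b):
    """Assumed definition: absolute difference relative to the larger value.

    Both values are expected to be positive.
    """
    return abs(a - b) / max(a, b)


def comparable(region_a, region_b):
    """Restrictive scenario: at least 3 of the 4 conditions are similar."""
    similar = sum(
        relative_variation(region_a[key], region_b[key]) <= limit
        for key, limit in LIMITS.items()
    )
    return similar >= 3


# Hypothetical regions.
x = {"density": 100, "urban_pct": 80, "hdi": 0.90, "area_km2": 1000}
y = {"density": 120, "urban_pct": 85, "hdi": 0.85, "area_km2": 5000}
z = {"density": 1000, "urban_pct": 30, "hdi": 0.88, "area_km2": 900}
xy_comparable = comparable(x, y)
xz_comparable = comparable(x, z)

# Global scenario: every unordered pair among the 87 included regions.
n_global_pairs = sum(1 for _ in combinations(range(87), 2))
```

The pair count for 87 regions reproduces the 3741 comparisons reported for the global analysis.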

The restrictive comparison used parameters related to how closely people may have come into physical contact. The major route of transmission for COVID-19 is from person-to-person via respiratory droplets and direct personal and physical contact within a community setting 82,83 .

Statistical analysis. After data preprocessing, the association between the number of deaths and staying at home was verified using a linear regression approach. Data were analyzed using the Python module statsmodels.api v0.12.0 (statsmodels.regression.linear_model.OLS; statsmodels.org), and double-checked using R version 3.6.1 84 . The False Discovery Rate procedure proposed by Benjamini and Hochberg (FDR-BH) was used to adjust for multiple testing 85 .
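For readers unfamiliar with false discovery rate control, the standard Benjamini-Hochberg step-up procedure is sketched below in plain Python. This is the textbook procedure, not necessarily the exact variant or implementation used by the authors, and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the null hypothesis is rejected.

    Step-up rule: find the largest rank k such that the k-th smallest
    p-value is <= (k / m) * alpha, then reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values from eight pairwise regressions.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
flags = benjamini_hochberg(p)
```

In statsmodels, the same adjustment is available through `statsmodels.stats.multitest.multipletests` with `method="fdr_bh"`; whether the authors called that function is not stated in the text.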

We checked the residuals for heteroskedasticity using White's test 86 ; for the presence of autocorrelation using the Lagrange Multiplier test 87 ; for normality using the Shapiro-Wilk normality test 88 ; and for functional specification using Ramsey's RESET test 89 . All tests were performed with a 5% significance level and the analysis was performed with R version 3.6.1 84 .

Data from 30 restrictive comparisons were manually inspected and checked a third time using Microsoft Excel (Microsoft). A heat map was designed using GraphPad Prism version 8.4.3 for Mac (GraphPad Software, San Diego, California, USA). Graphs plotting the number of deaths/million and staying at home over epidemiological weeks were obtained from Google Sheets 90 . 

COVID-19 Virus Pandemic -Worldometer

The war against the coronavirus disease (COVID-2019): keys to successfully defending Taiwan

Masks and thermometers: paramount measures to stop the rapid spread of SARS-CoV-2 in the United States

Policy decisions and use of information technology to fight COVID-19 Taiwan

The three steps needed to end the COVID-19 pandemic: bold public health leadership, rapid innovations, and courageous political will

WHO Director-General's opening remarks at the media briefing on COVID-19 -13

Coronavirus disease (COVID-19): Herd immunity, lockdowns and COVID-19

Covid-19|Un juez de Lleida avala ahora las medidas de confinamiento en Segrià

Governor Cuomo Signs the 'New York State on PAUSE' Executive Order. Governor Andrew M

Ministry of Housing, Communities & Local Government. Government advice on home moving during the coronavirus (COVID-19) outbreak

A Movement to Stop the COVID-19 Pandemic | #StayTheFuckHome

Modeling containing covid-19 infection. A conceptual model

Mathematical modelling to assess the impact of lockdown on COVID-19 transmission in India: model development and validation

Only strict quarantine measures can curb the coronavirus disease (COVID-19) outbreak in Italy

Quarantine alone or in combination with other public health measures to control COVID-19: a rapid review

Modeling the trend of coronavirus disease 2019 and restoration of operational capability of metropolitan medical service in China: a machine learning and mathematical model-based analysis

Mobility restrictions for the control of epidemics: when do they work?

COVID-19 CovidSim microsimulation model

Epidemiological characteristics and forecast of COVID-19 outbreak in the Republic of Kazakhstan

Modeling future spread of infections via mobile geolocation data and population dynamics. An application to COVID-19 in Brazil

Social distancing measures to control the COVID-19 pandemic: potential impacts and challenges in Brazil

Is the lockdown important to prevent the COVID-9 pandemic? Effects on psychology, environment and economy-perspective

The country with the world's strictest lockdown is now the worst for excess deaths

Google COVID-19 Community Mobility Reports

Association between mobility patterns and COVID-19 transmission in the USA: a mathematical modelling study

County level analysis to determine If social distancing slowed the spread of COVID-19

Spatiotemporal characteristics of COVID-19 epidemic in the United States

Association of mobile phone location data indications of travel and stay-at-home mandates with COVID-19 infection rates in the US

Evolution and epidemic spread of SARS-CoV-2 in Brazil

Physical distancing interventions and incidence of coronavirus disease 2019: natural experiment in 149 countries

Interrupted time series regression for the evaluation of public health interventions: a tutorial

Stationary and non-stationary time series

Stay-at-home orders, African American population, poverty and state-level Covid-19 infections: are there associations? Public and Global Health

Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus

Non pharmaceutical interventions for optimal control of COVID-19

After less than 2 months, the simulations that drove the world to strict lockdown appear to be wrong, the same of the policies they generated

Estimating the effects of non-pharmaceutical interventions on COVID-19 in Europe

Inferring change points in the spread of COVID-19 reveals the effectiveness of interventions

16 Possible factors for Sweden's High COVID death rate among the nordics

A country level analysis measuring the impact of government actions, country preparedness and socioeconomic factors on COVID-19 mortality and related health outcomes

Government confirms that it is safe to proceed to Phase 3 of the Roadmap for Reopening Business and Society

Daily confirmed COVID-19 deaths per million, rolling 7-day average

Coronavirus Update (Live): 31,036,957 Cases and 962,339 Deaths from COVID-19 Virus Pandemic -Worldometer

Covid-19 mortality: a matter of vulnerability among nations facing limited margins of adaptation

Association of country-wide coronavirus mortality with demographics, testing, lockdowns, and public wearing of masks

A phenomenological approach to assessing the effectiveness of COVID-19 related nonpharmaceutical interventions in Germany

Lockdown Effects on Sars-CoV-2 Transmission -The evidence from Northern

Commentary: estimating the effects of non-pharmaceutical interventions on COVID-19 in

Effects of non-pharmaceutical interventions on COVID-19: a tale of three models

Global seasonality of human coronaviruses: a systematic review

Insights to SARS-CoV-2 life cycle, pathophysiology, and rationalized treatments that target COVID-19 clinical complications

The ecological fallacy strikes back

No place like home: cross-national data analysis of the efficacy of social distancing during the COVID-19 pandemic

The effect of social distance measures on COVID-19 epidemics in Europe: an interrupted time series analysis

Impact of complete lockdown on total infection and death rates: a hierarchical cluster analysis

COVID-19 advice -Physical distancing

The A-Z of social research: a dictionary of key social science research concepts

Trends and random walks in macroeconomic time series

Epidemiological characteristics of the first 100 cases of coronavirus disease 2019 (COVID-19) in Hong Kong Special Administrative Region, China, a city with a stringent containment policy

& Taiwan COVID-19 outbreak investigation team. Epidemiology of the first 100 cases of COVID-19 in Taiwan and its implications on outbreak control

Healthcare access and quality index based on mortality from causes amenable to personal health care in 195 countries and territories, 1990-2015: a novel analysis from the Global Burden of Disease Study

COVID-19) Dashboard

htm#:~:text=With%20a%20population%20density%20of

COVID-19:Data. nychealth/coronavirus-data

Planning-Population-Census

Building Machine Learning and Deep Learning Models on Google Cloud Platform

ns-for-infection-prevention-precautions#:~:text=Current%20evidence%20suggests%20that%20transmission

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and coronavirus disease-2019 (COVID-19): The epidemic and the challenges

The R Project for Statistical Computing. The R Foundation

On the adaptive control of the false discovery rate in multiple testing with independent statistics

A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity

The lagrange multiplier test for autocorrelation in the presence of linear restrictions

An analysis of variance test for normality (complete samples)

Tests for specification errors in classical linear least-squares regression analysis

conceived and formalized the statistical model applied, wrote the original draft and interpreted the data. R.K. participated in the conception of the study, implemented computer code in Python, validated results, provided computer resources, maintained research data for initial use, critically reviewed the initial draft

The Python and R scripts are available at https://gist.github.com/rsavaris66/eccfc6caf4c9578d676c134fac74d3fe. Auxiliary Supplementary Material data is available at this link (https://docs.google.com/spreadsheets/d/1itCPJLWCXORYDTxBY0M21VJf7PEyS4B0K00lOoNpqrA/edit?usp=sharing).

We are grateful to Dr. Jair Ferreira, from the Epidemiology Department of the Universidade Federal do Rio Grande do Sul, for his critical feedback.

R.F.S. was responsible for the conception of the study, designed the methodology, tested code components in Python and R, verified reproducibility, performed the formal analysis and data collection, provided other analysis tools, was responsible for data curation, wrote the initial draft, interpreted the data, reviewed the manuscript, created the data presentation, and oversaw and coordinated the execution of the project. G.P. conceived the project, designed

The authors declare no competing interests.

Supplementary Information The online version contains supplementary material available at https://doi.org/10.1038/s41598-021-84092-1. Correspondence and requests for materials should be addressed to R.F.S. Reprints and permissions information is available at www.nature.com/reprints. Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.