key: cord-1018083-qobq1kyf authors: Caldirola, Daniela; Daccò, Silvia; Cuniberti, Francesco; Grassi, Massimiliano; Alciati, Alessandra; Torti, Tatiana; Perna, Giampaolo title: First-onset major depression during the COVID-19 pandemic: A predictive machine learning model date: 2022-04-27 journal: J Affect Disord DOI: 10.1016/j.jad.2022.04.145 sha: e8d8a274779c9d5c3e3c2a1adaab5cdda4f82f1a doc_id: 1018083 cord_uid: qobq1kyf BACKGROUND: This study longitudinally evaluated first-onset major depression rates during the pandemic in Italian adults without any current clinician-diagnosed psychiatric disorder and created a predictive machine learning model (MLM) to evaluate subsequent independent samples. METHODS: An online, self-reported survey was released during two pandemic periods (May to June and September to October 2020). Provisional diagnoses of major depressive disorder (PMDD) were determined using a diagnostic algorithm based on the DSM criteria of the Patient Health Questionnaire-9 to maximize specificity. Gradient-boosted decision trees and the SHapley Additive exPlanations technique created the MLM and estimated each variable's predictive contribution. RESULTS: There were 3532 participants in the study. The final sample included 633 participants in the first wave (FW) survey and 290 in the second (SW). First-onset PMDD was found in 7.4% of FW participants and 7.2% of the SW. The final MLM, trained on the FW, displayed a sensitivity of 76.5% and a specificity of 77.8% when tested on the SW. The main factors identified in the MLM were low resilience, being an undergraduate student, being stressed by pandemic-related conditions, and low satisfaction with usual sleep before the pandemic and support from relatives. Current smoking and taking medication for medical conditions also contributed, albeit to a lesser extent. LIMITATIONS: Small sample size; self-report assessment; data covering 2020 only. 
CONCLUSIONS: Rates of first-onset PMDD among Italians during the first phases of the pandemic were considerable. Our MLM displayed a good predictive performance, suggesting potential goals for depression-preventive interventions during public health crises.

differ greatly across countries, likely due to differences in psychometric tools, phase of the pandemic, and cultural or social context. However, they were higher than pre-pandemic rates (Adu et al., 2021; Cénat et al., 2021; Ettman et al., 2020; Hao et al., 2020; Lakhan et al., 2020; Liu et al., 2020a, 2020b; Luo et al., 2020; Salari et al., 2020; Solomou and Constantinidou, 2020; Wu et al., 2021). A recent estimate of the global prevalence and burden of major depressive disorder (MDD) in 204 countries and territories found an increase of 27.6% in 2020 due to the COVID-19 pandemic (Santomauro et al., 2021). Most surveys use self-report depressive symptom scales that estimate only probable cases of major depression, so it is difficult to distinguish a true major depressive episode from a transient physiological response to the unexpected global crisis. It is nevertheless conceivable that, during the pandemic, some vulnerable individuals developed depression that is worthy of clinical attention. It is well known that exposure to environmental and psychosocial stressors or traumatic events is associated with various mental health consequences, including first-onset depression or worsening of pre-existing depression (Ettman et al., 2020; Gilman et al., 2013; Goldmann and Galea, 2014). As MDD is a leading cause of disease burden and disability worldwide (GBD 2019 Mental Disorders, 2022), it is important to identify the predictors that have placed a portion of the general population at higher risk of developing depression due to the COVID-19 pandemic. Modifiable risk factors may be a prime target for public depression prevention programs.
Early mental health care interventions may be suitable for certain at-risk groups (Meng et al., 2017) during both the ongoing pandemic and future public health emergencies. Several possible predictors of self-reported depression in the general population during the pandemic have been recognized, including sociodemographic factors, such as being female; employment status, such as job loss; pandemic-related stressful experiences; low social support; and personal characteristics, such as having fewer mechanisms for coping with stressors (Adu et al., 2021; Arpino and Pasqualini, 2021; Bruno et al., 2020; Liu et al., 2020b; Prout et al., 2020; Rossi et al., 2020; Solomou and Constantinidou, 2020; Vindegaard and Benros, 2020). However, most results came from traditional statistical analyses, which are poorly suited to identifying the most relevant predictors of depression in large sets of interrelated variables. A few studies predicted pandemic-related mental health via machine learning (ML) techniques (Eder et al., 2021; Flesia et al., 2020; Ge et al., 2020; Prout et al., 2020), which are especially suitable for identifying predictive models in extensive and complex data sets (Orrù et al., 2020; Perna et al., 2018; Wardenaar et al., 2021). Only one of this small group of studies included an assessment of depressive symptoms (Prout et al., 2020). This study analyzed data from two waves of an online survey that we disseminated among the general population of Italy in two periods of the first year of the pandemic. It longitudinally evaluated the rates of first-onset MDD in Italian adults without any current clinician-diagnosed psychiatric disorder (CPsyD) and created a predictive ML model of first-onset MDD for use in subsequent independent samples of Italians.
We were interested in including the general population not directly exposed to highly specific COVID-19-related risk factors for mental health, such as having contracted COVID-19 (Awan et al., 2021; De Berardis, 2020) or being health care workers (Awan et al., 2022; De Berardis et al., 2021) during the pandemic. For this reason, the survey was dedicated to people who were not health care workers, while people who contracted COVID-19 were excluded from this study. We employed a screening questionnaire based on the DSM diagnostic criteria to maximize the specificity of the identification of a major depressive episode and minimize the risk of classifying a physiological depressive response to an unexpected global crisis as pathological. However, as the depression screening was self-reported, we considered the diagnosis of MDD to be provisional (PMDD). To the best of our knowledge, no other study with these purposes has been published.

The detailed procedures used have been described elsewhere (Caldirola et al., 2022). Briefly, we disseminated an online self-report survey among the general Italian population in two pandemic periods, from May 18 to June 20, 2020 (first wave survey) and from September 15 to October 20, 2020 (second wave survey). These two waves were part of an ongoing longitudinal study approved by the Ethics Committee of Humanitas Research Hospital. The study was designed to monitor the mental health of the Italian general population for up to 2 years from the beginning of the pandemic through successive online surveys distributed approximately every three months. The survey was conducted through the SurveyMonkey platform, an online survey provider (http://www.surveymonkey.com), and was advertised and shared via social media (Facebook, Instagram, LinkedIn, and WhatsApp). People who were ≥18 years old and were not health care workers were invited to fill in the survey voluntarily and anonymously.
Before the survey began, participants were required to provide written informed consent. At the beginning of each survey, the participants were asked to enter a few letters and numbers in response to identical standardized hints across the surveys (e.g., "please enter the first two letters of your mother's name") to create a unique anonymous identifier. We used this identifier to track respondents longitudinally during the study. Moreover, at the beginning of the second wave survey, the participants were asked whether they had previously participated in the first wave. Each participant's collected data were saved and managed under the European regulations for privacy and protected health information. All relevant information is available in the SurveyMonkey Privacy Notice (www.surveymonkey.com/mp/legal/privacy/).

The study included 3532 participants who provided informed consent. In the analyses, we included only those who met the following criteria: having completed the entire survey; having declared never having had a clinician-diagnosed mental disorder, or having had one only in the past (in the latter case, we excluded those who declared having had major depression); having declared never having contracted COVID-19; and, for participants in the second wave, not having participated in the first. Our final sample included 633 participants in the first wave and 290 participants in the second wave. Figure 1 presents the participant selection. A previous study with different aims and inclusion/exclusion criteria included a different subsample of the entire group of 3532 participants. Specifically, in that study we included only participants with no psychiatric history; we estimated the rates of new-onset psychiatric disorders throughout the pandemic; and we developed an ML model predictive of at least one new-onset psychiatric disorder in subsequent independent samples (Caldirola et al., 2022).

The survey included two sections.
The first consisted of a series of ad hoc questions to collect participants' sociodemographic data and certain personal information, such as lifestyle, personal relationships, medical and psychiatric history, occupation, and usual disposition toward multiple aspects of daily life. The other section included several validated self-report screening questionnaires (an Italian-language version). Below we describe the two of these that we used to gather data for the aims of the present study, namely, the Depression Module of the Patient Health Questionnaire (PHQ-9) (Spitzer et al., 1999) and the six-question Brief Resilience Scale (BRS) (Pirro et al., 2020; Smith et al., 2008) . The complete list of the ad hoc questions and self-report questionnaires is available on request. The PHQ-9 is a screening tool to detect a current major depressive episode. This tool consists of nine questions that reference the last two weeks; the responses are given on a scale that ranges from "not at all" (scored as 0) to "nearly every day" (scored as 3). The DSM-IV criteria-based diagnostic algorithm identifies a current major depressive episode with a sensitivity and specificity (95% confidence interval [CI]) of 0.73 (0.59-0.87) and 0.98 (0.9-1.00), respectively (Spitzer et al., 1999) ; recent estimates from multiple studies place pooled sensitivity and specificity (95% CI) at 0.61 (0.54-0.68) and 0.95 (0.93-0.96), respectively (He et al., 2020) . Due to the self-report nature of the assessment, the PHQ-9-based diagnosis of MDD has to be considered as provisional (PMDD) and should be confirmed via direct clinical assessment (He et al., 2020; Spitzer et al., 1999) . The six-question BRS considers resilience to be "the ability to bounce back or recover from stress;" the responses are given on a scale that ranges from "strongly disagree" (scored as 1) to "strongly agree" (scored as 5). 
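The difference between the criteria-based PHQ-9 algorithm and a simple score cutoff can be illustrated with a minimal sketch. It assumes the rule as commonly described for the PHQ-9: at least five of the nine items present "more than half the days" (score ≥2), with item 9 counting if present at all, and at least one of the endorsed items being item 1 or item 2. The function names are hypothetical, and the exact implementation used in this study is not given in the text.

```python
from typing import Sequence


def phq9_algorithm_positive(items: Sequence[int]) -> bool:
    """Criteria-based rule (assumed form): provisional major depressive episode."""
    if len(items) != 9 or any(s not in (0, 1, 2, 3) for s in items):
        raise ValueError("expected nine item scores, each in 0..3")
    # Items 1-8 count when scored >= 2 ("more than half the days");
    # item 9 (thoughts of death or self-harm) counts if present at all.
    endorsed = [s >= 2 for s in items[:8]] + [items[8] >= 1]
    core = endorsed[0] or endorsed[1]  # item 1 or item 2 must be endorsed
    return core and sum(endorsed) >= 5


def phq9_cutoff_positive(items: Sequence[int], cutoff: int = 10) -> bool:
    """Score-based rule: total PHQ-9 score at or above the cutoff."""
    return sum(items) >= cutoff


# Total score 10, but only three items reach ">= 2" and neither core item does.
diffuse = [1, 1, 2, 2, 2, 1, 1, 0, 0]
# Total score 9, but five criteria are met, including both core items.
focused = [2, 2, 2, 2, 0, 0, 0, 0, 1]

print(phq9_cutoff_positive(diffuse), phq9_algorithm_positive(diffuse))
print(phq9_cutoff_positive(focused), phq9_algorithm_positive(focused))
```

The first response pattern is flagged by the cutoff only, which is the kind of case the criteria-based rule is meant to screen out when specificity is the priority.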
We compared the categorical and ordinal variables between the first and the second wave with the chi-square (χ2) or Fisher's exact test and the Mann-Whitney W test, respectively. In the χ2 test, we calculated the standardized adjusted residuals and corresponding p-values for each cell of the contingency tables to determine which cell differences contributed to the significance of the χ2 test results, and we applied Bonferroni's correction to the p-values of the standardized adjusted residuals. The statistical significance level was set at the conventional 0.05. We used the R programming language version 3.6.3 (R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2020) in the statistical analyses.

ML methodology is used to develop algorithms through training examples that are capable of producing the best possible prediction when used in new cases with an undefined outcome. In this study, we included as potential predictive variables all personal information related to factors preceding the pandemic or representing its direct consequences, such as pandemic-related individual experiences, namely the 46 variables reported in Tables 1 and 2 (see Results) and the Supplementary Material (Tables S1 and S2). Among the several available ML techniques, we chose gradient-boosted decision trees (GBDT). In this approach, several decision trees are trained consecutively, each to reduce the misclassification errors of the previous trees. The final prediction is based on a weighted sum of the predictions of all decision tree models, which can number in the hundreds (Friedman, 2001). This ML technique requires several hyper-parameters to be defined during the development of the algorithm. Each configuration of the hyper-parameters can lead to a different predictive performance of the algorithm through differential tuning of the training process.
To identify the optimal hyper-parameter configuration, we evaluated 40 random hyper-parameter configurations and then used a Bayesian optimization approach to propose 60 further configurations sequentially. Because the aim is to identify a hyper-parameter configuration that produces an algorithm with the best possible performance when applied to new cases not used during training, a stratified 10-fold cross-validation was applied. The training sample is divided into ten folds. Each fold in turn is held out, the algorithm is trained on the remaining cases, and the trained algorithm is then applied to the held-out cases. The hyper-parameter configuration that demonstrated the best average cross-validated area under the receiver operating characteristic curve (AUROC) was considered the best configuration and was used for the final training of the algorithm. An AUROC of 0.5 indicates effectively random predictions, and an AUROC of 1 indicates that every prediction is correct. Applying an a priori selection of the predictive variables to be included in an algorithm can be expected to improve its performance by including only relevant input variables and excluding irrelevant and redundant ones. To achieve this, we used the Minimum Redundancy Maximum Relevance (mRMR) technique, which ranks all available predictive variables in order of importance by simultaneously considering the association with the output variable (maximum relevance) and the association with other predictive variables (minimum redundancy). The hyper-parameter optimization and cross-validation procedure was performed 46 times. Each time, a subset of 1 to 46 variables (all variables), as ranked by the mRMR procedure, was used as predictors, and the best subset of the initial 46 predictive variables was defined based on the average cross-validated AUROC.
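The cross-validation scheme described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the authors' pipeline: `fit` and `predict` are hypothetical callables standing in for the GBDT training and scoring steps, and the AUROC is computed from its rank interpretation (the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case).

```python
import random
from typing import Callable, List, Sequence


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are needed to compute the AUROC")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(y_true: Sequence[int], k: int = 10, seed: int = 0) -> List[List[int]]:
    """Deal the shuffled indices of each class across k folds to keep class ratios similar."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for label in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def cross_validated_auroc(X: Sequence, y: Sequence[int], fit: Callable,
                          predict: Callable, k: int = 10, seed: int = 0) -> float:
    """Average AUROC over held-out folds; the model never sees the fold it is scored on."""
    aucs = []
    for fold in stratified_folds(y, k, seed):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        scores = predict(model, [X[i] for i in fold])
        aucs.append(auroc([y[i] for i in fold], scores))
    return sum(aucs) / len(aucs)
```

In a search over hyper-parameters or variable subsets, `cross_validated_auroc` would be evaluated once per candidate configuration, and the candidate with the highest average would be retained.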
A more detailed description of the ML methodology is reported in the Supplementary Material.

Training and testing protocol

We used observations from the first wave survey for training and cross-validation of the final algorithm. The observations from the second wave survey were used as an independent test set to investigate the predictive performance of the algorithm after its development was completed. The algorithm initially outputs a continuous prediction (range: 0-1; values closer to 1 show a higher predicted probability of having first-onset PMDD). A classification threshold is applied to obtain the final dichotomous prediction. Different threshold values result in different levels of sensitivity and specificity. We chose the threshold value that minimized the difference between the sensitivity and specificity of the cross-validated predictions in the training dataset. This value was then applied to obtain the final predictions in the test dataset. As with most ML techniques, the inherent complexity of the GBDT model does not allow a direct interpretation of how the algorithm estimates the output from the included features. Techniques to make ML algorithms more interpretable have been developed to overcome this limitation. In this study, we used the SHapley Additive exPlanations (SHAP) technique (Lundberg and Lee, 2017). A SHAP value is assigned to each variable for each prediction created by the algorithm. The larger the absolute SHAP value for a certain variable, the larger its contribution to determining that prediction in a specific case. In the current study, a positive SHAP value indicates a contribution toward an increased risk of first-onset PMDD, whereas a negative SHAP value indicates a contribution toward reduced risk. The average of the absolute SHAP values observed for all cases in a dataset can be used to identify each variable's overall importance for the algorithm.
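Two steps of this protocol lend themselves to a compact sketch: choosing the threshold that balances sensitivity and specificity, and summarizing SHAP values into an overall importance ranking. The code below is a minimal illustration in plain Python. The function names, the toy scores, and the variable names in the SHAP example are hypothetical, and the SHAP values are assumed to have been computed already by an explainer.

```python
from typing import List, Sequence, Tuple


def sens_spec(y_true: Sequence[int], scores: Sequence[float],
              threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity when cases scoring >= threshold are called positive."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes are needed")
    return tp / (tp + fn), tn / (tn + fp)


def balanced_threshold(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Observed score that minimizes |sensitivity - specificity| (lowest one on ties)."""
    def gap(t: float) -> float:
        sens, spec = sens_spec(y_true, scores, t)
        return abs(sens - spec)
    return min(sorted(set(scores)), key=gap)


def mean_abs_shap(shap_rows: Sequence[Sequence[float]],
                  names: Sequence[str]) -> List[Tuple[str, float]]:
    """Rank variables by the average absolute SHAP value across all cases."""
    n = len(shap_rows)
    importance = [sum(abs(row[j]) for row in shap_rows) / n for j in range(len(names))]
    return sorted(zip(names, importance), key=lambda pair: pair[1], reverse=True)


# Toy cross-validated predictions (hypothetical values).
y = [0, 0, 0, 1, 0, 1, 1]
scores = [0.05, 0.1, 0.2, 0.3, 0.4, 0.6, 0.9]
print(balanced_threshold(y, scores))

# Toy SHAP matrix: one row per case, one column per variable (hypothetical names).
print(mean_abs_shap([[0.2, -0.1], [-0.4, 0.1]], ["resilience", "smoking"]))
```

Selecting the threshold on cross-validated training predictions, and only then applying it to the test set, keeps the test-set sensitivity and specificity free of any tuning on the test data.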
Plotting the value of a variable against the associated SHAP values can be used to visualize the relationship between the given variable and the risk of first-onset PMDD modeled by the algorithm. The SHAP approach was applied separately to the observations collected in the first wave survey (training dataset) and the second wave survey (test dataset).

Descriptive statistics of the variables used as potential predictors are presented in Tables 1 and 2 and the Supplementary Material (Tables S1 and S2). The geographical distribution of participants (Table S8) and the distribution of past clinician-diagnosed psychiatric disorders (Table S3 and

The criteria for first-onset PMDD were met by 47 participants (7.4%) in the first wave and 21 (7.2%) in the second, and no significant difference in distribution was seen between the waves. Among the 46 variable subsets (from size 1 to all variables) indicated by the mRMR procedure, the subset whose hyper-parameter optimization and cross-validation procedure in

We tested the final model using data from the second wave (test dataset). The AUROC was 0.856 (95% bootstrap CI 0.77-0.93). With the categorical predictions generated with the threshold identified above, our results indicated an average sensitivity of 76.5% (95% bootstrap CI 52.9-94.1%), an average specificity of 77.8% (95% bootstrap CI 72.4-82.85%), an average positive predictive value of 19.7% (95% bootstrap CI 14.3-25.8%), and an average negative predictive value of 97.9% (95% bootstrap CI 96.0-99.5%). The relative importance of the 10 variables included in the final model was analyzed using the SHAP technique with data from the first wave (the training dataset) and the second wave (the test dataset).

The study data were obtained from the first (May 18 to June 20, 2020) and second (September 15 to October 20, 2020) waves of an online survey conducted among Italy's general population during the pandemic.
We had two novel aims in our study. First, we estimated first-onset PMDD over the first eight months of the pandemic in Italians who had no clinician-diagnosed CPsyDs. Second, we produced an ML model to predict the emergence of first-onset PMDD in an independent sample of participants in the second wave after it was trained on data from the first wave. The main advantage of using the ML approach was its capability to identify the most relevant potential predictors for first-onset PMDD among a very large array of interrelated individual and psychosocial variables. We identified 7.4% of participants in the first wave and a further 7.2% in the second as meeting the criteria for first-onset PMDD. These rates were obtained by applying the DSM-IV criteria-based diagnostic algorithm to self-reported PHQ-9 scores to maximize the specificity of depression screening compared with the use of a score cutoff threshold of ≥10 (He et al., 2020; Spitzer et al., 1999).

Health Promotion (CNaPPS), 2019), which were lower than rates we identified during the pandemic. However, those studies did not use the PHQ-9 instrument. Finally, the lack of epidemiological data for Italy concerning the pre-pandemic incidence of MDD prevented us from providing direct pre- versus during-pandemic incidence comparisons. Considering the shortcomings of the self-report MDD screening relative to the pre-pandemic MDD rates obtained via clinical interviews (Girolamo et al., 2006), our findings pointed to a possible contribution of the pandemic to raising MDD rates among Italians. In most other Italian studies conducted during the pandemic, the rates at which the general population scored above the clinical threshold for self-reported depression were higher than the rates we found, ranging from approximately 20% to 40% (Bruno et al., 2020; Mazza et al., 2020; Mencacci and Salvi, 2021; Roma et al., 2020).
This discrepancy may have been due to the use of different psychometric tools or the application of the PHQ-9 cutoff score threshold, which produces higher sensitivity but lower specificity in detecting a major depressive episode than the PHQ-9 diagnostic algorithm that we used (He et al., 2020; Spitzer et al., 1999). Finally, because we were specifically interested in the first onset of primary MDD, we excluded participants with CPsyDs and past major depression, whereas the comparison studies were more inclusive. Therefore, the higher depression rates identified in previous Italian studies may have included depressive conditions that preceded the pandemic, were secondary to other current psychiatric disorders, or represented recurrent depression in previously remitted people. Similar factors, in addition to social and cultural differences, may explain the higher rates of self-reported depression that were found in most other studies worldwide (Adu et al., 2021; Cénat et al., 2021; Ettman et al., 2020; Hao et al., 2020; Lakhan et al., 2020; Liu et al., 2020a, 2020b; Luo et al., 2020; Salari et al., 2020; Solomou and Constantinidou, 2020; Wu et al., 2021). Bearing all these factors in mind, it seems that the rates of first-onset PMDD among Italians during the COVID-19 pandemic deserve consideration.

Our final best ML predictive model incorporated ten variables and displayed a sensitivity of 76.5% and a specificity of 77.8% when tested on the second wave, suggesting a good prediction performance for first-onset PMDD in independent samples of Italians during the pandemic. These results suggest that the same variables that had contributed to the emergence of first-onset PMDD at the beginning of the pandemic may have continued to affect Italians even in the following months. Participants in the two waves significantly differed in some of the 46 variables used as potential predictors, possibly due to changes in pandemic-related conditions and situations over time.
However, the good performance of the algorithm in the second wave makes it unlikely that changes in the distribution of variables between the first and the second wave significantly affected the predictive capability of the algorithm. The largest contributions to the prediction among the model's variables came from individual features that preceded the pandemic, namely low "ability to bounce back or recover from" setbacks or stressful events, being an undergraduate student, and being unsatisfied with usual sleep before the pandemic. Two other factors directly related to the pandemic were high perceived stress regarding the possibility of spreading the infection to others and in response to measures restricting personal activities and movement. A lower but significant contribution was also provided by the pre-pandemic dispositional perception of being poorly supported by relatives or household members in facing difficulties. Finally, being an active smoker, including continuing a pre-existing smoking habit or starting smoking during the pandemic, and taking medications for medical disease treatment contributed to the prediction, although to a more limited extent than the other predictors. "Having experienced a loved one's hospitalization" displayed the smallest average contribution, playing only a marginal role in the model.

Our results are consistent with other findings worldwide that low resilience is a risk factor for poorer mental health outcomes in the general population, including depressive symptoms, after wide-scale stressors, such as natural disasters (Blackmon et al., 2017; Osofsky et al., 2011; Shenesey and Langhinrichsen-Rohling, 2015), and the COVID-19 pandemic (Landi et al., 2020; Lenzo et al., 2020; Liu et al., 2020b; Prout et al., 2020; Song et al., 2021).
Our previous work has identified low resilience as a predictive factor in developing different new-onset psychiatric disorders during the COVID-19 pandemic among Italians with no psychiatric history (Caldirola et al., 2022). Although different resilience assessment questionnaires have been used in different studies, overall results support the idea that low resilience may be a non-specific vulnerability factor in different psychiatric disorders, including depression, following exposure to severe stressors, such as the ongoing pandemic. However, resilience is currently conceptualized as a complex and dynamic quality, including multiple resilience factors that range from neurobiological and psychological features of the individual to the social context and relationship networks (Ayed et al., 2019; Kageyama et al., 2021; Kalisch et al., 2019; Perna et al., 2020; Roeckner et al., 2021). In line with this, we also found that a pre-existing low level of support from relatives or household members in difficult situations was an additional significant predictor of first-onset PMDD. Because we used a single resilience questionnaire that only explores a specific aspect of this complex construct, future studies with broader resilience assessments may identify other resilience factors relevant to mental health during large-scale, long-lasting stressful events.

Being an undergraduate student was the only occupational status relevant to the predictive model. This finding supports and broadens previous findings in different countries of higher pandemic-related depressive symptoms among students than in other employment groups (González-Sanguino et al., 2020; Lei et al., 2020; Olagoke et al., 2020).
It is also consistent with the multiple reports of substantial rates of psychiatric symptoms and disorders among university students that have been published during the COVID-19 pandemic (Caldirola et al., 2022; Dogan-Sander et al., 2021; Li et al., 2021; McLafferty et al., 2021). Several pandemic-related changes affected students' mental well-being, including difficulty adjusting to online classes, self-regulated learning, and self-motivation, as well as daily physical isolation, concerns about decreased practical learning experience, perceived increases in university workload, and worry regarding the capacity to successfully meet academic criteria (Conceição et al., 2021; Dogan-Sander et al., 2021; Guse et al., 2021; Matos Fialho et al., 2021). These student-specific stressors might have influenced the occurrence of first-onset PMDD in this particular population subgroup, considering that younger people are usually more vulnerable to depression than older people (American Psychiatric Association, 2013). Therefore, strategies to improve students' mental well-being during the ongoing pandemic or future crises should be implemented at universities, including psychological and educational support and resilience programs (Akeman et al., 2020).

Low satisfaction with usual sleep before the pandemic contributed to the prediction of first-onset PMDD. This finding is consistent with longitudinal studies that identified sleep complaints or disorders in non-depressed people as a significant risk factor for later

Two other important predictive variables for first-onset PMDD were the higher levels of perceived stress regarding common pandemic-related issues, namely the possibility of transmitting COVID-19 to others and the restriction of personal autonomy. We previously showed that the same variables are significant predictors of different new-onset psychiatric disorders among Italians during the pandemic (Caldirola et al., 2022).
Thus, the subjective emotional responses to unexpected stressors may potentially be non-specific risk factors for developing mental disorders, including MDD. Even though decreasing personal hyperreactivity to stressful issues would require psychological treatment, more vulnerable people might benefit from supportive public campaigns to manage related difficulties during public health crises.

The contribution of active smoking to our predictive model is in line with findings for smoking as a potential risk factor for depression, which has multiple possible mechanisms, including oxidative stress, chronic inflammation, neural damage, and neurotransmission impairment (Hahad et al., 2021). Therefore, strengthening information campaigns regarding smoking's detrimental effects on mental health and encouraging people to decrease or not to start smoking, especially under highly stressful conditions, could play a part in counteracting the development of depression during public emergencies.

Finally, our finding that taking medication for medical diseases is a predictor of first-onset PMDD is consistent with the well-known bidirectional association between physical illnesses and depression (Roohi et al., 2021; Thom et al., 2019). Although further confirmation is needed, self-reported medication use for physical diseases may be a more reliable proxy for the presence of true medical conditions than self-reported medical diagnoses. Our finding suggests that careful mental health monitoring may be called for among people with medical diseases during large-scale stressful events.

Sex did not contribute to the prediction of first-onset PMDD. This may seem unexpected given the usual higher prevalence of MDD among women relative to men (American Psychiatric Association, 2013).
Although a large proportion of participants in our study were women, the ML technique we used was suitable to take this sex distribution imbalance into account without an expected significant impact on the identification of sex as a relevant predictor. Our result may be explained by the use of the mRMR technique to maximize the relevance of each variable to the outcome of interest and minimize its redundancy among a large array of interrelated variables. Therefore, being female may be individually associated with a higher risk of depression onset, but it may have been excluded from the final predictive ML model because it was associated with and redundant relative to other variables highly relevant to the model. In line with this, gender was likewise not relevant to predicting a PHQ-9 score at or above the cutoff threshold of ≥10 during the pandemic in the only other study that used an ML approach (Prout et al., 2020). The same reason may partly explain our finding that having had a clinician-diagnosed psychiatric disorder only in the past is not a predictor of first-onset PMDD. This finding may also suggest that a personal vulnerability to other psychiatric disorders does not necessarily confer an increased risk of first-onset PMDD in the general population, at least under the conditions analyzed in this study.

The strengths of this study include the use of an ML approach, which is particularly suitable for developing a predictive model from large and complex data sets, and the application of the SHAP technique, which allowed us to identify the importance of each variable for the prediction. These characteristics make the ML approach promising for future studies in the psychiatric field, considering that psychiatric disorders are highly complex conditions involving an interplay of multiple individual, environmental, and genetic features and risk factors.
Finally, the longitudinal design of this study enabled us to recognize a set of variables that continued to exert their influence on first-onset PMDD across two periods of the pandemic. Some limitations should also be noted. The sample size was limited due to the restrictive inclusion/exclusion criteria. Probably due to the involvement of official institutional sites in the recruitment, most participants were from north-western Italy. Hence, we were unable to include geographical distribution as a potential predictor in the model. Considering that sociodemographic and economic differences exist across the country, and that north-western Italy was particularly affected by COVID-19, especially at the beginning of the pandemic, this limitation may make our results not applicable to the general population of Italy. The rates of CPsyDs and past major depression, which were exclusion criteria in this study, were particularly high, suggesting that our participants may not be representative of the general Italian population. All participants in the survey were ≥18 years, so no data were collected on younger people. Because the entire survey was self-reported, inaccuracy or subjective bias among participants cannot be excluded, even concerning the main inclusion/exclusion criteria, such as psychiatric history. Indeed, we cannot exclude that people who reported not having CPsyDs or past major depression had undiagnosed psychiatric conditions, which represents a relevant limitation of the study. Likewise, although we used the most conservative method of MDD self-assessment available, we cannot exclude that at least part of the PHQ-9-based first-onset PMDDs did not correspond to true clinical MDD diagnoses.
Because our survey explored a large array of variables with multiple questionnaires, we simplified the evaluation of each variable, so in-depth details on several personal features, experiences, and behaviors were lacking. Moreover, due to the length of the survey and the high probability that information on some specific topics would be unreliable, we did not include questions concerning other known vulnerability factors, such as childhood trauma or family history of depression, which had predictive value for first-onset MDD in a pre-pandemic algorithm developed in the US general population (Wang et al., 2014). Finally, our predictive model is based on data that cover 2020 only, whereas the pandemic is still ongoing. The global scenario has partly changed, including the administration of vaccines and a substantial decrease in restrictions on activities and personal movement in the general Italian population. Further prospective studies should use data from the later pandemic phases in 2021 to assess the course of depression, test the model's validity, and explore whether other predictive factors may have played a part in first-onset PMDD.
This study identified considerable rates of first-onset PMDD during the first eight months of the pandemic among Italian adults without CPsyDs and developed an ML model with good predictive capability in two independent samples. The model's predictive variables could be used to develop goals for preventive interventions during the ongoing pandemic or future public health crises, to decrease depression risk among the general population.
Wu, T., Jia, X., Shi, H., Niu, J., Yin, X., Xie, J., Wang, X., 2021. Prevalence of mental health problems during the COVID-19 pandemic: A systematic review and meta-analysis. J. Affect. Disord. 281, 91-98.
https://doi.org/10.1016/j.jad.2020.11.117
In the column "Characteristics": the variables included in the model as potential predictors are bolded and the possible levels of each variable are italicized. * Cardiovascular diseases, diabetes, metabolic disorders, respiratory diseases, migraine/headache, oncological disorders/cancer, neurological disorders, others; ** considered illegal in Italy; *** e.g., contact with people who were diagnosed as having COVID-19; N: number; PMDD: provisional diagnosis of major depressive disorder; Prts: participants
The larger the absolute SHAP value of a certain variable, the larger the contribution of that variable in determining that prediction in a specific case. Specifically, a higher risk of first-onset provisional diagnosis of major depressive disorder (PMDD) was associated with higher agreement with "BRS-item 6"; higher levels of "Being scared of transmitting COVID-19"; higher disagreement with "BRS-item 3"; lower levels of "satisfaction with the usual sleep before the pandemic"; higher levels of "Being stressed by pandemic-related restrictions on activities and personal movement"; being an undergraduate student ("Employment status"); higher disagreement with "perception of being supported.."; having continued or started smoking ("Smoking habit during the pandemic"); yes ("current medications for medical diseases"); and yes ("Having experienced a loved one's hospitalization"). ML: machine learning; SHAP: SHapley Additive exPlanations technique
The larger the absolute SHAP value of a certain variable, the larger the contribution of that variable in determining that prediction in a specific case.
Specifically, a higher risk of first-onset provisional diagnosis of major depressive disorder (PMDD) was associated with higher agreement with "BRS-item 6"; higher levels of "Being scared of transmitting COVID-19"; being an undergraduate student ("Employment status"); higher disagreement with "BRS-item 3"; higher levels of "Being stressed by pandemic-related restrictions on activities and personal movement"; lower levels of "satisfaction with the usual sleep before the pandemic"; higher disagreement with "perception of being supported.."; having continued or started smoking ("Smoking habit during the pandemic"); yes ("current medications for
Predicting Perceived Stress Related to the Covid-19 Outbreak through Stable Psychological Traits and Machine Learning Models
Greedy function approximation: A gradient boosting machine
Global, regional, and national burden of 12 mental disorders in 204 countries and territories, 1990-2019: a systematic analysis for the Global Burden of Disease Study 2019. The Lancet
Predicting psychological state among Chinese undergraduate students in the COVID-19 epidemic: A longitudinal study using a machine learning
Psychosocial stressors and the prognosis of major depression: a test of Axis IV
Prevalence of common mental disorders in Italy
Mental health consequences of disasters
Coronavirus pandemic (COVID-19) in Spain
Understanding Mental Burden and Factors Associated With Study Worries Among Undergraduate Medical Students During the COVID-19 Pandemic
Smoking and Neuropsychiatric Disease-Associations and Underlying Mechanisms
Do psychiatric patients experience more psychiatric symptoms during COVID-19 pandemic and lockdown? A case-control study with service and research implications for immunopsychiatry
The Accuracy of the Patient Health Questionnaire-9 Algorithm for Screening to Detect Major Depression: An Individual Participant Data Meta-Analysis
Hypothalamic Regulation of Corticotropin-Releasing Factor under Stress and Stress Resilience
Deconstructing and Reconstructing Resilience: A Dynamic Network Approach
Prevalence of Depression, Anxiety, and Stress during COVID-19 Pandemic
Advances in Neural Information Processing Systems
The psychological and mental impact of coronavirus disease 2019 (COVID-19) on medical staff and general public - A systematic review and meta-analysis
Perceptions of Study Conditions and Depressive Symptoms During the COVID-19 Pandemic Among University Students in Germany: Results of the International COVID-19 Student Well-Being Study
A nationwide survey of psychological distress among Italian people during the COVID-19 pandemic: Immediate psychological responses and associated factors
Depression, anxiety and suicidal behaviour among college students: Comparisons pre-COVID-19 and during the pandemic
Expected effects of COVID-19 outbreak on depression incidence in Italy
Risk factor modifications and depression incidence: a 4-year longitudinal Canadian cohort of the Montreal Catchment Area Study
National Centre for Disease Prevention and Health Promotion (CNaPPS)
Sleep disorders as core symptoms of depression
Exposure to coronavirus news on mainstream media: The role of risk perceptions and depression
Machine learning in psychometrics and psychological research
Deepwater Horizon oil spill: mental health effects on residents in heavily affected areas
Osservatorio Nazionale sulla Salute nelle Regioni Italiane
The revolution of personalized psychiatry: Will technology make it happen sooner?
Heart rate variability: Can it serve as a marker of mental health resilience?
The impact of COVID-19 pandemic in a cohort of Italian psoriatic patients treated with biological therapies
Identifying Predictors of Psychological Distress During COVID-19: A
Neural contributors to trauma resilience: a review of longitudinal neuroimaging studies
A 2-Month Follow-Up Study of Psychological Distress among Italian People during the COVID-19 Lockdown
On inflammatory hypothesis of depression: what is the role of IL-6 in the middle of the chaos?
COVID-19 Pandemic and Lockdown Measures Impact on Mental Health Among the General Population in Italy. Front
Prevalence of stress, anxiety, depression among the general population during the COVID-19 pandemic: a systematic review and meta-analysis
Global prevalence and burden of depressive and anxiety disorders
Perceived resilience: Examining impacts of the Deepwater Horizon oil spill one-year post-spill
The brief resilience scale: Assessing the ability to bounce back
Prevalence and Predictors of Anxiety and Depression Symptoms during the COVID-19 Pandemic and Compliance with Precautionary Measures: Age and Sex Matter
Psychological Resilience as a Protective Factor for Depression and Anxiety Among the Public During the Outbreak of COVID-19
Validation and utility of a self-report version of PRIME-MD: The PHQ Primary Care Study
Major Depressive Disorder in Medical Illness: A Review of Assessment, Prevalence, and Treatment Options
COVID-19 pandemic and mental health consequences: Systematic review of the current evidence
A prediction algorithm for first onset of major depression in the general population: development and validation
Alessandra Alciati, and Giampaolo Perna are scientific consultants for Medibio LTD. Francesco Cuniberti has served as consultant for Menarini Industrie Farmaceutiche Riunite. Giampaolo Perna has served as consultant for Menarini Industrie Farmaceutiche Riunite, Lundbeck and Pfizer.
The authors would like to thank Medibio LTD for having supported the dissemination of the questionnaires to the general population. The authors would like to thank Enago (www.enago.com) for the English language review.
The authors report no conflicts of interest in this work. This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.