key: cord-0853098-iigpjpfm
authors: Aebi, Nicola Julia; De Ridder, David; Ochoa, Carlos; Petrovic, Dusan; Fadda, Marta; Elayan, Suzanne; Sykora, Martin; Puhan, Milo; Naslund, John A.; Mooney, Stephen J.; Gruebner, Oliver
title: Can Big Data Be Used to Monitor the Mental Health Consequences of COVID-19?
date: 2021-04-08
journal: Int J Public Health
DOI: 10.3389/ijph.2021.633451
sha: 6fe1ac51a1fa8f993f88207d15460002c262bd27
doc_id: 853098
cord_uid: iigpjpfm

The COVID-19 pandemic has profound mental health consequences [1]. Yet opportunities to monitor and mitigate mental health problems in this context remain scarce [2]. At the same time, nearly half of the world's population (49%) now uses social media, and digital tools such as natural language processing have improved considerably, particularly for mental health [3]. Using these tools, researchers have identified and monitored signs of mental illness reflected in social media data, including stress, loneliness, depression, and post-traumatic stress [4]. Such approaches, part of a growing field called digital epidemiology, could help identify populations in need of mental health support during the current pandemic. More specifically, sentiment analysis of content posted on popular social media platforms, combined with detection of spatiotemporal changes in disease incidence, could provide decision makers and public health experts with critical information to supplement traditional epidemiological data sources and to inform the implementation of targeted mental health interventions [5-7].

Despite the promise of Big Data, it is important to acknowledge that these digital epidemiologic approaches also raise ethical and legal concerns, particularly with regard to consent, privacy expectations, data protection, and security.
Social media users who post publicly may not have consented to taking part in a research study, and those suffering from mental illness may not have intended their posts to reveal their health status. People may have shared information on social media while in a temporarily vulnerable state of mind, e.g., during a crisis or a disease outbreak. In such cases, they may not realize that what they share can be collected and analyzed by third parties for relief, marketing, or scientific purposes. Yet being identified as mentally ill might cause stigma in private life and at work, become a source of discrimination, and affect access to and use of healthcare services. These ethical issues are compounded by potential legal issues, including regulations on the security and protection of the data and the malicious use of sensitive, health-related data by third parties. Methodologies such as de-identification and anonymization can therefore help ensure data protection and privacy by removing personal identifiers. Geo-masking or aggregation of spatial data are also applied to remove geographical attributes [8].

Research or interventions based on Big Data are also subject to validity concerns. The theory underlying formal statistics typically assumes random sampling [9], but social media users, for example, may not be representative of the general population in terms of demographics or socioeconomic factors. Analyzing these data without accounting for potential non-representativeness may therefore result in selection bias and low internal and external validity [10]. Furthermore, when Big Data are missing key covariates, it may be difficult to account for confounding factors (e.g., sex, socioeconomic determinants, ethnicity). An additional important challenge concerns the assessment of the mental health outcome itself.
While advanced sentiment analysis can function as a proxy for highlighting emotional distress in the digital sphere, this type of approach precludes any formal assessment of actual mental health outcomes and may result in distorted conclusions. Big Data are also prone to p-hacking (manipulation of data to achieve statistical significance) and HARKing (hypothesizing after the results are known), especially if the data contain many variables. Hence, a pre-registered analysis plan adds credibility. This plan should include an adjusted significance level, because very small effects may become significant by chance when working with Big Data. Finally, claims of causality cannot be made; the data therefore have to be interpreted carefully. Overall, strict adherence to reporting guidelines is of utmost importance to overcome methodological concerns.

Despite these concerns, Big Data analysis may contribute to a more comprehensive understanding of the mental health consequences of the current COVID-19 crisis. Big Data are not only "long" (covering many individuals), they are also "large", that is, they contain many variables that are already included or that can be easily extracted from these data [6]. The main strength of this approach, however, is the huge data volume made available even across national borders and health care systems. Tens of millions of geo-referenced tweets, for example, may be analyzed, substantially increasing the statistical power of spatial analyses linking mental health determinants, COVID-19 case counts or regulations, and the sentiments of social media users in those locations [10]. Big Data analyses could therefore help identify regional differences and establish correlations with other factors such as incidence rates of COVID-19, lockdown strictness or other policies aimed at containing the pandemic, or hospital overcrowding.
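As a minimal illustration of the two points above (an adjusted significance level, and region-level correlations between sentiment and COVID-19 incidence), the following Python sketch aggregates post-level sentiment scores by region, correlates the regional means with incidence, and compares an exact permutation p-value against a Bonferroni-adjusted threshold. All data, region labels, and function names are hypothetical, and the Bonferroni correction and the permutation test are our illustrative choices here, not methods prescribed in this commentary.

```python
from collections import defaultdict
from itertools import permutations
from math import sqrt

# Hypothetical post-level records: (region, sentiment score in [-1, 1]).
POSTS = [
    ("A", -0.2), ("A", -0.4),
    ("B", 0.1), ("B", 0.3),
    ("C", -0.6), ("C", -0.8),
    ("D", 0.0), ("D", 0.2),
    ("E", -0.5), ("E", -0.3),
]
# Hypothetical COVID-19 incidence per 100,000 inhabitants in the same regions.
INCIDENCE = {"A": 120, "B": 40, "C": 300, "D": 60, "E": 150}

ALPHA = 0.05          # conventional significance level
N_PLANNED_TESTS = 20  # assumed number of pre-registered comparisons


def regional_means(posts):
    """Average post-level sentiment within each region (spatial aggregation)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for region, score in posts:
        sums[region] += score
        counts[region] += 1
    return {region: sums[region] / counts[region] for region in sums}


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def exact_permutation_p(x, y):
    """Two-sided p-value: share of all orderings of y whose |r| is at
    least as large as the observed |r| (feasible only for few regions)."""
    observed = abs(pearson_r(x, y))
    orderings = list(permutations(y))
    extreme = sum(1 for o in orderings if abs(pearson_r(x, o)) >= observed - 1e-12)
    return extreme / len(orderings)


means = regional_means(POSTS)
regions = sorted(means)
x = [means[r] for r in regions]
y = [INCIDENCE[r] for r in regions]

r = pearson_r(x, y)
p_value = exact_permutation_p(x, y)
adjusted_alpha = ALPHA / N_PLANNED_TESTS  # Bonferroni adjustment
significant = p_value < adjusted_alpha

print(f"r = {r:.2f}, p = {p_value:.4f}, "
      f"adjusted alpha = {adjusted_alpha:.4f}, significant = {significant}")
```

With only five regions, the smallest attainable exact p-value is 1/120 (about 0.008), which exceeds the adjusted threshold of 0.0025, so even this strong toy correlation does not count as significant. The result is also a region-level association only and supports no causal claim about individuals.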
Analysis of big social media data in combination with spatial epidemiological approaches may further identify geographic hotspots of increased symptoms of mental health problems over time [7]. This in turn could provide key operational information to help implement appropriate mental health support and prevention measures. Moreover, real-time monitoring of the mental health consequences of COVID-19 may help governments respond rapidly and appropriately to changes in mental health status. Unlike formal epidemiological studies, Big Data surveillance offers huge data volume and wide geographic coverage at limited cost and in real time, making this approach an efficient use of resources. The main limitations are computational power, interpretability, and threats to generalizability.

We recommend the use of Big Data approaches to monitor mental health in the general population, especially in the context of heightened anxieties and threats to mental wellbeing owing to the COVID-19 pandemic, as these novel data sources may offer ways to deliver targeted support to specific populations, including those most susceptible to the impacts of the pandemic and the resulting mental health consequences. Hence, Big Data hold potential to strengthen our mental health prevention systems in the context of a global public health crisis. There will be ethical and technical challenges, which will require careful and continued efforts to overcome, but these digital approaches can support multifaceted strategies that include both modern technologies and traditional approaches.

NA, DR, DP, and CO wrote the manuscript. OG acquired funding. OG, SM, and JN conceptualized and supervised the study. OG, SM, JN, MF, SE, MS, and MP reviewed and edited the manuscript. All authors contributed to the article and approved the submitted version.
This work was funded by the Swiss School of Public Health (SSPH+) (to OG) through a mandate for a PhD course on Big Data in Public Health 2020 and is a direct outcome of this online seminar (SSPH+ PhD course website).

[1] The Lancet Infectious Diseases. The intersection of COVID-19 and mental health
[2] Bidirectional associations between COVID-19 and psychiatric disorder: retrospective cohort studies of 62 354 COVID-19 cases in the USA
[3] Machine learning in mental health: a scoping review of methods and applications
[4] Using geolocated social media for ecological momentary assessments of emotion: innovative opportunities in psychology science and practice
[5] Digital innovations for global mental health: opportunities for data science, task sharing, and early intervention. Curr Treat Options Psych
[6] Big data opportunities for social behavioral and mental health research
[7] Mental health surveillance after the terrorist attacks in Paris
[8] Street masking: a network-based geographic mask for easily protecting geoprivacy
[9] Sampling and sampling frames in big data epidemiology
[10] Big data in public health: terminology, machine learning, and privacy

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.