key: cord-1023467-99dvwq50 authors: Stier, Andrew J.; Schertz, Kathryn E.; Rim, Nak Won; Cardenas-Iniguez, Carlos; Lahey, Benjamin B.; Bettencourt, Luís M. A.; Berman, Marc G. title: Evidence and theory for lower rates of depression in larger US urban areas date: 2021-08-03 journal: Proc Natl Acad Sci U S A DOI: 10.1073/pnas.2022472118 sha: b471b253f17d745d8d480414940f57760113019c doc_id: 1023467 cord_uid: 99dvwq50

It is commonly assumed that cities are detrimental to mental health. However, the evidence remains inconsistent and at most makes the case for differences between rural and urban environments as a whole. Here, we propose a model of depression driven by an individual’s accumulated experience mediated by social networks. The connection between observed systematic variations in socioeconomic networks and built environments with city size provides a link between urbanization and mental health. Surprisingly, this model predicts lower depression rates in larger cities. We confirm this prediction for US cities using four independent datasets. These results are consistent with other behaviors associated with denser socioeconomic networks and suggest that larger cities provide a buffer against depression. This approach introduces a systematic framework for conceptualizing and modeling mental health in complex physical and social networks, producing testable predictions for environmental and social determinants of mental health also applicable to other psychopathologies.

cities | depression | social networks | built environment | complex systems

Living in cities changes the way we behave and think (1–3). Over a century ago, the social changes associated with massive urbanization in Europe and in the United States focused social scientists on the nexus between cities and mental life (2). Along with the urban public health crises of the time, a central question became whether cities are good or bad for mental health. Subsequently, social psychologists (1) started to document and measure the systematic behavioral adaptations among people living in cities. These adaptations included a more intense use of time [e.g., faster walking (4)], a greater tolerance for diversity (5), and strategies to curb unwanted social interactions, such that people in larger cities act in colder and more callous ways (1). These studies attributed the influences of urban environments on mental health to the intensity of social life in larger cities, mediated by densely built spaces and associated dynamic and diverse socioeconomic interaction networks. They did not, however, ultimately clarify whether urban environments promote better or worse mental health. Consequently, concerns persisted that cities are mentally taxing (6–9) and can induce "stimulus overload," including stress, mental fatigue (10), and low levels of subjective well-being (SWB) (11).
More recent studies have focused less on urban environments as a whole and more on contextual and environmental factors associated with depression. For example, a study of the entire population in Sweden (9) uncovered a positive association between neighborhood population density and depression-related hospitalizations. In addition, individual factors of gender, age, socioeconomic status, and race, which vary at neighborhood levels within cities, have been found to be statistically associated with depression (12–14). Other studies using various measures of mental health and broader definitions of urban environments have found evidence of poorer mental health in cities than in rural areas (7, 8). However, this evidence and that linking SWB and cities (15–18) have remained mixed and often explicitly inconsistent (19, 20) due to differences in 1) reporting (e.g., surveys vs. medical records); 2) types of measurement (e.g., surveys vs. interviews); 3) definitions of what constitutes urban; and 4) the mental disorders studied (e.g., schizophrenia vs. depression). For these reasons, it is desirable to create a systematic framework that organizes this diverse body of research and interrogates how varying levels of urbanization influence mental health across different sets of indicators. Here, we begin to build this framework for depression in US cities. We show that, surprisingly, the per capita prevalence of depression decreases systematically with city size. Like earlier classic approaches, our strategy frames the effects of city size on mental health through the lens of the individual experience of urban physical and socioeconomic environments. Crucial to our purposes, many characteristics of cities have been recently found to vary predictably with city population size.
Significance: Depression is the global leading cause of disability and related economic losses. Cities are associated with increased risk for depression, but how do depression risks change between cities? Here, we develop a mathematical theory for how the built urban environment influences depression risk and predict lower depression rates in larger cities. We demonstrate that this model fits empirical data across four large-scale datasets in US cities. If our model captures some of the underlying causal mechanisms, then these results suggest that depression within cities can be understood, in part, as a collective ecological phenomenon mediated by human social networks and their relationship to the urban built environment.

These systematic variations in urban indicators are explained by denser built environments and their associated increases in the intensity of human interactions and resulting adaptive behaviors (21). More specifically, people in larger cities have, on average, more socioeconomic connections mediating a greater variety of functions. This effect is understood theoretically by the statistical likelihood to interact with more people over space per unit time, leading to potential mental "overload" but also to greater stimulation and choice along more dimensions of life. This expansion of socioeconomic networks is supported structurally by economies of scale (e.g., road length) in urban built environments and by occupational specialization and associated increases in economic productivity and exchange (3). This effect leads to a number of quantitative predictions about the nature of urban spaces and socioeconomic variables, the most central of which is the variation of the average number of socioeconomic interactions, k (network degree), with city size, N, as $k(N) = k_0 N^{\delta} e^{\xi}$. Here, $k_0$ is a prefactor independent of city size, and ξ is a residual measuring the distance from the population average.
The exponent $0 < \delta \simeq 1/6 < 1$ measures the percentage increase in the number of connections with each percentage increase in city population, which is an elasticity in the language of economics. Because ξ reflects city size-independent statistical fluctuations, these errors average out across cities, and k obeys a scaling relationship on average over cities, such that $k(N) \sim N^{\delta}$. This expectation that k follows a scaling law with city population is directly observed in cell phone networks (22) and indirectly via the faster spread of infectious diseases such as COVID-19 (23) and by higher per capita economic productivity and rates of innovation (4, 21). This result is important to mental health because depression is associated, at the individual level, with fewer social contacts (24, 25). To translate the general scaling of social interactions with city size into a model for the incidence of depression in urban areas, we need to pay particular attention not only to the average number of social connections in a city of size N, $k(N)$, but also to its variance across individuals in that city and how they influence depression. We developed a statistical mathematical model that brings together socioeconomic network structure with individual risk of depression (Fig. 1). This model takes the form of a generative social network, which combines 1) a degree distribution with mean scaling as $k(N) = k_0 N^{\delta}$ (Fig. 1B) with 2) the risk (probability) for an individual to manifest depression, $p_d(k)$, taken to be inversely proportional to their social connectivity, $p_d(k) \sim 1/k$ (Fig. 1C). We will return to the finer issue of quality and type of connections below. For now, note that a larger number of connections in larger cities entails a qualitatively different experience because it is driven by the need to obtain support, goods, and services in environments with deep divisions of knowledge and labor.
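The two ingredients just described (mean degree growing as a power of city size, and individual risk falling as 1/k) are enough for a small simulation. The sketch below is illustrative only and is not the authors' code: it assumes a plain lognormal degree distribution in place of the log skew-normal used in the paper, and the names `simulate_cases` and `fit_exponent` and the parameters `k0`, `sigma`, and `risk_scale` are hypothetical choices made for this example.

```python
import math
import random


def simulate_cases(n, delta=1 / 6, k0=1.0, sigma=0.5, risk_scale=0.5, rng=None):
    """Count simulated depression cases in a city of n residents.

    Assumptions (not from the paper): degrees are lognormal, and
    risk_scale is an arbitrary constant of proportionality in p_d(k) ~ 1/k.
    """
    rng = rng or random.Random()
    # Mean degree follows k(N) = k0 * N**delta.
    mean_k = k0 * n ** delta
    # Choose mu so that the lognormal mean equals mean_k.
    mu = math.log(mean_k) - 0.5 * sigma ** 2
    cases = 0
    for _ in range(n):
        k = rng.lognormvariate(mu, sigma)
        p = min(1.0, risk_scale / k)  # individual risk, capped at 1
        if rng.random() < p:          # binary outcome per resident
            cases += 1
    return cases


def fit_exponent(sizes, cases):
    """Slope of the least-squares line through (ln N, ln Y)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(y) for y in cases]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


if __name__ == "__main__":
    rng = random.Random(0)
    sizes = [1_000, 3_000, 10_000, 30_000, 100_000]
    cases = [simulate_cases(n, rng=rng) for n in sizes]
    print("estimated beta:", round(fit_exponent(sizes, cases), 3))
```

Under these assumptions the expected per capita risk is proportional to the mean of 1/k, which scales as $N^{-\delta}$, so the fitted slope should land near 1 − δ up to sampling noise. The city sizes here are kept far smaller than real metropolitan areas so that the pure-Python loop runs quickly.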
To complete the model, we need to specify the probability distribution of degree, $f(k)$, in each city. We adopt a log skew-normal distribution with parameters similar to those measured in ref. 22 (Fig. 1B). This choice introduces another assumption into our model because lognormal distributions arise from multiplicative random processes, which compound risk over time to generate outcomes. In this sense, the adoption of this distribution assumes that depression is the result of a cumulative exposure process over time (26) (Fig. 1A) mediated by an individual's social network. Fig. 1D shows results from this model obtained by sampling each city's degree distribution N times, corresponding to a city's population. Each simulated city resident is then diagnosed with a binary outcome, manifesting depression or not proportionally to their individual risk, $p_d(k)$. We used this model to generate urban socioeconomic networks and computed their associated number of depression cases, Y, for a range of city sizes from $N = 10^4$ to $10^7$ that span the population range of US metropolitan areas (Fig. 1D). We observed a simple scaling relation for the total number of depressive cases, with a sublinear exponent β = 1 − δ < 1. For β = 1 (δ = 0), cases of depression increase proportionally to population so that there would be no city size effect. In contrast, for β < 1 (sublinear), a smaller proportion of the population manifests depression in larger cities. We express the quantitative consequences of the model based on 100 iterations for each city to predict that the number of depression cases follows a power law function of city size with a scaling exponent β = 0.859 (95% CI = [0.854, 0.863]) (Fig. 1D).

Fig. 1 caption (partial): This cumulative exposure results in social networks with log skew-normal degree (k) statistics with a mean that increases with city size, indicating more per capita social interactions in larger cities, on average. (C) Individual risk for depression is inversely proportional to social connectivity (degree) and is superimposed on the social networks generated within cities. (D) The combination of how cities shape social networks and how social networks shape individual depression risk results in a prediction of sublinear scaling of depression cases with increased city size (i.e., lower depression rates in larger cities; Inset). The logarithm of population and depression incidence are mean centered for ease of comparison with the empirical results.

Thus, under the model's assumptions, we expect larger cities to show substantially lower per capita rates of depression. To test these quantitative expectations, we asked whether empirical measurements of depression exhibit a systematic scaling relationship with city population size. We analyzed four independent datasets, which allow for consistent assessments of cases of depression across different urban areas in the United States. First, we employed estimates of the prevalence of depression in US cities produced as a part of two annual population surveys: the National Survey on Drug Use and Health (NSDUH) (27) from the Substance Abuse and Mental Health Services Administration and the Behavioral Risk Factor Surveillance System (BRFSS) (28) from the Centers for Disease Control and Prevention (Materials and Methods and SI Appendix, Figs. S1 and S3 and Tables S2-S4). The NSDUH asks respondents whether they have experienced a major depressive episode in the past year, as defined by the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV) (27). The BRFSS asks respondents if they have ever been told that they have a depressive disorder. Both surveys involved a social interaction between a surveyor and the respondent, which takes place over the phone for the BRFSS and in person for the NSDUH.
The differences between the two surveys provide a consistency test on measured cases of depression and partially rule out the possibility that their variation with city size is idiosyncratic to particular experimental or survey methodologies. Second, to generalize across different indicators and to avoid biases in reporting due to social stigma (29), we added two estimates of depression prevalence based on passive observation, which does not rely on an overt survey instrument. Specifically, we explored two large geolocated Twitter datasets of individuals and their messages for depressive symptoms in different cities. Twitter requires users to opt in to geolocation, and as a result, only a small fraction of tweets are geolocated (30). Importantly, this bias further distinguishes the two Twitter datasets from the survey-based data and strengthens any claims of generalization across the ways in which the data were collected and the populations of people studied. These two Twitter datasets included an existing dataset collected over 1 wk in 2010 (31) and a historical dataset covering 1 mo in 2019. Similar datasets have been used to demonstrate that happiness decreases with per capita tweets (17), to demonstrate that counts of users scale superlinearly with city size (32), and to assess regional variability in SWB (33), but to our knowledge, they have not been used to directly estimate associations between mental health disorders and city size. To measure the prevalence of depression from this corpus, we employed a machine learning technique to identify depressive symptoms from users' messages, emulating the Patient Health Questionnaire-9 (PHQ-9) commonly used by clinicians. The PHQ-9 consists of nine questions based on the nine criteria for diagnosing depression in the DSM-IV.
In order to emulate the PHQ-9 questions, we used a previously determined lexicon of seed terms organized into nine topics to guide a Latent Dirichlet Allocation (LDA) (34) method to determine the degree to which each user's messages represent these topics (Materials and Methods and SI Appendix, Fig. S4 and Table S5 ). This technique has been found to have an accuracy (proportion of tweets correctly identified) of 68% and precision (1 − the false discovery rate of tweets with depressive symptoms) of 72% compared with expert assignment of tweets to PHQ-9 questions (34) . We estimated the scaling exponent β from each of these datasets via ordinary least squares (OLS) linear regression between the logarithm of total depression cases and the logarithm of population size (Materials and Methods and SI Appendix, Figs. S4 and S5 and Table S6 ). When pooling across datasets and years, we estimated a scaling exponent of β = 0.868 (95% CI = [0.843, 0.892]) (Fig. 2) , consistent with our simulation model's prediction of β = 0.859. Moreover, estimates of β are similar when calculated separately for each dataset (Table 1 and SI Appendix, Fig. S6 ). While the Twitter19 dataset suggests that this statistical relationship is consistent across cities with populations ranging from about 40,000 to 20 million (SI Appendix, Table S1), the BRFSS dataset only supports sublinear scaling of depression rates in cities larger than ∼0.5 million people (Materials and Methods and SI Appendix, Fig. S1 ). This discrepancy may be due to the fact that the BRFSS city-level data are only reported for cities with at least 500 respondents in order to ensure anonymity. We provide evidence that this cutoff artificially alters the joint distribution of depression prevalence estimates and city size in the BRFSS data, but importantly, we find no evidence of similar nonlinear shifts in the joint distribution in the Twitter19 dataset (Materials and Methods and SI Appendix, Figs. S1 and S2). 
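The scaling estimate described above is an ordinary least squares fit on log-transformed data. A minimal sketch of that calculation follows; it is not the authors' code, the function name `scaling_exponent` is hypothetical, and the confidence interval uses a normal approximation (a fixed multiplier of 1.96) rather than a t quantile, which is an assumption that matters for small samples of cities.

```python
import math


def scaling_exponent(populations, cases):
    """OLS fit of ln(cases) = alpha + beta * ln(population).

    Returns (beta, ci_low, ci_high), where the interval is an
    approximate 95% CI based on the normal quantile 1.96 (assumption).
    """
    xs = [math.log(n) for n in populations]
    ys = [math.log(y) for y in cases]
    m = len(xs)
    x_bar = sum(xs) / m
    y_bar = sum(ys) / m
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta = sxy / sxx
    alpha = y_bar - beta * x_bar
    # Residual sum of squares and the standard error of the slope.
    rss = sum((y - alpha - beta * x) ** 2 for x, y in zip(xs, ys))
    se = math.sqrt(rss / (m - 2) / sxx)
    return beta, beta - 1.96 * se, beta + 1.96 * se


if __name__ == "__main__":
    # Synthetic, noise-free illustration: Y = 0.1 * N**0.87 (made-up numbers).
    pops = [50_000, 200_000, 1_000_000, 5_000_000]
    ys = [0.1 * n ** 0.87 for n in pops]
    print(scaling_exponent(pops, ys))
```

A fitted β below 1 with an interval that excludes 1 corresponds to the sublinear scaling reported in the text; an interval that contains 1 would indicate no detectable city size effect.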
As an additional sensitivity analysis, we performed a logistic regression to assess how conditioning on race, income, education, and rate of population change (i.e., migration) impacted the observed decrease in depression rates for larger cities. We did this with individual-level survey responses for each year of the BRFSS data. Similar to the scaling analysis above, the average odds ratio across all years for a 1-unit increase in the natural logarithm of city population was 0.89 (maximal 95% CI = [0.87, 0.93]) (Materials and Methods and SI Appendix, Table S7). Population change was also not significantly related to depression rate in the NSDUH and Twitter datasets (SI Appendix, Table S8). Thus, we find general empirical support for the expectation that larger cities are associated with a decreased risk for depression even when conditioning on race, education, income, and migration. We found no consistent evidence that the rate of population change (i.e., migration rate) was associated with depression rates across all datasets, despite previous research associating growing cities with increased SWB (15). This statistical relationship between depression and city size is consistent in larger cities across all four datasets and across a decade, despite the different ways in which depressive symptoms are measured and the different ways that the data were collected. Importantly, these results demonstrate that depression rates are substantially lower in larger US cities, contrary to previous expectations but precisely in line with our theoretical model and simulations.

Table 1 caption: In all cases, we observe sublinear (β < 1) scaling of total depression cases with city size. n indicates the number of cities included in each dataset.

Although the association between urbanization and mental health is foundational in the social sciences and in public health, it has remained challenging to characterize and assess quantitatively.
This is particularly concerning as almost every nation worldwide continues to urbanize, with over 70% of the world's population expected to live in cities by 2050 (35) , and depressive disorders are already a leading global cause of disability (36) and economic losses (37) . Based on size alone, large cities bear the brunt of the social and economic burden of depressive disorders. Our findings suggest that on a relative basis, however, smaller cities are actually worse off. Consequently, the discrepancy between BRFSS data and Twitter19 data in small cities is particularly concerning. While our analysis suggests that this discrepancy stems from the way in which BRFSS city-level data are reported, it is important that future work develops accurate observation instruments for both smaller cities and finer geographic units within cities (i.e., neighborhoods). This will be particularly important as public health officials start to incorporate geographic patterns of mental health disorders into their allocation plans for mental health care resources. The convergence of recent findings from urban science with evidence and theory from mental health studies offers a window for creating more systematic approaches to understanding mental health in cities. In this respect, the sublinear scaling of total depression cases with population size in larger US cities is a completely unexpected result characterizing the sociogeographic distribution of depression. While the results presented here speak only to larger urban areas in the United States, they suggest that larger city environments and urbanization can, on average, naturally provide greater social stimulation and connections that may buffer against depression. 
Although urban scaling theory has been shown to generalize across cultures (21) and human history (3, 38–40), it is critical for future work to examine whether the presented extension of urban scaling theory to depression generalizes to smaller cities and to other countries and cultures. While our theoretical model only considers the quantity of social connections, embedded in urban scaling theory is the general implication that the net (economic and social) benefit of these interactions is positive (21). Future work on the link between social connectivity and mental health should consider and explicitly model urban gradients in the quality of such connections. Alongside quantity, the quality of social connections is a strong predictor of depression (41, 42) and SWB (41). Although the evidence relating SWB to cities is mixed [some studies report no relationship between city size and SWB (15), some suggest higher SWB in larger cities (16), some suggest lower SWB in larger cities (17), and some suggest an inverted U, i.e., higher SWB for midsized cities (18, 43)], the quality of social connections might hold the key to understanding discrepancies between city-level trends in SWB and depression rates. In particular, while positive and negative affects are similarly weighted in SWB measures (44), depression is frequently characterized by more substantial and nuanced changes in negative affect (45, 46). Thus, the results presented here suggest that the greater number of social connections in larger cities on the whole may provide a social buffer against negative affect and depression in the most vulnerable people (i.e., those with the smallest social networks). Conversely, increased positive affect is related to higher-quality social connections independently of negative affect and depressed mood (47).
Thus, the allegedly more callous and superficial social interactions in larger cities (1, 4) may explain decreases in positive affect and SWB but, simultaneously, may still buffer individuals from depression by decreasing negative affect (i.e., these more numerous social interactions may impact negative and positive affects differently). Since individuals with the lowest SWB are at a significant risk for clinical depression (41, 48–50), it is important for future work to examine how the rate of low-SWB scores varies between cities of different sizes. We must also recognize that the numerous factors that influence depression vary enormously within cities. These variations may influence individuals directly and also indirectly through the local environments in which they live and work. For example, homophilic gradients of mobility have been observed in neighborhoods with similar levels of socioeconomic status (51–53), so that city inhabitants from poorer (richer) communities tend to preferentially travel to similarly poor (rich) areas. In addition, recent research suggests that neighborhoods with higher overall socioeconomic status tend to be better integrated into their surroundings, affording residents better access to the rest of the city (51). Thus, it is crucial that future work examines the relationship between depression rates, mobility, and social connectivity in smaller populations, such as at the neighborhood level. In addition, looking within cities at these local and more fine-grained levels is expected to reveal variations in the incidence of depression via other social groupings (12). For example, several studies have associated high population density in social housing in Europe and the United States with higher incidence of depression in aging adults (19), possibly mediated by a higher density of negative connections with neighbors, which can instill feelings of isolation, fear, and despair.
In order to search for finer causal evidence, future work may employ a number of experimental designs such as sibling comparisons or stratification by confounding factors (54). Examining scaling relationships of mental health outcomes with city size is a systematic way of investigating general urban effects on mental life, which places focus on collective influence on mental health disorders. The perspective of cities as interconnected networks that shape their inhabitants' lives may also help to uncover environmental factors that influence other mental health disorders and overall well-being. This includes highly comorbid psychopathologies such as anxiety disorders and less comorbid ones such as schizophrenia, for which increased socialization may lead to different outcomes. The fact that important insights about the mechanisms of mental health disorders might be gleaned from such a general population-level analysis, which ignores the intricate and often personal details of mental health, is surprising and powerful.

Data Sources and Processing. County populations in Fig. 2 are provided by the US Census Bureau and available online at https://www.census.gov/data/tables/time-series/demo/popest/2010s-counties-total.html. We used delineation files provided by the US Office of Management and Budget to aggregate county-level data up to metropolitan statistical areas (MSAs). Each MSA represents a US Census definition of a functional city in the United States, circumscribing together a city and its suburbs, sometimes known as an integrated labor market in economic geography. These definitions are updated regularly and available at https://www.census.gov/programs-surveys/metro-micro/about/delineation-files.html. The list of MSAs included in analysis in the text (Fig. 2) is enumerated in SI Appendix, Table S1.
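The county-to-MSA aggregation step can be sketched as a simple grouped sum. The example below is a hedged illustration rather than the authors' pipeline: the function name `aggregate_to_msa` is hypothetical, the county and MSA identifiers are placeholders (a real crosswalk would come from the delineation files), and the population numbers are made up.

```python
from collections import defaultdict


def aggregate_to_msa(county_values, county_to_msa):
    """Sum county-level values (e.g., populations) within each MSA.

    county_values: dict mapping a county identifier to a number.
    county_to_msa: dict mapping a county identifier to an MSA identifier;
                   counties missing from it lie outside any MSA and are skipped.
    """
    totals = defaultdict(int)
    for county, value in county_values.items():
        msa = county_to_msa.get(county)
        if msa is None:
            continue  # county is not part of any metropolitan area
        totals[msa] += value
    return dict(totals)


if __name__ == "__main__":
    # Placeholder identifiers and made-up populations, for illustration only.
    populations = {"county_A": 100, "county_B": 200, "county_C": 50, "county_D": 7}
    crosswalk = {"county_A": "msa_1", "county_B": "msa_1", "county_C": "msa_2"}
    print(aggregate_to_msa(populations, crosswalk))
```

Keeping the crosswalk as a separate input makes it straightforward to swap in a different delineation vintage, since the text notes that these definitions are updated regularly.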
As the surveys from which we obtained depression prevalence estimates are administered by different agencies (the Substance Abuse and Mental Health Services Administration administers the NSDUH, and the Centers for Disease Control and Prevention administers the BRFSS), collection and reporting methods differ substantially between these two data sources. The NSDUH is conducted in person, while the BRFSS is conducted over the phone. In addition, the two surveys differ in the questions they ask about depressive symptoms. The NSDUH asks participants whether or not they had a period of 2 or more weeks in which they experienced depressive symptoms in line with definitions in the DSM-IV (27). In contrast, the BRFSS asks respondents if they have ever been told that they "have a depressive disorder (including depression, major depression, dysthymia, or minor depression)" (28). In addition to these differences in questionnaire content and methods, the two data sources also differ in how they report data. The NSDUH reports age, ethnicity, and geography-adjusted prevalence estimates in 33 MSAs (27). In contrast, the BRFSS reports age, gender, and socioeconomically adjusted prevalence estimates for any MSA with at least 500 respondents (28), and consequently, the cities that are included in reports vary from year to year. The 2010 NSDUH estimates of the rate of major depressive episodes used in Fig. 2 were obtained from Table 38 of the Substance Abuse and Mental Health Services Administration 2005 to 2010 NSDUH. These data are available online at https://www.samhsa.gov/data/sites/default/files/NSDUHMetroBriefReports/NSDUHMetroBriefReports/NSDUH_Metro_Tables.pdf. We multiplied estimated prevalence by 2010 estimated population to determine the estimate of total depression cases within each MSA. The 2011 to 2017 BRFSS city estimates of the prevalence of major depression used in Fig. 2 are available online at https://www.cdc.gov/brfss/smart/Smart_data.htm.
As with the NSDUH data, we multiplied estimated prevalence by that year's estimated population to estimate total depression cases within each MSA. One point of concern was that the cutoff of 500 respondents per city in the BRFSS data might artificially alter the joint distribution of prevalence and city size in a way that biases the estimate of β. One possibility is that larger cities are simply more likely to record enough responses to be included. However, since the BRFSS data include cities with populations as small as 20,285, whatever bias this 500-respondent cutoff may introduce likely has a more complex origin. In order to address this without knowing the source of potential biases in city inclusion, we employed nonparametric change point detection based on the minimum covariance determinant (MCD) (55) in order to find the city size at which the joint distribution of city size and depression prevalence was different on either side of the change point. This was applied to the BRFSS data annually; results are consistent across years, with a mean change point of 692,557 people (SD = 268,004 people). Specifically, we followed a procedure similar to that in ref. 56. For each year of BRFSS data, we first ordered the data by population and then applied a Python implementation of the MCD algorithm (57) with a sliding window. This resulted in an outlier-robust estimate of the mean within the window and the 2 × 2 robust covariance matrix between population and depression prevalence within the window. These two quantities allow for the estimation of the Mahalanobis distance between the robust mean and the data from the city that has the next smallest population to the smallest city included in the window (the left-out city). These distances follow a $\chi^2$ distribution with degrees of freedom equal to the size of the window.
Consequently, we marked the left-out city as a potential change point if the Mahalanobis distance was greater than the 97.5th percentile of the relevant $\chi^2$ distribution. Finally, we calculated a moving average with window size 5 of marked change points. We considered a specific city size to be a change point if the moving average of marked change points was greater than 0.5. This was repeated for MCD window sizes from 5 to 25 data points in increments of two. Histograms of the detected change points over all window sizes are shown in SI Appendix, Fig. S1. When applied to the Twitter19 dataset, change points are observed primarily at the ends of the population range (SI Appendix, Fig. S2). This is suggestive of finite edge effects rather than a systematic change in the joint distribution of depression rates and city size as seen in the BRFSS data. Next, we used a k-means clustering implementation in Python (57) to split the detected change points into two separate clusters based on the observation that the histograms of change points over all bin sizes for most years are roughly bimodally distributed. We used the two cluster centers as the final change points for each year of BRFSS data, resulting in a partition of the data into three sets. Scaling estimates for the largest cities are reported in the text, Fig. 2, and Table 1. When pooling all BRFSS data across all cities and years, we still find evidence that larger cities have lower depression rates than smaller cities (β = 0.926; 95% CI = [0.903, 0.950]). Results are similar when β is estimated separately for each year of BRFSS data (SI Appendix, Table S4). When pooling data from the other two partitions, which contain smaller cities, we found no evidence that depression rates scale sub- or superlinearly with population (β = 0.996; 95% CI = [0.956, 1.035]) (SI Appendix, Fig. S3). Results were similar when estimating β for each year separately (SI Appendix, Table S2).
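The two post-processing steps of the change point procedure (smoothing the marked cities with a moving average, then splitting the detected change points into two clusters) can be sketched without any external libraries. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the moving average is taken over a centered window truncated at the edges (the text does not say whether the window is centered or trailing), and a hand-written one-dimensional 2-means loop stands in for the library k-means routine cited in the text. The MCD and Mahalanobis steps that produce the 0/1 marks are not reproduced here.

```python
def smoothed_change_points(marks, window=5, threshold=0.5):
    """Indices whose moving average of 0/1 marks exceeds the threshold.

    marks: list of 0/1 flags ordered by city population, where 1 means the
           city's Mahalanobis distance exceeded the chi-squared cutoff.
    Assumption: centered window, truncated at the ends of the list.
    """
    half = window // 2
    hits = []
    for i in range(len(marks)):
        segment = marks[max(0, i - half): i + half + 1]
        if sum(segment) / len(segment) > threshold:
            hits.append(i)
    return hits


def two_means_1d(values, max_iter=100):
    """Two cluster centers for 1-D data via Lloyd's algorithm.

    Centers start at the minimum and maximum of the data (assumption).
    """
    lo, hi = min(values), max(values)
    for _ in range(max_iter):
        left = [v for v in values if abs(v - lo) <= abs(v - hi)]
        right = [v for v in values if abs(v - lo) > abs(v - hi)]
        new_lo = sum(left) / len(left)
        new_hi = sum(right) / len(right) if right else hi
        if (new_lo, new_hi) == (lo, hi):
            break  # assignments no longer change
        lo, hi = new_lo, new_hi
    return lo, hi


if __name__ == "__main__":
    marks = [0, 0, 0, 1, 1, 1, 0, 0, 0, 0]
    print(smoothed_change_points(marks))
    # Made-up change point populations forming two groups.
    print(two_means_1d([0.8e5, 1.0e5, 1.2e5, 4.8e5, 5.0e5, 5.2e5]))
```

The two returned centers play the role of the final change points for one year of data, which then split the population-ordered cities into three sets.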
This lack of a city size effect for smaller cities in the BRFSS data may indicate that social network determinants of depression are overshadowed by other risk factors in smaller cities but may also be specific to biases introduced by the way in which the data were collected and reported. We further estimated the sensitivity of our β estimates among larger cities to variation in the change point. For each year of BRFSS data, we varied the change point 100 times according to a normal distribution with a mean of the change point used for that year in the main text and a variance equal to the variance in change points across years. We found that the estimates of β in larger cities are robust to these variations in the choice of change point (SI Appendix, Table S3).

The geolocated Twitter10 dataset used in Fig. 2 and Table 1 is available online at http://www.cs.cmu.edu/~ark/GeoText/. This dataset included 377,616 tweets from 9,475 users collected over a 1-wk period in March of 2010 (31). Latitude and longitude coordinates for each tweet were converted to a county-level geographic identifier using the US Census Geocoder application programming interface (API) provided by the US Census Bureau, available at https://github.com/fitnr/censusgeocode. If there was more than one coordinate per user, we used the mode, and in the case of a tie, we used the coordinate that appeared first in time. We then used delineation files provided by the US Office of Management and Budget to roll up county-level data to MSAs.

The Twitter19 dataset used in Fig. 2 was collected via Twitter's academic research full search API (https://developer.twitter.com/en/solutions/academic-research) and was deemed not human subjects research by the University of Chicago Institutional Review Board (IRB20-2049) because all data are publicly available. Tweets that had available location tags (longitude and latitude) within US cities between 1 June and 1 July 2019 were collected.
This included data from 572,208 users and 15,076,651 tweets. The query parameters for retrieving tweets are available online at https://github.com/enlberman/depression scaling. We note that while Twitter's opt-in policy for geolocation data changed in 2015 to require explicit consent to share precise global positioning system coordinates (30), we rely on the coarse location data included with all geolocated tweets.

We processed tweets following ref. 34 using standard text preprocessing (for example, deleting stop words) and processing steps specific to the Twitter platform (for example, deleting "#" in the hashtags). Then, we used a previously determined lexicon of seed terms related to depression symptoms, organized into nine topics based on the PHQ-9, to guide an LDA model (34). LDA allows for the discovery of underlying topics within collections of text data and has been utilized previously with short, semistructured text sources (e.g., refs. 58 and 59). This enabled us to find users who had topic cluster(s) related to the nine PHQ-9 topics in their tweets over 1 wk.

One point of concern was that individuals who have depressive symptoms may tweet differently from those who do not have them. Specifically, we worried that individuals with depressive symptoms would tweet less, leading to less reliable estimates from these individuals. This was the case; in the 2010 Twitter dataset, individuals with depressive symptoms tweeted 57.7 times on average, while individuals without depressive symptoms tweeted 37.6 times on average (t statistic = 25.7, P = 7e-141). To control for this, we performed a logistic regression to predict the presence of depressive language in users' tweets from their number of tweets over the 1-wk collection period. We repeated this procedure excluding users who had fewer than a specified number of tweets for cutoffs from 0 to 110 tweets. As demonstrated in SI Appendix, Fig.
S4, the logistic regression model achieves significance for the 2010 Twitter dataset when individuals with fewer than 92 tweets are included. This indicates that people with depressive symptoms tend to tweet less but that among individuals who tweeted at least 92 times over the collection period, a logistic regression model cannot differentiate between individuals with and without depressive symptoms based on their number of tweets. Consequently, we excluded individuals with fewer than 92 tweets and then estimated depression prevalence as the proportion of users in each city whose tweets contained a nonzero signal for any of the PHQ-9 topics. In the Twitter19 dataset, users with depressive symptoms tweeted 45.2 times on average, and those without depressive symptoms tweeted 41.6 times on average (t statistic = 9.52, P = 2e-21). Since the quantities of text were similar in both groups (compared with the 20-tweet difference in the 2010 Twitter dataset), we used a cutoff of 15 tweets to ensure that the LDA algorithm had sufficient input. In addition, we excluded cities in which estimated depression rates were unrealistic at 0 or 100%. To test the sensitivity of the results to the minimum tweet threshold, we repeated the scaling analysis on the Twitter10 dataset with minimum tweet count cutoffs from 82 to 101 (SI Appendix, Table S5) and found that estimates of β were robust to these changes in exclusion criteria.

Estimating the Scaling Exponent β. We performed OLS linear regression to calculate the scaling exponent β for depression cases. We verified that the residuals of the models in Table 1 are approximately normally distributed with both quantile-quantile plots of the residuals (SI Appendix, Fig. S5) and the Shapiro-Wilk test of normality (60) (SI Appendix, Table S6). We also verified that the residuals are not correlated with city size (SI Appendix, Fig. S6) (minimum P value for Spearman's r = 0.44).
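As an illustration of the regression step, the sketch below estimates β by OLS on log-transformed data. It assumes the usual urban scaling form Y = Y₀N^β, so that ln Y = ln Y₀ + β ln N; the function name `fit_scaling_exponent` and the synthetic inputs are hypothetical and are not the authors' code or data.

```python
import math


def fit_scaling_exponent(populations, cases):
    """OLS fit of ln(cases) = ln(Y0) + beta * ln(population).

    Returns (beta, intercept), where intercept = ln(Y0).
    Assumes a power-law scaling form; all inputs must be positive.
    """
    x = [math.log(n) for n in populations]
    y = [math.log(c) for c in cases]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Closed-form simple linear regression: slope = Sxy / Sxx.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    return beta, intercept


# Synthetic, noise-free example: cases = 0.07 * N**0.9 (hypothetical values).
populations = [1e4, 1e5, 1e6, 1e7]
cases = [0.07 * n ** 0.9 for n in populations]
beta, intercept = fit_scaling_exponent(populations, cases)
```

In this form, β < 1 corresponds to sublinear scaling, that is, per capita depression rates that fall with city size, and β = 1 corresponds to rates that are independent of city size.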
Conditioning on Race, Education, Income, and Population Change. We additionally assessed whether city size was associated with a decreased risk of depression after conditioning on race, education, income, and population change. To do so, we ran logistic regressions with the R package lme4 (61) on each year of the BRFSS data using the individual participant-level survey responses. We did this only for the 41 cities considered in the primary analysis. We used the BRFSS-provided categories for income, race, and education. Consequently, the income variable had six levels with a baseline of not reported or missing, followed by less than $15,000; $15,000 to $25,000; $25,000 to $35,000; $35,000 to $50,000; and greater than $50,000. The education variable had five levels with a baseline of not reported followed by no high school, graduated high school, attended college, and graduated college. The race variable had four levels with a baseline of White followed by Black, Asian, and other/multiracial. We additionally included the natural logarithm of the population of each respondent's city as an independent variable. The dependent variable indicated whether each respondent had ever been told they have depression. The model is defined as

\[ \operatorname{logit}\{y_i = 1\} = \beta_0 + \beta_1 \log(\text{population}) + \beta_2\,\text{income}_i + \beta_3\,\text{education}_i + \beta_4\,\text{race}_i + \beta_5\,\Delta\text{population}/\text{population}. \]

Results are summarized in SI Appendix, Table S7, which was created with the R stargazer package (62). The maximal 95% CI for the odds ratio of log city population was found by taking the union of 95% CIs across all years of data.

Generative Network Model for Depression in Cities. The starting point for the simulations of depression cases was the log skew-normal degree distribution, which has been shown to match the degree distributions of cell phone-based social networks in cities (22) and theoretically, is the result of cumulative exposures to semirandom interactions taking place throughout cities' infrastructure networks.
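A minimal sketch of such a simulation is given here for orientation, using the parameter values stated in this section (σ = 0.87, γ₁ = 0.2, µ = 1.97 at N = 10,000, and a mean of ln k that grows by (1/6) ln(N/10,000)). It is not the authors' code. The function names, the use of the standard two-normal representation of a skew-normal variate, and the proportionality constant `p0` in the depression probability min(1, p0/k) are assumptions made for illustration.

```python
import math
import random


def skew_normal_params(mu, sigma, gamma1):
    """Convert mean, SD, and skewness of a skew-normal variable into
    (location zeta, scale omega, delta = alpha / sqrt(1 + alpha**2))."""
    c = ((4 - math.pi) / 2) ** (2 / 3)
    g = abs(gamma1) ** (2 / 3)
    delta = math.copysign(math.sqrt((math.pi / 2) * g / (g + c)), gamma1)
    omega = sigma / math.sqrt(1 - 2 * delta ** 2 / math.pi)
    zeta = mu - omega * delta * math.sqrt(2 / math.pi)
    return zeta, omega, delta


def sample_log_degrees(n, mu, sigma=0.87, gamma1=0.2, rng=random):
    """Draw n values of ln(k), where ln(k) is skew-normal."""
    zeta, omega, delta = skew_normal_params(mu, sigma, gamma1)
    root = math.sqrt(1 - delta ** 2)
    # Standard representation: delta * |U0| + sqrt(1 - delta^2) * U1.
    return [
        zeta + omega * (delta * abs(rng.gauss(0, 1)) + root * rng.gauss(0, 1))
        for _ in range(n)
    ]


def simulate_city(n, rng, scale_exp=1 / 6, p0=1.0):
    """Return simulated depression cases in a city of n inhabitants.

    Assumption: P(depression) = min(1, p0 / k); the text states only
    that the probability is inversely proportional to degree k.
    """
    mu = 1.97 + scale_exp * math.log(n / 10_000)
    cases = 0
    for log_k in sample_log_degrees(n, mu, rng=rng):
        if rng.random() < min(1.0, p0 / math.exp(log_k)):
            cases += 1
    return cases


rng = random.Random(1)
rate_small = simulate_city(10_000, rng) / 10_000
rate_large = simulate_city(100_000, rng) / 100_000
```

Under these assumptions, increasing the mean of ln k by (1/6) ln(N/10,000) multiplies the expected value of 1/k by (N/10,000)^(−1/6), so the simulated per capita rate falls with city size and total cases grow roughly as N^(5/6) wherever the cap at probability 1 is rarely reached.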
The log skew-normal distribution has the density function

\[ f(k) = \frac{2}{\omega k}\,\phi\!\left(\frac{\ln k - \zeta}{\omega}\right)\Phi\!\left(\alpha\,\frac{\ln k - \zeta}{\omega}\right), \]

where ζ is the location parameter, α is the shape parameter, ω is the scale parameter, φ is the normal distribution probability density function, and Φ is the normal distribution cumulative distribution function. These parameters can be transformed into the more familiar mean (µ), variance (σ²), and skewness (γ₁) via (63)

\[ \mu = \zeta + \omega\delta\sqrt{2/\pi}, \qquad \sigma^2 = \omega^2\left(1 - \frac{2\delta^2}{\pi}\right), \]

\[ \gamma_1 = \frac{4 - \pi}{2}\,\frac{\left(\delta\sqrt{2/\pi}\right)^3}{\left(1 - 2\delta^2/\pi\right)^{3/2}}, \qquad [6] \]

with δ = α/√(1 + α²) in these expressions. We started with values of σ = 0.87, γ₁ = 0.2, and µ = 1.97 in line with a city of size N = 10,000 (22). We then let the mean of this distribution grow with population size according to

\[ \mu(N) = 1.97 + \delta \ln\frac{N}{10{,}000}, \quad \text{where } \delta = \tfrac{1}{6} \approx 0.167, \qquad [7] \]

so that k ∼ N^δ. The size N of each simulated city was sampled uniformly on a log scale from 10⁴ to 10⁷. We then sampled from the degree distribution N times to obtain a list of the social network degrees of all N simulated city inhabitants. From this list, we randomly assigned each simulated individual to be diagnosed with depression (or not) with a probability inversely proportional to their degree (probability of depression ∼ 1/k). Total depression cases in each simulated city were calculated as the sum of depressed individuals.

Data Availability. City-level depression rates for the Twitter19 dataset have been deposited in Open Science Framework (64). Under our data use agreement, the Twitter19 dataset can only be shared after aggregating up from individual user data. BRFSS data are publicly available online (28). NSDUH data are publicly available online (27). The Twitter10 dataset is publicly available online (31).

1. The experience of living in cities
2. "The metropolis and mental life" in The Urban Sociology Reader
3. Introduction to Urban Science: Evidence and Theory of Cities as Complex Systems
4. Growth, innovation, scaling, and the pace of life in cities
5. Urbanism and tolerance: A test of some hypotheses drawn from Wirth and Stouffer
6. Are cities bad for your mental health?
7. Cities and mental health
8. Schizophrenia and urbanicity: A major environmental influence-conditional on genetic risk
9. Urbanisation and incidence of psychosis and depression: Follow-up study of 4.4 million women and men in Sweden
10. The cognitive benefits of interacting with nature
11. Human values, subjective well-being and the metropolitan region
12. The epidemiology of chronic major depressive disorder and dysthymic disorder: Results from the national epidemiologic survey on alcohol and related conditions
13. Age and the effect of economic hardship on depression
14. Socio-economic status, family disruption and residential stability in childhood: Relation to onset, recurrence and remission of major depression
15. Unhappy cities
16. Does city size affect happiness
17. The geography of happiness: Connecting Twitter sentiment and expression, demographics, and objective characteristics of place
18. Life satisfaction in urbanizing China: The effect of city size and pathways to urban residency
19. Psychological characteristics of high-rise residents
20. 'Living the high life'? Residential, social and psychosocial outcomes for high-rise occupants in a deprived context
21. The origins of scaling in cities
22. The scaling of human interactions with city size
23. Early pandemic COVID-19 case growth rates increase with city size
24. Social network status and depression among adolescents: An examination of social network influences and depressive symptoms in a Chinese sample
25. Social network determinants of depression
26. Stress exposure across the life span cumulatively increases depression risk and is moderated by neuroticism
27. "Results from the 2010 National Survey on Drug Use and Health: Summary of national findings" (National Survey on Drug Use and Health Series H-41, HHS Publication 11-4658)
28. "Behavioral risk factor surveillance system survey data" (US Department of Health and Human Services)
29. Meta-analysis of stigma and mental health
30. A longitudinal and geospatial analysis of COVID-19 tweets during the early outbreak period in the United States
31. A latent variable model for geographic lexical variation
32. Scaling laws in geo-located Twitter data
33. Social networks data and subjective wellbeing. An innovative measurement for Italian provinces
34. Semi-supervised approach to monitoring clinical depressive symptoms in social media
35. United Nations Department of Economic and Social Affairs, Population Division
36. Global, regional, and national incidence, prevalence, and years lived with disability for 354 diseases and injuries for 195 countries and territories, 1990-2017: A systematic analysis for the global burden of disease study 2017
37. The economic burden of depression in the United States: How did it change between 1990 and
38. Settlement scaling and increasing returns in an ancient society
39. Settlement scaling and economic change in the central Andes
40. The pre-history of urban scaling
41. The relations among relatedness needs, subjective well-being, and depression of Korean elderly
42. The relationship between social support networks and depression in the
43. How does growing city size affect residents' happiness in urban China? A case study of the Bohai rim area
44. Subjective well-being mediates the effects of resilience and mastery on depression and anxiety in a large community sample of young and middle-aged adults
45. Feeling blue or turquoise? Emotional differentiation in major depressive disorder
46. Daily affective dynamics predict depression symptom trajectories among adults with major and minor depression
47. Positive affect and psychobiological processes relevant to health
48. The Science of Well-Being: The Collected Works of Ed Diener (Social Indicators Research Series)
49. Life satisfaction and depression in a 15-year follow-up of healthy adults
50. Well-being across America
51. The local structures of human mobility in Chicago
52. Neighborhood isolation in Chicago: Violent crime effects on structural isolation and homophily in inter-neighborhood commuting networks
53. "The hidden image of the city: Sensing community well-being from urban mobility" in Pervasive Computing
54. Accounting for confounding in observational studies
55. Minimum covariance determinant
56. Detecting correlation changes in multivariate time series: A comparison of four non-parametric change point detection methods
57. Scikit-learn: Machine learning in Python
58. Empirical study of topic modeling in Twitter
59. A thought in the park: The influence of naturalness and low-level visual features on expressed thoughts
60. Power comparisons of Shapiro-Wilk, Kolmogorov-Smirnov, Lilliefors and Anderson-Darling tests
61. Fitting linear mixed-effects models using lme4
62. stargazer: Well-formatted regression and summary statistics tables
63. The centred parameterization and related quantities of the skew-t distribution
64. Evidence and theory for lower rates of depression in larger U.S. urban areas. Open Science Framework

ACKNOWLEDGMENTS. This work was partially supported by NSF Grants DGE-1746045 (to K.E.S.), BCS-1632445 (to M.G.B.), and S&CC-1952050 (to M.G.B.). We additionally would like to thank Kaylah Thomas for helping us to understand the structure of the 2019 Twitter dataset.