key: cord-0898190-z6c68gdp
authors: Schurer, S.; Atalay, K.; Glozier, N.; Vera-Toscano, E.; Wooden, M.
title: Zero-COVID Policies: Melbourne's 112-Day Hard Lockdown Experiment Harmed Mostly Mothers
date: 2022-02-02
journal: nan
DOI: 10.1101/2022.01.30.22270130
sha: ebd36878b194edf07ff3ac46e4e83457a71356c8
doc_id: 898190
cord_uid: z6c68gdp

Lockdowns were used worldwide to mitigate the spread of SARS-CoV-2. We demonstrate that the 112-day hard lockdown in Melbourne, Australia, the longest among OECD jurisdictions, exclusively penalized families with young children. To identify the causal impact of lockdown, we interrogated nationally-representative longitudinal survey data and exploited quasi-experimental variation in Melbourne's lockdown, one that left other jurisdictions unaffected. Using difference-in-differences estimation, we found that, surprisingly, most vulnerable groups (the young, poor, lonely and those with previous mental health conditions) were left unscathed. However, we found mothers experienced significant, sizable declines in health and work hours, and increases in loneliness, despite feeling safer and being more active. Zero-COVID policies are not as harmful as may have been expected but came at high cost to mothers in society.

Of all the non-pharmaceutical interventions (NPI) introduced to mitigate the spread of SARS-CoV-2 (COVID- 19) , none has been so 'intrusive' and 'drastic' as the lockdown (1) . It is commonly understood as a restriction of the individual right to move freely which 'applies to all people', albeit some categorization systems label such mobility restrictions as 'partial lockdown', 'household confinement', or 'movements for non-essential activities forbidden' (2) . By the end of April 2020, half of humanity was in some form of lockdown: Almost 4 billion people across 90 countries were asked by their governments to stay at home (3) . By the end of 2021, some countries had ordered their residents into their eighth lockdown, while other places had experienced over 100 days sheltered in place.

Early on in the pandemic, stay-at-home orders with all their obvious personal costs, were accepted as an essential strategy to set up pandemic response systems and to prevent health services from being overwhelmed (4) (5) . During the second wave in 2020, however, an international scientific debate erupted (6) over whether community spread could only be controlled through maximum suppression (elimination), which would have required hard lockdowns and border closures (Zero-COVID strategy) (7) (8) , or whether less restrictive mitigation measures would achieve the same goal (9) . Evidence exists both in favor of elimination strategies to achieve their primary objective (10) (11) (12) (13) and against them (14) (15) (16) , while some proposed that less intrusive NPI may be more appropriate should infection numbers surge (1) .

However, Zero-COVID strategies cannot be used indefinitely, as lockdown requires many sacrifices from residents (17) . Such sacrifices include poorer mental health (18) (19) , increases in loneliness because of the social isolation (20) (21) (22) , and the abandonment of healthy or adoption of unhealthy behaviours to cope with that isolation. Importantly, a great concern was that lockdowns exacerbated social, economic and gender inequalities (23) (24) (25) (26) (27) , harming in causal impact. The reason is that lockdowns are not imposed exogenously. First, in many jurisdictions, such as the UK, the USA, and many European countries, lockdown was a lastresort policy measure, that occurred after a string of events, with the aim of flattening the curve of exponentially growing caseloads, patients in ICUs and infected humans dying from COVID- 19 . Thus, lockdown costs may be severely overestimated, as they are overshadowed by COVID-related trends of increasing morbidity and mortality. Second, lockdowns caused severe economic contraction, job loss, and business closures at the macroeconomic level. Thus, lockdown effects are potentially confounded by income shocks that threaten the financial sustainability of households. Third, lockdowns were implemented some time after lockdown announcements, which may have already changed behaviors pre-lockdown. This could lead to a severe underestimate of the effect of lockdown.

In this paper, we describe how we used methods from the economic policy evaluation literature (difference-in-difference estimation models) in combination with high-quality, nationally representative longitudinal survey data to overcome these identification issues. We studied a unique natural experiment that arose from Australia's second 2020 lockdown ( Fig.   1 ). On 9 July 2020, Australia's second largest city, Melbourne, which is also the capital of the state of Victoria, was locked down for 112 consecutive days, using an aggressive Zero-COVID strategy to return to zero infections (37-38), while other Australian jurisdictions remained open.

The Melbourne case is of paramount scientific importance. Melbourne endured the secondlongest consecutive hard lockdown in 2020 (after Buenos Aires) and holds the unenviable record of greatest cumulative time spent in lockdown worldwide. Melbourne's Zero-COVID achievement after this hard and long lockdown was hailed by the scientific community (39).

Thanks to this natural experiment, which provided a valid control group outside of Melbourne/Victoria, our study was the first to rigorously evaluate the causal impact of Fortuitously, during Melbourne's lockdown period, the 20 th wave of Australia's nationally representative Household, Income, and Labour Dynamics in Australia (HILDA) survey was undertaken (Fig. 1) . This household panel commenced in 2001 with a nationally representative sample of Australian households (see Methods and Materials). In line with data-collection protocols in previous years, almost all 2020 HILDA Survey participants (>93%) were interviewed between August and late October, which fell directly within the window of Melbourne's lockdown experiment. Another 3.4% were interviewed in November, a period that covers the potential spill-over effects of lockdown.

We used these data and knowledge of the exact rollout timing of the Melbourne lockdown to quantify the effect of lockdown on human life. We compared the outcomes in Melbourne (population 5.16 million) with the outcomes of Sydney (population 5.37 million), the most comparable city to Melbourne, which was not locked down. In the HILDA Survey data we had 9,441 individuals (n) or 60,108 person-year observations (pyo) who either live in Sydney or in Melbourne to tell the tale of two cities located at the extreme ends of available mobility restrictions.

From these data, we derived proxies that capture comprehensively and at the broadest level human life across four domains. For each domain, we identified three measures that have been widely used in previous public health and applied economic research (table S1) -(i) Health (all derived from the SF-36 inventory -see Materials and Methods): mental health, which refers to symptoms of anxiety and depression (0-100), general health (0-100), and bodily pain (0-100); (ii) Health behaviors: Body Mass Index (kg/m 2 ), frequency of physical activity (nonedaily), and frequency of alcohol consumption (none-daily); (iii) Perceptions of social connectedness: safety (very dissatisfied [0]-very satisfied [10] ), loneliness (strongly disagree [1] -strongly agree [7] ), and feeling part of the local community (very dissatisfied [0]-very satisfied [10] ); (iv) Labor supply and income: work hours (weekly, incl. 0), salary and wages from all jobs (weekly, in Australian Dollars), and government transfers excluding family benefits (weekly, in Australian Dollars). Time trends in these outcomes are reported in (figs. S1 and S2).

The choice of each outcome measure was based on three considerations: (i) it was consistently recorded annually for the past decade; (ii) its measurement period referred to the current time period, making it suitable for attributing its potential change to lockdown; and (iii) it can be assumed continuous so that it can be scaled to the same standardized unit, which facilitates comparisons of magnitude across outcomes. Sample sizes differed for each outcome but ranged between 45,528pyo (8,114n ) and 52,813pyo (8,932n) ( fig. S1 ). In addition, we All rights reserved. No reuse allowed without permission. perpetuity. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in The copyright holder for this this version posted February 2, 2022. ; https://doi.org/10.1101/2022.01.30.22270130 doi: medRxiv preprint analyzed three outcomes collected in a 2020 COVID-19 module (5713n), which asked about individuals' past COVID-19 experience (positive test=1, otherwise=0) and perceptions of risk of both infection and serious illness (both scaled between 0% and 100%). A simple before and during lockdown comparison of outcomes for Melbournians could disguise the possibility that outcome changes were caused by Australia's first and nation-wide lockdown, which would have also affected individuals in other jurisdictions, or other unobservable factors. For this reason, we opted to compare outcomes for individuals before and during Melbourne's lockdown period with individuals who resided in Sydney, the most comparable city to Melbourne, where residents were able to move freely. Table 1 shows that Sydneysiders also experienced a decline in mental health, from 72.5 to 70.6 points (P<0.001), and in perceptions of feeling part of a community, from 6.7 to 7.0 (P<0.001), while their BMIs increased from 25.7 to 26.1 (P=0.008). Although Sydneysiders did not change the frequency of physical activity as did Melbournians (P=0.247), their perceptions of safety increased from 8.1 to 8.2 (P=0.005). In stark contrast to Melbournians, Sydneysiders did not change their workhours (P=0.912), but they increased their incomes by 9% from $717.8 per week to $788.5 (P=0.005). Thus, the small income changes with workhours remaining constant were most likely due to an increase in the receipt of weekly government transfers by 37% from $61.6 to $84.1 (P=0.008).

These findings raised three questions: First, did lockdown cause the changes in the above identified dimensions of human life in Melbourne, or were these driven by other factors that All rights reserved. No reuse allowed without permission. perpetuity. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in The copyright holder for this this version posted February 2, 2022. ; https://doi.org/10.1101/2022.01.30.22270130 doi: medRxiv preprint also impacted human life in Sydney? Second, were the changes observed for Melbourne significantly larger in magnitude than the changes in Sydney? Third, if they were larger, were the differences large enough in magnitude to be socially and clinically relevant? b. Causal Impact of Lockdown

To answer these questions, we estimated a difference-in-differences model that compared outcomes in Melbourne before and during the lockdown with outcomes in Sydney during the same time period (see Methods and Materials). In this econometric model, we controlled for city-level differences and city-specific non-linear time trends in outcomes (since 2011). We furthermore controlled for factors that may have changed over time for individuals: age group, marital status, highest level of education, number of adults in households, number of children in household aged less than 25, and mode and month of interview. Controlling for the interview timing is potentially important, as lockdown may have shifted the interview timing of those residing in Melbourne in 2020. Although we found that the interview schedule distribution hardly changed from previous years, there was a small difference noteworthy for the 2020 interviews: The interviews in Melbourne were completed slightly earlier than in Sydney ( fig.   S2 ), while in 2019 rollout dates were identical. This likely reflects Melbournians, locked inside their homes, being easier to reach.

Given the longitudinal nature of the data, we were also able to control for unobserved individual factors that were permanent to the survey participant, but that may have affected outcomes or the way participants reported them. To make the estimates representative for the Australian population, survey-component specific sample weights were applied in each regression. Standard errors were clustered at the household level to adjust for repeated observations within households. All estimates were expressed in terms of standard deviations (sd) away from the standardized zero mean (beta coefficients).

The model yields a causal estimate of lockdown under the assumption that outcomes in Melbourne and Sydney would have had the same trend in the absence of lockdown. It is common in the literature to test this assumption within a so-called event-study framework, which presents estimated treatment effects before and after policy implementation. We demonstrated that there were no anticipation, or phase-in, effects immediately before the policy was implemented (2018-2019), although for some outcomes trends differed over the full ten years of data ( fig. S5 ). It is for this reason that we controlled flexibly for city-specific time trends (see Materials and Methods). All rights reserved. No reuse allowed without permission. perpetuity. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in As much of the discourse has focused upon the inequality of the burden of lockdown, we examined the heterogeneity in the treatment effect by gender, policy-relevant subgroups, (stratified by gender), and exposure length to lockdown. The policy relevant sub-groups were (table S2) : (i) whether the household had young children (a proxy for time pressure on and overwork for carers); (ii) whether the respondent lived alone (a proxy for lack of social interaction and loneliness); (iii) whether the respondent had a mental health problem in previous year (a proxy for health care needs and frailty); (iv) whether the household ranked in the bottom household income quintile (a proxy for poverty); and (v) whether the respondent lived in an apartment or semi-detached house (a proxy for lack of over-crowding). To estimate the potential cumulative effects of lockdown, we stratified analyses pragmatically by exposure to lockdown length at the time of the interview: <40 days, 40-70 days, 71-112 days, and days after lockdown was lifted (>112 days). In supplementary material, we present results by age groups.

This econometric model revealed that on average, lockdown had few significant impacts on any domain of human life ( fig. 2 , tables S3-S4). Regarding the health domain, lockdown led to a significant, but small decline in mental health for women (-0.105sd, P=0.043) and a similar, but only marginally significant, decline for men (-0.09sd, P=0.087). No significant effects were found for general health or bodily pain.

Regarding health behaviours, lockdown had no significant effects on BMI for either women (BMI: -0.041sd, P=0.201) or men (BMI: -0.038sd, P=0.206), but we found tentative evidence that lockdown increased the frequency of alcohol consumption of both men (0.071sd, P=0.120) and women (0.073sd, P=0.054), although the estimates had a large degree of uncertainty and were negligibly small in magnitude (<0.1sd ). Most notable was that lockdown significantly increased the frequency of physical activity for women (0.285sd, P<0.001).

In the social connectedness domain we found only one effect: lockdown significantly reduced feelings of being part of the local community for men (-0.204sd, P<0.001). Lockdown did not affect current salaries and wages or the amount of government transfers received, but it significantly reduced workhours by -0.121sd (P=0.004) for men and -0.129sd (P=0.006) for women.

To ensure that our results were not a statistical artefact, we conducted a series of robustness checks that are recommended for differences-in-differences models (see Materials and Methods). The treatment effects were not produced by the 267 survey respondent who were S7 ). It was also reassuring that we obtained no significant treatment effects when conducting a All rights reserved. No reuse allowed without permission. perpetuity. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in S8 ).

Lockdown had few significant impacts on the average population even when stratified by gender, and even when significant they were mostly small in magnitude (<0.2sd). It is possible, however, that average effects by gender disguised harm to specific groups of the population.

We thus re-estimated our model on each outcome by subgroups that we deemed to be most vulnerable to lockdown effects. This heterogeneity analysis revealed that the most vulnerable groups were mothers -women with children younger than age 15 (Fig. 3) . Mothers experienced statistically significant and medium-sized declines in mental health (-0.267sd, P= 0.014) and general health (-0.228sd, P=0.029), and medium-sized increases in feelings of loneliness (0.277sd, P=0.020). These negative effects occurred despite increases in feelings of safety (0.264sd, P=0.008) and the frequency of physical activity (0.282sd, P=0.016). Their workhours decreased by 0.223sd (P=0.022), without affecting current salaries (0.000sd, P= 0.998) or government transfers (0.035sd, P=0.787).

In contrast to mothers, fathers did not experience the same significant lockdown impacts.

The only significant impact among fathers was found for the frequency of alcohol consumption, which increased by 0.350sd (P=0.002), a medium-sized effect. Fathers also experienced a small drop in mental health (-0.18sd) and an increase in feelings of loneliness (0.16sd), but such effects were not statistically significant. Formal testing of equality of treatment effects between mothers and fathers showed that treatment effect sizes differed significantly in magnitude only for physical activity (P=0.032), feelings of safety (P=0.032), and marginally for alcohol consumption (P=0.075) (table S5). This suggests that fathers may have experienced ill mental health and feelings of loneliness, similar to mothers, although we have uncertainty about the significance of such effects.

We found significant lockdown penalties for men who lived in mobility-restricted dwellings during lockdown. These men felt less socially connected, drank more frequently alcohol, and were harmed in both their hours worked and weekly incomes. In figure 3 we reported lockdown treatment effects by vulnerable groups. We showed that lockdown predominantly harmed mothers with young children. Fathers were also affected to some degree, but not in a statistically significant way.

This does not mean that men were entirely unaffected by lockdown, but they may have been affected in different ways than women with children. The only systematic impact among men All rights reserved. No reuse allowed without permission. perpetuity. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in The copyright holder for this this version posted February 2, 2022. ; https://doi.org/10.1101/2022.01.30.22270130 doi: medRxiv preprint was found for men who lived during lockdown in mobility-restricted or smaller area dwellings.

Men who lived in apartments or semi-detached houses experienced a significant reduction in workhours (-0.166sd, P=0.024), weekly income (-0.171sd, P=0.040), and perceptions of being part of their community (-0.210sd, P=0.034). This group also increased with moderate effect size the frequency of alcohol consumption (0.206sd) although the effect was not statistically significant (P=0.130).

The question arises whether mobility-restricted living space caused such lockdown penalties, or whether men who live in such dwellings have other characteristics that may have made them more vulnerable economically, socially and behaviorally. As we found no systematic penalties for other men in vulnerable groups -eg the poor, the ones who live alone, or those with previous mental health problems -we can say that it was not poverty, loneliness or health issues that explained the penalties for men in mobility-restricted dwellings. Another possibility is that men who lived in mobility-restricted dwellings were relatively young and healthy, but lockdown harmed them because their lives were most interrupted socially and economically because of less experience in the labor market. Complementary analyses by age groups showed that declines in work hours were indeed driven by younger men (age<35).

However, income declines were driven by older men (age 55 or above). It was also both younger and older men who experienced significant declines in feeling part of the local community ( figure S9 ). Hence, there seemed to be heterogeneity among men who live in mobility-restricted dwellings.

Supplementary age group analyses revealed that, with the exception of alcohol use frequency which was driven by middle-aged men ( fig. S9E ), such effects occurred in the younger age groups (figs. S9I and S9J). We found no ill-health effects of lockdown for the younger or older age groups (figs. S9A and S9B).

We also examined whether the length of exposure to lockdown exacerbated the lockdown penalties or benefits. This dose-response analysis revealed that longer exposure to lockdown had no consistent effect across the outcomes; instead, we found heterogeneous trends over exposure time. Fig. 4 (women) and Fig. 5 (men) present our findings, stratified by the presence of young children. For presentation, treatment effects are reported only for selected outcomes and expressed in their original scale to highlight the magnitude of the effects. We tested for equality of estimates between groups at the beginning (<40 days) and the end of the lockdown (71-112 days) (table S6). Because of smaller sample sizes (e.g., the number of observations for All rights reserved. No reuse allowed without permission. perpetuity. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in 53n, outside of lockdown: 12n), our estimates had large confidence intervals, which affected the reliability of the statistical inference.

Surprisingly, the mental health penalties of lockdown for mothers faded over time: After plummeting by 6 points (-0.34sd) relative to 2019 values in the early days of lockdown, their mental health scores had returned to their 2019 levels by the end of the lockdown (Fig. 4A,   P=0 .108). In contrast, their general health worsened over the lockdown period. At the beginning of lockdown, we found no difference in general health relative to 2019 levels, but by the end of the lockdown general health was 6 points below 2019 levels (Fig. 4B, P=0 .030).

BMIs tended to increase for mothers, with insignificant effects at the beginning of lockdown but a 1 BMI point increase at the end of lockdown that was significantly different from the 2019 level. However, the difference in BMIs between beginning and the end of lockdown was not statistically significant (P=0.282). Workhours per week significantly declined over time as well, with zero treatment effects at the beginning and a 5-hour drop by the end (Fig. 4H , P=0.023). Other relevant outcomes were affected uniformly throughout lockdown (alcohol, physical activity, feeling safe).

For fathers, longer lockdown exposure had few additional effects. Their BMIs increased from beginning to the end of lockdown (P=0.045). As seen in mothers, fathers reduced their workhours, relative to 2019 levels, only at later stages of the lockdown. Their workhours at the beginning of lockdown were no different to 2019 levels, but fathers worked 5 hours less per week relative to the 2019 levels by the end of lockdown (P=0.033). Unlike mothers, the amount of government transfers increased from a zero difference to a difference of A$40 per week relative to 2019 levels, while differences of this treatment effect between beginning and end of lockdown were at the margin of significance (P=0.108).

We found more cumulative effects of lockdown for men without children. For them, general health (P=0.049), physical activity (P=0.027), and feelings of being part of a local community (P=0.058) progressively declined.

Our findings can only be interpreted as causal within the assumptions of the difference-anddifference model. One could argue that the common trend assumption may have been violated if we believed that the onset of the pandemic in Melbourne -roughly 100+ infections per day -could have caused anxiety about the prospects of getting seriously ill, sadness about the actual loss of loved ones, and direct health problems from COVID-19 infection. In supplementary All rights reserved. No reuse allowed without permission. perpetuity. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in The copyright holder for this this version posted February 2, 2022. ; https://doi.org/10.1101/2022.01.30.22270130 doi: medRxiv preprint analyses (table S7) we show that such concerns were largely unfounded. While Melbournian men and women were more likely than Sydneysiders to think that they would get infected by 15.2 (P<0.001) and 9.0% (P<0.001), respectively, which is consistent with expectations of an outbreak with community transmission, they did not have a higher probability of an actual COVID-19 infection at the time of the interview, nor did they say they were more likely to get seriously ill if infected.

We also ruled out that Melbournians entered the second lockdown period significantly more harmed financially by the first lockdown that affected all Australian residents. A difference-in-difference model, which used the same specification as our benchmark model, but used previous financial year incomes (which included incomes up until June 2020) as the outcome variable, showed that Melbournians did not experience greater declines in income relative to the previous financial year than Sydneysiders (table S7). We thus conclude that the quasi-experimental variation exploited in this study was unlikely to have been confounded by differences in pre-lockdown experiences.

We concluded from our analysis that Zero-COVID policies are not as harmful as may have been expected, but came at high cost to mothers in society, at least in the short run. Women with young children were the group that bore the brunt of the extended lockdown, with estimated penalties that were not small (>0.2sd). Why were mothers so much more strongly affected than other groups? One argument is that mothers carried a greater mental load than usual during lockdown. Mothers are known to work more hours in the household, even in the absence of lockdown. They are more responsible for scheduling, planning and organising family activities (39). This cognitive load is likely to become a mental load when this daily process is combined with stress and worry. Mothers during lockdown were more likely to take responsibility for home schooling, which would have added additional workload (31-33).

Mothers may also have been affected by the regular presence of their partners. We showed that fathers increased the frequency at which they drank alcohol during lockdown, a risky health behavior. This combined presence of drinking partners in combination with social isolation may have increased family discord, or exposed women to more violence in the home. Concerns over increased interpersonal violence were reported early in context of the costs of lockdown (41) and heightened case numbers were expected for Australia by July 2020 (42). However, our analysis confirmed that mothers unequivocally felt safer during lockdown. In supplementary analyses we were able to show that mothers and fathers were not significantly All rights reserved. No reuse allowed without permission. perpetuity. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in The copyright holder for this this version posted February 2, 2022. ; https://doi.org/10.1101/2022.01.30.22270130 doi: medRxiv preprint less satisfied with their partners (fig. S10). Thus, we ruled out that the harm experienced by mothers was driven by greater risks and partner dissatisfaction in the home. More likely, mothers were simply burdened by additional work hours in the home, which is consistent with a non-compensated, potentially voluntary, reduction in work hours which we observed in our data.

Can our findings be generalized? Our findings were representative of one city in the world which returned to zero new COVID-19 infections through a hard and, at the time, uniquely long lockdown, and which benefitted from relatively strong population compliance. It may be therefore hard to generalize our findings to other jurisdictions and lockdown settings. However, our study design enabled us to rule out the many confounding factors, and we have shown that for mothers the lockdown penalties were mainly present throughout lockdown. We would expect similar results in other countries where lockdown is accompanied by school closures, as was the case in Melbourne. Given the minimal effects observed in other reportedly vulnerable groups this suggests that policy and regulatory responses within lockdowns need to focus on mitigating the effects on mothers of school-aged children. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in All rights reserved. No reuse allowed without permission.

perpetuity.

preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in

The copyright holder for this this version posted February 2, 2022. ; https://doi.org/10.1101/2022.01.30.22270130 doi: medRxiv preprint Figure 2 . Lockdown treatment effects for 4 domains of human life (expressed in std. dev.), by sex Note: Presented are estimates (and 95% confidence intervals) obtained from separate difference-in-difference regressions of a specific outcome on a treatment group indicator. Treatment effects are interpreted as outcome change in Melbourne (2019-2020) relative to outcome change in Sydney (2019-2020), holding constant individual-specific effects, city-specific mean differences (base: Sydney), time trends (wave indicators), and city-specific time trends (specified as interactions between city indicator and wave indicator). Each regression applies sample weights to make the sample representative of the population, whereby sample weights are chosen either for the person or the self-completion questionnaire, depending on the outcome. Human impact is measured across four domains of human life: [1] Health (mental health, general health, bodily pain); [2] Health behaviors; [3] Social connectedness; [4] Household finances (see Table S1 , supplement, for definitions). Each outcome is standardized to mean 0 and standard deviation 1. Standard errors are clustered at the household level to account for the fact that all household members (aged 15 or above) were interviewed and that full households were mobility restricted in the same location. perpetuity.

preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in

The copyright holder for this this version posted February 2, 2022. ; 20 Figure 3 . Lockdown treatment effects (expr. in std. dev.), by sex and socially relevant sub-groups Note: Presented are estimates (and 95% confidence intervals) obtained from separate difference-in-difference regressions of a specific outcome on a treatment group indicator. The treatment effect is interpreted as standard deviation (SD) differences from mean outcome (standardized to 0). See table note of Figure 2 for details of model specification and Note: Presented are estimated treatment effects of lockdown for different exposure length to lockdown. Spikes represent 95% confidence intervals. Estimates are expressed in the original scale, obtained from separate difference-in-difference regressions of a specific outcome on a treatment group indicator. We used the same specification as the benchmark model (see Figure 2 notes), but the treatment indicator (wave 20 × Melbourne) is interacted with 4 dummy variables, each of which presents a different exposure length to lockdown: < 40 days, 40-70 days, 71-112 days, and the days after the lockdown was lifted. We provide significance levels of a test of equality of effects at beginning (<40 days) and last period of lockdown (71-112): + p<.15 * p<.10 ** p<.05. Figure 5 . Lockdown treatment effect (expressed in original unit) for men for selected outcomes, by length of exposure and family status Note: Presented are estimated treatment effects of lockdown for different exposure length to lockdown. Spikes represent 95% confidence intervals. Estimates are expressed in the original scale, obtained from separate difference-in-difference regressions of a specific outcome on a treatment group indicator. We used the same specification as the benchmark model (see Figure 2 notes), but the treatment indicator (wave 20 × Melbourne) is interacted with 4 dummy variables, each of which presents a different exposure length to lockdown: < 40 days, 40-70 days, 71-112 days, and the days after the lockdown was lifted. We provide significance levels of a test of equality of effects at beginning (<40 days) and last period of lockdown (71-112): + p<.15 * p<.10 ** p<.05. i

Information on Australia's pandemic strategy and rollout dates of lockdowns is available in a report of the Australian Institute of Health & Welfare (1) ( Table 1. 2) and a report by the Fair Works Commission (2) . The JobKeeper program, a fiscal stimulus package that was rolled out early during the first lockdown in Australia, is available in (3). Melbourne's lockdown was facilitated by both voluntary and mandatory mobility restrictions, according to Facebook Data for Good to assess mobility trends in Victoria (4). On 1 July, lockdowns were announced for specific postcodes, decreasing mobility substantially in Melbourne (−29%). Stage 4 restrictions were introduced in Greater Melbourne on 1 August. Mobility decreased substantially by −52%.

Mobility was significantly less restricted outside the Greater Melbourne area, and it is for this reason we focus the analysis on Greater metropolitan areas (-15% on 1 July 2021 and -34% on 1 Aug 2020, see (4)).

We sourced data from the HILDA Survey. This household panel commenced in 2001 with a nationally representative sample of Australian households (5) The first-wave sample comprised 13,969 participants from 7,682 households, which was then followed on an annual basis with all members of those initial households aged 15 years or older, or persons who joined the household later. Individuals gave oral informed consent for participation in the study. Additionally, consent was sought from parents or guardians before seeking the involvement of household members aged less than 18 years. Ethics approval was granted by the Human Research Ethics Committee of the University of Melbourne in 2001 and has been updated or renewed on annual basis since (ID no. 1955879).

In 2011, the sample was refreshed with an additional 2,153 households. Sample loss and attrition were low, with re-interview rates rising from 87% in wave 2 to over 95% by wave 8 and remaining above that level in subsequent waves (6) .

The data collection process begins every year in August and ends in the following February. Prior to wave 20, more than 90% of interviews were conducted face-to-face. During the pandemic year, 96% of all person interviews in wave 20 were conducted over the phone.

In addition to an individual interview, respondents were asked to complete a separate selfcompletion questionnaire (SCQ). Prior to wave 20, this was a paper form that was either collected by the interviewer or returned in the mail. In wave 20, an online option was provided, without affecting return rates (in 2020, 91.9%; in 2019, 92.1%).

We use 10 waves of data spanning the period 2011 to 2020. After restricting this sample to individuals living in either metropolitan Melbourne or Sydney, the maximum sample size comprised 9,441 individuals (n) or 60,108 person-year observations (pyo) ( ii We derived from the Medical Outcomes Study Short Form (SF-36 inventory) three measure of health, one of the most widely used self-completion measures of health. These are Mental Health, General Health and Bodily Pain. Previous research has shown that these items in HILDA have good psychometric properties (7) . These measures were collected in HILDA in the Sefl-Completion Questionnaire, starting in Wave 1.

We used methods from the policy evaluation literature (8) (9) (10) to estimate the causal impact of lockdown on human life outcomes. We exploited the fact that lockdown was implemented only in Melbourne/Victoria and nowhere else in the country, and that individuals could not leave Melbourne/Victoria easily to avoid lockdown rules. The treatment group comprises of individuals who were located in the greater metropolitan area of Melbourne during lockdown. The control group comprised of individuals who were located in the greater metropolitan area of Sydney during the time when Melbourne was locked down. We chose city areas as focus of the analysis, because lockdowns were more likely to be enforced strictly in those areas.

Our models are based on a difference-in-difference specification, which compares changes over time in the treatment group with changes over time in the control group, while holding time trends in outcomes constant in both treatment and control group. We regress outcome Yit on control variables and the treatment indicator:

where equals 1 if the respondent lived in Melbourne in year "t" and 0 in Sydney. includes year fixed effects leaving 2019 as the base category. Our coefficient of interest is included in 3 which reports the interaction between our "treatment" variable (i.e., being Melbourne) and the wave (i.e., time) dummies. For example, if the longer lockdown has any impact on mental health such interaction should come out significant at year 2020. Additionally, in order to comply with the parallel trends assumption, the interaction coefficient should not be significant in the waves prior to the pandemic. Otherwise, this would indicate that mental health trends had already followed different trajectories across cities prior to the divergence in containment policies.

is a vector of control variables including age as categorical variable, number of individuals, number of children, education, marital status (all dummy variables). We also control, the month and mode of interview.

As an alternative specification, we estimate a fixed effect model in which we defined treatment group as people live in Melbourne in 2020 and control group as people live in Sydney in 2020. Formally, we estimate

where equals 1 if the respondent lived in Melbourne in 2020 (when HILDA collected wave 20 information) and 0 in Sydney. Note that in this model, it is not necessary to include the treatment dummy ( ) alone since it is fixed over time and therefore will be absorbed by the individual fixed effects.

includes year fixed effects leaving 2019 as the base category. In this model the parameter of interest is 2 . Our results from this model are similar to the above model presented in the paper. All rights reserved. No reuse allowed without permission. perpetuity. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in iii All models use survey-component specific sample probability weights to make the samples representative for the population in each wave. Standard errors are clustered at the household level, to adjust for non-independent observations per household.

All estimates are reported as standardized coefficients, which is interpreted as a percent standard deviation difference in outcomes before and during lockdown for the treatment group relative to the control group.

As is common in the difference-in-difference literature, one needs to test the common trend assumptions of the model. This can be done by presenting so-called event-study graphs, alternative parametrization of trends for treatment and control groups, and so-called placebo tests (10). All rights reserved. No reuse allowed without permission.

perpetuity.

preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in (2) we reported the standard error of the interaction effect. In column (3) we reported are p-values of a test of equality of estimated coefficients between women and men with children, namely that lockdown effects were the same for fathers and mothers. The estimates are obtained from a difference-indifference model in which we regressed a specific outcome on a treatment group indicator of having been located in Melbourne during lockdown in 2020. Treatment effects are interpreted as outcome change in Melbourne (2019-2020) relative to outcome change in Sydney (2019-2020), holding constant individual-specific effects, city-specific mean differences (base: Sydney), time trends (wave indicators), and city-specific time trends (specified as interactions between city indicator and wave indicator). Each regression applies sample weights to make the sample representative of the population, whereby sample weights are chosen either for the person or the self-completion questionnaire, depending on the outcome.

All rights reserved. No reuse allowed without permission.

perpetuity.

preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in . Treatment effects are interpreted as outcome change in Melbourne (2019-2020) relative to outcome change in Sydney (2019-2020), holding constant individual-specific effects, cityspecific mean differences (base: Sydney), time trends (wave indicators), and city-specific time trends (specified as interactions between city indicator and wave indicator). Each regression applies sample weights to make the sample representative of the population, whereby sample weights are chosen either for the person or the self-completion questionnaire, depending on the outcome.

All rights reserved. No reuse allowed without permission.

perpetuity.

preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in Figure 2 for model specification. Standard errors in parentheses * p < .10, ** p < 0.05, *** p < 0.01 All rights reserved. No reuse allowed without permission.

perpetuity.

preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in

The copyright holder for this this version posted February 2, 2022. ; Figure S2 . Rollout dates of personal interview and self-completion questionnaire (SCQ), 2019 versus 2020 Note: The self-completion questionnaire (SCQ) in HILDA is either filled out online/in-paper during the scheduled interview, or before or after the interview. .8 1 01mar2020 01may2020 01jul2020 01sep2020 01nov2020 01jan2021 01mar2021

[D] SCQ Rollout 2020

Melbourne Sydney Figure S5 . Test of the common trend assumption necessary for reliable difference-in-difference estimates of the treatment effect (expressed in std. dev.), by sex [E] Freq. of Alc. Con. Table S1 , supplement, for definitions). Each outcome is standardized to mean 0 and standard deviation 1. Standard errors are clustered at the household level to account for the fact that all household members (aged 15 or above) were interviewed and that full households were mobility restricted in the same location.

-.3

-. perpetuity.

preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in

The copyright holder for this this version posted February 2, 2022. ; https://doi.org/10.1101/2022.01.30.22270130 doi: medRxiv preprint xxxv Figure S7 . Robustness check: Use as control group Brisbane, Perth and Adelaide instead of Sydney Note: Presented are estimates (and 95% confidence intervals) obtained from separate difference-in-difference regressions of a specific outcome on a treatment group indicator. Treatment effects are interpreted as outcome change in Melbourne (2019-2020) relative to outcome change in Perth, Adelaide, and Brisbane (2019-2020), holding constant individual-specific effects, city-specific mean differences, time trends (wave indicators), and city-specific time trends (specified as interactions between city indicator and wave indicator). Each regression applies sample weights to make the sample representative of the population, whereby sample weights are chosen either for the person or the self-completion questionnaire, depending on the outcome. Human impact is measured across four domains of human life: [1] Health (mental health, general health, bodily pain); [2] Health behaviors; [3] Social connectedness; [4] Household finances (see Table S1 , supplement, for definitions). Each outcome is standardized to mean 0 and standard deviation 1. Standard errors are clustered at the household level to account for the fact that all household members (aged 15 or above) were interviewed and that full households were mobility restricted in the same location. Lockdown effect (Std) [1] Health [2] Health behavior [3] Social connect. perpetuity.

preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in

The copyright holder for this this version posted February 2, 2022. ; https://doi.org/10.1101/2022.01.30.22270130 doi: medRxiv preprint xxxvi Figure S8 . Robustness check: Placebo treatment -use Sydney as treatment group and Perth, Adelaide, and Brisbane as control groups Note: This is a so-called placebo test: We should find no treatment effects in this set up. Presented are estimates (and 95% confidence intervals) obtained from separate difference-in-difference regressions of a specific outcome on a treatment group indicator. Treatment effects are interpreted as outcome change in Sydney (2019-2020) relative to outcome changes in Perth, Adelaide, and Brisbane (2019-2020), holding constant individual-specific effects, city-specific mean differences, time trends (wave indicators), and city-specific time trends (specified as interactions between city indicator and wave indicator). Each regression applies sample weights to make the sample representative of the population, whereby sample weights are chosen either for the person or the self-completion questionnaire, depending on the outcome. Human impact is measured across four domains of human life: [1] Health (mental health, general health, bodily pain); [2] Health behaviors; [3] Social connectedness; [4] Household finances (see Table S1 , supplement, for definitions). Each outcome is standardized to mean 0 and standard deviation 1. Standard errors are clustered at the household level to account for the fact that all household members (aged 15 or above) were interviewed and that full households were mobility restricted in the same location. Lockdown effect (Std) [1] Health [2] Health behavior [3] Social connect. perpetuity.

preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in

The copyright holder for this this version posted February 2, 2022. ; https://doi.org/10.1101/2022.01.30.22270130 doi: medRxiv preprint xxxvii Figure S9 . Lockdown treatment effects (in std. dev.), by age groups Note: Presented are difference-in-difference estimates and 95% confidence intervals of the Melbourne lockdown effect, separately estimated for each age group (age <25, 25-34, 35-44, 45-54, 55-64, 65+). Treatment effects are interpreted as outcome change in Melbourne (2019-2020) relative to outcome change in Sydney (2019-2020), holding constant individual-specific effects, city-specific mean differences (base: Sydney), time trends (wave indicators), and city-specific time trends (specified as interactions between city indicator and wave indicator). Each regression applies sample weights to make the sample representative of the population, whereby sample weights are chosen either for the person or the self-completion questionnaire, depending on the outcome. Human impact is measured across four domains of human life: [1] Health (mental health, general health, bodily pain); [2] Health behaviors (); [3] Social connectedness; [4] Household finances (see Table S1 , supplement, for definitions). Each outcome is standardized to mean 0 and standard deviation 1. Standard errors are clustered at the household level to account for the fact that all household members (aged 15 or above) were interviewed and that full households were mobility restricted in the same location. 

All rights reserved. No reuse allowed without permission.

perpetuity.

preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in

The copyright holder for this this version posted February 2, 2022. ; https://doi.org/10.1101/2022.01.30.22270130 doi: medRxiv preprint xxxviii Figure S10 . Lockdown treatment effect on satisfaction with partners (original scale), by length of exposure and family status Note: Presented are estimated treatment effects of lockdown for different exposure length to lockdown. Spikes represent 95% confidence intervals. The outcome variable is Satisfaction with partner, where the score can rank between 0 [low] and 10 [high]. Estimates are expressed in the original scale, obtained from separate difference-in-difference regressions of the outcome on treatment group indicators. We used the same specification as the benchmark model (see Figure 2 notes), but the treatment indicator (wave 20 × Melbourne) is interacted with 4 dummy variables, each of which presents a different exposure length to lockdown: < 40 days, 40-70 days, 71-112 days, and the days after the lockdown was lifted. perpetuity.

preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in

The copyright holder for this this version posted February 2, 2022. ; https://doi.org/10.1101/2022.01.30.22270130 doi: medRxiv preprint

Ranking the effectiveness of worldwide COVID-19 government interventions

A structured open dataset of government interventions in response to COVID-19

Coronavirus: Half of humanity on lockdown in 90 countries

Inferring change points in the spread of COVID-19 reveals the effectiveness of interventions

Estimating the effects of non-pharmaceutical interventions on COVID-19 in Europe

Herd immunity for COVID-19

Scientific consensus on the COVID-19 pandemic: We need to act now

Elimination could be the optimal response strategy for covid-19 and other emerging pandemic diseases

Covid-19: Group of UK and US experts argues for "focused protection" instead of lockdowns

Impact of lockdown on Covid-19 case fatality rate and viral mutations spread in 7 countries in Europe and North America

SARS-CoV-2 elimination, not mitigation, creates best outcomes for health, the economy, and civil liberties

When do shelter-in-place orders fight COVID-19 best? Policy heterogeneity across states and adoption time

) achieving elimination of community transmission of SARS-CoV-2

Evaluating the effects of shelter-in-place policies during the COVID-19 pandemic

Impact of lockdown on COVID-19 prevalence and mortality during 2020 pandemic: Observational analysis of 27 countries

Government mandated lockdowns do not reduce COVID-19 deaths: Implications for evaluating the stringent New Zealand response

Can 'zero COVID' countries continue to keep the virus at bay once they reopen? Successful strategies used in Asia and the Pacific may not be sustainable in the long run

A systematic review and meta-analysis of longitudinal cohort studies comparing mental health before versus during the COVID-19 pandemic

Economic hardship and mental health complaints during COVID-19

The psychological impact of quarantine and how to reduce it: rapid review of the evidence

Loneliness, worries, anxiety, and precautionary behaviours in response to the COVID-19 pandemic: A longitudinal analysis of 200,000 Western and Northern Europeans

COVID-19, lockdowns and well-being: Evidence from Google Trends

When lockdown policies amplify social inequalities in COVID-19 infections: evidence from a cross-sectional population-based survey in France

Labor Supply & Income Sydney vs Melbourne: Mean Outcome for Females Note: All outcomes are in its original unit

Frequency of alcohol consumption

Feeling Safe 0-10

Note: Estimates (and 95% confidence intervals) obtained from difference-in-difference regressions of a specific outcome on a treatment group indicator. Each outcome is standardized to mean 0 and standard deviation 1. Depicted are differences in trends in Melbourne relative to 2019 and relative to Sydney from 2011 until 2020. For ease of illustration

[E] Freq. Alcohol Con. [E] Freq. Alcohol Con.