key: cord-0567824-lyntvazf
authors: Ralph, Paul; Baltes, Sebastian; Adisaputri, Gianisa; Torkar, Richard; Kovalenko, Vladimir; Kalinowski, Marcos; Novielli, Nicole; Yoo, Shin; Devroey, Xavier; Tan, Xin; Zhou, Minghui; Turhan, Burak; Hoda, Rashina; Hata, Hideaki; Robles, Gregorio; Fard, Amin Milani; Alkadhi, Rana
title: Pandemic Programming: How COVID-19 affects software developers and how their organizations can help
date: 2020-05-03
journal: nan
DOI: nan
sha: 06ea3b3da4c140c0047802cfa981dfaa856aa234
doc_id: 567824
cord_uid: lyntvazf

Context. As a novel coronavirus swept the world in early 2020, thousands of software developers began working from home. Many did so on short notice, under difficult and stressful conditions. Objective. This paper seeks to understand the effects of the pandemic on developers' wellbeing and productivity. Method. A questionnaire survey was created mainly from existing, validated scales. The questionnaire ran in 12 languages, with region-specific advertising strategies. The data was analyzed using non-parametric inferential statistics and structural equation modeling. Results. The questionnaire received 2225 usable responses from 53 countries. Factor analysis supported the validity of the scales and the structural model achieved a good fit (CFI = 0.961, RMSEA = 0.051, SRMR = 0.067). Findings include: (1) developers' wellbeing and productivity are suffering; (2) productivity and wellbeing are closely related; (3) disaster preparedness, fear related to the pandemic and home office ergonomics all affect wellbeing or productivity; (4) women, parents and people with disabilities may be disproportionately affected. Conclusions. To improve employee productivity, software companies should focus on maximizing employee wellbeing and improving the ergonomics of employees' home offices. Women, parents and disabled persons may require extra support.


Keywords Software development · Work from home · Crisis management · Disaster management · Emergency management · Wellbeing · Productivity · COVID-19 · Pandemic · Questionnaire · Structural equation modeling

1 Introduction

In December 2019, a novel coronavirus disease emerged in Wuhan, China. While symptoms vary, COVID-19 often produces fever, cough, shortness of breath, and in some cases, pneumonia and death. By April 30, 2020, the World Health Organization (WHO) had recorded more than 3 million confirmed cases and 217,769 deaths (WHO, 2020a). With widespread transmission in 214 countries, territories or areas, the WHO declared it a Public Health Emergency of International Concern (WHO, 2020b), and many jurisdictions declared states of emergency or lockdowns (Kaplan et al., 2020). Many technology companies told their employees to work from home (Duffy, 2020).

Quarantine work !== Remote work. I've been working remotely with success for 13 years, and I've never been close to burn out. I've been working quarantined for over a month and I'm feeling a tinge of burn out for the first time in my life. Take care of yourself folks. Really.

-Scott Hanselman (April 19, 2020)

Thinking of this situation as a global natural experiment in working from home (the event that would prove once and for all that working from home, well, works) would be naïve. This is not normal working from home. This is working from home, unexpectedly, during an unprecedented crisis. The normal benefits of working from home do not apply (Donnelly and Proctor-Thomson, 2015). Rather than working in a remote office or well-appointed home office, some people are working in impromptu setups in bedrooms, at kitchen tables and on sofas while partners, children, siblings, parents, roommates, and pets distract them. Others are spending all day alone in a studio or one-bedroom apartment. With schools and daycare closed, many parents must juggle work with not only childcare but also home schooling or monitoring remote schooling activities and keeping children engaged. Some professionals have the virus or are caring for family members with the virus.

While numerous studies have investigated remote work, few investigate working from home during disasters. There are no modern studies of working from home during a pandemic of this magnitude because there has not been a pandemic of this magnitude since before the World Wide Web existed. Therefore, software companies have limited evidence on which to base attempts to support their workers through this crisis, which raises the following research question.

Research question: How is working from home during the COVID-19 pandemic affecting software developers' emotional wellbeing and productivity?

In addressing this question, this paper generates and evaluates a theoretical model for explaining and predicting changes in wellbeing and productivity while working from home during a crisis. Moreover, we provide recommendations for professionals and organizations to support employees who are working from home due to COVID-19 or future disasters. This paper is organized as follows. Section 2 provides needed background on pandemics, productivity and working from home. Section 3 introduces our hypotheses and nomological model. Section 4 describes the design of the pandemic programming questionnaire and our sampling strategy. Next, we describe our analysis and results (Section 5), followed by the study's limitations and implications (Section 6). Section 7 concludes the paper with a summary of its contributions.

To fully understand this study, we need to review three areas of related work: (1) pandemics, bioevents and disasters; (2) working from home; and (3) productivity.

Madhav et al. (2017) define pandemics as "large-scale outbreaks of infectious disease over a wide geographic area that can greatly increase morbidity and mortality and cause significant economic, social, and political disruption" (p. 35). Pandemics can be very stressful not only for those who become infected but also for those caring for the infected and worrying about the health of themselves, their families and their friends (Kim et al., 2015; Prati et al., 2011). In a recent poll, "half of Canadians (50%) report[ed] a worsening of their mental health" during the COVID-19 lockdown (ARI, 2020).

A pandemic can be mitigated in several ways, including social distancing (Anderson et al., 2020). "Social distancing refers to a set of practices that aim to reduce disease transmission through physical separation of individuals in community settings" (Rebmann, 2009, p. 120-14). It may include public facility shutdowns, home quarantine, cancelling large public gatherings, working from home or reducing the number of workers in the same place at the same time, maintaining a distance of at least 1.5-2 m between people, and other attempts to reduce contact rate (Rebmann, 2009; Anderson et al., 2020).

The extent to which individuals comply with recommendations varies significantly and is affected by many factors. People are more likely to comply when they have more self-efficacy (that is, confidence that they can stay at home or keep working during the pandemic) and when they perceive the risks as high (Teasdale et al., 2012). However, this "threat appraisal" depends on: the psychological process of quantifying risk; sociocultural perspectives (e.g. one's worldview and beliefs; how worried one's friends are); "illusiveness of preparedness" (e.g. fatalistic attitudes and denial); beliefs about who is responsible for mitigating risks (e.g. individuals or governments); and how prepared one feels (Yong et al., 2017; Yong and Lemyre, 2019; Prati et al., 2011).

People are less likely to comply when they are facing loss of income, personal logistical problems (e.g. how to get groceries), isolation, and psychological stress (e.g. infection fears, boredom, frustration, stigma) (DiGiovanni et al., 2004). Barriers to following recommendations include job insecurity, lack of childcare, guilt and anxiety about work not being completed, and personal cost of following government advice (Teasdale et al., 2012; Blake et al., 2010).

For employees, experiencing negative life events such as disasters is associated with absenteeism and lower quality of workdays (North et al., 2010). Employers therefore need work-specific strategies and support for their employees. Employers can give employees a sense of security and help them return to work by continuing to pay full salaries on time, reassuring employees that they are not going to lose their jobs, having flexible work demands, implementing an organized communication strategy, and ensuring access to utilities (e.g. telephone, internet, water, electricity, sanitation) and organisational resources (North et al., 2010; Donnelly and Proctor-Thomson, 2015; Blake et al., 2010).

Work-specific strategies and support are also needed to ensure business continuity and survival. The disruption of activities in disasters simultaneously curtails revenues and reduces productive capacity because of ambiguity and shifting priorities among individuals, organizations and communities (Donnelly and Proctor-Thomson, 2015). As social distancing closes worksites and reduces commerce, governments face increased economic pressure to end social distancing requirements prematurely (Loose et al., 2010). Maintaining remote workers' health and productivity is therefore important for maintaining social distancing as long as is necessary (Blake et al., 2010).

Pérez et al. (2004) define teleworking (also called remote working) as "organisation of work by using information and communication technologies that enable employees and managers to access their labour activities from remote locations, such as home-based teleworking, mobile teleworking, and telecenters or teleworking centers" (p. 280). Telework can help restore and maintain operational capacity and essential services during and after disasters (Blake et al., 2010), especially when workplaces are inaccessible. Indeed, many executives are already planning to shift "at least 5% of previously on-site employees to permanently remote positions post-COVID 19" (Lavelle, 2020).

However, many organisations lack appropriate plans, supportive policies, resources or management practices for home-based telework. In disasters such as pandemics, where public facilities are closed and people are required to stay at home, employees' experience and capacity to work can be limited by the lack of a dedicated workspace at home, by caring responsibilities and by a lack of organisational resources (Donnelly and Proctor-Thomson, 2015).

In general, working from home is often claimed to improve productivity (Davenport and Pearlson, 1998; McInerney, 1999; Cascio, 2000), and teleworkers consistently report increased perceived productivity (Duxbury et al., 1998; Baruch, 2000). Interestingly, Baker et al. (2007) found that organisational and job-related factors (e.g. management culture, human resources support, structure of feedback) are more likely to affect teleworking employees' satisfaction and perceived productivity than work styles (e.g. planning vs. improvising) and household characteristics (e.g. number of children).

However, individuals' wellbeing while working remotely is influenced by their emotional stability (that is, a person's ability to control their emotions when stressed). Working from home gives people with high emotional stability more autonomy and fosters their wellbeing. In contrast, working from home can exacerbate physical, social and psychological strain in employees with low emotional stability (Perry et al., 2018), and the COVID-19 pandemic has not been good for emotional stability (ARI, 2020).

Research on working from home has been criticized for reliance on self-reported perceived productivity, which may inflate the benefits of working from home (Bailey and Kurland, 2002); however, objective measures often lack construct validity (Ralph and Tempero, 2018), and perceived productivity correlates well with managers' appraisals (Baruch, 1996). (The perceived productivity scale we use below correlates well with objective performance data; cf. Section 4.3.)

Productivity is the amount of work done per unit of time. Measuring time is simple, but quantifying the work done by a software developer is not. Some researchers (e.g. Jaspan and Sadowski, 2019) argue for using goal-specific metrics; others reject the whole idea of measuring productivity (e.g. Ko, 2019), not least because people tend to optimize for whatever metric is being used, a phenomenon known as Goodhart's law (Goodhart, 1984; Chrystal and Mizen, 2003).

Furthermore, simple productivity measures such as counting the number of commits or the number of modified lines of code in a certain period suffer from low construct validity (Ralph and Tempero, 2018). The importance and added value of a commit does not necessarily correlate with its size. Similarly, some developers might prefer very dense one-line solutions, while others like to spread their contributions over several lines. Comparing these two styles by counting lines of code would yield biased results. Nevertheless, large companies including Microsoft still use controversial metrics such as number of pull requests as a "proxy for productivity" (Spataro, 2020), and individual developers also use them to monitor their own performance (Baltes and Diehl, 2018). Numerous time-tracking tools exist for that purpose, some specifically tailored for software developers. While researchers have adapted existing scales to measure related phenomena like happiness (e.g. Graziotin and Fagerholm, 2019), there is no widespread consensus about how to measure developers' productivity or its main antecedents. Many researchers use simple, unvalidated productivity scales; for example, Meyer et al. (2017) used a single question asking participants to rate themselves from "not productive" to "very productive." (The perceived productivity scale we use below has been repeatedly validated in multiple domains; cf. Section 4.3.)
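A small sketch can make the line-count bias concrete. The task (summing the squares of the even numbers in a list), the two implementations and the `count_loc` and `run` helpers are hypothetical illustrations and do not come from the study. Both versions compute the same result, yet a naive non-blank-line count rates one as six times "larger" than the other.

```python
# Two functionally equivalent (hypothetical) solutions to the same task:
# sum the squares of the even numbers in a list.
DENSE = "def f(xs): return sum(x * x for x in xs if x % 2 == 0)\n"

VERBOSE = """\
def f(xs):
    total = 0
    for x in xs:
        if x % 2 == 0:
            total += x * x
    return total
"""


def count_loc(source):
    """Naive lines-of-code metric: number of non-blank lines."""
    return sum(1 for line in source.splitlines() if line.strip())


def run(source, data):
    """Execute a source string that defines f, then return f(data)."""
    namespace = {}
    exec(source, namespace)
    return namespace["f"](data)


if __name__ == "__main__":
    print(count_loc(DENSE), count_loc(VERBOSE))                  # 1 6
    print(run(DENSE, [1, 2, 3, 4]), run(VERBOSE, [1, 2, 3, 4]))  # 20 20
```

The identical outputs alongside the different line counts show that the lines-of-code metric measures formatting style rather than the work accomplished.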

The discussion above suggests numerous hypotheses, as follows. Although our survey was open to all software professionals, we hypothesize about "developers" because most respondents were developers (see Section 5.3). These hypotheses were generated contemporaneously with questionnaire design, before data collection began.

Hypothesis H1: Developers will have lower wellbeing while working from home due to COVID-19. Stress, isolation, travel restrictions, business closures and the absence of educational, child care and fitness facilities all take a toll on those working from home. Indeed, a pandemic's severity and the uncertainty and isolation it induces create frustration, anxiety and fear (Taha et al., 2014; DiGiovanni et al., 2004; Teasdale et al., 2012). It therefore seems likely that many developers will be experiencing reduced emotional wellbeing.

Hypothesis H2: Developers will have lower perceived productivity while working from home due to COVID-19. Similarly, stress, moving to an impromptu home office, and lack of child care and other amenities may have a negative impact on many developers' productivity. Many people are likely more distracted by the people they live with and by their own worrisome thoughts. People tend to experience lower motivation, productivity and commitment while working from home in a disaster situation (Donnelly and Proctor-Thomson, 2015).

Assuming Hypotheses H1 and H2 are supported, we propose a model that explains and predicts changes in wellbeing and productivity (Figure 1). Hypotheses H1 and H2 are encapsulated in the change in wellbeing and change in perceived productivity constructs. The model only makes sense if wellbeing and productivity have changed since developers began working from home.

Hypothesis H3: Change in wellbeing and change in perceived productivity are directly related. We expect wellbeing and productivity to exhibit reciprocal causality. That is, as we feel worse, we become less productive, and feeling less productive makes us feel even worse, in a downward spiral. Many studies show that productivity and wellbeing covary (cf. Dall'Ora et al., 2016). Moreover, Evers et al. (2014) found that people with increasing health risks have lower wellbeing and higher dissatisfaction with life, leading to higher rates of depression and anxiety. Conversely, decreasing health risks increase physical and emotional wellbeing and productivity.

Hypotheses H4 and H5: Disaster preparedness is directly related to change in wellbeing and change in perceived productivity. Disaster preparedness is the degree to which a person is ready for a natural disaster. It includes behaviors like having an emergency supply kit and complying with directions from authorities. We expect lack of preparedness for disasters in general and for COVID-19 in particular to exacerbate reductions in wellbeing and productivity, and vice versa (cf. Paton, 2008; Donnelly and Proctor-Thomson, 2015).

Hypotheses H6 and H7: Fear (of the pandemic) is inversely related to change in wellbeing and change in perceived productivity. Fear is a common reaction to bioevents like pandemics. Emerging research on COVID-19 is already showing a negative effect on wellbeing, particularly anxiety (Harper et al., 2020; Xiang et al., 2020). Meanwhile, fear of infection and public health measures cause psychosocial distress, increased absenteeism and reduced productivity (Shultz et al., 2016; Thommes et al., 2016).

Hypotheses H8 and H9: Home office ergonomics is directly related to change in wellbeing and change in perceived productivity. Here we use ergonomics in its broadest sense: the degree to which an environment is safe, comfortable and conducive to the tasks being completed in it. We are not interested in measuring the angle of a developer's knees and elbows, but in a more general sense of their comfort. Professionals with more ergonomic home offices should have greater wellbeing and be more productive. Donnelly and Proctor-Thomson (2015) found that the availability of a dedicated workspace at home, living circumstances, and the availability of organisational resources relate to employees' capacity to return to work after a disaster and to their productivity.

Hypothesis H10: Disaster preparedness is inversely related to fear (of the pandemic). It seems intuitive that the more prepared we are for a disaster, the more resilient and less afraid we will be when the disaster occurs. Indeed, Ronan et al.'s (2015) systematic review found that programs for increasing disaster preparedness had a small- to medium-sized negative effect on fear. People who have high self-efficacy and response-efficacy (i.e. perceive themselves as ready to face a disaster) will be less afraid (Roberto et al., 2009).

On March 19, 2020, the first author conceived of a survey to investigate how COVID-19 affects developers and recruited the second and third authors to help. We created the questionnaire, and Dalhousie University's research ethics board approved it in less than 24 hours. We began data collection on March 27th. We then recruited authors 5 through 17, who translated and localized the questionnaire into Arabic, (Mandarin) Chinese, English, French, Italian, Japanese, Korean, Persian, Portuguese, Spanish, Russian and Turkish, and created region-specific advertising strategies. Translations launched between April 5 and 7, and we completed data collection on April 14. Next, we recruited the fourth author to assist with the data analysis, which was completed on April 29th. The manuscript was prepared primarily by the first four authors with edits from the rest of the team. This section details our approach and instrumentation.

A comprehensive replication package including our (anonymous) dataset, instruments and analysis scripts is stored in the Zenodo open data archive.

This study's target population is software developers anywhere in the world who switched from working in an office to working from home because of COVID-19. Of course, developers who had been working remotely before the pandemic and developers who continued working in offices throughout the pandemic are also important, but this study is about the switch, and the questions are designed for people who switched from on-site to at-home work. In principle, the questionnaire was open to all sorts of software professionals, including designers, quality assurance specialists, product managers, architects and business analysts, but we are mainly interested in developers and our marketing focused on software developers; therefore, most respondents identify as developers (see Section 5.3).

We created an anonymous questionnaire survey. We did not use URL tracking or tokens. We did not collect contact information.

Questions were organized into blocks corresponding to scale or question type. The order of the items in each multi-item scale was randomized to mitigate primacy and recency effects. The order of blocks was not randomized because our pilot study (Section 4.4) suggested that the questionnaire was clearer when the questions that distinguish between before and after the switch to working from home came after those that did not.
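The ordering policy above can be sketched as follows, assuming a simplified, hypothetical block layout. The block and item names are placeholders, and this is not the study's actual implementation. Block order is static, while the items inside each multi-item scale are shuffled independently for each respondent. Single-question blocks such as demographics are left in place.

```python
import random

# Hypothetical block structure: (block name, item labels, randomize items?).
# Only multi-item scales are shuffled; real wording is in the replication package.
BLOCKS = [
    ("demographics", ["age", "gender", "country"], False),
    ("ergonomics", [f"ergonomics_{i}" for i in range(1, 7)], True),
    ("who5_before", [f"who5_before_{i}" for i in range(1, 6)], True),
    ("who5_during", [f"who5_during_{i}" for i in range(1, 6)], True),
]


def item_order(blocks, rng):
    """Return the presentation order of items for one respondent.

    Blocks always appear in the listed (static) order; within a block marked
    for randomization, items are shuffled independently per respondent.
    """
    order = []
    for _name, items, randomize in blocks:
        items = list(items)  # copy so the master list is never mutated
        if randomize:
            rng.shuffle(items)
        order.extend(items)
    return order


if __name__ == "__main__":
    rng = random.Random(2020)  # seeded only to make the demo reproducible
    print(item_order(BLOCKS, rng))
```

The before/after blocks come last in this layout, which mirrors the pilot-study finding that those questions read more clearly after the ones that do not distinguish between the two periods.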

The questionnaire used a filter question to exclude respondents who do not meet the inclusion criteria. Respondents who had not switched from working in an office to working from home because of COVID-19 simply skipped to the end of the questionnaire. It also included not only traditional demographic variables (e.g. age, gender, country, experience, education) but also how many adults and children (under twelve) participants lived with, the extent to which participants are staying home and whether they or any friends or family had tested positive for COVID-19.

The questionnaire used validated scales as much as possible to improve construct validity. A construct is a quantity that cannot be measured directly. Fear, disaster preparedness, home office ergonomics, wellbeing and productivity are all constructs. In contrast, age, country, and number of children are all directly measurable. Direct measurements are assumed to have inherent validity, but constructs (latent variables) have to be validated to ensure that they measure the right properties (cf. Ralph and Tempero, 2018).

The exact question wording can be seen in our replication package. This section describes the scales and additional questions.

Emotional wellbeing (WHO-5). To assess emotional wellbeing, we used the WHO's five-item wellbeing index (WHO-5). Each item is assessed on a six-point scale from "at no time" (0) to "all of the time" (5). The scale can be calculated by summing the items or using factor analysis. The WHO-5 scale is widely used, widely applicable, and has high sensitivity and construct validity (Topp et al., 2015). Respondents self-assessed their wellbeing twice: once for the four weeks prior to beginning to work from home, and then again for the time they had been working from home.
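Under the summation option, scoring might look like the following sketch. The multiplication by 4, which yields a 0 to 100 score, is the conventional WHO-5 transformation and is not stated in this section. Computing change as the during-score minus the before-score is an assumed sign convention, so negative values indicate declining wellbeing. The responses are invented for illustration.

```python
def who5_raw(items):
    """Sum of the five WHO-5 items, each scored 0 ("at no time") to 5 ("all of the time")."""
    if len(items) != 5:
        raise ValueError("WHO-5 has exactly five items")
    if any(not 0 <= x <= 5 for x in items):
        raise ValueError("each WHO-5 item must be between 0 and 5")
    return sum(items)


def who5_percentage(items):
    """Conventional 0-100 transformation: raw score (0-25) multiplied by 4."""
    return who5_raw(items) * 4


def wellbeing_change(before, during):
    """Assumed convention: score while working from home minus score before."""
    return who5_percentage(during) - who5_percentage(before)


if __name__ == "__main__":
    before = [4, 4, 3, 4, 3]  # hypothetical: four weeks before the switch
    during = [2, 3, 2, 3, 2]  # hypothetical: while working from home
    print(who5_raw(before), who5_raw(during))  # 18 12
    print(wellbeing_change(before, during))    # -24
```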

Perceived Productivity (HPQ). To assess perceived productivity, we used items from the WHO's Health and Work Performance Questionnaire (HPQ). The HPQ measures perceived productivity in two ways: (1) using an eight-item summative scale, with multiple reversed indicators, that assesses overall and relative performance; and (2) using 11-point general ratings of participants' own performance and the typical performance of similar workers. These scales are amenable to factor analysis or summation. Of course, people tend to overestimate their performance relative to their peers, but we are comparing participants to their past selves, not to each other. HPQ scores are closely related to objective performance data in diverse fields (Kessler et al., 2003). Again, respondents self-assessed their productivity both for the four weeks prior to working from home and for the time they had been working from home.

Disaster Preparedness (DP). To assess disaster preparedness, we adapted Yong et al.'s (2017) individual disaster preparedness scale. Yong et al. developed their five-item, five-point Likert scale based on common, important behaviors such as complying with government recommendations and having emergency supplies. The scale was validated using a questionnaire survey of a "weighted nationally representative sample" of 1084 Canadians. We adapted the items to refer specifically to COVID-19. The scale can be computed by summing the responses or using factor analysis.
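One helper can score both the HPQ's summative scale, which has reversed indicators, and the DP scale, which uses plain summation. This is a sketch only. The 1 to 5 response range and the reversed item positions below are hypothetical, because this section does not list the real HPQ response formats or which of its items are reversed. On a low..high scale, a reversed response x is recoded as low + high - x.

```python
def summative_score(responses, reversed_items=frozenset(), low=1, high=5):
    """Sum Likert responses, reverse-coding items whose 0-based index is in
    reversed_items (a response x becomes low + high - x)."""
    total = 0
    for i, x in enumerate(responses):
        if not low <= x <= high:
            raise ValueError(f"item {i} out of range: {x}")
        total += (low + high - x) if i in reversed_items else x
    return total


if __name__ == "__main__":
    # Hypothetical eight-item scale whose items 1, 3, 5 and 7 are reverse-worded.
    print(summative_score([5, 1, 4, 2, 5, 1, 4, 2], reversed_items={1, 3, 5, 7}))  # 36
    # Five-item, five-point scale with no reversed items (plain summation, as for DP).
    print(summative_score([4, 5, 3, 4, 5]))  # 21
```

Reverse-coding before summing ensures that agreeing with a negatively worded item lowers the score rather than raising it.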

Fear and Resilience (FR). The Bracha-Burkle Fear and Resilience (FR) checklist is a triage tool for assessing a patient's reaction to a bioevent (e.g. infectious disease pandemics, bioterrorism). The FR checklist places the patient on a scale from intense fear to hyper-resilience (Bracha and Burkle, 2006). We dropped some of the more extreme items (e.g. "Right now are you experiencing shortness of breath?") because respondents are at home taking a survey, not arriving in a hospital emergency room. The FR checklist is a weighted summative scale, so it has to be computed manually using Bracha and Burkle's formula rather than using factor analysis. It has multiple reversed indicators.

Ergonomics. We could not find a reasonable scale for evaluating home office ergonomics. There is comparatively less research on the ergonomics of home offices (Inalhan and Ng, 2010) and ergonomic instruments tend to be too narrow (e.g. evaluating hip angle). Based on our reading of the ergonomics literature, we made a simple six-item, six-point Likert scale concerning distractions, noise, lighting, temperature, chair comfort and overall ergonomics.

Again, we evaluated the scale's face and content validity using a pilot study (see Section 4.4) and statistically evaluated convergent and discriminant validity ex post in Section 5.2.

Organizational Support (OS). This study seeks to produce actionable advice for software companies regarding how to support their employees. Consequently, we need to elicit respondents' beliefs about helpful actions. We could not find any appropriate instrument, so the first author interviewed three developers with experience in both co-located and distributed teams, as well as in both office work and working from home. Interviewees brainstormed actions companies could take to help, and we used open coding (Saldaña, 2015) to organize their responses into five themes:

1. Equipment: providing equipment employees need in their home office (e.g. a second monitor)
2. Reassurance: adopting a tone that removes doubt and fear (e.g. assuring employees that lower productivity would be understood)
3. Connectedness: encouraging virtual socializing (e.g. through video chat)
4. Self-care: providing personal services not directly related to work (e.g. resources for exercising or home-schooling children)
5. Technical infrastructure and practices: ensuring that remote infrastructure (e.g. VPNs) and practices (e.g. code review) are in place.

We generated a list of 22 actions (four or five per theme) by synthesizing the ideas of interviewees with existing literature on working from home, distributed development and software engineering more generally. For each action, respondents indicated whether their employer is taking the action and whether they think it is or would be helpful. Organizational support is not a construct in our theory per se because we have insufficient a priori information to produce a quantitative estimate, so we analyze these answers separately.
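Because these answers are analyzed separately rather than as a construct, a simple per-action tally is one natural starting point. The sketch below assumes that each response stores a pair of booleans per action (employer is taking it; respondent finds it helpful). The action labels are hypothetical, and the study's actual analysis may differ.

```python
from collections import defaultdict


def summarize_support(responses):
    """Per action, return (share of employers taking it, share rating it helpful).

    Each response maps a hypothetical action label to a pair of booleans:
    (employer is taking the action, respondent thinks it is/would be helpful).
    """
    counts = defaultdict(lambda: [0, 0, 0])  # [answered, taken, helpful]
    for response in responses:
        for action, (taken, helpful) in response.items():
            c = counts[action]
            c[0] += 1
            c[1] += int(taken)
            c[2] += int(helpful)
    return {a: (t / n, h / n) for a, (n, t, h) in counts.items()}


if __name__ == "__main__":
    responses = [
        {"second_monitor": (True, True), "virtual_social_events": (False, True)},
        {"second_monitor": (False, True), "virtual_social_events": (False, False)},
    ]
    print(summarize_support(responses))
    # {'second_monitor': (0.5, 1.0), 'virtual_social_events': (0.0, 0.5)}
```

Comparing the two shares per action highlights the actions that respondents value but employers are not yet taking.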

We solicited feedback from twelve colleagues: six software engineering academics and six experienced software developers. Pilot participants made various comments on the questionnaire structure, directions and on the face and content validity of the scales. Based on this feedback we made numerous changes including clarifying directions, making the question order static, moving the WHO-5 and HPQ scales closer to the end, dropping some problematic questions, splitting up an overloaded question, and adding some open response questions. (Free-text answers are not analyzed in this paper; open response questions were included mainly to inform future research; see Section 6.3).

We advertised our survey on social and conventional media, including Dev.to, Developpez.com, DNU.nl, eksisozluk, Facebook, Hacker News, Heise Online, InfoQ, LinkedIn, Twitter, Reddit and WeChat. Upon completion, participants were given a link and encouraged to share it with colleagues who might also like to take the survey. Because this is an anonymous survey, we did not ask respondents to provide colleagues' email addresses or send messages on their behalf.

We did not offer cash incentives for participation. Rather, we offered to donate US$500 to an open source project chosen by participants (in one of the open response questions).

We considered several alternatives, including scraping emails from software repositories and stratified random sampling using company lists, but none of these options seemed likely to produce a more representative sample.

Instead, we focused on increasing the diversity of the sample by localizing the survey and promoting it in more jurisdictions. We translated the survey into Arabic, (Mandarin) Chinese, English, French, Italian, Japanese, Korean, Persian, Portuguese, Spanish, Russian and Turkish. We capitalized on each author's local knowledge to reach more people in their jurisdiction. Rather than a single, global campaign, we used a collection of local campaigns.

Each localization involved small changes in wording. Only a few significant changes were needed:

- The Chinese version had to be re-implemented in a different questionnaire system (wjx.cn) because Google Forms is not available in China.
- Because the lockdowns in China and Korea are ending, we reworded some questions from "since you began working from home" to "while you were working from home."
- The Portuguese version promised to donate to a specific open-source project in Brazil that is related to COVID-19 (which we have done).

We received 2668 total responses, of which 439 did not meet our inclusion criteria and 4 were effectively blank (see below), leaving 2225 responses that fulfilled our inclusion criteria. This section describes how the data was cleaned and analyzed.

The data was cleaned as follows.

1. Delete responses that did not meet our inclusion criteria.
2. Delete almost empty rows, where the respondent apparently answered the filter question correctly, then skipped all other questions.
3. Delete the timestamp field (to preserve anonymity), the consent form confirmation field (because participants could not continue without checking these boxes, so the answer is always "TRUE") and the filter question field (because all remaining rows have the same answer).
4. Add a binary field indicating whether the respondent had entered text in at least one long-answer question (see Section 5.2).
5. Move all free-text responses to a separate file (to preserve anonymity).
6. Recode the raw data (which is in different languages with different alphabets) into a common quantitative coding scheme; for example, from 1 for "strongly disagree" to 5 for "strongly agree". The recoding instructions and related scripts are included in our replication package.
7. Split select-multiple questions into one binary variable per checkbox. (Google Forms saves this data as a comma-separated list of the text of the selected answers.)
8. Add a field indicating the language of the response.
9. Combine the responses into a single data file.
10. Calculate the FR scale according to its formula (Bracha and Burkle, 2006).
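Several of these steps can be sketched in plain Python. The sketch assumes that each raw response is a dict keyed by hypothetical field names (`switched_to_wfh`, `q1`, ...) and uses invented English and Spanish answer labels. It covers filtering (steps 1 and 2), dropping identifying fields (steps 3 and 5), Likert recoding (step 6), checkbox splitting (step 7) and language tagging (step 8). It is not the study's replication script.

```python
# Hypothetical answer labels for two of the twelve languages; the study's own
# recoding instructions are in its replication package.
LIKERT = {
    "en": {"Strongly disagree": 1, "Disagree": 2, "Neutral": 3,
           "Agree": 4, "Strongly agree": 5},
    "es": {"Totalmente en desacuerdo": 1, "En desacuerdo": 2, "Neutral": 3,
           "De acuerdo": 4, "Totalmente de acuerdo": 5},
}


def recode_likert(answer, lang):
    """Step 6: map a localized answer label to the shared 1-5 code (None if blank)."""
    answer = (answer or "").strip()
    return LIKERT[lang][answer] if answer else None


def split_checkboxes(cell, options):
    """Step 7: turn a comma-separated list of selected labels into 0/1 fields.

    Assumes no option label itself contains a comma; unknown tokens raise an
    error so that a bad split is noticed rather than silently dropped.
    """
    selected = {part.strip() for part in (cell or "").split(",") if part.strip()}
    unknown = selected - set(options)
    if unknown:
        raise ValueError(f"unrecognized option(s): {sorted(unknown)}")
    return {opt: int(opt in selected) for opt in options}


def clean(rows, lang, likert_fields, checkbox_field, options):
    """Apply a subset of the cleaning steps to rows from one language version."""
    cleaned = []
    for row in rows:
        if not row.get("switched_to_wfh"):  # step 1: inclusion criteria
            continue
        # Step 2 (approximation): treat rows with no scale answers as blank.
        if all(not row.get(f) for f in likert_fields):
            continue
        # Building a fresh record drops timestamp, consent, filter and
        # free-text fields (steps 3 and 5) simply by not copying them.
        record = {f: recode_likert(row.get(f), lang) for f in likert_fields}
        record.update(split_checkboxes(row.get(checkbox_field), options))
        record["language"] = lang  # step 8
        cleaned.append(record)
    return cleaned
```

Under these assumptions, step 9 amounts to concatenating the lists returned for each language version. Requiring every token to be a known option is deliberate. If a label ever did contain a comma, the naive split would produce unknown fragments and fail loudly instead of silently miscounting.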

We evaluated construct validity using established guidelines (Ralph and Tempero, 2018). First, we assessed content validity using a pilot study (Section 4.4). Next, we assessed convergent and discriminant validity using a principal component analysis (PCA) with Varimax rotation and Kaiser normalization. Bartlett's test is significant (χ² = 13229; df = 253; p < 0.001) and our KMO measure of sampling adequacy is high (0.874), indicating that our data is appropriate for factor analysis.
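For readers unfamiliar with Bartlett's test of sphericity, the sketch below computes its statistic, χ² = −(n − 1 − (2p + 5)/6) ln|R|, with p(p − 1)/2 degrees of freedom, where R is the correlation matrix of the p items and n the number of respondents. It is a plain-Python illustration, not the software used in the study, and it assumes complete data and a positive-definite R; the p-value would then come from the χ² distribution (e.g. `scipy.stats.chi2.sf`). Note that the reported df = 253 is consistent with p = 23 items, since 23 × 22 / 2 = 253.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix for complete, equal-length numeric columns."""
    n = len(columns[0])
    centered = []
    for col in columns:
        mean = sum(col) / n
        centered.append([x - mean for x in col])
    norms = [math.sqrt(sum(x * x for x in col)) for col in centered]
    p = len(columns)
    return [
        [
            sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
            for j in range(p)
        ]
        for i in range(p)
    ]


def log_determinant(matrix):
    """ln|R| via Gaussian elimination without pivoting (valid for positive-definite R)."""
    m = [row[:] for row in matrix]
    size = len(m)
    log_det = 0.0
    for k in range(size):
        pivot = m[k][k]
        if pivot <= 0:
            raise ValueError("correlation matrix is not positive definite")
        log_det += math.log(pivot)
        for i in range(k + 1, size):
            factor = m[i][k] / pivot
            for j in range(k, size):
                m[i][j] -= factor * m[k][j]
    return log_det


def bartlett_sphericity(columns):
    """Bartlett's test of sphericity: returns (chi-square statistic, degrees of freedom)."""
    n, p = len(columns[0]), len(columns)
    chi_square = -(n - 1 - (2 * p + 5) / 6) * log_determinant(correlation_matrix(columns))
    df = p * (p - 1) // 2
    return chi_square, df
```

A large statistic (small p-value) means the items are not mutually uncorrelated, which is the precondition for a meaningful factor analysis.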

As Table 1 shows, the items load well but not perfectly. The bolded coefficients suggest possible issues with change in productivity (∆ Productivity) items 7 and 9, as well as Ergonomics item 1. We dropped items one at a time until the loadings stabilized, starting with ∆ Productivity 7, followed by ∆ Productivity 9. As shown in Table 2, dropping these two indicators solved the problem with Ergonomics 1, so the latter is retained.

We evaluate predictive validity by testing our hypotheses in Section 5.4.

Response bias. There are two basic ways to analyze response bias. The first, comparing sample parameters to known population parameters, is impractical because no one has ever established population parameters for software professionals. The second, comparing late respondents to early respondents, cannot be used directly here because we do not know the time between each respondent learning of the survey and completing it. However, we can do something similar: we can compare respondents who answered one or more open-response questions (more keen on the survey) with those who skipped the open-response questions (less keen on the survey). As Table 3 shows, only the number of adult cohabitants and age differ significantly between the two groups, and in both cases the effect size (η²) is very small. This is consistent with minimal response bias.
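Table 3 is not reproduced here, and this paragraph does not name the exact test behind it. Assuming η² is the usual one-way measure (the share of total variance explained by group membership), the comparison can be sketched as below; the field names `age` and `open` are hypothetical stand-ins for a demographic variable and the step-4 flag for having answered an open question.

```python
def eta_squared(groups):
    """Eta squared for a one-way grouping: SS_between / SS_total."""
    values = [x for group in groups for x in group]
    grand_mean = sum(values) / len(values)
    ss_total = sum((x - grand_mean) ** 2 for x in values)
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2 for group in groups
    )
    return ss_between / ss_total


def split_by_flag(rows, variable, flag):
    """Split one variable into (flag == 1, flag == 0) groups, skipping missing values."""
    keen = [row[variable] for row in rows if row[flag] == 1 and row[variable] is not None]
    other = [row[variable] for row in rows if row[flag] == 0 and row[variable] is not None]
    return keen, other


# Hypothetical rows: "open" is 1 if the respondent answered an open question.
rows = [{"age": 30, "open": 1}, {"age": 40, "open": 0}, {"age": 35, "open": 1}]
print(eta_squared(split_by_flag(rows, "age", "open")))
```

Values near 0 mean the keen and less-keen groups barely differ on that variable, which is the pattern the paragraph reports for age and adult cohabitants.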

Respondents were disproportionately male (81% vs. 18% female and 1% nonbinary) and overwhelmingly employed full-time (94%), with a median age of 30-34. Participants were generally well-educated (Fig. 3). Most respondents (53%) live with one other adult, while 18% live with no other adults and the rest live with two or more other people. Overall, 27% live with one or more children under 12, and 8% indicate that they may have a disability that affects their work. Mean work experience is 9.3 years (σ = 7.3). Mean experience working from home is 1.3 years (σ = 2.5); however, 58% of respondents have no experience working from home.

Participants hail from 53 countries (Table 4) and from organizations ranging from 0-9 employees to more than 100,000 (Fig. 2). Many participants listed multiple roles, but 80% included software developer or equivalent among them, while the rest were other kinds of software professionals (e.g. project manager, quality assurance analyst).

Seven participants (< 1%) tested positive for COVID-19 and six more (< 1%) live with someone with COVID-19; 4% of respondents indicated that a close friend or family member had tested positive, and 13% were currently or recently quarantined. 

Hypothesis H1: Developers will have lower wellbeing while working from home due to COVID-19. The WHO5 wellbeing scale is a five-item summative scale. Participants answered the scale twice: once referring to the 28-day period before switching to working from home and once referring to the period while working from home. To assess the effect of switching to working from home, we sum each scale and compare their distributions. Since the summed scales deviate significantly from a normal distribution (K-S test; p < 0.001), we compare the distributions using the Wilcoxon signed rank test. Hypothesis H1 is supported (Wilcoxon signed rank test Z = 9.107; p < 0.001). The effect size for the Wilcoxon signed rank test is calculated as r = Z/√n, where n is the total number of observations across both time points, which gives an effect size of 0.137.
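The test and effect size described above can be sketched in plain Python. This is an illustration of the standard normal approximation (zero differences dropped, average ranks for ties, tie-corrected variance, no continuity correction), not the software used in the study; statistical packages differ in sign conventions and corrections, so results may not match their output exactly. With differences taken as before minus after, a drop in wellbeing gives a positive Z. If all 2225 respondents contributed both scores, n = 4450 and 9.107/√4450 ≈ 0.137, matching the reported value.

```python
import math


def wilcoxon_signed_rank_z(before, after):
    """Normal-approximation Z for the Wilcoxon signed rank test on paired scores."""
    diffs = [b - a for b, a in zip(before, after) if b != a]  # drop zero differences
    n = len(diffs)
    ordered = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        # Find the run of equal |differences| starting at position i.
        j = i
        while j + 1 < n and abs(diffs[ordered[j + 1]]) == abs(diffs[ordered[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[ordered[k]] = average_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    variance = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48  # tie correction
    return (w_plus - mean) / math.sqrt(variance)


def effect_size_r(z, n_observations):
    """r = |Z| / sqrt(n), where n counts observations at both time points."""
    return abs(z) / math.sqrt(n_observations)


print(round(effect_size_r(9.107, 2 * 2225), 3))  # 0.137
```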

Hypothesis H2: Developers will have lower perceived productivity while working from home due to COVID-19. As with the wellbeing scale, participants answered the HPQ productivity scale twice. We sum the scales (omitting items 7 and 9; see Section 5.2) and compare their distributions. Again, the summed scales deviate significantly from a normal distribution (K-S test; p < 0.001), so we use the Wilcoxon signed rank test.

Hypothesis H2 is supported (Wilcoxon signed rank test Z = 10.614; p < 0.001; effect size= 0.164).
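Summing a scale while omitting the dropped items can be sketched as follows. The 1-to-5 bounds and the optional reverse-keying are assumptions for illustration only: this section does not state each HPQ item's response format or whether reversed items (the confirmatory factor analysis reports negative loadings for DeltaP2 through 6) are flipped before summing. The function name is hypothetical.

```python
def summed_scale(item_scores, dropped=frozenset({7, 9}), reverse_keyed=frozenset(),
                 scale_min=1, scale_max=5):
    """Sum one respondent's item scores, skipping dropped items.

    item_scores maps 1-based item numbers to recoded answers (None = missing).
    Items in reverse_keyed are flipped as (scale_min + scale_max - score) first.
    Returns None if any retained item is missing.
    """
    total = 0
    for item, score in sorted(item_scores.items()):
        if item in dropped:
            continue  # items 7 and 9 were removed after the PCA
        if score is None:
            return None
        total += (scale_min + scale_max - score) if item in reverse_keyed else score
    return total
```

Calling it once on the "before" answers and once on the "while working from home" answers gives the two paired sums that the Wilcoxon test compares; for the WHO5 scale, `dropped=frozenset()` keeps all five items.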

To test our remaining hypotheses, we use structural equation modeling (SEM). Briefly, SEM is used to test theories involving constructs (also called latent variables). A construct is a quantity that cannot be measured directly (Ralph and Tempero, 2018) . Fear, disaster preparedness, home office ergonomics, wellbeing and productivity are all constructs. In contrast, age, country, and number of children are all directly measurable.

To design a structural equation model, we first define a measurement model, which maps each reflective indicator onto its corresponding construct. For example, each of the five items comprising the WHO5 wellbeing scale is modeled as a reflective indicator of wellbeing. SEM uses confirmatory factor analysis to estimate each construct as the shared variance of its respective indicators.

Next, we define the structural model, which identifies the expected relationships among the constructs. The constructs we are attempting to predict are referred to as endogenous, while the predictors are exogenous.

SEM uses a path modeling technique (e.g. regression) to build a model that predicts the endogenous (latent) variables based on the exogenous variables, and to estimate both the strength of each relationship and the overall accuracy of the model. As mentioned, the first step in an SEM analysis is to conduct a confirmatory factor analysis to verify that the measurement model is consistent (Table 5). Here, the latent concepts Ergonomics and DisasterPreparedness are captured by their respective observed indicators. Fear is not included because it is computed manually (see Section 4.3). ∆Wellbeing is the difference in a participant's emotional wellbeing before and after the COVID-19 outbreak. This latent concept is captured by five observed indicators, DeltaW1, ..., DeltaW5. Similarly, ∆Productivity represents the difference in perceived productivity before and after the outbreak.

The confirmatory factor analysis converged (not converging would suggest a problem with the measurement model) and all of the indicators load well on their constructs. The lowest estimate, 0.716 for DP2, is still quite good. The estimates for DeltaP2 through DeltaP6 are negative because these items were reversed (i.e. a higher score means worse productivity). Note that factor loadings greater than one do not indicate a problem because they are regression coefficients, not correlations (Jöreskog, 1999).

Having reached confidence in our measurement model, we construct our structural model by representing all of the hypotheses stated in Section 3 as regressions (e.g. ∆Wellbeing ∼ DisasterPreparedness + Fear + Ergonomics).
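The study fits its model with lavaan, whose model syntax writes measurement relations as `latent =~ indicator + ...` and regressions as `outcome ~ predictor + ...`. The Python sketch below only assembles such a model string to make the measurement/structural split concrete. Apart from DeltaW1 to DeltaW5 and the example regression above, the indicator names are hypothetical placeholders, not the questionnaire's actual items, and only one of the hypothesized regressions is shown.

```python
def lavaan_syntax(measurement, regressions):
    """Render measurement (=~) and structural (~) parts as lavaan model syntax."""
    lines = [f"{latent} =~ {' + '.join(items)}" for latent, items in measurement.items()]
    lines += [f"{outcome} ~ {' + '.join(preds)}" for outcome, preds in regressions.items()]
    return "\n".join(lines)


measurement = {
    "DeltaWellbeing": [f"DeltaW{i}" for i in range(1, 6)],  # the five WHO5 difference items
    "Ergonomics": ["Erg1", "Erg2", "Erg3"],                 # hypothetical indicator names
    "DisasterPreparedness": ["DP1", "DP2", "DP3"],          # hypothetical indicator count
}
regressions = {
    # Fear enters as an observed score computed by formula, not as a latent variable.
    "DeltaWellbeing": ["DisasterPreparedness", "Fear", "Ergonomics"],
}
print(lavaan_syntax(measurement, regressions))
```

In R, such a string would be passed to lavaan's `sem()` together with the data, the `ordered` indicators and `estimator = "WLSMV"`; the remaining hypotheses would be added as further `~` lines.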

In principle, we use all control variables as predictors for all latent variables. In practice, however, this leads to too many relationships and prevents the model from converging. Therefore, we evaluate the predictive power of each control variable, one at a time, and include it in a regression only where it makes at least a marginally significant (p < 0.1) difference. Here, using a higher than normal p-value is more conservative because we are dropping predictors rather than testing hypotheses. Country (of residence) and language (of questionnaire) are not included because SEM does not respond well to nominal categorical variables (see Section 5.6).
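The screening rule above can be expressed as a small filter. This sketch assumes the p-values have already been obtained by adding each control variable to the model on its own; the p-values in the usage line are made up for illustration, and the function name is hypothetical.

```python
SCREENING_ALPHA = 0.1  # marginal significance threshold for keeping a control


def screen_controls(p_values, alpha=SCREENING_ALPHA):
    """Keep a control as a predictor of an outcome only where p < alpha.

    p_values maps (control, outcome) pairs to the p-value observed when that
    control was added on its own. Returns outcome -> sorted retained controls;
    outcomes with no retained control are omitted.
    """
    kept = {}
    for (control, outcome), p in p_values.items():
        if p < alpha:
            kept.setdefault(outcome, []).append(control)
    return {outcome: sorted(controls) for outcome, controls in kept.items()}


# Hypothetical p-values, not results from this study.
print(screen_controls({("Children", "Ergonomics"): 0.01, ("Age", "Fear"): 0.35}))
```

A lenient threshold keeps more controls in the model, so fewer potential confounds are discarded; that is why a higher-than-usual cutoff is the conservative choice here.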

Since the observed variables are ordinal, we used the weighted least squares mean- and variance-adjusted (WLSMV) estimator. WLSMV uses diagonally weighted least squares to estimate the model parameters, but uses the full weight matrix to compute robust standard errors and a mean- and variance-adjusted test statistic. In short, WLSMV is a robust estimator that does not assume a normal distribution and provides the best option for modelling ordinal data in SEM (Brown, 2006). We use the default Nonlinear Minimization subject to Box Constraints (NLMINB) optimizer.
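For reference, the diagonally weighted least squares discrepancy that WLSMV minimizes is usually written as follows (a textbook statement in our own notation, not reproduced from the lavaan source):

```latex
F_{\mathrm{DWLS}}(\theta) =
  \bigl(\mathbf{s} - \boldsymbol{\sigma}(\theta)\bigr)^{\top}
  \mathbf{W}_{D}^{-1}
  \bigl(\mathbf{s} - \boldsymbol{\sigma}(\theta)\bigr),
\qquad
\mathbf{W}_{D} = \operatorname{diag}(\mathbf{W})
```

Here s stacks the sample thresholds and polychoric correlations of the ordinal indicators, σ(θ) their model-implied counterparts, and W the estimated asymptotic covariance matrix of s. Only the diagonal W_D enters the parameter estimation; the full W is used afterwards for the robust standard errors and the adjusted test statistic. Full WLS, by contrast, would invert the entire W, which is unstable for large models.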

For missing data, we use pairwise deletion: for each pair of variables, we keep only those observations for which both values are observed (so the set of observations used may change from pair to pair). By default, since we are also dealing with categorical exogenous variables, the model is set to be conditional on the exogenous variables.
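Pairwise deletion can be illustrated with a correlation computed only over rows where both variables are present. For simplicity this sketch uses Pearson correlation; with ordinal data and WLSMV, lavaan instead estimates polychoric correlations, but the pairwise selection of rows works the same way. Missing values are represented as `None` by assumption.

```python
import math


def pairwise_correlation(x, y):
    """Pearson correlation over rows where both x and y are observed (None = missing).

    Returns (correlation, number of complete pairs used). Needs at least two
    complete pairs with non-zero variance.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    return cov / math.sqrt(var_a * var_b), n
```

Applied to one variable paired with two different partners, the function can use a different number of rows each time, which is exactly the "may change from pair to pair" behavior described above.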

The model was executed and all diagnostics passed; that is, lavaan ended normally after 97 iterations, with 212 free parameters and n = 1377. We evaluate model fit by inspecting several indicators (cf. Hu and Bentler, 1999). In summary, all diagnostics indicate that the model is safe to interpret (χ² fit = 10.8, CFI = 0.961, RMSEA = 0.051, SRMR = 0.067).

Figure 4 illustrates the supported structural equation model. Numbers are path coefficients, which indicate the relative strength and direction of relationships. Arrows indicate hypothesized causal direction. Based on this model, Hypotheses H1-H3, H5, H6 and H8-H10 are supported; Hypotheses H4 and H7 are not supported. That is, change in wellbeing and change in perceived productivity are directly related; change in perceived productivity depends on home office ergonomics and disaster preparedness; change in wellbeing depends on ergonomics and fear; and disaster preparedness is inversely related to fear.
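As a reading aid, the sketch below gives one common textbook form of RMSEA and CFI and checks the reported values against the cutoffs Hu and Bentler (1999) suggest (CFI close to 0.95 or above, RMSEA close to 0.06 or below, SRMR close to 0.08 or below). These are the plain maximum-likelihood-style formulas; with WLSMV, lavaan reports scaled or robust variants based on the adjusted statistic, so the functions illustrate the definitions rather than reproduce the reported numbers.

```python
import math

# Cutoffs suggested by Hu and Bentler (1999).
CUTOFFS = {"CFI": 0.95, "RMSEA": 0.06, "SRMR": 0.08}


def rmsea(chi_square, df, n):
    """Root mean square error of approximation (one common formula)."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


def cfi(chi_model, df_model, chi_baseline, df_baseline):
    """Comparative fit index relative to the independence (baseline) model."""
    d_model = max(chi_model - df_model, 0.0)
    d_base = max(chi_baseline - df_baseline, d_model)
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base


def meets_cutoffs(fit):
    """True if CFI, RMSEA and SRMR all satisfy the cutoffs above."""
    return (fit["CFI"] >= CUTOFFS["CFI"]
            and fit["RMSEA"] <= CUTOFFS["RMSEA"]
            and fit["SRMR"] <= CUTOFFS["SRMR"])


reported = {"CFI": 0.961, "RMSEA": 0.051, "SRMR": 0.067}
print(meets_cutoffs(reported))  # True
```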

Inspecting the detailed SEM results (Table 6 ) reveals many interesting patterns. Many direct effects are obvious, for example:

- People who live with small children have significantly less ergonomic home offices. This is not surprising because the ergonomics scale included items related to noise and distractions.
- Women tend to be more afraid. This is consistent with studies on the SARS epidemic, which found that women tended to perceive the risk as higher (Brug et al., 2004).
- People with disabilities are less prepared for disasters, have less ergonomic offices and are more afraid.
- People who live with other adults are more prepared for disasters.
- People who live alone have more ergonomic home offices.
- People who have COVID-19 or have family members, housemates or close friends with COVID-19 tend to be more afraid, more prepared, and have worse wellbeing since working from home.
- People who are more isolated (i.e. not leaving home at all, or only for necessities) tend to be more afraid.

It is more difficult to interpret indirect effects. For example, changes in productivity and wellbeing are closely related. Hypothesis H4 may be unsupported because change in productivity may be mediating the effect of disaster preparedness on change in wellbeing. Similarly, Hypothesis H7 may be unsupported because change in wellbeing may be mediating the relationship between fear and change in productivity. Furthermore, control variables including gender, children and disability may have significant effects on wellbeing or productivity that are not obvious because they are mediated by another construct. Moreover, some variables have conflicting effects. For example, disability has not only a direct positive effect on productivity but also an indirect negative effect (through fear). So, is the pandemic hitting people with disabilities harder? More research is needed to explore these relationships.
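In a linear path model, an indirect effect along a chain of paths is the product of the path coefficients, and the total effect is the direct effect plus all indirect effects. The sketch below applies that rule to made-up coefficients, not the values in Table 6 or Figure 4. The route disability → fear → ∆Wellbeing → ∆Productivity is an assumption for illustration: the text says only that the negative indirect effect runs "through fear". Judging whether an indirect effect is significant would additionally require, for example, bootstrapped confidence intervals.

```python
def indirect_effect(path):
    """Indirect effect along a chain of path coefficients: their product."""
    effect = 1.0
    for coefficient in path:
        effect *= coefficient
    return effect


def total_effect(direct, indirect_paths):
    """Direct effect plus the sum of all indirect (mediated) effects."""
    return direct + sum(indirect_effect(path) for path in indirect_paths)


# Hypothetical coefficients: disability -> fear (0.2), fear -> wellbeing (-0.3),
# wellbeing -> productivity (0.6), plus a direct disability -> productivity path (0.10).
via_fear = [0.2, -0.3, 0.6]
print(indirect_effect(via_fear), total_effect(0.10, [via_fear]))
```

With these numbers the indirect effect is about −0.036 and the total about 0.064: a positive direct effect can be partly cancelled by a negative mediated one, which is why such conflicting effects are hard to read off a single coefficient.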

Above, we mentioned omitting language and country because SEM does not respond well to nominal categorical variables. We tried anyway: both language and country were significant predictors for all latent variables, but, as expected, including so many binary dummy variables makes the model impossible to interpret. While our analysis suggests that country, language (and probably culture) have significant effects on disaster preparedness, ergonomics, fear, wellbeing and productivity, more research is needed to understand the nature of these effects (see Section 6.3).

Table 7 shows participants' opinions of the helpfulness of numerous ways their organizations could support them. Several interesting patterns stand out from this data:

- Only action #1 (paying developers' home internet charges) is perceived as helpful by more than half of participants, and fewer than 10% of companies appear to be doing it.
- The action most companies are taking (#20, having regular meetings) is not perceived as helpful by most participants.
- There appears to be no correlation between the things developers believe would help and the things employers are actually doing.
- There is little consensus among developers about what their organizations should do to help them.
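One way to check the "no correlation" observation is a rank correlation across actions between the share of respondents who find an action helpful and the share whose organization takes it. The sketch computes Spearman's rho as the Pearson correlation of average ranks, which handles ties. Table 7 is not reproduced here, so the percentages in the usage line are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical percentages per action: % finding it helpful vs. % of organizations doing it.
print(spearman_rho([55, 40, 30, 20], [8, 12, 60, 25]))
```

A rho near zero would support the observation; a rank-based measure is used because the percentages need not be linearly related.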

Note to Table 7: *number of respondents who indicated that this practice is or would be helpful and number of respondents who indicated that their organizations are following this recommendation (n = 2225).

6 Discussion

This study shows that software professionals who are working from home during the pandemic are experiencing diminished emotional wellbeing and productivity, which are closely related. Furthermore, poor disaster preparedness, fear related to the pandemic, and poor home office ergonomics are exacerbating this reduction in wellbeing and productivity. Moreover, women, parents and people with disabilities may be disproportionately affected. In addition, dissensus regarding what organizations can do to help suggests that no single action is universally helpful; rather, different people need different kinds of support.

Organizations need to accept that expecting normal productivity under these circumstances is unrealistic. Pressuring employees to maintain normal productivity will likely make matters worse. Furthermore, companies should avoid making any decisions (e.g. layoffs, promotions, bonuses) based on productivity during the pandemic because any such decision may be prejudiced against protected groups. The best way to improve productivity is to help employees maintain their emotional wellbeing. However, no single action appears beneficial to everyone, so organizations should talk to their employees to determine what they need.

Helping employees improve the ergonomics of their work spaces, in particular, should help. However, micromanaging foot positions, armrests and hip angles is not what we mean by ergonomics. Rather, companies should ask broad questions such as "what do you need to limit distractions and be more comfortable?" Shipping an employee a new office chair or noise cancelling headphones could help significantly.

Meanwhile, professionals should try to accept that their productivity may be lower and stop stressing about it. Similarly, professionals should try to remember that different people are experiencing the pandemic in very different ways: some people may be more productive than normal while others struggle to complete any work through no fault of their own. It is crucial to support each other and avoid inciting conflict over who is working harder. If members of a protected group feel discriminated against due to low productivity at this time, we recommend that they contact their local human rights commission or equivalent organization.

The above recommendations should be considered in the context of the study's limitations.

Sampling bias. Random sampling of software developers is rare (Amir and Ralph, 2018) because there are no lists of all the software developers, projects, teams or organizations in the world or particular jurisdictions (Baltes and Ralph, 2020). We therefore combined convenience and snowball sampling with a strategy of finding a co-author with local knowledge to translate, localize and advertise the questionnaire in a locally effective way. On one hand, the convenience/snowball strategy may bias the sample in unknown ways. On the other hand, our translation and localization strategy demonstrably increased sample diversity, leading to one of the largest and broadest samples of developers ever studied, made possible by a large, international and diverse research team. By comparison, even a random sample of English-speaking developers would be ethnocentric.

Response bias. We found minimal evidence of response bias (Section 5.2); however, because the questionnaire is anonymous and Google Forms does not record incomplete responses, response bias can only be estimated in a limited way. Someone could have taken the survey more than once or entered fake data. Moreover, some internet service providers in Iran block Google services, although developers tend to use proxies to bypass these restrictions.

Additionally, a large number of responses from a single country could skew the data, but we correct for country, company size and language to mitigate this.

Construct validity. To enhance construct validity, we used validated scales for wellbeing, productivity, disaster preparedness and fear/resilience. Post hoc construct validity analysis suggests that all four scales, as well as the ergonomics scale we created, are sound (Section 5.2). However, perceived productivity is not the same as actual productivity. Although the HPQ scale correlates well with objective performance data in other fields (Kessler et al., 2003), it may not in software development. Similarly, we asked respondents their opinion of numerous potential mechanisms for organizational support. The fact that companies are taking some action (e.g. having regular meetings) or that respondents believe in the helpfulness of some action (e.g. paying their internet bills) does not mean that those actions will lead to measurable improvements in productivity or wellbeing.

There is much debate about whether 5-point responses such as those used in the WHO5 scale should be treated as ordinal or interval. CFA and SEM are often used with these kinds of variables in the social sciences despite assuming at least interval data. Some evidence suggests that CFA is robust against moderate deviations from normality, including arguably ordinal questionnaire items (cf. Flora and Curran, 2004). We tend not to worry about treating data as interval as long as, in principle, the data is drawn from a continuous distribution.

Additionally, due to a manual error, the Italian version was missing organizational support item 11: "My team uses a build system to automate compilation and testing." The survey may therefore under-count the frequency and importance of this item by up to 10%.

Conclusion validity. We use structural equation modeling to fit a theoretical model to the data. Indicators of model fit suggest that the model is sound. Moreover, SEM enhances conclusion validity by correcting for multiple comparisons, testing the entire model as a whole (instead of one hypothesis at a time) and accounting for measurement error (by inferring latent variables from observable variables). SEM is considered superior to alternative path modeling techniques such as partial least squares path modeling (Rönkkö and Evermann, 2013). While a Bayesian approach might have higher conclusion validity (Furia et al., 2019), none of the Bayesian SEM tools (e.g. blavaan) we are aware of support ordered categorical variables. The main remaining threat to conclusion validity is that the structural model is overfit to the data. More research is needed to determine whether the model overstates any of the supported effects.

Internal validity. To infer causality, we must demonstrate correlation, precedence and the absence of third variable explanations. SEM demonstrates correlation. Inferring causal relationships from cross-sectional data is fraught because we cannot tell the direction of causality or control for all possible confounds. However, many of our propositions only make sense in one direction.

For example, having COVID-19 may reduce one's productivity, but feeling unproductive cannot give someone a specific virus. Other relationships make a lot more sense in one direction than the other. For example, having a more ergonomic office might make you more productive, but being more productive does not make your office more ergonomic. We can be more confident in causality where precedence only makes sense in one direction. That said, while we include numerous control variables (e.g. age, gender, education level), other third variable explanations cannot be discounted. Developers who work more overtime, for example, might have lower wellbeing, worse home office ergonomics, and reduced disaster preparedness.

For researchers, this paper opens a new research area intersecting software engineering and crisis, disaster and emergency management. Although many studies explore remote work and distributed teams, we still need a better understanding of how stress, distraction and family commitments affect developers working from home during crises, bioevents and disasters. More research is needed on how these events affect team dynamics, cohesion and performance.

More specifically, the dataset we publish alongside this paper can be significantly extended. An enormous amount of quantitative data is available regarding different countries, and how those countries reacted to the COVID-19 pandemic. Country data could be merged with our dataset to investigate how different contexts, cultures and political actions affect developers.

Moreover, the crisis continues. More longitudinal research is needed to understand its long-term effects on software professionals, projects and communities.

This study taught us two valuable lessons about research methodology. First, collaborating with a large, diverse, international research team and releasing a questionnaire in multiple languages with location-specific advertising can generate a large, diverse, international sample of participants.

Second, Google Forms should not be used to conduct scientific questionnaire surveys. It is blocked in some countries. It does not record partial responses or bounce rates, hindering analysis of response bias. URL parameter passing, which is typically used to determine how the respondent found out about the survey, is difficult. Exporting the data in different ways gives different variable orders, encouraging mistakes. Responses are recorded as (sometimes long) strings instead of numbers, overcomplicating data analysis. We should have used a research-focused survey tool such as Qualtrics.

The COVID-19 pandemic has created unique conditions for many software developers. Stress, isolation, travel restrictions, business closures and the absence of educational, child care and fitness facilities are all taking a toll. Working from home under these conditions is fundamentally different from normal working from home. This paper reports the first large-scale study of how working from home during a pandemic affects software developers. It makes several key contributions:

- Evidence that productivity and wellbeing are suffering;
- Evidence that productivity and wellbeing are closely related;
- A model that explains and predicts the effects of the pandemic on productivity and wellbeing;
- A ranked list of suggestions for supporting developers working from home;
- Some indication that the pandemic may disproportionately affect women, parents and people with disabilities.

Furthermore, this study is exceptional in several ways: (1) the questionnaire used previously validated scales, which we re-validated using both principal components analysis and confirmatory factor analysis; (2) the questionnaire attracted an unusually large sample of 2225 responses; (3) the questionnaire ran in 12 languages, mitigating cultural biases; (4) the data was analyzed using highly sophisticated methods (i.e. structural equation modelling), which rarely have been utilized in software engineering research; (5) the study investigates an emerging phenomenon, providing timely advice for organizations and professionals; (6) the study incorporates research on emergency and disaster management, which is rarely considered in software engineering studies.

We hope that this study inspires more research on how software development is affected by crises, pandemics, lockdowns and other adverse conditions. As the climate crisis unfolds, research intersecting crisis management and software engineering will be increasingly needed.

There is no random sampling in software engineering research

How will country-based mitigation measures influence the course of the covid-19 epidemic?

Worry, gratitude & boredom: As covid-19 affects mental, financial health, who fares better

A review of telework research: Findings, new directions, and lessons for the study of modern work

Satisfaction and perceived productivity when professionals work from home. Research & Practice in Human Resource Management

Towards a theory of software development expertise

Sampling in software engineering research: A critical review and guidelines

Self performance appraisal vs direct-manager appraisal: A case of congruence

Teleworking: benefits and pitfalls as perceived by professionals and managers

Employment and compliance with pandemic influenza mitigation recommendations

Utility of fear severity and individual resilience scoring as a surge capacity, triage management tool during large-scale, bioevent disasters

Sars risk perception, knowledge, precautions, and information sources, the netherlands

Managing a virtual workplace

Goodhart's law: Its origins, meaning and implications for monetary policy. Central banking, monetary theory and practice

Characteristics of shift work and their impact on employee performance and wellbeing: A literature review

Two cheers for the virtual office

Factors influencing compliance with quarantine in toronto during the 2003 sars outbreak

Disrupted work: home-based teleworking (hbtw) in the aftermath of a natural disaster

Big tech firms ramp up remote working orders to prevent coronavirus spread

Telework and the balance between work and family: Is telework part of the problem or part of the solution? the virtual workplace

Examining relationships between multiple health risk behaviors, well-being, and productivity

An empirical evaluation of alternative methods of estimation for confirmatory factor analysis with ordinal data

Bayesian data analysis in empirical software engineering research

Happiness and the productivity of software engineers

Functional fear predicts public health compliance in the covid-19 pandemic

Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives

Teleworker's home office: an extension of corporate office? Facilities

No single metric captures productivity

A third of the global population is on coronavirus lockdown here's our constantly updated list of countries and restrictions

The world health organization health and work performance questionnaire (hpq)

Public risk perceptions and preventive behaviors during the 2009 h1n1 influenza pandemic. Disaster medicine and public health preparedness

Why we should not measure productivity

Gartner cfo survey reveals 74% intend to shift some employees to remote work permanently

Economic and policy implications of pandemic influenza

Working in the virtual office: Providing information and knowledge to remote workers

Retrospecting on work and productivity: A study on self-monitoring software developers' work

The business of healing: Focus group discussions of readjustment to the post-9/11 work environment among employees of affected agencies

Community resilience: Integrating individual, community and societal perspectives

A technology acceptance model of innovation adoption: the case of teleworking

Stress in remote work: two studies testing the demand-control-person model

A social-cognitive model of pandemic influenza h1n1 risk perception and recommended behaviors in italy

Proceedings of the 22nd International Conference on Evaluation and Assessment in Software Engineering

Infectious disease disasters: Bioterrorism, emerging infections, and pandemics

Raising the alarm and calming fears: Perceived threat and efficacy during risk and crisis

Disaster preparedness for children and families: a critical review

A critical examination of common beliefs about partial least squares path modeling

The role of fear-related behaviors in the 2013-2016 west africa ebola virus disease outbreak

Helping our developers stay productive while working remotely

Intolerance of uncertainty, appraisals, coping, and anxiety: The case of the 2009 h 1 n 1 pandemic

The importance of coping appraisal in behavioural responses to pandemic flu

Absenteeism impact on local economy during a pandemic via hybrid sir dynamics

The who-5 well-being index: a systematic review of the literature

WHO (2020b) Statement on the second meeting of the International Health Regulations (2005) Emergency Committee regarding the outbreak of novel coronavirus (2019-nCoV)

The covid-19 outbreak and psychiatric hospitals in china: managing challenges through mental health service reform

Getting canadians prepared for natural disasters: a multi-method analysis of risk perception, behaviors, and the social environment

Risk perception and disaster preparedness in immigrants and canadian-born adults: Analysis of a national survey on similarities and differences

Acknowledgements. This project was supported by Dalhousie University. Thanks to Brett Cannon, Alexander Serebrenik, Klaas Stol and all of our pilot participants for their support. Thanks also to all of the media outlets that provided complimentary advertising, including DNU.nl, eksisozluk, InfoQ and Heise Online. Finally, thanks to everyone at Empirical Software Engineering for fast-tracking COVID-related research.