key: cord-0445239-cwefugky
authors: Russo, Daniel; Hanel, Paul H.P.; Berkel, Niels van
title: Understanding Developers Well-Being and Productivity: A Longitudinal Analysis of the COVID-19 Pandemic
date: 2021-11-19
journal: nan
DOI: nan
sha: ffad559596a1650ca67da8f5c5ae618d7b4d5a59
doc_id: 445239
cord_uid: cwefugky

COVID-19 has likely been the most disruptive event at a global scale that the world has experienced since WWII. Our discipline had never experienced such a phenomenon, whereby software engineers were forced to abruptly work from home. Nearly every developer adopted new working habits and organizational routines while trying to stay mentally healthy and productive during the lockdowns. We are now starting to realize that some of these new habits and routines may stick with us in the future. Therefore, it is important to understand how we have worked from home so far. We investigated whether 15 psychological, social, and situational variables, such as quality of social contacts or loneliness, predict software engineers' well-being and productivity in a four-wave longitudinal study spanning over 14 months. Additionally, we tested whether any of these variables changed over time. We found that developers' well-being and quality of social contacts improved between April 2020 and July 2021, while their emotional loneliness went down. Other variables, such as productivity and boredom, did not change. We further found that developers' stress measured in May 2020 negatively predicted their well-being 14 months later, even after controlling for many other variables. Finally, comparisons of women and men, as well as of developers residing in the UK and USA, revealed no statistically significant differences but substantial similarities.

THE COVID-19 pandemic and the subsequent lockdowns have likely been among the most disruptive events that most software engineers have faced during their lifetime. Suddenly, professionals started to work from home, potentially alongside family members. This peculiar situation is unprecedented in computer science history; thus, we have virtually no information about the impact of lockdowns on the well-being and productivity of software professionals.

The only related evidence comes from studies of people quarantined during previous epidemic outbreaks, which suggest that isolation and lockdown measures are a huge burden on individuals' well-being [1] and productivity [2]. Indeed, well-being and productivity are two crucial aspects of our lives, particularly during extraordinary events: well-being is a fundamental human right according to the Universal Declaration of Human Rights, whereas productivity provides us with the earnings to ideally maintain or improve our lifestyle.

Health professionals already identified some relevant predictors of well-being during harmful events [1], [3]. However, this research is often cross-sectional (i.e., not longitudinal), only includes a limited number of predictors, or focuses on well-being while ignoring productivity. The software engineering community also reacted quickly to this event by performing a large study which found that home office ergonomics, disaster preparedness, and fear are correlated with well-being and productivity [4]. Nevertheless, this study was also cross-sectional and included only a few predictors. Pre-pandemic research on remote work [5] might provide some indications. However, it is unlikely that such research is still relevant during a global pandemic, with professionals locked down in their houses without childcare or the usual welfare support provided during non-pandemic times.

For these reasons, we believe it is essential to investigate the well-being and productivity of software professionals continuously and longitudinally across the entire COVID-19 pandemic (as of Summer 2021). By doing so, we aimed to achieve several goals. First, identify relevant predictors of both well-being and productivity of software engineers working from home in a stressful context such as a lockdown. Second, test for causal relations between the identified variables and whether well-being predicts productivity or vice versa (i.e., scholars found that they are interrelated but could not find a causal association [6], [7], [8]). Third, test whether well-being, productivity, and other relevant variables such as loneliness, social contacts, and need fulfillment changed over the course of 14 months since the beginning of the first lockdown in spring 2020. Fourth, provide data-driven recommendations about possible future lockdowns. Fifth, understand how to improve developers' work-life balance while working from home in a post-pandemic setting and contribute to the nascent literature about the future of work. Hence, we formulate our research questions as follows:

Research Question 1: How have well-being, productivity, and other relevant social and psychological variables changed throughout the COVID-19 pandemic?

Research Question 2: Which variables predict Well-being and Productivity over time?

To answer our research questions, we surveyed 192 globally distributed software engineers four times over a period of 14 months. We assessed their well-being and productivity, alongside 15 predictor variables. We grounded our investigation in organizational [9] and psychological [10] theories, which are relevant for people's well-being and productivity. For example, self-determination theory [10] assumes that human motivation can be divided into three basic needs which are also linked with work motivation [11]: the needs for autonomy, competence, and relatedness. Additionally, we included evidence from the remote work literature [12], [13], [14], and recommendations by health and work authorities [15], [16], [17].

We analyzed our data using a range of different statistical approaches tailored to the specific questions. Specifically, to test whether well-being, productivity, and 15 variables including loneliness, needs, and social contacts have changed, we used 17 within-subject ANOVAs. To test whether well-being and productivity would be predicted over time by any of the 15 carefully selected variables (see section III-B), we used six cross-lagged panel models. To assess whether there are any mean differences between women and men and between participants living in the UK and USA, we used a series of between-subject t-tests. Results suggest that developers' well-being and their quality of social contacts increased throughout the pandemic (i.e., between April 2020 and July 2021), while their emotional loneliness decreased. Productivity remained unchanged. Further, only stress at time 2 predicted developers' well-being at time 4. Finally, we found no mean differences between women and men or between people living in the UK and USA for any of the 17 variables we measured across all four waves.

This article has the following structure. Section II discusses the related work on well-being and productivity in the remote work literature, as well as recent advancements in the software engineering community. The Research Design and Analysis is then described in Section III. Section IV presents the results of our analyses, and Section V discusses the implications and recommendations for professionals and software houses. Finally, we conclude our work by outlining future research directions in Section VI.

Following the abrupt onset of the COVID-19 pandemic and subsequent lockdowns, COVID-19 related research has expanded rapidly. Health scientists started investigating countermeasures to reduce the spread and impact of the virus and studied the psychological and physiological effects on people living in lockdown conditions. Also, in the software engineering community, the effect of the pandemic on software developers has gained increased attention. After describing the state of the art of the research on Well-Being and Productivity in Remote Work, we focus on the software engineering contributions.

There is a consensus that lockdown measures have a negative impact on well-being [1] , [18] . In particular, research shows that living in a lockdown can result in increased experiences of anger, depression, emotional exhaustion, fear of infecting others or getting infected, insomnia, irritability, loneliness, low mood, post-traumatic stress disorders, and stress [19] , [20] , [21] , [22] , [23] , [24] . Additionally, fears of e.g., infection [25] , [26] , lack of supplies or not being treated [27] , and misleading or contradictory information [28] can result in significantly increased stress levels. Moreover, the psychological effects of being locked down may appear years after [1] .

On the other hand, pre-COVID research shows that remote working is associated with an improved work-life balance, creativity, productivity, reduced stress, and low carbon emissions due to the absence of commuting [29] , [13] , [14] , [30] , [31] , [32] . Nevertheless, there are also some apparent drawbacks related to remote work, such as deteriorating collaboration and communication, loneliness, feeling of being constantly 'online,' decreasing motivation, and distractions at home [33] . Besides such aspects, forecasts suggest that remote work will increase on a large scale in the next years [29] , [34] .

For this reason, research opportunities are extensive, also in the years to come. There are plenty of open questions, such as which variables influence well-being and productivity in combination, and to what extent. Studying, e.g., stress in remote work without considering all the variables involved provides little overall guidance for software engineering teams because it is unclear whether stress is more strongly related to well-being than, for example, loneliness or anxiety. Therefore, the present paper studies these variables together rather than separately to identify the variable(s) most strongly associated with well-being and productivity.

Overall, the software engineering community has been quite active in researching pandemic-related aspects. We identified relevant work through Scopus and arXiv (considering that this research topic is highly contemporary, some papers are still under review).

The first works in this research area are from the late 90s with broader use of the internet. Pounder (1998) [35] was the first relevant contribution we identified, with an essay about security problems linked to telework. In the early 2000s, Guo (2001) [36] performed two qualitative surveys on software process improvement related to the distinctive nature of teleworking. Similarly, Higa et al. (2000) [37] studied how e-mail usage influences telework.

Afterward, there has been a twenty-year gap, with only two exceptions. James & Griffiths (2014) [38] developed a mobile execution environment to support a secure and portable working from home setting. Ford et al. (2019) [39] interviewed three transgender software engineers to explore the interplay of gender identity and remote work.

Following the start of the pandemic and the first lockdown, two research groups performed survey studies. Ralph et al. (2020) [4] performed a cross-sectional study of over two thousand globally distributed developers working from home during the pandemic, in which an a priori research model derived from the literature was validated through Structural Equation Modeling. Russo et al. (2020) [8] went in the opposite direction. Rather than having a top-down model to validate, they employed an exploratory approach looking at the most relevant variables related to either well-being or productivity and analyzed the data through a longitudinal design.

Microsoft has also been active in understanding the effects of the pandemic on its employees. Ford et al. (2020) [40] surveyed Microsoft's developers twice. They found that the quality of family life and time improved, although remote work introduced a lack of focus, poor work-life boundaries, and communication and sync issues. Similarly, Miller et al. (2021) [41] performed two surveys in which they collected information about working from home and team-related issues. They found that communication and interaction with colleagues are relevant predictors of developers' satisfaction and team productivity. Butler & Jaffe (2021) [42] conducted a 10-week diary study. Identified challenges from remote work were meetings, overwork, and physical and mental health. However, Microsoft developers appreciated more family time and work flexibility.

More recent studies focus on particular aspects of remote work. For example, Cucolas & Russo (2021) [43] used a mixed-methods research design to investigate how Scrum software development adapted to working from home. According to their results, the home-working environment is the most crucial variable for a software project's success. Also, self-determination theory [10] (i.e., the needs for autonomy, competence, and relatedness) is a valuable theoretical lens to improve working from home conditions, as these needs are linked with well-being [44], for example. Finally, Machado et al. (2021) [45] surveyed 233 Brazilian software professionals and investigated gender differences. They concluded that the pandemic affected women more negatively than men. In contrast, Russo et al. did not find any meaningful gender differences [8].

From a content perspective, half of the papers are concerned with specific topics related to remote work, i.e., security [35], [38], process [36], work productivity [37], and inclusion [39], whereas the other half focus on well-being and productivity aspects of remote work [40], [4], [8], [42], [45], [46] and productivity related to project characteristics [47], [43].

To design our research, we followed the ACM SIGSOFT Empirical Standards for Longitudinal Studies [48] . Consequently, we asked carefully recruited software professionals to complete the same survey four times, over a period of 14 months. Wave 1 was collected between 26-30 April 2020, wave 2 between 10-13 May 2020, wave 3 between 24 February and 3 March 2021, and wave 4 between 29 June and 5 July 2021. Wave 1 and 2 were only two weeks apart since we were initially only interested in the stability of predictors of well-being and productivity. Wave 3 was collected in late winter 2021 when the number of COVID-19 cases in most Western countries decreased again, and wave 4 was when a significant part of people in Western countries had received an offer to get vaccinated. Unique randomized IDs were assigned to participants to preserve their anonymity and track their participation across all four waves.

The sample size was initially determined to be able to detect a small-to-medium effect size of f = .15 for a repeated-measures (within-subject) ANOVA, using a power of .80 and a corrected α-level of .004 (see section III-C1 for a justification of the lower α-level). A power analysis using G*Power [49] revealed that we would need a sample size of at least 102 participants who participated in all four data collection waves. We selected participants from a pool of over 500 software engineers as previously identified [50]. These informants were selected through a multi-stage screening process, in which we assessed representativeness through prescreening (in terms of computer programming experience and profession, but also task quality on the data collection platform), competence screening (competency-based questions on software design and programming), and quality screening (attention checks). Through additional screening questions, we subsequently narrowed this pool down to 192 professionals. In particular, we looked for informants who were working from home during the pandemic for at least 50% of their time and did not live in countries with jeopardized COVID regulations (e.g., Germany Table I . We ensured high data quality by recruiting participants from the data collection platform Prolific Academic [51] and compensated participants above the USA's minimum wage. Additionally, none of our participants failed any attention checks or completed the survey in an implausibly short time, which further ensures the quality of our data. The survey was run using the platform Qualtrics.

To collect the data, we adhered to the ethical guidelines of the Declaration of Helsinki [52]. All participants were at least 18 years old and expressed their consent to participate in the study each time. Also, they were free to withdraw at any point. The lead author also completed formal training in research ethics for engineering and behavioral sciences.

Well-being and productivity are two complementary variables of a healthy working environment. Not surprisingly, they are correlated [8] . Especially in exceptional times, such as a pandemic, organizations should prioritize employees' mental and physical well-being if they want to be productive. On the other hand, as suggested by Russo et al. [8] , contributing to the organization's value is important for the sense of belonging or achieving of every developer. Therefore, productivity does also contribute to professionals' well-being [8] .

Consequently, productivity and well-being are our two outcome variables (i.e., dependent variables). To identify relevant predictors (i.e., independent variables) of our dependent variables, we started from the insights of Russo et al. [8]. Namely, we included in this analysis only the 15 (out of 50) predictors which correlated with at least one of the outcome variables (i.e., |r| ≥ .30) [8]. This was done to keep the number of predictor variables manageable. All variables were measured using self-reported measures, which is very common in the literature [4], [50]. The internal consistency of the scales was quantified with Cronbach's α and ranged from satisfactory to very good. Values above .60 and .70 are desirable for exploratory and confirmatory research, respectively [53].

To measure the identified variables, we used either validated scales or adapted items from scales used in previous publications with high reliabilities. The only exceptions were 'productivity', 'quality and quantity of communication with colleagues and line managers', and 'daily routines', for which we created our own items because we could not find existing scales suitable for our purposes. Responses were mostly given on 5-, 6-, or 7-point response scales with higher values indicating a higher score on each variable. Each scale is briefly described below with its name, reference, and reliability metrics (i.e., Cronbach's alpha) across all four data collection waves. In this paper, we use the terms 'wave' and 'time' interchangeably. For detailed descriptions of the items, see Russo et al. [8].

Well-being. We measured well-being with the 5-item Satisfaction with Life Scale [54] . Participants were asked to report their well-being using items such as "I was satisfied with my life in the past week" on a 7-point Likert scale (1: Strongly disagree, 7: Strongly agree). The Cronbach's α values to measure internal consistency for all four data collection waves were the following α time1 = .90, α time2 = .90, α time3 = .92, α time4 = .94.
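As an illustration of the reliability metric reported for each scale, the following minimal sketch computes Cronbach's α from a respondents-by-items matrix using the standard formula α = k/(k-1) · (1 - Σ item variances / variance of the sum score). The function name `cronbach_alpha` and the response data are hypothetical and are not taken from the study's supplemental R code.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])  # number of items in the scale
    # Sample variance of each item across respondents
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of each respondent's sum score
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical answers of five participants to a 3-item, 7-point scale
responses = [
    [6, 5, 6],
    [3, 3, 2],
    [7, 6, 7],
    [4, 4, 5],
    [2, 3, 2],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 2))
```

Items that rank respondents in the same order push α toward 1, whereas unrelated items push it toward 0.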

Productivity. There is no agreement among researchers on how productivity can be measured. For example, measuring productivity in an allegedly objective way by using function points [55] has been criticized as detrimental in the long run [56]. Further, the objective approach is barely feasible if participants work in different areas since comparisons across types of work are very challenging. Therefore, other researchers advocated using self-reports [57], which have apparent shortcomings such as subjectivity. In the present research, we used a self-report approach and reduced social desirability bias by making the survey anonymous. Specifically, we operationalized productivity as a function of time spent working and efficiency per hour, compared to a typical, pre-pandemic week. The reason for this choice is that we wanted to investigate productivity while working remotely as compared to being in the office. Since our measure does not allow us to compute internal consistency, we instead computed test-retest reliability by correlating the productivity scores at time 1 with those at time 2 (r = .50, p < .001).
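The sketch below illustrates one possible reading of this operationalization together with the test-retest correlation. The exact formula is not stated in this section, so the product of relative working time and relative efficiency used in `relative_productivity` is an assumption, as are all function names and example values; a score of 1 is taken to mean no change relative to a typical pre-pandemic week.

```python
from math import sqrt


def relative_productivity(hours_now, hours_typical, efficiency_ratio):
    """Assumed form: (hours now / typical hours) * (efficiency per hour vs. typical)."""
    return (hours_now / hours_typical) * efficiency_ratio


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical productivity scores of the same five developers at two waves
scores_t1 = [relative_productivity(h, 40, e)
             for h, e in [(40, 1.0), (35, 0.9), (45, 1.1), (30, 1.0), (40, 0.8)]]
scores_t2 = [1.05, 0.80, 1.10, 0.90, 0.85]

# Test-retest reliability: correlation of wave-1 scores with wave-2 scores
print(round(pearson_r(scores_t1, scores_t2), 2))
```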

Boredom was measured with the Boredom Proneness Scale [3] , [58] ; α 1 = .87, α 2 = .87, α time3 = .92, α time4 = .90.

Self-blame and behavioral disengagement, two coping strategies, were measured with the respective subdimensions of the Brief COPE scale [59] . Cronbach's α's for self-blame were α 1 = .75, α 2 = .71, α time3 = .92, α time4 = .92, and for behavioral disengagement α 1 = .76, α 2 = .71, α time3 = .89, α time4 = .91.

Distractions at home was measured with a 2-item scale we developed (α 1 = .64, α 2 = .63, α time3 = .75, α time4 = .65).

Generalized anxiety was measured with an adapted version of the 7-item Generalized Anxiety Disorder scale [60] ; α 1 = .93, α 2 = .93, α time3 = .94, α time4 = .95.

Emotional and social loneliness were measured with the De Jong Gierveld Loneliness Scale [61] . Emotional loneliness' Cronbach's α-levels were: α 1 = .68, α 2 = .69, α time3 = .68, α time4 = .73, and for social loneliness:

Autonomy, competence, and relatedness were measured with the psychological needs scale [62] . Need for autonomy's Cronbach's α-levels were: α 1 = .72, α 2 = .76, α time3 = .77, α time4 = .78; for Competence: α 1 = .77, α 2 = .65, α time3 = .77, α time4 = .79; and for Relatedness:

Quality of social contacts was measured with three items, two of which were adapted from the social relationship quality scale [63] and one of which was developed by us; α 1 = .73, α 2 = .77, α time3 = .76, α time4 = .84.

Quality and quantity of communication with colleagues and line managers were measured with a self-developed 3-item scale (α 1 = .88, α 2 = .92, α time3 = .93, α time4 = .94).

Stress was measured with the Perceived Stress Scale [64] ;

Daily Routines were measured by a self-developed 5-item scale (α 1 = .75, α 2 = .78, α time3 = .81, α time4 = .78).

Extraversion was measured with a subscale of the Brief HEXACO Inventory [65] ; α 1 = .71, α 2 = .69, α time3 = .75, α time4 = .61.

In total, we used three different types of analyses, which seemed most appropriate to us, to answer our research questions and to perform additional exploratory analyses. Below, we briefly describe and justify each of them.

Raw data, R-code to reproduce our analyses, and the zero-order correlations for all 17 variables, separately per wave and across all data collection waves, are included in the supplemental materials.

1) Changes along the COVID-19 Pandemic: To test whether any change between the four data collection waves occurred, we ran a series of 17 repeated-measures ANOVAs, one per variable. This allowed us to test if, for example, software engineers' well-being increased, decreased, or remained the same. In addition to the common descriptive (means and standard deviations) and inferential statistics (F-value and p-value), we report as an effect size how many participants report a higher, lower, or equal level of any variable at time 4 compared to time 1. Given the 17 tests, whose variables are, however, mostly correlated with each other, we set our α-level to .004. That is, we only consider findings to be significant if p < .004. This threshold is, in our view, neither conservative nor liberal. However, we acknowledge that other researchers might prefer a more conservative or liberal threshold. We, therefore, report the exact p-values, which allows researchers to select different thresholds.
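The two quantities described above can be sketched as follows: the F statistic of a one-way repeated-measures ANOVA and the count of participants whose score was higher, lower, or equal at time 4 compared to time 1. This is a minimal illustration with hypothetical data and function names, not the study's R code; it omits the p-value and any sphericity correction.

```python
def rm_anova_f(data):
    """One-way repeated-measures ANOVA.

    data: one row per participant, one score per wave.
    Returns (F, df_time, df_error).
    """
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    wave_means = [sum(row[j] for row in data) / n for j in range(k)]
    subject_means = [sum(row) / k for row in data]
    ss_time = n * sum((m - grand) ** 2 for m in wave_means)
    ss_subjects = k * sum((m - grand) ** 2 for m in subject_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    # Residual variation after removing wave and participant effects
    ss_error = ss_total - ss_time - ss_subjects
    df_time, df_error = k - 1, (n - 1) * (k - 1)
    f_value = (ss_time / df_time) / (ss_error / df_error)
    return f_value, df_time, df_error


def change_counts(first_wave, last_wave):
    """Count participants with a higher, lower, or equal score at the last wave."""
    pairs = list(zip(first_wave, last_wave))
    return {
        "higher": sum(b > a for a, b in pairs),
        "lower": sum(b < a for a, b in pairs),
        "equal": sum(b == a for a, b in pairs),
    }


# Hypothetical well-being scores of four participants across four waves
scores = [[4, 4, 5, 6], [3, 4, 4, 5], [5, 5, 5, 5], [6, 5, 6, 5]]
f_value, df_time, df_error = rm_anova_f(scores)
counts = change_counts([row[0] for row in scores], [row[-1] for row in scores])
print(round(f_value, 2), df_time, df_error, counts)
```

The resulting p-value would then be compared against the adjusted threshold of .004 rather than the conventional .05.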

2) Exploring causality: To test whether any of the 15 predictor variables predict our two outcome variables at time 4, well-being and productivity, we ran six cross-lagged panel models in which we regressed well-being or productivity at time 4 onto all predictors, and, crucially, both outcomes. It is essential to also include, for example, well-being at time 1 as a predictor for well-being at time 4, because otherwise, we might erroneously conclude that, for instance, anxiety is related to only the aspects of anxiety that are correlated with well-being. We realize that there are different views about inferring causality between two variables, A and B. While some have a stringent view on causality, which requires being able to rule out that any third variable is responsible for the association between A and B (for an overview see [66]), others argue that it is sufficient to show that A is correlated with B and A is measured before B [67]. A middle point is to argue that A measured at time 1 needs to predict B at time 2 while controlling for B measured at time 1, to be able to state that A causally predicts B [68]. We use this view and go a step further by also controlling for a range of other variables. This approach has two advantages over a series of models with only one predictor (e.g., stress) and one outcome (e.g., well-being), which are also common in the literature. First, using only one variable as predictor as opposed to 15 would have resulted in many more models, and we would therefore have needed to control for many comparisons. Second, by controlling for many related variables, our approach is conservative as it focuses on the unique impact of each predictor variable. For example, by simultaneously including anxiety, stress, and loneliness alongside other variables as predictors of well-being in the same model, we focus on the unique impact of each predictor on well-being.
Further, we only focused on well-being and productivity at time 4 because it is the most recent wave, and it is crucial to allow the outcomes to vary (most measured variables are stable over time [8]). If, for example, well-being at time 1 were very highly correlated with well-being at a subsequent time, there would be minimal variance for the other predictors to explain because well-being at time 1 would already explain most of the variance. Thus, we ran two cross-lagged panel models (one with well-being and one with productivity as outcome) with variables measured at time 1, two with variables measured at time 2, and two with variables measured at time 3 as independent variables. Given the total of six comparisons, we set our α-threshold to .008. Note that we are adjusting the α-threshold based on the number of comparisons per type of analysis. For example, we ran six cross-lagged panel models, but 17 repeated-measures ANOVAs. Thus, the α-threshold had to be different.
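The core idea of such a model, regressing an outcome at time 4 on its own earlier value plus other earlier predictors, can be sketched with ordinary least squares. The example below is a simplified illustration with two predictors and made-up, noise-free data; the function name `ols_coefficients` and all values are hypothetical, and the study's models include all 17 variables and were fitted with the authors' R code.

```python
def ols_coefficients(predictors, outcome):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    rows = [[1.0] + list(r) for r in predictors]  # prepend the intercept column
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * y for r, y in zip(rows, outcome))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda idx: abs(aug[idx][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for idx in range(p):
            if idx != col:
                factor = aug[idx][col] / aug[col][col]
                aug[idx] = [a - factor * b for a, b in zip(aug[idx], aug[col])]
    return [aug[i][p] / aug[i][i] for i in range(p)]


# Hypothetical data: well-being and stress at an earlier wave, well-being at time 4
wellbeing_earlier = [4, 5, 3, 6, 2]
stress_earlier = [3, 1, 4, 2, 5]
wellbeing_t4 = [1 + 0.5 * w - 0.3 * s
                for w, s in zip(wellbeing_earlier, stress_earlier)]

coefs = ols_coefficients(list(zip(wellbeing_earlier, stress_earlier)), wellbeing_t4)
# coefs[1]: stability of well-being; coefs[2]: unique contribution of earlier stress
print([round(c, 3) for c in coefs])

# The threshold of .008 is consistent with dividing .05 by the six models
print(round(0.05 / 6, 3))
```

Because the earlier outcome is included as a predictor, the coefficient for stress reflects only what stress explains beyond the stability of well-being itself.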

3) Between-group comparisons: Additionally, we compared women and men, and people living in the United Kingdom and the USA (these were the two countries from which most of our participants came) across all 17 variables and all 4 time points, resulting in 2 × 4 × 17 = 136 between-subject t-tests. We, therefore, adjusted our α-threshold to .0005. To address recent calls to report effect sizes that display similarities to avoid a one-sided focus on potentially small differences [69], we also report the effect size Percentage of Common Responses (PCR) alongside the more common effect size Cohen's d. PCR is a measure of overlap between two groups (e.g., women and men) and ranges from 0 (no overlap/similarities) to 100 (both groups overlap perfectly).
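To make the two effect sizes concrete, the sketch below computes Cohen's d with a pooled standard deviation and converts it into an overlap percentage. The conversion 2 · Φ(-|d|/2) · 100, i.e., the overlap of two normal distributions with equal variances, is an assumption about how PCR can be computed; the function names and data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (
        n_a + n_b - 2
    )
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def percentage_common_responses(d):
    """Assumed PCR: overlap (in %) of two equal-variance normal distributions."""
    return 200 * NormalDist().cdf(-abs(d) / 2)


# Hypothetical well-being scores for two groups
women = [4.0, 5.0, 6.0, 5.0, 4.5]
men = [4.5, 5.0, 5.5, 4.0, 5.0]
d = cohens_d(women, men)
print(round(d, 2), round(percentage_common_responses(d), 1))

# Number of between-subject t-tests: 2 comparisons x 4 waves x 17 variables
print(2 * 4 * 17)
```

Under this conversion, d = 0 corresponds to 100% overlap, and even a large difference of d = 1 still leaves more than half of the two distributions in common, which is why reporting overlap alongside d helps to avoid overstating group differences.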

In the first step, we tested for construct validity by correlating all 17 variables with each other separately for each data collection wave. The zero-order Pearson correlations across all waves were as expected. For example, well-being correlated negatively with stress, loneliness, and boredom, and positively with need for autonomy, competence, and relatedness, which is in line with the literature [70] , [71] , [8] . Details of those tests are in the Supplementary Materials.

The results of the 17 repeated-measures ANOVAs are displayed in Table II. Four of the ANOVAs were significant. Well-being (Fig 1), quality of social contacts (Fig 5), and self-blame (Fig 3) increased and emotional loneliness decreased (Fig 4). Behavioral disengagement was higher at time 2 than at time 1 but went down to the starting point at times 3 and 4. For well-being, for example, 77 developers reported higher levels at time 4 than at time 1, 38 lower levels, and 9 an equal amount of well-being (cf. Tab. II). In contrast, productivity remained stable over time (Fig 2).

We ran six cross-lagged panel models to test which variables causally explain well-being and productivity at time 4. In the first model, we used well-being as outcome and all 17 variables listed in Table II

In the fourth model, we used productivity at time 4 as outcome and all 17 variables listed in Table II measured at time 2 as predictors. The overall model was not significant, R² = .17, adj. R² = .04, F(17, 116) = 1.37, p = .17.

Fig 2 caption (fragment): the box at each time point shows the range in which the middle 50% of the data falls. A productivity score of one indicates that productivity has not changed compared to pre-pandemic levels, scores > 1 that productivity increased, and scores < 1 that productivity decreased.

In the fifth model, we used well-being as outcome and all 17 variables measured at time 3 as predictors. The overall model was significant, R² = .52, adj. R² = .43, F(17, 88) = 5.58, p < .001. Unsurprisingly, well-being at time 3 predicted well-being at time 4, B = 0.38, SE = .13, p = .005. However, none of the other variables was significant.

In the sixth and final model, we used productivity at time 4 as outcome and all 17 variables measured at time 3 as predictors. The overall model was significant, R² = .37, adj. R² = .24, F(17, 86) = 2.96, p < .001. Productivity at time 3 predicted productivity at time 4, B = 0.46, SE = .11, p < .001.

Finally, we compared women and men (results are summarized in Table III ) and people living in the United Kingdom and the USA in Table IV (these were the two countries from 

Building on the collected evidence and the previous literature, we discuss the implications of our investigation for software professionals and organizations. Furthermore, we explain the intrinsic limitations of this study and how we tried to cope with those.

Readers should be aware that our findings are based on group-level inferences, which do not always generalize to the individual level. For example, the results of the within-subject ANOVAs inform us whether the average of a variable changed over time, not whether all individuals changed in the same direction. As can be seen in Table II, while the well-being of 77 developers increased between time 1 and 4, the well-being of 38 developers dropped. Thus, it is only more likely (i.e., approximately twice as likely) to find a developer whose well-being increased rather than dropped. Interestingly, the change over time was not always linear. For example, emotional loneliness first went down between Time 1 and Time 2, then slightly up again at Time 3, and finally down again. This might be because many countries started to (announce plans to) open up again around the time when we collected the second wave. In contrast, the third wave was collected in February 2021: in the UK and USA, for example, many more deaths were associated with COVID-19 in the winter of 2020/21 than in spring 2020. Similarly, many situational factors or variables we have not measured, such as the perceived severity of local lockdowns or loss of a loved one (e.g., because of COVID-19), would likely have explained additional variance in developers' well-being and productivity. Nevertheless, we aimed to provide generalizable evidence with this longitudinal study. However, qualitative investigations (e.g., [41], [40], [42]) add to a nuanced understanding of individual phenomena. When drawing up company guidelines, these and other studies should also be considered since our recommendations will not be exhaustive.

Based on our results, we provide recommendations for the software engineering community (cf. also Table V).

We found that developers' well-being increased over time. We have no pre-pandemic data, so we cannot assess how the lockdown initially impacted software professionals. It could be that their well-being went down in Spring 2020 and is now bouncing back to pre-pandemic times. This reasoning would be in line with previous research showing that people's well-being usually bounces back after a significant negative event [72]. While research from the start of the pandemic (i.e., spring 2020) indicates that developers' well-being decreased initially [4], our findings provide a more positive outlook that developers' well-being bounced back. Our findings also suggest that working from home does not negatively impact developers' well-being, as otherwise well-being would not have increased as much. This supports new company policies implementing hybrid or full remote work settings.

Productivity remained constant during the pandemic. Although we report a slight increase in productivity over the four data collection waves (as plotted in Figure 2), and more people reported an increase in productivity compared to those who reported a decrease (cf. Table II), the mean differences are non-significant (p = .13), indicating that the observed increase could very well be random and might not replicate. Since measuring productivity is non-trivial, we followed the example of a previous study [8] by measuring productivity as a self-reported function compared to the pre-pandemic period. We therefore conclude that the productivity level of software professionals changed neither throughout the lockdown nor compared to the pre-pandemic time. This finding also contradicts previous research suggesting that the lockdown is detrimental for productivity [48], possibly because of differences in the research design (cross-sectional vs. longitudinal design) and operationalization of productivity (Ralph et al. [48] used a different measure of productivity). Alternatively (or additionally), we collected our sample approximately one month after Ralph et al. [48] and predominantly from countries which were relatively underrepresented in the sample of Ralph et al., who recruited most of their participants from Germany, Russia, and Brazil. Our results substantiate our previous conclusion that a hybrid or full remote working environment would not per se harm the productivity levels of developers.

Even though all typical welfare support (e.g., childcare, schools, sports facilities) was closed, software engineers showed a high level of adaptation by keeping the same productivity levels and steadily increasing their well-being levels. Consequently, in a post-pandemic working from home context, with all support facilities running normally, working from home is very unlikely to impact developers' well-being negatively. Qualitative findings support this argument, suggesting that working from home significantly improved work-life balance [40]. Similarly, a large-scale cross-sectional study observed that 89% of the surveyed professionals would like to continue to work remotely in the future, especially in a hybrid fashion [73]. However, previous research regarding the impact of working from home on productivity is mixed. Some studies found that working from home is positively related or unrelated to productivity [47], [74], [75], [8], whereas other research found that working from home has some negative effects [76], [77], [78].

Software professionals felt less lonely and improved their social contacts. During the first lockdown in Spring 2020, many people had to abruptly reduce their social interactions [79]. As a consequence, the sense of loneliness and isolation increased. Nevertheless, in this case too, developers showed a high level of resilience. Indeed, we report a significant decrease in emotional loneliness and an increase in the quality of social contacts. This means that software engineers increasingly reached out to their social contacts when they felt lonely, thereby coping well with the challenging conditions of the pandemic. Similarly, the quality of their relationships increased. This is important because having a reliable social support network is an essential coping mechanism, especially in hard times and in moments of high stress [80], [81].

These findings are relevant for organizations planning to implement a hybrid or remote work policy. Software engineers showed a high level of resilience when coping with unexpected events. At the same time, their social network was a crucial support while working from home. This insight is also supported by previous research, where communication was found to be a relevant predictor of developers' satisfaction during the lockdown [41]. Consequently, a proactive company policy of employees' inclusion would sustain their well-being levels. This would require a particular effort from middle management (because they are the direct company interface for each employee) to ensure that all team members can express themselves and maintain stimulating and nurturing relationships with their peers, since even interacting with weak social ties (i.e., acquaintances) can improve people's well-being [82].

Moreover, we also found that self-blame increased. This finding was unexpected and might relate to the phenomenon known as survivor's guilt [83], which has been observed, for example, among caretakers of cancer patients and is positively associated with remorse [84]. We speculate that self-blame is positively associated with survivor's guilt (e.g., of not having been affected by COVID-19) and remorse, and might be stronger among those developers who experienced loss (e.g., a relative who died because of COVID-19). Mindful organizations might offer employees psychological support to address the guilt and remorse specifically.

Stress is our only significant factor detrimental to well-being. Although this result is not surprising per se, it is critical. It provides evidence that stress and stress factors, in general, were the most significant harm when working from home during the pandemic. Therefore, to effectively sustain employees' well-being, it should be a company priority to reduce stress levels. There are different approaches that organizations can implement to tackle this crucial issue. According to Halpern [85], flexibility has been very effective in reducing work-related stress for both men and women. Moreover, a high level of flexibility leads to high employee commitment, which reduces organizational costs and missed deadlines. Similarly, Coetzer and Rothmann found that a high level of organizational control over employees' tasks was negatively related to organizational commitment and stress. Pay structure and job insecurity were also found to be highly stressful for knowledge workers [86]. On this aspect, career management along with professional expectation management are considered to be critical pillars of human resource management to decrease stress levels [87]. Additionally, mindfulness practices, in the workplace or at home, can reduce stress and enhance sleep quality [88]. Overall, organizations have several tools to reduce developers' stress: providing a high degree of autonomy in both tasks and work schedules, and managing fair and transparent expectations (along with job stability and fair pay).

We have not observed differences between women and men. No significant mean differences were found between men and women in any of the surveyed variables. This suggests that the lockdown did not affect one gender more than the other in our sample. This is surprising since abundant other research found that women are more impacted by the pandemic, especially career-wise [89], [90]. For instance, a recent Brazilian cross-sectional survey study concluded that women suffered more from the lockdown due to a higher involvement in housekeeping duties compared to men [45]. In contrast, we found a very high level of similarity between genders. We believe that this discrepancy arises because the women in our sample are not representative. For example, one of our inclusion criteria was that participants work at least 20h/week. So we have a very specific group of women in our sample (probably with fewer kids or with someone who takes care of their kids so that they can work). This result is also encouraging because similarities between women and men, even if not representative, can increase women's sense of belonging and the likelihood that they pursue a career in a men-dominated field [91]. For this reason too, software companies should not use gender-biased communication when offering home support, that is, communication implying that most work at home falls on women.

We found no mean differences between the UK and USA. We found high levels of similarities in how software professionals were impacted in the USA and the UK. This might be the result of the reliance of national health authorities on the World Health Organization, making lockdown measures fairly uniform between both countries. Consequently, global software companies can homogeneously plan policies across countries in case of a future disastrous event. However, they should take individual differences into account, as in both countries there are developers reporting higher well-being and productivity and developers reporting lower well-being and productivity (i.e., the within-country variability outweighed the between-country variability). Also, we have no evidence suggesting that there should be any difference between the USA and the UK in working from home policies.

In the following section, we discuss the most relevant limitations of this work.

Reliability. For this investigation, we employed a four wave longitudinal design. Informants have been identified through a multi-stage selection screening to ensure they were representative of the software engineering population. Also, we computed an a priori power analysis to identify the minimum number of participants required to provide reliable conclusions. The internal consistencies (i.e., Cronbach's α) ranged from satisfactory to very good.
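For readers unfamiliar with the internal-consistency statistic mentioned above, the following is a minimal sketch of how Cronbach's α is commonly computed from item-level responses. It is not taken from the study's replication package (which is written in R); the function name and the example responses are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha; items[i][j] is participant j's response to item i."""
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    n = len(items[0])
    if n < 2 or any(len(item) != n for item in items):
        raise ValueError("every item needs the same number (>= 2) of responses")
    # Total scale score per participant (sum over items).
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical responses: 3 items, 5 participants (not data from the study).
example_items = [
    [4, 5, 3, 2, 4],
    [4, 4, 3, 2, 5],
    [5, 5, 2, 3, 4],
]
alpha = cronbach_alpha(example_items)
print(f"Cronbach's alpha: {alpha:.3f}")
```

Values closer to 1 indicate that the items of a scale vary together; which cut-offs count as "satisfactory" or "very good" is a matter of convention and is not fixed by the formula.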

Construct validity. For this study we used 15 variables previously identified in the literature that are related to well-being and productivity. For each variable, we used a dedicated measurement instrument. Construct validity was assessed by correlating all variables with each other, separately in each wave. The correlations were in the expected directions and in line with the literature [70], [8], [71].

Conclusion validity. We draw our conclusions based on a number of statistical analyses: within-subject ANOVA, cross-lagged panel model, and between-subject t-tests. To increase the trustworthiness of our results, we adjusted our alpha-thresholds to reduce the risk of false positives (i.e., Type I errors). In terms of data collection, some variations might have been out of our control since lockdown measures were not uniform in different countries. To address this issue, we only selected participants living in countries that during the first wave had similar regulations (we excluded, e.g., Sweden, Denmark). Nevertheless, minor variations in rules, over which we had no control, occurred in the different countries during the pandemic. However, we report very similar results when looking at between-country mean differences. Our conclusions are reproducible since we made the anonymized raw data and R analysis code openly available on Zenodo.
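To illustrate what adjusting an alpha-threshold means in practice, the sketch below applies a Bonferroni-style correction. This is an assumption made for illustration: the text above does not state which correction was used, and the variable names and p-values are hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def flag_significant(p_values: dict[str, float], alpha: float = 0.05) -> dict[str, bool]:
    """Mark each test as significant only if p falls below the adjusted threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return {name: p < threshold for name, p in p_values.items()}


# Hypothetical p-values (not results from the study).
example_p = {"variable_a": 0.001, "variable_b": 0.02, "variable_c": 0.13}
flags = flag_significant(example_p)
print(flags)
```

In this example a result with p = 0.02 would pass an unadjusted 0.05 threshold but not the adjusted one, which is the trade-off such corrections make: fewer false positives at the cost of lower sensitivity.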

Internal validity. We only found one instance in which one predictor (stress) causally predicted an outcome (well-being) over time. This might be because of our conservative approach (e.g., correcting for multiple comparisons and controlling for many other related variables). Importantly, we recognize that there is an ongoing debate on what constitutes causality. Therefore, we are aware that some readers might dislike the term 'causality' and prefer instead a 'softer' term such as 'predicted over time'. Our study relies on self-reported measures, limiting the validity due to potential response biases. Although our informants were initially identified in other work [50], we also applied several quality checks after each time point. Additionally, we searched for inaccurate or unlikely responses (of which we found none, which supports data quality). The attrition rate across the four waves is comparable to other longitudinal studies across a similar timespan [93], [94]. Due to the evolving nature of the pandemic, data collection was performed based on the information available at each point in time. As a consequence, the time spans are not homogeneous but represent moments of the pandemic where data collection seemed to be representative of the pandemic trend. This might have affected the variability of our data.

External validity. The primary aim of our longitudinal analysis was to maximize internal validity by finding significant effects. Thus, we did not seek to work with a representative sample of the software engineering population (e.g., as Russo & Stol did with N ≈ 500 to generalize their findings [50]).

In this investigation, we performed a four-wave longitudinal study over 14 months from the start of the COVID-19 pandemic in April 2020 to July 2021, involving 192 software developers. We analyzed how well-being and productivity of software engineers and 15 related social and psychological variables changed over time. Similarly, we explored causal relations among our variables and performed gender and country-based between-group comparisons.

We found that well-being, quality of social contacts, and self-blame increased over time while emotional loneliness decreased. We further found that only stress measured at time 2 causally predicted well-being at time 4. Finally, we found that women and men and people living in the UK and USA did not differ for any of the variables we measured across all four data collection waves. The significance of our conclusions lies in the extensiveness of our investigation (i.e., over one year) during most of the COVID-19 pandemic (as of 2021). We carefully selected our informants after an a priori power analysis to ensure the trustworthiness of our results and adjusted our alpha level to avoid false-positive results and misleading recommendations. So far, this is the most complete longitudinal analysis involving software engineers to understand the effects of the COVID-19 pandemic on their well-being and productivity. Moreover, our results are relevant in case of another disastrous event, but they also help the software engineering community to provide better-informed recommendations for future Working from Home policies after the pandemic.

Table V

Finding: Developers' well-being increased.
Explanation: Well-being consistently increased across all four time points, indicating that they bounced back from the negative impact the pandemic likely had on their well-being initially.
Recommendation: Developers showed a high level of resilience when working from home and improved their well-being. Software companies can extensively implement (hybrid) working from home practices.

Finding: Productivity remained unchanged.
Explanation: Developers' productivity has not changed across all four time points.
Recommendation: Working from home is not per se detrimental for productivity. If organizations keep reasonable work expectations, professionals will be as productive at home as in the office.

Explanation: This suggests that developers managed to reduce their loneliness, presumably by improving the quality and quantity of their social interactions.
Recommendation: Active inclusion policies should be set in place for employees working from home. Mainly middle management should focus on individual employees' performance and their level of integration and communication with the team.

Finding: Stress decreases well-being levels.
Explanation: Stress at time 2 negatively predicted well-being at time 4. This suggests that stress can have a long-lasting impact on developers' well-being.
Recommendation: Reducing professionals' stress levels should be the key priority of every organization. Practices such as flexibility, clear expectations, career management, transparent and fair pay structure, as well as mindfulness exercises can be effective.

Finding: Self-blame increased.
Explanation: Levels of self-blame increased over time.
Recommendation: Software organizations might offer employees psychological support to investigate the reasons for self-blame.

Finding: Men and women are similar across all measured variables.
Explanation: This is in line with the gender similarity hypothesis [92] that women and men are across most variables (e.g., well-being related, ability, personality) more similar than different.
Recommendation: When planning for home-support policies, organizations should not use biased communication implying that women are most affected. This can increase the feeling of fitting in, which in turn can increase girls' and women's intention to pursue a career in a men-dominated field [91].

Finding: No country difference (USA vs UK) when dealing with the pandemic.
Explanation: Our findings indicate that people living in the UK and the USA were impacted and 'recovered' from the initial shock of the pandemic to a similar extent.
Recommendation: Especially during another disastrous event, organizations can plan the same remote work strategies across both countries.

Future work will therefore focus on a prolonged assessment of the working conditions of our pool even after the pandemic. Also, a more nuanced understanding of phenomena we could not explain (i.e., increased behavioral disengagement at time 2) is necessary, for example by including more relevant variables to understand the underlying mechanisms or by using qualitative research designs.

The complete replication package is openly available under CC BY 4.0 license on Zenodo, DOI: https://doi.org/10.5281/zenodo.5713923.

The psychological impact of quarantine and how to reduce it: rapid review of the evidence

Defining the epidemiology of Covid-19 -studies needed

Boredom proneness-the development and correlates of a new scale

Pandemic programming: How COVID-19 affects software developers and how their organizations can help

Disrupted work: home-based teleworking (HBTW) in the aftermath of a natural disaster

Employee wellbeing, productivity, and firm performance

Improving employee wellbeing and effectiveness: systematic review and meta-analysis of web-based psychological interventions delivered in the workplace

Predictors of well-being and productivity among software professionals during the COVID-19 pandemic-a longitudinal study

Motivation to work

Self-determination theory and the facilitation of intrinsic motivation, social development, and well-being

Self-determination theory and work motivation

Monotasking or multitasking: Designing for crowdworkers' preferences

The impact of telework on emotional experience: When, and for whom, does telework improve daily affective well-being?

Does working from home work? Evidence from a Chinese experiment

Mental wellbeing while staying at home

Questions and answers on novel coronavirus

Smitsomme-sygdomme-A-AA/Coronavirus/Spoergsmaal-og-svar/ Questions-and-answers

Getting the most from remote working

Using behavioral science to help fight the coronavirus

Posttraumatic stress disorder in parents and youth after health-related disasters

SARS control and psychological effects of quarantine, Toronto, Canada

The experience of SARS-related stigma at Amoy Gardens

The relevance of psychosocial variables and working conditions in predicting nurses' coping strategies during the SARS crisis: an online questionnaire survey

Understanding, compliance and psychological impact of the SARS quarantine experience

Survey of stress reactions among health care workers involved with the SARS outbreak

Public risk perceptions and preventive behaviors during the 2009 H1N1 influenza pandemic

A social-cognitive model of pandemic influenza H1N1 risk perception and recommended behaviors in Italy

Knowledge, attitudes, and practices among members of households actively monitored or quarantined to prevent transmission of Ebola virus disease

The factors affecting household transmission dynamics and community compliance with Ebola control measures: a mixed-methods study in a rural village in Sierra Leone

State of remote work 2019

A within-person examination of the effects of telework

Teleworking: benefits and pitfalls as perceived by professionals and managers

Managing a virtual workplace

The 2020 state of remote work

Is working remotely effective? Gallup research says yes

Homeworking: No longer an easy option?

Special requirements for software process improvement applied in teleworking environments

Understanding relationships among teleworkers' e-mail usage, e-mail richness perceptions, and e-mail productivity perceptions under a software engineering environment

A secure portable execution environment to support teleworking

How remote work can foster a more inclusive environment for transgender developers

A tale of two cities: Software developers working from home during the COVID-19 pandemic

"How was your weekend?" Software development teams working from home during COVID-19

Challenges and gratitude: A diary study of software engineers working from home during COVID-19 pandemic

The impact of working from home on the success of Scrum projects: A multi-method study

Affirming basic psychological needs promotes mental well-being during the COVID-19 outbreak

Gendered experiences of software engineers during the COVID-19 crisis

Socially connected and COVID-19 prepared: The influence of sociorelational safety on perceived importance of COVID-19 precautions and trust in government responses

How does working from home affect developer productivity?-a case study of baidu during COVID-19 pandemic

Empirical standards for software engineering research

Statistical power analyses using G*Power 3.1: Tests for correlation and regression analyses

Gender differences in personality traits of software engineers

Prolific.ac--a subject pool for online experiments

World Medical Association Declaration of Helsinki: ethical principles for medical research involving human subjects

Multivariate data analysis

The satisfaction with life scale

A systematic review of productivity factors in software development

Why we should not measure productivity

Software developers' perceptions of productivity

A short boredom proneness scale: Development and psychometric properties

You want to measure coping but your protocol's too long: Consider the brief COPE

A brief measure for assessing generalized anxiety disorder: the GAD-7

A 6-item scale for overall, emotional, and social loneliness: Confirmatory tests on survey data

The balanced measure of psychological needs (BMPN) scale: An alternative domain general measure of need satisfaction

Relationship quality profiles and well-being among married adults

Perceived stress in a probability sample of the United States

The 24-item brief HEXACO inventory (BHI)

Thinking clearly about correlations and causation: Graphical causal models for observational data

Testing for causality: a personal viewpoint

A critique of cross-lagged correlation

A new way to look at the data: Similarities between groups of people are large and important

Beyond the hedonic treadmill: Revising the adaptation theory of well-being

Why loneliness is hazardous to your health

Does happiness adapt? A longitudinal study of disability with implications for economists and judges

New Zealanders' attitudes towards working from home

Why working from home will stick

Home sweet home: Working from home and employee performance during the COVID-19 pandemic in the UK

A self-determination theory approach to understanding stress incursion and responses

Work from home & productivity: Evidence from personnel & analytics data on it professionals

Working from home: Its effects on productivity and mental health

Productivity of working from home during the COVID-19 pandemic: Evidence from an employee survey (Japanese)

Can psychological traits explain mobility behavior during the COVID-19 pandemic?

Assessing coping strategies: a theoretically based approach

Social interactions and well-being: The surprising power of weak ties

Survivor guilt

Survivor's guilt in caretakers of cancer

How time-flexible work policies can reduce stress, improve health, and save money

Occupational stress of employees in an insurance company

Career management and survival in the workplace: Helping employees make tough career decisions, stay motivated, and reduce career stress

Mindfulness in motion: A mindfulness-based intervention to reduce stress and enhance quality of sleep in Scandinavian employees

The COVID-19 pandemic has increased the care burden of women and families

Where are the women? Gender inequalities in COVID-19 research authorship

Why do women opt out? sense of belonging and women's representation in mathematics

The gender similarities hypothesis

Value stability and change during self-chosen life transitions: Self-selection versus socialization effects

Understanding the process of moralization: How eating meat becomes a moral issue

ACKNOWLEDGMENT This work was supported by the Carlsberg Foundation under grant agreement number CF20-0322 (PanTra -Pandemic Transformation).