key: cord-0844770-ncgql8mt authors: Bao, Lingfeng; Li, Tao; Xia, Xin; Zhu, Kaiyu; Li, Hui; Yang, Xiaohu title: How does working from home affect developer productivity? — A case study of Baidu during the COVID-19 pandemic date: 2022-03-14 journal: Sci DOI: 10.1007/s11432-020-3278-4 sha: 76d39a259281a225e06239951827c273a3af8d69 doc_id: 844770 cord_uid: ncgql8mt Nowadays, working from home (WFH) has become a popular work arrangement due to its many potential benefits for both companies and employees (e.g., increasing job satisfaction and retention of employees). Many previous studies have investigated the impact of WFH on the productivity of employees. However, most of these studies usually use a qualitative analysis method such as surveys and interviews, and the studied participants do not work from home for a long continuing time. Due to the outbreak of coronavirus disease 2019 (COVID-19), a large number of companies asked their employees to work from home, which provides us an opportunity to investigate whether WFH affects their productivity. In this study, to investigate the difference in developer productivity between WFH and working onsite, we conduct a quantitative analysis based on a dataset of developers’ daily activities from Baidu Inc., one of the largest IT companies in China. In total, we collected approximately four thousand records of 139 developers’ activities of 138 working days. Out of these records, 1103 records are submitted when developers work from home due to the COVID-19 pandemic. We find that WFH has both positive and negative impacts on developer productivity in terms of different metrics, e.g., the number of builds/commits/code reviews. We also notice that WFH has different impacts on projects with different characteristics including programming language, project type/age/size. For example, WFH has a negative impact on developer productivity for large projects. Additionally, we find that productivity varies for different developers. 
Based on these findings, we get some feedback from developers of Baidu and understand some reasons why WFH has different impacts on developer productivity. We also conclude several implications for both companies and developers. Working from home (WFH) is a work arrangement in which employees do not need to work at a central place (e.g., office building, warehouse, or store). WFH has various names, such as remote work, teleworking, or telecommuting. These terms are used differently and interchangeably from study to study [1] [2] [3] . Nowadays, since WFH is facilitated by many tools such as virtual private networks, cloud computing, and online meeting software, more and more companies allow their employees to work from home. A survey in 2018 from OWL labs shows that 52% of employees work from home at least once a week and 56% of companies allow remote work 1) . WFH can offer some benefits to both companies and employees, for instance, when employees can work from home, they feel more trusted and are better able to balance work and life responsibilities, which can increase employee retention and make them happier and more productive. WFH is also adopted by many IT companies, for instance, a recent report mentioned that Twitter announced staff can continue WFH permanently 2) . Developers can perform their daily tasks (e.g., writing code, debugging, building projects, and code review) as usual by remotely accessing resources of companies when working from home. WFH might have different impacts on productivity, which is a big concern of software developer organizations [4] . Understanding the difference of developer productivity when working from home and the reasons behind it can help improve the management of companies and projects, increase the job satisfaction of developers, and make developers more productive. 
The survey of OWL labs reported that employees who work remotely at least once a month are 24% more likely to feel productive in their roles than those who do not or cannot work remotely. On the other hand, WFH might have a negative impact on productivity. For example, it could decrease the efficiency of developer communication, which plays an important role in software development [5]. Many studies have investigated the impacts of WFH on productivity [6] [7] [8] [9] [10]. However, most of these studies use a qualitative approach (e.g., survey or interview) based on the feedback from general workers (not only developers). Additionally, the studied participants usually do not work from home for a long continuous period. In this study, we aim to investigate quantitatively how developer productivity is affected when developers work from home for a long time. Due to the outbreak of coronavirus disease 2019 (COVID-19) 3), an infectious disease caused by a newly discovered coronavirus, a large number of IT companies asked their employees to work from home, which provides us an opportunity to investigate how their productivity is affected when working from home for a long continuous period. In this study, we collect data from Baidu, Inc., China, which contain the development activities of 107 developers in 70 working days. Part of the records in this dataset were produced while developers worked from home due to the COVID-19 pandemic. We compare developer productivity when working from home with that when working onsite in terms of multiple aggregate values (e.g., mean and median) of several metrics (e.g., the number of builds, commits, and code reviews). We summarize our findings and contributions as follows:
• To the best of our knowledge, we are the first to investigate the impacts of WFH on developer productivity based on developers' daily activities.
We find that WFH has both positive and negative impacts on developer productivity in terms of different metrics, such as the number of builds/commits/code reviews.
• We investigate the impacts of WFH on projects with different characteristics, including programming language and project type/age/size, and find that WFH has different impacts on different kinds of projects. For example, WFH has a negative impact on developer productivity for large projects. We also find that productivity varies across developers.
• We summarize the reasons why developers have different productivity when working from home and provide implications based on our findings and the feedback from Baidu.
Paper structure. The remainder of the paper is structured as follows. Section 2 describes the dataset and research questions in this study. Section 3 presents the results of the analysis for the six research questions. Section 4 discusses implications and threats to validity. Section 5 briefly reviews related work. Section 6 concludes the paper and discusses future directions.
In this section, we first present the dataset from Baidu. Then, we describe the research questions and their corresponding motivations.
2) https://www.bbc.com/news/technology-52628119 (accessed on August 7, 2020).
3) https://www.who.int/health-topics/coronavirus.
4) https://www.baidu.com/.
We collected a dataset of developers' daily activities from Baidu, Inc., which is the world's largest Chinese language Internet search provider 4), the largest knowledge and information centered Internet platform company in China, and a world-leading artificial intelligence (AI) company. As the world's largest Chinese language Internet search provider, Baidu responds to a huge number of search queries from more than 100 countries and regions every day, serving as the most important way for netizens to access Chinese language information.
With its mission to "make the complicated world simpler through technology", Baidu promotes constant technological innovation and is committed to being a world-leading technology company that understands its users and helps them grow. The dataset we obtained from Baidu contains 107 developers' daily activities from eight projects in 70 working days. Table 1 presents the overview of the dataset. As shown in this table, the WFH period is from 2020/02/03 to 2020/03/01 because Baidu asked all its employees to work from home after the outbreak of the COVID-19 pandemic in China 5). We refer to this part of the records as DATA WFH. We also obtained the records of the period during which developers worked onsite, i.e., from 2019/12/23 to 2020/02/02 (referred to as DATA ONSITE). The development activities in DATA WFH and DATA ONSITE are similar since the two periods are close together in time. This dataset has a total of 2428 records. Among these records, 1325 and 1103 records belong to DATA ONSITE and DATA WFH, respectively. Each record in the dataset has several metrics that represent the activities of a developer in one day. Table 2 presents the fields of a record. Each record has a date (data partition) on which a developer's activities are reported. Due to the security and privacy policy of Baidu, the developers' names and their projects are encrypted into unique IDs, which still allows us to track records over time. Each record has the following numeric metrics: commit count, line inserted, line deleted, review count, build count, release count, compile count. Although these numeric metrics depend on various factors such as developer experience, programming languages, and styles [11], many previous studies have used similar quantitative metrics such as lines of code to measure developer productivity [12] [13] [14]. Therefore, we believe these metrics can potentially indicate developer productivity.
Owing to the security policy of Baidu, the numeric metrics are standardized by the following formula: $z = \frac{X - \mu}{\sigma}$, where $z$ is the standardized value, $X$ is the real value of a metric in a record, $\mu$ is the mean of the metric in the dataset, and $\sigma$ is the standard deviation of the metric in the dataset. Standardization is a linear rescaling, so it does not change the shape of a metric's distribution or the ordering of its values. It therefore does not affect the findings in this study, which are based on the comparison between the values of metrics when developers work from home and when they work onsite. Note that the standardized values can be positive or negative. Additionally, there are two other fields, i.e., job status build and job status release, which represent the status of builds/releases performed by a developer. There are four possible values for the status of builds/releases: success, failed, canceled, or NULL. Compared to the other three statuses, a successful build or release means that a developer is more productive on that day.
5) Wuhan was the first city in China to be locked down, starting on 2020/01/23, and the whole country started its lockdown on 2020/01/27, during the Spring Festival holiday (2020/01/25), the most important holiday in China. After the holiday, Baidu asked its employees to work from home.
Table 3 presents the information about the eight projects in the dataset. These projects were created in two different years, i.e., 2017 and 2018. There are three types of projects, i.e., APP (application software), SERVER (server software such as web services and API libraries), and SDK (software development kits). Among these projects, four projects are written in C++ while the other four projects are written in Java. Additionally, we count the number of developers who have records of development activities in the dataset for each project.
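The standardization formula in Subsection 2.1 can be illustrated with a minimal sketch. The raw commit counts below are hypothetical (the real values are not disclosed), and the use of the population standard deviation is an assumption, since the text does not say which estimator Baidu applied. The sketch shows why rank-based comparisons are unaffected: the z-score is a monotone transformation, so the ordering of values is preserved.

```python
from statistics import mean, pstdev


def standardize(values):
    """Return z-scores (x - mu) / sigma for a list of raw metric values."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        raise ValueError("all values are identical; the z-score is undefined")
    return [(x - mu) / sigma for x in values]


def ordering(values):
    """Indices of the values sorted in ascending order (stable for ties)."""
    return sorted(range(len(values)), key=lambda i: values[i])


# Hypothetical raw commit counts of six records.
raw_commit_counts = [0, 3, 1, 7, 2, 3]
z_scores = standardize(raw_commit_counts)

# Values below the mean become negative, values above it positive.
print([round(z, 2) for z in z_scores])
# The ordering of the records is the same before and after standardization.
print(ordering(raw_commit_counts) == ordering(z_scores))
```

Because tests such as the Wilcoxon rank-sum test and effect sizes such as Cliff's delta depend only on the ordering of values, they give the same result on raw and standardized data.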
These projects have different numbers of developers, for example, project P5 has the largest number of developers (i.e., 53) and P3 has the smallest (only 4 developers).
In this section, we present the six research questions we address in our study.
RQ1. Are there any significant differences in developer productivity between working from home and working onsite?
Motivation. In this RQ, we want to investigate whether WFH affects developers' productivity compared to working onsite. Given the dataset from Baidu, we measure the overall productivity of all developers by combining their activities, then compare the overall productivity when working from home and working onsite.
RQ2. Do different programming languages affect developer productivity when working from home?
Motivation. Previous studies show that programming languages have an important impact on developers' activities, such as program comprehension [15] and becoming a long-time contributor to open source projects [16, 17]. WFH might have different impacts on developers using different programming languages. For example, since C++ projects in Baidu are usually larger and require more computing resources than Java projects, developers often need to build and debug these C++ projects remotely on a powerful machine. Meanwhile, for most Java projects, developers can write code and debug it on their own computers at home. Thus, in this RQ, we want to investigate whether developers using different programming languages differ in productivity when working from home.
RQ3. Do different project types affect developer productivity when working from home?
Motivation. As shown in Table 3, the eight projects in the dataset have three different types, i.e., APP, SERVER, and SDK.
Projects of different types could have different project management methods and schedule styles, as well as different development and communication tools, which might have a potential impact on developers' productivity [18, 19]. WFH changes the way of project management and development, which might have different impacts on developer productivity for projects of different types. For example, many APP projects develop mobile apps, which usually rely on a specific framework and have predefined code styles and specifications. In contrast, the software developed by SERVER projects is usually applied in much more complicated scenarios and depends on many different frameworks and programming languages. Thus, it might be more difficult for developers working from home to build, test, and debug a SERVER project than an APP project. We therefore want to investigate whether different project types have an impact on developer productivity when working from home.
RQ4. Do different project ages affect developer productivity when working from home?
Motivation. Different project ages might affect developers' activities. For example, our previous study found that developers in older projects spend more time on program comprehension activities than those in projects at the beginning stage [15]. WFH might amplify such effects of project age. For example, in an older project, developers need to read the source code and documents more frequently since such a project usually has more maintenance tasks, but they cannot access these resources and communicate with colleagues easily when working from home, which might lower their productivity. Thus, in this RQ we want to investigate whether different project ages have an impact on the productivity of developers when working from home.
RQ5. Do different project sizes affect developer productivity when working from home?
Motivation.
Different project sizes (measured by the number of developers of a project in this study) might have an impact on developers' productivity [20]. Zhou et al. [21] found that size has always been considered a confounding factor in different approaches in software engineering. Due to different project sizes, projects might have different ways of project management and software development, which would be affected by WFH differently. For example, it might be more difficult for developers in a large project to communicate with each other when working from home, which might decrease the productivity of the project.
RQ6. Do individual developers have different productivity when working from home?
Motivation. Compared with working onsite, developers might have different productivity when working from home due to personal factors, e.g., experience, personality, habits, and skills. For example, developers are more easily interrupted by other matters when working from home. In RQ1 we use the aggregated activity data of developers, while in this RQ we use the daily activities of individual developers to investigate their productivity. Identifying developers who perform differently when working from home and the potential reasons behind it can help project leaders manage their projects. Thus, in this RQ we want to investigate whether individual developers have different productivity when working from home.
In this section, we present the results of these six research questions one by one.
3.1 RQ1. The overall developer productivity when working from home
Methodology. To compare developer productivity when working from home with that when working onsite, we first group the records in the dataset by day, and compute several aggregate values including mean, median, sum, max, and min for each numeric metric shown in Table 2. Using several aggregate values gives a more accurate picture of the distribution of developers' productivity in terms of each metric.
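The per-day aggregation described in the RQ1 methodology can be sketched as follows. This is only an illustration under stated assumptions: the record layout `(date, developer_id, value)` and the sample values are hypothetical, not the schema or data of the Baidu dataset.

```python
from collections import defaultdict
from statistics import mean, median

# Aggregate functions named in the methodology.
AGGREGATES = {"mean": mean, "median": median, "sum": sum, "max": max, "min": min}

# Hypothetical records: (date, developer ID, standardized commit count).
records = [
    ("2020-02-03", "D1", 0.5),
    ("2020-02-03", "D2", -0.2),
    ("2020-02-03", "D3", 1.4),
    ("2020-02-04", "D1", -0.1),
    ("2020-02-04", "D2", 0.3),
]


def aggregate_by_day(rows):
    """Group metric values by date and compute every aggregate per day."""
    values_by_day = defaultdict(list)
    for day, _developer_id, value in rows:
        values_by_day[day].append(value)
    return {
        day: {name: fn(values) for name, fn in AGGREGATES.items()}
        for day, values in sorted(values_by_day.items())
    }


daily = aggregate_by_day(records)
for day, stats in daily.items():
    print(day, {name: round(value, 3) for name, value in stats.items()})
```

Each day then contributes one data point per aggregate to either the WFH group or the onsite group. The sketch leaves out the additional filter that the study applies to sum, which restricts the computation to developers who appear in both periods.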
For example, the mean of the submitted commits (commit count) in one day indicates the average workload of developers on that day, but because a few experienced/core developers usually contribute more commits than junior/peripheral developers, the mean might be very high even when most developers submit few commits. For all the aggregate values except sum, we calculate the value of each metric over all developers for each day. For sum, we only consider the developers whose records are in both DATA WFH and DATA ONSITE, i.e., the sums of DATA WFH and DATA ONSITE include the same number of developers. For the two non-numeric features job status build and job status release, we compute the success rate of builds/releases in one day, that is, the proportion of builds/releases on that day that are successful. For each aggregation function of a metric, we have two groups, i.e., the days on which developers work from home and the days on which they work onsite. The number of data points of a group is equal to the number of days in the corresponding period. Then, we apply the Wilcoxon rank-sum test [22] to investigate whether the difference between the two groups is statistically significant for each aggregate value of each metric. We also compute Cliff's delta [23] 6) to quantify the size of the difference between the two groups. Consequently, we can compare developer productivity when working from home with that when working onsite in terms of different metrics.
Results. Since we only have standardized values for these metrics in the dataset, we use violin plots to show the distributions of these metrics when working from home and working onsite, as shown in Figure 1. Note that because the values of these metrics have been standardized (see Subsection 2.1), they can be negative. In this figure, these metrics have narrower distributions when developers work from home than when developers work onsite, except for the number of lines deleted every day.
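The two-group comparison described in the RQ1 methodology can be sketched in plain Python. The daily values below are hypothetical. The p-value uses the large-sample normal approximation of the rank-sum (Mann-Whitney U) statistic without tie or continuity correction, which is an assumption of this sketch and not necessarily the variant used in the study; a statistics library would normally be used instead.

```python
import math


def cliffs_delta(xs, ys):
    """Cliff's delta: (#pairs with x > y - #pairs with x < y) / (n * m)."""
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


def rank_sum_p_value(xs, ys):
    """Two-sided p-value of the rank-sum test (normal approximation)."""
    n, m = len(xs), len(ys)
    # U statistic: pairs where x beats y, ties counted as one half.
    u = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in xs for y in ys)
    mean_u = n * m / 2
    sd_u = math.sqrt(n * m * (n + m + 1) / 12)  # no tie correction
    z = (u - mean_u) / sd_u
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


# Hypothetical daily means of one standardized metric.
wfh_days = [-0.3, -0.1, 0.0, 0.2, 0.1]
onsite_days = [0.4, 0.6, 0.9, 0.5, 1.2, 0.3]

delta = cliffs_delta(wfh_days, onsite_days)
p_value = rank_sum_p_value(wfh_days, onsite_days)
print(f"delta = {delta:.2f}, p = {p_value:.4f}")
```

A negative delta means that the WFH values tend to be lower than the onsite values. When the two lists are identical, delta is 0 and the approximate p-value is 1, so no difference can be reported for that case.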
For example, the range of the mean of commit count by day when WFH is approximately from −0.4 to 0.6, while the corresponding range when working onsite is approximately from −0.5 to 1.8. This indicates that developers might have more stable productivity when working from home than when working onsite. Table 4 presents p-values and Cliff's deltas (δ) for DATA WFH compared with DATA ONSITE in terms of different aggregate values of each metric in one day. Some cases in these tables have the value "-", which means that the two compared lists are exactly the same. Moreover, we find that WFH has different impacts on developer productivity in terms of different metrics. For example, in terms of build count, the median values for developers when WFH are lower than those when working onsite. In contrast, the maximum value of build count for developers when WFH is larger than that when working onsite. This might indicate that a small number of developers perform more builds when working from home, while most of the developers perform fewer builds. As shown in the results of RQ6 (see Subsection 3.6), this might be because some developers can be more productive when working from home.
Overall, WFH has different impacts on developer productivity in terms of different metrics.
Methodology. We divide the records in both DATA WFH and DATA ONSITE into two groups: those from the projects using C++ and those from the projects using Java. For each group, we use the same method as in RQ1 to investigate whether the difference in developer productivity between WFH and working onsite is statistically significant in terms of the aggregate values of each metric.
Results. Table 5 presents the cases in which the WFH values are significantly different from those of working onsite and the effect sizes are not negligible, for C++ and Java projects.
The column 'Positive'/'Negative' means that WFH has a positive/negative impact on developer productivity in terms of an aggregate value of a metric. "-" means there are no cases in which WFH values are significantly different from those of working onsite for a metric. We find that for C++ projects, there are both positive and negative cases. For example, the means of commit count, release count, and review count are positive cases, while the sums of compile count, lines deleted, and lines inserted are negative cases. On the other hand, all the cases of Java projects are negative. This indicates that WFH has more negative impacts on Java projects than on C++ projects.
WFH has both positive and negative impacts on developer productivity for C++ projects in terms of different metrics but has a negative impact on developer productivity for Java projects.
Methodology. We split the records in the dataset into three parts based on the project type, i.e., APP, SERVER, and SDK. For each part, we use the same approach as in RQ1 and RQ2 to investigate the difference in developer productivity between WFH and working onsite for projects of different types.
Results. Table 6 presents the cases in which the WFH values are significantly different from those of working onsite and the effect sizes are not negligible, for APP, SERVER, and SDK projects. For APP projects, there are both positive and negative cases, which indicates that WFH might have both positive and negative impacts on developer productivity. For example, in terms of the mean, median, sum, and max of commit count and review count, the values for developers when WFH are significantly larger than those when working onsite. In contrast, in terms of the mean of build count and compile count, the values for developers when WFH are significantly lower than those when working onsite.
For SDK and SERVER projects, most of the cases are negative, which indicates that WFH has a negative impact on developer productivity. This might be because SDK and SERVER projects usually have more components than APP projects, which requires more collaboration and communication with other developers. According to the feedback from Baidu (see Subsection 4.1), WFH has a negative impact on collaboration and communication, which decreases developer productivity.
WFH has both positive and negative impacts on developer productivity for APP projects and decreases developer productivity for SDK and SERVER projects.
Methodology. As the projects in the dataset were created in two different years, i.e., 2017 and 2018, we split the records into two groups based on the year in which a project was created. Then, we investigate the difference in developer productivity between WFH and working onsite for projects of different ages.
Results. Table 7 presents the cases in which WFH values are significantly different from those of working onsite and the effect sizes are not negligible, for 2017 and 2018 projects. As shown in Table 7, for projects created in 2017, there are more positive cases than negative cases. For example, the means of commit count, release count, and review count are positive cases, while only the mean of compile count is a negative case. On the other hand, all the cases of projects created in 2018 are negative. According to the feedback from Baidu, compared to projects created in 2017, projects created in 2018 are less mature and might have more tasks and schedules. It is not easy for a project to complete some kinds of tasks or schedules when working from home. For example, recruiting a new developer usually takes more time when working from home, but a newer project usually needs more new developers than an older project.
WFH has a positive effect on developer productivity for projects created in 2017 but a negative impact on developer productivity for projects created in 2018.
3.5 RQ5. The impact of project size
Methodology. As shown in Table 3, these eight projects have different numbers of developers. According to the feedback from Baidu, we regard the projects P4 and P5, which have more than 20 developers, as large projects, and the other six projects as small projects. We split the records into two groups based on project size and then investigate the difference in developer productivity between WFH and working onsite for projects of different sizes.
Results. Table 8 presents the cases in which WFH values are significantly different from those of working onsite and the effect sizes are not negligible, for small and large projects. We find that there are more positive cases than negative cases for small projects, while most of the cases for large projects are negative, except for the sum of build count. The reason might be that it is more difficult for a large project to adjust its structure and schedule after switching to WFH, and it is more difficult for developers in large projects to collaborate and communicate with others when working from home.
WFH has a more negative impact on developer productivity for large projects than for small projects.
Methodology. For each developer in our dataset, we have two kinds of records, i.e., those of WFH and those of working onsite. We investigate whether each developer's productivity when WFH is significantly different from that when working onsite in terms of each metric. We also compute Cliff's delta [23] to quantify the size of the difference. For the records of working onsite, we only use DATA ONSITE according to the findings in RQ1.
Results. Table 9 presents the individual developers whose WFH productivity is significantly different from their ONSITE productivity in terms of at least one aggregation metric. The second column in this table is the index of a developer in their project.
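The project-size split used in the RQ5 methodology can be sketched as follows. The `(project_id, developer_id)` record layout is a hypothetical simplification. The developer counts for P5 (53) and P3 (4) are the ones reported in the dataset description, and the threshold of 20 developers follows the feedback from Baidu quoted in the methodology.

```python
from collections import defaultdict

LARGE_PROJECT_THRESHOLD = 20  # projects with more than 20 developers are "large"


def developers_per_project(rows):
    """Count distinct developers with at least one record in each project."""
    developers = defaultdict(set)
    for project_id, developer_id in rows:
        developers[project_id].add(developer_id)
    return {project: len(ids) for project, ids in developers.items()}


def size_group(developer_count):
    """Label a project as large or small by its number of developers."""
    return "large" if developer_count > LARGE_PROJECT_THRESHOLD else "small"


# Hypothetical records: 53 developers in P5 and 4 developers in P3.
rows = [("P5", f"D{i}") for i in range(1, 54)]
rows += [("P3", f"D{i}") for i in range(1, 5)]
# A developer has one record per working day, so IDs repeat in real data.
rows += [("P5", "D1"), ("P3", "D2")]

counts = developers_per_project(rows)
groups = {project: size_group(count) for project, count in counts.items()}
print(counts)
print(groups)
```

Counting distinct IDs rather than records matters here, because the number of records per project also grows with the number of working days.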
Out of 139 developers in the whole dataset, the productivity of 21 developers when WFH is significantly different from that when working onsite. On the other hand, the productivity of the majority of developers (84.8%) in terms of all the metrics is not significantly different from that when working onsite. Table 10 presents the number of developers for whom WFH has a positive or negative impact on their productivity in terms of a metric. As shown in this table, compile count is the only metric for which there is no developer whose productivity when WFH is significantly larger than that when working onsite.
Table 9 Developer productivity of WFH is significantly different from that of working onsite in terms of different features
We also notice that the productivity of several developers when WFH is significantly larger than when working onsite in terms of all metrics, e.g., D1 of the project P1 and D7 of the project P5. The company should encourage these developers to work from home more often, since remote work can improve their productivity. In contrast, some developers are less productive when working from home, e.g., D2 of the project P1 and D5 of the project P5. For these developers, remote work is not encouraged since their productivity decreases when working from home.
The productivity of most developers working from home is similar to that of working onsite. For a small portion of developers, WFH has different impacts on their productivity.
In this section, we first present the feedback from Baidu, then provide implications of our findings. At the end of this section, we discuss some threats to validity.
Based on our findings, we performed a simple survey to get some feedback from developers in the studied projects. In the survey, we first collect some demographic information such as the developer role and the main programming language.
Then, we ask respondents whether they agree that WFH has an impact on productivity on a 5-point Likert scale (strongly disagree, disagree, neutral, agree, strongly agree). Finally, we have an open question to ask them about the factors that might affect developer productivity when working from home. Many of them agreed that WFH can have both positive and negative impacts on developer productivity. Some also agreed that there is no difference in productivity when working from home. The following is some of the feedback we collected.
WFH improves developer productivity.
• It is the first time for some developers to work from home, so they feel very excited and have a lot of energy to do their work.
• Developers can focus on their own work and not be disturbed by colleagues.
• After working from home, the company asked developers to write daily reports instead of weekly reports. Daily reports can help developers recall their daily work and push them to work harder the next day if their tasks are not completed.
• WFH decreases the cost of transportation and saves a lot of time for developers.
• WFH might increase developers' working time because there is no switch between workplace and home, and developers can work very early in the morning or very late in the evening.
• WFH gives developers a better work-life balance so that developers can work in better condition.
WFH decreases developer productivity.
• There are many other matters (e.g., looking after children or pets, cooking for themselves) that interrupt developers' work and take a lot of their time.
• Some developers without self-discipline cannot focus on work when working from home. Unlike when working onsite, they might be too relaxed at home since there are no colleagues around them.
• Although video conferencing tools and telephones are now very convenient for communication, the efficiency of collaboration still decreases due to WFH.
There is no difference in developer productivity when working from home.
• There are no barriers for many developers to complete their daily tasks (e.g., writing code, building projects, code review) when working from home.
• There is no difference in project schedule between WFH and working onsite since developers can follow the schedule using an online project schedule tool.
• Current video conferencing tools are very powerful, for example, they usually support screen sharing. So, there is only a very slight difference between meeting in a meeting room of the company and meeting online.
WFH has different impacts on overall developer productivity. Many previous studies show that WFH has a positive effect on the productivity of workers [6] [7] [8] [9] [10] [24, 25]. Some other studies also show that WFH might have a negative impact on employee productivity. For example, Kazekami [25] found that long working hours when WFH would decrease teleworker productivity. In our study, we use a quantitative analysis method to show that WFH has different impacts on developer productivity. According to the feedback from Baidu, the difference in developer productivity might have many causes. We also find that the productivity of most developers when WFH is not significantly different from that when working onsite (RQ6), so we think WFH can be considered as a choice of work arrangement for employees because WFH offers many other benefits besides productivity, such as saving costs for the company and flexible working time for developers.
A project needs to prepare for WFH according to its own characteristics. In this study, we find that developer productivity might be associated with the characteristics of a project, including programming language and project type/age/size. For example, WFH might have a positive effect on developer productivity for small projects but a negative effect on developer productivity for large projects (see RQ5 in Subsection 3.5).
Thus, we believe that the decision to adopt a WFH policy for a project should be based on the project's own characteristics, e.g., programming languages and project size. When starting a WFH policy, a project needs to prepare resources to reduce the risk of decreasing developer productivity. For example, a large project should consider the communication cost of WFH and prepare the relevant tools to facilitate communication within teams.

Using different strategies of WFH for individual developers. We find that the productivity of most developers in this study does not change when working from home. Still, there are some developers whose productivity when working from home differs from that when working onsite. Thus, approaches based on development metrics can be used to identify whether the productivity of a developer increases or decreases when working from home. Once the productivity of a developer decreases, the project manager needs to identify the reasons behind it. If the developer is not suited to working from home, they should be asked to work at the company. On the other hand, if the productivity of a developer increases, the project team should allow them to continue working from home. For researchers, to improve individual developers' productivity when working from home, more empirical studies are required to investigate further factors that affect their productivity, e.g., personality, mood, and their working environment at home. Additionally, machine learning models based on developers' daily activities can be built to predict whether their productivity will change when working from home.

Threats to internal validity. First, there might be errors in our code and experiment setting. We have written a Python script to process and analyze the dataset provided by Baidu. We double-checked our code; however, there may still be errors that we did not notice.
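As an illustration of the per-developer check suggested above (using development metrics to identify whether a developer's productivity increases or decreases under WFH), the following is a minimal sketch. It is not the analysis script used in this study. It assumes that each developer has a list of daily values of one metric for onsite days and one for WFH days, and it uses Cliff's delta as an illustrative rank-based effect size. The function names, the sample numbers, and the 0.33 cut-off are all hypothetical. Because the statistic depends only on the ordering of values, it is unchanged when the metric values are standardized with a shared center and a positive scale.

```python
from statistics import mean, pstdev


def cliffs_delta(xs, ys):
    """Cliff's delta: P(x > y) - P(x < y) over all pairs (x in xs, y in ys)."""
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


def classify_change(wfh, onsite, threshold=0.33):
    """Label the change in one daily metric for one developer.

    `threshold` is an illustrative cut-off (an assumption),
    not a value taken from the study.
    """
    delta = cliffs_delta(wfh, onsite)
    if delta >= threshold:
        return "increase", delta
    if delta <= -threshold:
        return "decrease", delta
    return "no clear change", delta


def standardize(values, center, scale):
    """Shift and rescale values; order is preserved when scale > 0."""
    return [(v - center) / scale for v in values]


# Hypothetical daily commit counts for one developer (made-up numbers).
onsite_days = [3, 4, 2, 5, 4, 3]
wfh_days = [5, 6, 4, 7, 5, 6]

label, delta = classify_change(wfh_days, onsite_days)
print(label, round(delta, 3))  # increase 0.833

# Standardize both samples with the same center and scale, then recompute.
pooled = onsite_days + wfh_days
center, scale = mean(pooled), pstdev(pooled)
delta_std = cliffs_delta(
    standardize(wfh_days, center, scale),
    standardize(onsite_days, center, scale),
)
print(delta_std == delta)  # True
```

In practice such a label would only be a starting point: as noted above, the project manager still needs to identify the reasons behind any observed change.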
The second threat to internal validity is that we use quantitative metrics of software development in the dataset (e.g., the number of builds and commits) to measure developer productivity. Metrics such as lines of code have been used to measure developer productivity in previous work [12-14]. Hence, we think these metrics can potentially indicate the productivity of developers, and we also use multiple aggregate values (such as mean and median) of these metrics by day to measure productivity. Third, there might be many other factors (e.g., the workload at different times) that affect developer productivity. It is difficult to exclude all other factors in the study. To minimize this threat, we compare records of developers' activities under WFH and ONSITE from the same year (i.e., 2020). Finally, the metrics in the dataset are standardized due to the security policy and privacy requirements of Baidu. However, we focus on the difference between the productivity of developers when working from home and when working onsite. Thus, we believe that using standardized values in the comparison does not affect the findings of the study.

Threats to external validity relate to the generalizability of our findings. In this study, the dataset we used is from Baidu. The number of projects and developers is limited. Thus, it is unclear whether the same results hold for developers from other companies. However, since Baidu is one of the largest IT companies in China, we believe that our findings are typical and representative to some extent. Additionally, we analyze 138 working days of software development activities from 139 developers. These developers are from eight different projects with different characteristics, such as programming languages and project types. Another threat to external validity relates to the generalizability of the metrics used to measure the productivity of developers.
The metrics used in this study are very general and often used in software development [17, 26]. In the future, to reduce these threats, we plan to investigate more developers from different companies and consider more metrics.

In this section, we discuss related work in the fields of WFH and developer productivity.

There are many studies in the literature that investigate the benefits and drawbacks of WFH [1, 27-31]. According to the literature, the main benefits of WFH for companies include saving building costs and increasing the productivity and job satisfaction of employees. For employees, WFH gives them more flexible working time and provides a better work-life balance. WFH can also offer benefits to specific groups of people, such as the disabled [2] and transgender developers [32]. The main disadvantages of WFH concern access to technology and the integration of telework with the company's strategy and organizational structure, as well as teleworkers' motivation and control [29]. Felstead and Henseke [30] also reported that telework makes it difficult for employees to insulate the world of work from other aspects of life when both worlds collide and overlap.

Several studies have investigated the impact of WFH on productivity [6-10, 24, 25]. Many of these studies reported that WFH has a positive impact on the productivity of teleworkers. For example, Coenen and Kok [24] found that telework has a positive effect on the performance of new product development through enabling knowledge sharing, cross-functional cooperation, and interorganizational involvement. On the other hand, WFH might have a negative impact on productivity. For instance, Kazekami [25] found that appropriate telework hours increase labor productivity, but when telework hours are too long, telework decreases labor productivity. WFH might also decrease the efficiency of developer communication, which plays an important role in software development [5].
However, most of the previous studies of WFH are based on qualitative analysis using surveys or interviews. The participants in these studies are general workers, not only developers, and they did not work from home for a long continuous period. In this study, we perform a quantitative analysis based on a large amount of activity data from developers working from home during the COVID-19 pandemic. We focus on the impact of WFH on developer productivity and the potential factors affecting developer productivity.

Many studies use developers' daily activities to investigate their productivity. For example, Perry et al. [33] found that many developers spend a lot of time communicating with colleagues. Additionally, many studies reported that developers' work is fragmented and frequently interrupted, which has an important impact on their productivity [34-39]. For example, Sanchez et al. [40] found that work fragmentation is correlated with lower observed productivity, and that longer activity switches seem to strengthen the effect. Developer productivity is often measured by the software artifacts produced by developers in a certain time, e.g., submitted lines of code (LOC) [13, 14], function points [41], completed tasks [42], and time to implement a requirement [43]. Meyer et al. [19] proposed a list of metrics to measure developer productivity and several ways to improve a developer's productivity through tool support. Some studies also investigate the factors affecting developer productivity, e.g., characteristics of the workplace (e.g., privacy, noise) [44], programming languages and development tools [45], project switching [46], and developers' mood [47]. Additionally, personal factors might have an impact on productivity; for example, some developers feel more productive when communicating with others, while others do not like to be interrupted when working [48].
Due to the outbreak of COVID-19, some researchers have also started to investigate the effect of the pandemic on developers' productivity. For example, Ralph et al. [49] conducted a survey and found that the pandemic had a negative effect on developers' productivity. In this study, we focus on the difference in developer productivity between WFH and working onsite. We measure developer productivity by several metrics based on developers' daily activities, which have been used in previous studies. We also investigate the factors affecting developer productivity when working from home, such as programming language, project type, and project size.

In this paper, we investigate the productivity of developers when working from home for a long time due to the COVID-19 pandemic. We use a quantitative analysis based on a dataset of developers' daily activities from Baidu. To compare developer productivity under WFH with that of working onsite, we use several metrics of software development in the dataset, such as the number of builds, commits, and inserted/deleted lines. We find that WFH has different impacts on developer productivity in terms of different metrics. We also investigate some factors affecting developer productivity when working from home, including programming language and project type/age/size. Additionally, we find that the productivity of a small number of individual developers differs when working from home. In the future, we plan to extend our study by using more data from more developers and companies. We also want to build machine learning models to predict developer productivity based on developers' daily activities.

References
Telework: existing research and future directions
Accessibility to work from home for the disabled: the need for a shift in management style
Does it matter where you work? A comparison of how three work venues (traditional office, virtual office, and home office) influence aspects of work and personal/family life
The work life of developers: activities, switches and perceived productivity
Predicting build failures using social network analysis on developer communication
Predicting telecommuter productivity
Individual, social and situational determinants of telecommuter productivity
Satisfaction and perceived productivity when professionals work from home. Res Practice Human Resour Manag
Measuring the productivity impacts of new ways of working
Flexible work schedules, virtual work programs, and employee productivity. Dissertation for Ph
A large-scale empirical study of just-in-time quality assurance
A method of programming measurement and estimation
Analytical and empirical evaluation of software reuse metrics
An analysis of trends in productivity and cost drivers over years
Measuring program comprehension: a large-scale field study with professionals
Community, joining, and specialization in open source software innovation: a case study
A large scale study of long-time contributor prediction for GitHub projects
Agile team perceptions of productivity factors
Software developers' perceptions of productivity
Factors that influence the productivity of software developers in a developer view
Examining the potentially confounding effect of class size on the associations between object-oriented metrics and change-proneness
Individual comparisons by ranking methods
Workplace flexibility and new product development performance: the role of telework and flexible work schedules
Mechanisms to improve labor productivity by performing telework
Characterizing and identifying reverted commits
Making Telecommuting Happen: A Guide for Telemanagers and Telecommuters
The advantages and challenges of working here, there anywhere, and anytime
Benefits and barriers of telework: perception differences of human resources managers according to company's operations strategy
Assessing the growth of remote working and its consequences for effort, well-being and work-life balance
Home-based telework in France: characteristics, barriers and perspectives
How remote work can foster a more inclusive environment for transgender developers
People, organizations, and process improvement
The effects of interruptions on task performance, annoyance, and anxiety in the user interface
Interruptions on software teams: a comparison of paired and solo programmers
A diary study of task switching and interruptions
Notification, disruption, and memory: effects of messaging interruptions on memory and performance
Evaluating cues for resuming interrupted programming tasks
Interrupts: just a minute never is
An empirical study of work fragmentation in software evolution tasks
Measuring application development productivity
I know what you did last summer-an investigation of how developers spend their time
Socio-technical congruence: a framework for assessing the impact of technical and work dependencies on software development productivity
Programmer performance and the effects of the workplace
Improving software productivity
The sky is not the limit: multitasking across github projects
Do moods affect programmers' debug performance?
Characterizing software developers by perceptions of productivity
Pandemic programming: how COVID-19 affects software developers and how their organizations can help