title: Registered Report: A Laboratory Experiment on Using Different Financial-Incentivization Schemes in Software-Engineering Experimentation
authors: Jacob Krüger, Gül Çalıklı, Dmitri Bershadskyy, Robert Heyer, Sarah Zabel, Siegmar Otto (Ruhr-University Bochum, Germany; University of Glasgow, UK; Otto-von-Guericke University Magdeburg, Germany; University of Hohenheim, Germany)
date: 2022-02-22

Abstract: Empirical studies in software engineering are often conducted with open-source developers or in industrial collaborations. Seemingly, this has resulted in few experiments using financial incentives (e.g., money, vouchers) as a strategy to motivate the participants' behavior, which is common practice in other research communities, such as economics or psychology. Even the current version of the SIGSOFT Empirical Standards mentions payouts only for completing surveys, not for mimicking real-world scenarios or motivating realistic behavior during experiments. So, there is a lack of understanding regarding whether financial incentives can or cannot be useful for software-engineering experimentation. To tackle this problem, we plan a survey based on which we will conduct a controlled laboratory experiment. Precisely, we will use the survey to elicit incentivization schemes that we will employ as (up to) four payoff functions (i.e., mappings of choices or performance in an experiment to a monetary payment) during a code-review task in the experiment: (1) a scheme that employees prefer, (2) a scheme that is actually employed, (3) a scheme that is performance-independent, and (4) a scheme that mimics an open-source scenario. Using a between-subject design, we aim to explore how the different schemes impact the participants' performance. Our contributions help to understand the impact of financial incentives on developers in experiments as well as real-world scenarios, guiding researchers in designing experiments and organizations in compensating developers.

Experimentation in software engineering rarely involves financial incentives to compensate and motivate participants. However, in most real-world situations it arguably matters whether software developers are compensated, for instance, in the form of wages or the bug bounties [22, 24] of open-source communities. Experimental economists, in particular, use financial incentives during experiments for two reasons [41]. First, financial incentives improve the validity of the experiment by mimicking real-world incentivization schemes to motivate the participants' realistic behavior and performance. To this end, in addition to show-up or participation fees, the participants' actual performance is rewarded through a payoff function that maps their performance during the experiment to financial rewards or penalties. Second, financial incentives allow researchers to study different incentives with respect to their impact on the participants' performance. It is likely that using financial incentives in empirical software engineering can also help improve validity by mimicking and staying true to the real world. Interestingly, there are no guidelines or recommendations on using financial incentives in software-engineering experimentation. Namely, the current SIGSOFT Empirical Standards 1 [29] (as of August 4, 2021; commit b046f37) mention incentives solely in the context of rewarding participation in surveys.
Also, to the best of our knowledge and based on a literature review, financial incentives that reward participants' performance during an experiment are not used systematically in empirical software engineering. Although some studies broadly incentivize performance (e.g., Sayagh et al. [31] or Shargabi et al. [32]), these incentives aim to improve participation rather than the validity of the experiment. Furthermore, we know from experimental economics [7, 8] that finding a realistic (and thus externally valid) way to reward performance is challenging and that no simple one-size-fits-all solution exists. For instance, the performance of open-source developers depends less on financial rewards than that of industrial developers [3, 4, 19, 42].

As a step towards understanding and systematizing the potential of financial incentives in software-engineering experimentation, we propose a two-part study comprising a survey and a controlled experiment in the context of bug detection through code reviews. First, we will conduct a survey with practitioners to elicit real-world incentivization schemes for bug finding. In the survey, we will distinguish between the schemes most participants prefer and those actually employed. Building on the results, we will define two corresponding payoff functions for our experiment. To extend our experiment, we will add two more payoff functions: one that is performance-independent and one that resembles the motives of open-source developers. We derive the latter function using the induced-value method established in experimental economics [35, 41], which induces a controlled willingness of participants to achieve a desired goal (i.e., identify a bug) or obtain a certain good during an experiment by mimicking its monetary value (e.g., a reward). Second, we will conduct the actual between-subject experiment to explore to what extent each of the four payoff functions impacts the participants' behavior. Overall, we primarily contribute to improving researchers' understanding of whether and how financial incentives can help software-engineering experimentation. However, our experiment also has the potential to reveal whether different incentivization schemes could improve practitioners' motivation. Our survey and experimental-design artifacts are available for peer review. 2

Experiments in software engineering are comparable to "real-effort experiments" in experimental economics, which involve participants who solve certain tasks to increase their payoffs. Consequently, we build on experiences from the field of experimental economics, which offers a large body of literature on how and when to use financial incentives in real-effort experiments [10, 12, 14, 38]. For instance, some findings indicate gender differences regarding the impact of incentivization schemes, which we have to consider during our experiment. In detail, research has shown that men choose more competitive schemes (e.g., tournaments, performance payments) and that participants with higher social preferences select such competitive schemes more rarely [9, 27]. We will consider such factors when analyzing the results of our experiment (e.g., comparing gender differences if the number of participants allows). Unfortunately, there is much less research on incentivization schemes in software-engineering experimentation. Mason and Watts [26] have analyzed the impact of financial incentives on crowd performance during software projects using online experiments.
The results are similar to those in experimental economics, but the authors also acknowledge that they did not design the incentives to mimic the real world or to improve the participants' motivation. Other studies have been concerned with the impact of payments on employees' motivation [33, 37], job satisfaction [21, 36], or job change [6, 13, 16]. For instance, Baddoo et al. [3] conducted a case study and found that developers perceived wages and benefits as an important motivator, but they did not connect payments to objective performance metrics. None of the studies we are aware of decomposes payments or wages into specific components (e.g., performance-dependent versus performance-independent). So, the effectiveness of different payoff schemes on developers' performance remains unclear.

Software-engineering researchers have investigated the motivations of open-source developers to a much greater extent [11, 15, 18, 19, 42]. From an economics perspective, open-source systems represent a public good [5, 25]: they are available to everyone, and their consumption does not yield disadvantages for anyone else. A typical problem of public goods is that individual and group incentives collide, which usually leads to an insufficient provision of the good. While typical explanations for open-source development focus on high intrinsic motivation to contribute or learn, this is not always the case. For instance, Roberts et al. [30] show that financial incentives can actually improve open-source developers' motivation (in terms of contributions). Still, financial incentives are not always the predominant motivators for software developers [4, 33]. As a consequence, we will use the concept of open-source software as a social good [19] as an extreme example (i.e., the developers help solve a social problem, but do not receive a payment) for designing a fourth payoff function in our experiment.

As explained previously, our study involves two data-collection processes: a survey and a laboratory experiment. In Table 1, we provide an overview of our study based on the PCI RR study design template, which we explain in more detail in this section.

Goal. With our survey, we aim to explore i) which payment components (e.g., wages only, bug bounties) are most applied (MA) in practice and ii) which payment components are most preferred (MP) by practitioners. We display an overview of these payment components with concrete examples in Table 2. Our intention is to understand what is actually employed compared to what would be preferred as a payment scheme to guide the design of our experiment.

Structure. To achieve our goal, we created an online questionnaire with the following structure (cf. Table 3). At first, we will welcome our participants, informing them about the survey's topic, its duration, and their right to withdraw at any point in time without any disadvantages. Furthermore, we will ask for consent to collect, process, and publish the data in anonymized form. To allow for questions, we will provide the contact data of at least one author on this first page. Then, we will ask about each participant's background to collect control variables, for instance, regarding their demographics, role in the organization, the domain they work in, and experience. Based on their role, the online survey will show the questions on the payment structures in an adaptive manner.
Table 1: Overview of our study, based on the PCI RR study design template.
- Hypotheses: H1: Participants without performance-based incentivization (NPIT) have, on average, a worse performance than those with performance-based incentivization (e.g., OSIT, MAIT, MPIT). H2: The experimental performance of participants under performance-based incentivization (e.g., OSIT, MAIT, MPIT) differs between treatments.
- Sampling plan: We aim to recruit at least 80 (20 per treatment) computer-science students of the Otto-von-Guericke University Magdeburg. Furthermore, we will conduct an a posteriori power analysis to reason about the power of our tests.
- Analysis plan: If their assumptions are fulfilled, we will use parametric tests to compare the treatments; otherwise, we will employ non-parametric tests. For H1, we will pairwise compare the performance-independent treatment to the other treatments (NPIT vs. OSIT, NPIT vs. MAIT, NPIT vs. MPIT). For H2, we will pairwise compare the performance-dependent treatments (OSIT vs. MAIT, OSIT vs. MPIT, MAIT vs. MPIT). In total, we will compute six pairwise tests to compare the four treatments with one another and will correct for multiple-hypotheses testing (Holm-Bonferroni method). We will also conduct regression analyses using the treatments as categorical variables (NPIT as base) and age, gender, experience, as well as arousal as exogenous variables. Due to our experimental design, we face the issue of multiple-hypotheses testing, which we address by applying the Holm-Bonferroni correction.
- Interpretation: We find support for H1 if our participants' performance in NPIT is worse and the corresponding tests are significant with p < 0.05 (after correcting with the Holm-Bonferroni method). Confirming H1 means that the performance is better in the specific treatment with performance-based incentives compared to NPIT, implying that performance-based incentivization should be considered whenever performance plays a role in a software-engineering experiment. We find support for H2 if our participants' performance differs between the treatments and the respective tests are significant with p < 0.05 (after correcting with the Holm-Bonferroni method). Confirming H2 means that the performance differs depending on the type of incentivization; if we cannot confirm H2, we do not find evidence that OSIT, MAIT, and MPIT induce different performances.
- Theory: There is no theory focusing on the role of incentives in software engineering, and incentivization in software-engineering experiments is scarcely applied; typical experiments resemble NPIT [23, 28, 34]. Our results could improve experimental designs in software engineering by guiding researchers on when and how to use incentives in their experiments.

To explore the payment components (target variables), we will display the ones we summarize in Table 2. We will use a checklist in which a participant can select the components that are applied in their organization. Each selected component will have a field in which the participant can enter a percentage to indicate to what extent that component contributes to their payment (e.g., 80 % wage and 20 % bug bounty). Then, we present the same checklist and fields again; this time, the participant shall define which of the components they would prefer and with what share they should contribute to the payment. While we present this second list as is to any management role (e.g., project manager, CEO), we ask software engineers (e.g., developers, testers) to decide upon those components from the perspective of being the team or organization lead. To prevent sequence effects, we will randomize the order in which the two question sets (applied and preferred) occur.
Finally, we ask each participant to indicate how many hours per week they work unpaid overtime (which represents a type of performance penalty for our payoff functions) and allow them to enter any additional comments on the survey.

Sampling Participants. We will invite personal contacts and collaborators from different organizations, involving software developers, project managers, and company managers. Note that we exclude self-employed or freelance developers, who typically ask for a fixed payment for a specific task or project. In addition, we will distribute a second version of our survey (to distinguish both populations) through our social-media networks. Our goal is to acquire at least 30 responses to obtain a reasonable understanding of applied and preferred payments. Note that we will not pay incentives for participating in the survey. We expect that the survey will take 10 minutes at most, but we will verify the required time and the understandability of the survey through test runs with three PhD students from our work groups.

Analysis Plan. To specify the payoff functions for our experiment, we will analyze the absolute frequency of combinations of different payment components to identify the most-preferred and the most-applied combination. For these two combinations, we will compute the mean values of their weights. As an example, assume that most of our participants state that they prefer the combination of a fixed wage (with a weight of 75 % on average) and bug bounties (25 % on average). Then, we would define the payoff function as 0.75 * payment_fix + 0.25 * (bugs_correct * reward_quality).

Threats to Validity. Our survey relies mostly on our personal contacts, which may bias its outcomes. We can mitigate this threat, since we have a broad set of collaborators in different countries and organizations. Moreover, asking for the "ideal" payoff function may pressure the participants, and such a function is hard to define (e.g., considering different countries, open-source communities, or expectations) and challenging to measure (e.g., what is preferred or efficient). However, this is due to the nature of our experiment and the property we study: financial incentives. Consequently, these threats remain and we have to discuss their potential impact, which can only be mitigated with an appropriately large sample population.

Table 2: List of payment components we will ask about in our survey to design payoff functions for the experiment. Note that the term "check" refers to participants selecting or deselecting a line of code during our experiment (i.e., marking it as buggy or correct, as can be seen in Figure 1).

Not performance-based:
- hourly wage: payment for the hours spent on a code review (wage)
- payment per task: fixed payment for conducting a code review (payment_fix)
- others: specified by participants

Performance-based:
- reward for completing the review: bonus for finding all bugs (reward_complete)
- reward for quality: bonus per correctly found bug, e.g., a bug bounty (reward_quality)
- reward for time: bonus for performing reviews fast (reward_time)
- reward for the organization's performance: bonus based on the organization's profits (reward_share)
- penalty for low quality: penalty for mistakes within a certain period, e.g., missed bugs (penalty_quality)
- penalty for checks: penalty for marking lines of code in the experiment (penalty_check)
- penalty for required overtime: penalty for not completing within working hours (penalty_time)
- others: specified by participants
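The following Python sketch illustrates the survey analysis plan described above: identifying the most frequent combination of payment components among the responses and computing the mean weights that parameterize the corresponding payoff function. The response format, the example values, and the helper name are hypothetical and serve only as an illustration.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical survey responses: each maps the selected payment components to their share (in %).
applied_responses = [
    {"payment_fix": 80, "reward_quality": 20},
    {"payment_fix": 70, "reward_quality": 30},
    {"wage": 100},
]

def most_common_combination(responses):
    """Return the mean weights (as fractions) of the most frequent component combination."""
    combos = Counter(frozenset(response) for response in responses)
    top_combo, _ = combos.most_common(1)[0]
    weights = defaultdict(list)
    for response in responses:
        if frozenset(response) == top_combo:
            for component, share in response.items():
                weights[component].append(share)
    return {component: mean(shares) / 100 for component, shares in weights.items()}

print(most_common_combination(applied_responses))
# {'payment_fix': 0.75, 'reward_quality': 0.25}, i.e., the weights of the payoff function
# 0.75 * payment_fix + 0.25 * (bugs_correct * reward_quality)
```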
Goal. After eliciting which payoff functions are used and preferred in practice, we will conduct our actual experiment to measure the impact of different payoff functions in software-engineering experiments. We focus on code reviews and bug identification in this experiment, since these are typical tasks in software engineering that also involve different types of incentives. So, we aim to support software-engineering researchers by identifying which payoff functions can be used to improve the validity of experiments.

Treatments. As motivated, we aim to compare four treatments that reflect different payoff functions stemming from our survey and established research. While we can define the payoff functions for the "No Performance Incentives Treatment" (NPIT) and the "Open-Source Incentives Treatment" (OSIT) in advance, we need the data from our survey to specify the "MP Incentives Treatment" (MPIT) and the "MA Incentives Treatment" (MAIT). However, we can describe a priori the method we will use to derive the payoff functions for those treatments. Note that some treatments may yield the same payoff function (i.e., NPIT, MAIT, and MPIT). It is unlikely that all three payoff functions will be identical, but we will merge those that are and reduce the number of treatments accordingly (see Table 2 for the variable names):

No Performance Incentives Treatment (NPIT): For NPIT, we provide a fixed payment (i.e., 10 €) that will be paid out at the end of an experimental session. So, this treatment mimics a participation fee for experiments or a fixed wage in the real world. Consequently, the payoff is independent of a participant's actual performance. Overall, the payoff function (PF) for this treatment is:

PF_NPIT = payment_fix

Open-Source Incentives Treatment (OSIT): Again, this treatment does not depend on our survey results, but builds on findings from the literature on the motivation of open-source developers [11, 15, 18, 19, 42]. We remark that we focus particularly on those developers who do not receive payments (e.g., as wages or bug bounties), but work for free. From a simplified perspective, such developers still act within a conceptual cost-benefit framework (i.e., they perceive a benefit from working on the software). Besides a participation fee, we will involve a performance-based reward for correctly identifying all bugs to resemble goal-oriented incentives (e.g., the personal fulfillment of achieving a goal or of supporting open-source projects). Furthermore, we consider the opportunity costs of working on open-source software (i.e., less time for other projects and additional effort for performing a number of checks). Overall, the payoff function for this treatment is:

PF_OSIT = payment_fix + reward_complete - time * penalty_time - checks * penalty_check

MA Incentives Treatment (MAIT): Using our survey results, we will be able to identify a payoff function that represents what is mostly applied in practice. We will then derive the payoff function as explained in Section 3.1.

MP Incentives Treatment (MPIT): We will use the same method as for MAIT to define the payoff function for MPIT, based on what is most preferred.

Note that these payoff functions cannot be perfect, but they mimic real-world scenarios and are thus suitable for achieving our goals. We use the same code-review example for all treatments to keep the complexity of the problem constant. For all treatments, we will calibrate the payoff functions so that the expected payoff for each participant within and between treatments is approximately the same (i.e., around 10 €).
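As an illustration of how these payoff functions translate into concrete payouts, the following Python sketch computes a participant's payoff under NPIT and OSIT. The parameter values are hypothetical placeholders, not the calibrated values we will use; MAIT and MPIT would be implemented analogously once the survey yields their component weights.

```python
# Hypothetical parameter values (in euros); the actual values result from calibrating
# the expected payoff to roughly 10 euros per participant in every treatment.
PAYMENT_FIX = 10.00      # fixed participation payment
REWARD_COMPLETE = 6.00   # bonus for correctly identifying all bugs
PENALTY_TIME = 0.10      # opportunity cost per minute spent on the review
PENALTY_CHECK = 0.05     # cost per line of code marked during the review

def payoff_npit() -> float:
    """NPIT: fixed payment, independent of the participant's performance."""
    return PAYMENT_FIX

def payoff_osit(found_all_bugs: bool, minutes: float, checks: int) -> float:
    """OSIT: participation fee plus completion reward minus opportunity costs."""
    reward = REWARD_COMPLETE if found_all_bugs else 0.0
    return PAYMENT_FIX + reward - minutes * PENALTY_TIME - checks * PENALTY_CHECK

# Example: a participant who finds all bugs after 12 minutes and 5 marked lines.
print(payoff_npit())                            # 10.0
print(payoff_osit(True, minutes=12, checks=5))  # 10.0 + 6.0 - 1.2 - 0.25 = 14.55
```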
Implementing similar expected payoffs avoids unfairness between treatments and ensures that performance differences are caused by the different incentive schemes rather than the total size of the payoff. After the treatment, we will gather demographic data from the participants (e.g., age, gender) and ask for any concerns or feedback. We estimate that each session of the experiment will take 45 minutes.

Code Example. We selected and adapted three different Java code examples (i.e., a limited calculator, sorting and searching, and a stack), the first from the learning platform LeetCode 3 and the other two from the "The Algorithms" GitHub repository. 4 To create buggy examples, we injected three bugs into each code example using mutation operators [20]. Note that we partly reworked the examples to make them more suitable for our experiment (e.g., combining searching and sorting), added a comment at the top of each example explaining its general purpose, and kept the other comments (potentially adapted) as well as the identifier names to improve realism. We aimed to limit the duration of the experiment to avoid fatigue and to actually allow for a laboratory setting, and thus decided to use only one example. To select the most suitable subject system for our experiment, we performed a pilot study in which we measured the time and performance of the participants. In detail, we asked one M.Sc. student from the University of Glasgow, who has worked as a software practitioner in industry, and four PhD students from the University of Zurich to perform code reviews on the buggy examples. Overall, each example was reviewed by three of these participants. Our results indicate that the sorting and searching example is most feasible (i.e., ≈12 min, 4/9 bugs correctly identified, 5 false positives), considering that the task should be neither too easy nor too hard, the background of the pilot's participants and of the potential participants for our experiment, as well as the examples' quality. The other two examples seemed too large or too complicated (i.e., ≈14 min, 2/9 bugs, 4 false positives; ≈8 min, 5/9 bugs, 8 false positives), which is why we decided to use the sorting and searching example (available in our artifacts). 2 We remark that none of the participants from this pilot study will be involved in our actual experiment. In Figure 1, we display a screenshot of the sorting and searching code example as we will show it to participants in the lab.

Sampling Participants. We aim to recruit a minimum of 80 participants (20 per treatment) by inviting students and faculty members of the Faculty of Computer Science of the Otto-von-Guericke University Magdeburg, Germany. In 2019, 1,676 Bachelor and Master students as well as roughly 200 PhD students were enrolled at the faculty, and 193 (former) members of the faculty are listed in the participant pool of the MaXLab, 5 at which we will conduct the laboratory experiment. We will focus on recruiting participants who passed the faculty's courses on Java and algorithms (first two semesters) or equivalent courses to ensure that our participants have the fundamental knowledge required for understanding our sorting and searching example. If possible (e.g., considering finances), we will invite further participants (potentially from industry and other faculties) to strengthen the validity of our results. Note that we focus on the Otto-von-Guericke University, since the MaXLab is located there.
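To illustrate the mutation-based bug injection described in the code-example paragraph above: a mutation operator applies a small syntactic change, such as replacing a relational operator, to turn a correct program into a realistically buggy one. Our actual examples are written in Java; the sketch below uses Python purely to illustrate the idea, and both functions are hypothetical stand-ins rather than excerpts from our artifacts.

```python
def binary_search(values, key):
    """Correct reference implementation that serves as the basis for mutation."""
    low, high = 0, len(values) - 1
    while low <= high:
        mid = (low + high) // 2
        if values[mid] == key:
            return mid
        if values[mid] < key:
            low = mid + 1
        else:
            high = mid - 1
    return -1

def binary_search_mutated(values, key):
    """Mutant created by a relational-operator-replacement mutation ('<=' -> '<')."""
    low, high = 0, len(values) - 1
    while low < high:  # injected bug: the loop must also run when low == high
        mid = (low + high) // 2
        if values[mid] == key:
            return mid
        if values[mid] < key:
            low = mid + 1
        else:
            high = mid - 1
    return -1

print(binary_search([1, 3, 5, 7], 7))          # 3
print(binary_search_mutated([1, 3, 5, 7], 7))  # -1, the mutation hides the match
```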
Regarding the Covid-19 pandemic, it is currently possible to conduct sessions with a reduced number of participants (i.e., 10 instead of 20).

Hypotheses. Reflecting on findings in software engineering as well as other domains, we define two hypotheses (H) that we want to study in our experiment:

H1: Participants without performance-based incentivization (NPIT) have, on average, a worse performance (a lower F1-score, explained shortly) than those with performance-based incentivization (e.g., OSIT, MAIT, MPIT).

H2: The experimental performance of participants under performance-based incentivization (e.g., OSIT, MAIT, MPIT) differs between treatments.

Besides analyzing these hypotheses, we will also compare the behavior (e.g., risk-taking) and performance between all groups to understand which incentives have what impact. Moreover, we will use eye trackers to explore fixation counts, fixation lengths, and return fixations. This will allow us to obtain a deeper understanding of the search and evaluation processes during code reviews, and it enables us to investigate potential differences in eye movements depending on the incentivization. More precisely, we intend to follow similar studies from software engineering [1] to explore how our participants read the source code: Do they focus on the actually buggy code? Which lines do they read more often? Which code elements do they focus on to locate bugs?

Metrics. The performance of our participants depends primarily on their correctness in identifying bugs during the code review. Since this correctness can be expressed as a confusion matrix, we decided to use the F1-score. Note that our participants will not be aware of this metric (they will only know their payoff function) to avoid biases, and any decision based on the payoff function will be reflected in the F1-score (e.g., taking more risks due to missing penalties under NPIT). So, this metric allows us to compare the performance of our participants between treatments, considering that the treatments motivate different behaviors, which in turn allows us to test our hypotheses.

Experimental Design. We will allocate participants to their treatments at random, without anyone repeating the experiment in another treatment. On-site, we can execute the experiment at the experimental laboratory MaXLab of the Otto-von-Guericke University using a standardized experimental environment. We will employ a between-subject design measuring the participants' performance, and we will measure the eye movements of four participants per session (restricted by the number of devices) using eye trackers (60 Hz Tobii Pro Nano). Note that we will identify any impact that wearing eye trackers may have on our participants during our analysis. However, it is unlikely that they will have an impact, because this type of eye tracker is mounted to the screen and barely noticeable (i.e., not a helmet the participants have to wear). In case there is an impact, we will scale up the number of participants to be able to disregard this additional variable. The procedure for each session is as follows:

Welcome and Experimental Instructions: After the participants of a session enter the laboratory, they are randomly allocated to working stations with the experimental environment installed. Moreover, four of them are randomly selected for using eye trackers. To this end, we will already state in the invitation that eye tracking is involved in the experiment.
If a participant nonetheless disagrees to participating with an eye tracker, we will exclude them from the experiment to avoid selection bias. Once all participants are at their places, the experimenter begins the experiment. The participants receive general information about the experiment (e.g., a welcoming text), information about the task at hand (code review), an explanation of how to enter data (e.g., check boxes), and the definition of their payoff function for the experiment (with some examples).

Review Task: All participants receive the code example with the task to identify any bugs within it. Note that the participants will not be aware of the precise number of bugs in the code. Instead, a message will explain that the code does not behave as expected when it is executed. At the end of the task, we can incorporate unpaid overtime as a payment component by asking participants to stay for five more minutes to work on the task.

Post-Experimental Questionnaire: After the experiment, the participants will receive a set of demographic questions (cf. Table 3). We will further apply the distress subscale of the Short Stress State Questionnaire [17] to measure the arousal and stress of the participants. Eliciting such data on demographics and arousal will enable us to identify potential confounding parameters.

Payoff Procedure: After we have collected all the data, we will inform the participants about their performance and payoff by displaying both on their screens. We will pay out these earnings privately, in cash, in a separate room immediately afterwards.

Analysis Plan. To analyze our data, we will employ the following steps:

Data Cleaning: The experimental environment stores raw data in CSV files. We do not plan to remove any outliers or other data unless we identify a specific reason to believe that the data is invalid.

Descriptive Statistics: We will present descriptive statistics for the demographic, dependent, and independent variables of each treatment by reporting the means and standard deviations of the respective variables.

Observational Descriptions: Since sole statistical testing is often subject to misinterpretation and not recommended [2, 39, 40], we will focus on describing our observations. For this purpose, we will report the results we obtained, plot suitable visualizations, and identify patterns within these. The statistical tests will help us improve our confidence in these observations.

Inferential Statistics: For our analysis, we will focus on performance (i.e., the F1-score). We will first check whether the assumptions required for parametric tests are fulfilled and, if not, proceed with non-parametric tests. Since we are interested in all possible differences between the four treatments, we have to conduct all pairwise treatment tests. In total, this leads to six tests, or to three tests if our survey indicates that two treatments are mostly identical. For the significance analyses, we will apply a significance level of p < 0.05 and correct for multiple-hypotheses testing using the Holm-Bonferroni method. Though the share of participants who use eye trackers will be constant among all treatments, and thus should not affect treatment effects, we will further check whether the presence of eye trackers affected performance. To increase the statistical robustness, we will also conduct a regression analysis using the treatments as categorical variables, with NPIT as the base, and the participants' age, gender, experience, and arousal as exogenous variables.
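To make the inferential-statistics step concrete, the following Python sketch shows the intended flow under the non-parametric branch: computing each participant's F1-score from their review checks, running all six pairwise comparisons, and applying the Holm-Bonferroni correction. The data are fabricated placeholders, and the choice of the Mann-Whitney U test is only one possible non-parametric option, so this is a sketch of the analysis rather than the final script.

```python
from itertools import combinations
from scipy.stats import mannwhitneyu  # assumed non-parametric test for the pairwise comparisons

def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1-score from a participant's review checks (buggy lines as the positive class)."""
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

# Hypothetical per-participant results: (correctly marked bugs, false marks, missed bugs).
treatments = {
    "NPIT": [f1_score(1, 2, 2), f1_score(0, 1, 3), f1_score(2, 3, 1)],
    "OSIT": [f1_score(2, 1, 1), f1_score(3, 2, 0), f1_score(2, 0, 1)],
    "MAIT": [f1_score(2, 2, 1), f1_score(1, 0, 2), f1_score(3, 1, 0)],
    "MPIT": [f1_score(2, 1, 1), f1_score(2, 2, 1), f1_score(1, 1, 2)],
}

# All pairwise comparisons (six tests for four treatments).
pairs = list(combinations(treatments, 2))
p_values = [mannwhitneyu(treatments[a], treatments[b]).pvalue for a, b in pairs]

def holm_bonferroni(p_values, alpha=0.05):
    """Step-down Holm-Bonferroni: test sorted p-values against alpha / (m - rank)."""
    order = sorted(range(len(p_values)), key=lambda i: p_values[i])
    significant = [False] * len(p_values)
    for rank, index in enumerate(order):
        if p_values[index] > alpha / (len(p_values) - rank):
            break  # once one test fails, all larger p-values are non-significant
        significant[index] = True
    return significant

for (a, b), p, sig in zip(pairs, p_values, holm_bonferroni(p_values)):
    print(f"{a} vs. {b}: p = {p:.3f}, significant after correction: {sig}")
```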
Based on these steps, we will obtain a detailed understanding of how different incentivization schemes impact the performance of software developers during code reviews.

References
- Developer Reading Behavior While Summarizing Java Methods: Size and Context Matters
- Retire Statistical Significance: Scientists Rise Up against Statistical Significance
- Software Developer Motivation in a High Maturity Company: A Case Study. Software Process: Improvement and Practice
- Intrinsic Motivation in Open Source Software Development
- Job Expectations of IS Professionals in Hong Kong
- Real-Effort Tasks
- Lab Labor: What Can Labor Economists Learn from the Lab?
- Performance Pay and Multidimensional Sorting: Productivity, Preferences, and Gender
- Monetary and Non-Monetary Incentives in Real-Effort Tournaments
- The Shifting Sands of Motivation: Revisiting What Drives Contributors in Open Source
- A Structural Analysis of Disappointment Aversion in a Real Effort Competition
- Happiness and the Productivity of Software Engineers
- Wage Transparency and Performance: A Real-Effort Experiment
- Working for Free? Motivations for Participating in Open-Source Projects
- Anindya Iqbal, and Gias Uddin. A Survey-Based Qualitative Study to Characterize Expectations of Software Developers from Five Stakeholders
- Validation of a Short Stress State Questionnaire
- Motivation of Software Developers in Open Source Projects: An Internet-Based Survey of Contributors to the Linux Kernel
- Leaving My Fingerprints: Motivations and Challenges of Contributing to OSS for Social Good
- An Analysis and Survey of the Development of Mutation Testing
- Predictors of Leadership Style, Organizational Commitment and Turnover of Information Systems Professionals
- Bounty Programs in Free/Libre/Open Source Software
- Effects of Explicit Feature Traceability on Program Comprehension
- How Can I Contribute? A Qualitative Analysis of Community Websites of 25 Unix-Like Distributions
- Some Simple Economics of Open Source
- Financial Incentives and the "Performance of Crowds"
- Do Women Shy Away From Competition? Do Men Compete Too Much?
- Commenting Source Code: Is It Worth It for Small Programming Tasks
- Understanding the Motivations, Participation, and Performance of Open Source Software Developers: A Longitudinal Study of the Apache Projects
- What Should Your Run-Time Configuration Framework Do to Help Developers? Empirical Software Engineering
- Performing Tasks Can Improve Program Comprehension Mental Model of Novice Developers
- Measuring and Modeling Programming Experience
- Experimental Economics: Induced Value Theory. The American Economic Review
- Towards a Theory of Software Developer Job Satisfaction and Perceived Productivity
- The Role of the Work Itself: An Empirical Examination of Intrinsic Motivation's Influence on IT Workers Attitudes and Intentions
- Incentive Systems in a Real Effort Experiment
- The ASA Statement on p-Values: Context, Process, and Purpose
- Moving to a World Beyond "p < 0.05"
- Toward an Understanding of the Motivation of Open Source Software Developers