Too trivial to test? An inverse view on defect prediction to identify methods with low fault risk

Rainer Niedermayr (1,2), Tobias Röhm (1) and Stefan Wagner (2)
1 CQSE GmbH, München, Germany
2 Institute of Software Technology, University of Stuttgart, Stuttgart, Germany

PeerJ Computer Science 5:e187 (2019), DOI 10.7717/peerj-cs.187. Submitted 25 October 2018, accepted 18 March 2019, published 15 April 2019. Corresponding author: Rainer Niedermayr, niedermayr@cqse.eu. Copyright 2019 Niedermayr et al., distributed under a Creative Commons CC-BY 4.0 license.

ABSTRACT
Background: Test resources are usually limited and therefore it is often not possible to completely test an application before a release. To cope with the problem of scarce resources, development teams can apply defect prediction to identify fault-prone code regions. However, defect prediction tends to have low precision in cross-project prediction scenarios.
Aims: We take an inverse view on defect prediction and aim to identify methods that can be deferred when testing because they contain hardly any faults due to their code being "trivial". We expect that characteristics of such methods might be project-independent, so that our approach could improve cross-project predictions.
Method: We compute code metrics and apply association rule mining to create rules for identifying methods with low fault risk (LFR). We conduct an empirical study to assess our approach with six Java open-source projects containing precise fault data at the method level.
Results: Our results show that inverse defect prediction can identify approx. 32–44% of the methods of a project to have a LFR; on average, they are about six times less likely to contain a fault than other methods. In cross-project predictions with larger, more diversified training sets, identified methods are even 11 times less likely to contain a fault.
Conclusions: Inverse defect prediction supports the efficient allocation of test resources by identifying methods that can be treated with less priority in testing activities and is well applicable in cross-project prediction scenarios.

Subjects: Data Mining and Machine Learning, Software Engineering
Keywords: Testing, Inverse defect prediction, Fault risk, Low-fault-risk methods

INTRODUCTION
In a perfect world, it would be possible to completely test every new version of a software application before it was deployed into production. In practice, however, software development teams often face a problem of scarce test resources. Developers are busy implementing features and bug fixes, and may lack time to develop enough automated unit tests to comprehensively test new code (Ostrand, Weyuker & Bell, 2005; Menzies & Di Stefano, 2004). Furthermore, testing is costly and, depending on the criticality of a system, it may not be cost-effective to expend equal test effort on all components (Zhang, Zhang & Gu, 2007). Hence, development teams need to prioritize and limit their testing scope by restricting the code regions to be tested (Menzies et al., 2003; Bertolino, 2007).
To cope with the problem of scarce test resources, development teams aim to test code regions that have the best cost-benefit ratio regarding fault detection. To support development teams in this activity, defect prediction has been developed and studied extensively in the last decades (Hall et al., 2012; D'Ambros, Lanza & Robbes, 2012; Catal, 2011). Defect prediction identifies code regions that are likely to contain a fault and should therefore be tested (Menzies, Greenwald & Frank, 2007; Weyuker & Ostrand, 2008).

This paper suggests, implements, and evaluates another view on defect prediction: inverse defect prediction (IDP). The idea behind IDP is to identify code artifacts (e.g., methods) that are so trivial that they contain hardly any faults and thus can be deferred or ignored in testing. Like traditional defect prediction, IDP also uses a set of metrics that characterize artifacts, applies transformations to pre-process metrics, and uses a machine-learning classifier to build a prediction model. The difference lies rather in the predicted classes. While defect prediction classifies an artifact either as buggy or non-buggy, IDP identifies methods that exhibit a low fault risk (LFR) with high certainty and does not make an assumption about the remaining methods, for which the fault risk is at least medium or cannot be reliably determined. As a consequence, the objective of the prediction also differs. Defect prediction aims to achieve a high recall, such that as many faults as possible can be detected, and a high precision, such that only few false positives occur. In contrast, IDP aims to achieve a high precision to ensure that LFR methods indeed contain hardly any faults, but it does not necessarily seek to predict all non-faulty methods. Still, it is desirable that IDP achieves a sufficiently high recall such that a reasonable reduction potential arises when treating LFR methods with a lower priority in QA activities.

Research goal: We want to study whether IDP can reliably identify code regions that exhibit only a LFR, whether ignoring such code regions (as done silently in defect prediction) is a good idea, and whether IDP can be used in cross-project predictions. To implement IDP, we calculated code metrics for each method of a code base and trained a classifier for methods with LFR using association rule mining. To evaluate IDP, we performed an empirical study with the Defects4J dataset (Just, Jalali & Ernst, 2014) consisting of real faults from six open-source projects. We applied static code analysis and classifier learning on these code bases and evaluated the results. We hypothesize that IDP can be used to pragmatically address the problem of scarce test resources. More specifically, we hypothesize that a generalized IDP model can be used to identify code regions that can be deferred when writing automated tests if none yet exist, as is the situation for many legacy code bases.

Contributions: (1) The idea of an inverse view on defect prediction: While defect prediction has been studied extensively in the last decades, it has always been employed to identify code regions with high fault risk.
To the best of our knowledge, the present paper is the first to study the identification of code regions with LFR explicitly. (2) An empirical study about the performance of IDP on real open-source code bases. (3) An extension to the Defects4J dataset (Just, Jalali & Ernst, 2014): To improve data quality and enable further research (reproduction in particular), we provide code metrics for all methods in the code bases and an indication whether they were changed in a bug-fix patch, a list of methods that changed in bug fixes only to preserve API compatibility, and association rules to identify LFR methods.

The remainder of this paper is organized as follows. "Association Rule Mining" provides background information about association rule mining. "Related Work" discusses related work. "IDP Approach" describes the IDP approach, that is, the computation of the metrics for each method, the data pre-processing, and the association rule mining to identify methods with LFR. Afterward, "Empirical Study" summarizes the design and results of the IDP study with the Defects4J dataset. Then, "Discussion" discusses the study's results, implications, and threats to validity. Finally, "Conclusion" summarizes the main findings and sketches future work.

ASSOCIATION RULE MINING
Association rule mining is a technique for identifying relations between variables in a large dataset and was introduced by Agrawal, Imieliński & Swami (1993). A dataset contains transactions consisting of a set of items that are binary attributes. An association rule represents a logical implication of the form {antecedent} ⇒ {consequent} and expresses that the consequent is likely to apply if the antecedent applies. Antecedent and consequent both consist of a set of items and are disjoint.

The support of a rule expresses the proportion of the transactions that contain both antecedent and consequent out of all transactions. The support of an itemset X with respect to all transactions T is defined as supp(X) = |{t ∈ T : X ⊆ t}| / |T|. It is related to the significance of the itemset (Simon, Kumar & Li, 2011). The confidence of a rule expresses the proportion of the transactions that contain both antecedent and consequent out of all transactions that contain the antecedent. The confidence of a rule X ⇒ Y is defined as conf(X ⇒ Y) = supp(X ∪ Y) / supp(X). It can be considered as the precision (Simon, Kumar & Li, 2011). A rule is redundant if a more general rule with the same or a higher confidence value exists (Bayardo, Agrawal & Gunopulos, 1999).

Association rule mining has been successfully applied in defect prediction studies (Song et al., 2006; Czibula, Marian & Czibula, 2014; Ma et al., 2010; Zafar et al., 2012). A major advantage of association rule mining is the natural comprehensibility of the rules (Simon, Kumar & Li, 2011). Other commonly used machine-learning algorithms for defect prediction, such as support vector machines or Naive Bayes classifiers, generate black-box models, which lack interpretability. Even decision trees can be difficult to interpret due to the subtree-replication problem (Simon, Kumar & Li, 2011). Another advantage of association rule mining is that the gained rules implicitly extract high-order interactions among the predictors.
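To make the support and confidence measures concrete, the following minimal Java sketch computes both values for one rule over a toy transaction set. It is only an illustration: the item names and the five "method" transactions are made up and are not part of the study.

```java
import java.util.List;
import java.util.Set;

public class RuleMetricsExample {

    // supp(X) = |{t in T : X subseteq t}| / |T|
    static double support(List<Set<String>> transactions, Set<String> itemset) {
        long matches = transactions.stream().filter(t -> t.containsAll(itemset)).count();
        return (double) matches / transactions.size();
    }

    // conf(X => Y) = supp(X union Y) / supp(X)
    static double confidence(List<Set<String>> transactions, Set<String> antecedent, Set<String> consequent) {
        Set<String> union = new java.util.HashSet<>(antecedent);
        union.addAll(consequent);
        return support(transactions, union) / support(transactions, antecedent);
    }

    public static void main(String[] args) {
        // Each transaction is the item set of one method (toy data).
        List<Set<String>> transactions = List.of(
                Set.of("NoLoops", "IsGetter", "NotFaulty"),
                Set.of("NoLoops", "NotFaulty"),
                Set.of("NoLoops", "IsGetter", "NotFaulty"),
                Set.of("IsGetter", "NotFaulty"),
                Set.of("NoLoops"));

        // Rule {NoLoops, IsGetter} => {NotFaulty}:
        // supp = 2/5 = 0.40 (two of five transactions contain all three items),
        // conf = 0.40 / supp({NoLoops, IsGetter}) = 0.40 / 0.40 = 1.00.
        System.out.printf("support = %.2f%n",
                support(transactions, Set.of("NoLoops", "IsGetter", "NotFaulty")));
        System.out.printf("confidence = %.2f%n",
                confidence(transactions, Set.of("NoLoops", "IsGetter"), Set.of("NotFaulty")));
    }
}
```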
RELATED WORK
Defect prediction is an important research area that has been extensively studied (Hall et al., 2012; Catal & Diri, 2009). Defect prediction models use code metrics (Menzies, Greenwald & Frank, 2007; Nagappan, Ball & Zeller, 2006; D'Ambros, Lanza & Robbes, 2012; Zimmermann, Premraj & Zeller, 2007), change metrics (Nagappan & Ball, 2005; Hassan, 2009; Kim et al., 2007), or a variety of further metrics (such as code ownership (Bird et al., 2011; Rahman & Devanbu, 2011), developer interactions (Meneely et al., 2008; Lee et al., 2011), dependencies on binaries (Zimmermann & Nagappan, 2008), mutants (Bowes et al., 2016), and code smells (Palomba et al., 2016)) to predict code areas that are especially defect-prone. Such models allow software engineers to focus quality-assurance efforts on these areas and thereby support a more efficient resource allocation (Menzies, Greenwald & Frank, 2007; Weyuker & Ostrand, 2008).

Defect prediction is usually performed at the component, package, or file level (Nagappan & Ball, 2005; Nagappan, Ball & Zeller, 2006; Bacchelli, D'Ambros & Lanza, 2010; Scanniello et al., 2013). Recently, more fine-grained prediction models have been proposed to narrow down the scope for quality-assurance activities. Kim, Whitehead & Zhang (2008) presented a model to classify software changes. Hata, Mizuno & Kikuno (2012) applied defect prediction at the method level and showed that fine-grained prediction outperforms coarse-grained prediction at the file or package level if efforts to find the faults are considered. Giger et al. (2012) also investigated prediction models at the method level and concluded that a Random Forest model operating on change metrics can achieve good performance. More recently, Pascarella, Palomba & Bacchelli (2018) replicated this study and confirmed the results. However, they reported that a more realistic inter-release evaluation of the models shows a dramatic drop in performance, with results close to that of a random classifier, and concluded that method-level bug prediction is still an open challenge (Pascarella, Palomba & Bacchelli, 2018). It is considered difficult to achieve sufficiently good data quality at the method level (Hata, Mizuno & Kikuno, 2012; Shippey et al., 2016); publicly available datasets have been provided in Shippey et al. (2016), Just, Jalali & Ernst (2014), and Giger et al. (2012).

Cross-project defect prediction predicts defects in projects for which no historical data exists by using models trained on data of other projects (Zimmermann et al., 2009; Xia et al., 2016). He et al. (2012) investigated the usability of cross-project defect prediction. They reported that cross-project defect prediction works only in few cases and requires careful selection of training data. Zimmermann et al. (2009) also provided empirical evidence that cross-project prediction is a serious problem. They stated that projects in the same domain cannot be used to build accurate prediction models without quantifying, understanding, and evaluating process, data, and domain. Similar findings were obtained by Turhan et al. (2009), who investigated the use of cross-company data for building prediction models. They found that models using cross-company data can only be "useful in extreme cases such as mission-critical projects, where the cost of false alarms can be afforded" and suggested using within-company data if available.
While some recent studies reported advances in cross-project defect prediction (Xia et al., 2016; Zhang et al., 2016; Xu et al., 2018), it is still considered a challenging task.

Our work differs from the above-mentioned work in the target setting: we do not predict artifacts that are fault-prone, but instead identify artifacts (methods) that are very unlikely to contain any faults. While defect prediction aims to detect as many faults as possible (without too many false positives), and thus strives for a high recall (Mende & Koschke, 2009), our IDP approach strives to identify those methods that are not fault-prone to a high certainty. Therefore, we optimized our approach toward the precision in detecting LFR methods. To the best of our knowledge, this is the first work to study LFR methods. Moreover, as far as we know, cross-project prediction has not yet been applied at the method level. To perform the classification, we applied association rule mining. Association rule mining has previously been applied with success in defect prediction (Song et al., 2006; Morisaki et al., 2007; Czibula, Marian & Czibula, 2014; Ma et al., 2010; Karthik & Manikandan, 2010; Zafar et al., 2012).

IDP APPROACH
This section describes the IDP approach, which identifies LFR methods. The approach comprises the computation of source-code metrics for each method, the data pre-processing before the mining, and the association rule mining. Figure 1 illustrates the steps.

Figure 1: Overview of the approach. Metrics for faulty methods are computed at the faulty state; metrics for non-faulty methods are computed at the state of the last bug-fix commit.

Metric computation
Like defect prediction models, IDP uses metrics to train a classifier for identifying LFR methods. For each method, we compute the source-code metrics listed in Table 1 that we considered relevant to judge whether a method is trivial. They comprise established length and complexity metrics used in defect prediction, metrics regarding occurrences of programming-language constructs, and categories describing the purpose of a method.

SLOC is the number of source lines of code, that is, LOC without empty lines and comments. Cyclomatic complexity corresponds to the metric proposed by McCabe (1976). Despite this metric being controversial (Shepperd, 1988; Hummel, 2014), because it is not actionable, difficult to interpret, and high values do not necessarily translate to low readability, it is commonly used as a variable in defect prediction (Menzies et al., 2002, 2004; Zimmermann, Premraj & Zeller, 2007). Furthermore, a low number of paths through a method could be relevant for identifying LFR methods. Maximum nesting depth corresponds to the "maximum number of encapsulated scopes inside the body of the method" (NDepend, 2017). Deeply nested code is more difficult to understand; therefore, it could be more fault-prone.
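As an illustration of these length and complexity metrics, the following hypothetical Java method (not taken from the study objects) is annotated with the values the metrics described above would take for it; the exact SLOC value depends on the counting convention, here every non-blank, non-comment line including braces is counted.

```java
// SLOC = 9 (all non-blank, non-comment lines of the method, including braces)
// Cyclomatic complexity = 3 (base value 1, plus one 'for' and one 'if' decision point)
// Maximum nesting depth = 2 (the 'if' nested inside the 'for'; the method body counts as depth 0)
public static int countPositives(int[] values) {
    int count = 0;
    for (int value : values) {   // nesting depth 1
        if (value > 0) {         // nesting depth 2
            count++;
        }
    }
    return count;
}
```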
Maximum method chaining expresses the maximum number of chain elements of a method invocation. We consider a method call to be chained if it is directly invoked on the result of the previous method invocation. The value for a method is zero if it does not contain any method invocations, one if no method invocation is chained, or otherwise the maximum number of chain elements (e.g., two for getId().toString(), three for getId().toString().substring(1)). Unique variable identifiers counts the distinct names of variables that are used within the method. The metrics M6–M31 count the occurrences of the respective Java language construct (Gosling et al., 2013).

Next, we derive further metrics from the existing ones. They are redundant, but correlated metrics do not have any negative effects on association rule mining (except on the computation time) and may improve the results for the following reason: if an item generated from a metric is not frequent, rules with this item will be discarded because they cannot achieve the minimum support; however, an item for a more general metric may be more frequent and survive. The derived metrics are:
- All Conditions, which sums up If Conditions, Switch-Case Blocks, and Ternary Operations (M16 + M27 + M29)
- All Arithmetic Operations, which sums up Incrementations, Decrementations, and Arithmetic Infix Operations (M7 + M8)

Furthermore, we compute to which of the following categories a method belongs (a method can belong to zero, one, or more categories):
- Constructors: Special methods that create and initialize an instance of a class. They might be less fault-prone because they often only set class variables or delegate to another constructor.
- Getters: Methods that return a class or instance variable. They usually consist of a single statement and can be generated by the IDE.
- Setters: Methods that set the value of a class or instance variable. They usually consist of a single statement and can be generated by the IDE.
- Empty methods: Non-abstract methods without any statements. They often exist to meet an implemented interface, or because the default logic is to do nothing and is supposed to be overridden in certain sub-classes.
- Delegation methods: Methods that delegate the call to another method with the same name and further parameters. They often do not contain any logic besides the delegation.
- ToString methods: Implementations of Java's toString method. They are often used only for debugging purposes and can be generated by the IDE.

Note that we only use source-code metrics and do not consider process metrics. This is because we want to identify methods that exhibit a LFR due to their code. Table 1 summarizes all computed metrics.

Table 1: Computed metrics for each method.
ID | Metric name | Type
M1 | Source lines of code (SLOC) | Length
M2 | Cyclomatic complexity (CC) | Complexity
M3 | Maximum nesting depth | Maximum value
M4 | Maximum method chaining | Maximum value
M5 | Unique variable identifiers | Unique count
M6 | Anonymous class declarations | Count
M7 | Arithmetic in- or decrementations | Count
M8 | Arithmetic infix operations | Count
M9 | Array accesses | Count
M10 | Array creations | Count
M11 | Assignments | Count
M12 | Boolean operators | Count
M13 | Cast expressions | Count
M14 | Catch clauses | Count
M15 | Comparison operators | Count
M16 | If conditions | Count
M17 | Inner method declarations | Count
M18 | Instance-of checks | Count
M19 | Instantiations | Count
M20 | Loops | Count
M21 | Method invocations | Count
M22 | Null checks | Count
M23 | Null literals | Count
M24 | Return statements | Count
M25 | String literals | Count
M26 | Super-method invocations | Count
M27 | Switch-case blocks | Count
M28 | Synchronized blocks | Count
M29 | Ternary operations | Count
M30 | Throw statements | Count
M31 | Try blocks | Count
M32 | All conditions | Count
M33 | All arithmetic operations | Count
M34 | Is constructor | Boolean
M35 | Is setter | Boolean
M36 | Is getter | Boolean
M37 | Is empty method | Boolean
M38 | Is delegation method | Boolean
M39 | Is ToString method | Boolean

Association rule mining computes frequent item sets from categorical attributes; therefore, our next step is to discretize the numerical metrics. In defect prediction, discretization is also applied to the metrics: Shivaji et al. (2013) and McCallum & Nigam (1998) reported that binary values can yield better results than using counts when the number of features is low. We discretize as follows:
- For each of the metrics M1–M5, we inspect their distribution and create three classes. The first class is for metric values until the first tertile, the second class for values until the second tertile, and the third class for the remaining values.
- For all count metrics (including the derived ones), we create a binary "has-no" metric, which is true if the value is zero, e.g., CountLoops = 0 ⇒ NoLoops = true.
- For the method categories (setter, getter, etc.), no transformation is necessary as they are already binary.
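The following sketch illustrates this discretization step. It is our own minimal illustration, not the study's implementation: the item names (SlocClass1, NoLoops, ...) differ from the ones used in the paper (e.g., SlocLowestThird, SlocLessThan4), and the tertile boundaries 3 and 8 are placeholders for values that the study derives from the observed metric distribution.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal sketch of turning a method's raw metric values into categorical items. */
public class MetricDiscretizer {

    /** Returns 1, 2, or 3 depending on which tertile-based class the value falls into. */
    static int tertileClass(int value, int firstTertileUpperBound, int secondTertileUpperBound) {
        if (value <= firstTertileUpperBound) {
            return 1;
        }
        return value <= secondTertileUpperBound ? 2 : 3;
    }

    static List<String> toItems(int sloc, int loops, int nullChecks, boolean isGetter) {
        List<String> items = new ArrayList<>();
        // Metrics M1-M5: three classes based on the observed distribution (boundaries are placeholders here).
        items.add("SlocClass" + tertileClass(sloc, 3, 8));
        // Count metrics: binary "has-no" items, e.g., CountLoops = 0 => NoLoops.
        if (loops == 0) {
            items.add("NoLoops");
        }
        if (nullChecks == 0) {
            items.add("NoNullChecks");
        }
        // Category metrics are already binary.
        if (isGetter) {
            items.add("IsGetter");
        }
        return items;
    }

    public static void main(String[] args) {
        // A 2-line getter without loops or null checks:
        System.out.println(toItems(2, 0, 0, true)); // [SlocClass1, NoLoops, NoNullChecks, IsGetter]
    }
}
```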
Data pre-processing
At this point, we assume that we have a list of faulty methods with their metrics at the faulty state (the list may contain a method multiple times if it was fixed multiple times) and a list of all methods. Faulty methods can be obtained by identifying methods that were changed in bug-fix commits (Zimmermann, Premraj & Zeller, 2007; Giger et al., 2012; Shippey et al., 2016). A method is considered as faulty when it was faulty at least once in its history; otherwise it is considered as not faulty. We describe in "Fault data extraction" how we extracted faulty methods from the Defects4J dataset. Prior to applying the mining algorithm, we have (1) to address faulty methods with multiple occurrences, (2) to create a unified list of faulty and non-faulty methods, and (3) to tackle dataset imbalance. Steps (1) and (2) require that a method can be uniquely identified. To satisfy this requirement, we identified a method by its name, its parameter types, and the qualified name of its surrounding class. We integrated the computation of the metrics into the source-code analysis tool Teamscale (Heinemann, Hummel & Steidl, 2014; Haas, Niedermayr & Jurgens, 2019), which is aware of the code history and tracks method genealogies. Thereby, Teamscale detects method renames or parameter changes so that we could update the method identifier when it changed.

(1) A method may be fixed multiple times; in this case, a method appears multiple times in the list of the faulty methods. However, each method should have the same weight and should therefore be considered only once. Consequently, we consolidate multiple occurrences of the same method: we replace all occurrences by a new instance and apply majority voting to aggregate the binary metric values. It is common practice in defect prediction to have a single instance of every method with a flag that indicates whether a method was faulty at least once (Menzies et al., 2010; Giger et al., 2012; Shippey et al., 2016; Mende & Koschke, 2009).

(2) To create a unified dataset, we take the list of all methods, remove those methods that exist in the set of the faulty methods, and add the set of the faulty methods with the metrics computed at the faulty state. After doing that, we end up with a list containing each method exactly once and a flag indicating whether a method was faulty or not.

(3) Defect datasets are often highly imbalanced (Khoshgoftaar, Gao & Seliya, 2010), with faulty methods being underrepresented. Therefore, we apply SMOTE (synthetic minority over-sampling technique), a well-known algorithm for over- and under-sampling, to address imbalance in the dataset used for training (Longadge, Dongre & Malik, 2013; Chawla et al., 2002). It artificially generates new entries of the minority class using the nearest neighbors of these cases and reduces entries from the majority class (Torgo, 2010). If we do not apply SMOTE to highly imbalanced datasets, many non-expressive rules will be generated when most methods are not faulty. For example, if 95% of the methods are not faulty and 90% of them contain a method invocation, rules with high support will be generated that use this association to identify non-faulty methods. Balancing avoids such nonsense rules.

IDP classifier
To identify LFR methods, we compute association rules of the type {Metric1, Metric2, Metric3, ...} ⇒ {NotFaulty}. Examples of such metrics are SlocLowestThird, NoNullChecks, and IsSetter. A method that satisfies all metric predicates of a rule is not faulty to the certainty expressed by the confidence of the rule. The support of the rule expresses how many methods with these characteristics exist, and thus, it shows how generalizable the rule is. After computing the rules on a training set, we remove redundant ones (see "Association Rule Mining") and order the remaining rules first descending by their confidence and then by their support. To build the LFR classifier, we combine the top n association rules with the highest confidence values using the logical-or operator. Hence, we consider a method to have a LFR if at least one of the top n rules matches. To determine n, we compute the maximum number of rules until the faulty methods in the LFR methods exceed a certain threshold in the training set.

Of course, IDP can also be used with other machine-learning algorithms. We decided to use association rule mining because of the natural comprehensibility of the rules (see "Association Rule Mining") and because we achieved a better performance compared to models we trained using Random Forest.
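The classifier construction just described can be pictured with the following sketch. It is an illustration under stated assumptions rather than the authors' code: rules are represented only by their antecedent item sets, the confidence-based ordering is assumed to be given, and the stopping criterion follows the interpretation that the selected rules may match at most a given share of all faulty training methods (the study later uses 2.5% and 5% for this threshold).

```java
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/** Sketch of the LFR classifier: logical OR over the top-n association rules. */
public class LfrClassifierSketch {

    record Rule(Set<String> antecedent) {
        boolean matches(Set<String> methodItems) {
            return methodItems.containsAll(antecedent);
        }
    }

    /** A method has low fault risk if at least one of the selected rules matches. */
    static Predicate<Set<String>> combine(List<Rule> topRules) {
        return methodItems -> topRules.stream().anyMatch(rule -> rule.matches(methodItems));
    }

    /**
     * Selects the largest prefix of the confidence-sorted rules such that the matched faulty
     * training methods stay at or below the given share of all faulty training methods.
     */
    static List<Rule> selectTopRules(List<Rule> rulesSortedByConfidence,
                                     List<Set<String>> faultyTrainingMethods,
                                     double maxFaultShare) {
        int n = 0;
        for (int i = 1; i <= rulesSortedByConfidence.size(); i++) {
            Predicate<Set<String>> candidate = combine(rulesSortedByConfidence.subList(0, i));
            long matchedFaulty = faultyTrainingMethods.stream().filter(candidate).count();
            if ((double) matchedFaulty / faultyTrainingMethods.size() > maxFaultShare) {
                break; // adding this rule would exceed the allowed fault share
            }
            n = i;
        }
        return rulesSortedByConfidence.subList(0, n);
    }
}
```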
EMPIRICAL STUDY
This section reports on the empirical study that we conducted to evaluate the IDP approach.

Research questions
We investigate the following questions to research how well methods that contain hardly any faults can be identified and to study whether IDP is applicable in cross-project scenarios.

RQ 1: What is the precision of the classifier for low-fault-risk methods? To evaluate the precision of the classifier, we investigate how many methods that are classified as "LFR" (due to the triviality of their code) are faulty. If we want to use the LFR classifier for determining methods that require less focus during quality assurance (QA) activities, such as testing and code reviews, we need to be sure that these methods contain hardly any faults.

RQ 2: How large is the fraction of the code base consisting of methods classified as "low fault risk"? We study how common LFR methods are in code bases to find out how much code is of lower importance for quality-assurance activities. We want to determine which savings potential can arise if these methods are excluded from QA.

RQ 3: Is a trained classifier for methods with low fault risk generalizable to other projects? Cross-project defect prediction is used to predict faults in (new) projects, for which no historical fault data exists, by using models trained on other projects. It is considered a challenging task in defect prediction (He et al., 2012; Zimmermann et al., 2009; Turhan et al., 2009). As we expect that the characteristics of LFR methods might be project-independent, IDP could be applicable in a cross-project scenario. Therefore, we investigate how generalizable our IDP classifier is for cross-project use.

RQ 4: How does the classifier perform compared to a traditional defect prediction approach? The main purpose of defect prediction is to detect fault-prone code. Most traditional defect prediction approaches are binary classifications, which classify a method either as (likely) faulty or not faulty. Hence, they implicitly also detect methods with a LFR. Therefore, we compare the performance of our classifier with the performance of a traditional defect prediction approach.

Study objects
For our analysis, we used data from Defects4J, which was created by Just, Jalali & Ernst (2014). Defects4J is a database and analysis framework that provides real faults for six real-world open-source projects written in Java. For each fault, the original commit before the bug fix (faulty version), the original commit after the bug fix (fixed version), and a minimal patch of the bug fix are provided. The patch is minimal such that it contains only code changes that (1) fix the fault and (2) are necessary to keep the code compilable (e.g., when a bug fix involves method-signature changes). It does not contain changes that do not influence the semantics (e.g., changes in comments, local renamings) or changes that were included in the bug-fix commit but are not related to the actual fault (e.g., refactorings). Due to this manual analysis, the dataset is much more precise at the method level than other datasets at the same level, such as Shippey et al. (2016) and Giger et al. (2012), which were generated from version control systems and issue trackers without further manual filtering. The authors of Just, Jalali & Ernst (2014) confirmed that they considered every bug fix within a given time span.

Table 2 presents the study objects and their characteristics. We computed the metrics SLOC and #Methods for the code revision at the last bug-fix commit of each project; the numbers do not comprise sample and test code.
#Faulty methods corresponds to the number of faulty methods derived from the dataset.

Table 2: Study objects.
Name | SLOC | #Methods | #Faulty methods
JFreeChart (Chart) | 81.6k | 6.8k | 39
Google Closure Compiler | 166.7k | 13.0k | 148
Apache Commons Lang | 16.6k | 2.0k | 73
Apache Commons Math | 9.5k | 1.2k | 132
Mockito | 28.3k | 2.5k | 64
Joda-Time | 89.0k | 10.1k | 45

Fault data extraction
Defects4J provides for each project a set of reverse patches (a reverse patch reverts previous changes), which represent bug fixes. To obtain the list of methods that were at least once faulty, we conducted the following steps for each patch. First, we checked out the source code from the project repository at the original bug-fix commit and stored it as fixed version. Second, we applied the reverse patch to the fixed version to get to the code before the bug fix and stored the resulting faulty version. Next, we analyzed the two versions created for every patch. For each file that was changed between the faulty and the fixed version, we parsed the source code to identify the methods. We then mapped the code changes to the methods to determine which methods were touched in the bug fix. After that, we had the list of faulty methods. Figure 2 summarizes these steps.

Figure 2: Derivation of faulty methods. The original bug-fix commit c1e8ed to fix the faulty version f81f3f may contain unrelated changes. Defects4J provides a reverse patch, which contains only the actual fix. We applied it to the fixed version c1e8ed to get to fa30f1. We then identified methods that were touched by the patch and computed their metrics at state fa30f1.

We inspected all 395 bug-fix patches and found that 10 method changes in the patches do not represent bug fixes. While the patches are minimal, such that they contain only bug-related changes (see "Study Objects"), these ten method changes are semantic-preserving, only necessary because of changed signatures of other methods in the patch, and therefore included in Defects4J to keep the code compilable. Figure 3 presents an example. Although these methods are part of the bug fix, they were not changed semantically and do not represent faulty methods. Therefore, we decided to remove them from the faulty methods in our analysis. The names of these ten methods are provided in the dataset accompanying this paper (Niedermayr, Röhm & Wagner, 2019).

Figure 3: Example of a method change without behavior modification to preserve API compatibility. The method escapeJavaScript(String) invokes escapeJavaStyleString(String, boolean, boolean). A further parameter was added to the invoked method; therefore, it was necessary to adjust the invocation in escapeJavaScript(String). For invocations with the parameter value true, the behavior does not change (Lang, patch 46, simplified).
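The mapping of patch changes to methods described above can be pictured with the following sketch. It is our own illustration, not the study's tooling: the method names are taken from the Figure 3 example, but the line numbers and the overlap-based matching rule are assumptions made for the sake of a compact example.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch: a method counts as touched if a changed line range overlaps its line range. */
public class ChangeToMethodMapper {

    record LineRange(int firstLine, int lastLine) {
        boolean overlaps(LineRange other) {
            return firstLine <= other.lastLine && other.firstLine <= lastLine;
        }
    }

    record Method(String identifier, LineRange range) {}

    static List<Method> touchedMethods(List<Method> methodsInFile, List<LineRange> changedRanges) {
        List<Method> touched = new ArrayList<>();
        for (Method method : methodsInFile) {
            if (changedRanges.stream().anyMatch(method.range()::overlaps)) {
                touched.add(method);
            }
        }
        return touched;
    }

    public static void main(String[] args) {
        List<Method> methods = List.of(
                new Method("StringEscapeUtils.escapeJavaScript(String)", new LineRange(120, 135)),
                new Method("StringEscapeUtils.escapeJavaStyleString(String, boolean, boolean)", new LineRange(140, 190)));
        // Hypothetical patch hunk touching lines 150-155 only:
        System.out.println(touchedMethods(methods, List.of(new LineRange(150, 155))));
        // -> only escapeJavaStyleString(...) is reported as touched.
    }
}
```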
Procedure
After extracting the faulty methods from the dataset, we computed the metrics listed in "IDP Approach". We computed them for all faulty methods at their faulty version and for all methods of the application code (i.e., code without sample and test code) at the state of the fixed version of the last patch. We used the Eclipse JDT AST (http://www.eclipse.org/jdt/) to create an AST visitor for computing the metrics. For all further processing, we used the statistical computing software R (R Core Team, 2018).

To discretize the metrics M1–M5, we first computed their value distribution. Figure 4 shows that their values are not normally distributed (most values are very small).

Figure 4: Metrics M1–M5 are not normally distributed. (A) SLOC, (B) cyclomatic complexity, (C) maximum nesting depth, (D) maximum method chaining, and (E) unique variable identifiers.

To create three classes for each of these metrics, we sorted the metric values and computed the values at the end of the first and at the end of the second third. We then put all methods until the last occurrence of the value at the end of the first third into class 1, all methods until the last occurrence of the value at the end of the second third into class 2, and all other methods into class 3. (We did not use the ntile function to create the classes, because it always generates classes of the same size, such that instances with the same value may end up in different classes; e.g., if 50% of the methods have the complexity value 1, the first 33.3% would end up in class 1 and the remaining 16.7% with the same value would end up in class 2.) Table 3 presents the value ranges of the resulting classes. The classes are the same for all six projects.

Table 3: Generated classes and their value ranges.
Metric | Class 1 | Class 2 | Class 3
SLOC | [0;3] | [4;8] | [9;∞)
Cyclomatic complexity | [0;1] | [2;2] | [3;∞)
Maximum nesting depth | [0;0] | [1;1] | [2;∞)
Maximum method chaining | [0;1] | [2;2] | [3;∞)
Unique variable identifiers | [0;1] | [2;3] | [4;∞)

We then aggregated multiple faulty occurrences of the same method (this occurs if a method is changed in more than one bug-fix patch) and created a unified dataset of faulty and non-faulty methods (see "Data pre-processing"). Next, we split the dataset into a training and a test set. For RQ 1 and RQ 2, we used 10-fold cross-validation (Witten et al., 2016, Chapter 5). Using the caret package (from Jed Wing et al., 2017), we randomly sampled the dataset of each project into ten stratified partitions of equal sizes. Each partition is used once for testing the classifier, which is trained on the remaining nine partitions. To compute the association rules for RQ 3, in which we study how generalizable the classifier is, we used for each project the methods of the other five projects as training set for the classifier.

Before computing association rules, we applied the SMOTE algorithm from the DMwR package (Torgo, 2010) with a 100% over-sampling and a 200% under-sampling rate to each training set. After that, each training set was equally balanced (50% faulty methods, 50% non-faulty methods). (We computed the results for the empirical study once with and once without addressing the data imbalance in the training set; the prediction performance was better when applying SMOTE, therefore we decided to use it.)

We then used the implementation of the Apriori algorithm (Agrawal & Srikant, 1994) in the arules package (Hahsler, Gruen & Hornik, 2005; Hahsler et al., 2017) to compute association rules with NotFaulty as target item (rule consequent). We set the threshold for the minimum support to 10% and the threshold for the minimum confidence to 90% (support and confidence are explained in "Association Rule Mining"). We experimented with different thresholds and these values produced good results (results for other configurations are in the dataset provided with this paper; Niedermayr, Röhm & Wagner, 2019).
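The two mining thresholds, together with the redundancy criterion from "Association Rule Mining", can be pictured as the following filtering step. This is only a sketch under stated assumptions (the study performs this with the arules package in R): rules carry their pre-computed support and confidence, and the secondary sort by support is assumed to be descending.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/** Sketch of rule filtering by minimum support/confidence and redundancy removal. */
public class RuleFiltering {

    record Rule(Set<String> antecedent, double support, double confidence) {}

    /** Redundant: a more general rule (subset antecedent) with at least the same confidence exists. */
    static boolean isRedundant(Rule rule, List<Rule> allRules) {
        return allRules.stream().anyMatch(other -> other != rule
                && rule.antecedent().containsAll(other.antecedent())
                && rule.antecedent().size() > other.antecedent().size()
                && other.confidence() >= rule.confidence());
    }

    static List<Rule> filter(List<Rule> minedRules, double minSupport, double minConfidence) {
        List<Rule> frequentAndPrecise = minedRules.stream()
                .filter(r -> r.support() >= minSupport && r.confidence() >= minConfidence)
                .collect(Collectors.toList());
        return frequentAndPrecise.stream()
                .filter(r -> !isRedundant(r, frequentAndPrecise))
                .sorted(Comparator.comparingDouble(Rule::confidence).reversed()
                        .thenComparing(Comparator.comparingDouble(Rule::support).reversed()))
                .collect(Collectors.toList());
    }
}
```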
The minimum support prevents overly infrequent (i.e., non-generalizable) rules from being created, and the minimum confidence prevents the creation of imprecise rules. Note that no rule (with NotFaulty as rule consequent) can reach a higher support than 50% after the SMOTE pre-processing. After computing the rules, we removed redundant ones using the corresponding function from the arules package. We then sorted the remaining rules descending by their confidence.

Using these rules, we created two classifiers to identify LFR methods. They differ in the number of comprised rules. The strict classifier uses the top n rules up to the point where the faulty methods among the LFR methods exceed 2.5% of all faulty methods in the training set. The more lenient classifier uses the top n rules until this share exceeds 5%. (Example: We applied the top rule to the training set, then added the next rule, and so on, until the matched methods in the training set contained 2.5% of all faults.) Figure 5 presents how an increase in the number of selected rules affects the proportion of LFR methods and the share of faulty methods that they contain.

Figure 5: Influence of the number of selected rules (Lang). The number of rules influences the proportion of low-fault-risk (LFR) methods and the share of faulty methods in LFR out of all faulty methods.

For RQ 1 and RQ 2, the classifiers were computed for each fold of each project. For RQ 3, the classifiers were computed once for each project.

To answer RQ 1, we used 10-fold cross-validation to evaluate the classifiers separately for each project. We computed the number and proportion of methods that were classified as "LFR" but contained a fault (≈ false positives). Furthermore, we computed precision and recall. Our main goal is to identify those methods that we can say, with high certainty, contain hardly any faults. Therefore, we consider it more important to achieve a high precision than to predict all methods that do not contain any faults in the dataset. As the dataset is imbalanced with faulty methods in the minority, the proportion of faults in LFR methods might not be sufficient to assess the classifiers (SMOTE was applied only to the training set). Therefore, we further computed the fault-density reduction, which describes how much less likely the LFR methods are to contain a fault. For example, if 40% of all methods are classified as "LFR" and contain 10% of all faults, the factor is 4. It can also be read as: 40% of all methods contain only one fourth of the expected faults. We mathematically define the fault-density reduction factor based on methods as

  fault-density reduction (methods) = (proportion of LFR methods out of all methods) / (proportion of faulty LFR methods out of all faulty methods)

and based on SLOC as

  fault-density reduction (SLOC) = (proportion of SLOC in LFR methods out of all SLOC) / (proportion of faulty LFR methods out of all faulty methods).

For both classifiers (strict variant with 2.5%, lenient variant with 5%), we present the metrics for each project and the resulting median.
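As a worked instance of this definition, using the example numbers from the text above (40% of the methods classified as LFR, containing 10% of all faults):

```latex
\[
\text{fault-density reduction}_{\text{methods}}
  = \frac{\text{LFR methods} / \text{all methods}}
         {\text{faulty LFR methods} / \text{all faulty methods}}
  = \frac{0.40}{0.10} = 4
\]
% Reading: the 40% of methods classified as LFR contain only a quarter of the faults
% that a randomly chosen 40% of the methods would be expected to contain.
```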
To answer RQ 2, we assessed how common methods classified as "LFR" are. For each project, we computed the absolute number of LFR methods, their proportion out of all methods, and their extent by considering their SLOC. LFR SLOC corresponds to the sum of the SLOC of all LFR methods. The proportion of LFR SLOC is computed out of all SLOC of the project.

To answer RQ 3, we computed the association rules for each project with the methods of the other five projects as training data. As in RQ 1 and RQ 2, we determined the number of used top n rules with the same thresholds (2.5% and 5%). To allow a comparison with the within-project classifiers, we computed the same metrics as in RQ 1 and RQ 2.

To answer RQ 4, we computed for each method the 9 code and 15 change metrics that were used in Giger et al. (2012). The metrics and their descriptions are listed in Table 4. We applied Random Forest as machine-learning algorithm and configured it as in the paper of Giger et al. (2012). We computed the results for within-project predictions using 10-fold cross-validation, and we further computed the results for cross-project predictions as in RQ 3. We present the same evaluation metrics as in the previous research questions.

Table 4: RQ 4: Code and change metrics used in Giger et al. (2012).
Metric name | Description
Code metrics:
fanIN | Number of methods that reference a given method
fanOUT | Number of methods referenced by a given method
localVar | Number of local variables in the body of a method
parameters | Number of parameters in the declaration
commentToCodeRatio | Ratio of comments to source code (line-based)
countPath | Number of possible paths in the body of a method
complexity | McCabe cyclomatic complexity of a method
execStmt | Number of executable source code statements
maxNesting | Maximum nested depth of all control structures
Change metrics:
methodHistories | Number of times a method was changed
authors | Number of distinct authors that changed a method
stmtAdded | Sum of all source code statements added to a method body
maxStmtAdded | Maximum number of source code statements added to a method body
avgStmtAdded | Average number of source code statements added to a method body
stmtDeleted | Sum of all source code statements deleted from a method body
maxStmtDeleted | Maximum number of source code statements deleted from a method body
avgStmtDeleted | Average number of source code statements deleted from a method body
churn | Sum of churn (stmtAdded - stmtDeleted)
maxChurn | Maximum churn
avgChurn | Average churn
decl | Number of method declaration changes
cond | Number of condition expression changes in a method body
elseAdded | Number of added else-parts in a method body
elseDeleted | Number of deleted else-parts from a method body

Results
This section presents the results to the research questions. The data to reproduce the results is available at Niedermayr, Röhm & Wagner (2019).
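To make the measures reported in the following tables concrete, here is a small sketch with hypothetical counts (the 1,000/50/300/2 figures are invented for illustration). Precision and recall refer to the LFR (not-faulty) class, which is consistent with the values reported in the tables; the fault-density reduction follows the method-based definition in "Procedure".

```java
/** Illustration of the evaluation measures used in the result tables (hypothetical counts). */
public class EvaluationMeasures {

    public static void main(String[] args) {
        // Hypothetical project: 1,000 methods, 50 of them faulty.
        int allMethods = 1000;
        int allFaultyMethods = 50;
        // Hypothetical classification result: 300 methods flagged as LFR, 2 of them faulty.
        int lfrMethods = 300;
        int faultyLfrMethods = 2;

        // Precision: share of LFR-classified methods that are indeed not faulty.
        double precision = (double) (lfrMethods - faultyLfrMethods) / lfrMethods;
        // Recall: share of all non-faulty methods that are classified as LFR.
        double recall = (double) (lfrMethods - faultyLfrMethods) / (allMethods - allFaultyMethods);
        // Fault-density reduction (method-based).
        double faultDensityReduction = ((double) lfrMethods / allMethods)
                / ((double) faultyLfrMethods / allFaultyMethods);

        System.out.printf("precision = %.3f, recall = %.3f, fault-density reduction = %.1f%n",
                precision, recall, faultDensityReduction);
        // -> precision = 0.993, recall = 0.314, fault-density reduction = 7.5
    }
}
```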
RQ 1: What is the precision of the classifier for low-fault-risk methods?
Table 5 presents the results. The methods classified to have LFR by the stricter classifier, which allows a maximum fault share of 2.5% in the LFR methods in the (balanced) training data, contain between two and eight faulty methods per project. The more lenient classifier, which allows a maximum fault share of 5%, classified between four and 15 faulty methods as LFR. The median proportion of faulty methods in LFR methods is 0.3% resp. 0.4%. The fault-density reduction factor for the stricter classifier ranges between 4.3 and 10.9 (median: 5.7) when considering methods and between 1.5 and 4.4 (median: 3.2) when considering SLOC. In the project Lang, 28.6% of all methods with 13.8% of the SLOC are classified as LFR and contain 4.1% of all faults; thus, the factor is 7.0 (SLOC-based: 3.4). The factor never falls below 1 for either classifier.

Table 5: RQ 1, RQ 2: Evaluation of within-project IDP to identify low-fault-risk (LFR) methods. ("Share of all faults" is the percentage of all faults contained in LFR methods; FDR = fault-density reduction.)
Project | Faults in LFR # | Faults in LFR % | Prec. | Rec. | LFR methods # | LFR methods % | LFR SLOC # | LFR SLOC % | Share of all faults | FDR (methods) | FDR (SLOC)
Within-project IDP, 10-fold (min. support = 10%, min. confidence = 90%, rules until fault share in training set = 2.5%):
Chart | 4 | 0.1% | 99.9% | 44.1% | 2,995 | 43.9% | 11,228 | 15.8% | 10.3% | 4.3 | 1.5
Closure | 6 | 0.2% | 99.8% | 29.2% | 3,759 | 28.9% | 15,497 | 10.5% | 4.1% | 7.1 | 2.6
Lang | 3 | 0.5% | 99.5% | 29.6% | 576 | 28.6% | 2,242 | 13.8% | 4.1% | 7.0 | 3.4
Math | 2 | 1.1% | 98.9% | 18.4% | 190 | 16.5% | 570 | 4.8% | 1.5% | 10.9 | 3.1
Mockito | 5 | 0.6% | 99.4% | 35.1% | 875 | 34.4% | 6,128 | 25.1% | 7.8% | 4.4 | 3.2
Time | 8 | 0.1% | 99.9% | 80.4% | 8,063 | 80.2% | 62,063 | 78.1% | 17.8% | 4.5 | 4.4
Median | – | 0.3% | 99.7% | 32.3% | – | 31.7% | – | 14.8% | 6.0% | 5.7 | 3.2
Within-project IDP, 10-fold (min. support = 10%, min. confidence = 90%, rules until fault share in training set = 5%):
Chart | 4 | 0.1% | 99.9% | 44.8% | 3,040 | 44.6% | 11,563 | 16.3% | 10.3% | 4.3 | 1.6
Closure | 15 | 0.3% | 99.7% | 41.8% | 5,385 | 41.5% | 25,981 | 17.6% | 10.1% | 4.1 | 1.7
Lang | 6 | 0.7% | 99.3% | 45.0% | 879 | 43.7% | 3,630 | 22.3% | 8.2% | 5.3 | 2.7
Math | 7 | 2.7% | 97.3% | 24.3% | 255 | 22.1% | 878 | 7.3% | 5.3% | 4.2 | 1.4
Mockito | 6 | 0.5% | 99.5% | 47.8% | 1,189 | 46.8% | 8,260 | 33.8% | 9.4% | 5.0 | 3.6
Time | 9 | 0.1% | 99.9% | 82.8% | 8,298 | 82.5% | 63,333 | 79.7% | 20.0% | 4.1 | 4.0
Median | – | 0.4% | 99.6% | 44.9% | – | 44.1% | – | 20.0% | 9.8% | 4.3 | 2.2

IDP can identify methods with LFR. On average, only 0.3% of the methods classified as "LFR" by the strict classifier are faulty. The identified LFR methods are, on average, 5.7 times less likely to contain a fault than an arbitrary method in the dataset.

Table 6 exemplarily presents the top three rules for Lang. Methods that work with fewer than two variables and do not invoke any methods, as well as short methods without arithmetic operations, cast expressions, and method invocations, are highly unlikely to contain a fault.

Table 6: Top three association rules for Lang (within-project, fold 1).
# | Rule | Support (%) | Confidence (%)
1 | {UniqueVariableIdentifiersLessThan2, NoMethodInvocations} ⇒ {NotFaulty} | 10.98 | 100.00
2 | {SlocLessThan4, NoMethodInvocations, NoArithmeticOperations} ⇒ {NotFaulty} | 10.98 | 100.00
3 | {SlocLessThan4, NoMethodInvocations, NoCastExpressions} ⇒ {NotFaulty} | 10.60 | 100.00

RQ 2: How large is the fraction of the code base consisting of methods classified as "low fault risk"?
Table 5 presents the results. The stricter classifier classified between 16.5% and 80.2% of the methods as LFR (median: 31.7%, mean: 38.8%); the more lenient classifier matched between 22.1% and 82.5% of the methods (median: 44.1%, mean: 46.9%). The median share of SLOC in LFR methods is 14.8% (mean: 24.7%) resp. 20.0% (mean: 29.5%).

Using within-project IDP, on average, 32–44% of the methods, comprising about 15–20% of the SLOC, can be assigned a lower importance during testing. In the best case, when ignoring 16.5% of the methods (4.8% of the SLOC), it is still possible to catch 98.5% of the faults (Math).
RQ 3: Is a trained classifier for methods with low fault risk generalizable to other projects?
Table 7 presents the results for the cross-project prediction with training data from the respective other projects.

Table 7: RQ 3: Evaluation of cross-project IDP. ("Share of all faults" is the percentage of all faults contained in LFR methods; FDR = fault-density reduction.)
Project | Faults in LFR # | Faults in LFR % | Prec. | Rec. | LFR methods # | LFR methods % | LFR SLOC # | LFR SLOC % | Share of all faults | FDR (methods) | FDR (SLOC)
Cross-project IDP (min. support = 10%, min. confidence = 90%, rules until fault share in training set = 2.5%):
Chart | 3 | 0.1% | 99.9% | 32.1% | 2,182 | 32.0% | 7,434 | 10.5% | 7.7% | 4.2 | 1.4
Closure | 2 | 0.1% | 99.9% | 25.0% | 3,207 | 24.7% | 11,584 | 7.9% | 1.4% | 18.3 | 5.8
Lang | 1 | 0.2% | 99.8% | 23.1% | 449 | 22.3% | 1,357 | 8.3% | 1.4% | 16.3 | 6.1
Math | 8 | 2.9% | 97.1% | 26.6% | 280 | 24.3% | 1,129 | 9.4% | 6.1% | 4.0 | 1.6
Mockito | 1 | 0.2% | 99.8% | 21.7% | 539 | 21.2% | 1,698 | 6.9% | 1.6% | 13.6 | 4.4
Time | 1 | 0.1% | 99.9% | 18.4% | 1,845 | 18.3% | 5,807 | 7.3% | 2.2% | 8.3 | 3.3
Median | – | 0.2% | 99.8% | 24.0% | – | 23.3% | – | 8.1% | 1.9% | 10.9 | 3.9
Cross-project IDP (min. support = 10%, min. confidence = 90%, rules until fault share in training set = 5%):
Chart | 4 | 0.2% | 99.8% | 35.5% | 2,411 | 35.4% | 9,363 | 13.2% | 10.3% | 3.4 | 1.3
Closure | 4 | 0.1% | 99.9% | 25.9% | 3,327 | 25.6% | 15,583 | 10.6% | 2.7% | 9.5 | 3.9
Lang | 4 | 0.7% | 99.3% | 27.7% | 542 | 26.9% | 1,959 | 12.0% | 5.5% | 4.9 | 2.2
Math | 18 | 5.1% | 94.9% | 32.9% | 354 | 30.7% | 1,634 | 13.7% | 13.6% | 2.2 | 1.0
Mockito | 1 | 0.2% | 99.8% | 25.0% | 620 | 24.4% | 3,495 | 14.3% | 1.6% | 15.6 | 9.1
Time | 1 | 0.0% | 100.0% | 20.0% | 2,007 | 20.0% | 7,552 | 9.5% | 2.2% | 9.0 | 4.3
Median | – | 0.2% | 99.8% | 26.8% | – | 26.3% | – | 12.6% | 4.1% | 6.9 | 3.1

Compared to the results of the within-project prediction, except for Math, the number of faults in LFR methods decreased or stayed the same in all projects for both classifier variants. While the median proportion of faults in LFR methods slightly decreased, the proportion of LFR methods also decreased in all projects except Math. The median proportion of LFR methods is 23.3% (SLOC: 8.1%) for the stricter classifier and 26.3% (SLOC: 12.6%) for the more lenient classifier. The fault-density reduction improved compared to the within-project prediction at both the method and the SLOC level for both classifier variants: for the stricter classifier, the median of the method-based factor is 10.9 (+5.2); the median of the SLOC-based factor is 3.9 (+0.7). Figure 6 illustrates the fault-density reduction for both within-project (RQ 1, RQ 2) and cross-project (RQ 3) prediction.

Figure 6: Comparison of the IDP within-project (2.5%, 5.0%) with the IDP cross-project (2.5%, 5.0%) classifiers (method-based). The fault-density reduction expresses how much less likely a LFR method is to contain a fault (definition in "Procedure"). Higher values are better (example: if 40% of the methods are LFR and contain 5% of all faults, the factor is 8). The dashed line is at one; no value falls below it.

Using cross-project IDP, on average, 23–26% of the methods, comprising about 8–13% of the SLOC, can be classified as "LFR". The methods classified by the stricter classifier contain, on average, less than one eleventh of the expected faults.

RQ 4: How does the classifier perform compared to a traditional defect prediction approach?
Table 8 presents the results of the within- and cross-project prediction according to the approach by Giger et al. (2012). In the within-project prediction scenario, the classifier predicts on average 99.4% of the methods to be non-faulty.
As a consequence, the average recall regarding non-faulty methods reaches 99.9%. However, the number of methods that are classified as non-faulty but actually contain a fault increases by orders of magnitude compared to the IDP approach (i.e., precision deteriorates). For example, 77% of Closure's faulty methods are wrongly classified as non-faulty. The median fault-density reduction is 1.6 at the method level (strict IDP: 5.7) and 1.4 when considering SLOC (strict IDP: 3.2). Consequently, methods classified by the traditional approach to have a LFR are still less likely to contain a fault than other methods, but the difference is not as high as with the IDP classifier.

Table 8: RQ 4: Results of a traditional defect prediction approach. ("Share of all faults" is the percentage of all faults contained in LFR methods; FDR = fault-density reduction.)
Project | Faults in LFR # | Faults in LFR % | Prec. | Rec. | LFR methods # | LFR methods % | LFR SLOC # | LFR SLOC % | Share of all faults | FDR (methods) | FDR (SLOC)
Within-project defect prediction (traditional approach used in Giger et al. (2012)):
Chart | 36 | 0.5% | 99.5% | 99.9% | 6,785 | 99.9% | 70,457 | 99.6% | 92.3% | 1.1 | 1.1
Closure | 114 | 0.9% | 99.1% | 99.9% | 12,862 | 99.6% | 142,518 | 97.6% | 77.0% | 1.3 | 1.3
Lang | 36 | 1.9% | 98.1% | 99.5% | 1,905 | 97.6% | 14,462 | 91.2% | 49.3% | 2.0 | 1.9
Math | 62 | 6.4% | 93.6% | 98.4% | 967 | 91.9% | 7,234 | 66.5% | 47.0% | 2.0 | 1.4
Mockito | 46 | 1.9% | 98.1% | 99.8% | 2,481 | 99.1% | 24,232 | 99.5% | 71.9% | 1.4 | 1.4
Time | 26 | 0.3% | 99.7% | 100.0% | 9,995 | 99.8% | 78,809 | 99.8% | 57.8% | 1.7 | 1.7
Median | – | 1.4% | 98.6% | 99.9% | – | 99.4% | – | 98.6% | 64.8% | 1.6 | 1.4
Cross-project defect prediction (traditional approach used in Giger et al. (2012)):
Chart | 32 | 0.5% | 99.5% | 99.2% | 6,755 | 99.1% | 68,214 | 96.3% | 82.1% | 1.2 | 1.2
Closure | 82 | 0.6% | 99.4% | 99.5% | 12,859 | 99.0% | 141,322 | 96.0% | 55.4% | 1.8 | 1.7
Lang | 52 | 2.6% | 97.4% | 99.9% | 1,990 | 98.9% | 15,085 | 92.8% | 71.2% | 1.4 | 1.3
Math | 103 | 9.2% | 90.8% | 99.8% | 1,123 | 97.3% | 9,512 | 79.5% | 78.0% | 1.2 | 1.0
Mockito | 58 | 2.3% | 97.7% | 100.0% | 2,534 | 99.8% | 24,420 | 99.8% | 90.6% | 1.1 | 1.1
Time | 39 | 0.4% | 99.6% | 100.0% | 10,050 | 99.9% | 79,191 | 99.7% | 86.7% | 1.2 | 1.2
Median | – | 1.5% | 98.5% | 99.9% | – | 99.0% | – | 96.1% | 80.0% | 1.2 | 1.2

The results in the cross-project prediction scenario are similar. In four of the six projects, the number of faults in LFR methods increased compared with the within-project prediction scenario. The fault-density reduction deteriorated to 1.2 both at the method and the SLOC level (strict IDP: 10.9 resp. 3.9). For all projects, IDP outperformed the traditional approach.

DISCUSSION
The results of our empirical study show that only very few LFR methods actually contain a fault, and thus, they indicate that IDP can successfully identify methods that are not fault-prone. On average, 31.7% of the methods (14.8% of the SLOC) matched by the strict classifier contain only 6.0% of all faults, resulting in a considerable fault-density reduction for the matched methods. In any case, LFR methods are less fault-prone than other methods (the fault-density reduction is higher than one in all projects); based on methods, LFR methods are at least two times less likely to contain a fault. For the stricter classifier, the extent of the matched methods, which could be deferred in testing, is between 5% and 78% of the SLOC of the respective project. The more lenient classifier matches more methods and SLOC at the cost of a higher fault proportion, but still achieves satisfactory fault-density reduction values. This shows that the balance between fault risk and matched extent can be influenced by the number of considered rules to reflect the priorities of a software project.

Interestingly, the cross-project IDP classifier, which is trained on data from the respective other projects, exhibits a higher precision than the within-project IDP classifier. Except for the Math project, the LFR methods contain fewer faulty methods in the cross-project prediction scenario.
This is in line with the method-based fault-density reduction factor of the strict classifier, which is better in the cross-project scenario in four of six cases (SLOC-based: three of six cases). However, the proportion of matched methods decreased compared to the within-project prediction in most projects. Accordingly, the cross-project results suggest that a larger, more diversified training set identifies LFR methods more conservatively, resulting in a higher precision and a lower matching extent.

Math is the only project in which IDP within-project prediction outperformed IDP cross-project prediction. This project contains many methods with mathematical computations expressed by arithmetic operations, which are often wrapped in loops or conditions; most of the faults are located in these methods. Therefore, the within-project classifiers used few, very precise rules for the identification of LFR methods.

To sum up, our results show that the IDP approach can be used to identify methods that are, due to the "triviality" of their code, less likely to contain any faults. Hence, these methods require less focus during quality-assurance activities. Depending on the criticality of the system and the risk one is willing to take, the development of tests for these methods can be deferred or even omitted in case of insufficient available test resources. The results suggest that IDP is also applicable in cross-project prediction scenarios, indicating that characteristics of LFR methods differ less between projects than characteristics of faulty methods do. Therefore, IDP can be used in (new) projects with no (precise) historical fault data to prioritize the code to be tested.

Limitations
A limitation of IDP is that even LFR methods can contain faults. An inspection of faulty methods incorrectly classified to have a LFR showed that some faults were fixed by only adding further statements (e.g., to handle special cases). This means that a method can be faulty even if the existing code as such is not faulty (due to missing code).
Further conceivable examples of faulty LFR methods are simple getters that return the wrong variable, or methods that are unintentionally empty. Therefore, while these methods are much less fault-prone, it cannot be assumed that they never contain any fault. Consequently, excluding LFR methods from testing and other QA activities carries a risk that needs to be kept in mind.

Relation to defect prediction

As discussed in detail in "Introduction," IDP presents another view on defect prediction. The focus of IDP on LFR methods requires an optimization toward precision, so that hardly any faulty methods are erroneously classified as trivial. The comparison with a traditional defect prediction approach showed that IDP classifies far fewer methods as trivial. However, the methods that IDP classifies as trivial include far fewer faulty methods; that is, IDP achieves a higher precision. Consequently, the identified trivial methods can be deferred or even excluded from quality-assurance activities.

Threats to validity

Next, we discuss the threats to internal and external validity.

Threats to internal validity

The learning and evaluation were performed on information extracted from Defects4J (Just, Jalali & Ernst, 2014). Therefore, the quality of our data depends on the quality of Defects4J. Common problems for defect datasets created by analyzing changes in commits that reference a bug ticket in an issue tracking system are as follows. First, commits that fix a fault but do not reference a ticket in the commit message cannot be detected (Bachmann et al., 2010). Consequently, the set of commits that reference a bug fix may not be a fair representation of all faults (Bird et al., 2009; D'Ambros, Lanza & Robbes, 2012; Giger et al., 2012). Second, bug tickets in the issue tracker may not always represent faults and vice versa; Herzig, Just & Zeller (2013) pointed out that a significant share of the tickets in the issue trackers of open-source projects is misclassified. Therefore, it is possible that not all bug-fix commits were spotted. Third, methods may contain faults that have not been detected or fixed yet; in general, it is not possible to prove that a method does not contain any faults. Fourth, a commit may contain changes (such as refactorings) that are not related to the bug fix, but this problem does not affect the Defects4J dataset due to the authors' manual inspection. These threats are present in nearly all defect prediction studies, especially in those operating at the method level. Defect prediction models were found to be resistant to this kind of noise to a certain extent (Kim et al., 2011). Defects4J contains only faults that are reproducible and can be precisely mapped to methods; therefore, faulty methods may be under-approximated. In contrast, other datasets created without manual post-processing tend to over-approximate faults. To mitigate this threat, we replicated our IDP evaluation with two study objects used in Giger et al. (2012); the observed results were similar to those of our study.

Threats to external validity

The empirical study was performed with six mature open-source projects written in Java. The projects are libraries, and the results may not be applicable to other application types, for example, large industrial systems with user interfaces.
The results may also not be transferable to projects in other languages, for the following reasons: First, Java is a strongly typed language that provides type safety. It is unclear whether the IDP approach works for languages without type safety, because even simple methods in such languages could exhibit a considerable number of faults. Second, even if the approach as such is applicable to other languages, the collected metrics and the LFR classifier need to be validated and adjusted. Other languages may use language constructs in a different way or use constructs that do not exist in Java. For example, a classifier for the C language should take constructs such as GOTOs and the use of pointer arithmetic into consideration. Furthermore, the projects in the dataset (published in 2014) did not contain code with lambda expressions, which were introduced in Java 8 (http://www.oracle.com/technetwork/articles/java/architect-lambdas-part1-2080972.html). Therefore, in newer projects that make use of lambda expressions, the presence of lambdas should be taken into consideration when classifying methods. Consequently, further studies are necessary to determine whether the results are generalizable.

As in most defect prediction studies, we treated all faults as equal and did not consider their severity. According to Ostrand, Weyuker & Bell (2004), the severity of bug tickets is often highly subjective. In reality, not all faults have the same importance, because some cause higher failure follow-up costs than others.

CONCLUSION

Developer teams often face the problem of scarce test resources and therefore need to prioritize their testing efforts (e.g., when writing new automated unit tests). Defect prediction can support developers in this activity. In this paper, we propose an inverse view on defect prediction (IDP) to identify methods that are so "trivial" that they contain hardly any faults. We study how reliably such LFR methods can be identified, how common they are, and whether the proposed approach is applicable to cross-project predictions.

We show that IDP using association rule mining on code metrics can successfully identify LFR methods. The identified methods contain considerably fewer faults than the average code and can provide a savings potential for QA activities. Depending on the parameters, a lower priority for QA can be assigned on average to 31.7% or 44.1% of the methods, amounting to 14.8% or 20.0% of the SLOC, respectively. While cross-project defect prediction is a challenging task (He et al., 2012; Zimmermann et al., 2009), our results suggest that the IDP approach can be applied in a cross-project prediction scenario at the method level. In other words, an IDP classifier trained on one or more (Java open-source) projects can successfully identify LFR methods in other Java projects for which no fault data, or no precise fault data, exists.

For future work, we want to replicate this study with closed-source projects, projects of other application types, and projects in other programming languages. It is also of interest to investigate which metrics and classifiers are most effective for the IDP purpose and whether they differ from the ones used in traditional defect prediction.

Moreover, we plan to study whether code coverage of LFR methods differs from code coverage of other methods. If guidelines to meet a certain code coverage level are set by the management, unmotivated testers may add tests for LFR methods first because it might be easier to write tests for those methods. Consequently, more complex methods with a higher fault risk may remain untested once the target coverage is achieved. Therefore, we want to investigate whether this is a problem in industry and whether it can be addressed with an adjusted code-coverage computation that takes LFR methods into account; one conceivable adjustment is sketched below.
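A minimal sketch of one such adjustment, assuming the simplest possible variant in which LFR methods are down-weighted (or, with a weight of zero, excluded) from the coverage computation. The class, record, and method names are illustrative only and do not refer to an existing tool.

```java
import java.util.List;

/** Illustrative sketch of a coverage computation that down-weights LFR methods. */
public class AdjustedCoverage {

    /** Per-method coverage data; lowFaultRisk marks methods matched by an LFR classifier. */
    record MethodCoverage(int coveredLines, int totalLines, boolean lowFaultRisk) {}

    /** lfrWeight = 1.0 yields plain line coverage; 0.0 ignores LFR methods entirely. */
    static double compute(List<MethodCoverage> methods, double lfrWeight) {
        double covered = 0, total = 0;
        for (MethodCoverage m : methods) {
            double weight = m.lowFaultRisk() ? lfrWeight : 1.0;
            covered += weight * m.coveredLines();
            total += weight * m.totalLines();
        }
        return total == 0 ? 1.0 : covered / total;
    }

    public static void main(String[] args) {
        List<MethodCoverage> methods = List.of(
                new MethodCoverage(10, 10, true),    // trivial getter-like method, fully covered
                new MethodCoverage(0, 40, false));   // complex method, still untested
        System.out.printf("plain: %.2f, adjusted: %.2f%n",
                compute(methods, 1.0), compute(methods, 0.0));
    }
}
```

Under such a computation, covering only the trivial method does not raise the reported figure (plain coverage 0.20 versus adjusted 0.00 in the example), so a management-set coverage target no longer rewards testing LFR methods first.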
ACKNOWLEDGEMENTS

We thank Nils Göde, Florian Deißenböck, and the anonymous reviewers for their valuable feedback.

ADDITIONAL INFORMATION AND DECLARATIONS

Funding
This work was funded by the Institute of Software Technology of the University of Stuttgart and the German Federal Ministry of Education and Research (BMBF), grant "SOFIE, 01IS18012A." The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Grant Disclosures
The following grant information was disclosed by the authors:
Institute of Software Technology of the University of Stuttgart.
German Federal Ministry of Education and Research (BMBF): SOFIE, 01IS18012A.

Competing Interests
Rainer Niedermayr and Tobias Röhm are employees of CQSE GmbH. The responsibility for this article lies with the authors.

Author Contributions
- Rainer Niedermayr conceived and designed the experiments, performed the experiments, analyzed the data, contributed reagents/materials/analysis tools, prepared figures and/or tables, performed the computation work, authored or reviewed drafts of the paper, approved the final draft.
- Tobias Röhm conceived and designed the experiments, analyzed the data, authored or reviewed drafts of the paper, approved the final draft.
- Stefan Wagner conceived and designed the experiments, analyzed the data, authored or reviewed drafts of the paper, approved the final draft.

Data Availability
The following information was supplied regarding data availability:
Niedermayr, Rainer (2019): Dataset: Too Trivial To Test? figshare. Fileset. DOI 10.6084/m9.figshare.6194171.v1.

REFERENCES

Agrawal R, Imieliński T, Swami A. 1993. Mining association rules between sets of items in large databases. ACM SIGMOD Record 22(2):207–216 DOI 10.1145/170036.170072.
Agrawal R, Srikant R. 1994. Fast algorithms for mining association rules. In: Proceedings of 20th International Conference on Very Large Data Bases (VLDB’94). San Francisco: Morgan Kaufmann, Vol. 1215, 487–499.
Bacchelli A, D’Ambros M, Lanza M. 2010. Are popular classes more defect prone? In: Proceedings of 13th International Conference on Fundamental Approaches to Software Engineering (FASE’10). Berlin, Heidelberg: Springer, Vol. 6013, 59–73 DOI 10.1007/978-3-642-12029-9_5.
Bachmann A, Bird C, Rahman F, Devanbu P, Bernstein A. 2010. The missing links: bugs and bug-fix commits. In: Proceedings of 18th International Symposium on Foundations of Software Engineering (FSE’10). New York: ACM, 97–106.
Bayardo RJ, Agrawal R, Gunopulos D. 1999. Constraint-based rule mining in large, dense databases. In: Proceedings of the 15th International Conference on Data Engineering (ICDE’99). Piscataway: IEEE, 188–197 DOI 10.1109/ICDE.1999.754924.
Bertolino A. 2007. Software testing research: achievements, challenges, dreams. In: Proceedings of Future of Software Engineering (FOSE’07). Los Alamitos: IEEE Computer Society, 85–103 DOI 10.1109/FOSE.2007.25.
Bird C, Bachmann A, Aune E, Duffy J, Bernstein A, Filkov V, Devanbu P. 2009. Fair and balanced? Bias in bug-fix datasets. In: Proceedings of 7th Joint Meeting of the European Software Engineering Conference and the Symposium on the Foundations of Software Engineering (ESEC/FSE’09). New York: ACM, 121–130.
Bird C, Nagappan N, Murphy B, Gall H, Devanbu P. 2011. Don’t touch my code! Examining the effects of ownership on software quality. In: Proceedings of 8th Joint Meeting of the European Software Engineering Conference and the Symposium on the Foundations of Software Engineering (ESEC/FSE’11). New York: ACM, 4–14.
Bowes D, Hall T, Harman M, Jia Y, Sarro F, Wu F. 2016. Mutation-aware fault prediction. In: Proceedings of 25th International Symposium on Software Testing and Analysis (ISSTA’16). New York: ACM, 330–341.
Catal C. 2011. Software fault prediction: a literature review and current trends. Expert Systems with Applications 38(4):4626–4636 DOI 10.1016/j.eswa.2010.10.024.
Catal C, Diri B. 2009. A systematic review of software fault prediction studies. Expert Systems with Applications 36(4):7346–7354 DOI 10.1016/j.eswa.2008.10.027.
Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. 2002. SMOTE: synthetic minority over-sampling technique. Journal of Artificial Intelligence Research 16:321–357 DOI 10.1613/jair.953.
Czibula G, Marian Z, Czibula IG. 2014. Software defect prediction using relational association rule mining. Information Sciences 264:260–278 DOI 10.1016/j.ins.2013.12.031.
D’Ambros M, Lanza M, Robbes R. 2012. Evaluating defect prediction approaches: a benchmark and an extensive comparison. Empirical Software Engineering 17(4–5):531–577 DOI 10.1007/s10664-011-9173-9.
Jed Wing MKC, Weston S, Williams A, Keefer C, Engelhardt A, Cooper T, Mayer Z, Kenkel B, Benesty M, Lescarbeau R, Ziem A, Scrucca L, Tang Y, Candan C, Hunt T, the R Core Team. 2017. caret: Classification and Regression Training. R package version 6.0-76. Available at https://topepo.github.io/caret/.
Giger E, D’Ambros M, Pinzger M, Gall HC. 2012. Method-level bug prediction. In: Proceedings of 6th International Symposium on Empirical Software Engineering and Measurement (ESEM’12). New York: ACM, 171–180.
Gosling J, Joy B, Steele G, Bracha G, Buckley A. 2013. The Java language specification, Java SE 7 edition, February 2012. Available at http://docs.oracle.com/javase/specs/jls/se7/html/index.html (accessed 31 March 2019).
Hahsler M, Buchta C, Gruen B, Hornik K. 2017. arules: mining association rules and frequent itemsets. R package version 1.5-2. Available at http://mhahsler.github.io/arules/.
Hahsler M, Gruen B, Hornik K. 2005. arules – A computational environment for mining association rules and frequent item sets. Journal of Statistical Software 14(15):1–25 DOI 10.18637/jss.v014.i15.
Hall T, Beecham S, Bowes D, Gray D, Counsell S. 2012. A systematic literature review on fault prediction performance in software engineering. IEEE Transactions on Software Engineering 38(6):1276–1304 DOI 10.1109/tse.2011.103.
Haas R, Niedermayr R, Jurgens E. 2019. Teamscale: tackle technical debt and control the quality of your software. In: Proceedings of the 2nd International Conference on Technical Debt (TechDebt’19 Tools Track). Piscataway: IEEE.
Hassan AE. 2009. Predicting faults using the complexity of code changes. In: Proceedings of 31st International Conference on Software Engineering (ICSE’09). Los Alamitos: IEEE Computer Society, 78–88 DOI 10.1109/ICSE.2009.5070510.
Hata H, Mizuno O, Kikuno T. 2012. Bug prediction based on fine-grained module histories. In: Proceedings of 34th International Conference on Software Engineering (ICSE’12). Piscataway: IEEE, 200–210.
He Z, Shu F, Yang Y, Li M, Wang Q. 2012. An investigation on the feasibility of cross-project defect prediction. Automated Software Engineering 19(2):167–199 DOI 10.1007/s10515-011-0090-3.
Heinemann L, Hummel B, Steidl D. 2014. Teamscale: software quality control in real-time. In: Proceedings of 36th International Conference on Software Engineering (ICSE’14). New York: ACM.
Herzig K, Just S, Zeller A. 2013. It’s not a bug, it’s a feature: how misclassification impacts bug prediction. In: Proceedings of 35th International Conference on Software Engineering (ICSE’13). Piscataway: IEEE, 392–401.
Hummel B. 2014. McCabe’s cyclomatic complexity and why we don’t use it. Available at https://www.cqse.eu/en/blog/mccabe-cyclomatic-complexity/ (accessed 31 March 2019).
Just R, Jalali D, Ernst MD. 2014. Defects4J: a database of existing faults to enable controlled testing studies for Java programs. In: Proceedings of 23rd International Symposium on Software Testing and Analysis (ISSTA’14), Tool demo. New York: ACM, 437–440.
Karthik R, Manikandan N. 2010. Defect association and complexity prediction by mining association and clustering rules. In: Proceedings of 2nd International Conference on Computer Engineering and Technology (ICCET’10). Piscataway: IEEE, Vol. 7, 569.
Khoshgoftaar TM, Gao K, Seliya N. 2010. Attribute selection and imbalanced data: problems in software defect prediction. In: Proceedings of 22nd International Conference on Tools with Artificial Intelligence (ICTAI’10). Piscataway: IEEE, Vol. 1, 137–144.
Kim S, Whitehead EJ Jr, Zhang Y. 2008. Classifying software changes: clean or buggy? IEEE Transactions on Software Engineering 34(2):181–196 DOI 10.1109/tse.2007.70773.
Kim S, Zhang H, Wu R, Gong L. 2011. Dealing with noise in defect prediction. In: Proceedings of 33rd International Conference on Software Engineering (ICSE’11). Piscataway: IEEE, 481–490.
Kim S, Zimmermann T, Whitehead EJ Jr, Zeller A. 2007. Predicting faults from cached history. In: Proceedings of 29th International Conference on Software Engineering (ICSE’07). Los Alamitos: IEEE Computer Society, 489–498.
Lee T, Nam J, Han D, Kim S, In HP. 2011. Micro interaction metrics for defect prediction. In: Proceedings of 8th Joint Meeting of the European Software Engineering Conference and the Symposium on the Foundations of Software Engineering (ESEC/FSE’11). New York: ACM, 311–321.
Longadge R, Dongre S, Malik L. 2013. Class imbalance problem in data mining: review. International Journal of Computer Science and Network 2(1):83–87.
Ma B, Dejaeger K, Vanthienen J, Baesens B. 2010. Software defect prediction based on association rule classification. In: Proceedings of 1st International Conference on E-Business Intelligence (ICEBI’10). Paris: Atlantis Press.
McCabe TJ. 1976. A complexity measure. IEEE Transactions on Software Engineering 2(4):308–320.
McCallum A, Nigam K. 1998. A comparison of event models for naive Bayes text classification. In: Proceedings of Workshop on Learning for Text Categorization (AAAI-98-W7). Palo Alto, California: AAAI Press, Vol. 752, 41–48.
Mende T, Koschke R. 2009. Revisiting the evaluation of defect prediction models. In: Proceedings of 5th International Conference on Predictor Models in Software Engineering (PROMISE’09). New York: ACM, 7.
Meneely A, Williams L, Snipes W, Osborne J. 2008. Predicting failures with developer networks and social network analysis. In: Proceedings of 16th International Symposium on Foundations of Software Engineering (FSE’08). New York: ACM, 13–23.
Menzies T, Di Stefano JS. 2004. How good is your blind spot sampling policy? In: Proceedings of 8th International Symposium on High Assurance Systems Engineering. Piscataway: IEEE, 129–138.
Menzies T, Di Stefano JS, Chapman M, McGill K. 2002. Metrics that matter. In: Proceedings of 27th Annual NASA Goddard Software Engineering Workshop. Piscataway: IEEE/NASA, 51–57.
Menzies T, Di Stefano J, Orrego A, Chapman R. 2004. Assessing predictors of software defects. In: Proceedings of Workshop on Predictive Software Models (PSM’04).
Menzies T, Greenwald J, Frank A. 2007. Data mining static code attributes to learn defect predictors. IEEE Transactions on Software Engineering 33(1):2–13 DOI 10.1109/tse.2007.256941.
Menzies T, Milton Z, Turhan B, Cukic B, Jiang Y, Bener A. 2010. Defect prediction from static code features: current results, limitations, new approaches. Automated Software Engineering 17(4):375–407 DOI 10.1007/s10515-010-0069-5.
Menzies T, Stefano J, Ammar K, McGill K, Callis P, Davis J, Chapman R. 2003. When can we test less? In: Proceedings of 9th International Symposium on Software Metrics (SMS’03). Piscataway: IEEE, 98–110.
Morisaki S, Monden A, Matsumura T, Tamada H, Matsumoto K-I. 2007. Defect data analysis based on extended association rule mining. In: Proceedings of 4th International Workshop on Mining Software Repositories (MSR’07). Los Alamitos: IEEE Computer Society.
Nagappan N, Ball T. 2005. Use of relative code churn measures to predict system defect density. In: Proceedings of 27th International Conference on Software Engineering (ICSE’05). Piscataway: IEEE, 284–292.
Nagappan N, Ball T, Zeller A. 2006. Mining metrics to predict component failures. In: Proceedings of 28th International Conference on Software Engineering (ICSE’06). New York: ACM, 452–461.
NDepend. 2017. Code metrics definitions. Available at http://www.ndepend.com/docs/code-metrics#ILNestingDepth (accessed 8 August 2017).
Niedermayr R, Röhm T, Wagner S. 2019. Dataset: Too trivial to test? Available at https://figshare.com/articles/Dataset_Too_Trivial_To_Test_/6194171.
Ostrand TJ, Weyuker EJ, Bell RM. 2004. Where the bugs are. In: ACM SIGSOFT Software Engineering Notes, Vol. 29. New York: ACM, 86–96.
Ostrand TJ, Weyuker EJ, Bell RM. 2005. Predicting the location and number of faults in large software systems. IEEE Transactions on Software Engineering 31(4):340–355 DOI 10.1109/tse.2005.49.
Palomba F, Zanoni M, Fontana FA, De Lucia A, Oliveto R. 2016. Smells like teen spirit: improving bug prediction performance using the intensity of code smells. In: Proceedings of 32nd International Conference on Software Maintenance and Evolution (ICSME’16). Piscataway: IEEE, 244–255.
Pascarella L, Palomba F, Bacchelli A. 2018. Re-evaluating method-level bug prediction. In: Proceedings of 25th International Conference on Software Analysis, Evolution and Reengineering (SANER’18). Piscataway: IEEE, 592–601.
R Core Team. 2018. R: a language and environment for statistical computing. Version 3.5.1. Vienna: R Foundation for Statistical Computing. Available at https://www.R-project.org/.
Rahman F, Devanbu P. 2011. Ownership, experience and effects: a fine-grained study of authorship. In: Proceedings of 33rd International Conference on Software Engineering (ICSE’11). New York: ACM, 491–500.
Scanniello G, Gravino C, Marcus A, Menzies T. 2013. Class level fault prediction using software clustering. In: Proceedings of 28th International Conference on Automated Software Engineering (ASE’13). Piscataway: IEEE, 640–645.
Shepperd M. 1988. A critique of cyclomatic complexity as a software metric. Software Engineering Journal 3(2):30–36 DOI 10.1049/sej.1988.0003.
Shippey T, Hall T, Counsell S, Bowes D. 2016. So you need more method level datasets for your software defect prediction? Voilà! In: Proceedings of 10th International Symposium on Empirical Software Engineering and Measurement (ESEM’16). New York: ACM.
Shivaji S, Whitehead EJ, Akella R, Kim S. 2013. Reducing features to improve code change-based bug prediction. IEEE Transactions on Software Engineering 39(4):552–569 DOI 10.1109/tse.2012.43.
Simon GJ, Kumar V, Li PW. 2011. A simple statistical model and association rule filtering for classification. In: Proceedings of 17th International Conference on Knowledge Discovery and Data Mining (SIGKDD’11). New York: ACM, 823–831.
Song Q, Shepperd M, Cartwright M, Mair C. 2006. Software defect association mining and defect correction effort prediction. IEEE Transactions on Software Engineering 32(2):69–82 DOI 10.1109/tse.2006.1599417.
Torgo L. 2010. Data mining with R: learning with case studies. Boca Raton: CRC Press.
Turhan B, Menzies T, Bener AB, Di Stefano J. 2009. On the relative value of cross-company and within-company data for defect prediction. Empirical Software Engineering 14(5):540–578 DOI 10.1007/s10664-008-9103-7.
Weyuker EJ, Ostrand TJ. 2008. What can fault prediction do for you? In: Proceedings of 2nd International Conference on Tests and Proofs (TAP’08). New York: Springer, 1–10.
Witten IH, Frank E, Hall MA, Pal CJ. 2016. Data mining: practical machine learning tools and techniques. San Francisco: Morgan Kaufmann.
Xia X, Lo D, Pan SJ, Nagappan N, Wang X. 2016. Hydra: massively compositional model for cross-project defect prediction. IEEE Transactions on Software Engineering 42(10):977–998 DOI 10.1109/tse.2016.2543218.
Xu Z, Liu J, Luo X, Zhang T. 2018. Cross-version defect prediction via hybrid active learning with kernel principal component analysis. In: Proceedings of 25th International Conference on Software Analysis, Evolution and Reengineering (SANER’18). Piscataway: IEEE, 209–220.
Zafar H, Rana Z, Shamail S, Awais MM. 2012. Finding focused itemsets from software defect data. In: Proceedings of 15th International Multitopic Conference (INMIC’12). Piscataway: IEEE, 418–423.
Zhang H, Zhang X, Gu M. 2007. Predicting defective software components from code complexity measures. In: Proceedings of 13th Pacific Rim International Symposium on Dependable Computing (PRDC’07). Piscataway: IEEE, 93–96.
Zhang F, Zheng Q, Zou Y, Hassan AE. 2016. Cross-project defect prediction using a connectivity-based unsupervised classifier. In: Proceedings of 38th International Conference on Software Engineering (ICSE’16). New York: ACM, 309–320.
Zimmermann T, Nagappan N. 2008. Predicting defects using network analysis on dependency graphs. In: Proceedings of 30th International Conference on Software Engineering (ICSE’08). Piscataway: IEEE, 531–540.
Zimmermann T, Nagappan N, Gall H, Giger E, Murphy B. 2009. Cross-project defect prediction: a large scale experiment on data vs. domain vs. process. In: Proceedings of 7th Joint Meeting of the European Software Engineering Conference and the Symposium on the Foundations of Software Engineering (ESEC/FSE’09). New York: ACM, 91–100.
Zimmermann T, Premraj R, Zeller A. 2007. Predicting defects for Eclipse. In: Proceedings of 3rd International Workshop on Predictor Models in Software Engineering (PROMISE’07). Washington, D.C.: IEEE Computer Society, 9.