key: cord-0961840-eb18ths2
title: Stacking regularization in analogy-based software effort estimation
authors: Kaushik, Anupama; Kaur, Prabhjot; Choudhary, Nisha; Priyanka
date: 2022-01-03
journal: Soft Comput
DOI: 10.1007/s00500-021-06564-w
sha: f65652255d5925a606d8a07f467ce6de28c2a452
doc_id: 961840
cord_uid: eb18ths2

Analogy-based estimation (ABE) estimates the effort of the current project based on the information of similar past projects. The solution function of ABE provides the final effort prediction of a new project. Many past studies on ABE have provided various solution functions, but their effectiveness can still be enhanced. The present study attempts to improve the effort prediction accuracy of ABE by proposing a solution function SABE: Stacking regularization in analogy-based software effort estimation. The core of SABE is stacking, a machine learning technique. Stacking is beneficial as it works on multiple models, harnessing their capabilities, and provides better estimation accuracy than a single model. The proposed method is validated on four software effort estimation datasets and compared with the existing solution functions: closest analogy, mean, median and inverse distance weighted mean. The evaluation criteria used are mean magnitude of relative error (MMRE), median magnitude of relative error (MdMRE), prediction (PRED) and standardized accuracy (SA). The results suggest that SABE shows promising performance on almost all the evaluation criteria when compared with the results of earlier studies.

Cost estimation is a methodology for forecasting the expense of executing a project within a given framework. A cost estimate is a summary of all costs involved, from commencement to culmination (the duration of the project). It covers every item required for the project, from supplies to manpower, and produces a total amount that decides the cost of a project. Cost estimation can be utilized for determining the performance of a project: accurate cost estimation leads to a successful project, while inaccurate estimation results in project failure.

Analogy-based estimation (ABE) is a cost-evaluation method used for software projects. It has been widely used and researched as an alternative to multiple regression models (Shepperd and Schofield 1997). Portraying the ABE approach in brief: initially, the project to be evaluated is placed alongside projects similar in characteristics that are present in a historical archive. After that, one or more similar projects are retrieved from the archive by a predefined similarity function. Finally, to generate the estimate, a heuristic function is applied to predict the effort of the new project based on the information of the retrieved projects.

In this paper, the authors propose a method SABE (Stacking regularization in analogy-based software effort estimation). The authors utilized stacked generalization, a general concept covering any scheme for feeding knowledge from one generalizer to another before the final approximation is made (Wolpert 1992). It is a machine learning technique that couples the capabilities of various heterogeneous models and provides a better estimate than a single model. The two techniques used in designing SABE are polynomial regression and LASSO (Least Absolute Shrinkage and Selection Operator) regression.
Polynomial regression is a specialized version of linear regression in which a polynomial equation models the relationship between the target variable and the independent variables as a curvilinear relationship (Ostertagová 2012). LASSO is a kind of regression analysis that uses shrinkage: it performs variable selection and regularization so as to increase the prediction accuracy and interpretability of the statistical model it generates (Tibshirani 1996). More details on these models are given in the next section. Regression models have long been used in software estimation studies by various authors (Jørgensen 2004; Yücalar et al. 2016; Nassif et al. 2019; Anandhi and Chezian 2014) and have provided good cost estimates. Machine learning approaches are employed not only for predictions in software engineering but also in various other fields, such as text document clustering and COVID-19 prediction (Abualigah and Hanandeh 2015; Abualigah and Khader 2017; Abualigah et al. 2019, 2021).

The remainder of this paper is structured as follows: Section 2 reviews the relevant research; Section 3 presents a succinct overview of the context of this study; Section 4 addresses the proposed Stacking regularization in analogy-based software effort estimation (SABE); Section 5 describes the datasets and evaluation criteria; Section 6 provides the experimental evaluation and results; Section 7 presents the statistical performance assessment of the proposed technique; and Sect. 8 concludes the paper.

A variety of research (Kumar et al. 2020) has been performed on the numerous techniques used to estimate software cost. For fuzzy analogy software effort estimation (FASEE), Ezghari and Zahi (2018) suggested a new approach to uncertainty management. The proposal introduced consistency criteria into the estimation model in order to increase the data quality and infer a consistent possibility distribution, called Consistent fuzzy analogy software effort estimation (C-FASEE). Idri et al. (2016) proposed a model using Missing Data (MD) techniques. They explored MD strategies with classical analogy and fuzzy analogy. The researchers used three MD methods (toleration, elimination and imputation with k-nearest neighbors), three missing mechanisms (MCAR: missing completely at random, MAR: missing at random, NIM: non-ignorable missing) and MD percentages from 10 to 90 percent. Their findings revealed that regardless of the MD technique, the dataset used, the missing mechanism or the MD percentage, fuzzy analogy provided more accurate estimates in terms of the standardized accuracy measure (SA) than classical analogy. Their analysis showed that k-nearest neighbors imputation may increase the predictability of analogy-based techniques more than toleration or deletion. A model for analogy-based software effort estimation using differential evolution was proposed by Benala and Mall (2018). They investigated the efficacy of the differential evolution (DE) algorithm by applying five mutation strategies to optimize the feature weights of the attribute functions of analogy-based estimation (ABE). They designated their empirical analysis DABE: differential evolution in analogy-based software development effort estimation.
Singh et al. (2018a) presented a new variant of DE utilizing a homeostasis adaption-based mutation operator (HABMO) and applied it to software cost estimation. They used mean magnitude of relative error (MMRE), mean magnitude of relative error relative to the estimate (MMER), mean squared error (MSE) and root mean squared error (RMSE) as the evaluation metrics. They found the proposed technique worked best in comparison with different variants of DE. Azzeh et al. (2015) presented a model for analogy-based effort estimation entitled "An empirical evaluation of ensemble adjustment methods for analogy-based effort estimation". In their paper, they investigated the potential of ensemble learning for variants of the adjustment methods used in analogy-based effort estimation. Their results were subjected to statistical significance testing and showed significant improvements in predictive performance where ensemble methods were applied. As a case study, Effendi et al. (2019) utilized a student desk platform in their paper to illustrate the use case point method. They utilized data from actual software development in their work and adjusted the effort using the use case point process. The effort computed for three separate applications using use case points was compatible with the actual result. Idri et al. (2016) used classical and fuzzy analogy ensembles for estimation of software development effort. They conducted the study on 100/60 variants of classical/fuzzy analogy techniques over seven datasets. Using the Scott-Knott statistical test, these variants were clustered and ranked. The results revealed that there was no single strongest classical or fuzzy analogy technique across all the datasets, and the ensembles that were built were generally better than any single technique. Phannachitta (2020) proposed an innovative approach by introducing a combined effort adapter for analogy estimation, combining a gradient boosting machine algorithm with a conventional adaptation technique based on productivity adjustment. They found their technique performed on par with the existing effort estimation techniques used in the study. Zima (2015) presented a case-based reasoning model of cost estimation at the preliminary stage of a construction project. The paper postulated that the benefits of the presented model were its flexibility and ease of calculation. Singh et al. (2018b) utilized the improved environmental adaption model with real parameters (IEAM-RP) to estimate software effort. They used a NASA dataset to evaluate their technique, and the experimental results demonstrated the effectiveness of IEAM-RP. Suresh Kumar et al. (2021) proposed a gradient boosting regressor model. The model is compared with stochastic gradient descent, k-nearest neighbor, decision tree, bagging regressor, random forest regressor, AdaBoost regressor, and gradient boosting regressor. The authors evaluated the model using mean absolute error (MAE), mean square error (MSE), root mean square error (RMSE), and R². They showed the results on the Cocomo81 and China datasets.

The above are a few of the experimental studies on software cost estimation, into which modern techniques are integrated from time to time. In this article, the authors accordingly propose a software cost estimation methodology, SABE (Stacking regularization in analogy-based software effort estimation).
To the best of the authors' current knowledge, the SABE method has not previously been used for analogy-based estimation.

Stacking (sometimes known as stacked generalization) is an ensemble machine learning algorithm. It integrates the results from several machine learning models. First proposed by David Wolpert (1992), its key purpose is to minimize the generalization error. In his view, it is a more sophisticated variant of cross-validation. The core concept behind the stacking framework is to make predictions using one or more first-stage models and then use those predictions as features to fit one or more second-stage models. The principle of stacking is shown in Fig. 1.

Polynomial regression is a form of linear regression where the association between the independent variable x and the dependent variable y is formulated as a polynomial of degree n. The first polynomial regression experiment was designed in 1815 (Stigler 1974; Gergonne 1974). It is a special case of linear regression in which data with a curvilinear relationship between the target variable and the independent variables are fitted with a polynomial equation; the value of the target variable changes with the predictor(s) in a curvilinear relationship. The fundamental objective of regression analysis is to model the expected value of the dependent variable y according to the value of the independent variable x. The authors utilized Eq. (1) for simple regression:

$y = p + qx + e$  (1)

where y is the variable dependent on x, p is the y-intercept, q defines the slope and e is the error term. Equation (1) is conventionally extended to the nth-degree polynomial in Eq. (2):

$y = p + q_1 x + q_2 x^2 + \cdots + q_n x^n + e$  (2)

As the regression function is linear in terms of the unknown coefficients, these models are linear from the estimation perspective.

Regularization makes a model "regular" or acceptable by shrinking the coefficients toward zero. It is a methodology practiced by introducing an external penalty term into the error function to adjust the fitting process, which makes the optimal solution unique. The supplemental term regulates an excessively varying function, preventing the coefficients from taking extreme values. This technique avoids the risk of overfitting and improves model interpretability.

LASSO regression is a regularization technique applied to regression methods for accurate prediction. It was published in 1986 in the geophysics literature (Santosa and Symes 1986) and was afterward rediscovered and popularized independently by Robert Tibshirani in 1996 (Tibshirani 1996). LASSO regression implements L1 regularization, which adds a penalty equal to the absolute value of the coefficients' magnitudes. This method of regularization results in sparse models containing fewer coefficients; certain coefficients can become zero and be omitted from the model. Larger penalties lead to coefficient values closer to zero, which is ideal for producing simpler models. Equation (3) gives the cost function of LASSO regression:

$\min_{\beta} \sum_{i=1}^{m} \Big( y_i - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^2 + \lambda \sum_{j=1}^{p} |\beta_j|$  (3)

where λ is the shrinkage amount:

1. When λ = 0, no parameters are eliminated; the estimate is equivalent to that of linear regression.
2. As λ rises, the coefficients are progressively set to zero and discarded.
3. As λ increases, the bias grows.
4. As λ decreases, the variance grows.
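To make the combination of these two models concrete, the sketch below fits polynomial regressions as first-stage models and a LASSO model as the second-stage combiner, in the stacked arrangement described above. This is a minimal sketch assuming scikit-learn and synthetic data; the feature matrix, degrees and alpha value are illustrative and not taken from the paper.

```python
# Minimal stacking sketch: polynomial regression (stage 1) + LASSO (stage 2).
# Assumes scikit-learn; data and hyperparameters are illustrative only.
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(60, 3))                              # project features (synthetic)
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(0, 1.0, 60)   # "effort" (synthetic)

# First stage: polynomial regressions of a few degrees.
first_stage = [make_pipeline(PolynomialFeatures(degree=d), LinearRegression())
               for d in (2, 3)]

# Out-of-fold predictions become the meta-features, so the second stage
# never sees a first-stage prediction made on its own training target.
meta_X = np.column_stack([cross_val_predict(m, X, y, cv=5) for m in first_stage])

# Second stage: an L1-regularized (LASSO) combiner of the first-stage outputs.
meta_model = Lasso(alpha=0.1)
meta_model.fit(meta_X, y)

# To estimate a new project, run both stages in sequence.
for m in first_stage:
    m.fit(X, y)
x_new = rng.uniform(0, 10, size=(1, 3))
z_new = np.column_stack([m.predict(x_new) for m in first_stage])
print("stacked effort estimate:", meta_model.predict(z_new)[0])
```

The out-of-fold construction of the meta-features is the standard way to keep the second-stage model from overfitting the first stage's training errors.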
Analogous estimation is a method for calculating parameters of future work by utilizing historical data values. In analogy-based effort estimation, the effort of a new project is calculated based on the past development experience of similar projects. The ABE method broadly consists of a four-step procedure (Aamodt and Plaza 1994):

1. First, extract the cases most similar to the current project from the historical database.
2. Second, reuse the knowledge of these similar projects in the effort estimation of the current project.
3. Third, evaluate the solution function and find the effort of the current project.
4. Fourth, preserve the solution for future estimations.

The degree of similarity between projects is determined by the similarity function. Four similarity functions are utilized in this paper: the Euclidean distance, Manhattan distance, Jaccard distance and Minkowski distance. The choice of similarity measure determines the selection of the k-nearest neighbors. The most widely used way of computing distance is the Euclidean distance:

$d(P_1, P_2) = \sqrt{\sum_{i=1}^{n} (P_{1i} - P_{2i})^2}$  (4)

where P_1 is the targeted project, P_2 is the historical project and n is the number of attributes. The Manhattan distance, shown in Eq. (5), is a metric in which the distance between two points is the sum of the absolute differences of their Cartesian coordinates:

$d(P_1, P_2) = \sum_{i=1}^{n} |P_{1i} - P_{2i}|$  (5)

where P_1 is the targeted project, P_2 is the historical project and n is the number of attributes. The Jaccard distance coefficient is a metric used to express the dissimilarity between samples. It can be written as Eq. (6):

$d_J(P_1, P_2) = 1 - \frac{|P_1 \cap P_2|}{|P_1 \cup P_2|}$  (6)

The Minkowski distance, given in Eq. (7), is a statistical generalization of the Euclidean and Manhattan distances:

$d(m, n) = \Big( \sum_{i=1}^{j} |x_{im} - x_{in}|^{\alpha} \Big)^{1/\alpha}$  (7)

In the above equation, the Minkowski distance between data records m and n is represented by d(m, n), j is the total number of variables x, i is the variable's index, and α is the order of the Minkowski metric.

The number k of most similar neighbors is a critical parameter impacting the estimation of a new project. k = 1 to 5 is used in this article, as in several earlier studies (Benala and Mall 2018; Jørgensen et al. 2003; Huang and Chiu 2006). Once the k most similar projects are chosen, certain statistics are estimated from them. This is called the solution function, and it is used in the final forecast of a new project. During the experimental analysis, the following assessment methods are utilized by the authors as the basis for the solution function: the closest analogy (most similar project) (Walkerden and Jeffery 1999), the mean of the chosen similar projects (Shepperd and Schofield 1997), the median of the chosen similar projects (Angelis and Stamelos 2000) and the inverse distance weighted mean (Kadoda et al. 2000).

The mean is the average, a quantified "central" value of the efforts of the k most similar projects, where k > 1. The median is the median of the efforts of the most similar projects, where k > 2. As the number of most similar projects increases, the median becomes a more reliable figure than the mean. The inverse distance weighted mean (IWM) gives more related projects greater importance than less similar ones. It can be defined as Eq. (8):

$C_P = \frac{\sum_{k} Sim(P, P_k)\, C_{P_k}}{\sum_{k} Sim(P, P_k)}$  (8)

where P represents the project being estimated, k is the total number of most similar projects, P_k is the kth most similar project, Sim(P, P_k) is the similarity between projects P_k and P, and C_{P_k} denotes the cost of the most similar project P_k. Table 1 presents the difference table (Azzeh 2011), with input and output columns.
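The metric distances and the IWM solution function in Eqs. (4)-(8) translate directly into code. Below is a minimal Python/NumPy sketch; the Jaccard distance is omitted because it applies to set-valued attributes, and the conversion from distance to similarity as 1/(distance + ε) is an assumption made for illustration, since Eq. (8) does not fix that choice.

```python
import numpy as np

def euclidean(p1, p2):
    # Eq. (4): straight-line distance across the n attributes.
    return np.sqrt(np.sum((p1 - p2) ** 2))

def manhattan(p1, p2):
    # Eq. (5): sum of absolute attribute differences.
    return np.sum(np.abs(p1 - p2))

def minkowski(p1, p2, alpha=3):
    # Eq. (7): generalizes Euclidean (alpha=2) and Manhattan (alpha=1).
    return np.sum(np.abs(p1 - p2) ** alpha) ** (1.0 / alpha)

def iwm_effort(distances, efforts):
    # Eq. (8): inverse distance weighted mean -- closer (more similar)
    # projects get larger weights. Similarity taken as 1/(distance + eps),
    # an assumed conversion for this sketch.
    sim = 1.0 / (np.asarray(distances) + 1e-9)
    return np.sum(sim * np.asarray(efforts)) / np.sum(sim)

# Illustrative normalized attribute vectors for a target and 3 analogies.
target = np.array([0.2, 0.5, 0.7])
analogies = np.array([[0.1, 0.6, 0.7], [0.4, 0.4, 0.9], [0.3, 0.5, 0.5]])
efforts = np.array([120.0, 300.0, 180.0])

d = [euclidean(target, a) for a in analogies]
print("IWM effort estimate:", iwm_effort(d, efforts))
```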
For decades, inaccurate project cost estimates have plagued software development. Weak forecasts have not only caused projects to exceed budget and timeline but, in many cases, led them to be terminated entirely. The ability to accurately estimate software development time, cost, and manpower changes as newer methodologies supersede old ones. Consequently, an effective and precise software cost estimation model is highly desirable in software project management.

The proposed framework SABE (Stacking regularization in analogy-based software effort estimation) is discussed in this section. The conventional analogy-based estimation approach is first used to produce an unadjusted retrieved effort when a new target project comes to be predicted. The primary aim of the adjustment is to locate the 'update' that transforms the effort of the retrieved projects into the target effort. In the SABE framework, the model is fed with a series of projects. One project is kept as a test project and the remaining projects as historical projects. After applying the similarity functions discussed in Sect. 3.4.1, the closest analogy projects are determined. These projects are then subjected to a solution function as explained in Sect. 3.4.3 and to the Stacking Regularization (SR) algorithm. The efforts computed by the two procedures are then adapted to provide the final SABE solution function. The flowchart of SABE is given in Fig. 2. The code for the proposed model and the other adaptation methods used for comparison is implemented in Python. The algorithm of effort prediction and adjustment through SABE is as follows:

Step 1: Start with data preprocessing.
Step 2: Project P_t is withheld from the dataset as a test project, and the remaining projects are treated as historical.
Step 3: The analogy projects P_a most similar to the test project P_t are retrieved using the various similarity functions discussed in Sect. 3.4.1 for each of the k-nearest neighbors (k = 1...5). Equation (9) depicts the Euclidean distance:

$SM(P_t, P_j) = \sqrt{\sum_{i=1}^{N} (P_{ti} - P_{ji})^2}$  (9)

where SM indicates the measure of similarity, N is the number of predictor attributes, and P_t and P_j are the projects under investigation. This step is repeated with the remaining similarity functions, i.e., the Manhattan, Minkowski and Jaccard distances.
Step 4: The closest analogy projects P_a are used to perform the two kinds of operations indicated under Step 4.1 and Step 4.2.
Step 4.1(a): Feed the closest analogies to the solution function discussed in Sect. 3.4.3 to find the effort of the project to be estimated. The effort computed through one of the solution functions is depicted in Eq. (10), where SM(P_t, P_a) indicates the measure of similarity between P_t and P_a, and Size(P_a) is the size of project P_a as given in the dataset.
Step 4.1(b): Build the difference table shown in Table 1 (Azzeh 2011), taking the difference between the effort predicted using Eq. (10) and the test project effort, along with all predictor attributes as shown in Table 1, where d_k is the difference between the t-th project P_t and its closest analogy P_a at the k-th attribute, and d_e is the difference between the t-th project P_t and its closest analogy P_a at the effort attribute.
Step 4.2: Provide the closest analogies to the Stacking Regularization (SR) algorithm, which constructs the SABE-dependent adaptation mechanism illustrated in Fig. 4, to estimate Effort_SR.
Step 5: The predicted difference table and the estimated effort obtained from the SR algorithm are used to adapt and update the target project effort as given in Eq. (13).
Step 6: The final calculation of MRE, MMRE, MdMRE, PRED and SA is performed.
Step 7: End.

The SR algorithm proceeds as follows:

Step 1: Read each test project and its corresponding similar projects obtained using the similarity functions for training.
Step 2: Set the polynomial regression degree from 2 to 8.
Step 3: Set the LASSO regression parameters as eps = 0.0001 (length of the path), n_alphas = 1 (number of alphas along the regularization path) and normalization = True.
Step 4: Perform stacking by first using polynomial regression as the first-stage model for each test project and its similar projects.
Step 5: The output from polynomial regression is fed to the second-stage model of the stack, LASSO regression with cross-validations = 10, to get the final effort for each test project.
Step 6: The output from the LASSO regularization is treated as the predicted effort of the SR algorithm.

For the assessment and comparison of the accuracy of the analogy-based effort estimation model, the following prevalent evaluation criteria are employed. The magnitude of the error associated with an estimated effort is known as the absolute error (AE). It is the difference between a project's estimated effort and its actual effort, represented in Eq. (14), where α_i is the actual effort and $\hat{\alpha}_i$ is the predicted effort:

$AE_i = |\alpha_i - \hat{\alpha}_i|$  (14)

As seen in Eq. (15), the magnitude of relative error (MRE) measures the error as a proportion of the actual effort. It is evaluated by dividing the absolute error by the actual effort α_i:

$MRE_i = \frac{|\alpha_i - \hat{\alpha}_i|}{\alpha_i}$  (15)

MMRE is the mean magnitude of relative error. For n projects, it is the average of the MRE values, as in Eq. (16), where n is the number of projects:

$MMRE = \frac{1}{n} \sum_{i=1}^{n} MRE_i$  (16)

MdMRE is the median of the MRE values over the n projects. PRED(x), the output predictor, is the percentage of predictions that fall within x of the actual value, expressed in Eq. (18), where n is the number of projects:

$PRED(x) = \frac{1}{n} \sum_{i=1}^{n} D_i$  (18)

and D_i can be calculated by:

$D_i = \begin{cases} 1 & \text{if } MRE_i \le x \\ 0 & \text{otherwise} \end{cases}$

MMRE and PRED are built on MRE. Because of its asymmetric distribution, MRE-based measures have intrinsic drawbacks as performance metrics, being biased toward models that underestimate. Therefore, MMRE and PRED are biased measures (Angelis and Stamelos 2000; Kadoda et al. 2000; Boehm 1984). Another measure, the mean absolute error (MAE), is not biased and does not exhibit the asymmetric distribution that MMRE does. The MAE (Gergonne 1974) is measured using an average of absolute errors, as shown in Eq. (19):

$MAE = \frac{1}{n} \sum_{i=1}^{n} |\alpha_i - \hat{\alpha}_i|$  (19)

MAE is challenging to interpret since its residuals are not standardized. Consequently, a newer metric called standardized accuracy (SA), given in Eq. (20), was stated by Shepperd and MacDonell (2012). The SA ratio is specifically used to assess how far a predictive model outperforms a random-guessing baseline and produces accurate estimates; if it does not, the prediction model cannot be considered useful. An SA value close to zero suggests poor reliability, and a negative value is deemed unsatisfactory:

$SA = \Big( 1 - \frac{MAR}{\overline{MAR}_{P_0}} \Big) \times 100$  (20)

where MAR (mean absolute residual) is calculated as:

$MAR = \frac{1}{m} \sum_{i=1}^{m} |\alpha_i - \hat{\alpha}_i|$  (21)

MAR is the MAR of the proposed technique, $\overline{MAR}_{P_0}$ is the mean of the MARs obtained through a large number of runs of random guessing (Shepperd and MacDonell 2012), and m is the total number of projects.
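The evaluation criteria in Eqs. (14)-(21) can be implemented in a few lines. Below is a minimal Python/NumPy sketch; the PRED threshold x = 0.25 and the random-guessing baseline (sampling another project's actual effort) follow common practice in the effort estimation literature and are assumptions, not specifics from the paper.

```python
import numpy as np

def mre(actual, predicted):
    # Eq. (15): magnitude of relative error per project.
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return np.abs(actual - predicted) / actual

def mmre(actual, predicted):
    return np.mean(mre(actual, predicted))        # Eq. (16)

def mdmre(actual, predicted):
    return np.median(mre(actual, predicted))      # median of the MRE values

def pred(actual, predicted, x=0.25):
    # Eq. (18): fraction of projects with MRE <= x (x = 0.25 is conventional).
    return np.mean(mre(actual, predicted) <= x)

def sa(actual, predicted, runs=1000, seed=0):
    # Eq. (20): standardized accuracy vs. a random-guessing baseline.
    actual = np.asarray(actual, float)
    predicted = np.asarray(predicted, float)
    mar = np.mean(np.abs(actual - predicted))     # Eq. (21): MAR of the model
    rng = np.random.default_rng(seed)
    # Baseline: predict each project with a randomly drawn actual effort.
    guesses = rng.choice(actual, size=(runs, actual.size))
    mar_p0 = np.mean(np.abs(actual - guesses))
    return (1.0 - mar / mar_p0) * 100.0

actual = [120.0, 300.0, 180.0, 95.0]      # illustrative values only
predicted = [130.0, 280.0, 200.0, 90.0]
print(mmre(actual, predicted), mdmre(actual, predicted),
      pred(actual, predicted), sa(actual, predicted))
```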
In this section, the authors describe the experimental setup utilized in the experiments. As a preprocessing step, the authors removed the varying influence of feature scales by normalizing the features to the interval [0, 1] using the min-max normalization method (Kocaguneli and Menzies 2013). As shown in Table 3 for the Cocomo dataset with Euclidean similarity, the best testing value for MMRE is found at k = 3, for MdMRE at k = 5, for PRED at k = 1 and 4, and for SA at k = 5, all with the SABE solution function. A few of the results are also depicted graphically. Figures 3, 4, 5 and 6 depict the performance of all the solution functions at k = 3 with Euclidean similarity. The graphs for the Cocomo, NASA and Maxwell datasets show that the performance of SABE is at par or better across all the evaluation criteria. On the China dataset, SABE performs best on the MMRE and MdMRE criteria, with less variation on the PRED and SA criteria. However, as the results of Table 6 on the China dataset show, SABE comes out ahead on the PRED and SA criteria for the Jaccard, Manhattan and Minkowski similarity functions.

Benala and Mall (2018) contributed various ABE models and concluded that the best ABE configuration is found with Euclidean similarity, k = 3 and the mean solution function. The authors have compared the performance of SABE with the Benala and Mall models (Tables 13-15) (Benala and Mall 2018) on the Cocomo81, NASA93 and China datasets. The proposed approach is compared with six models, including ABE and GA-ABE.

For the Cocomo81 dataset, the best MMRE test value among the comparison models is given by SADE-ABE at 0.016, whereas the best MMRE test value for SABE is 0.0137 at Jaccard similarity and k = 3. For PRED, the best comparison value is given by DABE-3 at 0.816, while the best PRED value for SABE is 1.00 at k = 1 and 4 using Euclidean similarity. For MdMRE, the best comparison value is given by PSO-ABE at 0.021, whereas the best value for SABE is 0.0157 using Manhattan similarity at k = 3. The best SA value is given by DABE-3 at 98.940, whereas for SABE it is 99.75 at k = 4 using Jaccard similarity.

For the NASA93 dataset, the best MMRE test value is given by GA-ABE at 0.009, whereas the best MMRE test value for SABE is 0.0211 at Minkowski similarity and k = 5. For PRED, the best comparison value is given by ABE at 0.839, while the best PRED value for SABE is 1.00 at k = 4 and 5 using Jaccard similarity; it is also reached at k = 4 with Manhattan similarity and at k = 5 with Minkowski similarity. For MdMRE, the best comparison value is given by GA-ABE at 0.009, whereas the best value for SABE is 0.0089 using Euclidean similarity at k = 1. The best SA value is given by DABE-3 at 94.234, whereas for SABE it is 99.22 at k = 5 using Minkowski similarity.

For the China dataset, the best MMRE test value is given by PSO-ABE at 0.010, whereas the best MMRE test value for SABE is 0.0116 at Jaccard similarity and k = 4. For PRED, the best comparison value is given by ABE at 0.868, while the best PRED value for SABE is 1.00 at k = 1, 2, 3, 4 and 5 using the Euclidean, Manhattan and Minkowski similarity measures. For MdMRE, the best comparison value is given by DABE-3 at 0.039, whereas the best value for SABE is 0.0127 using Manhattan similarity at k = 3. The best SA value is given by DABE-3 at 96.509, whereas for SABE it is 99.62 at k = 5 using Manhattan similarity.

All the analyses above conclude that SABE provides the best results for almost all the models and evaluation criteria, except against GA-ABE on MMRE for the NASA93 dataset and against PSO-ABE on MMRE for the China dataset. However, the best performance of SABE is not restricted to a single configuration.
It gives good results at different similarity measures and different k values. This is further confirmed using statistical analysis.

Statistical analysis helps to establish the appropriateness of one model relative to another (Kitchenham and Mendes 2009; Mittas and Angelis 2008). From the discussion in Sect. 6, it is evident that the SABE solution function provides the best results, and statistical analysis will now confirm this. It will also help to find the best similarity function and k value for SABE. The datasets used in software cost estimation studies do not follow any particular distribution, so nonparametric statistical tests are used (Kaushik et al. 2016). All these tests are performed with the KEEL (Knowledge Extraction based on Evolutionary Learning) tool (Alcalá-Fdez et al. 2011), and the statistical analysis is done on the MMRE metric.

Initially, the authors compared all the solution functions using the Friedman test (Hodges and Lehmann 2012). This test computes the statistical differences among the solution functions and assigns the lowest rank to the best solution function. The Friedman statistic is computed by:

$\chi_F^2 = \frac{12N}{q(q+1)} \Big[ \sum_{j=1}^{q} \bar{M}_j^2 - \frac{q(q+1)^2}{4} \Big]$

where N is the number of datasets, q is the total number of techniques and $\bar{M}_j$ is the average rank of the j-th technique over the datasets. This test is conducted for all the solution functions at k = 3 with Euclidean similarity on all four datasets. The authors chose k = 3 for the Friedman test as it is considered the optimal value and covers all the solution functions (Benala and Mall 2018). The null hypothesis assumed that all the solution functions performed equally. The results of the Friedman test are given in Table 7. Here, N is the number of input datasets, and the degrees of freedom (df) are 3 as there are four solution functions to compare. The standard χ² value with 3 df and significance level α = 0.05 is 7.815. The null hypothesis is rejected as the χ² value listed in the test statistic (Table 7) is more than 7.815 and the p-value is less than 0.05. It is therefore deduced that the solution functions are different. Figure 7 shows the average ranks of all the solution functions as given by the Friedman rank test. SABE has the minimum rank of 1, so it is considered the best solution function.

Next, the Holm post hoc test (Holm 1979) is conducted to find the differences among the techniques, with SABE as the control method. Its test statistics are given in Table 8. As per the Holm test statistics, the hypothesis is rejected for mean and IWM but accepted for median. This shows that there is no significant difference between SABE and the median solution function. These two solution functions are therefore statistically compared using the Wilcoxon signed-rank test (Wilcoxon 1992), which compares the two functions based on the positive and negative differences of their ranks. The null hypothesis assumed here is that the two techniques perform equally. The test statistics are provided in Table 9, where R+ is the sum of ranks for which the first algorithm outperformed the second and R− is the sum of ranks for the opposite. From the test statistics (Table 9), the p-value is less than 0.05, so the null hypothesis is rejected: SABE outperformed the median solution function on all the datasets.
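The paper runs these tests in the KEEL tool; the sketch below shows the same Friedman and Wilcoxon signed-rank procedure using scipy.stats instead, on made-up MMRE values. The numbers, column ordering and dataset labels are illustrative only, not the paper's results.

```python
import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon

# Hypothetical MMRE values: rows are datasets, columns are the four
# solution functions (CA, mean, median, SABE). Made up for illustration.
mmre = np.array([
    [0.40, 0.35, 0.30, 0.10],   # Cocomo
    [0.45, 0.38, 0.33, 0.12],   # NASA
    [0.50, 0.42, 0.36, 0.15],   # Maxwell
    [0.42, 0.37, 0.31, 0.11],   # China
])

# Friedman test across the four techniques (one sample per column).
stat, p = friedmanchisquare(*[mmre[:, j] for j in range(mmre.shape[1])])
print(f"Friedman chi2 = {stat:.3f}, p = {p:.4f}")

# If the null is rejected, a pairwise Wilcoxon signed-rank test can compare
# SABE (last column) against the median solution function (third column).
stat_w, p_w = wilcoxon(mmre[:, 3], mmre[:, 2])
print(f"Wilcoxon stat = {stat_w:.3f}, p = {p_w:.4f}")
```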
The authors also performed statistical analysis to determine the best similarity function and the best k value for SABE. To find the best similarity function, the authors chose the best MMRE test values for all the datasets. First, the Friedman test is performed, under the null hypothesis that all the similarity functions perform equally. The results of the Friedman test for the similarity functions are given in Table 10. Here, N is the number of input datasets, and the degrees of freedom (df) are 3 as there are four similarity functions to compare. The null hypothesis cannot be rejected as the χ² value listed in the test statistic (Table 10) is less than 7.815 and the p-value is more than 0.05. It is deduced that all the similarity functions perform equally, with no significant differences. Figure 8 shows the average ranks of all the similarity functions as given by the Friedman rank test. The Minkowski similarity function has the lowest rank of 1.875, so this similarity function is taken as the control method.

The statistical analysis is further performed to find the best k value for SABE. The results considered for this analysis are the MMRE test values for all k values with the Minkowski similarity function and the SABE solution function. The Friedman test results are given in Table 11. The hypothesis assumed here is that all k values perform equally. The degrees of freedom (df) are 4, corresponding to 5 k values. The null hypothesis again cannot be rejected as the χ² value listed in the test statistic (Table 11) is less than 9.488 and the p-value is more than 0.05. It is deduced that all the k values perform equally, with no significant differences. The ranks calculated by the Friedman test are depicted in Fig. 9, showing that k = 5 has the lowest rank of 2.5. So k = 5 can be considered the best value with the Minkowski similarity function and the SABE solution function.

In this work, Stacking regularization in analogy-based software effort estimation (SABE) is proposed for software projects. The SABE technique is a solution function evaluated alongside other solution functions: the closest analogy (CA), mean, median and the inverse distance weighted mean (IWM). The technique is tested on four software estimation datasets, i.e., Cocomo, NASA, Maxwell and China. The SABE solution function provided the best results both in the experimental evaluations and in the statistical validations. The four similarity functions used are Euclidean, Jaccard, Manhattan and Minkowski, and the k most similar neighbors from 1 to 5 are considered in implementing the technique. The results of SABE are also compared with other techniques; SABE performed best most often across the evaluation criteria in comparison with models proposed by earlier studies.

The research has its limitations too. The SABE technique uses stacking, which comes at a price: stacking involves training multiple base models to predict the target variable, so stacked models take longer to train than simpler models and require more memory. The choice of base models can also affect the results of the proposed technique. The base models used in this study are polynomial regression and LASSO regression. Future work can include replacing these models with other base models that take less time to train and provide more accurate estimates.

Conflict of interest: All the authors declare that there is no conflict of interest in publishing this paper.
Ethical approval: This article does not contain any studies with human participants or animals performed by any of the authors.

References

Case-based reasoning: foundational issues, methodological variations, and system approaches
A novel evolutionary arithmetic optimization algorithm for multilevel thresholding segmentation of COVID-19 CT images
Unsupervised text feature selection technique based on hybrid particle swarm optimization algorithm with genetic operators for the text clustering
Applying genetic algorithms to information retrieval using vector space model
KEEL data-mining software tool: data set repository, integration of algorithms and experimental analysis framework
Regression techniques in software effort estimation using COCOMO dataset
A simulation tool for efficient analogy based cost estimation
Model tree based adaption strategy for software effort estimation by analogy
An empirical evaluation of ensemble adjustment methods for analogy-based effort estimation
DABE: differential evolution in analogy-based software development effort estimation
Software engineering economics
Adjustment factor for use case point software effort estimation (study case: student desk portal)
Uncertainty management in software effort estimation using a consistent fuzzy analogy-based method
The application of the method of least squares to the interpolation of sequences
Rank methods for combination of independent experiments in analysis of variance
A simple sequentially rejective multiple test procedure
Optimization of analogy weights by genetic algorithm for software effort estimation
Improved estimation of software development effort using classical and fuzzy analogy ensembles
Regression models of software development effort estimation accuracy and bias
Software effort estimation by analogy and "regression toward the mean"
Integrating firefly algorithm in artificial neural network models for accurate software cost predictions
Why comparative effort prediction studies may be invalid
Software effort models should be assessed via leave-one-out validation
Advancement from neural networks to deep learning in software effort estimation: perspective of two decades
Comparing cost prediction models by resampling techniques
Software development effort estimation using regression fuzzy models
Modelling using polynomial regression
On an optimal analogy-based software effort estimation
Linear inversion of band-limited reflection seismograms
Evaluating prediction systems in software project estimation
Estimating software project effort using analogies
Differential evolution using homeostasis adaption based mutation operator and its application for software cost estimation
Gergonne's 1815 paper on the design and analysis of polynomial regression experiments
Regression shrinkage and selection via the lasso
An empirical study of analogy-based software effort estimation
Individual comparisons by ranking methods
Stacked generalization
Regression analysis based software effort estimation method
The case-based reasoning model of cost estimation at the preliminary stage of a construction project