key: cord-0071387-1o1uyo70 authors: Ntakolia, Charis; Kokkotis, Christos; Karlsson, Patrik; Moustakidis, Serafeim title: An Explainable Machine Learning Model for Material Backorder Prediction in Inventory Management date: 2021-11-27 journal: Sensors (Basel) DOI: 10.3390/s21237926 sha: c9c1295894111d1cbd911174cc1fc30683e44419 doc_id: 71387 cord_uid: 1o1uyo70 Global competition among businesses imposes a more effective and low-cost supply chain allowing firms to provide products at a desired quality, quantity, and time, with lower production costs. The latter include holding cost, ordering cost, and backorder cost. Backorder occurs when a product is temporarily unavailable or out of stock and the customer places an order for future production and shipment. Therefore, stock unavailability and prolonged delays in product delivery will lead to additional production costs and unsatisfied customers, respectively. Thus, it is of high importance to develop models that will effectively predict the backorder rate in an inventory system with the aim of improving the effectiveness of the supply chain and, consequentially, the performance of the company. However, traditional approaches in the literature are based on stochastic approximation, without incorporating information from historical data. To this end, machine learning models should be employed for extracting knowledge of large historical data to develop predictive models. Therefore, to cover this need, in this study, the backorder prediction problem was addressed. Specifically, various machine learning models were compared for solving the binary classification problem of backorder prediction, followed by model calibration and a post-hoc explainability based on the SHAP model to identify and interpret the most important features that contribute to material backorder. The results showed that the RF, XGB, LGBM, and BB models reached an AUC score of 0.95, while the best-performing model was the LGBM model after calibration with the Isotonic Regression method. The explainability analysis showed that the inventory stock of a product, the volume of products that can be delivered, the imminent demand (sales), and the accurate prediction of the future demand can significantly contribute to the correct prediction of backorders. Backorder occurs when a product is temporarily unavailable or out of stock and the customer places an order for future production and shipment [1] . Backorders are noticed mainly in case of product unavailability due to excessive demand or future release on the market. For instance, COVID-19 and lockdown measures raised the need for antiseptic products and indoor domestic activities that led to a mass wave of online purchases. This trend led to the bullwhip effect, or else the Forrester effect, for many industries and companies that had not succeeded on predicting the increase demand. The stock of products proved insufficient; however, due to products' low availability and alternative solutions, the customers were willing to wait for their order. Another recent example is Sensors 2021, 21, 7926 2 of 12 the early announcement of the upcoming release in the market of a new product from a famous company. In that case, the company accepts backorders from customers, since the initial production quantity will be insufficient to cover the expected demand for the popular product [1] . The ability of a company to address backorders impacts significantly the company's revenue, share market price, and customers' trust [2] . The backorders of products play a crucial role in the management of the inventory, since they affect the total production costs of the whole supply chain. In the literature, various studies addressing the Economic Order Quantity (EOQ) and Economic Production Quantity (EPQ) have been published, taking into account backorders [3] [4] [5] [6] . These approaches include: (i) the coordination and minimization of total costs of the supply chain with backorders [7, 8] among other factors, such as with stochastic supply distribution [9, 10] ; (ii) the inventory problem addressed with backorders [11] [12] [13] [14] and safety stocks [15] , multi-objective optimization formulations with fixed backorder, and timeweighted backorder [16] , stochastic demand and price discount [17, 18] , the integration of human errors [19] , customers' preferences [20] , or customers' behavior [21] , and from the energy-efficient perspective with the aim to minimize carbon emissions [22] ; (iii) fuzzy logic to model the demand or the order quantity for finding the optimal stock quantity [23] [24] [25] ; (iv) heuristic approaches for optimizing the inventory systems [26] [27] [28] . Due to the importance of backorders and their impact on the whole supply chain costs, studies have been focused on the prediction of inventory backorders. To address the issue of backorders prediction, artificial intelligent techniques have been employed to deal with imbalanced data issues, since the number of products going on backorder is much lower than that of those that are on stock [29] . A machine learning approach was proposed [30] to maximize the expected profit of backorder decisions by integrating the profit-based measure into the prediction model and optimizing the decision threshold. In this context, various machine learning models were evaluated, such as Logistic Regression (LR) and k-Nearest Neighbor (KNN) classifiers, Decision Tree, Support Vector Machine (SVM), and Multi-Layer Neural Network (NN). Another machine learning approach based on Distributed Random Forest and Gradient Boosting Machine learning techniques was presented for predicting the probable backorder scenarios in the supply chain [2] . Unsupervised learning was used to predict backorders by using Deep Autoencoder [29] . Deep neural networks for imbalanced data were proposed for backorder prediction [1] . A case study on the Danish Craft Beer Breweries was presented by using machine learning models for predicting the backorders [31] . The above studies employed machine learning methods to address the backorder prediction problem, whether a product would be backordered or not. However, none of these aimed to explain and interpret the impact of features on the prediction output. To this end, this study focused on developing an explainable machine learning pipeline for: (i) evaluating the performance of well-known machine learning models to predict backorders as a binary classification problem and (ii) interpreting the results by using a post-hoc explainability model (SHAP) on the best performing model. Special notice was given to the treatment of the imbalanced dataset by using an undersampling technique. In this study, the publicly available dataset 'Predict Product Backorders. Can you predict product back orders?' (https://www.kaggle.com/c/untadta/data (accessed on 1 October 2021)), that was initially created for a competition, was used. In total, 23 features are included in the dataset. Out of the 23 features given in the dataset (Table 1) , 15 are numerical, and 8 (including the target variable "went on back order") are categorical features. The data consisted of 9714 products that were backordered and 1,038,860 that were not. The presented dataset was used in a machine learning (ML) pipeline to predict possible backorders in the inventory management system. The steps integrated in the ML pipeline were the following ( Figure 1 ): (i) data preprocessing to handle the missing data and the categorical values; (ii) feature selection via a state-of-the-art method, called Boost-ARoota [32, 33] ; (iii) a comparative evaluation of popular machine learning models, such as Random Forest (RF), LightGBM (LGBM), XGBoost (XGB), Balanced Blagging (BB), Neural Networks (NN), Logistic Regression (LR), Support Vector Machines (SVM), and K-Nearest Neighbors (KNN); (iv) an explainability analysis with the use of the SHAP model applied to the best-performing prediction model in (iii). numerical, and 8 (including the target variable "went on back order") are categorical features. The data consisted of 9714 products that were backordered and 1,038,860 that were not. The presented dataset was used in a machine learning (ML) pipeline to predict possible backorders in the inventory management system. The steps integrated in the ML pipeline were the following ( For the preprocessing of the dataset, we deleted the rows with missing values, so that we had the maximum possible real information. In addition, we normalized the data set For the preprocessing of the dataset, we deleted the rows with missing values, so that we had the maximum possible real information. In addition, we normalized the data set to [0,1]. Finally, to address the problem of imbalanced data, we reduced the samples of the majority class to reach the number of samples in the minority class. Regarding the feature selection process, the state-of-the-art selection method BoostA-Roota was used. It is a fast XGBoost wrapper feature selection algorithm that follows the Recursive Feature Elimination approach. It operates similarly to Boruta utilizing XGBoost as the base model. BoostARoota returns an optimal subset of features by eliminating up to 10% of the initial set of features. Its effectiveness has been proven in various applications [32, 33] . A 10-fold cross validation was performed for the selection of the most important features. The comparative evaluation included 8 popular and commonly used classifiers, such as Random Forest (RF) [34] , K-Nearest Neighbor (KNN) [35] , Neural Networks (NN) [36] , Logistic Regression (LR) [37] , Balanced Blagging (BB) [38] , Support Vector Machines (SVM) [39] , XGBoost [40] , and LightGBM [41] . A 70/30% validation strategy was employed to generate the training and testing sets with an integrated cross validation strategy that employed grid search for the hyperparameter tunning to avoid overfitting and bias error. In Table 2 , a description of the employed hyperparameters is presented. For the performance evaluation of the models, the accuracy, recall, f1-score, precision, AUC metrics were used. Following the results from the validation of the models, the classifiers with similar performance were calibrated to increase their performance and identify the best one. Calibration is a post-processing operation, which improves the probability estimation of a model [42, 43] . To calibrate the models, the Platt Scaling (sigmoid) [44] and Isotonic Regression [45, 46] (isotonic) methods were adopted. A post-hoc explainability was finally applied on the best performing model based on the SHapley Additive exPlanations (SHAP) model to explain the predictive model and the contribution of the most important features. SHAP is a game theory approach typically employed to explain the output of any machine learning model. It connects optimal credit allocation with local explanations using the classic Shapley values from game theory and their related extensions [47] [48] [49] . In this section, the results of each step of the ML methodology are presented. The BoostARoota algorithm selected the following features as important in random order of appearance: Sensors 2021, 21, 7926 5 of 12 In this subsection, the results from the comparative evaluation of the ML models are presented. Table 3 shows the best metric scores of each classifier used in this study with their hyperparameters tuning. Furthermore, the roc curves and AUC scores are presented in Figure 2 . Here, the results from the calibration process are shown. Figure 3 depicts the calibration plots for each of the best performing models that achieved similar performance (RF, LGBM, XGB, and BB). In each plot, the perfectly calibrated line (dot line), the initial model, and the model calibrated with the Platt's method (sigmoid) and the Isotonic Regression (isotonic) method are presented. For each model, the best calibrated one that best fitted the dot line was then used for a comparison, illustrated in Figure 4 . Figure 5 shows the Roc curve of the best overall calibrated model (LGBM + Isotonic). Here, the results from the calibration process are shown. Figure 3 depicts the calibration plots for each of the best performing models that achieved similar performance (RF, LGBM, XGB, and BB). In each plot, the perfectly calibrated line (dot line), the initial model, and the model calibrated with the Platt's method (sigmoid) and the Isotonic Regression (isotonic) method are presented. For each model, the best calibrated one that best fitted the dot line was then used for a comparison, illustrated in Figure 4 . Figure 5 shows the Roc curve of the best overall calibrated model (LGBM + Isotonic). In this section, the results of the SHAP analysis are presented. Figure 6 illustrates the summary plot of LGBM calibrated with the Isotonic Regression method, while in Figure 7 , the beeswarm plot for the backordered class is shown. Furthermore, Figures 8 and 9 show two examples for products that were classified correctly as backordered and non-backordered, respectively. In this section, the results of the SHAP analysis are presented. Figure 6 illustrates the summary plot of LGBM calibrated with the Isotonic Regression method, while in Figure 7 , the beeswarm plot for the backordered class is shown. Furthermore, Figure 8 and Figure 9 show two examples for products that were classified correctly as backordered and nonbackordered, respectively. In this section, the results of the SHAP analysis are presented. Figure 6 illustrates the summary plot of LGBM calibrated with the Isotonic Regression method, while in Figure 7 , the beeswarm plot for the backordered class is shown. Furthermore, Figure 8 and Figure 9 show two examples for products that were classified correctly as backordered and nonbackordered, respectively. In this section, the results of the SHAP analysis are presented. Figure 6 illustrates the summary plot of LGBM calibrated with the Isotonic Regression method, while in Figure 7 , the beeswarm plot for the backordered class is shown. Furthermore, Figure 8 and Figure 9 show two examples for products that were classified correctly as backordered and nonbackordered, respectively. The BoostARoota feature selection method selected 17 out of 23 features that formed the initial dataset. These features were used to form the final dataset for training, validation, and testing of the proposed ML pipeline in this study. They were relevant to inventory stock, transit information, sales forecast, and sales quantity (Section 3.1). Eight machine learning models were used for a comparative evaluation (Table 3) . Among these models RF, XGB, LGBM, and BB presented similar performance based on the AUC score (0.95, Figure 2 ). Specifically, Table 3 summarizes the metric scores, the confusion matrixes, and the selected hyperparameters of the employed ML models for this binary problem. The majority of the employed classifiers achieved accuracy up to 88.85% in comparison with KNN, LR, and SVM which achieved lower accuracy (up to 75.93%). The RF, XGB, LGBM, and BB models also achieved high performances in the remaining metrics such as recall (up to 90.69%), f1-score (up to 89.12%), and precision (up to 88.10%) scores. From the confusion matrixes of the aforementioned ML models, it turned out that the ML models work satisfactorily in this task. The RF, XGB, LGBM, and BB models that achieved comparative performance were calibrated based on isotonic and sigmoid methods. Figure 3 illustrates the initial models and their calibration with the aforementioned calibration methods. The results showed that for RF, XGB, and LGBM, the calibration with the Isotonic Regression method reached better results, while for the BB classifier, the Platt Scaling (Sigmoid) method presented better performance. From the comparative evaluation, depicted in Figure 4 , the LGBM classifier calibrated with the Isotonic Regression method presented the best overall performance, as it is asymptotically closer to the dotted line that represents the perfectly calibrated model (Figure 4 and Figure 5 ). Figure 6 presents the features' impact on the output of the best model (LightGBM + Isotonic) for the proposed dataset. The features were sorted by the mean absolute value of the SHAP values which represent the SHAP global feature importance. Furthermore, the most important features that significantly affected the prediction output of the model were the national_inv, the in_transit_qty, the forecast_3_month, the sales_1_month, and the forecast_6_month. The national inv concerns the current inventory level of components. The feature in transit qty describes the quantity in transit, and the sales_1_month concerns the sales quantity in the prior month. The features forecast_3_month and fore-cast_6_month relate to the forecast sales for the next 3 and 6 months. Figure 8 shows that the topmost influential features n_bank, perf_6_month_avg, in transit, national_inv, sales_1_month, forecast_6_month, sales_3_month, and loval_bo_qty led to the prediction value of 0.35, which was transformed to 1. The features that are indicated with red color influenced positively, which means that they dragged the value closer to 1, while the features in blue color had the opposite effect. Similarly, for an example of a backordered product, Figure 9 shows the values of the top influential features that pushed the product to the backordered class. It is observed that lower values of inventory stock, products' quantity received, and source performance of the last 6 months and higher values of forecasts and sales pushed the output prediction to the non-backordered class. To interpret the results from a managerial perspective based on the beeswarm plot illustrated in Figure 7 , a product with low stock and high short-term and mid-term future demand will probably be backordered, since the inventory stock will not be able to satisfy The BoostARoota feature selection method selected 17 out of 23 features that formed the initial dataset. These features were used to form the final dataset for training, validation, and testing of the proposed ML pipeline in this study. They were relevant to inventory stock, transit information, sales forecast, and sales quantity (Section 3.1). Eight machine learning models were used for a comparative evaluation (Table 3) . Among these models RF, XGB, LGBM, and BB presented similar performance based on the AUC score (0.95, Figure 2 ). Specifically, Table 3 summarizes the metric scores, the confusion matrixes, and the selected hyperparameters of the employed ML models for this binary problem. The majority of the employed classifiers achieved accuracy up to 88.85% in comparison with KNN, LR, and SVM which achieved lower accuracy (up to 75.93%). The RF, XGB, LGBM, and BB models also achieved high performances in the remaining metrics such as recall (up to 90.69%), f1-score (up to 89.12%), and precision (up to 88.10%) scores. From the confusion matrixes of the aforementioned ML models, it turned out that the ML models work satisfactorily in this task. The RF, XGB, LGBM, and BB models that achieved comparative performance were calibrated based on isotonic and sigmoid methods. Figure 3 illustrates the initial models and their calibration with the aforementioned calibration methods. The results showed that for RF, XGB, and LGBM, the calibration with the Isotonic Regression method reached better results, while for the BB classifier, the Platt Scaling (Sigmoid) method presented better performance. From the comparative evaluation, depicted in Figure 4 , the LGBM classifier calibrated with the Isotonic Regression method presented the best overall performance, as it is asymptotically closer to the dotted line that represents the perfectly calibrated model (Figures 4 and 5) . Figure 6 presents the features' impact on the output of the best model (LightGBM + Isotonic) for the proposed dataset. The features were sorted by the mean absolute value of the SHAP values which represent the SHAP global feature importance. Furthermore, the most important features that significantly affected the prediction output of the model were the national_inv, the in_transit_qty, the forecast_3_month, the sales_1_month, and the forecast_6_month. The national inv concerns the current inventory level of components. The feature in transit qty describes the quantity in transit, and the sales_1_month concerns the sales quantity in the prior month. The features forecast_3_month and forecast_6_month relate to the forecast sales for the next 3 and 6 months. Figure 8 shows that the topmost influential features n_bank, perf_6_month_avg, in transit, national_inv, sales_1_month, forecast_6_month, sales_3_month, and loval_bo_qty led to the prediction value of 0.35, which was transformed to 1. The features that are indicated with red color influenced positively, which means that they dragged the value closer to 1, while the features in blue color had the opposite effect. Similarly, for an example of a backordered product, Figure 9 shows the values of the top influential features that pushed the product to the backordered class. It is observed that lower values of inventory stock, products' quantity received, and source performance of the last 6 months and higher values of forecasts and sales pushed the output prediction to the non-backordered class. To interpret the results from a managerial perspective based on the beeswarm plot illustrated in Figure 7 , a product with low stock and high short-term and mid-term future demand will probably be backordered, since the inventory stock will not be able to satisfy the customers' demand, and at the same time, the expected quantity of products to be delivered to the inventory is also low (Figure 7) . Therefore, it is shown that an optimal management of an inventory system that can handle and prevent the forthcoming backorders of products incorporates: (i) accurate predictions on future demands of products, so appropriate decisions can be made on the inventory stock of the product and on product production on time; (ii) increase of the products' quantity in transit and/or decrease of the transit time by re-scheduling on time the transportation planning and logistics; (iii) the product performance, which means that if the product's quality satisfies the customers' requirements, the demand of this product is expected to be increased. Businesses target to increase their profit by retaining low production costs trying in parallel to provide quality service for customer satisfaction. An important part of the production costs is related to the inventory management system. Therefore, it is of high importance to effectively and accurately predict various issues that could occur, leading to additional costs and causing a negative impact on the inventory management system and business operation. One of these issues is product backorder. When a product is backordered, the production should be rescheduled in order to address the demand. This adds additional costs to the business operation. To deal with the backorder issue, this study considered two key aspects: (i) the development of an accurate prediction model for product backorder via a comparative evaluation of popular classifiers and model calibration, and (ii) a post-hoc analysis to explain and interpret the major contributing factors that lead to product backorder. Specifically, this study tackled the problem of predicting products that will be backordered in an inventory management system. This problem is usually evaluated as a highly imbalanced binary classification problem. Due to the large volume of data, an under-sampling approach was initially adopted to solve this issue. A machine learning pipeline, based on a comparative evaluation of eight popular classifiers, was then adopted, followed by a calibration process applied to the models with similar performance and an explainability analysis of the best-performing model. The results showed that four models achieved almost comparable performance based on AUC scores and other metrics (Table 3) . Specifically, the RF, XGB, LGBM, and BB models reached an AUC score of 0.95 ( Figure 2 ). These models were calibrated with the Platt's and Isotonic Regression methods. The LGBM model calibrated with the Isotonic Regression method presented a slightly better calibration for our data (Figure 4 ). For this model, post hoc explainability based on the SHAP model showed that the features most contributing to the prediction output of the model relevant to the current inventory level of the component were the quantity in transit and the short-term and mid-term sales quantity and forecast sales ( Figure 6 ). Backorders impact the costs that are linked to production, since the production schedule should be altered in order to deal with the demand of backordered products. Therefore, from the above analysis, it is shown that the decisions that will be made regarding the inventory stock of a product can significantly contribute to the optimal operation of an inventory management system. This decision should be made based on the volume of products that can be delivered, the imminent demand (sales), and the accurate prediction of the future demand. A limitation of this study is the use of resampling techniques to cope with imbalanced data. To this end, future work will include the use of Siamese neural networks that have proven effective in case of imbalanced datasets. Author Contributions: Conceptualization, C.N.; methodology, C.N., C.K. and S.M.; software, C.N., C.K.; validation, C.N., C.K. and S.M.; formal analysis, C.N., C.K. and S.M.; data curation, C.N. and C.K.; writing-original draft preparation, C.N. and C.K.; writing-review and editing, C.N., P.K. and S.M.; visualization, C.N. and C.K.; supervision, C.N.; project administration, S.M. and P.K. All authors have read and agreed to the published version of the manuscript. The dataset used in this study can be found at https://www.kaggle. com/c/untadta/data (accessed on 23 November 2021). Product Backorder Prediction Using Deep Neural Network on Imbalanced Data Prediction of Probable Backorder Scenarios in the Supply Chain Using Distributed Random Forest and Gradient Boosting Machine Learning Techniques Economic Order Quantity Model for Deteriorating Items with Planned Backorder Level Generalized EOQ Formula Using a New Parameter: Coefficient of Backorder Attractiveness. Symmetry Inventory Model for Deteriorating Items with Time and Reliability Dependent Demand and Partial Backorder Survey on Inventory Model of EOQ & EPQ with Partial Backorder Problems. Mater. Today Proc. 2019 Supply Chain Coordination with Variable Backorder, Inspections, and Discount Policy for Fixed Lifetime Products Inter-Dependent Lead-Time and Ordering Cost Reduction Strategy: A Supply Chain Model with Quality Control, Lead-Time Dependent Backorder and Price-Sensitive Stochastic Demand Derivation of Closed-Form Expression for Optimal Base Stock Level Considering Partial Backorder, Deterministic Demand, and Stochastic Supply Disruption Periodicity of World Crude Oil Maritime Transportation: Case Analysis of Aframax Tanker Market Quality Improvement and Backorder Price Discount under Controllable Lead Time in an Inventory Model Optimal Production and Inventory Control of Multi-Class Mixed Backorder and Lost Sales Demand Class Models On the Decomposition Property for a Dynamic Inventory Rationing Problem with Multiple Demand Classes and Backorder Quantifying Sustainable Control of Inventory Systems with Non-Linear Backorder Costs Optimum Ordering Policy for an Imperfect Single-Stage Manufacturing System with Safety Stock and Planned Backorder Multi-Objective Optimization of Hybrid Backorder Inventory Model Multi-Item Inventory Model with Variable Backorder and Price Discount under Trade Credit Policy in Stochastic Demand An Integrated Imperfect Production-Inventory Model with Optimal Vendor Investment and Backorder Price Discount The Effect of Human Errors on an Integrated Stochastic Supply Chain Model with Setup Cost Reduction and Backorder Price Discount Mitigating Partial-Disruption Risk: A Joint Facility Location and Inventory Model Considering Customers' Preferences and the Role of Substitute Products and Backorder Offers Two Backorder Compensation Mechanisms in Inventory Systems with Impatient Customers Optimum Sustainable Inventory Management with Backorder and Deterioration under Controllable Carbon Emissions Decision of a Fuzzy Inventory with Fuzzy Backorder Model Under Cloudy Fuzzy Demand Rate A Study of a Backorder EOQ Model for Cloud-Type Intuitionistic Dense Fuzzy Demand Rate A Periodic Review Inventory Model with Controllable Lead Time and Backorder Rate in Fuzzy-Stochastic Environment A Computational Model for Determining Levels of Factors in Inventory Management Using Response Surface Methodology A Novel Modeling Approach for a Capacitated (S,T) Inventory System with Backlog under Stochastic Discrete Demand and Lead Time Solving Order Planning Problem Using a Heuristic Approach: The Case in a An Un-Supervised Approach for Backorder Prediction Using Deep Autoencoder A Profit Function-Maximizing Inventory Backorder Prediction System Using Big Data Analytics Backorder Prediction Using Machine Learning For Danish Craft Beer Breweries Feature Selection Using ModifiedBoostARoota and Prediction of Heart Diseases Using Gradient Boosting Algorithms Sepsis Prediction in Intensive Care Unit Using Ensemble of XGboost Models A Random Forest Guided Tour A Brief Review of Nearest Neighbor Algorithm for Learning and Classification|IEEE Conference Publication|IEEE Xplore Deep Convolutional Neural Networks for Image Classification: A Comprehensive Review Applied Logistic Regression Actively Balanced Bagging for Imbalanced Data Support Vector Machines for Classification Imbalance-XGBoost: Leveraging Weighted and Focal Losses for Binary Label-Imbalanced Classification with XGBoost LightGBM: A Highly Efficient Gradient Boosting Decision Tree On the Effect of Calibration in Classifier Combination Improving the Decision Support in Diagnostic Systems Using Classifier Probability Calibration On the Appropriateness of Platt Scaling in Classifier Calibration Smooth Isotonic Regression: A New Method to Calibrate Predictive Models Binary Classifier Calibration Using an Ensemble of Near Isotonic Regression Models Analysis of Regression in Game Theory Approach On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation Algorithmic Transparency via Quantitative Input Influence: Theory and Experiments with Learning Systems We would like to thank the authors for the public availability of the dataset (https://www.kaggle.com/c/untadta/data (accessed on 1 October 2021)) used in this study. The authors declare no conflict of interest.