authors: Hasan, Md Rashidul; Kabir, Muntasir A; Shuvro, Rezoan A; Das, Pankaz title: A Comparative Study on Forecasting of Retail Sales date: 2022-03-14

Predicting product sales of large retail companies is a challenging task given the volatile nature of trends, seasonalities, and events, as well as unknown factors such as market competition, changes in customer preferences, or unforeseen events, e.g., the COVID-19 outbreak. In this paper, we benchmark forecasting models on historical sales data from Walmart to predict future sales. We provide a comprehensive theoretical overview and analysis of state-of-the-art time-series forecasting models, and then apply these models to the forecasting challenge dataset (M5 forecasting by Kaggle). Specifically, we use a traditional model, namely ARIMA (Autoregressive Integrated Moving Average), and recently developed advanced models, e.g., the Prophet model developed by Facebook and the light gradient boosting machine (LightGBM) model developed by Microsoft, and benchmark their performances. Results suggest that the ARIMA model outperforms the Facebook Prophet and LightGBM models, while the LightGBM model achieves a substantial computational gain on the large dataset with negligible compromise in prediction accuracy.

Large retail companies like Walmart, Costco, Amazon, and Target have a unique business model in which they sell their own products and competitors' products from the same store, either in-store or online. In some cases, their own products compete against a third-party product. These companies warehouse massive volumes of data, which they collect from multiple streams (transactions, events, and inventory, to name a few). Companies like Amazon and Walmart provide third-party retailers with analytic support based on the time-series data they acquire for each product and category, and they also gain valuable insights from the data to maximize their business gain. They process a tremendous number of transactions across multiple products each day. Analyzing such a large volume of data and extracting meaningful insights is always a challenge. However, with the advent of cloud computing in recent years, analyzing these gigantic datasets in a short time frame to solve business problems, such as predicting future sales and demand or recommending products, has become key to success for every company. As companies scale, forecasting has become an integral and critical part of the business value chain. For example, Walmart can use its historical data to predict sales for events such as back-to-school, Halloween, or Christmas. Moreover, it can use its sales data to maintain appropriate inventory for perishable products like dairy, bakery, and frozen foods. There are certain key aspects to time-series data, namely, trend, seasonality, and noise. For example, during the COVID-19 outbreak in March 2020, people stockpiled necessities, e.g., bath tissue, which generated a significant sales spike in March that eased in the following months. If these sudden spikes in sales are not identified as noise, models might predict significant demand for bath tissue at the same time next year, which would be catastrophic for business and inventory. Furthermore, certain products show strong seasonalities. For example, Christmas tree sales spike during Christmas, not every month of the year.
On the other hand, some products may show growth or decline in demand. For example, movie DVD sales are declining due to online streaming services, while organic produce sales are increasing. If these trends, whether growth or decline, are not spotted in time, they may create scarcity (or surplus) in inventory in the future. Hence, the ability to analyze product data and effectively translate the analytics into business decisions has always been a pivotal strength of successful conglomerates, especially in the current age. In this paper, we use time-series data obtained from the M5 forecasting-accuracy competition Kaggle to predict future sales of Walmart products for the subsequent 28 days. The dataset was prepared using store sales in three different geographic locations in the United States (Midwest: Wisconsin, West: California, and South: Texas). The data includes item-level, department, product-category, and store details Kaggle. The dataset includes 5 years of product sales data, including variables such as price, promotions, day of the week, and special events. We classify the models used for our analysis into three types: (1) statistical models, e.g., Autoregressive Integrated Moving Average (ARIMA) Box et al. [2015]; (2) gradient-boosting-based models, e.g., the light gradient boosting machine (LightGBM) Ke et al. [2017]; and (3) additive models, such as the Facebook Prophet model Taylor and Letham [2017], which are described below.

ARIMA is the natural extension of the autoregressive and moving average models Box et al. [2015]. Among its features and advantages, ARIMA allows a non-stationary series to be reduced to a stationary one by taking a sequence of differences, and differencing can be applied repeatedly until stationarity is reached. This makes ARIMA robust across datasets and a standard benchmark against which other forecasting algorithms are compared. By combining the AR order, the degree of differencing, and the MA order, ARIMA becomes flexible and robust for forecasting time-series data. In addition, adding a seasonal component to the ARIMA model yields an extension of the base model that directly supports modeling of seasonal components.

Here we also apply Facebook Prophet, which is specifically designed to forecast time-series data. It has so far been a good competitor to other time-series models for several reasons. First, since it uses a generalized additive model, it can effortlessly incorporate different variables, such as national holidays or events, event types, and daily, weekly, biweekly, monthly, quarterly, and yearly trends. In this analysis, we engineered features from these variables and applied them to the Facebook Prophet model. Second, Facebook Prophet can handle missing data and outliers by fitting trends, seasonality, and other dummy variables. This property distinguishes the Facebook Prophet model from many existing time-series models. Accordingly, we applied the Facebook Prophet model to the Walmart sales dataset with feature engineering and without worrying about missing data.

Finally, LightGBM works as an out-of-the-box algorithm for the given dataset for two reasons. First, we found that the one-month sales data we are predicting are not significantly different from the training data, which consist of nearly 5.3 years of sales. This indicates a natural buying pattern of Walmart customers, which does not vary significantly on average at the specific item level (here we have 30,490 items). Second, we created several important features (described in Section 5.2) to capture the sales pattern of an item on a weekly, biweekly, monthly, bi-monthly, and quarterly basis. In addition, we capture special big events (e.g., recreational, religious, cultural, national) with two event-related features. We emphasize that this feature creation is a novel contribution for this type of sales dataset, which enables us to use tree-based methods for time-series prediction. However, the creation of many features prohibits the use of traditional tree-based algorithms, such as random forest, due to their computational inefficiency with large datasets. Hence, we used LightGBM to overcome these computational complexity issues (a minimal sketch of this pipeline follows).
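To make the feature-engineering idea concrete, the following is a minimal Python sketch of lag and rolling-window demand features feeding a LightGBM regressor. The DataFrame layout, column names, window sizes, and hyperparameters are illustrative assumptions, not the exact configuration used in this paper.

```python
# A minimal sketch, assuming a long-format DataFrame `df` with columns
# "id" (item-store identifier), "date", and "sales"; all names, windows,
# and hyperparameters below are illustrative, not the paper's exact setup.
import lightgbm as lgb
import pandas as pd

def add_demand_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add per-item lagged sales and rolling-mean features."""
    df = df.sort_values(["id", "date"])
    for lag in [7, 14, 28]:  # weekly, biweekly, monthly lags
        df[f"lag_{lag}"] = df.groupby("id")["sales"].shift(lag)
    for window in [7, 14, 30, 60, 90]:  # weekly .. quarterly rolling means
        # shift by 28 first so the features stay valid over the 28-day horizon
        df[f"rmean_{window}"] = df.groupby("id")["sales"].transform(
            lambda s: s.shift(28).rolling(window).mean()
        )
    return df

df = add_demand_features(df)
features = [c for c in df.columns if c.startswith(("lag_", "rmean_"))]
train = df.dropna(subset=features)

model = lgb.LGBMRegressor(
    objective="tweedie",      # a common choice for zero-inflated sales counts
    n_estimators=1000,
    learning_rate=0.05,
)
model.fit(train[features], train["sales"])
```

Because every row carries its own lag and rolling features, a tree ensemble can be trained on all 30,490 item series at once, which is where LightGBM's computational advantage over random forest becomes decisive.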
The paper is organized as follows. A review of the literature is given in Section 2. In Section 3, we explore the dataset, and we introduce the methods for modeling the data in Section 4. We prepare the data for modeling in Section 5 and show the prediction results of applying the models in Section 6. We discuss the results and conclude the paper in Sections 7 and 8, respectively.

Forecasting is a widely used and studied topic in both academia and industry. Accurate forecasting is extremely important in industries such as retail, where the uncertainty of future product sales can vary abruptly. A comprehensive survey on time-series forecasting, especially prediction methodologies, can be found in Mahalakshmi et al. [2016]. The general approaches described for time-series forecasting are regression methods (single-variable and multiple), stochastic forecasting techniques (e.g., support vector machines (SVM)), soft-computing-based forecasting (e.g., artificial neural networks (ANN)), and fuzzy-based forecasting (e.g., fuzzy C-Means (FCM) with ANN). These methods are used for forecasting electricity loads and prices, trend and seasonality forecasting, stock selection and portfolio construction, etc. Aras et al. compared single methods with ensemble techniques Aras et al. [2017]. Each forecasting method has its advantages and can work better with a specific dataset; e.g., a linear model works better if the data are linear. Among time-series forecasting models, the Facebook Prophet model has so far attracted considerable attention from researchers in diverse fields of study. Yenidogan et al. Yenidogan et al. [2018] compared the Prophet model with ARIMA for forecasting Bitcoin and showed that the Prophet model outperforms the ARIMA model. The Facebook Prophet model has also found interesting use in research on environmental phenomena, such as ground-level ambient fine particulate matter (PM2.5) concentrations Zhao et al. [2018]. The authors in Zhao et al. [2018] detected weekly and monthly trends in PM2.5 concentrations between 2007 and 2015 for 220 monitoring stations in the United States. Aguilera et al. Aguilera et al. [2019] predicted daily groundwater levels (GWL) for seasonal water management and showed that the Prophet model outperforms most of the relevant methods used so far. Scholars are also using Long Short-Term Memory (LSTM) networks for predicting stock market time series. Predicting stock market prices is always challenging, and Facebook Prophet does not perform well compared to various neural network models Mohan et al. [2019].
However, Fang et al. Fang et al. [2019] suggested using LSTM and Prophet to predict the trend and then using an inverse neural network to predict the time series, which works better than existing time-series models. Another interesting use of the Prophet method is predicting expected meal counts to plan, buy, and produce food for an organization's staff. Yurtsever and Tecim Yurtsever and Tecim [2020] demonstrated such an application of this model and argued for its accuracy and simple use. Wang et al. Wang et al. [2020], using machine learning and the Facebook Prophet method, predicted that the recent pandemic due to the COVID-19 virus would peak in late October. Finally, there has been very little work on predicting time series using tree-based algorithms. This is mostly due to the fact that tree-based algorithms produce regression predictions by assigning the average of the leaf (i.e., the average of the training data belonging to that leaf) to the data point being predicted James et al. [2013]. Hence, they cannot capture the trend in the data well, which is very common in time-series data. To the best of our knowledge, the traditional ARIMA model and the tree-based random forest were used for predicting outbreaks of avian influenza (H5N1) in poultry populations in Egypt Kane et al. [2014]. It was shown that random forest achieves better performance than the ARIMA model for retrospective and prospective predictions. This is because random forest was able to capture the nonlinear relationship between lagged values and predicted values, as well as the upward shocks of the avian influenza outbreaks. Motivated by this fact, in this paper we use LightGBM, a tree-based gradient boosting method, for Walmart sales prediction.

In this section, we explore the M5 forecasting data (estimating the unit sales of Walmart retail goods) provided by Walmart, which can be accessed from Kaggle Kaggle. We have used the M5 dataset for this work, which is a grouped time series of Walmart unit sales across ten stores located in three different states in the United States (CA, TX, and WI). In this dataset, 3,049 products are classified into 3 product categories (hobbies, foods, and household) and 7 different product departments, which can be visualized in figure 1. We perform both univariate and multivariate analyses; these analyses also depend on whether the variables are categorical or continuous. Most of the analyses and findings are based on a series of questions and answers that arise from examining the data.

There are two kinds of decomposition models: additive and multiplicative. The additive and multiplicative models are described as follows, respectively:

$y(t) = l(t) + d(t) + s(t) + e(t)$, and $y(t) = l(t) \times d(t) \times s(t) \times e(t)$,

where $l(t)$ is the level, $d(t)$ is the trend, $s(t)$ is the seasonality, and $e(t)$ is the noise. In a multiplicative time series, the components multiply together to make the time series; i.e., if there is an increasing trend, the amplitude of the seasonal activity increases. This is appropriate for our sales data, since an increase in the number of Walmart sales would also increase the seasonal sales. The three components are shown separately in the bottom three panels of figure 2. These components can be combined to reconstruct the data shown in the top panel (the original time series). We also notice that the trend in the data is not very strong.
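As an illustration of this decomposition, the following minimal Python sketch uses the classical decomposition in statsmodels; the series name `sales` (assumed to be a daily pandas Series of total unit sales indexed by date) and the weekly period are illustrative assumptions.

```python
# A minimal sketch, assuming `sales` is a pandas Series of daily unit sales
# indexed by date; the weekly period is an illustrative choice.
from statsmodels.tsa.seasonal import seasonal_decompose

result = seasonal_decompose(sales, model="multiplicative", period=7)
trend, seasonal, resid = result.trend, result.seasonal, result.resid
result.plot()  # observed series on top; trend, seasonal, residual panels below
```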
In figure 3.1, we notice that the average difference in sell prices between normal days and event days is small. This suggests that, on average, events do not drive sell prices. We then calculate the average price of products on normal days and event days and the price differences between them. We find that during events, prices are discounted for 20% of products and increased for 26% of products. Although the average sell price remains almost the same between normal and event days, at the product level, sell prices are driven by events. Further investigation of the data reveals that Food 3, Household 2, and Household 1 are the departments with the most discounts. From figure 3.2, we can see that the foods category accounts for the most sales, while hobbies accounts for the least. Next, we show the sales on different weekdays. Intuitively, product sales are higher on weekends than on weekdays, which can be verified from figure 3.3. We then plot the prices of product categories over time, including confidence intervals, in figure 3.4. We notice that prices vary most in the hobbies category; for the food and household categories, the price variation is negligible compared to the hobbies category.

In this section, we briefly describe the models (i.e., ARIMA, Facebook Prophet, and LightGBM) that we use extensively for forecasting unit sales. The ARIMA modeling procedure was introduced in a pioneering study conducted by Box et al. in 1970 Box et al. [2015]. The ARIMA model considers three different components of historical data: autoregressive terms, moving average terms, and differencing terms. These components are often specified as ARIMA(p, d, q), meaning the model uses p autoregressive terms, d differences, and q moving average terms. The ARIMA model is based on identifying the structure of the autocorrelation function (ACF) in the data. A classical regression model is often insufficient for explaining all of the interesting dynamics of a time series. For example, an ACF of the residuals of a simple linear regression may reveal additional structure in the data that the regression model cannot capture Shumway et al. [2000]. Instead, the introduction of correlation as a phenomenon that generates lagged linear relations led to the development of the autoregressive (AR) and moving average (MA) models. Box and Jenkins Box et al. [2015] added a non-stationary component to the mix of AR and MA models, which led to the widely used ARIMA model. The ARIMA methodology includes an iterative three-step model-building process: model identification, parameter estimation, and diagnostic checking. The ARIMA model can be expressed by the following equation:

$y'_t = \phi_1 y'_{t-1} + \dots + \phi_p y'_{t-p} + e_t + \theta_1 e_{t-1} + \dots + \theta_q e_{t-q}$,

where $y'_t$ and $e_t$ are the differenced series and the random error at time $t$, respectively. Moreover, $\phi$ and $\theta$ are the parameters of the ARIMA model.
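As a concrete illustration of the three-step procedure, here is a minimal Python sketch using statsmodels; the series `sales` and the (1, 1, 1) order are assumptions for illustration, not the order selected in this study.

```python
# A minimal sketch, assuming `sales` is a daily pandas Series for one item;
# the (1, 1, 1) order is illustrative, not the order selected in the paper.
from statsmodels.graphics.tsaplots import plot_acf
from statsmodels.tsa.arima.model import ARIMA

plot_acf(sales.diff().dropna())        # identification: inspect the ACF of the differenced series
model = ARIMA(sales, order=(1, 1, 1))  # p=1 AR terms, d=1 difference, q=1 MA terms
fitted = model.fit()                   # estimation
print(fitted.summary())                # diagnostic checking: coefficients, AIC, residual tests
forecast = fitted.forecast(steps=28)   # predict the next 28 days, as in the M5 task
```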
In the recent era of e-commerce forecasting, Taylor and Letham Taylor and Letham [2017] introduced a new time-series forecasting method based on a generalized additive model. This method is well known as the Facebook Prophet model for business time-series forecasting. The model has been used extensively in recent years for data with strong seasonality, and it can handle missing data easily. The model incorporates three important components: trend, seasonality, and holidays. In its simplest form, the model can be expressed using the following equation:

$y(t) = g(t) + s(t) + h(t) + e(t)$,

where $g(t)$ represents the trend function that models non-periodic effects in the value of the time series; $s(t)$ depicts the seasonality function (weekly, monthly, and yearly); and $h(t)$ represents the effects of holidays over the entire year. The error term $e(t)$ is assumed to be normally distributed. For forecasting the trend, Taylor and Letham Taylor and Letham [2017] implemented two models: a saturating growth model and a piecewise linear model. A growth trend is typically modeled by logistic growth using the following equation:

$g(t) = \dfrac{C}{1 + \exp(-k(t - m))}$,

where $C$ is the carrying capacity, $k$ is the growth rate, and $m$ is an offset parameter. Since many business settings do not have a constant carrying capacity, $C$ can be replaced by $C(t)$. Likewise, the growth rate is not always constant, for example due to new products entering the market. Prophet allows $S$ change-points at times $s_j$, $j = 1, 2, \dots, S$, and adjusts the growth rate via a vector $\delta \in \mathbb{R}^S$, where $\delta_j$ is the change in rate that occurs at time $s_j$. The rate at time $t$ is then $k + \sum_{j: t > s_j} \delta_j$, where $k$ is the base rate. This can be expressed compactly by introducing a vector $a(t) \in \{0, 1\}^S$ such that

$a_j(t) = \begin{cases} 1, & \text{if } t \geq s_j, \\ 0, & \text{otherwise}, \end{cases}$

so that the rate is $k + a(t)^T \delta$. When the rate $k$ is adjusted, the offset parameter $m$ also needs to be adjusted to connect the endpoints of the segments. The correct adjustment at change-point $j$ is

$\gamma_j = \Big(s_j - m - \sum_{l<j} \gamma_l\Big)\Big(1 - \dfrac{k + \sum_{l<j} \delta_l}{k + \sum_{l \leq j} \delta_l}\Big)$.
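To tie the trend formulation back to practice, the following is a minimal Python sketch using the prophet package's logistic-growth mode; the DataFrame `df` (with Prophet's expected "ds" and "y" columns), the constant capacity, and the number of change-points are illustrative assumptions.

```python
# A minimal sketch, assuming `df` has Prophet's expected "ds" (date) and
# "y" (sales) columns; the constant capacity of 500 is purely illustrative.
from prophet import Prophet

df["cap"] = 500                                    # carrying capacity C(t), held constant here
m = Prophet(growth="logistic", n_changepoints=25)  # S = 25 potential change-points s_j
m.fit(df)

future = m.make_future_dataframe(periods=28)       # extend 28 days, as in the M5 task
future["cap"] = 500
forecast = m.predict(future)                       # yhat combines g(t), s(t), and h(t)
```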