key: cord-0878013-6a8eb5zu
authors: Zuokas, Danas; Gul, Evren; Lim, Alvin
title: How did COVID-19 change what people buy: Evidence from a supermarket chain
date: 2022-05-05
journal: Journal of Retailing and Consumer Services
DOI: 10.1016/j.jretconser.2022.103010
sha: 169164f4509f0847b0d1dc86272c9da820d6d31d
doc_id: 878013
cord_uid: 6a8eb5zu

This research takes a retrospective view of the COVID-19 pandemic and attempts to accurately measure its impact on sales of different product categories in grocery retail. In total, 150 product categories were analyzed using the data of a major supermarket chain in the Netherlands. We propose to measure the pandemic impact by excess sales: the difference between actual and expected sales. We show that the pandemic impact is twofold: (1) there was a large but brief increase of 30.6% in excess sales associated with panic buying across most product categories within a two-week period; and (2) people spending most of their time at home due to imposed restrictions resulted in an estimated 5.4% increase in total sales lasting as long as the restrictions were active. The pandemic impact on different product categories varies in magnitude and timing. Using time series clustering, we identified eight clusters of categories with similar pandemic impacts. Using the clustering results, we project that product categories used for cooking, baking or meal preparation in general will have elevated sales even after the pandemic.

It is hard to overestimate the impact of the COVID-19 pandemic on the retail industry. Data provided by the statistical office of the European Union (Eurostat) show that in the 27 European Union countries the year-over-year decrease in retail trade (except that of motor vehicles and motorcycles) was 6.9% in March, 17.5% in April and 2.2% in May of 2020. This drop was mostly driven by non-food products (including fuel).
In contrast, retail sales of food, beverages and tobacco increased by 8.3% in March, 0.8% in April and 4.7% in May, compared to the same periods in 2019. This increase stayed at 3.0% on average from March 2020 to February 2021. These changes were mainly caused by various governmental measures to control the pandemic and by the change in people's shopping behavior.

At the very start of the pandemic, as the number of daily new confirmed cases and deaths rose, governments responded by banning public gatherings, closing educational institutions, and canceling international flights. People reacted by rushing to the stores. Probably the most iconic image of such behavior is a shopping cart piled up with packs of toilet paper. Panic buying received huge interest among researchers in various sci- [...] social isolation (Hwang, Rabheru, Peisah, Reichman, & Ikeda, 2020), among others.

With all these changes in everyday life, food consumption habits had to change too. People had to think more about how they would get their next meal, relying on food delivery services (Tandon, Kaur, Bhatt, Mäntymäki, & Dhir, 2021) or preparing their own meals (Bennett, Young, Butler, & Coe, 2021). To avoid crowds, consumers chose to order goods online (Baarsma & Groenewegen, 2021) or took various precautionary measures when they visited brick-and-mortar stores (Wang, Xu, Schwartz, Ghosh, & Chen, 2020). All this led to a change in the consumption of various product categories.

With this research we try to estimate this change. First, time series models are fitted to sales data ending just before the start of the pandemic. These models are used to estimate expected future sales. We subtract these values from actual future sales to get the excess sales due to the impact of the pandemic. This summarizes the main methodological contribution of our paper: to provide a framework to measure the pandemic impact without pandemic data acting as exogenous information.
This contrasts with the approach where various pandemic metrics (the number of new daily cases, dummy variables indicating the lockdown period, etc.) act as exogenous variables in a regression model for sales. Studies using similar and alternative approaches are reviewed, and arguments supporting our research strategy are presented, in Section 2. Section 3 provides the details on the time series models and presents the estimated impact of the pandemic for 150 product categories of the supermarket chain. To summarize the results of the pandemic impact on all product categories, we group them using data clustering and provide explanations of the clustering results. Clustering methods and the distinct clusters themselves are presented in Section 4. Thus, we contribute to the existing literature by providing an extensive reference for researchers who are interested in consumer behavior during the COVID-19 pandemic. Finally, in Section 5, we review the benefits of our research for business practitioners. We also assess the potential limitations of our strategy and consider some future directions of research to extend the current work.

[...] behavior, household expenditure, and retail sales. No matter how diverse, they can be grouped with regard to the data sets and methods used. One class of research, e.g., Laato, Islam, Farooq, and Dhir (2020), Omar, Nazri, Ali, and Alam (2021), Eger, Komárková, Egerová, and Mičík (2021), uses respondent survey data and structural equation modeling (a well-established framework in social sciences with rigorous procedures) to investigate how consumers' latent (not measured directly), inner (psychological) factors influence their purchasing behavior. In contrast, our study does not look for the motives behind the changed purchasing behavior but detects the behavioral change itself using historical purchase data.
Another set of studies analyzes consumer spending using purchase transaction data (basically payments by card, cash withdrawals and incoming money flows) across different expenditure categories, e.g., travel, groceries, restaurants, etc. Transaction data of 760 thousand Danish households were used in Andersen, Hansen, Johannesen, and Sheridan (2020) to compare the pre-pandemic year-over-year increase in spending to that after the onset of the pandemic in order to estimate the pandemic's impact. A similar estimation of pandemic impact is used in Carvalho et al. (2020), wherein 2.1 billion transactions from 1.6 million distinct points of sale in Spain were examined. Daily transaction data in 214 Chinese cities were examined in Chen, Qian, and Wen (2020) to estimate the impact of the pandemic by fitting a linear regression model with a dummy variable indicating the pandemic time period.

Two articles analyzed data similar to ours and had similar research objectives: to measure the pandemic impact on various grocery product categories. Vall Castelló and Lopez Casasnovas (2021) investigated the sales data of a Catalan supermarket chain with 11% market share and reported sales changes in 12 food product categories. To capture the effect of newly confirmed COVID-19 cases on sales, they included it as a variable in the linear regression model. O'Connell, Áureo de Paula, and Smith (2021) analyzed purchase data on 138 categories of fast-moving consumer goods from 17 thousand UK households and grouped the categories into four groups. As the grouping procedure is not stated in the paper, we suppose that the authors manually assigned each category to a group. In contrast, we use a data-driven approach to group product categories. Product categories were assigned to three groups in Jung and Sung (2017) [...]

Previous studies show that to measure the pandemic impact on sales, researchers take two directions.
The first is (auto)regression modeling with pandemic explanatory variables. Including a pandemic time period indicator allows the capture of a change in the average sales level. Such models might not be capable of inferring other (more dynamic) forms of pandemic impact. Another common model specification is to introduce the number of confirmed cases, which captures the pandemic impact with a multiplier effect. This strategy is limiting in explaining sales dynamics of various product categories beyond a scale on the same data.

[...] is compared to that from January 1 to February 15 (normal). Excess spending is computed as the year-over-year change in daily spend. A consistent approach is taken in Panzone, Larcom, and She (2021), where 12 seasonal ARIMA models are fitted on sales data. Then, the difference between actual and expected sales values is computed, with expected values coming from forecasts given by the best model. Our method is similar to this approach, but we improve on it by using three more time series modeling alternatives besides ARIMA and combining the forecast results of the models into one final forecast.

Such an approach has the advantage over regression modeling in that a researcher does not have to explicitly specify how the pandemic affects sales. Consequently, the only concern left is accurate time series forecasting, and the methods for this task are well developed and established. On the condition that forecasts are accurate, this allows us to obtain the pandemic impact in its pure form. On the other hand, it does not allow the capture of the effects of different factors contributing to the total pandemic impact. Also, it can be applied only to a relatively short time period because forecast accuracy decreases as the forecast horizon grows.

Forecasting is essential in conventional retail analytics. It finds its uses in product pricing, promotion planning, marketing, assortment modeling and other operational areas.
A comprehensive and conceptual overview of the research literature on forecasting in retail can be found in Fildes, Ma, and Kolassa (2019). It shows that for a broad range of different tasks and time series data, various statistical or machine learning methods are often applied.

Before presenting the methods that we applied in our study, let us first briefly describe the data that we used. We had access to four years of sales data of a supermarket chain in the Netherlands representing over 20% market share in 2020. We analyzed 150 categories (most of them food and drinks, but also home, body, baby, and pet care) that constitute 98.9% of 2020 total sales. Time series were aggregated to four-week periods, and the logarithmic transformation was used, a common practice in econometric modeling.

[...]
• Exponential smoothing; and
• ARIMA models.

They are all suitable for our analysis. First, they account for trend and seasonality, which are present in most sales time series. Second, they are easy to implement, do not require a heavy computational infrastructure and can select the best model in an autonomous manner if needed. This is important because we are dealing with 150 sales time series and do not have the resources to fine-tune each model separately. Next, we describe how we apply each of the listed classical methods in our case.

Time series decomposition. Separately for each category, using additive decomposition, the value of the logarithm of sales, $y_t$, is split into a trend component $T_t$, a seasonality component $S_t$ and a remainder component $R_t$. The seasonality component is estimated using moving averages. Then, a linear median regression model is fitted on the seasonally adjusted component, $y_t - \hat{S}_t$, using a linear trend and the logarithm of the total selling area as explanatory variables.
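The seasonal adjustment step described above can be illustrated with a minimal sketch. It assumes an odd number of seasons per year (13 four-week periods), a centered moving average as the rough trend estimate, and seasonal indices obtained by averaging the detrended values season by season. The function names are hypothetical, and the median regression on the adjusted series is not included; the paper's actual implementation may differ in these details.

```python
def seasonal_component(y, m=13):
    """Estimate additive seasonal indices using a centered moving average.

    y is a list of (log) sales values, m the number of seasons (odd here).
    """
    if m % 2 == 0:
        raise ValueError("this sketch handles an odd number of seasons only")
    if len(y) < 2 * m - 1:
        raise ValueError("need at least 2*m - 1 observations")
    half = m // 2
    # Detrend interior points with a centered moving average of window m.
    detrended = {}
    for t in range(half, len(y) - half):
        trend_t = sum(y[t - half:t + half + 1]) / m
        detrended[t] = y[t] - trend_t
    # Average the detrended values separately for each season.
    seasonal = []
    for s in range(m):
        vals = [d for t, d in detrended.items() if t % m == s]
        seasonal.append(sum(vals) / len(vals))
    # Center the indices so that they sum to zero over one cycle.
    mean_s = sum(seasonal) / m
    return [s - mean_s for s in seasonal]


def seasonally_adjust(y, m=13):
    """Return y_t minus the estimated seasonal index of period t."""
    s_hat = seasonal_component(y, m)
    return [v - s_hat[t % m] for t, v in enumerate(y)]
```

A trend model, such as the median regression mentioned in the text, would then be fitted to the output of `seasonally_adjust`.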
The [...]

An $h$-step forecast $\hat{y}_{t+h|t}$ at time period $t$ is computed via the following recursive relations:

$$
\begin{aligned}
\hat{y}_{t+h|t} &= \ell_t + h b_t + s_{t+h-m(k+1)},\\
\ell_t &= \alpha (y_t - s_{t-m}) + (1-\alpha)(\ell_{t-1} + b_{t-1}),\\
b_t &= \beta (\ell_t - \ell_{t-1}) + (1-\beta) b_{t-1},\\
s_t &= \gamma (y_t - \ell_{t-1} - b_{t-1}) + (1-\gamma) s_{t-m},
\end{aligned}
$$

where $\ell_t$, $b_t$ and $s_t$ are the level, trend and seasonal components, $m$ denotes the number of seasons, $k$ is the integer part of $(h-1)/m$, and $\alpha$, $\beta$ and $\gamma$ are smoothing parameters (see https://otexts.com/fpp2/holt-winters.html for more details).

ARIMA models. To be exact, SARIMAX models are used, where S stands for seasonality and X for external regressors. Taking different specifications, a whole set of models of the same family is fitted, and the best model is selected based on the value of the AICc information criterion (AIC with small-sample correction). The model specification varies by taking different combinations of the major parameters: the (seasonal) order of the auto-regressive and moving-average parts and the (seasonal) integration order.

To measure out-of-sample forecast accuracy, we fit a model using data that ends one year prior to the onset of the pandemic. Then, one-year forecast values are subtracted from the actual values. Any aggregate characteristic of these differences (mean absolute error, root-mean-square error) is appropriate to rank the models and choose the best.

It is now a common practice, especially in machine learning, to use ensemble learning techniques (see Polikar (2009) for a short introduction). Very often ensemble models perform better than separate models because the forecast errors of these models cancel each other out. However, at times, these techniques are computationally more expensive than the individual models themselves. We have tried one of the well-known and simple to implement ensemble learning methods, [...]

We begin this section with the results on forecasting accuracy. For each of the 150 product categories, four time series models were fitted using 2 years of data ending in four-week [...]

Inspecting the performance of individual models, it might be tempting to exclude the aHW model from averaging.
Nevertheless, AVG4 proved to be more accurate. This means there are product categories for which aHW is more accurate than the rest of the models (this is a good example of model diversification). Also, bagging does not bring a significant improvement in accuracy. Therefore, we chose the average of all four model forecasts as the final forecast value, which also does not require the additional computational resources that bagging does.

Next, forecasting results for several product categories are presented in Figure 1. We chose categories with different MAPE values: 1.5% for cat food, 3.1% for bananas, 4.9% for baby diapers and 6.6% for cereal. Models are fitted using data that ends in period 2 of [...]

Percentage pandemic impact is calculated as the difference between the actual and forecast values divided by the forecast value. We divide by the forecast value because it represents "normal" sales, while the actual value represents "normal" plus excess sales caused by the pandemic. Therefore, pandemic impact is measured as a percentage increase over "normal" sales. The first aggregation period of the pandemic covers two weeks before and two weeks at the start of the pandemic. Therefore, for this period the value of two-week excess sales is multiplied by two to compare it to four weeks of "normal" sales. Our choice of such aggregation was deliberate, as we wanted to separate panic buying effects.

Remark 1. We estimate the panic buying effect at 30.6% of excess sales for the two-week panic buying period from March 11 to March 24 of 2020. After the panic buying period, excess sales values vary around 5%, except for a period between the first and the second wave of the pandemic when strict pandemic control measures were loosened. During this period, the average excess sales value is 1.5%.

Remark 2. We estimate the pandemic impact on total sales at 5.4% of excess sales over the entire pandemic period, excluding the panic buying period.
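The percentage impact calculation described above can be sketched as follows. This is one reading of the text, not the authors' code: it assumes sales are on the original (not logarithmic) scale, and that all excess sales in the first four-week period occurred in its two pandemic weeks, so the excess is doubled before dividing by four weeks of forecast sales. The function names and the toy numbers are hypothetical; the numbers were chosen only so that the outputs match the headline percentages.

```python
def percentage_impact(actual, forecast):
    """Excess sales as a percentage of expected ("normal") sales."""
    return 100.0 * (actual - forecast) / forecast


def panic_period_impact(actual_4w, forecast_4w):
    """Impact for the first four-week period, where only two weeks are affected.

    The two-week excess is multiplied by two so that it is comparable
    to four weeks of "normal" sales (assumption: no excess in the two
    pre-pandemic weeks of the period).
    """
    excess = actual_4w - forecast_4w
    return 100.0 * 2 * excess / forecast_4w


# Toy values, not taken from the paper's data.
print(panic_period_impact(1153, 1000))  # 30.6
print(percentage_impact(1054, 1000))    # 5.4
```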
We have already seen in Figure 1 pandemic impacts of different magnitudes and various dynamics. It is reasonable to expect, however, that there are categories with similar impact. It might be that sales of some categories were more alike even before the start of the pandemic and also have similar impact. Conversely, it could be that some categories reacted in a similar fashion to the pandemic while having quite different sales dynamics prior to it. In this situation, a data-driven approach might work best to discover pandemic impact patterns, and we employ clustering techniques to achieve this.

[...] formally it is defined as follows:

$$D(S_1, S_2) = f\big(\mathrm{cort}(S_1, S_2)\big) \cdot \delta_{conv}(S_1, S_2),$$

where $\delta_{conv}(S_1, S_2)$ is a conventional distance measure between the raw values of two time series $S_1$ and $S_2$, and $\mathrm{cort}(S_1, S_2)$ is a correlation coefficient of time series changes (temporal correlation):

$$\mathrm{cort}(S_1, S_2) = \frac{\sum_{t=1}^{p-1} (S_{1,t+1} - S_{1,t})(S_{2,t+1} - S_{2,t})}{\sqrt{\sum_{t=1}^{p-1} (S_{1,t+1} - S_{1,t})^2}\,\sqrt{\sum_{t=1}^{p-1} (S_{2,t+1} - S_{2,t})^2}},$$

with $p$ the length of the series. The function $f(x) = 2/(1 + \exp(kx))$ modulates (via the parameter $k \geq 0$) the contribution of temporal correlation to the total value of the dissimilarity index. We chose $k = 1$, which gives a more uniform contribution over the whole range of $\mathrm{cort}(S_1, S_2)$ values. Euclidean distance was chosen as the conventional distance measure.

The final step in hierarchical clustering is to select an algorithm by which time series are joined into clusters, and we chose the agglomerative approach. Initially each element is a cluster of its own. Then, repeatedly, the two most similar clusters are merged until all objects end up in one cluster. This process is represented by a dendrogram. There are several methods for merging two clusters, and we chose Ward's method (Murtagh & Legendre, 2014). Using this method, at each agglomeration step, two clusters are selected to merge so that after merging the total within-cluster variance increases by the smallest amount.

Next, the results of clustering 150 product categories using percentage pandemic impact are presented.
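Before turning to those results, the dissimilarity index can be made concrete with a short sketch. It follows the definitions given in the text (temporal correlation of first differences, the modulating function f with parameter k, and Euclidean distance as the conventional measure). The function names are hypothetical, and returning a temporal correlation of zero for a constant series is an assumption made here to avoid division by zero.

```python
from math import exp, sqrt


def cort(s1, s2):
    """Temporal correlation: correlation of the period-to-period changes."""
    if len(s1) != len(s2):
        raise ValueError("series must have equal length")
    d1 = [b - a for a, b in zip(s1, s1[1:])]
    d2 = [b - a for a, b in zip(s2, s2[1:])]
    num = sum(x * y for x, y in zip(d1, d2))
    den = sqrt(sum(x * x for x in d1)) * sqrt(sum(y * y for y in d2))
    # Assumption: treat a constant series as uncorrelated with anything.
    return num / den if den > 0 else 0.0


def euclidean(s1, s2):
    """Conventional distance between the raw values of two series."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(s1, s2)))


def dissimilarity(s1, s2, k=1.0):
    """Dissimilarity index: f(cort) * Euclidean distance, f(x) = 2 / (1 + exp(k x))."""
    weight = 2.0 / (1.0 + exp(k * cort(s1, s2)))
    return weight * euclidean(s1, s2)
```

With k = 0 the weight equals 1 and the index reduces to the plain Euclidean distance; with k > 0, series that move together are pulled closer and series that move in opposite directions are pushed apart. A pairwise matrix of these values would then be passed to a hierarchical clustering routine.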
We start by showing the agglomeration process of all 150 product categories in Figure 3. The earlier two clusters merge (that is, the more similar they are), the closer to the left is the link that joins them together. Clusters are obtained by cutting the dendrogram vertically at a certain level and collecting the elements of each cut branch. It is rare that one cut gives the final set of clusters unless they are well separated and there are no outliers. Usually, we see some elements forming tight clusters early and others later. Also, some elements stay on their own for a while; these are outliers (they can be spotted at the bottom of Figure 3). Several of them are worth discussing (forecasting results and pandemic impact are shown in Figure 4).

Hand soap. This product category defines the pandemic and is the biggest outlier. During the two weeks of panic buying, it sold around 9.2 times the usual volume. This excess buying gradually decreased and stayed at about 83.3% above the usual level afterward. Even after the pandemic, the sales of this category might stay at a higher level due to a new habit of maintaining hand hygiene.

Candles. From the start of the pandemic through the rest of 2020, candles sold on average 40.9% in excess. Business professionals in this market suggest that scented candles (and home fragrance in general) serve as a portal to the dreamland and add coziness to a place.

Gambling. The sales of this product category show very interesting dynamics. During the [...]

Finally, for each cluster, we give a possible interpretation and show impact graphs of several representatives. Looking at some of these graphs, it might seem that product categories are not that similar. But let us remember that the chosen distance measure receives a contribution not only from the distance with respect to raw values (magnitude) but also from the distance measured by the correlation of time series changes (dynamics).
The former is recognized more easily and supposedly is perceived as a "true" distance by our brains. The latter requires more effort and can be perceived as a similarity in shape.

Cluster 1: Survival kit. Product categories in this cluster have the biggest panic buying effect, exceeding 100%. The cluster includes toilet paper, preserved food (vegetables, meats, fish), pasta, rice, various meal mixes and soups: basic, dry-packaged foods that do not take up much space (except for toilet paper). After the first two weeks, excess sales dropped and stayed at 5-10% throughout the year, showing the magnitude of the stay-at-home effect.

Cluster 2: Stay-at-home (stock). This cluster reflects increased activity (baking, tidying) at home. The stay-at-home effect is bigger than that of Cluster 1 and varies around 15-20%. Cluster members are coffee, flour, oils, sugar, frozen fish and vegetables, cleaners and bathroom supplies. These are dry-packaged food ingredients and home/body care products that need more space or special equipment (freezers) to keep. They also have a very big panic buying effect, reaching 60-80%.

[...]

Categories in this cluster show more similarity in shape than in magnitude of pandemic impact. Alcoholic drinks (except beer) had a stay-at-home effect of around 10-15%, while soda and cola were at around 0%. The spike at the beginning of 2021 includes shopping for the New Year's Eve celebration.

Figure 9: Representatives of Cluster 4.

Cluster 5: Breakfast and lunch. Sandwich spreads, pastes and toppings, cereal, butter and margarine, bread substitutes, minced meats, poultry, frozen potatoes and chocolate are in this cluster. Except for minced meat and poultry, these categories can be easily stored. Thus, the panic buying effect is large, at around 50%. The cluster is quite like Cluster 2, but categories in this cluster are consumed daily and are purchased more frequently.

Cluster 7: School's out.
This is another cluster with negative pandemic impact except for the two panic buying weeks. The list of cluster members (iced tea, snack bars, candy, convenience (pizza, pancakes), frozen meals, diet (lactose-free, slimming, sport food), cosmetics (shaving, hair), juices) suggests that consumers of these products could be schoolchildren and students. With the closure of schools, schoolchildren had fewer opportunities to consume soft drinks and snacks. Meanwhile, the closure of universities (and campuses) left some students no choice but to return to their parents' homes, where they had healthier food. In April 2020, for example, sales of snack bars were 30% below the expected level, and those of the convenience category 12% below.

Pandemic impact for all 150 product categories is presented in Table 2 in the Appendix. To summarize the clustering results, Figure 14 depicts all product categories as points on an xy scatter plot, where the x-axis shows the panic buying effect and the y-axis the median of pandemic impact (excluding the panic buying period). For each cluster, a bivariate t-distribution is fitted, and an ellipse region is added which covers 80% of the probability mass of that distribution. We can use this plot as a visual tool for cluster validation, and it is further evidence that our clusters are well separated.

Cluster 1 immediately draws attention with the biggest panic buying effect. Yet, afterwards, sales stayed moderately elevated at around 5-10%, similarly to Clusters 5 and 8. Product categories in these three clusters are mostly used in everyday meal preparation and make up around 50% of total sales. Cluster 4 with total sales proportion at around 14% also has similar median pandemic impact, however panic buying effect is closer to zero. Product categories in all these five clusters are mostly used for cooking, baking and preparing all kinds of meals.
With lifestyles changing (working more from home) and the pandemic strengthening the trend of home cooking and baking, we expect elevated sales of these product categories from now on. Our projection is supported by the fact that, for these clusters, the pandemic impact stays well above zero even during the pause between the two pandemic waves.

For the remaining clusters (Clusters 4, 6 and 7) we cannot find support for a sales increase after the end of the pandemic. For Cluster 4 ("A drink or two") no panic buying effect is observed, and increased pandemic impact is observed only during the summer season. This can be explained by people choosing to spend their vacation in their home country. As for Cluster 6 ("In excess"), we see pandemic impact approaching zero from the negative side as people slowly clear their stocks. With schools and universities fully open, and students returning to campuses, we expect sales for product categories in Cluster 7 ("School's out") to also return to normal levels.

Finally, let us discuss how the methods and results of this research could be applied and how they could be extended. Our research has several potential implications for other researchers and business practitioners. The strategy that we chose to measure pandemic impact could be applied in any situation where it is important to estimate the discrepancy between normal (expected) and unusual [...]

We have shown how different product categories can be grouped using pandemic impact data. Using product-level sales data and the same methodology, researchers could identify small groups of complementary products. Similar goals are set when performing market basket analysis, and this approach could be used to support it. Clustering has an advantage in that it requires a significantly smaller amount of computational resources. Also, when comparing two products, it can take sales dynamics into account, contrary to the static nature of the classical algorithms.
Using a similar approach, Li, Wu, Zhang, and Zou (2021) proposed temporal correlation of sales time series to measure product similarity.

Using a thorough methodology, we have shown the magnitude and the shape of the pandemic impact for different product categories. Employing this information, business practitioners could take timely actions in anticipation of a demand change in similar future events. To benefit more people, they could impose a limit on the allowable purchase quantity or adjust prices in certain categories. Finally, knowing which product categories might have elevated sales in the future can help retailers in other merchandising and marketing functions, such as deciding on changes to product assortment and planning promotions.

The columns "Panic", "Post-panic", and "Average" represent the effect of the pandemic as percent change in sales for the two-week panic buying period, the four weeks immediately following the two weeks of panic buying, and the one year of the pandemic excluding the first six weeks, respectively.

• Pandemic impact on sales is measured without using pandemic-related information.
• Panic buying effect is estimated at a 30.6% increase in total sales for two weeks.
• Excluding panic buying, overall pandemic impact is estimated at 5.4% of excess sales.
• 150 product categories are analyzed, which fully represent the shopping basket.
• Product categories are grouped into 8 clusters of similar pandemic impact profile.

[...] algorithms implement Ward's criterion?
Preparing for a pandemic: Spending dynamics and panic buying during the COVID-19 first wave
The panic buying behavior of consumers during the COVID-19 pandemic: Examining the influences of uncertainty, perceptions of severity, perceptions of scarcity, and anxiety
Estimating the impact of the first COVID-19 lockdown on UK food retailers and the restaurant sector
Ensemble learning
Calibrating sales forecast in a pandemic using competitive online non-parametric regression
Why do people purchase from food delivery apps? A consumer value perspective
The effect of lockdowns and infection rates on supermarket sales
COVID-19 and retail grocery management: Insights from a broad-based consumer survey