key: cord-0076072-evlo75sq authors: Michis, Antonis A. title: Retail distribution evaluation in brand-level sales response models date: 2022-03-25 journal: J Market Anal DOI: 10.1057/s41270-022-00165-8 sha: 1886c1fa7a98f47d6cf2aded932e0d76e1390ff6 doc_id: 76072 cord_uid: evlo75sq The effectiveness of a product’s distribution network in retail stores is an important consideration for marketing managers. An effective distribution network typically covers a large number of stores in the geographic area of a market and establishes a continuous presence in the top-selling outlets of a product category at the same time. This study proposes a semiparametric, brand-level version of the SCAN*PRO sales model, to evaluate the impact of retail distribution changes on sales. The model is estimated using the iteratively reweighted least squares method and provides the following outputs: (i) least squares coefficient estimates for the price and promotional drivers in the model specification and (ii) two-dimensional plots of the nonmonotonic relationship between the weighted distribution and sales. The proposed model can be estimated with commonly available retail scanning data and is demonstrated using three laundry detergent brands from The Netherlands. The establishment and monitoring of an effective product distribution network of stores is a crucial part of the marketing strategy of an organization. The success of marketing activities such as new product introductions, promotions and temporary price reductions strongly depends on the existence and development of an effective distribution network. Such a network should provide access to a large number of retail outlets while covering the most important stores (in terms of sales volume) within the boundaries of the market. 
However, with the growing bargaining power of large retailers, manufacturers are nowadays required to allocate more financial resources in order to secure their products' presence in the important retail chains of a market, which constrains the scale of the distribution coverage that a company can afford. Effective distribution management is even more complex for multinational corporations that distribute products in countries where market infrastructures, retail technology and shoppers' habits greatly differ from the North American and Western European standards (e.g., emerging markets). Guissoni et al. (2021) note that in emerging markets, the cost of expanding the retail distribution network of a brand represents a considerable percentage of consumer packaged goods sales. Frequently, one of the main challenges for a multinational corporation in these environments is to oversee the proper implementation of its marketing strategy by the independent distributors (Arnold 2000). This is due to the underdeveloped retail infrastructures and logistics systems that characterize these markets (Sharma et al. 2019). Reibstein and Farris (1995), Krider et al. (2005), Krider et al. (2008), and Friberg and Sanctuary (2017) emphasize that there is a cause-and-effect relationship between sales/market shares and distribution, which can be both highly nonlinear and difficult to estimate. Knowledge of the shape of this relationship, commonly known as the velocity curve (Hirche et al. 2021b), provides marketing managers with a better understanding of the bi-directional causality between market share and distribution, as well as the structure of the retail distribution environment.
Furthermore, understanding the product characteristics that are associated with over- or underperformance relative to the velocity curve is an important consideration for marketing managers, particularly in cases related to SKU discontinuations, additional marketing support or new SKU introductions (Wilbur and Farris 2014; Hirche et al. 2021a). The sudden disruptions to consumer demand imposed by the COVID-19 pandemic had a negative impact on supply chain performance, with important implications for retail distribution management. For example, Pantano et al. (2020) note that consumers tend to cope with one-time stock-outs by switching stores or brands, while the emergence of stockpiling behaviour should be expected to impact price sensitivity. Furthermore, the lower accessibility of store premises due to government-imposed restrictions (e.g. lockdowns) and the increase in consumers' health concerns can result in the adoption of alternative distribution channels (Erjavec and Manfreda 2022) and consumption patterns by consumers (Park et al. 2022), as is the case with online shopping and/or switching to small grocery stores that are more accessible in terms of spatial proximity. Understanding the market outcomes associated with these changes and developing appropriate marketing actions requires the development of analytical tools that are able to analyse the effects of marketing instruments (distribution, price, promotions) jointly. The sales response model proposed in this study therefore provides a managerial tool for continuously evaluating and monitoring the effectiveness of a product's distribution network, jointly with the other instruments in the marketing mix. It is based on a semiparametric version of the SCAN*PRO sales model, which was designed for commercial applications and is particularly well suited for scenario analysis (see Leeflang et al. 2000; van Heerde et al. 2002 and Wittink et al. 2011).
However, unlike the standard version of the model, it incorporates handling distribution as an additional explanatory variable that enters the model in nonparametric form. Furthermore, our analysis contributes to the growing semiparametric modelling literature in marketing, which addresses several important questions that cannot be adequately addressed with parametric methods. Hruschka (2002), Steiner et al. (2007), Van Heerde (2017) and Guhl et al. (2018) note that semiparametric models are generally useful for studying the functional form characterising the relationship between marketing inputs and outputs, not only with respect to pricing but also in other areas of marketing where interactions exist. Even though the shape of the velocity curve (convex, concave, S-shaped, linear) is an important generalisation for marketing managers, it is not commonly incorporated in the sales response models that they frequently use. In this context, the semiparametric sales response model proposed in this study aims to address the following related gaps in the marketing literature: (i) It provides semiparametric estimates of the nonlinear sales-distribution relationship, which facilitates understanding of the expected returns from possible distribution expansions. (ii) It enables the joint evaluation of all the main marketing drivers (e.g. price changes, promotions) that can increase performance over and above the levels predicted by the sales-distribution relationship. (iii) It can be used for "what-if" scenario analyses, which are important for planning the efficient allocation of marketing resources. (iv) It can be calibrated using commonly available scanning data, which facilitates its application by marketing managers. The rest of the article is organized as follows.
A review of the existing literature on retail distribution modelling is provided in the "Literature review" section, including the main shapes of the velocity curve that are likely to be encountered in retail markets. The proposed model and the associated semiparametric estimation method are explained in the "Methodology" section. In the "Empirical study" section, an empirical study, which uses three laundry detergent brands from the Netherlands, is presented to demonstrate the marketing insights associated with the proposed model. The "Practical implications" section elaborates on the practical implications associated with the empirical study and the "Conclusions" section concludes with a discussion of the main findings of the study. Increases in the retail distribution coverage of a product tend to be associated with relatively large sales (and market share) increases during the initial stages of a product's life cycle. For this reason, Hanssens et al. (2001, p. 347 ) emphasize that distribution elasticities should be expected to be greater than one. Progressively, as more stores are added to the distribution network of a product, decreasing or increasing returns could occur, and the sales impact might vary (see Lilien et al. 1992, p. 435 ). This suggests a nonlinear relationship between the market response variables and distribution. A related study by Farris et al. (1989) showed that the market share can be a convex function of the distribution; and Reibstein and Farris (1995) proposed several other nonlinear shapes of the relationship that are theoretically possible, such as concave and S-shaped functions. In addition, Frazier and Lassar (1996) showed that the distribution levels of brands within a product category can vary considerably since distribution intensity tends to be influenced by several moderator effects such as manufacturer channel practices, brand strategies and retailer credible commitments (e.g., contractual restrictions). 
These effects further complicate the impact of distribution changes on sales. Bronnenberg et al. (2000) concentrated on feedback effects and showed that during the growth stage of a repeat-purchase product, the formation of market shares is influenced by a positive feedback mechanism with distribution. This is because retailers' distribution decisions consider the prior market share performance of brands. The authors also found evidence of a time window that is crucial to the establishment of market shares. Beyond this time window, the feedback mechanism between distribution and market share changes weakens, retailers are less prone to distribution changes and late entrants find it difficult to seize considerable market shares. Nishida (2017) provided further evidence for the existence of a nonmonotonic, inverted-U relationship between the density of one's own outlets and the sales performance per outlet in a market. First, through a detailed econometric study, the author found (on average) a 115.3% market share advantage for first entrants over later entrants in the Japanese convenience store industry. Of this significant advantage, 101.7% was due to a higher number of outlets (higher outlet share in the market), and the remaining 13.6% was due to higher sales per store. Second, further analysis of the higher sales per store advantage (13.6%) revealed that during the early stages of a brand's distribution development, the addition of new stores increases sales per outlet. However, after a certain threshold, sales per outlet start to decrease as new stores are added. This is due to the trade-off between the positive brand advertising effect associated with repeated purchases in one's own outlets and the negative cannibalization effect associated with more intense competition between these outlets. Sharma et al.
(2019) examined the importance of retail distribution strategies in emerging markets that are characterized by underdeveloped road infrastructures and low levels of retail store penetration. To study the price and retail distribution strategies of manufacturers, the authors developed a competitive model for the insecticide market of India, where manufacturers compete across multiple channels and product forms. The econometric results of this study suggest that distribution exerts a considerable influence on profits since ignoring distribution in the competitive model can result in serious profit overestimation (by 7% to 55%). Furthermore, the observed price and distribution strategies in the market suggest the existence of competition rather than collusion among manufacturers. It is also worth mentioning that the research by Tenn and Yun (2008) showed that price elasticity estimates can be significantly biased when limited retail distribution is not accounted for in demand models. Hirche et al. (2021b) proposed a metric for measuring deviations from the velocity curve generated by the well-known Reibstein and Farris model (Reibstein and Farris 1995). These deviations were subsequently regressed on product-related characteristics to identify the factors that generate performance over and above the levels predicted by the velocity curve. According to the authors' findings, larger packs, private label status and medium-range price levels contribute to improved market share performance. Deviations from the velocity curve of a brand were also used by Hirche et al. (2021a) in the context of a machine learning application, to identify the SKU characteristics that can explain these deviations. The proposed machine learning method was able to predict correctly 83% of the cases as either over-, in-line or underperforming relative to their velocity curves. In a recent study of the distribution-market share relationship for an emerging market, Guissoni et al.
(2021) found that the convexity of the velocity curve varies by channel type, the economic situation in the country and the market share position of the brands under consideration. Convexity was found to be higher in the self-service (e.g. major chain stores) than in the traditional full-service (small owner-managed stores) channel. Furthermore, increases in distribution were found to be more effective in the self-service channel during times of economic decline and the degree of convexity in this channel was found to be higher for high-share brands. Previous empirical studies that have incorporated distribution variables into various forms of market response models include the following: log-linear sales models (Brissimis and Kioulafas 1987), new product diffusion models (Jones and Ritz 1991), VAR market share models (Bronnenberg et al. 2000), EGARCH market share volatility models (Vakratsas 2008) and logit choice models (Bucklin et al. 2008). Krider et al. (2005) suggested an alternative methodology based on state-space diagrams to explore the lead-lag relationship between distribution and demand. This method is particularly suited for time series that are short in length and nonstationary. It was further extended in Krider et al. (2008) to assist econometric modelling decisions and improve inferences (e.g., regarding the speed of the market response) when long time series are available. In this study, a semiparametric modelling method based on the SCAN*PRO sales model is used. Using locally weighted scatterplot smoothing in the context of the backfitting algorithm, the proposed model generates two-dimensional plots of the nonlinearities characterizing the sales-distribution relationship. Apart from facilitating visual inspection, the addition of a distribution variable also improves the specification of sales response models and enables marketing managers to perform market response predictions for given levels of distribution.
Following the analysis by Reibstein and Farris (1995), in the remaining paragraphs of this section we explain the main shapes of the sales/market share-distribution relationship that are likely to be encountered in retail markets:
• Convex: This shape of the sales/market share-distribution relationship should be expected in the following cases: (i) when market share is not only a cause but also an effect of retail distribution and (ii) when the structure of the retail network in the market is such that there are a few large stores that stock many brands and many small stores that stock only the leading brands. This pattern of distribution effects is further amplified when search loyalty in the market is low. Friberg and Sanctuary (2017) provided similar insights with respect to this shape of the distribution curve. Specifically, competing with successively fewer products (smaller assortments) in smaller stores as distribution expands, combined with a lower resistance to compromise by the consumers, contributes to a convex relationship. More examples of this relationship are reported by Wilbur and Farris (2014).
• Concave: This shape of the sales/market share-distribution relationship is more likely to exist when: (i) all stores carry approximately the same brands, but the distribution policy of the manufacturer concentrates on the stores with the greatest potential (initial handlers with higher shares); (ii) initial stores engage in significant in-store support activities that are not matched by later handlers; (iii) brand loyalty is high in initial handlers, while subsequent stores in the brand's distribution network are characterised by non-loyal consumers; and (iv) a trade-off exists between the positive brand advertising effect associated with repeated purchases and the negative cannibalization effect associated with more intense competition between handlers (Nishida 2017).
• S-shaped function: This shape of the sales/market share-distribution relationship is more frequently encountered when the ability of the brand to attract consumers tends to be associated with its availability and only a proportion of the customers is loyal. It can also be encountered in retail environments where in-store support programs and some levels of consumer loyalty co-exist. Examples of S-shaped functions are provided by Hirche et al. (2021a).
• Linear function: A linear sales/market share-distribution function tends to exist when distribution expansion provides access to consumers who prefer the brand and these distribution increases are associated with constant within-store shares in the new handlers. Examples of linear market share-distribution functions are provided by Hirche et al. (2021b).
A common problem in studies of distribution effectiveness concerns the lack of sufficient variation in the aggregated measures of distribution that are usually available to marketing managers. This is especially true for mature and well-established brands that are already available for purchase in the most important stores of a product category. As a result, there is limited scope for further development of the distribution network, and distribution aggregates exhibit only limited variability. To overcome this limitation, Bucklin et al. (2008) used buyer-level data and the distances between buyers and car dealers in order to construct three distribution intensity measures. The resulting cross-sectional variation enabled an examination of the impact of the distribution intensity on market outcomes, which otherwise would not be possible with aggregated distribution variables. In this study, a different approach is used. Detailed SKU-level data are combined in order to create panel datasets that can be used for estimating brand-level sales response models.
The cross-sectional units in the panel dataset consist of all the SKUs included in a brand portfolio, and the time dimension consists of weekly observations. For each brand portfolio examined in this study, there were several SKUs available in the market, comprising both new and established SKUs. There were also SKUs towards the end of their life cycle that were gradually being withdrawn from the market. Since most SKUs in each brand portfolio differed in terms of their life cycle stage, they also exhibited different levels of distribution intensity. Consequently, incorporating all the SKUs in a common (panel data) sales response model provides sufficient variation to nonparametrically estimate the sales-distribution relationship at the brand level. Fast-moving consumer goods manufacturers typically develop their retail distribution strategies at the brand level, and policy decisions with regard to distribution are then applied to all the SKUs in the respective brand portfolio. Therefore, by incorporating all the SKUs of a brand in a common modelling framework, it is possible to identify: (i) the shape of the sales-distribution relationship and (ii) whether the distribution strategy of the brand was successful in increasing sales. Nonparametric and semiparametric models have been previously used in the marketing literature as more flexible alternatives to parametric models. This is especially true for cases where the relationships between the marketing variables are characterised by nonlinearities. Previous studies using nonparametric techniques in marketing applications include models for market shares (Hruschka 2002), sales and price effects (Kalyanam and Shively 1998; Van Heerde et al. 2001; Sloot et al. 2006; Martínez-Ruiz et al. 2006; Steiner et al. 2007; Michis 2009), brand choice (Abe 1995; Guhl et al. 2018), media usage (Rust 1988) and sales forecasting (Michis 2015).
The inflexibility of parametric models can be especially problematic in cases of incorrect model specifications, which provide inconsistent parameter estimates and therefore lead to misleading conclusions regarding marketing effectiveness (Albers 2012). In contrast, nonparametric models do not impose any strict restrictions concerning the form of the joint distribution of the model's data. However, the convergence rate of nonparametric estimators is generally slower and the precise estimation of regression surfaces requires significantly larger samples (Van Heerde 2000). This is not the case for correctly specified parametric models, which are asymptotically efficient (maximal convergence rate to the true parameter values). Semiparametric models combine elements of the two modelling methods in an attempt to benefit from the flexibility of nonparametric models and the efficiency of parametric models. However, as a midway solution they are not as flexible as nonparametric models and not as efficient as parametric models. Since they include both a nonparametric and a parametric component, the dimensionality of the nonparametric function is reduced, which also mitigates the curse of dimensionality problem in the estimation procedure (Van Heerde 2000). To evaluate the sales-distribution relationship in this study, a semiparametric extension of the SCAN*PRO model is proposed as follows: [Eq. (1)] where $S^{j}_{it}$: sales volume of SKU $i$ in brand portfolio $j$ in period $t$; $P^{j}_{it}$: average price per unit of SKU $i$ in brand portfolio $j$ in period $t$; $P^{j}_{t}$: weighted average price of brand $j$ in period $t$; $P^{g}_{it}$: average price per unit of SKU $i$ in competitive brand portfolio $g$ in period $t$; a, k, r, i, g, l and d: regression coefficients; and an error term of the model indexed by $j$ and $it$. This specification differs from the standard SCAN*PRO model (Andrews et al.
2008) in two important aspects: (i) the weighted distribution ($WD^{j}_{it}$) is included in nonparametric form ($f$) as an additional explanatory variable in the model and (ii) observations vary across SKUs ($i$) and weeks ($t$). In contrast, in the standard SCAN*PRO model, observations vary across stores and weeks. As explained above (see "Variability in distribution" section), the differences in the distribution intensity across the different SKUs in each brand portfolio provide sufficient variation in order to nonparametrically estimate the sales-distribution relationship at the brand level. Furthermore, as is common for panel data models with fixed effects (Allison 2009), separate indicator variables ($I^{j}_{i}$) were included in the model for each SKU $i$ in the dataset that is included under brand portfolio $j$. The semiparametric model in Eq. (1) can also be extended to include additional nonparametric components as follows: [Eq. (2)] Consequently, in addition to weighted distribution ($WD^{j}_{it}$), Eq. (2) now includes a second nonparametric term for own-price effects ($P^{j}_{it}/P^{j}_{t}$). Empirical estimates of the shapes of these partial regression functions are presented in the "Own-price flexibility" section, using three laundry detergent brands from the Netherlands. Model (1) can be estimated with the iteratively reweighted least squares method, which can be combined with the backfitting algorithm developed by Hastie and Tibshirani (1990) in order to estimate the nonparametric part of the model. This algorithm proceeds in three stages: I. The least squares estimation of the parametric part of the model. II. The partial residuals obtained from step I are then used in a scatterplot smoothing procedure against the weighted distribution variable that enters the model in nonparametric form. This generates a first estimate of its partial regression function $f(\log WD)$. III.
The partial regression function generated in step II is then replaced in the regression model, and steps I and II are repeated several times until the estimates for the partial regression function stabilize (convergence). Consequently, the backfitting algorithm involves an iterative smoothing procedure of the partial residuals against estimates of the partial regression function. In this study, scatterplot smoothing was performed using the local polynomial regression method. The estimation procedure using the backfitting algorithm and the associated smoothing procedure are best explained by considering the semiparametric regression model and the corresponding linear regression model with transformations in means. The backfitting algorithm starts by estimating the transformed model with the least squares method, which produces an initial estimate of the partial regression function $f_r^{0} = b_r(x_{ir} - \bar{x}_r)$. The partial residuals, which retain the relationship between $y_i$ and $x_{ir}$ but remove from $y_i$ its linear relationship to all the other explanatory variables, can be written as follows: The backfitting algorithm then proceeds to smooth the partial residuals against the indicator variable $x_{ir}$ that enters the regression model in nonparametric form. This first application of the smoothing procedure generates a first estimate of the partial regression function $f_r^{1}$, which is then replaced in the regression model. The partial residuals are calculated again and smoothed against the indicator variable $x_{ir}$ in order to generate a second estimate of the partial regression function $f_r^{2}$. This new estimate is again replaced in the regression model, and the procedure is iteratively repeated until the partial regression function estimates stabilize and the iteration procedure converges (see Fox 2000, p. 32). Consequently, the iterative scatterplot smoothing procedure is central to the operation of the backfitting algorithm.
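The three stages described above can be illustrated with a minimal Python sketch. It is not the estimation code used in the study: it uses ordinary rather than iteratively reweighted least squares, starts from a zero partial regression function instead of the linear fit, fixes the local polynomial at degree one, and assumes a tricube kernel with a nearest-neighbour span. The function names (`tricube`, `local_linear_smooth`, `backfit`), the span value and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np


def tricube(z):
    """Tricube kernel (an assumed choice): (1 - |z|^3)^3 for |z| < 1, else 0."""
    z = np.abs(z)
    return np.where(z < 1.0, (1.0 - z ** 3) ** 3, 0.0)


def local_linear_smooth(x, y, span=0.3):
    """Nearest-neighbour local linear fit of y on x, evaluated at every x[i]."""
    n = len(x)
    k = max(int(np.ceil(span * n)), 3)  # points used around each focal value
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        # Bandwidth: distance to the k-th closest point (focal point included).
        h = max(np.sort(dist)[k - 1], 1e-12)
        sw = np.sqrt(tricube(dist / h))  # square roots of the kernel weights
        design = np.column_stack([np.ones(n), x - x[i]])
        coef, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
        fitted[i] = coef[0]  # intercept = fitted value at the focal point
    return fitted


def backfit(y, X_lin, x_np, span=0.3, tol=1e-6, max_iter=50):
    """Backfitting for y = intercept + X_lin @ b + f(x_np) + error."""
    n = len(y)
    Z = np.column_stack([np.ones(n), X_lin])
    f = np.zeros(n)  # simplified start: no nonparametric effect yet
    for _ in range(max_iter):
        # Stage I: least squares for the parametric part, given the current f.
        beta, *_ = np.linalg.lstsq(Z, y - f, rcond=None)
        # Stage II: smooth the partial residuals against x_np.
        f_new = local_linear_smooth(x_np, y - Z @ beta, span)
        f_new -= f_new.mean()  # centre f so the intercept stays identified
        # Stage III: repeat until the partial regression function stabilizes.
        converged = np.max(np.abs(f_new - f)) < tol
        f = f_new
        if converged:
            break
    return beta, f


# Hypothetical data: one linear driver and a concave distribution effect.
rng = np.random.default_rng(0)
price = rng.normal(size=200)
wd = rng.uniform(size=200)
sales = 1.0 - 2.0 * price + 3.0 * np.sqrt(wd) + rng.normal(scale=0.1, size=200)
beta_hat, f_hat = backfit(sales, price, wd)
print("linear coefficients:", beta_hat)
```

Plotting `f_hat` against `wd` would give the kind of two-dimensional partial regression plot discussed in the text. Centring the smooth term at each iteration is a common identification device, since the intercept and the level of the nonparametric function cannot be separated otherwise.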
In the empirical applications of the next section, smoothing was performed with the local polynomial regression method, which is frequently employed in the context of the backfitting algorithm. Given a sample of observations for variable $x_{ir}$, this method starts by fitting the following polynomial equation around a focal point $x_{0r}$ through a weighted least squares procedure that minimizes the weighted residual sum of squares $\sum_{i=1}^{n} w_i E_i^2$: The fitted value at point $x_{0r}$ is equal to $\hat{e}_r \mid x_{0r} = A$. The same estimation procedure is then performed at several other focal points across the sample of observations in order to generate fitted values and form the smooth function associated with indicator variable $x_{ir}$. In this study, the weights used in the weighted least squares estimation procedure were based on the following kernel function, which gives greater weights to observations near the focal point $x_{0r}$: The smoothness of the partial regression function can be controlled with the bandwidth or smoothing parameter ($h$). In this study, the nearest neighbourhood method was used, based on which the bandwidth parameter is adjusted to include a fixed portion of the data around each focal point. The final value of the bandwidth parameter in each application was selected based on the procedure suggested by Fox (2008, p. 482). This procedure starts with cross validation in order to obtain an initial estimate of the parameter. It then proceeds with visually guided adjustments in order to achieve the smallest value of the parameter that provides a sufficiently smooth fit to the data. The contribution of indicator variable $x_{ir}$ to the semiparametric model can be tested with an F test (see Fox 2000, pp. 34-35).
This test is formed by using the differences in the residual sum of squares (RSS) and degrees of freedom between two models: one that includes the indicator variable $x_{ir}$ and has $df_1$ degrees of freedom (the full model, with residual sum of squares $RSS_1$) and one that excludes the indicator variable $x_{ir}$ and has $df_0$ degrees of freedom (with residual sum of squares $RSS_0$). The test has the following form, where $n$ refers to the sample size:

$$F = \frac{(RSS_0 - RSS_1)/(df_1 - df_0)}{RSS_1/(n - df_1)}$$

The semiparametric sales response model in Eq. (1) was estimated for three leading detergent brands in the Netherlands. Due to confidentiality restrictions, the names of the brands cannot be provided. The dataset was kindly provided by The Nielsen Company and consists of weekly observations for all the SKUs included in each brand portfolio. For each SKU, Nielsen collects weekly scanning observations for all the variables included in model (1) using a representative sample of grocery stores in the Netherlands that also includes the major supermarket chains. Brand-level aggregated data are derived through contemporaneous aggregation of the SKU-level data. There are 120 weekly observations for each SKU in the dataset that cover the period from week 38 of 2002 to week 53 of 2004. In order to incorporate a weighted distribution measure in the proposed modelling framework, we have utilised the weighted distribution metric produced by the Retail Measurement Services of The Nielsen Company. To calculate this metric, numeric distribution is weighted by the product category's turnover volume within stores (PCV). Similarly to numeric distribution, it is also expressed as a percentage and it is available for the same time periods and SKUs in the database. As a weighted distribution metric, it provides an indication of the importance of the stores (in terms of turnover) in which a brand or SKU is available for sale.
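The difference between numeric and weighted distribution can be made concrete with a small sketch. The store turnovers and listings below are invented for illustration, the function name `distribution_metrics` is hypothetical, and the calculation follows the general PCV-weighted definition (share of category turnover accounted for by the stores that stock the SKU) rather than Nielsen's exact production methodology.

```python
def distribution_metrics(stores):
    """Return (numeric, weighted) distribution in percent.

    `stores` is a list of (category_turnover, stocks_sku) pairs, one per
    store in the hypothetical retail universe.
    """
    n_stores = len(stores)
    total_turnover = sum(turnover for turnover, _ in stores)
    # Numeric distribution: share of stores that stock the SKU.
    numeric = 100.0 * sum(1 for _, stocked in stores if stocked) / n_stores
    # Weighted distribution: share of category turnover in stocking stores.
    weighted = 100.0 * sum(t for t, stocked in stores if stocked) / total_turnover
    return numeric, weighted


# Five hypothetical stores with category turnovers 500, 300, 100, 60 and 40.
listed_in_small_stores = [(500, False), (300, False), (100, True), (60, True), (40, True)]
listed_in_large_stores = [(500, True), (300, True), (100, False), (60, False), (40, False)]

print(distribution_metrics(listed_in_small_stores))
print(distribution_metrics(listed_in_large_stores))
```

In the first scenario the SKU is listed in three of the five stores but only in those with low category turnover, so its weighted distribution falls well below its numeric distribution; the second scenario shows the reverse pattern.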
This is an important consideration for marketing managers since brands and SKUs can be associated with high numeric distribution levels in the market but still be listed in stores that do not account for a large percentage of the particular product category's turnover. In such cases, the respective weighted distribution levels of the brands (or SKUs) will be low. As explained in the "Variability in distribution" section, the brand level model in Eq. (1) will not be estimated using aggregated brand level data that exhibit limited variability in their retail distribution. In contrast, the SKUs are used as cross-sectional units in a panel dataset since each brand portfolio encompasses several SKUs that differ in their distribution development over the period examined in the dataset (e.g., new vs. established SKUs). In this way, sufficient variability is achieved for the estimation of the partial regression function associated with the weighted distribution in model (1). All brand portfolios examined in this study include both new and established SKUs, which permits the examination of the sales-distribution relationship throughout the range of values associated with the weighted distribution. Table 1 provides descriptive statistics for the main variables associated with each brand. Each brand portfolio includes at least 8 different SKUs, and the minimum value of the weighted distribution variable in all cases is zero. This suggests the existence of new products in each brand portfolio (introduced in the market after week 38 of 2002) or the withdrawal of established SKUs from the market (before week 53 of 2004). Furthermore, in all cases, the maximum value of this variable exceeds the 83% distribution level, suggesting the existence of established and well distributed SKUs in the market. 
There is also considerable variability in prices (at the average SKU and weighted average brand levels), both within each brand portfolio (e.g., due to the existence of large and small packs) and across brands (e.g., brand 2 vs. brand 3). There are also considerable differences in terms of the average sales volume between the three brands, and for each brand all the available types of promotional activities were used (feature, display, feature & display and special packs). It is therefore of interest to examine the impact of distribution changes on sales while controlling for the effect of the other marketing activities. For all brands, model (1) was estimated with the backfitting algorithm using a smoothing parameter of $h = 0.1$ in the local polynomial smoothing procedure of stage II. The estimation results are presented in Table 2. In all cases, the model provided a good fit to the data, as demonstrated by the high (adjusted) $R^2$ values. As a result of their detailed marketing mix parameter specifications, sales response models like the SCAN*PRO model are frequently associated with high $R^2$ values (see for example Brissimis and Kioulafas 1987, Andrews et al. 2008 and Kopalle et al. 1999). All linear coefficients have the expected sign and in most cases are statistically significant at the 1% or 5% level. Similarly, the weighted distribution variable provided a statistically significant contribution in all models based on the results of the F test for the significance of the smoothing terms. The explanatory variables corresponding to statistically nonsignificant coefficients were included in the final model specifications based on the F test for the joint significance of all coefficients. All brands are price elastic, with brands 1 and 2 exhibiting the highest price sensitivity.
Furthermore, in all cases the signs and magnitudes of the estimated own and cross-price elasticities are consistent with the average SCAN*PRO model parameter estimates reported by Leeflang et al. (2000). With respect to the promotional variables, the results differ by brand. The "feature and display" promotional indicator (D j it,3) provided the highest impact on the sales of brands 1 and 2, which is consistent with the SCAN*PRO model estimation results reported by Foekens et al. (1994), while the indicator variable for "promotional packs" (D j it,4) provided the highest impact on the sales of brand 3. All the promotional coefficients in Table 2 are statistically significant, except for the "Display" coefficients (D j it,1) for brands 2 and 3. The coefficient for the "feature and display" promotional variable (D j it,3) is also statistically significant for brand 3, albeit at the 10% significance level. For brand 3, the "Promotion" variable (D j it,4), which concerns the use of special promo packs, has the highest coefficient. This specific promotional activity is also important for brands 1 and 2, since it provided the second highest coefficient among the four types of promotional activities included in the respective models. The weighted distribution (WD ijt) partial regression functions generated by the estimation procedure are presented in the three panels of Fig. 1. The shaded areas indicate 95% confidence intervals for the estimates. Each graph demonstrates the sales volume development for different levels of the weighted distribution while holding the other marketing activities constant. These partial regression functions can be used to monitor the expansion of brand distribution networks and to identify the inclusion of less important stores in the retail network of a brand at higher levels of distribution coverage. For brand 1, the shape of the curve is slightly concave.
Even though the weighted distribution function is approximately linear during the initial stages of its distribution development, there are indications of decreasing returns in its partial regression function when moving progressively to higher weighted distribution levels, eventually providing a slightly more concave response function compared to brand 2. For brand 2, the shape of the weighted distribution partial regression function suggests that the relationship with sales is mostly linear, even at higher weighted distribution levels. This type of positive sales effect tends to be observed when distribution expansions are associated with constant within-store shares in the new handlers, as explained in the "Literature review" section. With the introduction of a new brand into the market, an initial period of rapid distribution development starts with a high impact on sales. For established brands, a concave sales-distribution relationship at higher distribution levels suggests the presence of decreasing returns due to the addition of less important stores (in terms of sales volume) in the distribution network. In Fig. 1, this is most evident for brand 3. Despite an increasing trend during the early stages of its distribution expansion, there are indications of decreasing returns in its partial regression function when moving to higher weighted distribution levels, thus giving rise to a concave weighted distribution function. A similar non-monotonic concave function was also reported in an empirical study by Nishida (2017). This finding suggests that the SKUs of brand 3 could have benefitted from a better selection of stores as part of the brand's distribution network expansion at higher distribution levels, or from the introduction of in-store support activities.
As noted in the "Literature review" section, the slightly concave shape of the weighted distribution partial regression function is also indicative of support activities by initial handlers with higher shares that were not matched by later handlers. Consequently, a possible course of action for the marketing manager of brand 3 would be the introduction of support programs in later handlers, in order to increase the brand's sales returns. For cases similar to brand 3 above, store evaluations for inclusion in the brands' distribution networks can be improved when supported by indicators of the sales importance of the different outlets. Examples of such indicators include the category's sales volume, the sales volumes of other similar product categories or store turnover information from a retail census database. Furthermore, engaging in channel practices like supporting retailers' investments (e.g., training of the sales personnel) and introducing in-store promotional activities can also increase the sales potential of the stores, towards levels that are more compatible with a linear shape of the sales-distribution relationship. Table 3 includes the estimation results for model (2) above, which includes nonparametric functions for both the own-price and the weighted distribution variables. It can be compared with Table 2, where only the weighted distribution variable is included in nonparametric form. The shapes of the partial regression functions generated by the estimation procedure are presented in the six panels of Fig. 2. The own-price partial regression functions included in the first row provide some interesting insights. First, for all brands the estimated sales-price partial regression functions are highly nonlinear, which provides further support for the suitability of the semiparametric model specification. Second, in all cases a negative slope is observed that is consistent with economic theory.
Third, for all brands there are indications of saturation effects in the shapes of the curves, since beyond a certain price reduction level (very large price discounts) the sales response is significantly reduced. Similar shapes and saturation effects were also reported by Van Heerde et al. (2001) and Hirche et al. (2021a). The only exception is the lowest price range in the case of brand 1. It is also worth mentioning that the lower price segment of the own-price partial regression function for brand 2 is associated with a particularly large confidence interval and should therefore be used with caution. With respect to the weighted distribution partial regression functions in the second row of Fig. 2, it can be observed that the shapes of the sales-weighted distribution relationships are very similar to the ones presented in Fig. 1 (except for some small fluctuations in the response curve for brand 1) and the adjusted R² values are very similar to the ones derived from model (1) in Table 2.

Fig. 2 Own price and weighted distribution partial regression functions

The modelling framework proposed in this study provides marketing managers with two important insights: (i) estimation of the sales-distribution relationship for their brands and (ii) correct identification of the marketing drivers that can increase brand performance over and above the levels predicted by the sales-distribution relationship. In practical terms, these insights improve managerial understanding of "what to expect" if a distribution expansion is planned but also indicate which marketing tools are suitable for improving the performance of specific brands. These are key insights in the brand management process that can improve the effective allocation of marketing resources. For example, the distribution levels of brand 2 can be increased further since the sales-distribution relationship is mostly linear without any strong indications of decreasing returns.
A distribution expansion can also be combined with "feature and display" and "special pack" promotions, which provided the highest returns according to the model's results. Furthermore, due to the brand's high price sensitivity, any significant increases in the price per unit should be avoided. For brand 3, distribution expansions should be planned carefully, with due diligence in the store selection process, since there are already indications of decreasing returns in its velocity curve. The same is true for brand 1. New store introductions should preferably be combined with in-store support programs that can include "special promo packs" and "displays" in the case of brand 3 and "features and displays" and "special promo packs" in the case of brand 1. Another possible course of action for the marketing managers of brands 1 and 3 would be to consider increasing demand with marketing activities first, before focusing on costly distribution expansions, and to replace those SKUs that are not providing the expected distribution returns. Our analysis is also useful for retailers who are considering methods for expanding the range of their assortments through the allocation of shelf space to new brands. In addition to the insights mentioned above, retailers can work closely with their suppliers in order to choose the right promotional support programs for their brands and carefully adjust their prices when this is needed. For example, for the three brands examined in this study, price decreases tend to increase sales when the prices (per unit) are already in the "medium-to-high" range. In contrast, due to saturation effects, large price reductions in the "very low-to-low" price per unit range will not tend to be equally effective. Furthermore, the brand-specific nature of our proposed method enables comparisons between brands, which entails useful category management insights for retailers.
Finally, for marketing managers considering the delisting of a product from a subset of its existing handling universe, the proposed model can provide indications of the likely impact on sales from the resulting distribution reduction. Consequently, our proposed modelling framework is also suitable for scenario analysis, by generating predictions regarding the possible market outcomes from changes in distribution, as well as in the price and promotional support activities of the brand portfolios.

In this study, a semiparametric version of the SCAN*PRO model that can be used to evaluate the effectiveness of a brand's retail distribution network was proposed. The model is calibrated on panel data, where the cross-sectional units consist of all the SKUs included in a brand portfolio and the time dimension consists of weekly observations. By incorporating all the SKUs of a brand portfolio in a common panel dataset, sufficient variation is achieved that enables a nonparametric estimation of the sales-distribution relationship. Estimation is performed using the backfitting algorithm and commonly available scanning data, which produces least squares coefficient estimates for the price and promotional drivers and two-dimensional plots of the nonmonotonic relationship between the retail distribution coverage and sales. These plots demonstrate the nonlinearities that frequently characterize the sales-distribution relationship and facilitate visual inspection. Marketing managers can use the modelling framework proposed in this study to evaluate the distribution network of their brands and perform market response predictions for given levels of distribution. This framework also enables the identification of gaps in the distribution coverage while at the same time controlling for the effect of other marketing activities on sales.
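The scenario analysis described above can be illustrated with a short sketch. It assumes a specification in which log sales are linear in the log price and a promotion indicator and additive in a partial regression function of the weighted distribution; this form is an assumption made for illustration, since the model equations are not reproduced in this section. The grid values, the price elasticity and the promotion coefficient are hypothetical and are not estimates from Tables 2 or 3.

```python
import numpy as np

# Hypothetical partial regression function f(WD), tabulated on a grid
# (weighted distribution in percent, effect on log sales).
wd_grid = np.array([0.0, 20.0, 40.0, 60.0, 80.0, 100.0])
f_grid = np.array([-1.2, -0.5, 0.0, 0.35, 0.6, 0.75])   # concave shape

beta_price = -2.0    # hypothetical own-price elasticity
gamma_promo = 0.3    # hypothetical promotion coefficient


def log_sales_index(price, promo, wd):
    """Log sales up to an additive constant, for one scenario."""
    f_wd = np.interp(wd, wd_grid, f_grid)   # linear interpolation of f(WD)
    return beta_price * np.log(price) + gamma_promo * promo + f_wd


def sales_ratio(base, scenario):
    """Predicted sales in the scenario relative to the baseline."""
    return float(np.exp(log_sales_index(**scenario) - log_sales_index(**base)))


base = dict(price=5.0, promo=0, wd=80.0)
delisting = dict(price=5.0, promo=0, wd=60.0)            # lose 20 points of WD
delisting_with_promo = dict(price=5.0, promo=1, wd=60.0)

ratio = sales_ratio(base, delisting)
ratio_promo = sales_ratio(base, delisting_with_promo)
print(f"delisting: {ratio:.3f}, delisting with promotion: {ratio_promo:.3f}")
```

Under these hypothetical values, reducing the weighted distribution from 80% to 60% lowers predicted sales to exp(-0.25), roughly 78% of the baseline, and adding promotional support offsets this loss.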
The framework can also be used for the joint evaluation of all the main marketing drivers that can increase performance over and above the levels predicted by the sales-distribution relationship and facilitates "what-if" scenario analyses, which are important for planning the efficient allocation of marketing resources. The two-dimensional plots of the sales-distribution relationship should be updated frequently for the main brands in a product category. This is because changes in the number and characteristics of SKUs in a brand portfolio and changes in the population and the locations of stores in a market can change the partial regression functions. The proposed model can also be estimated for specific market segments (e.g., channels or geographical areas) and can be used to compare the distribution expansion of brands produced by the same manufacturer. In evaluating the results, it is also useful to calculate the distribution gaps of all SKUs relative to their brand totals (brand weighted distribution minus SKU weighted distribution) and the SKU sales per point of distribution (SKU sales volume divided by SKU weighted distribution). This analysis enables the identification of the SKUs with high sales per point of distribution whose distribution network can therefore be expanded. It is also worth mentioning that an updated and thoroughly informed retail census database can provide valuable inputs in the distribution evaluation process. Apart from basic business demographics, this database must contain a sufficient number of variables that can be used as indicators of the size and importance of each retail store. Lehmann and Winer (2001, p. 179) provide some useful guidelines in this direction. A retail census database can also be combined with geographical information systems and econometric models to predict store performance.
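The two SKU-level indicators mentioned above (the distribution gap relative to the brand total and the sales per point of distribution) can be computed as in the following sketch. The SKU names, sales volumes and distribution levels are hypothetical and serve only to illustrate the calculation.

```python
def evaluate_skus(skus, brand_wd):
    """Return SKU indicators, sorted by sales per point of distribution."""
    rows = []
    for s in skus:
        if s["wd"] <= 0:
            continue  # sales per point is undefined without distribution
        rows.append({
            "sku": s["sku"],
            # Gap: brand weighted distribution minus SKU weighted distribution.
            "gap": brand_wd - s["wd"],
            # Sales per point: SKU sales volume / SKU weighted distribution.
            "sales_per_point": s["sales"] / s["wd"],
        })
    return sorted(rows, key=lambda r: r["sales_per_point"], reverse=True)


# Hypothetical brand portfolio (weighted distribution in percent).
skus = [
    {"sku": "SKU-1", "sales": 1200.0, "wd": 80.0},
    {"sku": "SKU-2", "sales": 900.0, "wd": 30.0},
    {"sku": "SKU-3", "sales": 150.0, "wd": 50.0},
]
brand_wd = 85.0

table = evaluate_skus(skus, brand_wd)
for row in table:
    print(row["sku"], row["gap"], row["sales_per_point"])
```

In this hypothetical portfolio, SKU-2 combines the highest sales per point of distribution (30 units) with the largest gap (55 points), which makes it the most natural candidate for a distribution expansion.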
Retail distribution networks should be designed carefully with the proper identification of the important stores in a market prior to distribution expansion, and the impact of distribution changes on sales should be evaluated on a continuous basis. Further research should examine the application of our proposed semiparametric SCAN*PRO model in emerging markets where data are usually available with lower frequency (e.g., monthly or bi-monthly) and the structure of the retail trade is different (e.g., higher share of full-service stores).

A nonparametric density estimation method for brand choice using scanner data Optimizable and implementable aggregate response modelling for marketing decision support Fixed effects regression models Estimating the SCAN*PRO model of store sales: HB, FM or just OLS? Intern Seven rules of international distribution An analysis of advertising and distribution effectiveness The emergence of market structure in new repeat-purchase categories: The interplay of market share and distribution Distribution intensity and new car choice Online shopping adoption during COVID-19 and social isolation: Extending the UTAUT model with herd behaviour The relationship between distribution and market share A comparison and an exploration of the forecasting accuracy of a loglinear model at different levels of aggregation Multiple and generalized nonparametric regression Applied regression analysis and generalized linear models Determinants of distribution intensity The effects of retail distribution on sales of alcoholic beverages Estimating time-varying parameters in brand choice models: A semiparametric approach Distribution effectiveness through full-and self-service channels under economic fluctuations in an emerging market Market response models: Econometric and time series analysis Generalized additive models Predicting Under-and overperforming SKUs within the distribution-market share relationship SKU performance and distribution: A large-scale
analysis of the role of product characteristics with store scanner data Market share analysis using semi-parametric attraction models Incorporating distribution into new product diffusion models Estimating irregular price effects: A stochastic spline regression approach The Dynamic effect of discounting on sales: Empirical analysis and normative pricing implications Demand and distribution relationships in the ready-to-drink iced tea market: A graphical approach The lead-lag puzzle of demand and distribution: A graphical method applied to movies Building Models for Marketing Decisions Analysis for marketing planning Marketing models Using daily store-level data to understand price promotion effects in a semiparametric regression model Regression analysis of marketing time series: A wavelet approach with some frequency domain insights A wavelet smoothing method to improve conditional sales forecasting First-mover advantage through distribution: A decomposition approach Competing during a pandemic? 
Retailers' ups and downs during the COVID-19 outbreak Changes in consumption patterns during the COVID-19 pandemic: Analyzing the revenge spending motivations of different emotional groups Market share and distribution: A generalization, a speculation and some implications Flexible regression Modeling emerging-market firms' competitive retail distribution strategies The short and long-term impact of an assortment reduction on category sales Flexible estimation of price response function using retail scanner data Biases in demand analysis due to variation in retail distribution Non-and semiparametric regression model Non-and semiparametric regression models Semi-parametric analysis to estimate the deal effect curve How promotions work: SCAN*PRO-based evolutionary model building The effects of advertising, prices and distribution on market share volatility Distribution and market share SCAN*PRO: The estimation, validation, and use of promotional effects based on scanner data

Acknowledgements The author would like to thank Barbara Van De Kerke for invaluable help in understanding the data used in this study and three anonymous reviewers for helpful comments and suggestions.

Conflict of interest There is no conflict of interest.

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Antonis A. Michis is an economist at the Central Bank of Cyprus. He has substantial research and professional working experience which also includes positions at the Nielsen Company and Grohe AG. His previous teaching appointments as a visiting lecturer include the University of Cyprus and the Cyprus University of Technology. He has published research in refereed academic journals in the fields of banking, finance, empirical industrial organization and marketing science.