key: cord-1039920-1snkd2sa authors: Liu, Jue; Pang, Zhan; Qi, Linggang title: Dynamic Pricing and Inventory Management with Demand Learning: A Bayesian Approach date: 2020-08-18 journal: Comput Oper Res DOI: 10.1016/j.cor.2020.105078 sha: 1b26f997df8cd9690600540e8344f56c1f1bcfd6 doc_id: 1039920 cord_uid: 1snkd2sa We consider a retail firm selling a durable product in a volatile market where the demand is price-sensitive and random but its distribution is unknown. The firm dynamically replenishes inventory and adjusts prices over time and learns about the demand distribution. Assuming that the demand model is of the multiplicative form and unmet demand is partially backlogged, we take the empirical Bayesian approach to formulate the problem as a stochastic dynamic program. We first identify a set of regularity conditions on demand models and show that the state-dependent base-stock list-price policy is optimal. We next employ the dimensionality reduction approach to separate the scale factor that captures observed demand information from the optimal profit function, which yields a normalized dynamic program that is more tractable. We also analyze the effect of demand learning on the optimal policy using the system without Bayesian update as a benchmark. We further extend our analysis to the case with unobserved lost sales and the case with additive demand. Firms operating in competitive markets often face significant volatility in demand and rapidly changing market environments. The on-going public health crisis of the COVID-19 pandemic is not only threatening the lives of human beings but also disrupting global supply chains and destroying the economy, exerting sudden and unprecedented challenges on business operations. Under various social distancing measures and regulations, consumers have to change their consumption and purchase behavior to adapt to the new situation. 
With little experience in such an extreme crisis, firms have to respond to the changing market conditions by dynamically adjusting their operational strategies while collecting data to learn about the uncertainty. Pricing has been used as one of the most important operational levers to navigate the COVID-19 crisis (Abdelnour et al. 2020). In some sectors, from groceries to medical supplies, demand has spiked to its highest-ever levels, leading to significant shortages and putting upward pressure on prices, whereas in other sectors, from air travel to durable goods like automobiles, sharp drops in demand due to reduced shopping under social distancing measures have led to excess inventories and driven down prices. The volatility of prices on Amazon, in terms of both the frequency and the magnitude of price adjustments, has increased significantly since the outbreak of COVID-19 (Burns 2020). For example, the price of a five-pound bag of Nishiki medium grain rice that was around $10 moved up and down, hitting a peak of $59.99 on March 21 and then dropping to $20 on April 24 (Harrison et al. 2020). Advances in information technology and the dramatic growth of e-commerce have enabled firms to readily adopt dynamic pricing strategies. The last two decades have witnessed the success of dynamic pricing in both hospitality industries (e.g., airline, hotel and car rental) and retailing (Elmaghraby and Keskinocak 2003, Talluri and van Ryzin 2004). Manufacturers such as Ford and Dell Computer have also integrated dynamic pricing with production and distribution strategies to improve supply chain efficiency and profitability (Biller et al. 2005). Amazon developed analytical tools that help sellers use dynamic pricing on the Amazon marketplace, where sellers can set pricing rules based on the Buy Box and inventory availability to automate pricing decisions. 
It was reported that Amazon changed product prices 2.5 million times a day, with an average listed product changing price every 10 minutes, fifty times more often than Walmart and Best Buy (Mehta et al. 2018). Dynamic pricing, however, is a double-edged sword which, if not managed properly, can hurt customers and ruin a firm's reputation. When Amazon first tested its dynamic pricing strategy on DVDs in 2000, customers who found out they were charged more than others for the same products got angry, forcing Amazon to apologize and issue refunds (ABC News 2000). More recently, Amazon's dynamic pricing algorithm drove the price of a science textbook about flies from about $100 up to $24 million (Solon 2011). In 2011, Britain's largest grocery retailer, Tesco, failed to revive its sales despite spending £500 million on price cuts, and as a result its CEO quit (BBC 2012). The volatile and rapidly changing market environments require firms to actively learn the new demand while effectively integrating their pricing and inventory management strategies with demand learning. Bayesian decision theory provides a powerful statistical decision framework to combine learning and dynamic decision making (DeGroot 1970). In this paper, we adopt the Bayesian approach to address this problem. More specifically, we study a single-product periodic-review inventory system with the objective of maximizing the expected total discounted profit over a finite horizon. Ordering costs are proportional to order sizes, while inventory holding and stockout costs depend on the end-of-period inventory level and shortfall. For generality, we allow partial backordering, which embraces both the backordering case and the lost-sales case. Demands in consecutive periods are independent and sensitive to prices. The demand model is multiplicative: it is specified as the product of a deterministic price-dependent function and a random perturbation. 
The deterministic component captures the demand dependence on the price with its curvature determining the price elasticity of demand. It can be constructed through the price-dependent purchasing probability under the willing-to-pay (WTP) distribution. In market research, the WTP distribution is typically estimated through pricing experiments (Bodea and Ferguson 2012) . For simplicity, we assume that the price-dependent component is known. The random perturbation represents the market size of which the distribution depends on an unknown parameter. The multiplicative form of the demand model has an appealing economic interpretation: The market size captures the maximum market potential and the price-dependent purchase probability captures the market share at the price (Bodea and Ferguson 2012) . We assume that the firm forms the prior belief on the distribution of the unknown parameter and updates its belief over time using the accumulative demand data according to the Bayes rule. We first formulate this problem as a Bayesian dynamic program with a two-dimensional state space, one for inventory level and the other one for a sufficient statistic which consists of historical demand information. The sufficient statistic can be either the time series of historical sales data or the updated demand distribution based on these sales data. We identify a set of regularity conditions on the demand model which allow us to show that the expected truncated demand is jointly concave and supermodular in the order-up-to level decision and the expected demand level. These desired properties ensure the optimality of a state-dependent base-stock list-price policy. We then employ another set of conditions from the Bayesian inventory management literature to decompose the optimal profit-to-go function into two parts: a scale factor that depends on the demands accumulated over time and a profit-to-go function of a normalized model that is independent of the market size. 
The optimal order-up-to level is also decomposed as a product of the scale factor and the optimal order-up-to level of the normalized model, while the optimal pricing decision depends on the scaled inventory level. This approach reduces the dimensionality of the problem, which allows us to design efficient algorithms to compute the optimal policy. To investigate the effect of demand learning on the optimal policy, we compare the optimal policy under demand learning to that without demand learning. In particular, we show that a myopic base-stock list-price policy is optimal for the stationary system without demand learning. Using the myopic policy as a benchmark, we show that demand learning may lead to lower future prices, but the effect of learning on future inventory decisions depends on the realizations of random demand. Using a numerical example, we show that demand learning may lead to a lower base-stock level and lower optimal prices for high inventory levels in the current decision period. We also discuss the effect of demand variation on the optimal policy under demand learning. We further discuss two extensions. The first extension generalizes the model with complete demand information to the model with unobserved lost sales (i.e., censored demand). We show that the dimensionality reduction approach also applies. The second extension addresses the case with additive demand. We show that the optimality of the base-stock list-price policy remains true under the regularity conditions but the dimensionality reduction approach no longer applies. Our contributions are threefold. First, we bridge the inventory-pricing literature and the Bayesian inventory management literature to integrate dynamic pricing and inventory control with demand learning in the Bayesian framework. Second, our model allows partial backordering, which includes the complete backordering and lost-sales cases as special cases. 
There are limited results on multiplicative demand in the lost-sales case in the literature. We add to this literature by identifying a set of demand regularity conditions that ensure the concavity of the expected sales revenue and hence the optimality of the state-dependent base-stock list-price policy. Third, we show that the dimensionality reduction approach applies to the Bayesian inventory-pricing model under multiplicative demand, which simplifies the algorithm design and sheds new light on the dynamic pricing and inventory management problem. Furthermore, we extend the dimensionality reduction analysis to the case with unobserved lost sales. The remainder of this paper is organized as follows. Section 2 reviews the related literature. Section 3 formulates the problem as a Bayesian dynamic program. Section 4 identifies the regularity conditions and characterizes the structure of the optimal policy. Section 5 introduces the dimensionality reduction approach. Section 6 discusses the effect of learning. Section 7 discusses two extensions: unobserved lost sales and additive demand. Section 8 summarizes the conclusions. All technical proofs are in the appendix. Our paper is related to two streams of the dynamic inventory management literature: the dynamic inventory-pricing problem and Bayesian inventory management. There is a growing body of literature on dynamic pricing and inventory control problems; see Chen and Simchi-Levi (2012) for a review of recent developments. Most inventory-pricing control models assume that the demand distribution is known with certainty, which often leads to simple policy structures that are relatively easy to calculate. For example, in a periodic-review backlog model without fixed order cost, Federgruen and Heching (1999) show that the base-stock list-price policy is optimal. 
That is, there is a (state-independent) base stock level and list price combination such that if the inventory level is below the base stock level, then an order is placed to increase the inventory level to the base stock level and the list price is charged; otherwise, nothing is ordered and an (inventory-sensitive) price discount is offered. Their model is generalized by Chen and Simchi-Levi (2004) to the case with fixed cost and backorders, by Song et al. (2009) and Pang (2011) to the case with fixed order cost and lost sales, by Pang et al. (2012) to the case with positive lead times, and by Chen et al. (2014) to perishable inventory systems for both backlog and lost-sales cases. We generalize this literature by incorporating demand learning with inventory-pricing decisions, accommodating both lost-sales and backlogging cases. Notably, to our knowledge the only paper in this literature that addresses multiplicative demand and lost sales is Song et al. (2009). Their analysis, however, is restricted to stationary settings, while our model is nonstationary due to the nature of information updating in learning. Bayesian dynamic inventory models incorporate demand learning with the inventory decisions in the Bayesian framework. A Bayesian dynamic inventory problem can usually be modeled as a dynamic program with a multi-dimensional state space (see, e.g., Scarf 1959). However, due to the curse of dimensionality, the computation of such a dynamic program is usually very difficult if not intractable. Assuming that the demand distribution is in the exponential family, Scarf (1960) introduces a state space reduction approach and shows that the solution of the Bayesian model with unknown demand distribution parameters can be easily obtained by solving a non-Bayesian model with known demand distribution parameters. Azoury (1985) extends Scarf's results to a class of demand distributions including uniform, Weibull, and Gamma distributions. 
Miller (1986) shows that the reduction approach can be used in exponentially smoothed forecasts. Lovejoy (1990) provides a unified framework within which both the Bayesian model and exponential smooth model fall, and shows that myopic policies are optimal under some conditions. The key idea of the state reduction approach is to identify a class of predictive demand intensities which are scalable (see, e.g., §10 of Porteus 2002) and then to show that the value functions can be expressed by a product of a value function of a normalized problem and a scaling function of the sufficient statistics. A common feature of all their models is that the demand is exogenous and only the inventory decision is considered. We generalize this literature by considering the joint pricing and inventory decisions. Our paper attempts to generalize the Bayesian inventory models by incorporating pricing decisions, bridging the two streams of literature. There are a few papers along the same line. Subrahmanyan and Shoemaker (1996) propose a Bayesian dynamic program formulation to address the joint inventory and pricing decision problem in general retail situations but they do not provide structural analysis. Petruzzi and Dada (2002) study a setting with a deterministic demand process which is unknown initially and will be revealed as long as there is leftover inventory. Considering an additive demand model with backorders, Zhang and Chen (2006) also formulate a Bayesian dynamic program and characterize the optimal policy. Their model, however, is not scalable due to the additive demand and hence the dimensionality reduction approach does not apply. Different from these models, we focus on multiplicative demand and allow partial backordering including backordering and lost-sales cases as special cases. 
We identify regularity conditions under which the state-dependent base-stock list-price policy is optimal and apply the dimensionality reduction approach to decompose the profit-to-go function and ordering solution as products of the scale factor and a normalized profit-to-go function and ordering solution, simplifying the analysis and providing new insights into the dynamic pricing and inventory replenishment decisions under uncertainty. Also related is the literature on joint demand learning and pricing problems; see den Boer (2015) for a comprehensive overview of this area. In particular, a stream of literature considers the design of pricing experiments to minimize the regret, defined as the gap between the expected revenue achieved by the clairvoyant and that achieved by the joint learning and pricing policy (see, e.g., Cheung et al. 2017). These models do not address the inventory replenishment decisions and their analysis does not seem to readily apply to joint pricing and inventory control models. Consider a firm selling a nonperishable product in a new market with price-sensitive random demand. Without knowing the demand distribution, the firm aims to integrate demand learning with the pricing and replenishment decisions to maximize the expected total discounted profit over a finite planning horizon of T periods, indexed by t = 1, ..., T, with a discount factor α ∈ (0, 1]. The sequence of events is as follows. At the beginning of each period, a belief on the demand distribution is formed based on the past demand information, and a simultaneous decision is made regarding the size of a new replenishment order (if any) and the selling price. For each ordering decision, a variable ordering cost c is incurred for each unit of inventory and the replenishment leadtime is zero. 
Let $p_t \in \mathcal{P}_t$ be the retail price in period $t$, where $\mathcal{P}_t := [\underline{p}_t, \bar{p}_t]$ with $\underline{p}_t \ge c$ and $c$ is the unit ordering cost. The demand function is assumed to be of the multiplicative form $D_t(p) = d_t(p)\,\epsilon_t$, where $d_t(p)$ is a deterministic real function of $p$ and $\epsilon_t$ is a continuous nonnegative random perturbation factor with an unknown distribution. The random perturbation $\epsilon_t$ captures the magnitude of the market size and is referred to as the demand scale factor. Following the Bayesian inventory literature (see, e.g., Azoury 1985, Lariviere and Porteus 1999, Petruzzi and Dada 2002 and the references therein), we assume that the random perturbations are independently and identically distributed over time. The price-dependent function $d_t(p)$ models the customers' sensitivity to price, and it is often expressed as the purchase probability at price $p$, $\Pr(W_t \ge p)$, where $W_t$ is the customer's willingness-to-pay (WTP). The WTP distribution is typically learned through econometric analysis or pricing experiments (Bodea and Ferguson 2012). For simplicity, we assume that the WTP distribution, or equivalently, $d_t(p)$, is known, and we focus on learning the market size in this paper. Note that $d_t(p)$ can be time varying, which captures the time dependence of the demand model. The multiplicative random perturbation is commonly adopted in the inventory-pricing literature (see, e.g., Karlin and Carr 1962, Petruzzi and Dada 1999, Elmaghraby and Keskinocak 2003, Chen and Simchi-Levi 2004, 2012, Aydin and Porteus 2008, Granot and Yin 2008, Song et al. 2009 and references therein) as well as the revenue management literature (see, e.g., Gallego and van Ryzin 1994, Talluri and van Ryzin 2004, Araman and Caldentey 2009 and references therein). The multiplicative randomness implies that the coefficient of variation of the demand is constant (Aydin and Porteus 2008). 
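The constant coefficient of variation under multiplicative demand can be illustrated with a small simulation. This is only a sketch under stated assumptions: the Gamma perturbation with unit mean, the shape value, the seeds and the two expected demand levels are hypothetical choices made for illustration and are not taken from the paper.

```python
import random
import statistics

def demand_samples(expected_demand, n=20000, shape=4.0, seed=0):
    """Draw n samples of D = d(p) * eps, where eps ~ Gamma(shape, scale=1/shape).

    The scale 1/shape is an illustrative choice that gives E[eps] = 1,
    so expected_demand plays the role of d(p).
    """
    rng = random.Random(seed)
    return [expected_demand * rng.gammavariate(shape, 1.0 / shape) for _ in range(n)]

def coefficient_of_variation(xs):
    """Standard deviation divided by the mean."""
    return statistics.pstdev(xs) / statistics.fmean(xs)

# Two hypothetical price points: a low price (high d) and a high price (low d).
cv_low_price = coefficient_of_variation(demand_samples(80.0, seed=1))
cv_high_price = coefficient_of_variation(demand_samples(20.0, seed=2))
print(round(cv_low_price, 3), round(cv_high_price, 3))
```

Both estimates should be close to the coefficient of variation of the perturbation itself (here 1 divided by the square root of the shape), regardless of the price, because scaling a random variable by a positive constant scales its mean and standard deviation equally.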
As shown in Bodea and Ferguson (2012), a convenient way to empirically estimate the multiplicative model is to regress the natural logarithm of sales against the price or a transformation of it, which ensures that the estimated demand is nonnegative. Another common demand model in the literature is of the additive form $D_t(p) = d_t(p) + \epsilon_t$ (see, e.g., Petruzzi and Dada 1999, Chen and Simchi-Levi 2004, Pang 2011). As will be shown later, the multiplicative demand provides scalability for the system with a family of conjugate distribution pairs, which allows us to provide a sharp characterization of the optimal policy, whereas the additive demand does not have this scalability. Hence, we restrict our main analysis to multiplicative demand and discuss the case with additive demand in §7.2. The price elasticity of the multiplicative demand is defined as $\tilde{e}_t(p) = -p\,d_t'(p)/d_t(p)$, which can equivalently be expressed as the negative of the derivative of the natural logarithm of the demand with respect to the natural logarithm of the price, i.e., $\tilde{e}_t(p) = -\frac{d \ln d_t(p)}{d \ln p}$. In particular, if $d_t(p)$ is of the form $d_t(p) = e^{a_t - b_t \ln p} = e^{a_t} p^{-b_t}$ with $a_t \ge 0$ and $b_t > 0$, then $\tilde{e}_t(p) = b_t$, i.e., the price elasticity is constant. Such a demand model is called iso-elastic or constant-elasticity demand and is commonly used in practice to empirically estimate the average demand elasticity through a linear regression of $\ln(D_t)$ against $\ln(p_t)$ (Talluri and van Ryzin 2004, Bodea and Ferguson 2012). We assume that $d_t(p)$ is strictly decreasing in $p$, which implies that choosing a price $p \in \mathcal{P}_t$ is equivalent to choosing an expected demand level $d$ in the corresponding range $\mathcal{D}_t$ of $d_t$ over $\mathcal{P}_t$. Note that although in theory the price can be set to infinity, at which the demand reduces to zero, in practice firms rarely take such an extreme action in retailing. Note that if $E[\epsilon_t] = 1$ then $d_t(p)$ is equal to the expected demand level under the price $p$. 
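The elasticity definition, minus p times the derivative of d(p) divided by d(p), can be checked numerically. The sketch below uses a central difference; the parameter values a = 5 and b = 1.5 and the two demand curves are hypothetical illustrations. It suggests that a power-form curve exp(a - b ln p) has elasticity b at every price, whereas a curve of the form exp(a - b p) has elasticity b p, which grows with the price.

```python
import math

def elasticity(d, p, h=1e-6):
    """Numerical price elasticity -p * d'(p) / d(p) via a central difference."""
    slope = (d(p + h) - d(p - h)) / (2.0 * h)
    return -p * slope / d(p)

A, B = 5.0, 1.5  # hypothetical parameters, for illustration only

def power_demand(p):
    """d(p) = exp(a - b * ln p) = exp(a) * p**(-b)."""
    return math.exp(A - B * math.log(p))

def exponential_demand(p):
    """d(p) = exp(a - b * p)."""
    return math.exp(A - B * p)

for price in (2.0, 8.0):
    print(price,
          round(elasticity(power_demand, price), 4),
          round(elasticity(exponential_demand, price), 4))
```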
For convenience, we call $d_t(p)$ the expected demand level in the following analysis. Letting $P_t(d)$ be the inverse demand function, the price elasticity of demand is re-expressed in terms of $d$ as $e_t(d) = -\frac{P_t(d)}{P_t'(d)\,d}$. To learn the distribution of the random perturbation, the firm adopts a Bayesian framework (see §10 of Porteus 2002 and the references therein) and assumes that the probability density function of $\epsilon_t$ is of the form $\psi(\cdot|\omega)$ with an unknown parameter $\omega$. Let $s_t$ denote the sufficient statistic for the sample $\epsilon_1, \epsilon_2, \ldots, \epsilon_{t-1}$ of the past demand observations at the beginning of period $t$. The dynamics of $s_t$ are denoted by $s_{t+1} = s_t \circ \epsilon_t$. Let $g(\omega|s)$ represent the prior probability density function of the unknown parameter $\omega$, which represents the belief about the underlying distribution of $\epsilon_t$ for any given statistic $s$ in period $t$. We assume that $g(\omega|s)$ is chosen from a conjugate family of distributions such that the posterior distribution of $\omega$ (after updating using Bayes' rule based on the sufficient statistic and the new demand observation) has the same distributional form as the prior. Then, given a statistic $s$, the predictive demand density of period $t$ is given by $\phi_t(\epsilon|s) = \int \psi(\epsilon|\omega)\,g(\omega|s)\,d\omega$. By Bayes' rule, conditioned on the values of $s$ and $\epsilon$, the posterior density on $\omega$ is $g(\omega|s \circ \epsilon) = \frac{\psi(\epsilon|\omega)\,g(\omega|s)}{\int \psi(\epsilon|\omega')\,g(\omega'|s)\,d\omega'}$. The conjugate prior assumption implies that $g(\omega|s \circ \epsilon)$ has the same functional form as $g(\omega|s)$; see Porteus (2002, Chapter 10) for more discussion of the family of conjugate prior distributions. Let $x_t$ be the inventory level at the beginning of period $t$, just before placing an order, and $y_t$ be the inventory level after placing an order (i.e., the order-up-to level) with $y_t \ge x_t$. For each positive order, only the variable ordering cost $c$ is incurred for each unit of inventory and the replenishment leadtime is zero. 
At the beginning of each period $t$, the firm reviews its inventory level $x_t$ and statistic $s_t$ and then decides the order quantity $(y_t - x_t)^+$ and the price level $p \in \mathcal{P}_t$ (or equivalently, the expected demand level $d \in \mathcal{D}_t$). The demand $D_t$, or equivalently, $\epsilon_t$, is realized by the end of the period. The excess inventory is carried over to the next period while the unmet demand may be partially backlogged or lost. The dynamics of inventory levels are $x_{t+1} = \delta(y_t - D_t)$, where $\delta(x) := x^+ - \lambda x^-$ with $x^+ = \max(x, 0)$, $x^- = \max(-x, 0)$ and $\lambda \in [0, 1]$. That is, if $D_t > y_t$, a fraction $\lambda$ of the unmet demand is backlogged while the rest is lost. This treatment encompasses both the full backordering case ($\lambda = 1$) and the lost-sales case ($\lambda = 0$). In particular, the lost-sales case is more common in retailing of consumer products. Clearly, $\delta$ is a piecewise linear, increasing and convex function. Note that $\delta(kx) = k\delta(x)$ for any real value $x$ and constant $k > 0$, i.e., $\delta$ is homogeneous of degree one. Let $\tilde{L}(x) := h^+ x^+ + h^- x^-$ denote the inventory holding and shortage cost, where $h^+$ and $h^-$ represent the unit holding and shortage costs, respectively. In addition, we assume that $h^- > (1 - \alpha)c$ and $h^- \ge \alpha c$, which implies that the shortage (backlog) cost is greater than the cost saving from delaying the purchase to the next period; this removes the incentive to intentionally delay the purchase in the face of a shortage. Hence, it suffices to require that the order-up-to level $y_t$ is nonnegative. The sales of each period, denoted by $\gamma(D_t, y_t)$, are constrained (or truncated) by the inventory order-up-to level $y_t$. We assume that all the lost sales are observable. The case with unobservable lost sales, i.e., censored demand, will be discussed later. Then we have $\gamma(D_t, y_t) = \min(D_t, y_t) + \lambda (D_t - y_t)^+$, where $\min(D_t, y_t)$ is the sales truncated by the inventory level $y_t$. It is easy to verify that $\gamma(D_t, y_t)$ is also homogeneous of degree one with respect to $(D_t, y_t)$. 
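The homogeneity of degree one of the inventory transition and of the sales function can be checked directly. The sketch below is illustrative only: it assumes the transition maps net inventory x to x when x is nonnegative and to λx otherwise, and it assumes that sales consist of the on-hand sales min(D, y) plus the backlogged fraction λ of unmet demand. Both functional forms and all numerical values are assumptions made for this example.

```python
import math

def delta(x, lam):
    """Next-period inventory from net inventory x = y - D.

    Leftover stock carries over; a fraction lam of any shortage is backlogged
    (assumed form: x if x >= 0, else lam * x).
    """
    return x if x >= 0 else lam * x

def sales(demand, y, lam):
    """Assumed sales form: min(D, y) plus the backlogged share of unmet demand."""
    return min(demand, y) + lam * max(demand - y, 0.0)

lam = 0.4     # hypothetical backlogging fraction
k = 2.5       # arbitrary positive scaling constant

for x in (-30.0, 0.0, 12.0):
    assert math.isclose(delta(k * x, lam), k * delta(x, lam))

for demand, y in ((50.0, 40.0), (20.0, 40.0)):
    assert math.isclose(sales(k * demand, k * y, lam), k * sales(demand, y, lam))

print(delta(-30.0, lam), sales(50.0, 40.0, lam))
```

Scaling demand and the order-up-to level by the same positive constant scales both the carried-over inventory and the sales by that constant, which is the property the dimensionality reduction relies on.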
The problem can be formulated as a dynamic program with two state variables, $(x, s)$, where $x$ is the inventory level before ordering and $s$ is the sufficient statistic up to the current period. Let $f_t(x|s)$ denote the maximum expected present value of the profit over periods $t$ through $T$ starting with inventory level $x$ and sufficient statistic $s$ at the beginning of period $t$. The Bellman equations, for $t = 1, \ldots, T$, are: Then we have $f_{T+1}(x|s) = 0$ and where is convex and homogeneous of degree one. Evidently, $f_t(x|s)$ is a non-increasing function of $x$, as the constraint set of the maximization problem (2) shrinks as $x$ increases and its objective function is independent of $x$. Let y * This section characterizes the structure of the optimal policy. It is known that the base-stock list-price policy is optimal for the backordering case (Federgruen and Heching 1999, Chen and Simchi-Levi 2004) and for the lost-sales case with additive demand (Pang 2011). A sufficient condition for the optimality of the base-stock list-price policy is that the expected revenue function $R_t(y, d|s)$ is jointly concave in $(y, d)$ and that $f_t(x|s)$ is a non-increasing and concave function of $x$. Pang We first identify conditions under which the expected revenue $P_t(d)d$ is concave in $d$. The first condition, proposed by Chen and Simchi-Levi (2004), is that $P_t(d)d$ is concave in $d$: (A1) $2P_t'(d) + dP_t''(d) \le 0$. The economic interpretation of condition (A1) is that the marginal revenue contribution of the demand decreases in the demand level: the higher the demand level, the smaller the marginal revenue contribution of increasing the expected demand level (i.e., reducing the price). An alternative condition, as identified by Pang (2011), is (A2) $P_t'(d) + dP_t''(d) \le 0$. Clearly, (A2) implies that the expected revenue $P_t(d)d$ is strictly concave in $d$, which is stronger than (A1), since $P_t(d)$ is strictly decreasing in $d$. We next verify conditions (A1) and (A2), respectively, for some common demand models. The results are summarized in Table 1. 
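Concavity conditions of this kind can be checked numerically for a given inverse demand function. The sketch below is a hedged illustration, not the paper's verification: it takes (A1) to mean that the second derivative of P(d)d, namely 2P'(d) + dP''(d), is nonpositive, and it assumes that (A2) takes the form P'(d) + dP''(d) ≤ 0. The demand curves, parameter values and evaluation point are hypothetical.

```python
def derivatives(P, d, h=1e-4):
    """Central-difference estimates of P'(d) and P''(d)."""
    first = (P(d + h) - P(d - h)) / (2.0 * h)
    second = (P(d + h) - 2.0 * P(d) + P(d - h)) / (h * h)
    return first, second

def a1_lhs(P, d):
    """Second derivative of the revenue P(d) * d; (A1) asks for a value <= 0."""
    first, second = derivatives(P, d)
    return 2.0 * first + d * second

def a2_lhs(P, d):
    """Assumed form of (A2): P'(d) + d * P''(d) <= 0."""
    first, second = derivatives(P, d)
    return first + d * second

def linear_inverse(d, a=100.0, b=2.0):
    """Inverse of d = a - b * p."""
    return (a - d) / b

def iso_elastic_inverse(b, scale=100.0):
    """Inverse of d = scale * p**(-b), returned as a function of d."""
    return lambda d: (d / scale) ** (-1.0 / b)

d0 = 10.0  # hypothetical expected demand level
print("linear:", a1_lhs(linear_inverse, d0), a2_lhs(linear_inverse, d0))
for b in (0.5, 2.0):
    P = iso_elastic_inverse(b)
    print("iso-elastic b =", b, a1_lhs(P, d0), a2_lhs(P, d0))
```

Under these assumed forms, the linear curve gives negative values for both expressions, while the iso-elastic curve gives a nonpositive (A1) expression only when b is at least one and a positive (A2) expression for both values of b tried here.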
See Pang (2011) and Chen et al. (2014) for a similar analysis for additive demand. Table 1 shows that the linear, log, logit and exponential models all satisfy (A1) and (A2). For the iso-elastic demand, (A1) is satisfied if $b \ge 1$ while (A2) is always violated. Note that $d_t(p)$ and $d_t(p)p$ are both convex in $p$ for the iso-elastic demand with $b > 1$, and hence the analyses of Federgruen and Heching (1999) and Kocabiyikoǧlu and Popescu (2011), who require that $d_t(p)$ is concave in $p$, do not apply to the iso-elastic demand. For the power model, In the backlog case, the demand condition (A1) or (A2) ensures the optimality of the base-stock list-price policy (see, e.g., Chen and Simchi-Levi 2004). However, in the lost-sales case, (A1) or (A2) alone cannot guarantee the joint concavity of the expected truncated revenue $P_t(d)E[\min(d\epsilon_t, y)]$ in terms of $d$ and $y$. For this reason, Federgruen and Heching (1999) argue that the base-stock list-price policy may fail to be optimal in the lost-sales setting. For the lost-sales case with additive demand holds and the price elasticity of the lost-sales probability, defined as $\tilde{e}_L(p) = -\frac{d\Pr(D_t \ge y)}{dp}\,\frac{p}{\Pr(D_t \ge y)}$, is greater than one. Note that the notion of the price elasticity of the lost-sales probability was first proposed by Kocabiyikoǧlu and Popescu (2011) in a newsvendor setting to identify the conditions under which the expected profit is jointly concave in price and ordering decisions. Pang (2011) extends their analysis to dynamic models with additive demand and fixed ordering cost and identifies conditions under which an $(s, S, p)$ policy is optimal. We next extend the analysis of Pang (2011) to the case with multiplicative demand in the Bayesian learning model. With the multiplicative demand $D_t = d_t(p)\epsilon_t$, the probability of lost sales at an inventory level $y$ and expected demand level $d$ is $\bar{\Phi}_t(y/d|s)$. 
Then the price elasticity of lost-sales probability (in terms of $p$) is $\tilde{e}_L(p) = \rho_t(y/d_t(p)|s)\,\tilde{e}_t(p)$, where $\rho_t(x|s) = \frac{x\phi_t(x|s)}{\bar{\Phi}_t(x|s)}$ is the generalized hazard rate. Using the inverse function $P_t(d)$, the price elasticity of lost-sales probability can be expressed in terms of $d$ as $e_L(d) = \rho_t(y/d|s)\,e_t(d)$. Note that $\rho_t(y/d_t(p)|s) = 0$ for $y = 0$, which implies that the price elasticity of lost-sales probability does not have a positive lower bound over all values of $y$. That is, the analyses in Pang (2011) bound of $y^*_t(s)$ under the optimal policy, which implies that it suffices to restrict the order-up-to level decision to values above such a lower bound. Observe that the single-period expected profit $\pi_t(y, d|s)$ is concave in $y$ for any $d$ and $s$. Taking the first derivative of $\pi_t(y, d|s)$ with respect to $y$ yields Then the maximizer of $\pi_t(y, d|s)$ is $\hat{y}_t(d|s) = d\hat{Z}_t(d|s)$ where The monotonicity of $P_t(d)$ implies that $\hat{Z}_t(d|s)$ is strictly decreasing in $d \in \mathcal{D}_t$ for any $\lambda < 1$. Define $u_t = \min_{k \ge t} \underline{y}_k$, where $\underline{y}_k$ is a uniform lower bound of $\hat{y}_k(d|s)$ for any $d$ and $s$, defined as The following lemma shows that $u_t$ is indeed a lower bound for the optimal order-up-to level $y^*_t(s)$. Lemma 1 (Bound). For all $t$ and $s$, the optimal order-up-to level $y^*_t(s)$ is greater than $u_t$. Lemma 1 implies that it suffices to restrict the optimal order-up-to level decision to values above $u_t$. We are now ready to introduce the additional demand regularity conditions: Conditions (B1) and (B2) imply that the price elasticity of the lost-sales probability is greater than one and two, respectively. Clearly, (B2) is stronger than (B1) as the former requires a greater price elasticity of the lost-sales probability. Since $u_t$ is a uniform lower bound of the optimal order-up-to level $y^*_t(s)$, conditions (B1) and (B2) apply to distributions with domains containing zero (e.g., the Gamma distribution or the Weibull distribution). 
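The behaviour of the generalized hazard rate near zero can be seen in a small example. The sketch below assumes that the generalized hazard rate is x times the density divided by the survival function, and it uses a Weibull distribution with hypothetical shape and scale values purely for illustration; it does not use the paper's predictive density.

```python
import math

def weibull_pdf(x, shape, scale):
    """Weibull density for x >= 0."""
    z = x / scale
    return (shape / scale) * z ** (shape - 1.0) * math.exp(-(z ** shape))

def weibull_sf(x, shape, scale):
    """Weibull survival function P(X >= x)."""
    return math.exp(-((x / scale) ** shape))

def generalized_hazard_rate(x, shape, scale):
    """rho(x) = x * pdf(x) / sf(x); the survival-function denominator is an assumption."""
    return x * weibull_pdf(x, shape, scale) / weibull_sf(x, shape, scale)

SHAPE, SCALE = 2.0, 1.0  # hypothetical parameters
for x in (0.0, 0.25, 0.5, 1.0, 2.0):
    print(x, generalized_hazard_rate(x, SHAPE, SCALE))
```

For the Weibull case the ratio simplifies to shape times (x/scale) raised to the shape, so it equals zero at x = 0 and increases in x. Any requirement that the generalized hazard rate times the price elasticity exceed a positive constant therefore cannot hold for arbitrarily small inventory-to-demand ratios, which is why a lower bound on the order-up-to level matters.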
Note that Since $h^- \ge \alpha c$, we have $\partial \hat{y}_t(d|s)/\partial d > 0$ if (B1) or (B2) holds, which implies that $\underline{y}_t = \inf\{\hat{y}_t(\underline{d}_t|s) \mid \forall s\}$. Remark 1. In the newsvendor setting, Kocabiyikoǧlu and Popescu (2011) show that if $\rho_t(y/d_t(p)|s)\,\tilde{e}_t(p) \ge 1$, then the expected profit function is jointly concave and submodular in $(y, p)$. However, they do not impose any constraint on the value of $y$. As discussed above, $\rho_t(y/d_t(p)|s)$ approaches zero when $y$ approaches zero, which implies that when $y$ is sufficiently small, the con- Our analysis fixes this problem by identifying the lower bound of the optimal order-up-to level. We are now ready to employ the regularity conditions to characterize the structure of the optimal policy. The following theorem extends the analyses of Federgruen and Heching (1999) and Pang (2011) to the Bayesian formulation with multiplicative demand. Theorem 1 (Base-stock list-price policy). Suppose one of the following two pairs of conditions holds: (i) (A1) and (B2), (ii) (A2) and (B1) for all $t$ and $s > 0$. Then a state-dependent base-stock list-price policy is optimal: for any $t$ and $s$, if the starting inventory level $x < y^*_t(s)$ then it is optimal to order up to the base stock level $y^*_t(s)$ and set the list price $p^*_t(s)$; otherwise, if $x \ge y^*_t(s)$, it is optimal to place no order and set the price $p^*_t(x|s)$. Moreover, when $x \ge y^*_t(s)$, the optimal price $p^*_t(x|s)$ is nonincreasing in $x$ and satisfies $p^*_t(x|s) \le p^*_t(s)$. Theorem 1 proves the optimality of the intuitive base-stock list-price policy structure under Bayesian updating when the unmet demand is partially backlogged. The analysis for the model with multiplicative demand and lost sales appears to be much more complicated than that of Federgruen and Heching (1999) and Pang (2011). Regularity conditions (i) and (ii) guarantee the joint concavity and supermodularity of the expected single-period profit $\pi_t(y, d)$. 
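The decision rule of a base-stock list-price policy, as stated in Theorem 1, can be sketched as a small function. This is a minimal illustration that takes the base stock level, the list price and the inventory-dependent discounted price as given inputs; in the model these depend on the period and on the sufficient statistic, and computing them requires solving the dynamic program. The function name, the linear discount rule and all numbers below are hypothetical.

```python
def base_stock_list_price(x, base_stock, list_price, discounted_price):
    """Return (order_up_to_level, price) for starting inventory x.

    base_stock and list_price stand for the state-dependent quantities of the
    policy; discounted_price is a hypothetical callable giving the price used
    when inventory is at or above the base stock level.
    """
    if x < base_stock:
        # Below the base stock level: order up to it and charge the list price.
        return base_stock, list_price
    # Otherwise order nothing and offer a price no higher than the list price.
    return x, min(discounted_price(x), list_price)

# Hypothetical inputs: base stock 50, list price 12, and a linear markdown.
BASE_STOCK, LIST_PRICE = 50.0, 12.0

def markdown(x):
    return LIST_PRICE - 0.05 * (x - BASE_STOCK)

for inventory in (30.0, 50.0, 70.0):
    print(inventory, base_stock_list_price(inventory, BASE_STOCK, LIST_PRICE, markdown))
```

The `min` with the list price encodes the property that the discounted price never exceeds the list price; the nonincreasing dependence on inventory is a property of the supplied markdown function, not something the sketch enforces.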
These properties in turn ensure the optimality of the base-stock list-price policies and the monotonicity of the pricing decision. Note that (A1) is weaker than (A2) but (B2) is stronger than (B1), so (i) is neither more general nor more restrictive than (ii). The dependence on the statistic $s$ captures the effect of demand learning from historical data. Intuitively, everything else being the same, the greater the demands realized in the previous periods, the higher the expected demand in the future and therefore the more to order.

In particular, when $\lambda = 1$, the model reduces to the full backordering case. Following Chen and Simchi-Levi (2004), it suffices to impose condition (A1) to ensure the optimality of the base-stock list-price policy.

Theorem 2 (Backordering Case). Suppose (A1) holds for all $t$ and $s > 0$. Then a state-dependent base-stock list-price policy is optimal.

Remark 2. Our analysis uses the expected demand level and the order-up-to level as decision variables, whereas Federgruen and Heching (1999) treat the price as the decision variable directly. Their analysis requires two conditions: (a) $d_t(p)$ is concave in $p$, and (b) the expected holding/shortage cost $E[L(y - D_t(p))]$ is jointly concave in $(p, y)$. Clearly, these two conditions are satisfied when $d_t(p)$ is linear. However, it is easy to verify that $d_t(p)$ is convex in $p$ for the exponential and isoelastic demand models. More importantly, it remains a challenge to show whether (b) holds for more general nonlinear demand models. Therefore, it is more convenient to work with the inverse demand function instead of treating the price variable directly.

Remark 4. The expected single-period profit in (2) can be alternatively expressed

Computing the optimal policies according to the Bayesian dynamic program (2) involves not only the inventory level but also the information state $s$, which may be multidimensional and may therefore require extensive computational effort.
Scarf (1960) and Azoury (1985) identify a class of scalable demand distributions and develop conditions under which the dimensionality of the problem is reduced: the solution of the original problem can be obtained from the solution of a much simpler normalized problem that is independent of the scale (size) of the market. Following Azoury (1985), we impose two conditions. (C1) There exists a positive real-valued function $q_t(s)$ of the sufficient statistic $s$ which is a scale parameter for $\phi_t(\epsilon_t|s)$ such that $\phi_t(\epsilon|s) = (1/q_t(s))\,\phi_t(\epsilon/q_t(s))$, where $\phi_t(\cdot)$ is a known probability density function for period $t$ which is independent of $s$, and (C2) the function $q_{t+1}(\cdot)$ satisfies where $U_{t+1}$ is a known continuous function and satisfies The following examples satisfy conditions (C1) and (C2).

Example 1 (Gamma-Gamma Model). Suppose $\epsilon_t$ follows a Gamma distribution with $\psi(\epsilon_t|\omega) = \omega^k \epsilon_t^{k-1} e^{-\omega\epsilon_t}/\Gamma(k)$, where $k$ $(> 0)$ is a known shape parameter and $\omega$ $(> 0)$ is the unknown positive scale parameter. The Gamma distribution reduces to the exponential distribution with rate $\omega$ if the shape parameter $k = 1$. The mean and variance of the Gamma distribution are $k/\omega$ and $k/\omega^2$, respectively, so the coefficient of variation is $1/\sqrt{k}$, which depends only on the shape parameter. Hence, assuming that the scale parameter is unknown implies that the mean demand is unknown. The prior on $\omega$ also has a Gamma distribution with shape parameter $a_t > 0$ and scale parameter $\Lambda_t$, where the sufficient statistic is $s_t = (a_t, \Lambda_t)$. Correspondingly, the mean and variance of the prior distribution are $a_t/\Lambda_t$ and $a_t/\Lambda_t^2$, respectively. For $\epsilon_t > 0$ the predictive density is given by $\phi_t(\epsilon_t|s_t) = \dfrac{\Gamma(a_t + k)\,\Lambda_t^{a_t}\,\epsilon_t^{k-1}}{\Gamma(k)\,\Gamma(a_t)\,(\Lambda_t + \epsilon_t)^{a_t + k}}$ with $s_{t+1} = (a_t + k, \Lambda_t + \epsilon_t)$. This implies that, given any initial sufficient statistic $s_1 = (a_1, \Lambda_1)$, we have $a_t = a_1 + (t-1)k$ and $\Lambda_t = \Lambda_1 + \sum_{\tau=1}^{t-1}\epsilon_\tau$ for all $t \ge 1$. Here $a_t$ and $\Lambda_t$ are the shape and scale parameters of the predictive distribution, respectively.
In addition, (C1) implies that $\phi_t(\epsilon_t) = \phi_t(\epsilon_t|a_t, 1)$. In particular, for $k = 1$ and $a_t > 1$, we have $E[\epsilon_t] = \Lambda_t/(a_t - 1)$. As shown by Scarf (1960), the Gamma-Gamma problem is scalable with $q_t(s_t) = \Lambda_t$ and $U_{t+1}(x) = 1 + x$.

Example 2 (Weibull-Gamma Model). Suppose $\epsilon_t$ follows a Weibull distribution with $\psi(\epsilon_t|\omega) = \omega k\,\epsilon_t^{k-1} e^{-\omega\epsilon_t^k}$, where $k$ $(> 0)$ is a known shape parameter and $\omega$ $(> 0)$ is the unknown positive scale parameter. The Weibull distribution reduces to the exponential distribution with rate $\omega$ if the shape parameter $k = 1$. The mean and variance of the Weibull distribution are $\omega^{-1/k}\Gamma(1 + 1/k)$ and $\omega^{-2/k}[\Gamma(1 + 2/k) - (\Gamma(1 + 1/k))^2]$, respectively, so the coefficient of variation is $\sqrt{\Gamma(1 + 2/k)/(\Gamma(1 + 1/k))^2 - 1}$, which depends only on the shape parameter $k$. If the prior on $\omega$ has a Gamma distribution with shape parameter $a_t > 0$ and scale parameter $\Lambda_t$, where $s_t = (a_t, \Lambda_t)$ is the sufficient statistic, then the predictive density is given by $\phi_t(\epsilon_t|s_t) = a_t k\,\Lambda_t^{a_t}\,\epsilon_t^{k-1}/(\Lambda_t + \epsilon_t^k)^{a_t + 1}$, and $s_{t+1} = (a_t + 1, \Lambda_t + \epsilon_t^k)$. This implies that, given any initial sufficient statistic $s_1 = (a_1, \Lambda_1)$, we have $a_t = a_1 + t - 1$ and $\Lambda_t = \Lambda_1 + \sum_{\tau=1}^{t-1}\epsilon_\tau^k$ (Azoury 1985).

Lariviere and Porteus (1999) note that the predictive demand distribution is (first-order) stochastically decreasing in the shape parameter $a_t$ and stochastically increasing in the scale parameter $\Lambda_t$. The posterior density functions indicate that the shape parameter increases by one after each observation and can be interpreted as the cumulative number of observations. Since the coefficient of variation of both the Weibull and Gamma distributions is strictly decreasing in the shape parameter, the relative uncertainty about the scale parameter decreases as the number of observations increases, converging in the limit to the exact value of the scale parameter. The scale parameter of the prior also increases each period by the market size observation raised to the $k$th power for the Weibull model and by the market size observation for the Gamma model.
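The conjugate updates in Examples 1 and 2 can be sketched in a few lines of Python. This is a minimal illustration of the recursions $s_{t+1} = (a_t + k, \Lambda_t + \epsilon_t)$ and $s_{t+1} = (a_t + 1, \Lambda_t + \epsilon_t^k)$ stated above; the function names and the sample observations are hypothetical.

```python
def update_gamma_gamma(a, lam, eps, k):
    """Gamma-Gamma model: shape grows by k, scale grows by the observed market size."""
    return a + k, lam + eps


def update_weibull_gamma(a, lam, eps, k):
    """Weibull-Gamma model: shape grows by 1, scale grows by the observation to the k-th power."""
    return a + 1, lam + eps**k


def run_updates(update, a1, lam1, observations, k):
    """Apply one of the update rules sequentially, starting from s_1 = (a1, lam1)."""
    a, lam = a1, lam1
    for eps in observations:
        a, lam = update(a, lam, eps, k)
    return a, lam


def predictive_mean_exponential(a, lam):
    """Predictive mean lam / (a - 1) for the k = 1 case; requires a > 1."""
    if a <= 1:
        raise ValueError("the predictive mean is finite only for a > 1")
    return lam / (a - 1)


if __name__ == "__main__":
    # Made-up market size observations, for illustration only.
    print(run_updates(update_gamma_gamma, 2.0, 1.0, [0.5, 1.5], k=2.0))
    print(run_updates(update_weibull_gamma, 2.0, 1.0, [2.0, 3.0], k=2.0))
```

In both models the whole demand history enters only through the pair $(a_t, \Lambda_t)$, which is what makes the scale-factor decomposition in the next part of the analysis possible.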
For the Gamma-Gamma model, the limit for the mean of the prior is then $\lim_{t\to\infty} a_t/\Lambda_t = 1/(\lim_{t\to\infty}\sum_{i=1}^{t}\epsilon_i/t)$, which is equal to the true scale parameter (with probability one) of the demand distribution. Note that $\sum_{i=1}^{t}\epsilon_i/t$ is an unbiased estimator of the prior mean.

We are now ready to apply the dimensionality reduction approach using conditions (C1) and (C2). The key idea is to first solve a normalized problem with nonstationary known demand distributions in which the scale parameter is set to one, and then obtain the solution to the original problem by incorporating the scale parameter into the solution of the normalized problem. Let $v_{T+1}(x) = 0$. Define functions $v_t$ recursively, for $t = 1, \cdots, T$: The above equalities define a normalized system where the scale parameter is always one. The normalized model can also be interpreted as one in which the prior on the unknown parameter is always updated as if the observed market size $\epsilon_t$ were zero, so that the scale parameter, which starts at one at the beginning of the planning horizon, remains unchanged over time even though the number of observations increases. Let $y_t^*(x)$ and $d_t^*(x)$ (and $p_t^*(x) = P_t(d_t^*(x))$) be the optimal solutions of the normalized system. Let $\hat y_t$ and $\hat d_t$ be the corresponding global maximizers of $\Pi_t(y, d)$ (and $\hat p_t = P_t(\hat d_t)$). The following theorem shows the relationship between $f_t(\cdot|s)$ and $v_t(\cdot)$.

Theorem 3 (Dimensionality Reduction). If conditions (C1) and (C2) are satisfied, then for all $t$, $x$ and $s$, $f_t(x|s) = q_t(s)\,v_t(x/q_t(s))$, $y_t^*(x|s) = q_t(s)\,y_t^*(x/q_t(s))$ and $p_t^*(x|s) = \min(\hat p_t, p_t^*(x/q_t(s)))$. In particular, $y_t^*(s) = q_t(s)\,\hat y_t$ and $p_t^*(s) = \hat p_t$.

Theorem 3 shows that the optimal value functions of the original problem are equal to the product of the normalized profit functions and the scale function $q_t(s)$.
In other words, the value function of the original problem is decomposed into two factors, where the scale function $q_t(s)$ represents the scale (size) of the market and the normalized profit function $v_t$ allows us to focus on the normalized market demand. Such a separation leads to a much simpler model. The firm only needs to derive the optimal replenishment and pricing decisions based on the normalized model, which is independent of the scale (size) of the market, and then scale the solution based on the updated estimate of the market size. More specifically, under the optimal policy, the optimal order-up-to level of the original problem, $y_t^*(x|s)$, can be derived by scaling the optimal order-up-to level of the normalized problem, $y_t^*(x)$, with the scale rate $q_t(s)$, while the optimal pricing decision of the original problem is equal to the pricing decision of the normalized problem evaluated at the normalized stock level, $x/q_t(s)$. In particular, $q_t(s) = \Lambda_t$ for the Gamma-Gamma model and $q_t(s) = \Lambda_t^{1/k}$ for the Weibull-Gamma model, where $\Lambda_t$ represents the accumulated sales over time. Hence, the more demand observed over time, the greater the value of the scale rate $q_t(s)$ and therefore the greater the base-stock level $y_t^*(s)$. That is, the higher the historical demand, the more will be ordered.

It is evident that the dimensionality reduction approach reduces the computational effort significantly. Taking the Gamma-Gamma model as an example, we have $s_{t+1} = (\Lambda_t + \epsilon_t, a_t + 1)$ for each period $t$. If we compute the optimal policy according to the optimality equations (2) instead of the decomposed formulation, then to evaluate the term $\int_0^\infty f_{t+1}(\delta(y - d\epsilon)|s\circ\epsilon)\,\phi_t(\epsilon|s)\,d\epsilon$ we have to assess the profit-to-go function $f_{t+1}(\cdot|(\Lambda_t + \epsilon_t, a_t + 1))$ for all possible values of $\epsilon_t$, which is not computationally tractable as $\epsilon_t$ is a continuous variable with support $(0, \infty)$.
However, with the decomposed formulation (5), we only need to evaluate the single-dimensional normalized value function $v_{t+1}$ to compute the optimal normalized policy and then employ the transformations introduced in Theorem 3 to obtain the optimal policy, which is clearly more tractable and efficient.

The Bayesian ordering decision rule, $y_t^*(x|s) = q_t(s)\,y_t^*(x/q_t(s))$, can be seen as a data-driven decision rule with the data contained in the statistic $s$. Note that the functional forms of the scale factor $q_t(\cdot)$ and of the optimal order-up-to decision of the normalized model, $y_t^*(\cdot) = y_t^*(\cdot|a, 1)$, do not depend on the statistic $s$ but only on the shape parameter $a$ (or the total number of observations). Such a data-driven decision rule is similar to the operational statistics proposed by Liyanage and Shanthikumar (2005) as an alternative to the conventional estimate-then-optimize statistical learning approach. Their approach aims to find a decision rule that maximizes the a priori expectation of the performance uniformly for all possible values of the unknown distribution parameters under a class of structured decision rules. Liyanage and Shanthikumar (2005) identify a class of operational statistics that are homogeneous of order one for newsvendor models and find that the optimal operational statistic of this class can be derived by solving a normalized model. Chu et al. (2008) propose a Bayesian approach to find the optimal operational statistic with a non-informative prior. In our model, the optimal ordering decision rule derived from the normalized model, $y_t^*(x) = y_t^*(x|a, 1)$, plays the same role as the optimal decision rule of operational statistics, as it maximizes the a priori expectation (conditional on the current statistic) over the class of operational statistical rules, expressed as $y_t^{os}(z q_t(s)|s)$, that are scalable with the factor $q_t(s)$, where $s$ is updated in the same way as $\Lambda_t$.
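The scaling rule $y_t^*(x|s) = q_t(s)\,y_t^*(x/q_t(s))$, with the price read off the normalized policy at the normalized stock level $x/q_t(s)$, can be sketched as follows. This is only a minimal illustration under stated assumptions: the normalized policy functions `y_norm` and `p_norm` are hypothetical placeholders that would in practice come from solving the normalized dynamic program, and the toy functions in the usage part are made up.

```python
def scale_factor(lam, model="gamma-gamma", k=1.0):
    """q(s): Lambda for the Gamma-Gamma model, Lambda**(1/k) for the Weibull-Gamma model."""
    if model == "gamma-gamma":
        return lam
    if model == "weibull-gamma":
        return lam ** (1.0 / k)
    raise ValueError("unknown model: " + model)


def scaled_policy(x, lam, y_norm, p_norm, model="gamma-gamma", k=1.0):
    """Map a normalized policy (functions of the normalized stock level) to the original scale.

    y_norm and p_norm are assumed to be precomputed normalized decision rules
    for the current period (hypothetical inputs in this sketch).
    """
    q = scale_factor(lam, model, k)
    z = x / q                      # normalized stock level
    return q * y_norm(z), p_norm(z)


if __name__ == "__main__":
    # Toy normalized base-stock list-price rule: base stock 2.0, list price 10.0,
    # with the price marked down as the normalized stock exceeds the base stock.
    y_norm = lambda z: max(z, 2.0)
    p_norm = lambda z: 10.0 - min(max(z - 2.0, 0.0), 5.0)
    print(scaled_policy(1.0, 4.0, y_norm, p_norm))                      # Gamma-Gamma
    print(scaled_policy(1.0, 4.0, y_norm, p_norm, "weibull-gamma", 2))  # Weibull-Gamma
```

The normalized rule is computed once per period, and only the scalar $q_t(s)$ has to be refreshed as new demand data arrive.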
The link between Bayesian decision rules and operational statistics sheds new light on more general operational statistics in dynamic settings.

This section aims to identify the effect of learning on the optimal decisions by comparing the optimal policies in the system without demand learning and the system with demand learning. In the system without Bayesian demand learning, the replenishment decisions are made based on the initial belief at the beginning of the planning horizon. That is, the multiplicative random perturbation of demand in each period has the same density function $\phi_1(\cdot|s_1)$. Let $g_t$ be the value-to-go function (corresponding to $f_t$) in period $t$ and $g_{T+1}(x|s_1) = 0$. The optimality equations are Let $\hat y_t(x|s_1)$ and $\hat d_t(x|s_1)$ be the optimal solutions and $(\hat y_t(s_1), \hat d_t(s_1))$ the global optimal solution of the optimization problem in (6). In addition, let $\hat p_t(x|s_1) = P_t(\hat d_t(x|s_1))$ and $\hat p_t(s_1) = P_t(\hat d_t(s_1))$. The following theorem reveals the structure of the optimal policy without demand learning.

Theorem 4 (Optimal Policy without Learning). Suppose one of the following two pairs of conditions holds: (i) (A1) and (B2), (ii) (A2) and (B1) for all $t$. (a) A base-stock list-price policy is optimal: for any $t$ and $s_1$, if the starting inventory level $x < \hat y_t(s_1)$, then it is optimal to order up to the base-stock level $\hat y_t(s_1)$ and set the list price $\hat p_t(s_1)$; otherwise, if $x \ge \hat y_t(s_1)$, it is optimal to place no order and set the price $\hat p_t(x|s_1)$. Moreover, the optimal price $\hat p_t(x|s_1)$ is nonincreasing in $x$ and satisfies $\hat p_t(x|s_1) \le \hat p_t(s_1)$ for $x \ge \hat y_t(s_1)$.
(b) In particular, if the system is stationary in the sense that all parameters are time invariant, a myopic base-stock list-price policy is optimal, with the optimal myopic base-stock and list-price levels being $(y^M(s_1), p^M(s_1))$ such that $p^M(s_1) = P_t(d^M(s_1))$ and $(y^M(s_1), d^M(s_1)) = \arg\max_{y \ge 0,\, d \in D_t}\{\pi_t(y, d|s_1)\}$: if the inventory level $x < y^M(s_1)$, it is optimal to order up to $y^M(s_1)$ and set the price $p^M(s_1)$; otherwise, if $x \ge y^M(s_1)$, it is optimal to place no order and set the price $p^M(x|s_1) = P_t(d^M(x|s_1))$, where $d^M(x|s_1) = \arg\max_{d \in D_t}\pi_t(x, d|s_1)$.

Theorem 4 shows that the base-stock list-price policy is optimal for the system without Bayesian updating. In particular, when the system is time invariant and the distribution of the random demand perturbation remains the initial one, a myopic base-stock list-price policy is optimal. Using the dimensionality reduction approach in Section 5, under the assumptions (C1) and (C2), To gain sharper insights, we next derive a closed-form solution of the optimal myopic base-stock and list-price levels for a special case of the system without learning.

Lemma 2 (Optimal myopic policy). Consider a stationary backordering system (i.e., $\lambda = 1$) with an exponential-Gamma perturbation distribution (i.e., $\phi_t(\epsilon_t|s_1) = a_1\Lambda_1^{a_1}/(\Lambda_1 + \epsilon_t)^{a_1 + 1}$, $a_1 > 1$). The expected operating profit in period $t$ can be expressed as The optimal myopic base-stock level is $y^M(s_1) = \Lambda_1\left(\rho^{-1/a_1} - 1\right)d^M(s_1)$ and the optimal myopic list price is $p^M(s_1) = P_t(d^M(s_1))$ with the expected demand level In addition, $d^M(s_1)$ ($p^M(s_1)$) is increasing (decreasing) in $a_1$ and independent of $\Lambda_1$, and $y^M(s_1)$ is increasing in $\Lambda_1$.

Lemma 2 provides a closed-form solution to the optimal myopic base-stock list-price policy in a system without Bayesian learning.
Recall that the predictive demand distribution is stochastically decreasing in the shape parameter $a_1$ and stochastically increasing in the scale parameter $\Lambda_1$. Hence, intuitively, as the shape parameter becomes smaller or the scale parameter becomes greater, the demand becomes (stochastically) larger, which drives a greater stock level and a higher price. We show that the optimal myopic list price is indeed decreasing in the shape parameter (but independent of the scale parameter) and the optimal myopic base-stock level is increasing in the scale parameter. However, it is unclear whether the optimal myopic base-stock level is decreasing in the shape parameter. Figure 1 below provides a numerical example which shows that both the optimal myopic base-stock level and the optimal myopic pricing decisions tend to be lower as the shape parameter $a_1$ becomes greater: the smaller the demand, the less to order and the lower the price.

We next compare the myopic base-stock list-price policy in the system without learning, $(y^M(s_1), p^M(s_1))$, to that under demand learning, $(y_t^*(s_t), p_t^*(s_t))$, in a stationary system. For ease of interpretation, we restrict our discussion to a two-period setting ($T = 2$).

Effect of learning in period 2. In period 2, the objective function is $\pi_2(y, d|s_2)$ where $s_2 = (\Lambda_2, a_2)$ with $\Lambda_2 = \Lambda_1 + \epsilon_1 \ge \Lambda_1$ and $a_2 = a_1 + 1$. The preceding analysis implies that $(y_2^*(s_2), p_2^*(s_2)) = (y^M(s_2), p^M(s_2))$. Since $p^M(s_1)$ is decreasing in $a_1$ and independent of $\Lambda_1$, we know that $p_2^*(s_2) \le p^M(s_1)$, which implies that demand learning leads to a lower list price in the last period.
For the optimal base-stock levels, $y_2^*(s_2) \ge y^M(s_1)$, or equivalently, , which implies that the optimal base-stock level in the last period is determined by the accumulated demand perturbations (observed market sizes) over the planning horizon: the greater the observed total market size over time, the more will be ordered in the last period, and the optimal base-stock level with demand learning is greater than that without learning if and only if sufficiently large market sizes have been observed.

Effect of learning in period 1. In period 1, pricing and inventory decisions are made taking into account the effect of learning on future decisions. We can employ the dimensionality reduction method to derive the optimal policy under Bayesian learning. Note that $q_t(s_t) = \Lambda_t$ and $U_{t+1}(\xi) = 1 + \xi$. The optimal base-stock and list-price levels can be computed by backward induction according to (5). Since $y_2^*(1, a_2)$ is the optimal base-stock level in period 2, $v_2(x)$ remains constant at $v_2(y_2^*(1, a_2)) = \pi_2(y_2^*(1, a_2), d_2^*(1, a_2))$ for $x \le y_2^*(1, a_2)$ and then decreases in $x$ for $x > y_2^*(1, a_2)$. For any $y \le y_2^*(1, a_2)$ and $\epsilon_1 > 0$, we have $(y - d\epsilon_1)/(1 + \epsilon_1) < y_2^*(1, a_2)$ and hence $v_2((y - d\epsilon_1)/(1 + \epsilon_1)) = v_2(y_2^*(1, a_2))$. Since $y_2^*(1, a_2) = y^M(1, a_2) \le y^M(1, a_1)$ and $(y^M(1, a_1), d^M(1, a_1))$ is the global maximizer of $\pi_t(y, d)$, we know that $y_1^*(1, a_1)$ must be no less than $y_2^*(1, a_2)$. Note that the monotonicity of $v_2$ implies that the term $\int_0^\infty v_2\big(\delta_1(y - d\epsilon)/(1 + \epsilon)\big)(1 + \epsilon)\,\phi_1(\epsilon)\,d\epsilon$ is decreasing in $y$ and increasing in $d$, which provides a negative driving force for the inventory decision and a positive (negative) driving force for the expected demand level (price) compared to the myopic policy. However, the supermodularity of the objective function $\Pi_1(y, d)$ implies that a greater value of $y$ will lead to a greater optimal value of $d$, and vice versa.
Although it is difficult to analytically compare $(y_1^*(1, a_1), p_1^*(1, a_1))$ and $(y^M(1, a_1), p_1^M(1, a_1))$, we know that $d_1^*(x|(1, a_1)) \ge d^M(x|(1, a_1))$ (i.e., $p_1^*(x|(1, a_1)) \le p^M(x|(1, a_1))$) for $x \ge \max(y_1^*(1, a_1), y^M(1, a_1))$, which implies that when the inventory level is sufficiently high the positive effect of demand learning on pricing decisions dominates the positive effect on ordering decisions. To gain some insight, we provide a numerical example to compare $(y^M(1, a_1), p_1^M(1, a_1))$, $(y_1^*(1, a_1), p_1^*(1, a_1))$ and $(y_2^*(1, a_2), p_2^*(1, a_2))$. Figure 2(a) shows that the optimal first-period base-stock level under demand learning is slightly lower than the optimal myopic one, and both of them are greater than the optimal normalized base-stock level in the second period, as expected. Since $q_1(s_1) = \Lambda_1 = 1$, the optimal normalized base-stock level in the first period is equal to the optimal base-stock level. Figure 2(b) shows that both the optimal myopic pricing decision and the optimal price in the first period are greater than that in the second period under demand learning. The optimal list price level in the first period under demand learning is slightly greater than the optimal myopic list price but then crosses over the latter from above as the inventory level becomes greater, which verifies the finding that the negative effect of demand learning on pricing decisions dominates the positive effect on inventory decisions.

Comparing optimal policies with and without demand learning ($\phi_t(\epsilon_t|s_1) = a_1\Lambda_1^{a_1}/(\Lambda_1 + \epsilon_t)^{a_1 + 1}$, $\Lambda_1 = 1$)

The effect of demand learning on the optimal pricing and inventory control policy can also be affected by the magnitude of the demand uncertainty, which is typically measured by the coefficient of variation (CV) of the demand.
It is known that for both the Weibull and the Gamma distributional specifications of the demand perturbation the CV does not depend on the unknown scale parameter $\omega$ which determines the mean demand (Porteus 2002). The distribution of the scale parameter $\omega$ is updated with the observations as time goes by. For example, for the Gamma-Gamma model specified in Example 1, the CV of the demand is $1/\sqrt{k}$ for a given value of the scale parameter $\omega$. That is, the greater the value of the shape parameter $k$, the smaller the variation of the demand. The CV of the posterior distribution of $\omega$ is $1/\sqrt{a_t} = 1/\sqrt{a_1 + k(t - 1)}$, which decreases in the number of observations $t - 1$.

We next provide a numerical study of the effect of demand volatility on the optimal policy under a Gamma-Gamma model. We focus on the last period for simplicity. Figures 3 and 4 compare the optimal normalized order-up-to levels and price levels for three levels of the shape parameter of demand ($k = 1, 2, 3$) with respect to the posterior distributions of $\omega$ with shape parameters $a_T = 3$ and $a_T = 10$, respectively. The CVs of the demand corresponding to the different values of $k$ are $1$, $1/\sqrt{2}$ and $1/\sqrt{3}$, respectively, while the CVs of the posterior distribution of the unknown scale parameter $\omega$ are $1/\sqrt{3}$ and $1/\sqrt{10}$, respectively. Note that the greater the value of $a_T$, the greater the accuracy of the posterior distribution of $\omega$. From Figures 3 and 4, we can observe that for both levels of $a_T$, as $k$ decreases, or equivalently, as the demand volatility increases, the optimal normalized order-up-to levels (as well as the optimal base-stock level) decrease while the optimal list-price level increases. But when the inventory level is relatively high, it is possible that a greater degree of demand variation leads to a lower price for a larger value of $a_T$ (a more precise posterior distribution of $\omega$). These results imply that the ordering and pricing decisions can be used to leverage the profitability of the system.
When the inventory level is relatively low, a higher level of demand volatility drives a higher price, which leads to a lower demand level, and a lower order-up-to level to match demand and supply. When the inventory is relatively high and the posterior distribution is more precise, as the demand volatility becomes higher, instead of using a higher price to contain the demand level, it is better to set a lower price to speed up inventory depletion.

The preceding analysis is based on the assumption that unmet demand, whether backordered or lost, is fully observable. In this subsection, we assume that unmet demand is lost and unobserved. This is common in retailing, where consumers would purchase the product if it were available but may simply leave without identifying themselves when facing an empty shelf. In other words, the demand information in the presence of a stockout is censored. With censored demand (i.e., only sales are observed), the replenishment and pricing decisions interact with the demand learning process, which implies that the firm needs to take into account how its replenishment and pricing decisions affect the demand information. To address this problem, we need to modify the Bayesian dynamic programming formulation. For simplicity, we restrict our attention to the full lost-sales case ($\lambda = 0$).

When the unmet demand is lost and not observed, the demand information is censored and hence $\epsilon_t$ cannot always be observed. By Bayes' rule, given a sufficient statistic $s$ and censored demand information $\min(\epsilon, y/d)$, the posterior density on $\omega$ is updated as Note that a conjugate prior for uncensored demand may no longer be conjugate for censored demand. One exception is the Weibull-Gamma family. With the Weibull-Gamma model, the posterior distribution conditioned on $d\epsilon_t \ge y$ is a Gamma distribution with $s_{t+1} = (a_t, \Lambda_t + (y/d)^k)$.
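The censored Weibull-Gamma update can be sketched as follows. This minimal example combines the uncensored rule from Example 2, $s_{t+1} = (a_t + 1, \Lambda_t + \epsilon_t^k)$, with the stockout rule stated above, $s_{t+1} = (a_t, \Lambda_t + (y/d)^k)$. The function name and the numerical values are hypothetical, and the realized market size `eps` is passed in only to simulate what the firm would observe.

```python
def censored_update_weibull_gamma(a, lam, eps, y, d, k):
    """Update (a, lam) when only min(eps, y / d) is observed.

    eps : realized market size (simulation input; the firm does not see it on a stockout)
    y   : order-up-to level, d : expected demand level, k : known Weibull shape
    """
    cap = y / d                       # largest market size the stock can reveal
    if eps < cap:
        # No stockout: the market size is observed exactly.
        return a + 1, lam + eps**k
    # Stockout: the firm learns only that eps >= y / d, so the shape is unchanged.
    return a, lam + cap**k


if __name__ == "__main__":
    # Illustrative numbers only.
    print(censored_update_weibull_gamma(2.0, 1.0, eps=1.0, y=4.0, d=2.0, k=2.0))
    print(censored_update_weibull_gamma(2.0, 1.0, eps=5.0, y=4.0, d=2.0, k=2.0))
```

The second call shows how the decisions $(y, d)$ enter the information state: on a stockout the scale parameter grows by $(y/d)^k$, so a larger order-up-to level or a higher price (lower $d$) makes a censored observation more informative.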
Letting $f_{T+1}(x|a, \Lambda) = 0$, with Weibull-Gamma distributions, the optimality equation becomes Clearly, $f_t(x|a, \Lambda)$ is decreasing in $x$. Lariviere and Porteus (1999) show that the censored lost-sales model is scalable with the Weibull-Gamma distributions. Define the normalized case as the one with $\Lambda = 1$. Suppress the scale parameter $\Lambda$ and write $v_t(x|a) = f_t(x|a, 1)$, $\phi(\xi|a) = \phi(\xi|a, 1)$ and $\pi_t(x, y, d|a) = \pi_t(x, y, d|a, 1)$. Let $q_t(\Lambda) = \Lambda^{1/k}$ and $U_{t+1}(x) = (1 + x^k)^{1/k}$. Similar to Theorem 3, one can readily show that $f_t(x|a, \Lambda) = q_t(\Lambda)\,v_t(x/q_t(\Lambda)|a)$, where $v_t$ is the profit-to-go function of the normalized problem. Letting $v_{T+1}(x|a) = 0$, the profit-to-go functions of the normalized problem satisfy the following optimality equations: For $t = 1, \cdots, T$, $v_t(x|a) = \max_{y \ge x,\, d \in D_t} \Pi_t(y, d|a) := \pi_t(y, d|a) + \alpha$ where Let $(y_t^*(x|a), d_t^*(x|a))$ be the optimal solution to the optimization problem (7). Then the solution to the original problem and the solution to the normalized problem have the following relationship: y * Comparing (7) with (5), the term $\alpha\Phi_t(y/d|a)\,U_{t+1}(y/d)\,[v_{t+1}(0|a + 1) - v_{t+1}(0|a)]$ (multiplied by the scaling factor $q_t(s)$) accounts for the cost due to the censored demand in period $t$. Unfortunately, we are not able to show whether the base-stock list-price policies are optimal for the normalized problem. In fact, even in the absence of the pricing decision, it is unclear whether the base-stock policy is optimal (see, e.g., Lariviere and Porteus 1999).

Another common price-responsive demand model is of the additive form $D_t(p) = d_t(p) + \epsilon_t$ with $E[\epsilon_t] = 0$; see Federgruen and Heching (1999) and Chen and Simchi-Levi (2004, 2012) for the case with backorders and Pang (2011) for the case with lost sales. Our model can be readily modified to address the case with additive demand.
With additive demand, the optimality equations, for $t = 1, \ldots, T$, can be expressed as Following the analysis of Pang (2011), we know that if (A2) holds and the price elasticity of the lost-sales probability for additive demand, expressed as − P t (d)−αc y−d|s) , is greater than one, then $R_t(y, d|s)$ is jointly concave and supermodular in $(y, d)$. Using arguments similar to those of Pang (2011), one can readily show that the state-dependent base-stock list-price policy is optimal. As in Pang (2011), with additive demand, our analysis can be further extended to address the case with a fixed ordering cost and show that the state-dependent $(s, S, p)$ policy is optimal.

It is natural to ask whether the dimensionality reduction approach applies to the model with additive demand; see Zhang and Chen (2006) for an attempt. It is notable that the preceding dimensionality reduction analysis relies on the scalability of the multiplicative demand. When the demand is of the multiplicative form, the sales of each period, $(1 - \lambda)\min(d\epsilon, y) + \lambda d\epsilon$, are scalable for any $d \in D_t$, which leads to the scalability of the revenue and profit functions. However, when the demand is of the additive form, scaling the random variable $\epsilon_t$ requires scaling the decision variable $d$, while the revenue function is nonlinear in $d$ and therefore not scalable. As a result, the dimensionality reduction approach does not apply to the model with additive demand.

Operating in volatile market environments, especially during the unprecedented global public health and economic crisis caused by COVID-19, firms need to redesign their operational strategies with all operational levers while actively learning the changing market conditions. Motivated by this, we study how a firm can coordinate its dynamic pricing and inventory management strategies while learning the demand (market size) dynamically using a Bayesian approach.
We consider a price-sensitive demand model of the multiplicative form, the product of a deterministic price-dependent function and a random factor (market size) with unknown distribution. Unmet demand is partially backlogged and partially lost. Following the Bayesian inventory management literature, we assume that the distribution of the market size depends on a random parameter and that the firm forms a prior distribution on this parameter at the beginning of the planning horizon. Focusing on the class of conjugate prior distributions, we formulate the problem as a Bayesian dynamic program. We identify a set of regularity conditions under which the objective functions are jointly concave and supermodular in the order-up-to level and the expected demand level, which allows us to show the optimality of the state-dependent base-stock list-price policy. Furthermore, we employ a dimensionality reduction approach to decompose the profit-to-go functions into the product of a scale function that depends on the accumulated demands over time and the profit-to-go functions of the normalized problem, which significantly simplifies the computation and implementation of the model. We show that the optimal order-up-to levels can also be decomposed into the product of the scale function and the optimal order-up-to levels of the normalized problem. With the decomposition, the firm can first compute the optimal policy for the normalized problem at the beginning of the planning horizon and then update the optimal inventory and pricing decisions with accumulated demand data as time goes by. The dimensionality reduction approach also facilitates our discussion of the effect of demand learning on the optimal policy. Using the system without Bayesian updating (i.e., with the distribution based on the initial prior throughout the planning horizon) as a benchmark, we show that demand learning may lead to lower future prices while the future inventory decisions depend on the realized values of the random demand over time.
Demand learning may lead to a lower base-stock level and a lower price for high inventory levels in the current period. We also discuss the effect of demand variation on the optimal policy under demand learning in a numerical example and observe that a greater degree of demand variation leads to lower optimal order-up-to levels and greater optimal list-price levels, while the optimal price level may decrease in the level of demand variation when the inventory level becomes sufficiently high and the posterior distribution of the unknown scale parameter is more precise. We further extend our structural analysis to the model with unobserved lost sales (i.e., censored demand), and show that the dimensionality reduction result also holds but the base-stock list-price policy may no longer be optimal. We also discuss the case with additive demand and show that the state-dependent base-stock list-price policy is also optimal but the dimensionality reduction does not apply due to the lack of scalability of the additive demand model.

Last but not least, we outline several limitations of our paper and future research directions. First, the classic Bayesian approach, though providing strong tractability in both structural analysis and computation, requires the firm to form a prior probability, which can be difficult in some situations. A potential alternative is reinforcement learning (e.g., Q-learning), which requires little knowledge of the transition probability but focuses on the structure of the decision rules. Second, our Bayesian analysis focuses on learning the market size (random factor), assuming for tractability that the price-dependent factor is known. Note that the price-dependent factor can be interpreted as the purchase probability (market share) implied by consumers' willingness to pay (WTP) at the given price. The distribution of WTP is typically estimated through econometric analysis or pricing experiments.
It would be interesting to combine different learning approaches to further advance dynamic inventory models. Finally, the current paper focuses on the theoretical development of empirical Bayesian inventory models, although our model is data-driven by nature. We plan to examine our model empirically with real-world data in future work.

Proof of Lemma 1. For period $T$, we have $y_T(s) \ge u_T$ by the definition of $u_T$. For $t < T$, it suffices to show that $\Pi_t(y, d|s)$ is increasing in $y \le u_t$ for any $s$ and $d \in D_t$. We proceed by induction. Suppose that for some $t \le T$ we have $y_{t+1}(s) \ge u_{t+1}$, which implies that $f_{t+1}(y|s) = f_{t+1}(y_{t+1}(s)|s)$ (or, $\partial f_{t+1}(y|s)/\partial y = 0$) for all $y \le u_{t+1}$ and $s$. Note that $\delta(y - d\epsilon) \le y$ and $u_t \le u_{t+1}$. Then, for any $y \le u_t$, $f_{t+1}(\delta(y - d\epsilon)|s \bullet \epsilon)$ must be a constant for any $d$, $s$ and $\epsilon$, and so is its expectation $\int_0^\infty f_{t+1}(\delta(y - d\epsilon)|s \bullet \epsilon)\phi_t(\epsilon|s)\,d\epsilon$. Since $u_t \le \hat{y}_t(d|s) = \arg\max_y \pi_t(y, d|s)$ for any $d \in D_t$ and $s$, and $\pi_t(y, d|s)$ is concave in $y$, $\pi_t(y, d|s)$ is increasing in $y$ for any $y \le u_t$. Thus, for any $d$ and $s$, $\Pi_t(y, d|s)$ is increasing in $y \le u_t$, which implies that $y_t(s) \ge u_t$. This completes the induction.

We first show that $R_t(y, d|s)$ is jointly concave under condition (i) or (ii). For this, it suffices to show that the expected truncated revenue (of the lost-sales model), $\tilde{R}_t(y, d|s) = P_t(d)E[\min(d\epsilon, y)|s]$, is jointly concave in $(y, d)$ for any $s$. To this end, we need to show that the Hessian matrix of $\tilde{R}_t(y, d|s)$ is negative semi-definite. It suffices to show that $\tilde{R}_t(y, d|s)$ is concave in each component and that the determinant of the Hessian matrix, $\Delta(y, d|s)$, is nonnegative.

Taking the second-order derivative with respect to $d$ yields the expressions (8) and (9). If assumption (ii) holds, i.e., $P''_t(d)d + P'_t(d) \le 0$ and $\rho_t(y/d|s)e_t(d) \ge 1$, then the RHS of (8) is negative. If assumption (i) holds, i.e., $P''_t(d)d + 2P'_t(d) \le 0$ and $\rho_t(y/d|s)e_t(d) \ge 2$, then the RHS of (9) is negative. That is, $\tilde{R}_t(y, d|s)$ is concave in $d$ if either (i) or (ii) holds.
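The role of the two conditions can be sketched as follows. This is a hedged derivation, not a reproduction of (8) or (9): it assumes that $\phi_t(\cdot|s)$ and $\bar{\Phi}_t(\cdot|s)$ are the predictive density and survival function of the market size, that $\rho_t(z|s) = z\phi_t(z|s)/\bar{\Phi}_t(z|s)$ is its generalized failure rate, and that $e_t(d) = -P_t(d)/(dP'_t(d))$ is the price elasticity. These definitions are not restated in this section, and the shorthand $A(z)$ is introduced here only for illustration.

```latex
% Shorthand (illustrative): z = y/d and
% A(z) = \int_0^z \epsilon \, \phi_t(\epsilon|s) \, d\epsilon \ge 0.
\begin{align*}
E[\min(d\epsilon, y)\,|\,s]
  &= d\,A(z) + d\,z\,\bar{\Phi}_t(z|s), \\
\frac{\partial^2 \tilde{R}_t(y,d|s)}{\partial d^2}
  &= P''_t(d)\,E[\min(d\epsilon, y)\,|\,s] + 2P'_t(d)\,A(z)
     - P_t(d)\,\frac{z^2}{d}\,\phi_t(z|s) \\
  &= \bigl(P''_t(d)d + 2P'_t(d)\bigr)A(z)
     + \bigl(P''_t(d)d + \rho_t(z|s)e_t(d)P'_t(d)\bigr)\,z\,\bar{\Phi}_t(z|s).
\end{align*}
```

Under these assumptions, both brackets are nonpositive in either case. Under (i), $P''_t(d)d + 2P'_t(d) \le 0$, and $\rho_t e_t \ge 2$ with $P'_t(d) < 0$ gives $\rho_t e_t P'_t(d) \le 2P'_t(d)$. Under (ii), $P''_t(d)d + 2P'_t(d) \le P'_t(d) < 0$, and $\rho_t e_t \ge 1$ gives $\rho_t e_t P'_t(d) \le P'_t(d)$.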
Taking the second-order derivative with respect to $y$ yields $\partial^2 \tilde{R}_t(y, d|s)/\partial y^2 = -P_t(d)\phi_t(y/d|s)/d \le 0$, which indicates that $\tilde{R}_t(y, d|s)$ is concave in $y$. Taking the cross-partial derivative with respect to $y$ and $d$ yields $\partial^2 \tilde{R}_t(y, d|s)/\partial y\,\partial d = P'_t(d)\bar{\Phi}_t(y/d|s) + P_t(d)(y/d^2)\phi_t(y/d|s) \ge 0$, where the inequality is due to $\rho_t(y/d|s)e_t(d) \ge 1$ and $P'_t(d) < 0$. Hence $\tilde{R}_t(y, d|s)$ is supermodular. Pairing up the terms of the determinant $\Delta(y, d|s)$ yields (10). Clearly, both terms of the RHS of (10) are nonnegative under assumption (i) or (ii). Hence, the Hessian of $R_t(y, d|s)$ is negative semi-definite, implying that it is jointly concave in $(y, d)$ for any $s$.

We are now ready to show the optimality of the base-stock list-price policies by induction. Suppose that $f_{t+1}(x|s)$ is nonincreasing and concave in $x$ for any $s$. Note that $\delta(\cdot)$ is an increasing and convex function. Then $f_{t+1}(\delta(x)|s)$ is nonincreasing and concave in $x$, which further implies that $f_{t+1}(\delta(y - d\epsilon_t)|s)$ is jointly concave and supermodular in $(y, d)$ for any given $s$. Since concavity and supermodularity are preserved under expectation, we know that $\int_0^\infty f_{t+1}(\delta(y - d\epsilon)|s \bullet \epsilon)\phi_t(\epsilon|s)\,d\epsilon$ is jointly concave and supermodular in $(y, d)$ for any $s$. Similarly, the convexity of $L_t$ implies that $\int_0^\infty L_t(y - d\epsilon)\phi_t(\epsilon|s)\,d\epsilon$ is jointly convex and submodular in $(y, d)$ for any $s$. Since $R_t(y, d|s)$ is jointly concave and supermodular in $(y, d)$, $\Pi_t(y, d|s)$ is jointly concave and supermodular in $(y, d)$ for any given $s$. Following the standard inductive arguments of Federgruen and Heching (1999) and Pang (2011), the desired results follow immediately.

We first prove part (a) by induction. Note that $f_{T+1}(x|s) = 0$ automatically satisfies the desired properties. Suppose $f_{t+1}(x|s) = q_{t+1}(s)v_{t+1}(x/q_{t+1}(s))$ for some $t$. We will show that $f_t(x|s) = q_t(s)v_t(x/q_t(s))$. First, by the definitions of $\pi_t(y, d|s)$ and $\pi_t(y, d)$, the assumption on $\gamma_t$, and condition (C1), we have $\pi_t(y, d|s) = q_t(s)\pi_t(y/q_t(s), d)$.
According to the optimality equations (1), we have

$$f_t(x|s) = \max_{y \ge x,\, d \in D_t} \Bigl\{ q_t(s)\pi_t(y/q_t(s), d) + \alpha \cdots \Bigr\} = \max_{y \ge x,\, d \in D_t} \Bigl\{ q_t(s)\pi_t(y/q_t(s), d) + \alpha \int q_t(s)U_{t+1}(\cdots) \cdots \frac{\phi_t(\epsilon/q_t(s))}{q_t(s)}\,d\epsilon \Bigr\} = q_t(s) \max_{y/q_t(s) \ge x/q_t(s),\, d \in D_t} \Bigl\{ \pi_t(y/q_t(s), d) + \alpha \cdots \Bigr\} = \cdots,$$

where the first equality follows from (11), condition (C1) and the induction assumption; the second from (C2); the third substitutes $\epsilon/q_t(s)$ by $\xi$; the fourth substitutes $y/q_t(s)$ by $z$; and the last follows from the optimality equation (5). This completes the induction.

For any fixed $s$, the optimal solution of (13) also maximizes the right-hand side of (12), i.e., $y^*_t(x|s) = q_t(s)y^*_t(x/q_t(s)) = q_t(s)\max(\hat{y}_t, x/q_t(s))$ and $d^*_t(x|s) = d^*_t(x/q_t(s))$, which implies that $p^*_t(x|s) = p^*_t(x/q_t(s))$. In particular, the state-dependent base-stock level is $y^*_t = q_t(s)\hat{y}_t$ and the optimal list price is $p^*_t(s) = \hat{p}_t$. This proves part (b).

First, following arguments similar to those in the proof of Theorem 1, the optimality of the base-stock list-price policy and the monotonicity of the optimal policy follow immediately. In addition, both $\pi_t(y, d|s_1)$ and $G_t(y, d)$ are jointly concave and supermodular in $(y, d)$, and $g_t(x)$ is decreasing in $x$. Clearly, $G_t(y, \hat{d}_t(y)) = \max_{d \in D_t} G_t(y, d)$ is concave in $y$.

We next consider the stationary system and prove the optimality of the myopic policy by induction. Suppose that the myopic base-stock and expected demand levels $(y^M(s_1), d^M(s_1))$ are optimal in period $t + 1$, which implies that it is always optimal to order up to $y^M(s_1)$ in period $t + 1$ whenever the beginning inventory level is lower than $y^M(s_1)$. As a result, the profit-to-go function $g_{t+1}(y)$ is constant for $y \le y^M(s_1)$ and then decreasing for $y > y^M(s_1)$.
Hence, for $y \le y^M(s_1)$, $g_{t+1}(y - \hat{d}_t(y|s_1)\epsilon_t)$ is constant and so is $\int_0^\infty g_{t+1}(\delta(y - d\epsilon))\phi_1(\epsilon|s_1)\,d\epsilon$, which implies that both $\pi_t(y, \hat{d}_t(y|s_1)|s_1)$ and $G_t(y, \hat{d}_t(y|s_1))$ are increasing in $y$ for $y \le y^M(s_1)$, where $\hat{d}_t(y|s_1) = \arg\max_{d \in D_t} \pi_t(y, d|s_1)$. In particular, $\pi_t(y, \hat{d}_t(y)|s_1)$ reaches its maximum when $y = y^M(s_1)$, and we have $d^M(s_1) = \hat{d}_t(y^M(s_1)|s_1)$.

We next show that $y^M(s_1)$ maximizes $G_t(\cdot, \hat{d}_t(y^M(s_1)|s_1))$. In the two displays below, $y^M$ stands for $y^M(s_1)$ and $\hat{d}^M$ for $\hat{d}_t(y^M(s_1)|s_1)$. For any $\eta > 0$, on the one hand, we have

$$G_t(y^M + \eta, \hat{d}^M) - G_t(y^M, \hat{d}^M) = \bigl[\pi_t(y^M + \eta, \hat{d}^M|s_1) - \pi_t(y^M, \hat{d}^M|s_1)\bigr] + \alpha \int_0^\infty \bigl[g_{t+1}(\delta(y^M + \eta - \hat{d}^M\epsilon)) - g_{t+1}(\delta(y^M - \hat{d}^M\epsilon))\bigr]\phi_1(\epsilon|s_1)\,d\epsilon \le 0,$$

where the inequality is due to the facts that $(y^M(s_1), \hat{d}_t(y^M(s_1)|s_1)) = (y^M(s_1), d^M(s_1))$ is the global maximizer of $\pi_t(\cdot, \cdot|s_1)$ and that $g_{t+1}(\cdot)$ and $\delta(\cdot)$ are decreasing and increasing functions, respectively. On the other hand, we have

$$G_t(y^M, \hat{d}^M) - G_t(y^M - \eta, \hat{d}^M) = \bigl[\pi_t(y^M, \hat{d}^M|s_1) - \pi_t(y^M - \eta, \hat{d}^M|s_1)\bigr] + \alpha \int_0^\infty \bigl[g_{t+1}(\delta(y^M - \hat{d}^M\epsilon)) - g_{t+1}(\delta(y^M - \eta - \hat{d}^M\epsilon))\bigr]\phi_1(\epsilon|s_1)\,d\epsilon \ge 0,$$

where the inequality is due to the facts that $(y^M(s_1), \hat{d}_t(y^M(s_1)|s_1))$ maximizes $\pi_t(\cdot, \cdot|s_1)$, that $g_{t+1}$ remains constant below $y^M(s_1)$, and that $\delta(y^M - \eta - \hat{d}^M\epsilon) \le \delta(y^M - \hat{d}^M\epsilon) \le y^M$.

Combining the above inequalities, we know that $y^M(s_1)$ is a global maximizer of $G_t(\cdot, \hat{d}_t(y^M(s_1)|s_1))$. In addition, $d^M(s_1) = \hat{d}_t(y^M(s_1)|s_1)$ is a global maximizer of $G_t(y^M(s_1), \cdot)$ on $D_t$. These results, together with the joint concavity of $G_t(\cdot, \cdot)$, imply that $(y^M(s_1), d^M(s_1)) = \arg\max_{y \ge 0,\, d \in D_t} G_t(y, d)$.
Hence, the myopic base-stock list-price policy is indeed optimal.

Let $z = y/d$. We have the expression (14), in which the third equality is due to integration by parts. For any given $d$, the optimal order-up-to level can be expressed as $y^M(d|s_1) = dz^*(s_1)$, where $z^*(s_1)$ is a newsvendor solution satisfying $\rho = \bar{\Phi}_t(z|s_1) = (1 + z/\Lambda_1)^{-a_1}$ with $\rho = \frac{h^+ + (1-\alpha)c}{h^+ + h^-}$. Here $z^*(s_1) = \Lambda_1(\rho^{-1/a_1} - 1)$, which is increasing in $\Lambda_1$ and decreasing in $a_1$. Inserting $z^*(s_1)$ into (14) yields an expression for $\pi_t(dz^*(s_1), d|s_1)$ whose second equality is due to integration by parts and the definition of $z^*(s_1)$:

$$\int_{z^*(s_1)}^\infty \epsilon\,\phi_t(\epsilon|s_1)\,d\epsilon = \frac{\Lambda_1}{a_1 - 1}\bigl(1 + z^*(s_1)/\Lambda_1\bigr)^{-(a_1 - 1)} + z^*(s_1)\bar{\Phi}_t(z^*(s_1)|s_1) = \frac{\Lambda_1}{a_1 - 1}\,\rho\,\bigl(a_1\rho^{-1/a_1} - a_1 + 1\bigr).$$

The optimal expected demand level can be obtained by maximizing $\pi_t(dz^*(s_1), d|s_1)$ on $D_t$, or equivalently,

$$d^M(s_1) = \arg\max_{d \in D_t} \Bigl\{ d\bigl(P_t(d) - \alpha c + h^+\bigr) - d(h^+ + h^-)\rho\bigl(a_1\rho^{-1/a_1} - a_1 + 1\bigr) \Bigr\}.$$

The corresponding optimal base-stock level is $y^M(s_1) = z^*(s_1)d^M(s_1) = \Lambda_1(\rho^{-1/a_1} - 1)d^M(s_1)$ and the optimal list price is $p^M(s_1) = P_t(d^M(s_1))$.

We next analyze the monotonicity of the optimal myopic policy. Clearly, $d^M(s_1)$ is independent of $\Lambda_1$. Differentiating the function $a_1\rho^{-1/a_1} - a_1 + 1$ twice with respect to $a_1$ yields $\frac{d^2}{da_1^2}\bigl(a_1\rho^{-1/a_1} - a_1\bigr) = \frac{1}{a_1^3}(\ln\rho)^2\rho^{-1/a_1} > 0$, which implies that the function is convex in $a_1$. Since $\rho < 1$, the convexity implies that $\frac{d}{da_1}\bigl(a_1\rho^{-1/a_1} - a_1\bigr) = \rho^{-1/a_1} + \frac{1}{a_1}\ln(\rho)\rho^{-1/a_1} - 1 < 1^{-1/a_1} + \frac{1}{a_1}\ln(1)1^{-1/a_1} - 1 = 0$. Hence, $\pi_t(dz^*(s_1), d|s_1)$ is supermodular in $(d, a_1)$ (i.e., $\frac{\partial^2}{\partial d\,\partial a_1}\pi_t(dz^*(s_1), d|s_1) > 0$), which, by Theorem 2.8.1 of Topkis (1998), implies that $d^M(s_1) = \arg\max_{d \in D_t} \pi_t(dz^*(s_1), d|s_1)$ is increasing in $a_1$. The monotonicity of $P_t$ implies that the corresponding optimal price $p^M(s_1) = P_t(d^M(s_1))$ is independent of $\Lambda_1$ and increasing in $a_1$.
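The closed-form quantities above can be checked numerically. The following is a minimal Python sketch, assuming the predictive survival function $(1 + z/\Lambda_1)^{-a_1}$ and the critical ratio $\rho$ exactly as stated in the text; the function names and the cost parameters in the demo are hypothetical and chosen only for illustration.

```python
def critical_ratio(h_plus, h_minus, c, alpha):
    """rho = (h+ + (1 - alpha) * c) / (h+ + h-), as defined in the text."""
    return (h_plus + (1.0 - alpha) * c) / (h_plus + h_minus)


def survival(z, lam, a):
    """Predictive survival function (1 + z / Lambda)^(-a)."""
    return (1.0 + z / lam) ** (-a)


def z_star(lam, a, rho):
    """Newsvendor fractile solving survival(z) = rho: Lambda * (rho^(-1/a) - 1)."""
    return lam * (rho ** (-1.0 / a) - 1.0)


def tail_first_moment(lam, a, rho):
    """Closed form Lambda / (a - 1) * rho * (a * rho^(-1/a) - a + 1); needs a > 1."""
    return lam / (a - 1.0) * rho * (a * rho ** (-1.0 / a) - a + 1.0)


if __name__ == "__main__":
    # Hypothetical cost parameters, not taken from the paper.
    rho = critical_ratio(h_plus=1.0, h_minus=4.0, c=2.0, alpha=0.9)
    lam, a = 10.0, 5.0
    z = z_star(lam, a, rho)
    print("rho:", rho)
    print("z*:", z)
    print("survival at z*:", survival(z, lam, a))  # should reproduce rho
    print("tail first moment:", tail_first_moment(lam, a, rho))
```

Because $z^*(s_1)$ is linear in $\Lambda_1$, doubling `lam` doubles the fractile, which is the scaling that makes the myopic base-stock level proportional to $\Lambda_1$ for a fixed expected demand level.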
For the optimal base-stock level $y^M(s_1) = d^M(s_1)z^*(s_1)$, since $z^*(s_1)$ is increasing in $\Lambda_1$ and $d^M(s_1)$ is independent of $\Lambda_1$, $y^M(s_1)$ is increasing in $\Lambda_1$. Because $d^M(s_1)$ is increasing in $a_1$ while $z^*(s_1)$ is decreasing in $a_1$, it is unclear whether $y^M(s_1)$ is decreasing in $a_1$.

Amazon error may end "dynamic pricing"
Dynamic pricing for nonperishable products with demand learning
Bayesian solution to dynamic inventory models under unknown demand distribution
Tesco's UK chief executive quits
Dynamic pricing and the direct-to-customer model in the automotive industry
Pricing Segmentation and Analytics
How COVID-19 affected U.S. consumer prices in March. (Reuters)
Coordinating inventory control and pricing strategies for perishable products
Coordinating inventory control and pricing strategies with random demand and fixed ordering cost: The finite horizon case
Pricing and inventory management. Philips P. and Ö. Özer, eds. Oxford Handbook of Pricing Management
Dynamic pricing and demand learning with limited price experimentation
Solving operational statistics via a Bayesian analysis
Dynamic pricing and learning: Historical origins, current research, and new directions
Dynamic pricing in the presence of inventory considerations: Research overview, current practices, and future directions
Combined pricing and inventory control under uncertainty
Optimal dynamic pricing of inventories with stochastic demand over finite horizons
Price and order postponement in a decentralized newsvendor model with multiplicative and price-dependent demand
Joint inventory and pricing decisions for assortment
Why am I paying $60 for that bag of rice on Amazon
Prices and optimal inventory policy
Stalking information: Bayesian inventory management with unobserved lost sales
A practical inventory control policy using operational statistics
Myopic policies for some inventory models with uncertain demand distributions
An elasticity approach on the newsvendor with price sensitive demand
Amazon changes prices on its product about every 10 minutes
Scarf's state reduction method, flexibility, and a dependent demand inventory model
Optimal dynamic pricing and inventory control with stock deterioration and partial backordering
A note on the structure of joint inventory-pricing control with leadtimes
Pricing and the newsvendor problem: A review with extensions
Dynamic pricing and inventory control with learning
Foundations of Stochastic Inventory Theory
Bayes solution of the statistical inventory problem
Some remarks on bayes solution of the statistical inventory problem
How a book about flies came to be priced $24 million on Amazon
Optimal dynamic joint inventory-pricing control for multiplicative demand with fixed order costs and lost sales
Developing optimal pricing and inventory policies for retailers who face uncertain demand
The Theory and Practice of Revenue Management
Bayesian solution to pricing and inventory control under unknown demand distribution