key: cord-0513250-smz6q2uq authors: Moriwaki, Daisuke; Hayakawa, Yuta; Munemasa, Isshu; Saito, Yuta; Matsui, Akira title: Unbiased Lift-based Bidding System date: 2020-07-08 journal: nan DOI: nan sha: 5d10018bd6cb0f882db16294fc086202dde2bb7e doc_id: 513250 cord_uid: smz6q2uq Conventional bidding strategies for online display ad auction heavily relies on observed performance indicators such as clicks or conversions. A bidding strategy naively pursuing these easily observable metrics, however, fails to optimize the profitability of the advertisers. Rather, the bidding strategy that leads to the maximum revenue is a strategy pursuing the performance lift of showing ads to a specific user. Therefore, it is essential to predict the lift-effect of showing ads to each user on their target variables from observed log data. However, there is a difficulty in predicting the lift-effect, as the training data gathered by a past bidding strategy may have a strong bias towards the winning impressions. In this study, we develop Unbiased Lift-based Bidding System, which maximizes the advertisers' profit by accurately predicting the lift-effect from biased log data. Our system is the first to enable high-performing lift-based bidding strategy by theoretically alleviating the inherent bias in the log. Real-world, large-scale A/B testing successfully demonstrates the superiority and practicability of the proposed system. Online advertising has been playing an integral role in the recent business, which accounts for half of media ad spending in the world [1] . One of the advantages of the online ad is that the performance of the ads can be assessed by simple metrics such as the number of clicks and conversions earned by the ads. These metrics enable ad deliverers, DSPs (Demand-Side Platforms), to charge ad costs for advertisers through the objective billing systems named CPC (cost-per-click), CPA (cost-per-action), and CPM (cost-perimpression). The conventional goal of DSPs is to optimize these metrics by performance-based bidding strategy, which decides bid price based on the probability of users to take the desired action. Despite its industrial success, there are two major emerging issues with the current bidding process. First, the conventional bidding strategy ignores the probability that a user will convert even without an ad. Such a strategy is suboptimal in the sense that it will not reach users having a high probability of changing their action by showing an ad [11] . Moreover, it might destroy the experience of end-users who have high conversion probability even without an ad [6] . The other problematic issue is the inherent bias in training data. Specifically, for supervised machine learning to work, it is necessary for training and test population to follow the same distribution. However, this is not the case in the online advertising domain. This is because the training data is heavily biased by the ad auction selection in the data collection process (i.e., impression bias). Impressions that were assigned to a higher bid price by a past bid strategy would have a higher probability of winning in an auction, thereby having a higher chance to appear in the training data. On the other hand, one has to make a prediction for every impression before the ad auction selection in the testing time. As a result, machine learning models trained on that biased data will overly focus on samples with a high winning probability, resulting in a poor performance in the testing time [14] . These two issues have been solved separately. [11] shows that the conventional strategy optimizing observed clicks or conversions is suboptimal and proposes a lift-based bidding strategy. This strategy decides the bid price based on the predicted lift effect of showing an ad to a user and considers their conversion probability without the ad. However, their proposed method does not address the impression bias theoretically. In contrast, [14] proposes a method to alleviate the bias and unbiasedly predict the outcome under the impression bias. However, the bidding strategy considered in [14] still pursues the observed metrics, not the lift-effect. Therefore, a theoretically grounded method that simultaneously solves these two challenging problems has not yet been proposed in the literature. In this study, we solve the aforementioned issues simultaneously with the aim of maximizing advertisers' revenue while avoiding damaging end-users' experience. We first propose a method to predict the lift of showing ads to a specific user using only an observable biased impression log. Our method builds on a theoretically validated debiasing method in causal inference called inverse propensity scoring and easily implementable with existing machine learning packages such as scikit-learn 1 or XGBoost 2 . We further describe the details of our resulting Unbiased Lift-based Bidding System architecture used in our real production. Our system is scalable and able to optimize advertisers' profit by deciding the bid price for each impression based on our proposed unbiased lift prediction procedure. Finally, we conduct online A/B testing and demonstrate that our proposed system outperforms the naive, conventional performance-based bidding strategy in a real ad campaign. Our contributions can be summarized as follows: • We propose a theoretically grounded, easily implementable method to predict the lift-effect of showing ads from biased impression log data. • We develop Unbiased Lift-based Bidding System, which maximizes the advertisers' profit by bidding based on the predicted lift-effect. • We demonstrate the efficacy of our proposed system in a large-scale online A/B testing on a real-world DSP. This study aims at increasing the effectiveness of an advertising campaign by optimizing bid prices under budget constraints. To explain this, we provide a brief overview of the bidding problem. Marketers craft a whole picture of their advertising campaign and split a fraction of the budget for online ads, spending the rest to other advertisements such as TVs and newspapers. With that specific assigned budget, DSPs join online ad auctions. This predetermined constraint is almost always an equity constraint and the budget DSPs want to consume without any excess and deficiency. Hence the tasks of DSPs are twofold: 1) allocating the campaign budget for online advertisements efficiently to eat up its budget and 2) acquiring an effective impression in RTB (Real-time Bidding). DSPs are unable to know which or when, even how many users show up in the auction, but only able to predict them. Hence, DSP needs to determine the appropriate bid price for each bid request sequentially by learning the quantity and quality of bid requests to keep budget constraints. Then the DSP's problem at kth bid request is to choose the best bid price bid k ∈ R >0 to maximize expected profit over [k,k) period. Let wp k be the winning price of the auction and ω k be the indicator function that tells if the bidder wins or not. wp k is the highest bid for first price auction and second highest bid for second price auction. ω k takes one if the DSP wins and takes zero if not. We treat them as a function of bid k . Let v j be the value obtained from the ad-slot j. The conventional performance-based bidding strategy uses the probability of conversion with an ad as the value as follows. It just uses the observed performance of the ad, but ignores the action changes caused by the ad. In contrast, the lift-based bidding strategy set v j as the lift of conversions caused by the ad impression. Namely, Then the DSP solves the following maximization problem. where the second equation implies that the expected ad inventory cost wp j (bid j )ω j (bid j ) must meet remaining budget at k, B k . 3 With the above optimization problem, DSPs should gain an effective impression having a large lift-effect to maximize the profit. Our focus in this study is to propose a method to predict the lifteffect with the biased impression data and implement a system that controls the bid prices based on the predicted lift-effect. We briefly summarize the background of this study and critical related works. DSPs participate in the online auctions to purchase ad impressions from SSP (Supply-Side Platform). In general, DSPs charges advertisers for their performance measured by an observed metric such as the number of clicks or conversions. Specifically, they claim a fixed price for each click (cost-per-click, CPC), or each conversion (cost-per-action, CPA). As a result, performance-based bidding strategies aim at maximizing the number of clicks or conversions after impression (Eq. (1)). Just focusing on CPC or CPA, however, may yield suboptimal strategy because they omit the probability that the targeted users get converted (e.g., visit a store, purchase a good) without advertising. To incorporate this omitted realm, the DSP needs to predict incremental gain caused by ad impression (Eq. (2)). The attempts to incorporate this lift-effect in bidding strategy are lift-based bidding [11] , incrementality bidding [4] , or more broadly uplift modeling [5, [7] [8] [9] 12] . We advance this line of literature and address the inherent bias in impression log data in a theoretically grounded and tractable manner. Specifically, we propose a method to unbiasedly predict the lift-effect of an ad impression on a specific user from biased impression data. Moreover, our method is easily implementable with the well-known machine learning libraries, while the previous debiasing method for the performance-based bidding strategy needs additional implementation cost [14] . We believe that our method will broaden the application of the promising lift-based bidding strategy. This is because every online ad system is biased by the ad auction selection [14] , and practitioners can implement and try our unbiased lift-effect prediction method immediately. To our knowledge, we are the first to theoretically consider both the lift-effect prediction and ad impression bias in the online advertising literature. In this section, we propose and describe our Unbiased Lift-based Bidding System. We consider a bidding strategy that rewards v for each lift τ so that a DSP earns v for each extra conversion (as in Eq. (2)). Our algorithm calculates the bid price bid i as: where α ∈ (0, 1) is cost rate, a budget pace multiplier that will be detailed in Section 4.3. v is some fixed value a DSP earns for conversion lift. In our algorithm, τ is the predicted lift-effect of advertising a for user i characterized by a feature vector x i . s(a) ∈ S represents the ad exposure state of an ad a under consideration, and S is a set of possible states. We use the number of impressions of a to i as a scalar variable representing the ad exposure state of a to i, and thus S = {0, 1, . . .} here. To formally define the lift-effect τ , we introduce the essential notation called potential outcomes in causal inference [3] . Let y i (s(a)) denote user i's potential outcome associated with the exposure state s(a) of ad a. Each user i has potential outcomes associated with every possible state, i.e., {y(s(a)) | ∀s(a) ∈ S }, however, only one of them is observable. The observed outcome for user i is defined as y obs i = y i (s i ) where s i is a realized exposure state for user i. Note that the potential outcomes associated with every possible state other than the realized one, i.e., {y(s(a)) | ∀s(a) ∈ S\{s i }} is unobservable or counterfactual to the analysts. The lift-effect τ of showing ad a for each user i is sequentially defined as the difference between the expectation of the potential outcomes given the two consecutive ad exposure states (the number of impressions): where E[y i (s(a))|x i ] is the expected potential outcome of i when the number of impressions is s(a), in contrast E[y i (s(a) − 1)|x i ] is the expected potential outcome when the number of impressions is s(a) − 1. Thus, Eq. (6) is the reasonable definition for the lift-effect of showing an ad a one more time to a specific user i who has been exposed to the ad s(a) − 1 times in the past. To predict τ , we train predictors of outcomes for every possible state ∀s ∈ S separately and combine their predictions as follows. where f s(a) and f s( respectively. To accurately predict the lift-effect τ , it is essential to predict the expected probability of conversion under each ad exposure state well. To obtain a well-performing predictor f for each ad exposure state, it is ideal to directly optimize the following generalization error: where f s(a) is a predictor for E[y(s(a)) | x], ℓ specifies a loss function such as mean squared error, p(x, y) is joint probability distribution of the whole population, meaning that the population before the ad auction selection or the testing time. We consider optimizing the generalization error defined over the whole population because we apply f s(a) to predict the potential outcome in the testing time. In reality, however, it is impossible to directly optimize Eq. . The ERM principal works well under the situation of the same train-test distribution, however, the ad impression bias breaks this premise of machine learning. Specifically, the simple empirical approximation of the loss function over D s(a) has a bias, i.e., E p(x,y,s(a)) [L ERM (f s(a) )] L ideal (f s(a) ) for some given f s(a) . The bias issue emerges because users assigned to a higher bid price has higher density in the training data than that in the test data (i.e., p(x, y) p(x, y|s(a))). As a result, the trained predictor f s(a) ERM may perform poorly in the testing time, because it mistakenly overfits to the over-represented samples in the training data. To alleviate this bias issue of ERM in the online adverting situation, we apply the inverse propensity score (IPS) estimation technique to debias the estimation of the ideal loss in Eq. (8) . Our loss function takes the following form: where n = s(a)∈S n s(a) is the total number of the training data, and e s i (x i ) = P(s(a) = s i |x i ) is the probability of user i being assigned to the ad exposure state s i called the propensity score. A fascinating property of the IPS loss in Eq. (9) is that it is unbiased for the ideal generalization error as the following proposition states. under standard identification assumptions in causal inference [3, 8, 10] . The above proposition suggests that our IPS loss function successfully alleviates the bias issue of ERM and approximates the ideal loss from only observable data. Therefore, to unbiasedly predict the lift-effect under the ad impression bias, we optimize the IPS loss and use the resulting predictors to obtain the final lift-effect prediction as: We use α for budget pacing, and it discounts the predicted lift-effect in the auctions as in Eq. (5). Appropriate α is not known a priori, and thus we use PID (Proportional Integral Derivative) control algorithm following [13] to adaptively update it to keep the spending constant. We tune the parameters k p , k i , k d , and parametric specification of the actuator in Algorithm 1 by an offline simulation. We found that the performance is stable when using the exponential actuator, i.e., exp(·) as a parametric specification. We implement the proposed algorithm in a unique DSP specialized for online-to-offline marketing provided by CyberAgent, Inc, a Japan-based ad agency. Our DSP uses location data to determine bid prices and treat users' visits to offline stores as conversions. Require: Default cost rate α * , hyperparameters k p , k i , k d Set α = α * while h < H do err h = remaining budget/remaining hours -budget spending We summarize the whole architecture of our bidding system in Figure 1 . For each bid request, corresponding τ multiplied by cost rate α is returned as the bid price. τ is a predicted lift-effect for a coming impression. The ad impression count is calculated by scanning the history of an ad impression for each user. The PID controller updates the cost rate α in every hour based on the gap between ideal budget spending and realized budget spending. The whole procedure is scalable and done within a few milliseconds and does not harm the user experience. the coming user has been exposed to the ad in the past (impression count). The PID controller updates the cost rate α every hour based on the gap between the ideal and realized budget spendings. Then DSP calculates the bid price by combining the cost rate α and pre-predicted lift-effect of exposing the ad to the user one more time τ . To attribute each conversion to ad impressions, we use user-level learning rather than log-level learning used in existing research. Log-level learning implies randomly sampling data points from users' timeline by windows (the rectangles in Figure 2 ). The method used in [11] can be classified into this group. On the other hand, user-level learning firstly composes user level data and learns the relationship between impressions and conversions for each user. Note that the data size of user-level learning is the number of users, whereas log-level learning generates larger data. User-level learning has several advantages over log-level learning. First, it does not require any hyperparameter for data splitting such as window size. Second, user-level learning incorporates whole data points including those omitted in log-level learning. One drawback of user-level learning is that the data size is small. This, however, poses little problem for our setting, because DSPs usually reaches out to millions of users for each campaign. The log-level learning captures the higher granularity logs than the user-level learning. The log-level learning samples the attributions from the users' timeline giving multiple attribution ids (attr_ids) to the users, and the user-level learning summarizes the whole history of the timeline for each user. The table of log-level learning can have multiple rows (i.e., multiple attr_ids) for a single user, whereas the table of user-level learning has one row per user. Data Sources. We take the training data from two types of sources. When there are similar advertising campaigns in the past, we use the data for training. When no similar advertising campaign is conducted before we introduce lift-based bidding in the middle of the ongoing campaign and use the data generated at that point. Ad sizes. The online ad takes various forms, including creative, template, format, and size. Size is especially critical as it is standardized by IAB 4 . To account for the effect of ad size on target variables, Essentially, we group ads into four by size and train size-specific predictors. Propensity Score Estimation. To use our IPS loss function, we have to estimate the propensity score e s (·) in Eq. (9) . The estimation of the propensity score is the multi-class classification problem. We use multi-class classifier from XGBoost library [2] and train it using the whole training data {(x i , s i )} n i=1 and classifies the ad exposure state s i from the feature vectors. The feature vector to estimate the propensity score includes estimated CVR used for the existing bidder and the number of impressions before the advertising campaign of training period as well as the geographical features of user i. The estimated CVR and the number of impressions before the advertising campaign improve the estimation quality of the propensity score because the higher estimated CVR means higher bid price and the number of impressions in the training period indicates how easy the ad-slots of the user is obtained. For each ad size, we split users into eight classes by the number of impressions s i ∈ S = {0, 1, , 2, 3, 4, 5 − 9, 11 − 20, 20+}. The window size of bins gets larger for the higher number because the distribution of the number of impressions is highly skewed to the right. We train four propensity score predictors for each ad size. Outcome Prediction. We train outcome predictors for each pair of ad exposure states and ad sizes. We use XGBoost regressor, since the outcome variable y is the number of users' visits to an offline store. To train a regressor, we optimize the IPS loss function in Eq. (9) . XGBoost provides a weighting option, thus our IPS loss minimization procedure is easily implementable. The feature vector x includes recency (the last time the user visited the stores), frequency (how many times the user visited the stores), visited POIs (point-of-interests), the number of location logs, census statistics of the user's residential areas such as the share of age groups, averaged size of the households, and land prices. The predicted lift-effect can be negative. While it is natural for the lift-effect prediction framework to predict the negative effect of the ad, bidding negative price is not rational and could cause a problem in the system that does not expect negative bid price. Hence we first set τ to zero for τ < 0. The floored values are still very volatile across the impression count. It is possible that the expected value of the second impression is very high, while the first impression has zero expected value. Then the impression count for the user never gets two because the bidding for the first impression always loses. To deal with the problem, we smoothed the values with 3impression count backward-moving average so that the value of the first impression gets high enough in the above example. To evaluate our unbiased lift-based bidding system, we compared its performance indicators (impressions-per-user, etc.), business KPIs (key performance indicators), and its pacing efficiency with a conventional bidding system through an online A/B testing. The performance-based system employs the community standard bidding system, which decides bid price by Eq. (1). Note that our lift-based bidding system determines the bid price by the predicted lift-effect of showing one additional ad to a user and is described in Eq. (2) and Eq. (5). In the A/B testing, we randomly assigned the two systems to users on an online ad campaign that aims to promotes the app released by a major consumer electronics retailer in Japan 5 . The primary aim of the campaign is to increase the number of app users and visitors to the real stores located across Japan. To alleviate the impacts of the experiment on the business, we make a relatively small group for the lift-based group, which results in the unbalanced experiment groups. However, we keep a sufficiently large group size for statistical analysis. We summarize the results of the A/B testing in Table 1 . Compared to the performance-based bidding system, the lift-based bidding system achieved a higher impressions-per-user (28% more) and a 5 The experiment duration was June 9-14 2020, where COVID-19 had have been a serious issue across the world. Although we understand that the COVID-19 could influence our experiment, we conclude that it did not pose serious issues. First of all, we conducted after the Japanese government called off the state of emergency. Also, the state of emergency even depends voluntarily rather than compulsory, and, unlike other several countries, strict measurements such as "lockdown" were not taken. Note: The lift-based bidding system obtains more impressions and achieves larger reach rate than the performance-based bidding system. We divided each metric by the results of the performance-based strategy for the normalization purpose. #Impressions-per-user is the average number of impressions by users; #Clicks-per-user is the average number of clicks by users; Reach rate is the ratio of the users who got one or more impressions; Share of visitor is the ratio of the users who visited one or more offline stores.; The number of stars represents the statistical significance level (p-value < 0.05*, p-value < 0.01**, p-value < 0.001***). higher reach rate (71% more) but obtained fewer clicks (46% less). Consequently, the lift-based bidding system invited more visitors to the real stores on average (0.9% more) and reached a higher In both time scales, the lift-based bidding system bids a much smaller price with smaller fluctuations than the performancebased bidding system. We first compute the mean bid price for each user in each hour in each time scale. Then, we min-max normalize the data where the highest value in the population takes 1, and the lowest value takes 0. The shaded area represents 95% confidence interval. share of visitors (0.4% more). In other words, the lift-based bidding system successfully reached out to potential customers and encouraged their visits to the real stores. The lift-based system demonstrates its superiority also in the essential key business indicators (KPIs) described in Table 2 . The lift-based strategy won the impressions in an extremely economical way, obtaining each impression, reaching and conversion at 24-30% of the ad inventory cost of the performance-based. Figure 3 plots the distribution of the bidding prices by each bidding system and highlights that the lift-based bidding system bids a much lower price than the performance-based bidding system. The observations above suggest that the lift-based bidding system (i) bid lower prices and/or (ii) win auctions at a lower price. Moreover, our lift-based system excels at its efficiency. The liftbased bidding system bids prices with lower fluctuation than the performance-based bidding system. Figure 4 depicts the transition of the average bidding prices by the two bidding systems on the two different time scales: Hour of Day and Date. In both time scales, the performance-based bidding system is much more volatile than the lift-based bidding system. This difference stems from its competitors. Because the most rival DSPs adopt performance-based algorithm, the performance-based bidding system had to compete against many competitors for the similar ad slots, suffering from low win rates. When the performance-based system could not win the auctions, its pacing parameter α automatically increased and fluctuated bidding prices. In contrast, the lift-based bidding system did not have many competitors in the auctions and kept the pacing parameter α lower, achieving relatively higher win rates. In this study, we developed Unbiased Lift-based Bidding System, which maximizes the advertisers' profit by accurately predicting the lift-effect under the impression bias. A key feature of our proposed system is unbiased lift-effect prediction: we enabled to unbiasedly predict the lift-effect of an additional ad impression using training data biased by a past bidding strategy. We also describe our detailed implementation and system architecture to achieve the lift-based bidding system in practice. Through online A/B testing, we demonstrated the scalability and advantages of our proposed system over the conventional performance-based one. Global Digital Ad Spending XGBoost: A Scalable Tree Boosting System Causal inference in statistics, social, and biomedical sciences Cost per incremental action: Efficient pricing of advertising Uplift Modeling for Location-Based Online Advertising Fatigue-Aware Ad Creative Selection Real-world uplift modelling with significance-based uplift trees Doubly robust prediction and evaluation methods improve uplift modeling for observational data Cost-Effective and Stable Policy Optimization Algorithm for Uplift Modeling with Multiple Treatments Unbiased Recommender Learning from Missing-Not-At-Random Implicit Feedback Lift-based bidding in ad selection Support vector machines for uplift modeling Feedback Control of Real-Time Display Advertising Bid-aware Gradient Descent for Unbiased Learning with Censored Data in Display Advertising