key: cord-0281930-dcbc13p9 authors: Zheng, Yu; Yang, Yongxin; Chen, Bowei title: Incorporating prior financial domain knowledge into neural networks for implied volatility surface prediction date: 2019-04-30 journal: nan DOI: 10.1145/3447548.3467115 sha: c49bf9fbe426c5331c44c197d59b20ac6cebf790 doc_id: 281930 cord_uid: dcbc13p9

In this paper we develop a novel neural network model for predicting the implied volatility surface that takes prior financial domain knowledge into account. A new activation function that incorporates the volatility smile is proposed and used for the hidden nodes that process the underlying asset price. In addition, financial conditions, such as the absence of arbitrage, the boundaries and the asymptotic slope, are embedded into the loss function. This is one of the very first studies to discuss a methodological framework that incorporates prior financial domain knowledge into neural network architecture design and model training. The proposed model outperforms the benchmarked models on 20 years of option data on the S&P 500 index. More importantly, the domain knowledge is satisfied empirically, showing that the model is consistent with the existing financial theories and conditions related to the implied volatility surface.

Machine learning algorithms are essentially data-driven models that mainly focus on producing accurate predictions, and they are being used for an ever wider array of macro- and micro-level prediction tasks. According to Ernst and Young [14], machine learning applications in finance have become one of the hottest sectors globally, with expected direct investment growth of 63% from 2016 to 2022. Despite COVID-19, a recent survey by the Bank of England showed that a third of banks would still increase their investment in planned or existing machine learning and data science projects [4].
Therefore, this sector is of interest to both machine learning researchers and financial analysts: it raises many practical questions and challenges, and addressing them will have positive economic and social consequences. Despite their excellent predictive performance, machine learning models are usually used as "black boxes" in financial applications. Compared with the well-developed models of mathematical finance, machine learning algorithms are less interpretable: for example, features may not be understandable, and the learning process is neither transparent nor mathematically tractable. More importantly, they are not aligned with well-established financial theories. As a result, many financial institutions have been slow to adopt machine learning algorithms (particularly neural networks) in their major business operations. Developing interpretable machine learning models that are consistent with existing financial markets and theories would resolve this bottleneck and boost the application of machine learning in finance.

In this paper, we propose a novel neural network model tailored for implied volatility surface prediction. Implied volatility is an important financial metric that captures the market's view of the likelihood of changes in a given asset's price. Technically, implied volatility is defined through the inverse problem of option pricing: it maps the option price of the asset observed in the current market to a single volatility value [12]. When it is plotted against the option strike price and the time to maturity, the result is called the implied volatility surface. Prior financial domain knowledge related to the implied volatility surface includes: 1) the empirically observed volatility smile; and 2) financial conditions such as the absence of arbitrage, the boundaries and the asymptotic slope. We incorporate these two kinds of domain knowledge in different ways.
For the former, a new activation function that produces the volatility smile is proposed and used for the hidden nodes that process the underlying asset price. For the latter, financial conditions are embedded into the loss function used for neural network training. In the experiments, we validate the proposed model with option data on the S&P 500 index over a period of 20 years. Compared with existing studies, our experimental settings are more challenging, which requires our model to be more robust and stable in producing convincing results. Our model outperforms the widely used state-of-the-art model in finance and other benchmarked neural network models in terms of the mean absolute percentage error on both the training and test sets. At the same time, the incorporated prior financial domain knowledge is satisfied empirically.

From a technical standpoint, our study makes a methodological contribution: we propose a framework for incorporating prior financial domain knowledge into neural network design and training, so that the developed model is well aligned with the existing empirical evidence and financial theories related to the implied volatility surface. This is an important step towards interpretable machine learning, and we hope the framework will motivate further investigations of machine learning applications in finance. From the application perspective, we develop a best-performing prediction model and, to the best of our knowledge, one of the very first neural networks tailored for the implied volatility surface.

The rest of the paper is organised as follows. Section 2 reviews the related literature. Section 3 introduces our proposed model for predicting the implied volatility surface. The dataset, our experimental settings and the results are presented in Section 4. Finally, we conclude the paper in Section 5.

Our research in this paper touches upon two streams of literature: mathematical finance and machine learning.
For the former, we introduce the basic concepts of, and related studies on, option pricing and volatility modelling. For the latter, we review a number of recent applications of machine learning in finance, with a special focus on option pricing and volatility modelling.

In 1973, Black and Scholes [6] proposed an elegant closed-form pricing formula for European-style call options written on financial assets. In their model, simply called the Black-Scholes option pricing model, the underlying asset price is driven by a geometric Brownian motion [37] with a drift and a volatility, where the volatility term captures the fluctuations of asset returns and thus represents risk. The seminal work of Black and Scholes opened the floodgates for the study of mathematical models in finance, and volatility models soon became popular [15, 39].

Volatility models in finance can be classified into two groups [22]. The first group, called indirect methods, drives the implied volatility by another dynamic model, such as local volatility models, stochastic volatility models and Lévy models [21, 24, 27, 34, 38]. Models in this group usually have a limited number of parameters, and the volatility term is fitted to market data along with asset dynamics such as the geometric Brownian motion or the mean-reverting jump-diffusion process. These models exhibit mathematical elegance but are sometimes empirically invalid. Time-dependent parameters can be included, but they greatly increase the computational time and the optimisation difficulty of model calibration. The second group, called direct methods, specifies the implied volatility explicitly. Direct methods can be further divided into two types. The first type specifies the dynamics of the implied volatility surface and assumes that it evolves continuously over time [9, 12].
The second type focuses on a static representation of the implied volatility surface, using either parametric or non-parametric methods to fit the surface and then predict from it [13, 23, 26]. Our proposed method is a static model. In this group, the stochastic volatility inspired (SVI) model is the most commonly used method [16]. It models the implied volatility slice for a fixed time to maturity. Gatheral and Jacquier then improved the SVI model with a simpler representation of the conditions for the absence of static arbitrage; this improved model, called the surface SVI (SSVI) model, is a recent advance in mathematical finance and has been widely adopted by investors [17]. We therefore choose the SSVI model as one of the benchmarked models in this paper.

Applying machine learning to asset pricing and volatility prediction can be traced back to the late 1980s and early 1990s. In the early stages, neural networks with a single hidden layer were used to estimate option prices [32] and to predict the volatility of the S&P 100 index [33]. Later, various machine learning algorithms were introduced, including ensemble methods [3, 18], kernel machines [11], Gaussian processes [40], and deep learning models such as hybrid neural networks [28], gated neural networks [41] and recurrent neural networks [31]. In addition to conventional financial data, several recent studies developed models that use verbal, vocal and social information [35, 36, 42]. Note that our research focuses on predicting the implied volatility surface when the underlying asset price and the time to maturity of the corresponding option quotes are given. This kind of prediction is in essence the inverse of option pricing rather than time series forecasting, so our scenario differs from those of the studies mentioned above.
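The SSVI benchmark discussed above can be sketched as a formula for the total implied variance w(k, θ) = σ²τ, where θ is the at-the-money total variance of a maturity slice. The curvature function φ(θ) = η θ^(-γ) (1 + θ)^(γ-1) is the power-law choice discussed by Gatheral and Jacquier [17]; whether the benchmark in this paper uses exactly this φ, and how θ is fitted per maturity, are assumptions here, and the function names are illustrative.

```python
import math


def phi_power_law(theta: float, eta: float, gamma: float) -> float:
    """Power-law curvature phi(theta) = eta / (theta^gamma * (1 + theta)^(1 - gamma))."""
    return eta / (theta ** gamma * (1.0 + theta) ** (1.0 - gamma))


def ssvi_total_variance(k: float, theta: float, rho: float, eta: float, gamma: float) -> float:
    """SSVI total implied variance w(k, theta) at log forward moneyness k.

    w = theta / 2 * (1 + rho*phi*k + sqrt((phi*k + rho)^2 + 1 - rho^2))
    """
    p = phi_power_law(theta, eta, gamma)
    return 0.5 * theta * (1.0 + rho * p * k + math.sqrt((p * k + rho) ** 2 + 1.0 - rho ** 2))


def ssvi_implied_vol(k: float, tau: float, theta: float, rho: float, eta: float, gamma: float) -> float:
    """Convert total variance back to implied volatility: sigma = sqrt(w / tau)."""
    return math.sqrt(ssvi_total_variance(k, theta, rho, eta, gamma) / tau)
```

At k = 0 the square root equals 1, so the formula returns θ itself; with a negative ρ the total variance is larger for negative k than for positive k, which produces the downward-sloping skew typical of equity index options.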
Our neural network architecture design is inspired by the work of [41] but differs from it in three significant ways. First, our model aims to predict the implied volatility rather than the option price. Second, we design a new activation function that incorporates the volatility smile. Third, we embed the conditions related to implied volatility into neural network training.

In this section, we first introduce the preliminaries of the implied volatility surface and lay down the mathematical setting. We then present our deep neural network architecture and explain how we incorporate prior financial domain knowledge into the neural network.

Figure 1: Neural network architecture design tailored to the implied volatility surface. The proposed Multi model consists of several single models whose weights are determined by the weighting network. Bias terms are omitted; ⊗ is the multiplication gate operator; ⊕ is the addition gate operator.

In mathematical finance, the spot price of an asset is usually modelled as a stochastic process (S_t)_{t≥0} defined on a filtered probability space (Ω, ℱ, (ℱ_t)_{t≥0}, P), where t is the time index, Ω is the sample space, ℱ is the sigma-field, (ℱ_t)_{t≥0} is the filtration, and P is the probability measure. The financial market is assumed to be arbitrage-free, and a financial product's time to maturity (i.e., the time remaining until a financial contract expires) is always finite. As mentioned previously, implied volatility is the inverse of option pricing. It can be obtained by inverting the Black-Scholes option pricing model [6], which requires determining the constant interest rate and dividends from market data. To avoid dealing with them, the forward measure can be used instead. Let (F_{t,T})_{t≥0} be the forward price of the asset with maturity date T, where 0 ≤ t ≤ T, and let B(t,T) denote the price at time t of a zero-coupon bond that pays one unit at time T.
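Under the forward measure, the Black-Scholes call price can be written using only the forward F_{t,T} and the discount factor B(t,T), so no separate interest rate or dividend yield has to be estimated. The sketch below (illustrative names, not the authors' code) prices a call in this form and recovers the implied volatility by bisection, which works because the call price is strictly increasing in σ. Obtaining F and B from market data is assumed to happen elsewhere.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function Phi(x) via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_forward_call(F: float, K: float, B: float, tau: float, sigma: float) -> float:
    """European call priced from the forward F_{t,T} and discount factor B(t,T) only."""
    s = sigma * math.sqrt(tau)  # total volatility sigma * sqrt(tau)
    d1 = (math.log(F / K) + 0.5 * s * s) / s
    d2 = d1 - s
    return B * (F * norm_cdf(d1) - K * norm_cdf(d2))


def implied_vol(price, F, K, B, tau, lo=1e-6, hi=5.0, tol=1e-10, max_iter=200):
    """Invert black_forward_call for sigma by bisection on the bracket [lo, hi]."""
    if not black_forward_call(F, K, B, tau, lo) <= price <= black_forward_call(F, K, B, tau, hi):
        raise ValueError("price is outside the range spanned by [lo, hi]")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if black_forward_call(F, K, B, tau, mid) < price:
            lo = mid  # model price too low: volatility must be higher
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)
```

With a constant rate r and dividend yield q, setting F = S e^{(r-q)τ} and B = e^{-rτ} reproduces the usual dividend-adjusted Black-Scholes price, which is why working with F and B sidesteps estimating r and q separately.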
The absence of arbitrage ensures that there exists an equivalent martingale measure under which (F_{t,T})_{t≥0} is a martingale [12]. In probability theory, a martingale is a stochastic process for which the conditional expectation of the next value, given all values observed so far, equals the present value. The log forward moneyness k = log(K / F_{t,T}), where K is the strike price, can therefore be defined and used as the underlying. Another important variable is the time to maturity τ, which is expressed in years using an annualization factor. In our mathematical setting, the implied volatility can therefore be written as a function σ(k, τ) of the log forward moneyness k and the time to maturity τ. Let φ(·) denote the density function of the standard normal distribution and Φ(·) its cumulative distribution function.

Theorem 1 collects the conditions that the implied volatility σ is required to satisfy. Conditions 1-5 (positivity, twice differentiability, monotonicity, the absence of butterfly arbitrage and the limiting behaviour) ensure the absence of arbitrage [20]; conditions 6-7 specify the left and right boundaries [8]; and condition 8 is the asymptotic slope [29].

In addition to Theorem 1, implied volatility exhibits an important piece of empirical evidence (or stylised fact) called the volatility smile: for a given time to maturity, when the implied volatility is plotted against the strike price, it forms a curve that slopes upward at both ends, looking like a "smile" [12]. In the following discussion, we present our neural network model, which incorporates these financial conditions and this empirical evidence related to the implied volatility surface.

As illustrated in Figure 1, the proposed neural network for predicting the implied volatility σ is constructed from subnetworks with two types of architectural structure: 1) several simple networks (called single networks), which predict the implied volatility separately; and 2) a weighting network, which determines the "votes" of the predicted implied volatilities towards the final prediction.
Similar to [41], multiplication and addition gate operators are used to process and merge the information related to the log forward moneyness k, the time to maturity τ and the single networks.

Definition 1 (Smile Function). For any x ∈ R, the smile function is defined using tanh(·), the hyperbolic tangent function, and a small value ε that ensures numerical stability.

As shown in Figure 2, the defined smile function exhibits a skewed pattern like the volatility smile. We apply the smile function to the nodes that correspond to k and the sigmoid function to the nodes that correspond to τ. It is not difficult to see that the positivity and twice differentiability conditions in Theorem 1 are met, and the limiting behaviour condition can be proven theoretically by inverting the Black-Scholes option pricing model [20].

Our proposed neural network consists of I single networks, each with J hidden nodes, and a weighting network with K hidden nodes. For the ith single network, y_i is its prediction output; to ensure that it is non-negative, we use the exponential form of the output weight ŵ^{(i)} and bias b̂^{(i)}. In the hidden layer, w̄_j and b̄_j are the weight and bias of the jth hidden node that corresponds to k, and w̃_j and b̃_j are the weight and bias of the jth hidden node that corresponds to τ. There are therefore 5J + 1 parameters in each single network. The weighting network predicts the weights of the single networks towards the final prediction, where x_i is its prediction for the ith single network, W_{·,m} is the weight of the mth hidden node, b_m is the bias of the mth hidden node, V_{·,i} is the weight of the ith single network, and c_i is the bias of the ith single network. Since the dimensions of W, b, V and c are 2 × K, K × 1, K × I and I × 1, respectively, the total number of parameters in our model is (5J + K + 2)I + 3K.
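To check the parameter count (5J + K + 2)I + 3K, the architecture described above can be assembled in a small pure-Python sketch. Several pieces are assumptions rather than details given in the text: the smile activation is replaced by a hypothetical stand-in √(x² + ε), since Definition 1 is not reproduced here; each hidden node multiplies (⊗) a smile-activated k unit with a sigmoid-activated τ unit; the weighting network uses sigmoid hidden units; and its outputs are normalised into votes with a softmax.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def smile(x, eps=1e-6):
    # HYPOTHETICAL stand-in for the smile activation of Definition 1:
    # positive and U-shaped in x, but not the paper's formula.
    return math.sqrt(x * x + eps)


def init_params(I, J, K, seed=0):
    """Random parameters: I single networks with J hidden nodes, a K-node weighting network."""
    rng = random.Random(seed)

    def draw(n):
        return [rng.gauss(0.0, 0.5) for _ in range(n)]

    singles = [{
        "w_bar": draw(J), "b_bar": draw(J),     # hidden nodes fed by k
        "w_til": draw(J), "b_til": draw(J),     # hidden nodes fed by tau
        "w_hat": draw(J), "b_hat": draw(1)[0],  # output layer, used through exp(.)
    } for _ in range(I)]
    weighting = {
        "W": [draw(K) for _ in range(2)],  # 2 x K
        "b": draw(K),                      # K x 1
        "V": [draw(I) for _ in range(K)],  # K x I
        "c": draw(I),                      # I x 1
    }
    return singles, weighting


def single_forward(p, k, tau):
    """One single network: smile(k unit) times sigmoid(tau unit), summed with exp weights."""
    out = math.exp(p["b_hat"])
    for j in range(len(p["w_bar"])):
        h = smile(p["w_bar"][j] * k + p["b_bar"][j]) * sigmoid(p["w_til"][j] * tau + p["b_til"][j])
        out += math.exp(p["w_hat"][j]) * h
    return out  # strictly positive because every term is positive


def multi_forward(singles, weighting, k, tau):
    """Final prediction: weighted vote of the single networks' predictions."""
    W, b, V, c = weighting["W"], weighting["b"], weighting["V"], weighting["c"]
    z = [sigmoid(W[0][m] * k + W[1][m] * tau + b[m]) for m in range(len(b))]
    logits = [sum(V[m][i] * z[m] for m in range(len(b))) + c[i] for i in range(len(c))]
    top = max(logits)
    exps = [math.exp(a - top) for a in logits]
    votes = [e / sum(exps) for e in exps]  # softmax normalisation: an ASSUMPTION
    return sum(v * single_forward(p, k, tau) for v, p in zip(votes, singles))


def count_params(singles, weighting):
    n = sum(5 * len(p["w_bar"]) + 1 for p in singles)
    W, V = weighting["W"], weighting["V"]
    n += len(W) * len(W[0]) + len(weighting["b"]) + len(V) * len(V[0]) + len(weighting["c"])
    return n
```

For I = 3, J = 4 and K = 5, count_params returns (5·4 + 5 + 2)·3 + 3·5 = 96, matching the closed-form count; the softmax keeps the votes positive and summing to one, so the final prediction stays positive like each single network's output.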
Training the designed neural network solves an optimisation problem that finds the parameters yielding the minimum loss on the training samples. We define a loss function ℓ tailored to the implied volatility surface by embedding the related conditions from mathematical finance, where ℓ0 represents the data loss, ℓ1, ..., ℓ4 are the loss terms that incorporate the financial conditions discussed previously, and ℓ5 is a regularization term that avoids over-fitting.

The data loss ℓ0 is defined as a joint loss combining the mean squared log error (MSLE) and the mean squared percentage error (MSPE), where N is the total number of training samples, σ_n is the ground-truth implied volatility of the nth sample, σ̂_n is the predicted implied volatility of the nth sample, and α and β are hyperparameters that control the weights of the MSLE and the MSPE, respectively. We use this joint data loss because it is efficient in dealing with sensitive data or high-dimensional feature spaces [19].

The monotonicity condition is specified by ℓ1, where P and Q are numbers of samples and f_1(k, τ) = σ(k, τ) + 2τ ∂σ(k, τ)/∂τ. The objective of ℓ1 is to push f_1(k, τ) to be non-negative. This is achieved by randomly sampling P unique values from the domain of k and Q unique values from the domain of τ; ℓ1 adds a penalty whenever f_1(k, τ) is negative at a sampled (k, τ) pair.

The absence of butterfly arbitrage is specified by ℓ2, whose objective is to push the butterfly function g(k, τ) to be non-negative. This is achieved in the same way as for ℓ1, by randomly sampling P unique values from the domain of k and Q unique values from the domain of τ.

The left and right boundary conditions are specified by ℓ3, whose objective is to push both boundary functions to be non-negative. To achieve this, we sample P_1 unique non-negative values and P_2 unique negative values from the domain of k, and Q unique values from the domain of τ.
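The sketch below illustrates how ℓ0, ℓ1 and ℓ2 can be evaluated for any callable σ(k, τ). It is a minimal version under stated assumptions, not the authors' implementation: the MSLE is taken as the squared difference of log(1 + σ), each condition loss is the mean hinge penalty max(0, -f) over a randomly sampled (k, τ) grid, derivatives are approximated by central finite differences instead of automatic differentiation, and the butterfly function uses a standard form of the no-butterfly-arbitrage condition written in terms of σ (obtained from the total-variance condition with w = σ²τ).

```python
import math
import random


def data_loss(y_true, y_pred, alpha=1.0, beta=1.0):
    """Joint loss l0 = alpha * MSLE + beta * MSPE.

    MSLE is written with log(1 + sigma) here, an ASSUMPTION about its exact form.
    """
    n = len(y_true)
    msle = sum((math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred)) / n
    mspe = sum(((t - p) / t) ** 2 for t, p in zip(y_true, y_pred)) / n
    return alpha * msle + beta * mspe


def sample_grid(P, Q, k_range, tau_range, seed=0):
    """Draw P values of k and Q values of tau uniformly (distinct with probability one)."""
    rng = random.Random(seed)
    ks = sorted(rng.uniform(*k_range) for _ in range(P))
    taus = sorted(rng.uniform(*tau_range) for _ in range(Q))
    return ks, taus


def hinge_penalty(values):
    """Mean of max(0, -v): zero exactly when every sampled value is non-negative."""
    return sum(max(0.0, -v) for v in values) / len(values)


def monotonicity_values(sigma, ks, taus, h=1e-4):
    """f1(k, tau) = sigma + 2 * tau * d(sigma)/d(tau); every tau must exceed h."""
    return [sigma(k, t) + 2.0 * t * (sigma(k, t + h) - sigma(k, t - h)) / (2.0 * h)
            for k in ks for t in taus]


def butterfly_values(sigma, ks, taus, h=1e-4):
    """g(k, tau) = (1 - k s'/s)^2 - (s tau s')^2 / 4 + tau s s'', derivatives taken in k."""
    out = []
    for k in ks:
        for t in taus:
            s = sigma(k, t)
            s_up, s_dn = sigma(k + h, t), sigma(k - h, t)
            d1 = (s_up - s_dn) / (2.0 * h)          # first derivative in k
            d2 = (s_up - 2.0 * s + s_dn) / (h * h)  # second derivative in k
            out.append((1.0 - k * d1 / s) ** 2 - (s * t * d1) ** 2 / 4.0 + t * s * d2)
    return out
```

In a training loop, ℓ1 and ℓ2 would correspond to hinge_penalty(monotonicity_values(...)) and hinge_penalty(butterfly_values(...)) on a freshly sampled grid, added to data_loss; a framework such as TensorFlow would compute the derivatives by automatic differentiation rather than finite differences. A flat surface incurs no penalty, whereas σ(k, τ) = 0.2/τ gives f_1 = -0.2/τ < 0 and is penalised.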
The asymptotic condition is specified by ℓ4, which involves a very small value ε; the aim of ℓ4 is to keep the corresponding asymptotic-slope function positive. Therefore, similarly to ℓ1, ℓ2 and ℓ3, we sample P unique values from the domain of k and Q unique values from the domain of τ. It is worth mentioning that the values of k and τ in ℓ1, ..., ℓ4 could also be sampled from the training data. However, the trained neural network may then fail to meet these conditions when the values of k and τ given for prediction lie outside the range of the training data. If the training data have limited observations of the input variables, creating synthetic data by sampling values from their domains is an effective way to train a model with good generalization capability [10, 30]. The regularization term ℓ5 is built from ||·||², the squared Frobenius norm of the weight matrices.

We first introduce the datasets used. We then provide the details of the experimental design, including the examined models and their training settings. Finally, the experimental results are presented and discussed.

We use the option and zero-coupon yield curve data from OptionMetrics, and the Overnight Index Swap (OIS) data from Bloomberg.¹ Our option data is for the S&P 500 index, one of the most commonly followed stock market indices, which measures the … Offered Rate (LIBOR). However, since the 2008 financial crisis, the LIBOR-based zero curve is no longer risk-free [1]. Therefore, adjustments are performed for the period after 01/01/2008: we extract the OIS data from Bloomberg and bootstrap the zero rate curve. In addition, we use cubic spline interpolation of the risk-free rates to match the option maturities, and compute the forward price using put-call parity [5].

¹ OptionMetrics is a leading provider of historical implied volatility, greeks and option pricing data for financial markets, and Bloomberg is a premier financial services company.

The option data is further processed as follows.
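Put-call parity, C - P = B(t,T)(F_{t,T} - K), gives the forward price directly from a call and a put with the same strike and maturity, after which the model inputs k and τ follow. The sketch below is illustrative only. It assumes a continuously compounded zero rate for B(t,T), picks the strike where |C - P| is smallest (a common practical choice, not a procedure stated in the text), and uses a hypothetical 365-day annualization on calendar days; the cubic-spline interpolation of rates is not shown.

```python
import math
from datetime import date


def discount_factor(zero_rate: float, tau: float) -> float:
    """B(t,T) = exp(-r * tau) for a continuously compounded zero rate (assumed convention)."""
    return math.exp(-zero_rate * tau)


def forward_from_parity(call_mid: float, put_mid: float, strike: float, B: float) -> float:
    """Solve put-call parity C - P = B * (F - K) for the forward F."""
    return strike + (call_mid - put_mid) / B


def forward_from_chain(quotes, B: float) -> float:
    """quotes: (strike, call_mid, put_mid) triples for one maturity.

    Uses the strike where |C - P| is smallest, i.e. the strike closest to the
    forward, where the parity estimate is least sensitive to quote noise.
    """
    strike, c, p = min(quotes, key=lambda q: abs(q[1] - q[2]))
    return forward_from_parity(c, p, strike, B)


def model_inputs(strike: float, forward: float, t: date, T: date, annualization: float = 365.0):
    """Return (k, tau): log forward moneyness log(K / F) and time to maturity in years.

    The 365-day calendar annualization is a HYPOTHETICAL choice.
    """
    return math.log(strike / forward), (T - t).days / annualization
```

For example, with B = 0.99 and call/put mids of 4.5 and 3.51 at K = 100, forward_from_parity returns F = 101, and model_inputs(100, 100, t, T) returns k = 0.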
Option quotes below 3/8 are excluded because they are close to the tick size and can be misleading. The bid-ask mid-point price is used as a proxy for the closing price. In-the-money option quotes are excluded because of their small transaction volume [7]. Existing studies usually do not analyse option contracts with a time to maturity of less than 7 days [2]. However, as such options have recently become popular (e.g., weekly index options), we analyse option contracts with short times to maturity and only exclude contracts that mature in less than 2 days. Analysing options with short maturities is challenging because it requires the model to be highly robust and stable. As shown in Figure 3, our prepared data finally contains 63,338 option contracts with 2,986,754 valid quotes. The quotes are then used to compute the ground-truth implied volatility values by inverting the Black-Scholes option pricing formula.

Table 1 summarises the models examined in the experiments. In addition to our proposed model (denoted by Multi), we also deploy the SSVI model and other neural network models. For the former, we aim to investigate whether our proposed model can achieve better prediction performance than the state-of-the-art method from mathematical finance. For the latter, we want to see whether the designed architecture (i.e., the ensemble of multiple single networks) improves the prediction capability. To this end, the benchmarked neural networks include: 1) a single network (denoted by Single), which has no weighting network; and 2) the simplest neural network (denoted by Vanilla), which has a single hidden layer, only uses the sigmoid activation function, and has its output censored to be non-negative. In addition, to further justify the importance of embedding financial conditions, all the neural network models are also trained under a setting where ℓ1, ℓ2, ℓ3, ℓ4 are removed from the total loss function ℓ. To simplify the discussion, these models are denoted with a superscript †.

Table 1: Examined models.
SSVI [17]: The surface SVI model from mathematical finance.
Multi: The proposed model specified in Eqs. (2)-(11).
Multi†: The Multi model trained without embedding ℓ1, ℓ2, ℓ3, ℓ4.
Single: The single network model; there is no weighting network, so ||W||² and ||V||² are not included in the regularization term ℓ5 for model training.
Single†: The Single model trained without embedding ℓ1, ℓ2, ℓ3, ℓ4.
Vanilla: The neural network model with the simplest architecture: it has a single hidden layer that only uses the sigmoid activation function, and the model's output is censored to be non-negative.
Vanilla†: The Vanilla model trained without embedding ℓ1, ℓ2, ℓ3, ℓ4.

Table 2 presents our hyperparameter settings for the examined neural networks. To avoid the effect of model size on performance, compared neural networks with the same architecture design are given the same model size and the same hyperparameter values.

Table 2: Hyperparameter settings of the neural network models; all models use the same learning rate of 0.1 and the same number of iterations, 2 × 10^4.

As mentioned in Section 3, embedding ℓ1, ℓ2, ℓ3, ℓ4 requires synthetic data. In our neural network training, the ratio of real market data to synthetic data is 1/6. The log forward moneyness is sampled in [−6, −3] ∪ [3, 6]. The neural network models are trained using TensorFlow, and we use Adam [25] for stochastic optimisation. We train the models using all option quotes from the previous trading day and test them on the next day, moving this training/test window across all the trading days in the option data.
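The quote-cleaning rules described earlier in this section can be sketched as a simple filter. The record layout is hypothetical (no OptionMetrics field names are assumed), and two details are assumptions: the 3/8 threshold is applied to the mid price, and moneyness is judged against the forward price.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Quote:
    # HYPOTHETICAL record layout for one option quote.
    cp_flag: str           # "C" for a call, "P" for a put
    strike: float
    bid: float
    ask: float
    forward: float         # forward price F_{t,T} for the quote's maturity
    days_to_maturity: int


MIN_PRICE = 3.0 / 8.0  # quotes below 3/8 are close to the tick size
MIN_DAYS = 2           # contracts maturing in fewer than 2 days are dropped


def is_in_the_money(q: Quote) -> bool:
    """Calls are ITM when K < F, puts when K > F (moneyness vs. the forward is assumed)."""
    return q.strike < q.forward if q.cp_flag == "C" else q.strike > q.forward


def clean_quotes(quotes: List[Quote]) -> List[Tuple[Quote, float]]:
    """Return (quote, mid) pairs that survive the filters described in the text."""
    kept = []
    for q in quotes:
        mid = 0.5 * (q.bid + q.ask)  # bid-ask mid-point as a proxy for the closing price
        if mid < MIN_PRICE or q.days_to_maturity < MIN_DAYS or is_in_the_money(q):
            continue
        kept.append((q, mid))
    return kept
```

Keeping the thresholds as named constants makes it easy to reproduce the stricter convention of excluding maturities under 7 days by changing MIN_DAYS.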
Table 3 presents summary statistics of the mean absolute percentage errors (MAPEs) of the predicted implied volatilities for the examined models over all trading days. To further investigate the differences that the predicted implied volatilities can cause in option pricing, we use them to compute the corresponding option prices and report summary statistics of the MAPEs of the option prices in Table 4. In both tables, the widely used SSVI model from mathematical finance significantly underperforms the Multi, Multi†, Single and Single† models, but still outperforms simple neural networks such as the Vanilla and Vanilla† models. Compared with conventional mathematical models in finance, data-driven deep learning models show strong predictive capability, but only with a proper architecture design and hyperparameter settings. Our study is validated by these results, as the proposed Multi model is the best-performing prediction model on both the training and test data. Another observation from Tables 3-4 is that excluding some financial conditions when training neural networks does not significantly decrease the models' prediction performance, because the Multi†, Single† and Vanilla† models perform comparably to their counterparts trained with the full settings (i.e., the Multi, Single and Vanilla models). However, as discussed previously, incorporating prior financial domain knowledge is mainly intended to ensure that the model is consistent with existing financial theories and assumptions, rather than to improve prediction performance. In Table 5, we check whether the monotonicity, boundary, absence of butterfly arbitrage and asymptotic slope conditions in Theorem 1 are empirically satisfied on the test data.
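The two evaluation quantities behind Tables 3-5 can be sketched as follows: the MAPE between ground-truth and predicted values (implied volatilities or option prices), and the percentage of test points at which a condition function is negative, i.e. violated. The function names and the zero tolerance in the violation count are assumptions for illustration.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def violation_percentage(condition_values, tol=0.0):
    """Share (in percent) of test points whose condition value is below -tol."""
    bad = sum(1 for v in condition_values if v < -tol)
    return 100.0 * bad / len(condition_values)
```

For instance, predictions of 0.22 and 0.36 against ground truths of 0.2 and 0.4 are each 10% off, so the MAPE is 10; a small positive tol would make the violation count ignore numerical noise around zero.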
Since we showed in Section 3 that the positivity and twice differentiability conditions are met by our neural network architecture design and that the limiting behaviour condition can be proven theoretically, these three conditions are not checked in Table 5. The violation percentages of the test samples for the examined conditions are much higher for the Multi†, Single† and Vanilla† models than for the corresponding Multi, Single and Vanilla models, showing the importance and necessity of embedding the ℓ1, ℓ2, ℓ3, ℓ4 loss terms. For illustration purposes only, Figure 5 shows the limiting behaviour of the converted underlying forward contracts with 11, 32, 109 and 704 days to maturity, which verifies the limiting behaviour condition. Further, to give readers a sense of what an implied volatility surface looks like, Figure 6 shows the surfaces produced by the Multi model and by the Multi model trained without the regularization term for 11/01/2016. The volatility smiles of the former are clearly much smoother than those of the latter, which also verifies the effectiveness of regularization.

In this paper, we developed a novel neural network to predict implied volatility surfaces. Unlike many previous studies, in which machine learning algorithms are mainly used as "black boxes" in finance, our model is tailored to the unique characteristics of the implied volatility surface. To the best of our knowledge, this is one of the very first studies to discuss a methodological framework that integrates data-driven machine learning algorithms (particularly neural networks) with the related financial theories and empirical evidence. The proposed framework can be easily extended and applied to other similar computational problems in finance and business analytics, such as inventory pricing and revenue management.
In addition to the methodological contribution, we validated the proposed model empirically with option data on the S&P 500 index. Compared with existing studies, our experimental settings are more challenging because the option data span 20 years and options with short times to maturity are examined; our model therefore needs to be more robust in order to produce convincing results. As presented in the experiments section, our model outperforms the widely used SSVI model from mathematical finance and other benchmarked neural networks. More importantly, the conventional financial conditions and empirical evidence are met empirically, which helps resolve the bottleneck of data-driven machine learning applications in finance.

Everything you always wanted to know about multiple interest rate curve bootstrapping but were afraid to ask. SSRN.
Short-term market risks implied by weekly options.
Semi-parametric forecasts of the implied volatility surface using regression trees.
The impact of Covid on machine learning and data science in UK banking.
The term structure of implied dividend yields and expected returns.
The pricing of options and corporate liabilities.
Option-implied risk aversion estimates.
Stochastic skew in currency options.
A new simple approach for constructing implied volatility surfaces. SSRN.
Face generation for low-shot learning using generative adversarial networks.
Stable local volatility function calibration using spline kernel.
Dynamics of implied volatility surfaces.
B-spline techniques for volatility modeling.
The future of FinTech and financial services: what's the next big bet.
Valuation of volatility derivatives as an inverse problem.
A parsimonious arbitrage-free implied volatility parameterization with application to the valuation of volatility derivatives. Presentation at Global Derivatives & Risk Management.
Arbitrage-free SVI volatility surfaces.
Boosting-based frameworks in financial modeling: application to symbolic volatility forecasting.
Analytically tractable stochastic stock price models.
A closed-form solution for options with stochastic volatility with applications to bond and currency options.
Implied volatility surface: construction methodologies and characteristics. arXiv.
To sigmoid-based functional description of the volatility smile.
Exact simulation of the Wishart multidimensional stochastic volatility model.
Adam: a method for stochastic optimization.
Arbitrage-free implied volatility surfaces for options on single stock futures.
A jump-diffusion model for option pricing.
Volatility forecast using hybrid neural network models.
The moment formula for implied volatility at extreme strikes.
Generating and exploiting large-scale pseudo training data for zero pronoun resolution.
A neural stochastic volatility model.
A neural network model for estimating option prices.
Using neural networks to forecast the S&P 100 implied volatility.
Option pricing when underlying stock returns are discontinuous.
What you say and how you say it matters: predicting financial risk using verbal and vocal cues.
Volatility prediction using financial disclosures sentiments with word embedding-based IR models.
Proof that properly anticipated prices fluctuate randomly.
Pricing average and spread options under local-stochastic volatility jump-diffusion models.
Continuous-time methods in finance: a review and an assessment.
Gated neural networks for option pricing: rationality by design.
Stock volatility prediction based on self-attention networks with social information.
Machine Learning and Option Implied Information.

This research has been conducted with the support of: 1) the Financial Innovation Center of the Southwestern University of Finance and Economics; 2) the Key Laboratory of Financial Intelligence and Financial Engineering of Sichuan Province; and 3) the UK Economic and Social Research Council through the Impact Acceleration Accounts Business Booster Funding to the University of Glasgow. The first author also acknowledges Imperial College Business School for the support of high-performance computing equipment for experiments during his PhD study [43]. The authors would also like to thank the anonymous reviewers for their helpful comments on earlier drafts of the manuscript.