title: Can the Chinese volatility index reflect investor sentiment?
authors: Long, Wen; Zhao, Manyi; Tang, Yeran
date: 2020-10-20
DOI: 10.1016/j.irfa.2020.101612

The volatility index is the implied volatility calculated inversely from option prices. This study investigates whether the official Chinese volatility index, iVX, can represent investor sentiment. To describe investor sentiment comprehensively, we build a three-dimensional investor sentiment measurement system composed of macro, meso and micro levels, and use the EEMD method to decompose iVX into three components: short-term fluctuations, medium-term fluctuations and a long-term trend. We analyze the relationships between iVX, its components and the sentiment index at each level separately. The empirical results reveal that every component of iVX reflects investor sentiment at the corresponding level, although the extent to which it does so differs across components. We further introduce mixed-frequency dynamic factor analysis to extract a common sentiment factor, which is more strongly correlated with contemporaneous iVX than the sentiment index at any single level. The ADL model used in the robustness check supports these results. Our findings confirm that iVX can represent the common sentiment and expectations of Chinese investors at different time scales.

Since stocks came into being, their risk characteristics have been a focus of investors. Risk refers to the uncertainty of return, which is generally measured by volatility. Depending on the measurement basis and method, volatility can be categorized into historical volatility, predicted volatility, and implied volatility. Historical volatility is calculated as the average deviation from the average price of a financial instrument in a given time period.
Predicted volatility is estimated from asset conditions, the economic situation, historical experience and similar information; it is substituted into an option pricing formula to obtain the option price. Implied volatility is derived inversely by substituting the market price of the option and all other known parameters into the Black-Scholes option pricing model and solving for volatility.

Investor sentiment refers to a belief about the future cash flows and investment risks of assets that is not fully justified by the existing facts (Baker and Wurgler, 2006). This belief is influenced not only by the fundamentals of assets and the transmission of information in the capital market, but also by the education, personal experience and personal preferences of investors. Therefore, different investors hold different beliefs, namely sentiment, about the same asset. Herding and similar phenomena show that sentiment is contagious, which leads to consistent actions among people. Because of this irrational behavior, asset prices will be systematically biased when arbitrage is limited.

In 1993, the Chicago Board Options Exchange (CBOE) launched the world's first volatility index, VIX, which is the implied volatility calculated from S&P 500 index options. In 2016, the Shanghai Stock Exchange also announced the first official [...] certain advantages in VIX futures pricing. Kanniainen et al. (2014) price S&P 500 index options based on the GARCH model and the VIX index. Basher et al. (2016) compare the effects of hedging market risks with indicators such as the VIX index and oil prices. Some researchers also study volatility indexes in other stock markets (Siriopoulos and Fassas, 2012; Badshah et al., 2018). Because the Chinese volatility index iVX was released late, there are few studies on it.
The related research focuses mainly on three aspects: the predictive ability of iVX (Qiao et al., 2019), the relationship between iVX and returns, including the "leverage effect" (Yue et al., 2019), and the inclusion of the volatility index as an indicator in a sentiment index construction system (Xu and Zhou, 2018).

To measure investor expectations, scholars have put forward the concept of an investor sentiment index. Previous methods for obtaining investor sentiment indicators fall into three categories:

(1) Investor sentiment indexes built from questionnaires and interviews. Studies of the US market mainly use the University of Michigan consumer sentiment index, while Chinese indexes mainly include the "CCTV Watch Index", the good and bad index by Stock Market Trend Analysis Weekly, the "Long and Short Polls" by Huading, the consumer confidence index, etc. (Lemmon, 2006; Schmeling, 2009). Although the survey method aims to quantify investor sentiment directly, it is costly to operate, the resulting sentiment index has a relatively low data frequency, and its time span is relatively short.

(2) Indirect indicators obtained from historical stock market data. This method synthesizes the investor sentiment index from the historical data of several proxies that can be observed in the stock market. At present, the academic community mainly adopts the method proposed by Baker and Wurgler (2007). They use principal component analysis to extract a sentiment index from six stock market variables: the closed-end fund discount, NYSE share turnover, the number of IPOs, the average first-day returns on IPOs, the equity share in new issues, and the dividend premium. Although using indirect stock market indicators saves more time and labor than surveys and interviews, it is constrained by choices such as index selection and synthesis method.
Furthermore, market variables used as proxies for investor sentiment may capture not only investor sentiment but also the equilibrium outcome of the interaction between sentiment and other economic factors (Da et al., 2014; Qiu and Welch, 2004).

(3) Analysis of information on the Internet. Many researchers start from Internet information and apply text analysis to the content that investors post on social platforms or search for, in order to obtain investor sentiment. This method avoids the shortcomings of the first two, but is susceptible to noise in online information. Text data mainly come from Yahoo Finance posts (Antweiler and Frank, 2004; Das and Chen, 2007; Kim and Kim, 2014; Tsukioka et al., 2018), WeChat (Shi et al., 2018), the Weibo platform (Checkley et al., 2017; Renault, 2017), Twitter (Behrendt and Schmidt, 2018; Li et al., 2017) and Google search (Da et al., 2014; Gao et al., 2019), etc.

The purpose of this paper is to investigate the ability of the Chinese volatility index iVX to represent investor sentiment. iVX is computed from option prices, and these prices reflect investors' expectations of the stock market, so iVX reveals investors' predictions about the future of stocks. These predictions in turn reflect investors' sentiment. In theory, therefore, iVX can represent investor sentiment. Li et al. (2019) find that iVX is negatively associated with both the price and the return of the Shanghai 50ETF, which indirectly suggests that iVX could reflect investors' panic sentiment, but they do not measure sentiment directly. As for the VIX created by CBOE, most studies take its representation of sentiment as a known fact and build on it (Pan, 2018; Smales, 2017; Yang et al., 2016). However, little literature actually confirms that VIX has this property.
Some studies support it indirectly by analyzing the negative correlation between changes in VIX and the return of the underlying index (Smales, 2016; Whaley, 2000; Whaley, 2009). The studies that test it directly only examine the relationship between the volatility index and a sentiment index based on news, social media or another single level of sentiment (Pineiro et al., 2016; Smales, 2014; Zhang, 2011).

To obtain fluctuations on different time scales, this paper uses Ensemble Empirical Mode Decomposition (EEMD) to decompose iVX. EEMD provides a feasible way to deal with non-linear and non-stationary data. It consists of a local and fully data-driven separation of a signal into fast and slow oscillations (Torres et al., 2011). It has been widely used in industry and resources research (Wang et al., 2014; Wang et al., 2013; Lei and Zuo, 2009). Meanwhile, Zhang (2008) and Yu et al. (2008) have successfully applied this idea to the social sciences and show that ensemble empirical mode decomposition is a useful approach for financial time series analysis.

The contributions of this paper include the following three aspects:

(1) This study systematically examines whether iVX can represent sentiment and analyzes the question at the macro, meso and micro levels. Previous studies rarely discuss this issue, especially for the newly released and short-lived Chinese volatility index iVX.

(2) This paper constructs a three-dimensional comprehensive measurement system of investor sentiment composed of three levels: the macro-economy, the stock market and individual opinions. Their frequencies are monthly, weekly and daily, respectively, and they include both rational and irrational components, so our measurement system can capture more information about investor attitudes. By contrast, most previous studies focus on only one level of sentiment, such as micro-blog sentiment from the micro perspective.
(3) By applying dynamic factor analysis to sentiment indexes of mixed frequencies to extract the common factor, we investigate whether iVX can comprehensively represent investor sentiment at different time scales.

The remainder of the article is organized as follows. Section 2 introduces iVX and compiles sentiment indexes at different levels. Section 3 decomposes iVX by EEMD into three components and analyzes how well they represent investor sentiment at different levels. Section 4 applies mixed-frequency dynamic factor analysis to obtain a common sentiment factor and discusses its relationship with iVX. Section 5 presents the robustness test. Finally, this study concludes with a few remarks.

The Chinese volatility index iVX, released by the Shanghai Stock Exchange, can be traced back to February 9, 2015, and was suspended on February 22, 2018 for unknown reasons. Using a model-free methodology similar to that of VIX, iVX is estimated from the bid and ask prices of options on the Shanghai 50ETF. This paper collects all the data for the iVX release period from the WIND database, giving 740 effective observations. [...] and 50ETF's variation Δ50ETF in the sample period, respectively. The graphs clearly show that when the Chinese stock market fell sharply in mid-2015, iVX rose to a historical peak of 63.79, which suggests that iVX is negatively related to the underlying asset to a certain extent. The left vertical axis depicts ΔiVX and the right depicts Δ50ETF.

Table 1 reports the summary statistics for iVX and ΔiVX. The mean value of iVX is 23.879, and that of ΔiVX is approximately zero. Both series are positively skewed. ΔiVX presents excess kurtosis while iVX does not. These two points suggest that large positive changes in iVX occur more frequently than large negative changes.
The reported first-order autocorrelations show that iVX is highly persistent, but ΔiVX is not. The Augmented Dickey-Fuller (ADF) and Phillips-Perron (PP) tests reject the null hypothesis of a unit root for ΔiVX but fail to reject it for the level of iVX.

Investor sentiment reflects investors' expectations for the future economy and stock market. This paper builds a three-dimensional measurement system that depicts investor sentiment at three levels: macro, meso and micro. The macro level quantifies investor sentiment towards the whole economy, the meso level measures investor sentiment towards the stock market, and the micro level shows investor sentiment towards individual stocks. The sentiment indexes at the three levels are monthly, weekly and daily data, respectively.

To reflect the latest state of economic development in a timely way, and taking into account both how representative each variable is of China's macroeconomic cycle and the availability of data, this paper forms a composite macro-level sentiment index from the following variables: the Purchasing Managers' Index (PMI) and the Producer Price Index (PPI) on the production side; the Consumer Satisfaction Index (SI), the Consumer Price Index (CPI) and total retail sales of consumer goods (RSCD) on the consumer side; and the coincident index of the business climate (CBCI), which indicates economic boom and bust and is issued by the China Economic Prosperity Monitoring Center. These data are monthly and come from the WIND database.

Further, some variables take longer than others to reveal the same sentiment (Baker and Wurgler, 2006). If the variables are synthesized directly in the same period, the composite index will not effectively reflect movements in market sentiment.
Referring to Baker and Wurgler (2006), this paper uses the following two [...]

The investor sentiment index at the meso level is formed mainly following the method of Baker and Wurgler (2006, 2007). The six proxy variables they select are the closed-end fund discount, NYSE share turnover, the number of IPOs, the average first-day returns on IPOs, the equity share in new issues, and the dividend premium. However, compared with the US stock market, the Chinese stock market has its own particularities, mainly the following three points: (1) Almost every IPO hits the daily price limit on its first day of listing, which means the average first-day return is almost 44%. [...]

Among the above variables, the price of a closed-end fund should in theory be consistent with the unit net asset value of its stock portfolio. In reality, however, closed-end funds often trade at a discount (Lee et al., 1991). It is generally believed that the higher the absolute value of the discount rate, the less optimistic investors are about the market outlook. Since FDDR is negative under normal conditions, it is positively correlated with investor sentiment. TO signifies market liquidity. Given restrictions on short selling and the participation of irrational investors, high liquidity is often accompanied by overvaluation, so TO is positively correlated with investor sentiment. In addition, PE and NIA are also positively associated with sentiment. In order to control the impact of fund size, we adopt the following criteria for fund [...] Table 4 reports that there is a certain degree of association between the four sentiment proxy variables.

We use the same method as in Section 2.2.1 to select the proxies or their lags, and then perform principal component analysis to calculate the meso-level sentiment index.
The selected terms are $TO_{t-1}$, $PE_t$, $NIA_{t-1}$ and $FDDR_t$. [...] of the sample variance, respectively; the two factors, with a cumulative contribution rate of 81.351%, thus capture much of the common variation. Therefore, we take the weighted average of the first and second principal components as the meso sentiment index, without the macro influence. The resulting sentiment index MeSI is shown in Eq. (2), and its variation throughout the sample period is shown in Fig. 6. The index falls sharply in mid-2015 and is less volatile thereafter.

Investor sentiment at the micro level is extracted from massive text data, namely the opinions that investors post on Internet social media. For data acquisition, we use web crawler tools to collect information from the East Money stock forum, which has considerable traffic and influence among stock forums in China. According to the visitor volume ranking on Alexa's website and the Baidu weight rankings, the East Money stock forum currently ranks first in the major [...]

There are usually two types of sentiment classification, based on a sentiment dictionary or on a classifier, respectively (Liu et al., 2012). A general-purpose natural-language sentiment dictionary can hardly meet the professional needs of the finance field, and empirical evidence shows that classifier-based methods give better classification results than dictionary-based ones (Wu et al., 2014). Therefore, building on previous research, we adopt a machine learning algorithm and choose the SVM classifier to identify investor sentiment and calculate the micro sentiment index. Because online posts contain a great deal of noise, we use a two-step classifier for sentiment analysis (Shi et al., 2018). The first step removes noise by separating the text into noise and non-noise, and the second step divides the non-noise text into bullish and bearish posts.
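The control flow of the two-step scheme can be illustrated with a minimal sketch. It is not the paper's implementation: the function `two_step_labels` and the keyword rules `toy_is_noise` and `toy_is_bullish` are hypothetical stand-ins for the two trained SVM classifiers, and the sample posts are invented. Only the +1/-1 coding for bullish and bearish posts follows the labeling the paper uses for its second step.

```python
def two_step_labels(posts, is_noise, is_bullish):
    """Label each post: None for noise, +1 for bullish, -1 for bearish.

    is_noise and is_bullish are callables standing in for the step-one
    and step-two classifiers (trained SVMs in the paper).
    """
    labels = []
    for post in posts:
        if is_noise(post):        # step 1: discard noise
            labels.append(None)
        elif is_bullish(post):    # step 2: bullish vs. bearish
            labels.append(1)
        else:
            labels.append(-1)
    return labels


# Hypothetical keyword rules, used only to make the sketch runnable.
def toy_is_noise(post):
    return not any(word in post for word in ("buy", "sell", "rise", "fall"))


def toy_is_bullish(post):
    return "buy" in post or "rise" in post


posts = ["will rise tomorrow, buy", "nice weather today", "time to sell"]
labels = two_step_labels(posts, toy_is_noise, toy_is_bullish)
```

Passing the classifiers in as callables keeps the cascade independent of how each step is trained, so the keyword rules here could be replaced by any fitted model's prediction function.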
We use several techniques, including data cleaning, text representation, feature extraction and classification, to compute the investor sentiment index from individual sentiments; this index is used in the subsequent analysis.

(1) Data cleaning

Posts on the online stock forum contain a lot of punctuation, noise, etc., and cannot be used directly for sentiment analysis. Therefore, in the data cleaning process, we first remove punctuation and gibberish, and then perform word segmentation.

(2) Text representation

For a computer to read text, words must be converted into numerical data. In this paper, Word2vec, which is commonly used in academia, is applied to represent words. It computes word vectors from the context of words, captures the semantic information of the context, and performs well in text classification (Lilleberg et al., 2015; Wolf et al., 2014). We then use the TF-IDF (term frequency-inverse document frequency) algorithm to compute the weight of each word in a short text and obtain a weighted word2vec text vector.

Word2vec is a tool based on deep learning and released by Google in 2013. This neural network includes two architectures: the Continuous Bag-of-Words model (CBOW) and Skip-Gram. The former predicts the current word from its context, while the latter predicts the surrounding words given the current word (Mikolov et al., 2013). Compared with the CBOW model, Skip-Gram has higher semantic accuracy, higher computational complexity, and longer training time. In this paper, the Skip-Gram model is used to predict the context from the current word; its mathematical representation is given in Eq. (3), where the input is a word in the corpus.

The main idea of TF-IDF is that if a word or phrase appears frequently in one article and rarely in other articles, it is well suited to distinguishing between different texts.
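The TF-IDF idea can be made concrete with a small standard-library sketch. It assumes the common textbook definition (term count divided by document length, multiplied by the logarithm of corpus size over document frequency); the function name `tf_idf` and the toy posts are illustrative and do not come from the paper.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {word: weight} dict per tokenized document.

    TF  = occurrences of the word in the document / words in the document.
    IDF = log(number of documents / number of documents containing the word).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each word occurs.
    doc_freq = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return weights


# Toy, already-segmented posts (hypothetical data).
posts = [
    ["stock", "rise", "buy"],
    ["stock", "fall", "sell"],
    ["stock", "rise", "hold"],
]
weights = tf_idf(posts)
```

In this toy corpus, "stock" appears in every post and therefore gets weight zero, while "buy", which appears in only one post, gets the largest weight in the first post. This is the discriminating behavior the paragraph describes.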
Formula (4) shows that TF-IDF is the product of the term frequency (TF) and the inverse document frequency (IDF):

$$\mathrm{TFIDF}_{i,j} = \mathrm{TF}_{i,j} \times \mathrm{IDF}_{i} = \frac{n_{i,j}}{\sum_{k} n_{k,j}} \times \log \frac{|D|}{|\{j : t_i \in d_j\}|} \quad (4)$$

where $n_{i,j}$ is the number of times that the word $t_i$ appears in the file $d_j$, the denominator $\sum_{k} n_{k,j}$ is the total number of times that all words appear in the file $d_j$, $|D|$ is the total number of files in the corpus, and $|\{j : t_i \in d_j\}|$ is the number of files containing $t_i$. By adding up the weighted word2vec vectors of the words in a document, we get the new vector $R(d_j)$ of document $d_j$:

$$R(d_j) = \sum_{t_i \in d_j} \mathrm{TFIDF}_{i,j} \cdot \mathrm{word2vec}(t_i) \quad (5)$$

where $\mathrm{word2vec}(t_i)$ is the word2vec vector of word $t_i$.

(3) Classifiers for sentiment identification

We use the Support Vector Machine (SVM) to identify investor sentiment. SVM has become an important approach to classification because of its outstanding performance (Cortes and Vapnik, 1995; Deng et al., 2012). In the first step, "noise elimination", non-noise data are labeled +1 and noise data -1. In the second step, "bullish-bearish" identification, bullish sentiment is labeled +1 and bearish sentiment -1. By classifying the text in this way, we can identify investors' attitudes towards individual stocks. Table 4 reports the classification accuracy obtained by 10-fold cross-validation. It indicates that both the classification accuracy and the recall rate of the SVM algorithm exceed 70%.

After identifying individual investor sentiment, we can combine the views of all individual investors into a micro investor sentiment index. Following the method in the previous literature (Antweiler and Frank, 2004; Kim and Kim, 2014; Wu et al., 2014), we define $M_t^{bull}$ as the total number of bullish posts in time interval t, and $M_t^{bear}$ as the total number of bearish posts in time interval t. The sentiment index is calculated from these two counts as in Eq. (6), which is used to compose the CSI 800 daily micro investor sentiment index, named MicSI. The index reflects the balance between investors' long and short views.
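As an illustration of how bullish and bearish post counts can be turned into a daily index, the sketch below uses the log-ratio bullishness measure of Antweiler and Frank (2004), ln[(1 + bullish) / (1 + bearish)]. That this is the exact form of Eq. (6) is an assumption, since the equation itself is not reproduced in the text; the function name `bullishness_index` and the daily counts are hypothetical.

```python
import math


def bullishness_index(n_bull, n_bear):
    """Log-ratio bullishness for one time interval.

    Assumed form (Antweiler and Frank, 2004): ln((1 + bull) / (1 + bear)).
    Adding 1 to each count keeps the index defined when a count is zero.
    """
    if n_bull < 0 or n_bear < 0:
        raise ValueError("post counts must be non-negative")
    return math.log((1 + n_bull) / (1 + n_bear))


# Hypothetical (bullish, bearish) post counts per day.
daily_counts = {"day1": (30, 10), "day2": (5, 5), "day3": (0, 12)}
mic_si = {
    day: bullishness_index(bull, bear)
    for day, (bull, bear) in daily_counts.items()
}
```

Under this form the index is zero when bullish and bearish posts are balanced, positive when bullish posts dominate, and negative when bearish posts dominate, which matches the interpretation of the index as a comparison of long and short views.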
The higher the investor sentiment index, the more investors hold positive expectations about the future stock market. [...] on average, investor sentiment leans bearish, which is consistent with the view of investors' irrational biases in behavioral finance (Odean, 1998).

The U.S. volatility index VIX is able to reflect investor sentiment, and is therefore known as "the investor fear gauge". To study whether the Chinese volatility index iVX possesses this capability, we proceed as follows. First, we decompose iVX into short-term, medium-term and long-term fluctuations, considering that macro-level sentiment is investors' sentiment towards the market, which lasts for a long time, and that the persistence of meso-level and micro-level sentiment decreases in turn. Then we explore the relationship between each decomposed component of iVX and the corresponding sentiment. For comparison, the relationship between the undecomposed iVX and sentiment is also examined at each level. With this process, we can verify whether iVX is a proper representation of investor sentiment.

Previous studies document short-term volatility clustering (Gray, 1996; Cont, 2007) and long-term mean reversion of volatility (Bollerslev and Mikkelsen, 1996; Arav et al., 2018), which indicates that the characteristics of the stock market are not consistent across time scales. To obtain the fluctuations on different time scales, we need to decompose iVX. Since the three levels of sentiment differ in duration, if iVX is able to represent sentiment, then the high-frequency short-term fluctuations, the low-frequency medium-term fluctuations and the long-term trend obtained from the decomposition should be related to the corresponding sentiment at the micro, meso and macro levels, respectively.

We use EEMD to perform the decomposition. Compared with wavelet analysis, it avoids the instability of results caused by manual selection of the wavelet function.
Moreover, compared with the fast Fourier transform, it can analyze the volatility of high-frequency data. Therefore, EEMD is chosen to decompose iVX.

(1) EEMD model

The EEMD algorithm can be summarized as follows:

Step 1. Initialize the number of realizations and the amplitude of the added white noise. Set m equal to 1.

Step 2. Perform the mth EMD decomposition.

1) Add white noise $n_m(t)$ to the given signal $x(t)$:

$$x_m(t) = x(t) + n_m(t)$$

where $n_m(t)$ is the white noise added at the mth time, and $x_m(t)$ is the signal containing white noise at the mth time.

2) After employing EMD to decompose the signal $x_m(t)$, we get a group of IMFs $c_{n,m}$ $(n = 1, 2, \cdots, N)$, where $c_{n,m}$ is the nth IMF in the mth decomposition.

3) If m