key: cord-0717188-f5xd4mtc authors: Xu, Weijun; Fu, Zhineng; Li, Hongyi; Huang, Jinglong; Xu, Weidong; Luo, Yiyang title: A study of the impact of COVID‐19 on the Chinese stock market based on a new textual multiple ARMA model date: 2022-04-04 journal: Stat Anal Data Min DOI: 10.1002/sam.11582 sha: 8837801eda830aaf7d5ca915f4e00a1169bd1191 doc_id: 717188 cord_uid: f5xd4mtc

Coronavirus disease 2019 (COVID‐19) has caused violent fluctuations in stock markets and led to heated discussion in stock forums. The rise and fall of any specific stock is influenced by many other stocks and by the emotions expressed in forum discussions. Considering the transmission effect of emotions, we propose a new Textual Multiple Auto Regressive Moving Average (TM‐ARMA) model to study the impact of COVID‐19 on the Chinese stock market. The TM‐ARMA model contains a new cross‐textual term and a new cross‐auto regressive (AR) term that measure the cross impacts of textual emotions and price fluctuations, respectively, and the adjacency matrix, which measures the relationships among stocks, is updated dynamically. We compute the textual sentiment scores by an emotion dictionary‐based method, and estimate the parameter matrices by a maximum likelihood method. Our dataset includes the textual posts from the Eastmoney Stock Forum and the price data for the constituent stocks of the FTSE China A50 Index. We adopt a sliding‐window online forecast approach to simulate real trading situations. The results show that TM‐ARMA performs very well even after the outbreak of COVID‐19.

At the end of 2019, coronavirus disease 2019 (COVID-19) struck suddenly and caused serious damage to the economy of China and of the whole world. The financial market fluctuated violently in the following months and caused huge losses to investors. Four circuit breakers were triggered in the U.S. stock market, and the Shanghai Composite Index fell sharply and fluctuated violently between 3127.17 and 2646.8.
The spread of panic was the main reason for the stock market crash. The rise and fall of any specific stock is also influenced by the fluctuations of many other stocks. Web textual information has great influence on the emotions of investors [3]. The discussions in stock forums affect the emotions of the investors who participate in the discussions and browse the posts, and these investors spread the sentiment to others. Investors and stocks in the stock market form a complex system, which leads to herd behavior.

Several studies have examined the impact of COVID-19 on stock markets. Al-Awadhi et al. [2], using panel data, found that the numbers of confirmed cases and deaths from COVID-19 have a significant negative impact on Chinese stocks. Zhang et al. [25] showed that COVID-19 increased the risk of stock markets all over the world. Akhtaruzzaman et al. [1] showed that the conditional correlations between the stock returns of the G7 countries and China increased significantly, which is the result of the transmission of financial contagion. However, this literature rarely addresses textual sentiment or the propagation effect among investors and stocks.

The Chinese stock market was one of the earliest stock markets hit by COVID-19, so it is meaningful to study the impact of COVID-19 on it. To better measure the impact of investors' emotions, we take textual information into account. Our textual information is collected from the Eastmoney Stock Forum (https://guba.eastmoney.com), a leading stock forum in China. There is a specific stock forum for every stock listed on the Chinese stock market, but some stock forums are inactive and contain relatively little textual information. Fortunately, the stock forums for most of the large-capitalization stocks are active, because many investors pay attention to them.
Considering the average number of posts per stock forum, we choose the constituent stocks of the FTSE China A50 Index as the research object. Our dataset therefore consists of the posts in the Eastmoney Stock Forum and the price data for the constituent stocks of the FTSE China A50 Index from January 1, 2019 to December 31, 2020. We compute the daily textual sentiment score for every stock by an emotion dictionary-based method. To verify the change in the Chinese stock market before and after the outbreak of COVID-19, we compare the excess returns of the prediction models in the two stages.

The Autoregressive Moving Average (ARMA) model is commonly used for price prediction. Considering the cross impact among stocks and textual information, we propose a new Textual Multiple ARMA (TM-ARMA) model. TM-ARMA constructs a new multiple cross autoregressive term and a new multiple cross textual term to measure the cross impact of price fluctuations and of textual emotions, respectively. The adjacency matrix that measures the relationships among stocks is very important for multiple ARMA models. Unlike the Vector Auto Regression (VAR), Vector ARMA (VARMA) and Vector ARMA with eXogenous regressors (VARMAX) models [17], which treat the adjacency matrix as an all-one matrix, TM-ARMA updates the adjacency matrix every day with the correlation coefficient matrix of the return series over the last 20 trading days, which reflects the relationships among stocks more accurately. To estimate the parameter matrices, we construct a log-likelihood objective function for the prediction errors, and apply matrix-form gradient descent methods based on the solutions for partial derivatives of matrices provided by Petersen and Pedersen [11].

ARMA and its extended models are widely used for financial prediction [6, 16]. Many researchers have recognized the importance of interactions between assets and have improved multivariate ARMA models.
Thornton [21] proposed a mixed-frequency multiple ARMA model that employs mixed stock and flow data and performs well in simulation. Lennon and Yuan [13] proposed a multivariable ARMA model that combines a digitized Gaussian model with the Monte Carlo Expectation Maximization method (EM-ARMA), which was shown to be effective in simulation.

Textual information from stock forums has been used in sentiment research in recent years. Yao et al. [23] constructed an investor attention index from the posts in the Eastmoney Stock Forum and the Hexun Stock Forum. Yang et al. [22] used the textual information from the Eastmoney Stock Forum to build a panic sentiment index, and studied the impact of textual sentiment on crash risk.

Emotion dictionaries are widely used for the computation of sentiment scores. Li et al. [14] mapped the words from the news into a sentiment space spanned by four different sentiment dictionaries, and built sentiment scores for stocks in Hong Kong. Groß-Klußmann et al. [9] proposed an unsupervised learning method based on the positive and negative word lists from emotion dictionaries to build sentiment scores of microblog messages. Siering [20] applied a dictionary-based method that computes the textual sentiment from the numbers of positive and negative words.

Maximum likelihood estimation and gradient descent methods are widely applied to estimate parameters [5, 12, 15]. They are also commonly used in extended ARMA models. Gong et al. [8] modified the conventional two-stage maximum likelihood estimation method with a non-Gaussian QML estimator for the ARMA-GJR-GARCH process. Lennon and Yuan [13] estimated the parameters of their multivariable ARMA model by a log-likelihood function and partial derivatives.

Grid search is a common way of searching over the possible hyperparameter combinations, especially when the number of combinations is not very large [7, 18, 19, 24]. Grid search allows the performance of models to be evaluated across all candidate combinations.
Grid search is also widely used in some extended ARMA models [10].

To make our empirical study consistent with real-world trading, we propose a sliding-window online approach. Before 9:30, the model forecasts the price change rates of all 50 stocks for the new day, selects the stocks that are predicted to rise, and puts them into a portfolio. At 9:30, the strategy buys all the stocks in the portfolio with equal funds and holds them until 9:30 of the next trading day. Transaction fees and stamp duties are taken into account.

The rest of this paper is arranged as follows. Section 2 discusses our TM-ARMA model and its solutions for the parameter matrices. Section 3 describes the empirical study. Section 4 analyses the empirical results. Section 5 concludes.

We denote $y_t \in \mathbb{R}^{N \times 1}$ as the daily returns of the $N$ stocks at day $t$, and $\hat{y}_t \in \mathbb{R}^{N \times 1}$ as the predicted vector of $y_t$, where $1 \le t \le T$ and $T$ is the number of trading days. Considering the transmission effect of trading sentiment in the stock market and of textual investor sentiment in stock forums, we construct a new cross-AR term and a new cross-textual term, respectively. We propose a new TM-ARMA model as shown in Equation (1), where $\odot$ is the Hadamard product operator, $p$, $q$ and $r$ are hyperparameters (the orders of the cross-AR, moving average and cross-textual terms), and the coefficients attached to the $i$-th lag of each of these three terms are parameter vectors in $\mathbb{R}^{N \times 1}$. $A \in \mathbb{R}^{N \times N}$ is a dynamic adjacency matrix. The larger $A_{ij}$ is, the greater the cross impact of stock $i$ on stock $j$. $x_t \in \mathbb{R}^{N \times 1}$ is an exogenous vector that reflects the textual sentiment of each stock at day $t$. On the right-hand side of Equation (1), the first part is the intercept vector in $\mathbb{R}^{N \times 1}$. The second part is a multiple autoregressive term of order $p$, which measures the cross trading sentiment among stocks in prices. The third part consists of the moving average terms of order $q$. The fourth part is an exogenous cross-textual term of order $r$, which measures the cross textual sentiment in stock forums.
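The four-part structure described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the names `c`, `alpha`, `beta`, `gamma`, `rolling_adjacency` and `tm_arma_one_step` are hypothetical, the parameter vectors of each term are stacked column-wise into N x p, N x q and N x r arrays, and the adjacency matrices are taken as trailing-window correlation matrices as stated in the introduction.

```python
import numpy as np


def rolling_adjacency(series: np.ndarray, window: int = 20) -> np.ndarray:
    """Correlation matrix (N x N) of the last `window` columns of an N x T array.

    Assumption: rows are stocks, columns are trading days.
    """
    return np.corrcoef(series[:, -window:])


def tm_arma_one_step(c, alpha, beta, gamma, Y, E, X, A1, A2):
    """Hypothetical one-step forecast with the four parts described in the text.

    c     : (N,)   intercept vector
    alpha : (N, p) cross-AR coefficients, column i-1 belongs to lag i
    beta  : (N, q) moving-average coefficients (own errors only)
    gamma : (N, r) cross-textual coefficients
    Y, E, X : (N, t) histories of returns, errors and sentiment,
              with the most recent day in the last column
    A1, A2  : (N, N) adjacency matrices; entry (i, j) is the impact of i on j
    """
    y_hat = np.asarray(c, dtype=float).copy()
    for i in range(alpha.shape[1]):   # cross-AR term
        y_hat += alpha[:, i] * (A1.T @ Y[:, -(i + 1)])
    for i in range(beta.shape[1]):    # single (non-cross) MA term
        y_hat += beta[:, i] * E[:, -(i + 1)]
    for i in range(gamma.shape[1]):   # cross-textual term
        y_hat += gamma[:, i] * (A2.T @ X[:, -(i + 1)])
    return y_hat
```

The elementwise products play the role of the Hadamard product. Because a correlation matrix is symmetric, the transpose has no effect when the adjacency matrices come from `rolling_adjacency`; it is kept only to mirror the stated "impact of stock i on stock j" convention.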
The adjacency matrices $A_1$ and $A_2$ play important roles in the cross-AR term and the cross-textual term. The elements of $A_1$ and $A_2$ measure the cross impact between two stocks. We denote $Y \in \mathbb{R}^{N \times T}$ as the series of daily returns, $X \in \mathbb{R}^{N \times T}$ as the series of textual sentiment, and $E \in \mathbb{R}^{N \times T}$ as the series of prediction errors. $Y_{a:b} = (y_a, y_{a+1}, \ldots, y_b) \in \mathbb{R}^{N \times (b-a+1)}$ denotes the daily returns from day $a$ to day $b$, where $1 \le a < b \le T$. Similarly, $X_{a:b} \in \mathbb{R}^{N \times (b-a+1)}$ and $E_{a:b} \in \mathbb{R}^{N \times (b-a+1)}$ denote the textual sentiment and the prediction errors from day $a$ to day $b$, respectively.

The strength of the interaction between stocks changes dynamically in response to the market sentiment of the past few weeks. To better adapt to these dynamic changes, we set $A_1$ as the correlation coefficient matrix of the return series over the past $W$ trading days, and $A_2$ as the correlation coefficient matrix of the stocks' sentiment index series over the past $W$ trading days, as shown in Equation (2), where Corr is the function that calculates the correlation coefficient matrix and $W$ is the length of the training window. $A_1$ and $A_2$ are updated daily.

To better understand the model, we rewrite Equation (1) as Equation (3). Equation (3) also makes clear how TM-ARMA improves on VARMA and VARMAX. Firstly, the adjacency matrix $A$ is an all-one constant matrix in VARMA and VARMAX, whereas in TM-ARMA $A$ is updated dynamically every day. In VARMA and VARMAX, the relationships among stocks are constant and the historical data of all stocks are of equal importance when forecasting the return of one specific stock. However, the short-term relationships among stocks change over time with market sentiment. Secondly, in VARMA and VARMAX, the third term on the right-hand side is a cross MA term that is similar in form to the cross-AR term and combines the prediction errors of all $N$ stocks when forecasting.
Our experiments show that the cross MA term introduces excessive, unstable noise and makes the predicted values unstable. To avoid this noise, TM-ARMA applies a single MA term in Equation (3) that considers only the errors of the stock being predicted.

As the posts in stock forums carry no sentiment polarity labels, supervised learning methods cannot be applied. In this paper, the textual sentiment series $X$ in TM-ARMA is computed from emotion dictionaries. We have a list of positive words and a list of negative words that share no words; they are described in detail in Section 3.1.1. We assume that there are $postnum_t^s$ posts in the stock forum of stock $s$ in day $t$ (from 9:30 in day $t$ to 9:30 of the next trading day), and we denote the $h$-th post as $post_h$. The sentiment score of stock $s$ in day $t$ is then given by Equation (4), where Senti is the function that computes the sentiment score of a post. In a specific post $post_h$, we assume that there are $posnum_h$ words from the positive list and $negnum_h$ words from the negative list. Following Siering [20], we compute the sentiment score of the post as shown in Equation (5).

We denote the noise of the $N$ stocks as a random vector $\varepsilon$, and $\varepsilon_t$ is the prediction error in day $t$, namely $\varepsilon_t = y_t - \hat{y}_t$ (Equation (6)). Following earlier studies, we assume that $\varepsilon$ is a vector of white noise with $\varepsilon \sim N(0, \Sigma)$, where $\Sigma \in \mathbb{R}^{N \times N}$ is the covariance matrix of $\varepsilon$. The probability density function of $\varepsilon$ is shown in Equation (7); it depends on the set of parameter matrices, which comprises the intercept, the coefficients of the cross-AR, MA and cross-textual terms, and $\Sigma$, where $\Sigma$ is a symmetric non-singular non-negative definite matrix with $|\Sigma| > 0$. To simulate real-world trading, we train the model every morning on the historical data of the past $W$ days, where $W$ is the length of the training window. In a specific day $t$, the joint probability density function for the previous $W$ days is the product of the daily densities, and its log form is shown in Equation (8), where ln is the natural logarithm function.
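The Gaussian assumption above leads to a standard multivariate normal log-density summed over the training window. The sketch below is an illustration only, not a transcription of Equations (7) to (9), whose exact form is not reproduced in this text; the function name `window_log_likelihood` and the `drop_constant` flag are hypothetical.

```python
import numpy as np


def window_log_likelihood(E_win: np.ndarray, Sigma: np.ndarray,
                          drop_constant: bool = False) -> float:
    """Log joint density of W independent N(0, Sigma) error vectors.

    E_win : (N, W) prediction errors of the training window, one column per day
    Sigma : (N, N) covariance matrix, assumed symmetric positive definite
    """
    n_stocks, n_days = E_win.shape
    sign, logdet = np.linalg.slogdet(Sigma)
    if sign <= 0:
        raise ValueError("Sigma must have a positive determinant")
    # Sum over days of e_t^T Sigma^{-1} e_t, computed without an explicit inverse.
    quad = float(np.sum(E_win * np.linalg.solve(Sigma, E_win)))
    value = -0.5 * n_days * logdet - 0.5 * quad
    if not drop_constant:
        value -= 0.5 * n_days * n_stocks * np.log(2.0 * np.pi)
    return value
```

With `drop_constant=True` the term that does not depend on the parameters is omitted, which leaves the location of the maximum unchanged and gives an objective suitable for gradient-based estimation.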
By dropping the constant, we obtain the log-likelihood function $L$ in day $t$ as shown in Equation (9). For ease of reading, we summarize the important symbols of this paper in Table 1.

We apply a maximum likelihood method to estimate the parameter matrices, and derive the partial derivative of the objective function with respect to each parameter matrix with the other parameters held fixed. Because the parameters are in matrix form, we apply matrix-form gradient descent methods based on partial derivatives of matrices. The estimation is carried out every day $t$ before the prediction.

Update of the intercept vector with the other parameters fixed. We derive the partial derivative for the intercept vector in day $t$ as Equation (10),$^{1}$ with $B_1 = 0$. Then we update the intercept vector by Equation (11), which uses a fixed learning rate. In the empirical analysis, Equation (11) is iterated a fixed number of times to estimate the intercept vector.

The partial derivative for the covariance matrix $\Sigma$ in day $t$ is derived as Equation (12).$^{2}$ Then we update $\Sigma$ by Equation (13).

We denote by $\nabla$ the partial derivative of $L$ with respect to the cross-AR parameter matrix, which is derived as Equation (14). The elements of $\nabla$ are shown in Equation (15),$^{3}$ where Tr is the trace of a matrix, $K$ is a matrix appearing in Equation (15), and $B_2 = 0$. $M_1(i, j) \in \mathbb{R}^{N \times p}$ is a matrix in which only the element at $(i, j)$ is non-zero, as shown in Equation (16), where $1 \le k \le N$ and $1 \le l \le p$. Then we update the cross-AR parameter matrix by Equation (17).

We denote by $\nabla$ the partial derivative of $L$ with respect to the MA parameter matrix. As with Equations (14) and (17), the elements of $\nabla$ are derived as Equation (18), where $M_2(i, j) \in \mathbb{R}^{N \times q}$ is the matrix shown in Equation (19). Then we update the MA parameter matrix by Equation (20).

We denote by $\nabla$ the partial derivative of $L$ with respect to the cross-textual parameter matrix. As with Equations (14) and (15), the elements of $\nabla$ are derived as Equation (21), where $M_3(i, j) \in \mathbb{R}^{N \times r}$ is the matrix shown in Equation (22).

1 The proof of Equation (12) in this paper uses the Formula (85) in Matrix Cookbook [11].
2 The proof of Equation (14) in this paper uses Formulas (57) and (61) in Matrix Cookbook [11].
3 The proof of Equations (17), (20) and (23)
Then we update the cross-textual parameter matrix by Equation (23).

As shown in Equation (3), $E_{t-\max(p,q,r):t-1}$ and the parameter matrices are needed for the TM-ARMA forecast in day $t$, so both must be initialized before forecasting. We reserve the first $S$ days for initialization and forecast the daily returns from day $S + 1$ to day $T$, with $S \ge \max(p, q, r) + W$. Firstly, we build $N$ $(p + r)$-dimensional linear regression models with the input of $Y_{1:S-1}$ and $X_{1:S-1}$, and set their prediction errors as $E_{2:S}$. Secondly, we build $N$ $(p + q + r)$-dimensional linear regression models with the input of $Y_{1:S-1}$, $E_{1:S-1}$ and $X_{1:S-1}$. Thirdly, we update $E_{2:S}$ with the new prediction errors, set the covariance matrix of $E_{2:S}$ as the initial $\Sigma$, set the intercept terms of the regressions as the initial intercept vector, and set the coefficients of $Y_{1:S-1}$, $E_{1:S-1}$ and $X_{1:S-1}$ as the elements of the initial cross-AR, MA and cross-textual parameter matrices, respectively.

To better simulate real trading, we propose a sliding-window online forecast approach, which is shown in Figure 1 and Algorithm 1. In TM-ARMA, the adjacency matrix $A$ and the parameter matrices are updated every day, as shown in Algorithm 1, where $G$ is the training frequency. The model forecasts the returns for the next day starting from day $S + 1$, and slides forward day by day. Every morning, before forecasting, the model trains the parameter matrices on the data of the past $W$ days. Within each training iteration of Algorithm 1, the partial derivatives of the intercept vector, the cross-AR, MA and cross-textual parameter matrices, and the covariance matrix are computed based on Equations (10), (15), (18), (21) and (12), respectively, and these parameters are then updated based on Equations (11), (17), (20), (23) and (13), respectively. Finally, the prediction error of day $t$ is updated based on Equation (6).

Our dataset is based on the constituent stocks of the FTSE China A50 Index from January 1, 2019 to December 31, 2020. There are 487 trading days in our dataset, including the first $S$ days reserved for initialization. For the textual sentiment scores to be effective, the textual data should be as abundant as possible.
We count the total number of posts in the Eastmoney Stock Forum for the constituent stocks of the major stock indices in China, and calculate the average daily number of posts per stock, as shown in Table 2. Table 2 shows that the constituent stocks of the FTSE China A50 Index have more average daily posts than the constituent stocks of any other stock index, so the FTSE China A50 is the best choice. We also consider that the FTSE China A50 Index is internationally compiled, with its futures contracts traded in Singapore, while the SSE 50 is not representative because it does not include any stock from the Shenzhen Stock Exchange. As a result, we choose the constituent stocks of the FTSE China A50 Index as the research object. The constituent stocks of the FTSE China A50 Index are adjusted quarterly, but the set of stocks must be fixed in the empirical experiment. Considering that the constituent stocks of the FTSE China A50 Index changed only slightly from January 1, 2019 to December 31, 2020, we choose the constituent stocks as of December 31, 2020 as our research object.

We obtain 1,686,259 posts from the stock forums of the constituent stocks of the FTSE China A50 Index from January 1, 2019 to December 31, 2020. To make the sentiment scores better reflect the real sentiment of investors, we remove some invalid posts. For example, the institutional accounts among the top 20 users often post irrelevant routine news, which has nothing to do with the sentiment towards a specific stock and introduces noise. After deleting these posts, we obtain 1,560,365 valid posts.

Word segmentation is necessary for text mining of Chinese textual information. Our word segmentation is implemented with Pkuseg (https://github.com/lancopku/PKUSeg-python), a powerful word segmentation tool developed by the Language Computing and Machine Learning Group of Peking University.
To improve the segmentation accuracy, we apply four external lexicons for the word segmentation, namely the Sogou Financial Accounting Lexicon (https://pinyin.sogou.com/dict/detail/index/20659), the Baidu Stop Words Lexicon (https://github.com/goto456/stopwords/blob/master/baidu_stopwords.txt) and two emotion dictionaries (which are introduced below).

Colloquial expression and financial terminology are the two characteristics of the forum posts. Firstly, considering that the language of the forum is colloquial and belongs to non-standard text, we introduce the BosonNLP emotion dictionary (https://bosonnlp.com/dev/resource), a powerful social media polarity emotion dictionary constructed from millions of emotion-labeled data items from microblogs, news, and forums. The BosonNLP emotion dictionary includes many internet terms and informal abbreviations, and has a high coverage of non-standard texts. Secondly, considering that there are many financial terms, we apply the Chinese Financial Sentiment Dictionary (https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3446388), a professional Chinese finance emotion dictionary that covers most of the common terms in the securities market. We merge the two dictionaries and obtain the list of positive words and the list of negative words, which are applied in the computation of textual sentiment scores.

The trading hours of Chinese stocks run from 9:30 to 15:00, but the textual data is available 24 hours a day, as shown in Figure 2. Announcements of new policies and breaking news often occur between 15:00 and 9:30 of the next day and lead to heated discussion on the stock forums, so the textual data from 15:00 to 9:30 of the next day is very important for forecasting. Therefore, TM-ARMA predicts the daily return rate at 9:30 in every trading day.

FIGURE 2 Data used for forecasting every day
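The dictionary-based scoring that uses the merged positive and negative word lists can be sketched as follows. This is a hedged illustration: the paper's Equations (4) and (5) are not reproduced in this text, so the per-post formula (a normalized difference of positive and negative word counts) and the daily aggregation (a mean over posts) are assumptions, and the function names are hypothetical. Posts are assumed to be already segmented into word tokens.

```python
from typing import Iterable, List, Set


def post_sentiment(tokens: Iterable[str], positive: Set[str], negative: Set[str]) -> float:
    """Assumed per-post score: (pos - neg) / (pos + neg), or 0.0 if no match."""
    pos = 0
    neg = 0
    for word in tokens:
        if word in positive:
            pos += 1
        elif word in negative:
            neg += 1
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def daily_sentiment(posts: List[List[str]], positive: Set[str], negative: Set[str]) -> float:
    """Assumed daily score of one stock: mean of its post scores (0.0 if no posts)."""
    if not posts:
        return 0.0
    scores = [post_sentiment(post, positive, negative) for post in posts]
    return sum(scores) / len(scores)
```

Because the two word lists share no words, each token contributes to at most one of the two counts. Other aggregation choices (for example a sum over posts) would change the scale of the daily score but not its sign convention.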
Following earlier research [4], we set $y_t$ as the daily log return times 100, that is, $y_t = 100 \ln(Open_{t+1} / Open_t)$, as shown in Equation (24), where $Open_t \in \mathbb{R}^{N \times 1}$ is the vector of prices at 9:30 in day $t$. It should be noted that when computing the profits of the strategies, we use the original daily returns ($Open_{t+1} / Open_t - 1$), which is consistent with real trading. Our price data is obtained from the Tushare database, and has been adjusted for stock splits, rights offerings, and dividends. When trading in some stocks is suspended on a particular day, we set the daily returns and the corresponding predicted values of the suspended stocks to 0. This solves the problem of missing data without influencing the training and forecasting of the other stocks, and it is consistent with real trading situations.

In fact, the Chinese stock market holds a call auction from 9:15 to 9:25, and continuous bidding starts at 9:30. Our model sells the stocks of the previous day during the call auction (between 9:15 and 9:25) and buys the stocks of the current day at 9:30. Therefore, the profitability of the model mainly depends on the price gap between 9:30 on the current day and 9:25 on the next trading day. To make the time series model work well, $y_t$ in our model is still based on the price at 9:30 every day. When evaluating the model's performance, however, we calculate the profit from the price at 9:25 on the next trading day and the price at 9:30 on the current trading day, which is consistent with the real trading scenario.

We conduct all the experiments in Python. AR, ARMA, VAR, EM-ARMA [13] and TM-AR are chosen as the baseline models to compare with TM-ARMA, where the TM-AR model is composed of the cross-AR term and the cross-textual term, as shown in Equation (25). ARMA and its extended models require stationary series. Our empirical results show that second-order differencing makes the dataset stationary and the models converge well. VARMA and VARMAX have cross MA terms.
We tried to apply VARMA and VARMAX to our dataset, but they fail to converge even after differencing. In contrast, the models with single MA terms (such as ARMA and TM-ARMA) converge well. As a result, VARMA and VARMAX are not included in the baseline models.

Our research shows that an overly long training window and overly large hyperparameters do not improve model performance but lead to heavy computation. In line with the Markov-process view that information decays over time, very old information has little effect on price prediction. Most traders agree that stock prices are strongly influenced by sentiment over the past 5 days and are correlated with volatility over the past 20 days, as the numbers of trading days in a week and in a month are approximately 5 and 20, respectively. So we set the grid search scope of the hyperparameters to 5 and the length of the training window to $W = 20$. Considering that $S \ge \max(p, q, r) + W$, we set $S = 25$. The learning rate is set to 0.001, as in much earlier research. Our research shows that the models converge well under a training frequency of 50, so we set $G = 50$. For comparison purposes, all models share the same initial settings, including $W$, $S$, the learning rate, $G$ and the grid search scope for the hyperparameters.

We set a rule-based investment strategy for these models. The following steps are performed at 9:30 in every trading day. Firstly, each model is trained on the historical data and forecasts the returns of the new day (from 9:30 to 9:30 of the next trading day, as shown in Equation (24)). Secondly, the strategy builds a new portfolio from the stocks with positive predicted returns. Thirdly, the strategy allocates all the funds equally among the stocks in the new portfolio and holds them until 9:30 of the next trading day. We set the initial net asset value to 1 for all models' strategies, and calculate the net asset values daily.
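The three strategy steps above can be sketched as follows. This is a minimal illustration rather than the authors' trading code: the function names are hypothetical, transaction costs are deliberately ignored, and holding cash (a zero return) when no stock has a positive forecast is an assumption.

```python
import numpy as np


def equal_weight_portfolio(predicted_returns: np.ndarray) -> np.ndarray:
    """Equal weights on the stocks with positive predicted returns, zero elsewhere."""
    predicted_returns = np.asarray(predicted_returns, dtype=float)
    selected = predicted_returns > 0
    weights = np.zeros_like(predicted_returns)
    count = int(selected.sum())
    if count > 0:
        weights[selected] = 1.0 / count
    return weights


def next_net_asset_value(nav: float, weights: np.ndarray,
                         realized_returns: np.ndarray) -> float:
    """Net asset value after holding the portfolio for one day, before costs.

    `realized_returns` are simple (not log) returns of each stock over the
    holding period.
    """
    return float(nav * (1.0 + weights @ np.asarray(realized_returns, dtype=float)))
```

For example, predicted returns of `[0.5, -0.2, 0.1, 0.0]` select the first and third stocks with a weight of 0.5 each; a stock with a predicted return of exactly zero is not selected under this reading of "positive".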
Considering that commissions and stamp duties are about 0.01% and 0.1% of the turnover, respectively, we set the transaction cost rate to 0.01% for buy operations and 0.11% for sell operations. To save transaction costs in the third step, the strategy sells the stocks that are in the old portfolio but not in the new portfolio, buys the stocks that are in the new portfolio but not in the old portfolio, and adjusts the positions of the stocks that are in both portfolios by partial buys or sells.

Models are trained to reduce the prediction error, but real trading requires profitable strategies. So we evaluate the models on both prediction performance and investment performance. The Root Mean Square Error (RMSE), shown in Equation (26), is the most common evaluation indicator for prediction error, where $y_t, \hat{y}_t \in \mathbb{R}^{N \times 1}$ represent the real and predicted returns in day $t$, respectively. The Annualized Return Rate (ARR), Maximum Drawdown Ratio (MDR) and Calmar Ratio (CR) are common indicators of investment performance [26]. The drawdown for day $t$ is the difference between the highest previous net asset value and the net asset value in day $t$. The MDR equals the biggest drawdown from day $S + 1$ to day $T$ divided by the corresponding highest previous net asset value. CR is the quotient of ARR and MDR. RMSE is a negative indicator that measures the forecast error. ARR is a positive indicator that measures the profitability of the model. MDR is a negative indicator that measures the extent of sustained losses. CR is a positive indicator that measures the stability of profits.

To compare the general performance across hyperparameter combinations, we calculate the average performance indicators over all hyperparameter combinations for all models, as shown in Table 3, where "↑" and "↓" denote positive and negative indicators, respectively. The results in Table 3 show that EM-ARMA works much better than AR, ARMA and VAR, and that TM-ARMA performs the best.
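The four evaluation indicators defined above can be computed along the following lines. This is a sketch under assumptions that the text does not fix: the RMSE averages the squared errors over all stocks and days, the ARR compounds the total return to an assumed 252 trading days per year, and all function names are hypothetical. The MDR follows the verbal definition given in the text.

```python
import numpy as np


def rmse(y_true, y_pred) -> float:
    """Root mean square error over all stocks and days (assumed normalization)."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def annualized_return(nav, periods_per_year: int = 252) -> float:
    """Compound annualized return of a daily net asset value series."""
    nav = np.asarray(nav, dtype=float)
    n_periods = len(nav) - 1
    return float((nav[-1] / nav[0]) ** (periods_per_year / n_periods) - 1.0)


def max_drawdown_ratio(nav) -> float:
    """Largest drawdown divided by the running peak at which it occurred."""
    nav = np.asarray(nav, dtype=float)
    peak = np.maximum.accumulate(nav)   # highest net asset value so far
    drawdown = peak - nav               # non-negative by construction
    worst = int(np.argmax(drawdown))
    return float(drawdown[worst] / peak[worst])


def calmar_ratio(nav, periods_per_year: int = 252) -> float:
    """ARR divided by MDR; infinite when the series never draws down."""
    mdr = max_drawdown_ratio(nav)
    if mdr == 0.0:
        return float("inf")
    return annualized_return(nav, periods_per_year) / mdr
```

For a net asset value series of `[1.0, 1.2, 0.9, 1.1]`, the biggest drawdown is 0.3 from the peak of 1.2, so the MDR is 0.25.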
To compare the prediction performance of all models, we report the performance for the hyperparameter combination with the best RMSE in Table 4. The prediction performance of EM-ARMA and TM-AR is significantly better than that of the traditional models, and TM-ARMA has the smallest prediction error. It is worth mentioning that the investment performance of TM-ARMA, TM-AR and EM-ARMA is also very good, which means that they balance the reduction of prediction error with the increase of investment returns.

To compare the investment performance of all models, we show the performance for the hyperparameter combination with the best ARR in Table 5. TM-ARMA outperforms all baseline models. Comparing Tables 4 and 5, we find that the RMSE of the EM-ARMA model is smaller than that of TM-AR, but TM-AR has better investment performance, and TM-ARMA performs best in both aspects.

To compare the investment performance intuitively, we draw the net asset value curves in Figure 3 for the models with the same hyperparameter combinations as in Table 5. The net asset value is sampled every 20 trading days when plotting the curves in Figure 3. Figure 3 shows that the traditional models perform poorly, whereas TM-ARMA, TM-AR and EM-ARMA show better recovery capability under the impact of COVID-19.

To verify the impact of COVID-19 on the profitability of all models, we compare the ARR and the excess ARR of all models in the two stages in Table 6. The excess ARR equals the difference between the ARR of the model and the ARR of the FTSE China A50 Index, and it measures the model's ability to achieve a higher return than the benchmark. Table 6 shows that TM-ARMA performs well in both the Pre-COVID stage and the Post-COVID stage.

This paper proposes a new TM-ARMA model that considers the cross impacts of textual emotions and price fluctuations, with the adjacency matrix updated dynamically.
The textual sentiment scores are computed by an emotion dictionary-based method, and the parameter matrices are estimated by the maximum likelihood method. We conduct the empirical study on the textual data and daily price data for the constituent stocks of the FTSE China A50 Index from January 1, 2019 to December 31, 2020. The results show that TM-ARMA outperforms all the baseline models, and that its performance in the Post-COVID-19 stage is even better than in the Pre-COVID-19 stage.

[1] Financial contagion during covid-19 crisis
[2] Death and contagious infectious diseases: Impact of the covid-19 virus on stock market returns
[3] Stock markets' reaction to COVID-19: Cases or fatalities?
[4] Return equicorrelation in the cryptocurrency market: Analysis and determinants
[5] Moment consistency of the exchangeably weighted bootstrap for semiparametric m-estimation, Scand
[6] Ordinal-response GARCH models for transaction data: A forecasting exercise
[7] Distributed generalized cross-validation for divide-and-conquer kernel ridge regression and its asymptotic optimality
[8] Measuring tail risk with gas time varying copula, fat tailed garch model and hedging for crude oil futures
[9] Buzzwords build momentum: Global financial twitter sentiment and the aggregate stock market
[10] Forecasting UK consumer price inflation using inflation forecasts
[11] The matrix cookbook. Technical University of Denmark
[12] Gratis: Generating time series with diverse and controllable characteristics
[13] Estimation of a digitised Gaussian ARMA model by Monte Carlo expectation maximisation
[14] Incorporating stock prices and news sentiments for stock market prediction: A case of Hong Kong
[15] Tensor graphical model: Non-convex optimization and statistical inference
[16] Neural network for univariate and multivariate nonlinearity tests
[17] Varmax-modelling of blast furnace process variables
[18] Robust multicategory support matrix machines
[19] Adaptively weighted large-margin angle-based classifiers
[20] The economics of stock touting during internet-based pump and dump campaigns
[21] Exact discrete representations of linear continuous time models with mixed frequency data
[22] How the individual investors took on big data: The effect of panic from the internet stock message boards on stock price crash
[23] Idiosyncratic skewness, gambling preference, and cross-section of stock returns: Evidence from China, Pac. Basin Financ
[24] Robust multicategory support vector machines using difference convex algorithm
[25] Financial markets under the global pandemic of covid-19
[26] Mhier-encoder: Modelling the high-frequency changes across stocks, Knowl.-Based Syst

The authors wish to thank Guodong Long, Guifang Liu, Qin Zhang, and Youle Wang for their suggestions for this paper. This work is supported by the National Natural Science Foundation of China (71771091, 71720107002), the Guangdong Basic and Applied Basic Research Foundation (2019A1515011752, 2021A1515110876) and the Philosophy and Social Science Foundation of Guangdong Province (GD19YYJ11).

The data that support the findings of this study can be divided into price data and textual data. The price data is available on the Ricequant platform (ricequant.com), in the investment research section. It can also be acquired from many other financial databases, such as the WIND financial terminal (wind.com.cn) and the CSMAR database (gtarsc.com). The textual data is available on the Eastmoney Stock Forum (guba.eastmoney.com), in the stock forum of each specific stock.

Weijun Xu https://orcid.org/0000-0001-9489-4487