key: cord-0670155-tzuv6lmw authors: Du, Jiawei title: A Research on Cross-sectional Return Dispersion and Volatility of US Stock Market during COVID-19 date: 2020-07-06 journal: nan DOI: nan sha: a4dccda3e93e77a0f5837c07d8d09d53fd6f0959 doc_id: 670155 cord_uid: tzuv6lmw

We study the volatility and cross-sectional return dispersion of the S&P Health Care Sector during the COVID-19 epidemic. In a novel step, we use the Google index as a proxy for the impact of the epidemic and model the volatility. We also study the factors that influence the log-returns of the S&P Energy Sector and the S&P Health Care Sector. We find that volatility is significantly affected by both the epidemic and cross-sectional return dispersion, and that both coefficients are positive. This means that herding behaviour did not exist and that, as cross-sectional return dispersion increases and the epidemic becomes more severe, the volatility of stock returns also increases. We also find that the epidemic has a significant negative impact on the return of the energy sector. Finally, we offer suggestions to investors.

The COVID-19 epidemic swept the world in 2020, first in China, then in Japan, South Korea, Iran, Italy, Spain, the United Kingdom and the United States. The epidemic caused a sharp economic downturn. According to the United States Department of Commerce, the real GDP of the United States fell by 4.8% in the first quarter. Herding and return dispersion have been studied by many researchers. Much of the literature shows that herding exists widely in stock markets, especially in developing countries (Chiang, T. C., and Zheng, D., 2010). In this paper, we explore whether there is a herding effect in the American stock market, whether the epidemic has an impact on the returns of the American stock market, and what factors affect stock volatility in this period.
Unlike similar research, we represent the epidemic situation not only with traditional indicators such as the number of infected persons but also with the Google index. This lets us better reflect the real impact of the epidemic, because COVID-19 data in the United States were distorted in the early stage of the epidemic. In terms of sector selection, we selected the S&P 500 health care sector and energy sector for comparison, and calculated their respective CSAD and CSSD indexes. Since the outbreak in the United States is very recent and the epidemic is still developing, we chose two time periods: a short time horizon from March 2 to May 29, 2020 (63 observations), and a long time horizon from February 20, 2020 to June 12, 2020 (80 observations). The basis for this choice is explained later. Finally, we make investment suggestions to investors based on our results.

Chang, E. C., Cheng, J. W., and Khorana, A. (2000) made a remarkable contribution to the study of herd behaviour. They found that herd behaviour is more obvious in some emerging markets than in mature markets, especially during periods of rising market prices. Herding is not limited to ordinary investors: Choi, N., and Sias, R. W. (2009) found that herding behaviour also exists among institutional investors. Fei, T., Liu, X., and Wen, C. (2019), like previous researchers, regarded cross-sectional stock return dispersion as an indicator of herd behaviour; the cross-sectional return dispersion index used later in this paper follows their research. Maio, P. (2016) studied the relation between stock return dispersion and future stock returns and found that, compared with other indicators, return dispersion has stronger predictive power, especially for large-scale stocks and growth stocks. Prior to this, Stivers, C. T. (2003) conducted similar studies, but used data on monthly return dispersion.
He also studied the relationship between return dispersion and future volatility, and found these relationships to be highly significant. Chortareas, G., Jiang, Y., and Nankervis, J. C. (2011) focused on euro volatility and found that high-frequency data over a long time horizon could improve volatility forecasting performance significantly. For studying volatility, the conventional models are the ARCH model (Engle, R. F., 1982) and the GARCH model (Bollerslev, T., 1986), but many researchers have used improved variants of the GARCH model. For example, Hwang, S., and Satchell, S. E. (2005) used the GARCHX model. Many researchers have also used the GJR-GARCH model proposed by Glosten et al. (1993), the TGARCH model proposed by Zakoian (1994), or the EGARCH model proposed by Nelson (1991). Fei, T., Liu, X., and Wen, C. (2019) chose the GARCH, GJR-GARCH, and HAR models; one of their findings is that cross-sectional stock return dispersion has a significant influence on forecast volatility. Onali, E. (2020) conducted very recent research on the relation between stock volatility and the COVID-19 epidemic, adding the COVID-19 factor to both the mean equation and the volatility equation of the GARCH(1,1) model. Interestingly, he initially regarded the increasing infections in Iran as a good index for the American stock market, but in the end did not find it significant. Moreover, the infections and deaths in most of the countries he studied (including America itself) showed no influence on US stock returns, but showed a significant influence on volatility.

In December 2019, an outbreak of pneumonia of unknown cause occurred in Wuhan. The epidemic was first discovered in the Wuhan South China Seafood Wholesale Market. As the epidemic developed, infected people also appeared in other provinces of China and in other countries (Li et al. 2020). On February 11, 2020, WHO named the disease COVID-19.
One month later, on March 12, 2020, the National Health Commission of China announced that the peak period of the COVID-19 epidemic had passed, while other countries, represented by the US, Iran, Italy, and South Korea, were experiencing outbreaks. At about the same time, WHO declared COVID-19 a global pandemic. According to data from Johns Hopkins University, more than 1 million people in the United States had been infected with COVID-19 as of April 28, and more than 2 million as of June 8.

The use of data left behind by web searches for research and prediction has a long history. H. A. Johnson et al. (2004) collected visit data from health-related websites and found that the number of visitors to flu-related articles and other flu-related data had a strong correlation with the influenza data provided by the CDC. Later, J. Ginsberg et al. (2009) studied the relationship between Google search data and the incidence of influenza, and found that by modeling the two, an influenza outbreak could be predicted 1-2 weeks ahead of the traditional method. N. Askitas et al. (2009) took the unemployment rate of Germany as the research object. They conducted a statistical analysis of unemployment-related keywords in Google search and found that these keywords had a strong correlation with the actual unemployment rate. N. P. Lincoln (2011) studied the sales volume of Apple's electronic products and found empirically a significant correlation between that sales volume and the Internet search data provided by Google. We therefore have reason to believe that Google index data can be used to estimate the extent of the COVID-19 epidemic.

For stock data, we chose the constituent stocks of the S&P 500 Health Care Sector and the S&P 500 Energy Sector from Yahoo Finance, over the time horizon from February 20, 2020 to June 12, 2020 (80 observations).
We also chose the time horizon from March 2 to May 29 (63 observations), during which COVID-19 in America was very severe. After that period, large-scale protests against racism took place in the United States, which had a certain impact on variables such as the VIX and stock returns. Selecting a narrower period is therefore more rigorous.

In this paper, we focus on three important factors. The first is COVID-19. The herding behavior we study arises in the context of the COVID-19 epidemic, so we need a proxy to represent COVID-19. One natural choice is the number of infections in the US; we take its log, ln(1+x), and call it 'Inlg'. However, the shortage of nucleic acid testing reagents and people's lack of awareness in the early stage of the epidemic in the United States distorted these data, so we need another proxy. We choose the Google index for the keyword 'COVID-19'. Why this choice? The impact of the COVID-19 epidemic is twofold. On the one hand, there is the practical impact, such as the direct economic impact caused by the inability of infected people to work and the inability of factories or companies to function normally. On the other hand, there is the impact of the epidemic on people's psychology. Once people panic about the COVID-19 epidemic, they reduce their consumption and sell assets such as stocks, causing indirect economic losses. Both the increase in the number of COVID-19 infections and people's fear of the epidemic contribute to the rise of the Google index. Therefore, we believe that the Google index better reflects the true impact of the epidemic when the epidemic data are distorted. We take its log, ln(1+x), as 'Golg'. We also take the log, ln(1+x), of the VIX index as another panic indicator. These variables are summarized in the table below.

The second important factor is herding behavior, one of our targets. Following Christie and Huang (1995), we use CSAD and CSSD to represent it.
In their standard form, these measures are

$$\mathrm{CSSD}_t = \sqrt{\frac{\sum_{i=1}^{N}\left(r_{i,t}-\bar{r}_t\right)^2}{N-1}}, \qquad \mathrm{CSAD}_t = \frac{1}{N}\sum_{i=1}^{N}\left|r_{i,t}-\bar{r}_t\right|$$

where $r_{i,t}$ is the log-return of stock $i$ on day $t$, $N$ is the number of constituent stocks, and $\bar{r}_t$ is the mean log-return. The log-return, CSAD and CSSD are plotted in Figures 1 and 2. To model volatility, we use the GARCH(1,1)-X model, the EGARCH(1,1)-X model and the GJR-GARCH(1,1)-X model, which are introduced in the next part. For convenience, Table 2 lists the variables.

Table 2 Parameter Introduction
Splg: first-order difference of the S&P 500 Index log-return
Hclg: first-order difference of the S&P 500 Health Care Sector Index log-return
Elg: first-order difference of the S&P 500 Energy Sector Index log-return
Inlg: first-order difference of log-infections
Golg: first-order difference of the log Google index
Vixlg: first-order difference of the log VIX index
Hccsad: CSAD of the S&P 500 Health Care Sector
Hccssd: CSSD of the S&P 500 Health Care Sector
Ecsad: CSAD of the S&P 500 Energy Sector
Ecssd: CSSD of the S&P 500 Energy Sector

Before establishing the model, we first run the ADF test and the normal distribution test; Tables 3 and 4 show the results. In Table 3, all variables except 'Hccsad' and 'Hccssd' have a p-value below 0.01. Since the dataset is small, we still consider 'Hccsad' and 'Hccssd' (p-value below 0.1) to have no unit root, that is, to be stationary. Data over the longer time horizon are more significant.

GARCH(1,1)-X model: the exogenous variable $X_t$ in the volatility equation is, in turn, Inlg, Golg, VIXlg, HCcsad, HCcssd, LaggedHCcsad and LaggedHCcssd. For comparison, we fit the data over two time horizons: the first from March 2 to May 29 and the second from February 20 to June 12.

EGARCH(1,1)-X model: the mean equation is the same as in the GARCH(1,1)-X model. In the volatility equation, the standardized innovations are i.i.d. random variables with distribution N(0,1). Again, we fit the data over the two time horizons.
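As a minimal sketch of the quantities discussed in this section, the Python code below computes CSAD and CSSD for one day's cross-section of log-returns and then runs a GARCH(1,1)-X variance recursion in which the exogenous term enters additively. The additive form, the denominators (N for CSAD, N - 1 for CSSD), the function names and all sample numbers are assumptions made for illustration; the exact specification estimated in Stata for this paper may differ.

```python
import math


def csad(returns):
    """Cross-sectional absolute deviation of one day's stock log-returns."""
    mean_r = sum(returns) / len(returns)
    return sum(abs(r - mean_r) for r in returns) / len(returns)


def cssd(returns):
    """Cross-sectional standard deviation (N - 1 in the denominator)."""
    mean_r = sum(returns) / len(returns)
    squared = sum((r - mean_r) ** 2 for r in returns)
    return math.sqrt(squared / (len(returns) - 1))


def garchx_variance(residuals, x, omega, alpha, beta, gamma):
    """Conditional variances under an assumed additive GARCH(1,1)-X form:
    sigma2[t] = omega + alpha * eps[t-1]**2 + beta * sigma2[t-1] + gamma * x[t]
    """
    # Initialise with the sample second moment of the residuals (an assumption).
    sigma2 = [sum(e * e for e in residuals) / len(residuals)]
    for t in range(1, len(residuals)):
        sigma2.append(
            omega
            + alpha * residuals[t - 1] ** 2
            + beta * sigma2[t - 1]
            + gamma * x[t]
        )
    return sigma2


if __name__ == "__main__":
    # Hypothetical cross-sections of daily log-returns (three stocks, three days).
    daily_returns = [[0.01, -0.02, 0.03], [0.00, 0.01, -0.01], [0.05, -0.04, 0.02]]
    dispersion = [csad(day) for day in daily_returns]
    # Hypothetical mean-equation residuals and parameter values.
    residuals = [0.01, -0.02, 0.015]
    print(garchx_variance(residuals, dispersion, 1e-5, 0.1, 0.8, 0.05))
```

With a positive `gamma`, a day with higher dispersion yields a higher conditional variance in this recursion, which is the direction of effect that a positive coefficient on the exogenous term would indicate.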
GJR-GARCH(1,1)-X model:

In addition, we want to study whether COVID-19 and lagged return dispersion affect the log-returns of the Health Care Sector and the Energy Sector, since we suspect that the epidemic has a negative impact on energy stocks and a positive impact on health care stocks. We therefore establish the multiple linear regression model

$$y_t = \beta_0 + \beta_1 x_{1,t} + \beta_2 x_{2,t} + \beta_3 x_{3,t} + \varepsilon_t$$

where $y_t$ is the sector log-return, $x_{1,t}$ is the S&P 500 Index, $x_{2,t}$ is the COVID-19 proxy, and $x_{3,t}$ is the return dispersion index (we use CSAD here because the results with CSSD are very close).

Tables 5 and 6 show the Garch-X(1,1) results, and Tables 7 and 8 show the EGarch-X(1,1) results. Both models fit the data from the narrower time horizon well. We think this is because the narrow time horizon covers the most severe period of COVID-19 in the United States; after it, the stock market had digested many factors of the epidemic, and there were protests against racism in the United States that may have affected our proxies and stock returns. Before March, COVID-19 infections in the United States were very low, which also affects model performance; this is why we could not obtain results when using the long time horizon. Remark: Garch-X(1,1) with Het Inlg, Golg, VIXlg: Stata cannot run the results.

In Table 5, the coefficients of the 'arch' term and the 'garch' term are highly significant in all models from model 1 to model 7. We find that the coefficients of 'Inlg', 'Golg' and 'Vixlg' are significantly positive, which implies that COVID-19 affects stock market volatility significantly: the more severe COVID-19 becomes, the more investors panic and the higher stock volatility gets. For the herding behavior indexes, CSAD and CSSD, the coefficients are significantly positive for both the current and the one-period-lagged values, which means that there is no herding effect. As far as these two indicators are concerned, however, return dispersion has led to an increase in volatility.
We believe the reason could be that US dollar assets are safe-haven assets and that the COVID-19 epidemic is global. When global asset risk increases, investors increase their allocation to US dollar assets, so the negative factors caused by the epidemic and the positive factors brought about by risk aversion repeatedly affect the market, resulting in return dispersion and rising volatility. In Table 7, the coefficients of 'egarch_a_L1' and 'egarch_L1' are significant in models 1, 6 and 7, which indicates that a leverage effect exists in those models; overall, however, we cannot say that the leverage effect is significant. Table 8 shows part of the results. Stata could not produce results for some models, and the 'earch' terms of the models with current CSAD (CSSD) and Vixlg are not significant. These three models are not the focus of our discussion, so we do not discuss them here. Remark: EGarch-X(1,1) with Het Inlg, Golg, LaggedHCcsad, LaggedHCcssd: Stata cannot run the results.

Table 9 shows the GJR-Garch-X(1,1) results. Compared with the EGarch(1,1)-X model, the GJR model shows that the leverage effect (the 'tgarch_a_L1' term) is not significant, which partly confirms the results of the EGarch(1,1)-X model. We think the reason is that the COVID-19 epidemic has both good and bad effects on health care stocks. On the one hand, the epidemic slows down production and puts the economy under pressure. On the other hand, market demand for the medical industry increases at this time, so in the end the leverage effect is not significant. Nevertheless, in all seven GJR-Garch-X(1,1) models the Xt term of the volatility equation is highly significant. As in the Garch-X(1,1) model, COVID-19 has a positive impact on the volatility of the U.S. stock market. The herding effect is not significant during this period, but the dispersion of returns increases the volatility of stock returns.
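To make the leverage term discussed above concrete, the following Python sketch evaluates one step of a textbook GJR-type variance recursion with an additive exogenous term. The functional form, the parameter names and the numbers are assumptions for illustration and are not the estimates reported in Table 9; statistical packages may also define the threshold indicator with the opposite sign.

```python
def gjr_garchx_step(eps_prev, sigma2_prev, x_t, omega, alpha, gamma, beta, delta):
    """One step of an assumed GJR-GARCH(1,1)-X variance recursion.

    sigma2_t = omega + (alpha + gamma * I[eps_prev < 0]) * eps_prev**2
               + beta * sigma2_prev + delta * x_t
    """
    # The leverage coefficient gamma is added only after a negative shock.
    leverage = gamma if eps_prev < 0 else 0.0
    return omega + (alpha + leverage) * eps_prev ** 2 + beta * sigma2_prev + delta * x_t


if __name__ == "__main__":
    # Hypothetical parameter values chosen only to show the asymmetry.
    params = dict(omega=0.01, alpha=0.05, gamma=0.10, beta=0.80, delta=0.0)
    after_loss = gjr_garchx_step(-0.1, 0.02, 0.0, **params)
    after_gain = gjr_garchx_step(0.1, 0.02, 0.0, **params)
    print(after_loss > after_gain)
```

When `gamma` is zero, shocks of equal size but opposite sign produce the same next-period variance, which is the situation an insignificant leverage coefficient points to.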
We draw three main conclusions. The first is that herding does not exist (in terms of the S&P health care sector); however, the coefficients of lagged CSAD and lagged CSSD are significantly positive, which shows that cross-sectional return dispersion increased the volatility of stock returns. The second is that the COVID-19 epidemic had a significant positive impact on the return volatility of health care sector stocks: the higher the number of infections and the Google search volume, the higher the return volatility. The third is that return dispersion lagged by one period has no significant impact on stock returns during the epidemic. The impact of the epidemic differs across sectors: it has no significant impact on the health care sector, but has a significant negative impact on the energy sector.

We have three suggestions for investors. First, investors can short the energy sector during the pandemic, and the freely available Google index (with the keyword COVID-19) can serve as the basis for the specific operation. Second, risk-averse investors should reduce their investment in the stock market during the COVID-19 outbreak, because volatility increases as the epidemic develops. Third, if the return dispersion of the previous day is high, risk-averse investors can sell or hedge stocks today to reduce the negative effects of high volatility, while high-frequency traders can enter the market today to seek trading opportunities from high volatility.
Google econometrics and unemployment forecasting
An examination of herd behavior in equity markets: An international perspective
Google trends: a web-based tool for real-time surveillance of disease outbreaks
An empirical analysis of herd behavior in global stock markets
Institutional industry herding
Forecasting exchange rate volatility using high-frequency data: Is the euro different?
Following the pied piper: Do individual returns herd around the market?
COVID-19 United States Cases by County
Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation
GARCH model with cross-sectional volatility: GARCHX models
Cross-sectional return dispersion and volatility prediction
On the relation between the expected value and the volatility of the nominal excess return on stocks
Analysis of web access logs for surveillance of Influenza
The relationship between internet marketing, search volume, and product sales (Doctoral dissertation)
Cross-sectional return dispersion and the equity premium
Conditional Heteroskedasticity in Asset Returns: A New Approach
Covid-19 and stock market volatility. Available at SSRN 3571453
Firm-level return dispersion and the future volatility of aggregate stock market returns
Threshold heteroskedastic models