key: cord-0143367-1lwgvele
authors: Dey, Asim Kumer; Haq, Toufiqul; Das, Kumer
title: Quantifying the impact of Covid-19 on the US stock market: An analysis from multi-source information
date: 2020-08-25
journal: nan
DOI: nan
sha: 2e4444a27fd4a95c1fac557519e921c7ef6bc78e
doc_id: 143367
cord_uid: 1lwgvele

We investigate the impact of Covid-19 cases and deaths, the local spread of Covid-19, and Google search activity on the US stock market. We develop a temporal complex network to quantify the US county-level spread dynamics of Covid-19. We conduct the analysis using the following sequence of methods: Spearman's rank correlation, Granger causality, a Random Forest (RF) model, and an EGARCH(1,1) model. The results suggest that Covid-19 cases and deaths, its local spread, and Google searches have an impact on the abnormal stock price between January 2020 and May 2020. However, although a few Covid-19 variables, e.g., US total deaths and US new cases, exhibit a causal relationship with price volatility, the EGARCH model suggests that Covid-19 cases and deaths, the local spread of Covid-19, and Google search activity do not have an impact on price volatility.

The stock market reacts to different local and global major events. Cagle (1996); Worthington and Valadkhani (2004); Worthington (2008); Cavallo and Noy (2009), and Shan and Gong (2012) study the impact of natural disasters, e.g., hurricanes and earthquakes, on the stock markets. Hudson and Urquhart (2015); Schneider and Troeger (2006); Chau et al. (2014); Beaulieu et al. (2006), and Huynh and Burggraf (2019) evaluate the effect of political uncertainty and war on the stock market. The influences of outbreaks of infectious diseases, e.g., Ebola and SARS, on the stock indices are assessed in Nippani and Washer (2004); Siu and Wong (2004); Lee and McKibbin (2004), and Ichev and Marinč (2018).

Investor sentiment is another crucial determinant of stock market dynamics. However, quantifying investor sentiment is not an easy task because of its unobservable and heterogeneous nature (García Petit et al. (2019); Gao et al. (2020); Baker and Wurgler (2007); Bandopadhyaya and Jones (2005)). In recent years, due to data availability, Google search volume has become a popular index of investor sentiment (Bijl et al. (2016); Kim et al. (2019); Preis et al. (2013)). Bollen et al. (2011) treat Twitter feeds as a measure of investor mood and use this Twitter mood to predict the stock market. Alanyali et al. (2013); Schumaker and Chen (2008); Bomfim (2003), and Albuquerque and Vega (2008) evaluate the relationship between financial news and the stock market and find that news related to an asset significantly impacts the corresponding stock price and volatility.

After the Covid-19 pandemic started spreading worldwide, the US stock market collapsed, with the S&P 500 dropping 38% between February 24, 2020 and March 20, 2020. Similar declines occurred in other stocks too. In recent months a number of studies have appeared that assess the impact of the Covid-19 outbreak on the stock market. Nicola et al. (2020) provide a review of the socioeconomic effects of Covid-19 on individual aspects of the world economy. Baker et al. (2020) analyze the reasons why the U.S. stock market reacted so much more adversely to Covid-19 than to the previous pandemics that occurred in 1918-19, 1957-58, and 1968. Wagner (2020) gives a picture of the post-Covid-19 economic world. Onali (2020); Zaremba et al. (2020); Arias-Calluari et al. (2020), and Cao et al. (2020) perform statistical modeling to analyze the effect of Covid-19 on stock market price and volatility.

However, there are still a number of important questions that need to be investigated. For example, 1. Do the numbers of Covid-19 cases and deaths exhibit any causal effect on the stock price? Although we focus on the S&P 500, the methodology is applicable to any other stock index. The rest of the paper is organized as follows. Section 2 describes the data, constructs a temporal network for Covid-19 spreads, and defines the variables used in the study. The methodology is described in Section 3. Section 4 presents findings and a discussion of the results. Finally, Section 5 concludes.

The S&P 500 closing price data from June 3, 2019 to May 29, 2020 are obtained from Yahoo! Finance. Google search data from January 2, 2020 to May 29, 2020 are obtained from Google Trends. We obtain US county-level Covid-19 case data from the New York Times and US county information from the US National Weather Service.

We evaluate the impact of Covid-19 on the abnormal S&P 500 index. We define the daily abnormal S&P 500 price ($AP$) between January 2, 2020 and May 29, 2020 by subtracting the average price of the last seven months (i.e., 148 days) from the daily price and dividing the resulting difference by the standard deviation over the same period, as follows:

$$AP_t = \frac{P_t - \frac{1}{148}\sum_{i=1}^{148} P_{t-i}}{\sigma_P}, \qquad (1)$$

where $P_t$ is the daily closing price for day $t$ and $\sigma_P$ is the standard deviation of the closing price over the last 148 days (Kim et al. (2019); Bijl et al. (2016)). We use daily squared log returns of the prices $P_t$ as a proxy for daily volatility ($Vol$) (Brooks (1998); Barndorff-Nielsen and Shephard (2002)):

$$Vol_t = \left(\log \frac{P_t}{P_{t-1}}\right)^2. \qquad (2)$$
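The two stock variables defined above can be computed in a few lines. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and two details are assumptions not stated in the text: the 148-day window is taken to exclude day t itself, and the sample (n - 1) standard deviation is used.

```python
import math
import statistics


def abnormal_price(prices, window=148):
    """Return AP_t for every day t that has `window` earlier closing prices."""
    ap = []
    for t in range(window, len(prices)):
        past = prices[t - window:t]        # days t-window, ..., t-1 (assumption: day t excluded)
        mu = statistics.mean(past)
        sigma = statistics.stdev(past)     # sample standard deviation (assumption)
        ap.append((prices[t] - mu) / sigma)
    return ap


def squared_log_returns(prices):
    """Return the volatility proxy (log(P_t / P_{t-1}))^2 for t = 1, ..., n-1."""
    return [math.log(prices[t] / prices[t - 1]) ** 2 for t in range(1, len(prices))]
```

With a toy window of 3 days, `abnormal_price([100.0, 102.0, 101.0, 105.0], window=3)` compares the last price with the mean and standard deviation of the first three.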

We study the impact of a number of Covid-19 variables ($C$), e.g., daily US total cases, daily US new cases, daily World total cases, etc., on $AP_t$ and $Vol_t$. For a complete list of Covid-19 variables, see Table 1. We standardize each Covid-19 variable using the mean and the corresponding standard deviation over a rolling window of the past 7 days:

$$\frac{C_t - \mu_C}{\sigma_C}, \qquad (3)$$

where $C_t$ is a Covid-19 variable (e.g., US total cases) at day $t$, and $\mu_C$ and $\sigma_C$ are the mean and standard deviation of the corresponding variable within the sliding window of days $[t-k, t-1]$.
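The rolling standardization can be sketched as follows. This is an illustration only: the function name is hypothetical, the default k = 7 follows the "past 7 days" in the text, and the use of the sample standard deviation and the NaN returned for a constant window are assumptions.

```python
import statistics


def rolling_standardize(series, k=7):
    """Standardize series[t] by the mean and std of days t-k, ..., t-1."""
    out = []
    for t in range(k, len(series)):
        window = series[t - k:t]
        mu = statistics.mean(window)
        sigma = statistics.stdev(window)   # sample standard deviation (assumption)
        # A constant window has zero spread; return NaN rather than divide by zero.
        out.append((series[t] - mu) / sigma if sigma > 0 else float("nan"))
    return out
```

The same helper applies unchanged to any daily series that is standardized over a trailing window.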

A complex network represents a collection of elements and their inter-relationships. A network consists of a pair $G = (V, E)$ of sets, where $V$ is a set of nodes and $E \subset V \times V$ is a set of edges; $(i, j) \in E$ represents an edge (relationship) from node $i$ to node $j$. Here $|V|$ is the number of nodes and $|E|$ is the number of edges. The degree $d_u$ of a node $u$ is the number of edges incident to $u$, i.e., for $u, v \in V$,

$$d_u = |\{v \in V : (u, v) \in E\}|.$$

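The largest connected component ($GC$) is the connected subgraph of $G$ with the largest number of nodes. The elements of the $n \times n$ symmetric adjacency matrix, $A$, of $G$ can be written as

$$A_{ij} = \begin{cases} 1, & \text{if } (i, j) \in E, \\ 0, & \text{otherwise.} \end{cases}$$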

Higher-order network structures, e.g., motifs, represent local interaction patterns of the network. In a disease transmission network, motifs provide significant insights about the spread of the disease. For example, the presence of dense or fully connected motifs can increase the spread of the disease through the network, while chain-like motifs can decrease it (Leitch et al. (2019)). A motif is a recurrent multi-node subgraph pattern. A detailed description of network motifs and their functionality in a complex network can be found in Milo et al. (2002).

Temporal networks are an emerging extension of network analysis that appears in many domains of knowledge, including epidemiology (Valdano et al. (2015); Demirel et al. (2017); Enright and Kao (2018)) and finance (Battiston et al. (2010); Zhao et al. (2018); Begušić et al. (2018)). A temporal network is a network structure that changes in time. That is, a temporal network can be represented by a time-indexed graph $G_t = (V(t), E(t))$, where $V(t)$ is the set of nodes in the network at time $t$ and $E(t) \subset V(t) \times V(t)$ is the set of edges in the network at time $t$. Here $t$ is either discrete or continuous. Figure 2 depicts a small 15-node temporal network at times $t = 1, 2$, and 3.

In order to quantify the county-level spread of Covid-19, we construct a complex network ($G_t$) for each day ($t$) between Jan 2, 2020 and May 29, 2020: $G = \{G_1, \ldots, G_T\}$, where $T = 130$. We evaluate the occurrences of different motifs in each $G_t$. An increased number of motifs, i.e., $T$ and $M$, and of other network features, e.g., $E$, indicates a higher spread in the local community. These increases in higher-order network structures have potential impacts on $AP$ and $Vol$.

Let $C$ be the set of counties in the US, $I$ be the set of Covid-19 new cases identified in $C$ on a day $t$, and $D$ be the pairwise distance matrix, in miles, among the centroids of the counties in $C$. We use the following three steps to construct the Covid-19 spread network ($G_t$) at time $t$ and compute the occurrences of motifs in $G_t$:

1. Each county in $C$ with $\gamma$ or more Covid-19 new cases, $\gamma \in Z^+$, forms a node in the network ($G_t$).

2. Two counties (i.e., nodes), $i$ and $j$, are connected by an edge if (1) both counties have $\lambda$ or more Covid-19 new cases, $\lambda \in Z^+$, and (2) the distance between $i$ and $j$ is less than $\delta$, $\delta \in R_{\geq 0}$. Therefore, the adjacency matrix, $A_t$, is written as

$$A_t(i, j) = \begin{cases} 1, & \text{if } I_i \geq \lambda,\ I_j \geq \lambda, \text{ and } D_{ij} < \delta, \\ 0, & \text{otherwise.} \end{cases}$$

3. We compute the occurrences of nodes ($V_t$), edges ($E_t$), different 3-node motifs ($T(t)$), different 4-node motifs ($M(t)$), and the size of the largest connected component ($GC(t)$) in $G_t$.
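The three steps above can be sketched for a single day as follows. This is a rough illustration, not the authors' implementation: the function names, the dictionary-based inputs (`new_cases`, `dist`), and the toy county labels are hypothetical, and only one motif type (closed triangles) is counted as an example of a 3-node motif. The brute-force triangle count is cubic in the number of nodes, which is acceptable only for a sketch.

```python
from itertools import combinations


def build_spread_network(new_cases, dist, gamma=5, lam=5, delta=100.0):
    """Return an adjacency dict {county: set(neighbours)} for one day.

    new_cases: {county: number of new cases on that day}
    dist:      nested dict with dist[i][j] = centroid distance in miles
    """
    nodes = [c for c, n in new_cases.items() if n >= gamma]      # step 1
    adj = {c: set() for c in nodes}
    for i, j in combinations(nodes, 2):                          # step 2
        if new_cases[i] >= lam and new_cases[j] >= lam and dist[i][j] < delta:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def network_features(adj):
    """Step 3: count nodes, edges, triangles, and the largest component size."""
    n_edges = sum(len(nb) for nb in adj.values()) // 2
    triangles = sum(1 for a, b, c in combinations(adj, 3)
                    if b in adj[a] and c in adj[a] and c in adj[b])
    seen, largest = set(), 0
    for start in adj:                      # depth-first search per component
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        largest = max(largest, size)
    return {"nodes": len(adj), "edges": n_edges,
            "triangles": triangles, "largest_component": largest}
```

Repeating this for each day t yields the time series of network features that are later standardized and used as covariates.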

In this study we choose $\gamma = 5$, $\lambda = 5$, and $\delta = 100$. That is, two counties are connected by an edge if both have 5 or more Covid-19 cases and the distance between them is less than 100 miles. Fig. 3 shows the Covid-19 spread network in US counties on April 11, 2020. We consider different network features, e.g., $E$, $T$, $M$, etc., as metrics of the local spread of Covid-19. We normalize each of the network variables in the same way as in Eq. 3, i.e., as $(N_t - \mu_N)/\sigma_N$, where $N_t$ is a network variable (e.g., $E$) at day $t$, and $\mu_N$ and $\sigma_N$ are the mean and standard deviation of the corresponding variable within the sliding window of days $[t-k, t-1]$.

A number of studies, e.g., Preis et al. (2010); Bijl et al. (2016), and Kim et al. (2019), show that there is a significant correlation between stock variables (e.g., return, volume, and volatility) and related Google searches, and that Google search data can be used to predict future stock behavior. We investigate whether Google Trends data affect the abnormal price, $AP$, and volatility, $Vol$, and whether we can use Google search volumes to predict $AP$ and $Vol$. We obtain the volume of Covid-19 related daily Google searches (e.g., "Coronavirus") from Jan 2, 2020 to May 29, 2020. We select the location of a query as "US" and as "World". We standardize each Google search variable similarly to Eq. 3, as $(G_t - \mu_G)/\sigma_G$,

where, G t is a Google search variable at day t, µ G and σ G are the mean and standard deviation of the corresponding variable within the sliding window of days [t − k, t − 1]. Table 1 provides an overview of the data sets and variables that are used in this study. 

We investigate the impact of Covid-19 cases and deaths, the local spread of Covid-19, and Covid-19 related Google search volumes on the abnormal stock price and volatility.

A correlation test is widely used to evaluate the relationship between the stock market and potential covariates. In this study, we use Spearman's rank correlation to study the correlation between the stock market ($AP$ and $Vol$) and each of the Covid-19 related variables.
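As an illustration of the lagged rank correlation, the sketch below computes Spearman's coefficient between a stock variable at day t and a covariate at day t - lag, using only the standard library. The function names are hypothetical, ties receive average ranks, and no p-value is computed; in practice a library routine such as SciPy's `scipy.stats.spearmanr`, which also reports a p-value, would likely be used instead.

```python
def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1              # average rank of the tie group
        for m in range(i, j + 1):
            r[order[m]] = avg
        i = j + 1
    return r


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def lagged_spearman(y, x, lag):
    """Spearman correlation between y_t and x_{t-lag}."""
    y_part, x_part = y[lag:], x[:len(x) - lag]
    return pearson(ranks(y_part), ranks(x_part))
```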

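To assess the potential predictive utility of Covid-19 cases, local spreads, and Google search interest for abnormal price formation ($AP$) and $Vol$, we apply the concept of Granger causality (Granger (1969)). The Granger causality test evaluates whether one time series is useful in forecasting another. Let $Y_t$, $t \in Z^+$, be a $p \times 1$ random vector ($AP_t$ or $Vol_t$) and let $F_t(Y) = \sigma\{Y_s : s = 0, 1, \ldots, t\}$ denote the $\sigma$-algebra generated from all observations of $Y$ in the market up to time $t$. Consider a sequence of random vectors $\{Y_t, X_t\}$, where $X$ can be Covid-19 cases, local spreads, or Google search volumes. Suppose that for all $h \in Z^+$

$$F_{t+h}(\cdot \mid F_{t-1}(Y, X)) = F_{t+h}(\cdot \mid F_{t-1}(Y)),$$

where $F_{t+h}(\cdot \mid F_{t-1}(Y, X))$ and $F_{t+h}(\cdot \mid F_{t-1}(Y))$ are the conditional distributions of $Y_{t+h}$ given $(Y_{t-1}, X_{t-1})$ and $Y_{t-1}$, respectively. Then $X_{t-1}$ is said not to Granger-cause $Y_{t+h}$ with respect to $F_{t-1}(Y)$. Otherwise, $X$ is said to Granger-cause $Y$, denoted $G_{X \to Y}$, where $\to$ represents the direction of causality (White et al. (2011); Dey et al. (2020)).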

We fit two models, one that includes $X$ and one that does not (the base model), and compare their predictive performance to assess the causality of $X$ to $Y$ using an $F$-test, under the null hypothesis of no explanatory power in $X$. For univariate cases we compare the following two models:

$$y_t = \alpha_0 + \sum_{k=1}^{d} \alpha_k y_{t-k} + \sum_{k=1}^{d} \beta_k x_{t-k} + e_t$$

versus the base model

$$y_t = \alpha_0 + \sum_{k=1}^{d} \alpha_k y_{t-k} + \tilde{e}_t,$$

where $d$ is the number of lags. If $Var(e_t)$ is significantly lower than $Var(\tilde{e}_t)$, then $x$ contains additional information that can improve forecasting of $y$, i.e., $G_{x \to y}$. We can also fit two linear vector autoregressive (VAR) models, with and without $X$, respectively, and evaluate the statistical significance of the model coefficients associated with $X$.
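The model comparison behind the univariate test can be sketched as below. This is a minimal illustration, assuming ordinary least squares fits of a model with lagged y and lagged x against a base model with lagged y only; the function names are hypothetical, and only the F statistic is returned (obtaining a p-value requires the F distribution with d and n - 2d - 1 degrees of freedom, which is left out here). Established routines, e.g., `grangercausalitytests` in statsmodels, provide the complete test.

```python
def ols_rss(X, y):
    """Residual sum of squares of the least-squares fit of y on the columns of X."""
    p = len(X[0])
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan elimination.
    A = [[sum(row[a] * row[b] for row in X) for b in range(p)]
         + [sum(row[a] * yi for row, yi in zip(X, y))] for a in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))   # partial pivoting
        A[col], A[piv] = A[piv], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [u - f * v for u, v in zip(A[r], A[col])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    return sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2
               for row, yi in zip(X, y))


def granger_f(y, x, d):
    """F statistic for H0: the d lags of x add no explanatory power for y."""
    rows_full, rows_base, target = [], [], []
    for t in range(d, len(y)):
        y_lags = [y[t - k] for k in range(1, d + 1)]
        x_lags = [x[t - k] for k in range(1, d + 1)]
        rows_base.append([1.0] + y_lags)             # base model: intercept + lags of y
        rows_full.append([1.0] + y_lags + x_lags)    # full model: adds lags of x
        target.append(y[t])
    rss_base = ols_rss(rows_base, target)
    rss_full = ols_rss(rows_full, target)
    dof = len(target) - (2 * d + 1)                  # residual degrees of freedom
    return ((rss_base - rss_full) / d) / (rss_full / dof)
```

A large statistic indicates that the residual variance drops markedly once the lags of x are included.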

To quantify the forecasting utility of the covariates ($X$), i.e., Covid-19 cases, US county-level spreads of Covid-19, and Google searches, we develop predictive models with and without $X$ and compare their predictive performances. The Box-Jenkins (BJ) class of parametric linear models is commonly used for such a comparison. However, different studies, e.g., Kane et al. (2014); Dey et al. (2020), show that flexible Random Forest (RF) models often tend to outperform the BJ models in their predictive capabilities. We present the comparative analysis based on the RF models. However, any appropriate forecasting model, e.g., an autoregressive integrated moving average (ARIMA($p, d, q$)) model, can also be used to compare the predictive performances of the covariates.

A decision tree, the building block of an RF model, partitions the predictor space into a number of non-overlapping regions $R_1, R_2, \cdots, R_m$ in a top-down fashion. A common partitioning technique is recursive binary splitting, where each split creates two regions $R_1 = \{X \mid X_j < k\}$ and $R_2 = \{X \mid X_j \geq k\}$ by considering all possible predictors $X_j$ and their corresponding cutpoints $k$ such that the residual sum of squares (RSS) (Eq. 11) is minimized:

$$\sum_{i:\, x_i \in R_1(j,k)} (y_i - \hat{y}_{R_1})^2 + \sum_{i:\, x_i \in R_2(j,k)} (y_i - \hat{y}_{R_2})^2, \qquad (11)$$

where $\hat{y}_{R_1}$ and $\hat{y}_{R_2}$ are the mean responses for the training observations in the regions $R_1(j, k)$ and $R_2(j, k)$, respectively. To improve the predictive accuracy, instead of fitting a single tree, the RF technique builds a number of decision trees and averages their individual predictions (Hastie et al. (2001)). RF is a non-linear (piecewise constant) model. Therefore, if there is any nonlinear causality (Kyrtsou and Labys (2006); Anoruo (2012); Song and Taamouti (2018)) of $X$ to $AP$ and $Vol$, the RF model can capture it. We compare the predictive performance of a baseline model (Model $P_0$), which includes only the lagged values of the abnormal price, with other proposed models which additionally include a set of covariates. The covariates are selected based on their significant correlations and causalities. Table 2 describes the five models we use in our analysis.

| Model | Predictors |
|---|---|
| Model P0 | AP lag 1, AP lag 2, AP lag 3 |
| Model P1 | AP lag 1, AP lag 2, AP lag 3, US total deaths lag 1, US total deaths lag 2, US total deaths lag 3, World new deaths lag 1, World new deaths lag 2, World new deaths lag 3 |
| Model P2 | AP lag 1, AP lag 2, AP lag 3, Edges lag 1, Edges lag 2, Edges lag 3, GC lag 1, GC lag 2, GC lag 3, T2 lag 1, T2 lag 2, T2 lag 3, M4 lag 1, M4 lag 2, M4 lag 3 |
| Model P3 | AP lag 1, AP lag 2, AP lag 3, "Covid-19" US lag 1, "Covid-19" US lag 2, "Covid 19" US lag 1, "Covid 19" US lag 2, "Covid-19" World lag 1, "Covid-19" World lag 2 |
| Model P4 | AP lag 1, AP lag 2, AP lag 3, "Covid-19" US lag 1, "Covid-19" US lag 2, "Covid 19" US lag 1, "Covid 19" US lag 2, T2 lag 1, T2 lag 2, US total deaths lag 1, US total deaths lag 2 |
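To make the tree-building step concrete, the sketch below builds lagged predictor rows of the kind listed for the models above and then searches for the single split that minimizes the residual sum of squares. It is an illustration only: both function names are hypothetical, cutpoints are taken as midpoints between adjacent observed values (an assumption), and a real RF would repeat the split recursively on bootstrap samples with random predictor subsets and average many trees.

```python
def lagged_rows(target, covariates, max_lag):
    """Build predictor rows holding lags 1..max_lag of the target and each covariate."""
    X, y = [], []
    for t in range(max_lag, len(target)):
        row = []
        for series in [target] + covariates:
            row.extend(series[t - lag] for lag in range(1, max_lag + 1))
        X.append(row)
        y.append(target[t])
    return X, y


def best_split(X, y):
    """Return (rss, j, k): the predictor index j and cutpoint k minimizing the RSS."""
    def rss(vals):
        if not vals:
            return 0.0
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)

    best = None
    for j in range(len(X[0])):
        levels = sorted(set(row[j] for row in X))
        for lo, hi in zip(levels, levels[1:]):
            k = (lo + hi) / 2                      # midpoint cutpoint (assumption)
            left = [yi for row, yi in zip(X, y) if row[j] < k]
            right = [yi for row, yi in zip(X, y) if row[j] >= k]
            total = rss(left) + rss(right)
            if best is None or total < best[0]:
                best = (total, j, k)
    return best
```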

We consider the root mean squared error (RMSE) as the measure of prediction error. The RMSE for abnormal price modeling can be defined as

$$RMSE = \sqrt{\frac{1}{n}\sum_{t=1}^{n} (y_t - \hat{y}_t)^2},$$

where $y_t$ is the observed abnormal price ($AP$) in the test set, $\hat{y}_t$ is the corresponding predicted value, and $n$ is the number of observations in the test set. We calculate the percentage change in prediction error (RMSE) for a specific model in Table 2 with respect to model $P_0$ as

$$\Delta(P_i) = \left(1 - \frac{\Psi(P_i)}{\Psi(P_0)}\right) \times 100\%,$$

where $\Psi(P_i)$ and $\Psi(P_0)$ are the RMSE of model $P_i$ and model $P_0$, respectively. If $\Delta > 0$, the covariate ($X$) is said to improve prediction of $Y$. We compare the $\Delta$ for different models, calculated for varying prediction horizons.
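The two error measures are simple to compute. The sketch below assumes that the percentage change is defined so that a positive value means the model with covariates has a lower RMSE than the baseline, which matches the sign convention stated above; the function names are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between observed and predicted values."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def pct_improvement(rmse_model, rmse_baseline):
    """Percentage reduction in RMSE relative to the baseline; positive means better."""
    return (1.0 - rmse_model / rmse_baseline) * 100.0
```

For instance, a model with an RMSE of 0.8 against a baseline RMSE of 1.0 gives an improvement of about 20%.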

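We now turn to evaluating the utility of Covid-19 cases and deaths, US county-level spreads of Covid-19, and Google searches in predicting stock market volatility. Let the log return of the S&P 500 price ($r_t$) be given as

$$r_t = E(r_t \mid I_{t-1}) + \epsilon_t,$$

where $E(r_t \mid I_{t-1})$ is the conditional mean, $I_{t-1}$ is the information set at time $t-1$, and $\epsilon_t$ is a conditionally heteroskedastic error. We build two exponential GARCH (EGARCH($p, q$)) models, Model 0 and Model X, where Model 0 is a standard EGARCH model with no explanatory variables, and Model X includes a set of explanatory variables:

$$\text{Model 0: } \log_e(\sigma_t^2) = \omega_0 + \sum_{i=1}^{q} \left(\omega_i \eta_{t-i} + \gamma_i |\eta_{t-i}|\right) + \sum_{j=1}^{p} \tau_j \log_e(\sigma_{t-j}^2),$$

$$\text{Model X: } \log_e(\sigma_t^2) = \omega_0 + \sum_{i=1}^{q} \left(\omega_i \eta_{t-i} + \gamma_i |\eta_{t-i}|\right) + \sum_{j=1}^{p} \tau_j \log_e(\sigma_{t-j}^2) + \Lambda X, \qquad (14)$$

where $\epsilon_t = \sigma_t \eta_t$, $\eta_t \sim iid(0, 1)$, $i = 1, 2, \cdots, q$, $j = 1, 2, \cdots, p$ (Nelson (1991)). We select a set of eight explanatory variables: $X$ = US total deaths lag 1, US total deaths lag 2, # Edges lag 1, # Edges lag 2, $T_2$ lag 1, $T_2$ lag 2, "Covid 19" US lag 1, "Covid 19" US lag 2, with $\Lambda = (\lambda_1, \lambda_2, \cdots, \lambda_8)$. All the explanatory variables are in the form of log returns. For simplicity we choose the EGARCH(1,1) model. For EGARCH(1,1) with the assumption of $\eta_t \sim iid(0, 1)$, the two proposed models (Eq. 14) reduce to

$$\text{Model 0: } \log_e(\sigma_t^2) = \omega_0 + \omega_1 \eta_{t-1} + \gamma_1 |\eta_{t-1}| + \tau_1 \log_e(\sigma_{t-1}^2),$$

$$\text{Model X: } \log_e(\sigma_t^2) = \omega_0 + \omega_1 \eta_{t-1} + \gamma_1 |\eta_{t-1}| + \tau_1 \log_e(\sigma_{t-1}^2) + \Lambda X. \qquad (15)$$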

The performances of the two models are compared based on their log likelihood, Akaike Information Criterion (AIC), and Bayesian Information Criterion (BIC).
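The comparison criteria can be illustrated with a small sketch that evaluates the Gaussian log likelihood of an EGARCH(1,1) recursion without explanatory variables for given parameter values, and then computes AIC and BIC. It is not an estimator: the parameters would have to be chosen by maximizing this likelihood with a numerical optimizer. The function names, the Gaussian assumption for the standardized errors, the constant mean `mu`, and the initialization at the sample variance are all assumptions; the AIC and BIC are written in their standard unnormalized forms, whereas some software reports them divided by the sample size.

```python
import math


def egarch11_loglik(returns, omega0, omega1, gamma1, tau1, mu=0.0):
    """Gaussian log likelihood of an EGARCH(1,1) model with
    log(s2_t) = omega0 + omega1*eta_{t-1} + gamma1*|eta_{t-1}| + tau1*log(s2_{t-1})."""
    eps = [r - mu for r in returns]
    log_s2 = math.log(sum(e * e for e in eps) / len(eps))   # start at sample variance (assumption)
    loglik = 0.0
    for e in eps:
        s2 = math.exp(log_s2)
        loglik += -0.5 * (math.log(2 * math.pi) + log_s2 + e * e / s2)
        eta = e / math.sqrt(s2)                              # standardized error
        log_s2 = omega0 + omega1 * eta + gamma1 * abs(eta) + tau1 * log_s2
    return loglik


def aic(loglik, n_params):
    """Akaike Information Criterion (lower is better)."""
    return 2 * n_params - 2 * loglik


def bic(loglik, n_params, n_obs):
    """Bayesian Information Criterion (lower is better)."""
    return n_params * math.log(n_obs) - 2 * loglik
```

Adding covariates raises the parameter count, so a model with covariates is preferred under AIC or BIC only if its log likelihood increases enough to offset the penalty.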

We investigate the effect of the Covid-19 public health crisis on the stock market, in particular on the S&P 500. We primarily focus on the S&P 500 reaction to Covid-19 cases and deaths, local spread, and Covid-19 related Google searches. Figure 4 shows the movements of the abnormal S&P 500 price and volatility from January 13, 2020 to May 29, 2020. The top panel reveals the precipitous drop of the S&P 500 price compared to the prices of the last seven months (Eq. 1). Historically high volatility (Eq. 2) is depicted in the bottom panel.

We start our analysis with Spearman's rank correlation test. We calculate correlations between the daily abnormal S&P 500 closing price $AP$ and the daily Covid-19 cases and deaths, and daily occurrences of higher-order structures in the spread network, at different time lags. For example, at lag 1 we compute the correlation of $AP$ at day $t$ with Covid-19 cases and deaths, and higher-order network structures, all at day $t-1$. These lagged correlations evaluate the directionality of the relationships. Figure 5a shows box plots that combine the correlations between each Covid-19 cases and deaths variable and $AP$ at different lags. Here we build two box plots at each lag: one for Covid-19 cases and deaths in the US (four variables), and another for Covid-19 cases and deaths in the World (four variables). Similarly, Figure 5b shows box plots that combine the correlations between each of the eleven local spread variables and $AP$ at different lags.

We find that there is a significant (negative) correlation between Covid-19 cases and deaths in the US and the abnormal S&P 500 at all six lags, lag = 1, 2, · · · , 6. However, there is no significant correlation between Covid-19 cases and deaths in the entire world and the abnormal S&P 500 (p-value > 0.05) at any lag (see Table 7 in Appendix). We also find that all the local spread variables are significantly (negatively) correlated (p-value < 0.05) with the abnormal S&P 500 at every lag = 1, 2, · · · , 6. That is, the US county-level spread of Covid-19 adversely affects the price of the S&P 500. However, it is anticipated that the strength of the correlations of the local spread variables will gradually decrease at higher lags, which is also reflected in Figure 5b. Some of the Covid-19 related Google searches, e.g., "Covid-19" in the US and "Corona" in the world, are also significantly correlated (p-value < 0.1) with the abnormal S&P 500 at different lags (Table 9 in Appendix).

We now investigate the potential impact of Covid-19 cases and deaths, its local spread, and related Google searches on S&P 500 price formation and risk, i.e., volatility. Table 3 and Table 4 present summaries of the Granger causality tests for the predictive utility of Covid-19 cases and deaths, and county-level local spreads, respectively. Here the direction of causality is denoted by $\to$.

We find that US new cases and US total deaths have significant predictive impacts on price and volatility. The US total number of cases has a predictive relationship only with volatility, at a few lags. Among world Covid-19 cases and deaths, only new deaths have causality on price and volatility. Almost all the local spread variables have a predictive impact on price, but none of them except # Edges at lag 1 has causality on volatility. That is, the county-level spread of Covid-19 significantly influences abnormal price formation but, surprisingly, does not have a causal linkage with volatility. Table 10 in Appendix shows that a number of Google search variables have causality effects on the abnormal price. However, only "Coronavirus" US and "Covid 19" US have predictive impacts on volatility, at very few lags.

Table 3: Summary of G-causality analysis of Covid-19 cases and deaths on abnormal S&P 500 (y) at different lags (days). P and Vol denote significance in price and volatility, respectively. A blank space implies no significance. Confidence level is 90%.

Now we turn our analysis to comparing the predictive performance of the models described in Table 2. Table 5 presents prediction errors based on Eq. 12, calculated for varying prediction horizons h = 1, 2, . . . , 6. For short-term forecasting horizons (h = 1, 2, and 3), model $P_3$, which is based on Google search variables, yields the most accurate performance. For longer-term forecasting horizons (h = 4, 5, and 6), model $P_2$, containing information from local spreads, delivers the most competitive results, followed by model $P_4$, which contains information from Covid-19 deaths, local spreads, and Google searches. Figure 6 presents a comparison of the observed data with fitted values from the baseline model (model $P_0$) and the four other models, i.e., models $P_1$, $P_2$, $P_3$, and $P_4$. For the 1-day horizon, model $P_3$ yields a noticeably higher predictive accuracy, followed by model $P_4$. For the 2-day horizon, although the prediction performance of all models is expected to deteriorate compared to the 1-day horizon, model $P_3$ again delivers the best prediction accuracy.

We now evaluate the influence of Covid-19 cases and deaths, US county-level spreads of Covid-19, and Google searches on S&P 500 volatility. A comparison of the two EGARCH models, Model 0 and Model X (Eq. 15), including the estimated parameters of the explanatory variables for Model X, is presented in Table 6. All EGARCH coefficients except the constant term ($\omega_0$) are statistically significant in both models. However, unexpectedly, none of the coefficient estimates of the covariates in Model X is statistically significant. We also examine the goodness of fit of the two models by comparing their log likelihood, Akaike Information Criterion (AIC), and Bayesian Information Criterion (BIC). We find that Model 0 tends to describe the S&P 500 volatility more accurately than the volatility model with covariates, Model X. That is, Covid-19 cases and deaths, its local spread, and Google searches do not significantly influence the S&P 500 volatility. Figure 7 also suggests that Model 0 captures the spikes of the price returns more accurately than Model X.

The aim of this paper is to evaluate whether Covid-19 cases and deaths, the local spread of Covid-19, and Google search activity explain and predict the US stock market crash in 2020. We develop a modeling framework that systematically evaluates the correlation, causality, and predictive utility of each of the

6 Appendix

Table 7: Spearman correlations between Covid-19 cases and abnormal S&P 500. Blue color indicates a significant correlation (p-values < 0.05), while black color represents a non-significant correlation (p-values > 0.05).

Table 9: Spearman correlations between Google Trends and abnormal S&P 500. A significant correlation (p-values < 0.05) is represented by blue color, while black color indicates a non-significant correlation (p-values > 0.05).

Table 10: G-causality analysis of Google searches on abnormal S&P 500 (y) at different lags (days). P and Vol denote significance in price and volatility, respectively. A blank space implies no significance. Confidence level is 90%.

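| Causality | Lag 1 | Lag 2 | Lag 3 | Lag 4 | Lag 5 | Lag 6 | Lag 7 |
|---|---|---|---|---|---|---|---|
| "Coronavirus" US → y | - | - | - | - | - | - | - |
| "Covid-19" US → y | P | P | P | P | P | P | P |
| "Covid 19" US → y | Vol | P | P | P | P | - | - |
| "Covid -19" US → y | - | P | - | - | - | - | - |
| "Coronavirus" World → y | - | - | P | - | - | - | - |
| "Covid-19" World → y | P | P | P | P | P | P | P |
| "Covid 19" World → y | - | - | - | - | - | - | - |
| "Covid -19" World → y | P | P | P | P | - | - | - |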

Graphlet decomposition: Framework, algorithms, and applications

Quantifying the relationship between financial news and the stock market

Measuring investor sentiment in equity markets

Econometric analysis of realized volatility and its use in estimating stochastic volatility models

The Structure of Financial Networks

Political uncertainty and stock market returns: evidence from the 1995 quebec referendum

Information feedback in temporal networks as a predictor of market crashes

Google searches and stock returns

Twitter mood predicts the stock market

Multivariate leverage effects and realized semicovariance garch models

Pre-announcement effects, news effects, and volatility: Monetary policy and the stock market

Predicting stock index volatility: can market volume help

Natural disasters, insurer stock prices, and market discrimination: The case of hurricane hugo

Covid-19's adverse effects on a stock market index

The Economics of Natural Disasters -A Survey. Working Papers

The correct regularity condition and interpretation of asymmetry in egarch

Political uncertainty and stock market volatility in the middle east and north african (mena) countries

Dynamics of epidemic diseases on a growing adaptive network

On the role of local blockchain network features in cryptocurrency price formation

What network motifs tell us about resilience and reliability of complex networks

Epidemics on dynamic networks

Googling investor sentiment around the world

How information technologies shape investor sentiment: A web-based investor sentiment index

Investigating causal relations by econometric models and cross-spectral methods

The Elements of Statistical Learning

War and stock markets: The effect of world war two on the british stock market

If worst comes to worst: Co-movement of global stock markets in the us-china trade war

Stock prices and geographic proximity of information: Evidence from the ebola outbreak

Comparison of ARIMA and Random Forest time series models for prediction of avian influenza H5N1 outbreaks

Google searches and stock market activity: Evidence from norway

Evidence for chaotic dependence between us inflation and commodity prices

Globalization and disease: The case of sars

Toward epidemic thresholds on temporal networks: a review and open questions

On the invertibility of egarch(p, q)

A one line derivation of egarch

Network motifs: simple building blocks of complex networks

Properties of range-based volatility estimators

Conditional heteroskedasticity in asset returns: A new approach

The socio-economic implications of the coronavirus and covid-19 pandemic: A review

Sars: a non-event for affected countries' stock markets?

Covid-19 and stock market volatility

Quantifying trading behavior in financial markets using google trends

Complex dynamics of our economic life on different scales: Insights from search engine query data

Assessing European power grid reliability by means of topological measures

War and the world economy: Stock market reactions to international conflicts

Evaluating a news-aware quantitative trader: The effect of momentum and contrarian stock selection strategies

Investor sentiment and stock returns: Wenchuan earthquake

Economic impact of sars: The case of hong kong

Measuring nonlinear granger causality in mean

Analytical computation of the epidemic threshold on temporal networks

What the stock market tells us about the post-covid-19 world

Linking Granger causality and the Pearl causal model with settable systems

Measuring the impact of natural disasters on capital markets: an empirical application using intervention analysis

The impact of natural events and disasters on the australian stock market: a garch-m analysis of storms, floods, cyclones, earthquakes and bushfires

Infected markets: Novel coronavirus, government interventions, and stock return volatility around the globe

Stock market as temporal network