key: cord-0061543-sfa5fhu0 authors: Deb, Soudeep title: Analyzing airlines stock price volatility during COVID‐19 pandemic through internet search data date: 2021-02-02 journal: nan DOI: 10.1002/ijfe.2490 sha: e740af92a86eae260ba4231d5beb4405d8d4f583 doc_id: 61543 cord_uid: sfa5fhu0
The recent coronavirus pandemic has prompted many regulations which are affecting the stock market. The airlines industry in particular is suffering because of lockdown policies across the world. We analyse the stock price movements of three major airlines companies using a new approach which leverages a measure of internet concern on different topics. In this approach, Twitter data and Google Trends are used to create a set of predictors, which then lead to an appropriately modified GARCH model. In the analysis, we first show that the ongoing pandemic has had an unprecedentedly severe effect. Then, the proposed model is used to analyse and forecast the stock price volatility of the airlines companies. The findings establish that our approach can successfully use the effects of internet concern for different topics on the movement of the stock price index and provide good forecasting accuracy. The model confidence set (MCS) procedure further shows that the short-term volatility forecasts are more accurate for this method than for the other candidate models. Thus, it can be used to better understand the stock market during a pandemic. Further, the proposed approach is attractive and flexible, and can be extended to other related problems as well.
The year 2020 has been hit severely by a pandemic of a new kind, caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Presumably, it started in December 2019, when a cluster of cases resembling viral pneumonia was associated with a point source of spread from a fish market in Wuhan, Hubei, China (Shereen, Khan, Kazmi, Bashir, & Siddique, 2020).
It has since spread to more than 180 countries and, as of May 1, 2020, it has already affected more than 3.4 million people, while the death toll stands at more than a quarter of a million worldwide. The outbreak was declared a Public Health Emergency of International Concern by the World Health Organization (WHO) on January 30, 2020. WHO named the disease COVID-19 on February 11, 2020. The reader is referred to Sohrabi et al. (2020) for a detailed review of the disease, while Deb and Majumdar (2020), Roy and Karmakar (2020), Wu, Leung, and Leung (2020) and the references therein are some studies on understanding and predicting the number of cases affected by the virus. Given that the disease spreads through cough droplets and is very contagious, most countries adopted complete or partial lockdown approaches to mitigate its spread. Many other policies, for example holiday extension, school and university closure, work-from-home systems, hospitalization and home quarantine, have also been adopted by different countries. Lin et al. (2020) studied the effect of such individual reactions and governmental actions on the outbreak. Chinazzi et al. (2020), meanwhile, studied the effect of travel restrictions on the same. These papers confirm that such measures have helped contain the pandemic to some extent. However, an immediate consequence was the effect on the economy. As Enserink and Kupferschmidt (2020) pointed out, long lockdowns to slow an epidemic can have catastrophic economic impacts and may thereby devastate public health as well. It is therefore of utmost importance to understand the immediate and long-term effects of COVID-19 on different industries. Guerrieri, Lorenzoni, Straub, and Werning (2020) and McKibbin and Fernando (2020) are worth reading in this regard. In this paper, our focus is on the airlines industry, which is expected to suffer heavily due to the pandemic.
Historically, the whole transportation industry has been highly exposed to systemic risks, which can be triggered by a large number of external factors beyond its control and whose damages are inevitable. These factors include financial events (recession, fuel price change etc.), natural calamities (hurricane, tsunami, polar vortex etc.) and man-made disasters (war, terrorist attack etc.). A pandemic is clearly one such factor which can significantly affect the airlines industry. It is no secret that globalization has helped the airlines industry grow considerably over the last couple of decades. But, with COVID-19 gripping the whole world, tourism has come to a halt almost everywhere. Fernandes (2020) discussed how various countries have been suffering from a lack of tourism during the last few months. In another recent study, Iacus, Natale, Santamaria, Spyratos, and Vespe (2020) worked with historical air traffic data and projected its volume under different assumptions on the ongoing pandemic. It is evident that airlines companies across the world are incurring huge losses as a consequence. An immediate result was visible in the stock price index. Starting from the end of January 2020, the stock price of most airlines dropped and a higher than usual level of volatility was also observed. This erratic behaviour of the stock market was unprecedented and was not specific to the airlines industry, cf. Baker et al. (2020). In order to understand how severe the effect is at the moment and to what extent even the big airlines companies can suffer, in this paper we develop a method which uses the internet concern for certain topics to better capture and forecast the effect of a pandemic on stock price volatility. Throughout the paper, we work with the data for three major airlines of the United States of America (USA), namely American Airlines, Delta Airlines and United Airlines.
We emphasize, however, that the proposed method can easily be extended to analyse other airlines of interest to the reader. Hereafter, the above three airlines will be abbreviated as AA, DA and UA, respectively. The main takeaways from the paper lie both in the modelling approach and in the results. Generalized autoregressive conditional heteroskedastic models (see Section 3.2 for more discussion) and their variants are the most popular tools for forecasting stock price volatility. However, the standard approaches can analyse the effect of a pandemic only by introducing related regressors (for example, number of COVID-19 cases, lockdown policies and travel restrictions). In comparison, we provide a different and more interesting way to understand the effect of a pandemic on the volatility of the airlines stock price index. Here, we focus on public reaction to different news and announcements. We combine Twitter data and Google search volumes to quantify the public interest, and show how that can be used in modelling airlines stock price returns. While somewhat similar approaches were taken to forecast crude oil prices (Yu, Zhao, Tang, & Yang, 2019), to the best of our knowledge, such a method has never been developed in the context of the aviation industry. Further, we find that the model works better for forecasting short-term volatility of the stock price returns. Naturally, it has the potential to be adapted and used in other applications as well. The rest of the paper is organized as follows. Descriptions of the data and some exploratory analysis are provided in Section 2. The methods are outlined in Section 3, while in Section 4 we discuss the main results of the study. Finally, concluding remarks, limitations of the study and future scope are listed in Section 5. All of our calculations are carried out in RStudio version 1.2.5033, coupled with R version 3.6.2.
As mentioned earlier, throughout this paper, the stock price indexes of AA, DA and UA are considered as the primary variables.
We take the closing price for each available day, from January 1, 2015 to April 30, 2020. These data are obtained from Yahoo Finance. For price $P_{s,t}$ of stock $s$ at time $t$, we compute the log-return as
$$R_{s,t} = \log\left(\frac{P_{s,t}}{P_{s,t-1}}\right).$$
In Figure 1, the daily series of the returns for the three airlines stocks are presented. Dates for the COVID-19 epidemic period and the 2017-18 flu season (which was the last time the USA suffered extensively due to an epidemic) are also marked in the same graph. It is evident that the return values fluctuated a lot following the pandemic announcement about COVID-19, contrary to what was observed in the earlier epidemic. This is more clearly visible on the right panel of the figure, where we see the return series only in 2020. In the following discussion, we will quantify the daily volatility by taking absolute returns. Also, since we are primarily interested in short-term volatility forecasting, throughout this paper we keep the last 3 weeks' data (after April 12, 2020) as the test set and use everything else as the training set. That results in 1,327 days' worth of data in the training set.
FIGURE 1 Time series of daily returns for the stock price index of three airlines. Left panel shows the plots since 2014, and the right panel shows the plots for the data from only 2020
Next, we take a look at the numbers for the COVID-19 outbreak in the USA. In addition, just for comparison, we include the 2017-18 flu season as well. It was widespread and as many as 32 states had high activity. Almost a million people were affected, while in excess of 60,000 people passed away. In comparison, COVID-19 has so far affected more than 1 million people and has claimed almost 63,000 lives in the USA. An overview of these two epidemics is presented in Table 1. The COVID-19 data are obtained from JHU-CSSE (2020) and the time range is from January 22, 2020 to April 30, 2020.
In Table 1, November 15, 2017 is taken as the start of the outbreak for the 2017-18 flu season, as the influenza-like-illness activity began to increase around that time. The outbreak reached an extended period of high activity during January and February across the whole country, and remained at that high level through the end of March. On the other hand, WHO made a declaration about COVID-19 on February 11, 2020. The number of cases in the USA started to increase rapidly from around that time and has not yet shown any sign of decreasing. Figure 2 shows the daily series of the number of affected cases in the USA.
Continuing from the previous section, we start with a look at the abnormal returns for the airlines stock prices following the COVID-19 outbreak. We use event study methodology, which is based on the hypothesis that capital markets are efficient in processing information by establishing an appropriate new stock price equilibrium as soon as new information becomes available, cf. Fama, Fisher, Jensen, and Roll (1969). The underlying logic is that the investors in capital markets process the available information on different firm activities and external events, and consider both the impact on current performance and the impact on the future performance of the firms. As more information becomes available, a stock price index changes and reflects the investors' revised consensus about future profitability. The changes in investors' beliefs regarding the future profitability of a firm are thus reflected in abnormal returns, which are the risk-adjusted returns in excess of the firm's expected return. As pointed out by McWilliams and Siegel (1997), abnormal returns provide a unique way of associating the impact of a particular event with the firm's expected profitability in future periods. Thus, a first step to understand the effect of a pandemic is to measure the abnormal return of the airlines following the outbreak.
Let $R_{s,t}$ denote the return of a specific stock $s$ at time $t$ and $R_{m,t}$ denote the return of the market index at time $t$. In our analysis, the S&P 500 index (data obtained from Yahoo Finance) is used for that purpose. Now, consider the ordinary linear regression:
$$R_{s,t} = \alpha_s + \beta_s R_{m,t} + \epsilon_{s,t}. \qquad (2)$$
Suppose $T$ denotes the time of the event (start date of an outbreak) we are interested in. We use all data of the stock $s$ for $t \le T$ to fit the above model and estimate the coefficients, hereafter denoted by $\hat{\alpha}_s, \hat{\beta}_s$. Then, for all $t' > T$, the abnormal return of the stock $s$ is evaluated by
$$AR_{s,t'} = R_{s,t'} - \hat{\alpha}_s - \hat{\beta}_s R_{m,t'}.$$
In other words, the abnormal returns are the prediction errors of the model described by Equation (2). They denote the returns over and above the return predicted by the general market trend on the days after an event of interest. We assume that the abnormal returns are the result of the pandemic and not of some other random event occurring on the same day. Further, we compute the cumulative abnormal return (CAR) of stock $s$ corresponding to a pandemic happening at time $T$. For a window of length $w$, it is defined as:
$$CAR_s(T, w) = \sum_{t'=T+1}^{T+w} AR_{s,t'}.$$
In Figure 3, CARs for different window lengths (up to 100 days, which is roughly 4 months) following the COVID-19 outbreak are presented. For comparison, the same results for the 2017-18 flu epidemic and the 2007-08 global recession are included as well. Note that the latter has been the worst financial crisis during the last decade and it affected the airlines industry severely. See Wang (2013) for a relevant in-depth discussion. The graphs portray a similar story for all three airlines. It is evident that the effect was minimal for the 2017-18 flu. On the contrary, COVID-19 has already caused much more severe abnormal returns for all three stocks, more or less similar to those of the great recession. It was, however, unprecedented. One can guess that the extent of public reaction to the pandemic and its effect on the airlines stocks could not be judged properly. That naturally brings forth an important question.
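As a hedged illustration of the event-study steps described above (market-model fit on pre-event data, abnormal returns as prediction errors, and their cumulative sum), the following minimal Python sketch may help. It is not the author's code (the paper reports using R); the function names and the toy numbers are hypothetical, and plain least squares is assumed for the regression.

```python
def fit_market_model(stock, market):
    """Least-squares fit of stock = alpha + beta * market on pre-event data."""
    n = len(stock)
    mean_s = sum(stock) / n
    mean_m = sum(market) / n
    sxy = sum((m - mean_m) * (s - mean_s) for s, m in zip(stock, market))
    sxx = sum((m - mean_m) ** 2 for m in market)
    beta = sxy / sxx
    alpha = mean_s - beta * mean_m
    return alpha, beta


def abnormal_returns(stock, market, alpha, beta):
    """Post-event prediction errors: observed return minus market-model return."""
    return [s - (alpha + beta * m) for s, m in zip(stock, market)]


def cumulative_abnormal_returns(ar, max_window):
    """CAR for every window length w = 1, ..., max_window (running sum of AR)."""
    cars, total = [], 0.0
    for value in ar[:max_window]:
        total += value
        cars.append(total)
    return cars


# Hypothetical toy data: returns up to the event date, then two post-event days.
pre_market = [0.01, -0.02, 0.005, 0.015]
pre_stock = [0.01 + 2.0 * m for m in pre_market]
alpha_hat, beta_hat = fit_market_model(pre_stock, pre_market)
ar = abnormal_returns([-0.04, -0.02], [0.0, 0.01], alpha_hat, beta_hat)
print(cumulative_abnormal_returns(ar, max_window=2))
```

In practice the pre-event window would contain the full return history before the outbreak date, and `max_window` would be 100 to match the window lengths shown in Figure 3.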
Is there another way to develop an appropriate modelling technique which will help us in analysing and predicting the stock returns? We aim to tackle that issue in the following section.
One of the main assumptions of this paper is that public interest and behaviour during a pandemic can explain the high level of volatility of the airlines stocks. We quantify this by taking the volume of searches in Google. For that, an appropriate set of keywords is first identified with the help of Twitter data. Note that Twitter, at present, has more than 300 million active users worldwide, while in the USA nearly 50 million active users are recorded per month. Over the last few years, many interesting studies have established that Twitter data can be leveraged wisely in financial modelling. For example, Mao, Wei, Wang, and Liu (2012) showed that S&P 500 stocks are highly correlated with the daily volume of tweets that mention "S&P stocks". A few more papers have established that a sentiment analysis of Twitter data can be instrumental in predicting stock market movements. Ranco, Aleksovski, Caldarelli, Grčar, and Mozetič (2015) and Pagolu, Reddy, Panda, and Majhi (2016) are worth mentioning in this regard. While these studies make use of public tweets directly in the modelling, our objective is not to assess the public sentiment from the tweets, but solely to get a sense of the main topics of interest for the general public. It is true that nowadays Twitter acts as a news source not only for the public, but for mainstream media as well. Refer to Moon and Hadley (2014) for some relevant discussion. For our purpose, 10 official Twitter handles, namely the ones from American Airlines, Delta Airlines, United Airlines, Yahoo Finance, Financial Times, CNBC, CNN, MSNBC, NBC News and Fox News, are selected.
These are chosen carefully to reflect three aspects that are of interest to the public so far as our study is concerned: details pertaining to travelling, flights and airlines; information regarding the economy and the stock market; and daily news and current affairs. Now, a sample of 30,000 tweets is parsed from the above handles and we do a term-frequency analysis to find the most common terms. At this stage, the stop words, links and tags are removed and the data are cleaned to reflect the main topics of the tweets. Next, we look at the most frequent words and identify 25 key-phrases that are of public interest for various reasons. From now on, we will call this set of main keywords and phrases "topics". They are listed below. Observe that the topics can be broadly divided into three categories. The first one is the set of topics people would usually search for without any pressing concern. We call it "general" and examples are "American Airlines" and "Delta Airlines Stock". Next, topics like "flight status", "cancellation" and "reservation" are used when someone needs a quick update or action on travel plans. We name this category "travel". Finally, the topics related to pandemics and relevant concerns, for example "coronavirus", "epidemic" and "community spread", are reserved for the category "COVID". We are primarily interested in the effect of the third category on the airlines stock prices.
FIGURE 2 Daily number of COVID-19 affected cases (log-transformed) in the USA
FIGURE 3 CAR values for different window-lengths following the four pandemics for the returns of three airlines stocks
The full list of topics is provided below:
• General (hereafter denoted as $\mathcal{G}$): American Airlines, Delta Airlines, United Airlines, economy, American Airlines stock, Delta Airlines stock, United Airlines stock, stock price, President Trump, government.
• Travel (hereafter denoted as $\mathcal{T}$): flight cancellation, flight status, flight delay, flight assistance, flight booking, reservation.
• COVID (hereafter denoted as $\mathcal{C}$): lockdown, Coronavirus, pandemic, community spread, COVID, travel ban, shutdown, disease, epidemic.
We emphasize that the above categorization is subjective and is done carefully using the tweets shared by the official handles. Now, for our main analysis, we want to obtain the Google search volumes for all of the above topics to capture the level of public interest on different days. At this stage, we make a minor adjustment and choose not only these exact phrases, but also the five most common queries related to the topics. For example, "American Airlines" is related to the queries "flight", "American Airlines Flight", "American Airlines Flights" and "flights". The phrase "community spread" is related to "coronavirus", "coronavirus community spread", "what is community spread" and "community spread virus". Next, using Google Trends, we compute the daily Google search volume for the 25 topics. Let $S_{it}$ denote the search volume for topic $i$ on the $t$-th day. It is worth mentioning that Google Trends provides the relative interest for every keyword within a topic and for every day on a scale from 1 to 100. So, proper measures are taken to convert the variables into the daily time series we require. Further, in line with how the return series is defined as the change in stock price, we compute the daily change in the search volume. We call this "internet concern" and, for topic $i$ on the $t$-th day, it is denoted by $IC_{it}$. A much needed exercise at this stage is to ensure that the internet concerns for the topics are in fact useful in predicting changes in volatility. We perform Granger causality tests in that regard, cf. Granger (1969). In Table 2, the F-values corresponding to the Granger tests between $IC_{it}$ and daily volatility for the three airlines stocks are reported.
Significance at different levels is denoted by asterisks. We can see that the airlines' names are usually significant, and so are "United Airlines Stock", "stock price" and "government". However, other stock price related terms are not significant, suggesting that there has not been any discernible change in their daily search volume that can be used to predict shifts in stock return volatility. A similar thing can be said about the topics in category $\mathcal{T}$, except for "flight assistance". On the other hand, all but two topics in category $\mathcal{C}$ have been found to Granger-cause the stock returns. It is worth noting that the results of the Granger causality tests for the reverse relationship are similar as well. Based on the above findings, we decide to take an appropriately parsimonious approach to avoid too many predictors in the model and aggregate the internet concerns of the significant predictors within every category. Here, we consider a topic to be significant if it passes the Granger test for at least one of the three cases. Note that there are only four topics which do not show similar results for all of the airlines and therefore this simplification will not affect the conclusions much. The three predictors thus created are given below:
$$G_t = \sum_{i \in \mathcal{G}} b_i \, IC_{it}, \qquad T_t = \sum_{i \in \mathcal{T}} b_i \, IC_{it}, \qquad C_t = \sum_{i \in \mathcal{C}} b_i \, IC_{it}. \qquad (6)$$
Here, $b_i$ is a binary indicator which equals 1 if topic $i$ is significant and 0 otherwise. We shall also use the daily number of new COVID-19 cases in the USA as a predictor in our model. For day $t$, this is denoted by $N_t$.
We start with a brief discussion of autoregressive moving average (ARMA) methods, which have been used by many researchers to analyse stock returns. Consider a time series $(Y_t)_{1 \le t \le n}$ and let us assume it to be stationary. An ARMA model of order $(p, q)$ is defined by
$$Y_t = \sum_{i=1}^{p} \phi_i Y_{t-i} + e_t + \sum_{j=1}^{q} \theta_j e_{t-j}.$$
In the above equation, the $e_t$'s are assumed to be zero-mean, independent and identically distributed Gaussian random variables.
The $\phi$ coefficients form the autoregressive (AR) part and the $\theta$ coefficients form the moving average (MA) part. One can also use $\phi(x) = 1 - \phi_1 x - \dots - \phi_p x^p$ to denote the AR characteristic polynomial, $\theta(x) = 1 + \theta_1 x + \dots + \theta_q x^q$ to denote the MA characteristic polynomial and $B$ to denote the backward shift operator, in which case the above model can be written in a simplified form as $\phi(B) Y_t = \theta(B) e_t$. For further reading, Box, Jenkins, Reinsel, and Ljung (2015, chap. 3 and 4) is an excellent reference. One major problem with ARMA models is that they assume homoskedasticity in the error structure, which is often not the case for financial time series data. To tackle that, Engle (1982) developed the autoregressive conditional heteroskedasticity (ARCH) models to capture the time-varying volatility in the data. Later, Bollerslev (1986) extended this to develop generalized autoregressive conditional heteroskedasticity (GARCH) methods, which have been extensively used for volatility forecasting in economics over the last couple of decades. Worthington et al. (2008) and Kathiravan, Selvam, Maniam, and Venkateswar (2019) are two relevant examples of the use of GARCH models in the context of the airlines industry. In the current work, we adopt ARMA-GARCH models, where ARMA is used to fit the mean and GARCH to fit the variance. A standard ARMA($p$, $q$)-GARCH($a$, $b$) model is defined mathematically as follows.
Here, $Z_t$ is a white noise process. The coefficients in the model are assumed to be positive. In order to select the most appropriate lags in the mean structure, we use the Akaike information criterion (AIC), defined as $2k - 2\log \hat{L}$, where $k$ is the number of parameters in the model and $\hat{L}$ is the maximized value of the likelihood. The GARCH orders are taken to be $a = 1$, $b = 1$ throughout the study.
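For reference, the display equation for the ARMA($p$, $q$)-GARCH($a$, $b$) model mentioned above is not reproduced in this text. A commonly used textbook form is sketched below; the symbols $\mu$, $\varepsilon_t$, $\sigma_t$, $\omega$, $\alpha_i$ and $\beta_j$ are assumptions of this sketch (they are unrelated to $\alpha_s$, $\beta_s$ of the market model) and the paper's exact notation may differ.

```latex
\begin{aligned}
Y_t &= \mu + \sum_{i=1}^{p} \phi_i Y_{t-i} + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j} + \varepsilon_t, \\
\varepsilon_t &= \sigma_t Z_t, \\
\sigma_t^2 &= \omega + \sum_{i=1}^{a} \alpha_i \varepsilon_{t-i}^2 + \sum_{j=1}^{b} \beta_j \sigma_{t-j}^2 .
\end{aligned}
```

With $a = b = 1$, the variance equation reduces to $\sigma_t^2 = \omega + \alpha_1 \varepsilon_{t-1}^2 + \beta_1 \sigma_{t-1}^2$, which has only three variance parameters to estimate.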
Experimentation showed that the GARCH(1, 1) specification works best in terms of forecasting performance. Further, in order to assess the efficacy of the internet concern predictors [see Equation (6)], we include their lagged values (lags of up to 4 days) as exogenous regressors in both the mean and the variance structure of the model. Finally, the GARCH(1, 1) specification is implemented through the Glosten-Jagannathan-Runkle (GJR) method, introduced by Glosten, Jagannathan, and Runkle (1993). Using $\mathbb{I}(\cdot)$ as the indicator function, this model can be written as the following:
To summarize, we start with a unit root test to check for stationarity and, in case of nonstationarity, an initial differencing step can be done. Then, we use an ARMA-GARCH model for the return series where the mean structure is defined with lagged values of $G_t$, $T_t$, $C_t$ as exogenous predictors and the ARMA orders are chosen based on AIC, while the variance structure is GARCH with order (1, 1) and is defined with the same exogenous predictors. Hereafter, for convenience, this model will be called MICP (model with internet concern predictors). For comparison purposes, we also use two other types of ARMA-GARCH models. One of them is an ARMA-GARCH model with lagged values of daily COVID-19 cases ($N_t$) as exogenous regressors in both mean and variance structures. We call it MCASES. Similarly, another ARMA-GARCH model, where only the COVID internet concern predictor ($C_t$) is taken as the exogenous regressor, is used as well. This is denoted by MCOVID. Finally, a standard ARMA-GARCH model without any exogenous predictor is used as the Baseline model in our analysis.
TABLE 2 F-values for the Granger-causality tests between daily volatility of three airlines stocks and internet concern for different topics
Epidemic 33.5*** 23.17*** 24***
Note: Asterisk is used to point out significance at 1% (***), 5% (**) and 10% (*) levels.
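To make the asymmetric variance recursion concrete, here is a minimal Python sketch of a GJR-GARCH(1, 1) conditional variance filter. It assumes the common textbook form $\sigma_t^2 = \omega + (\alpha + \gamma\,\mathbb{I}(\varepsilon_{t-1} < 0))\,\varepsilon_{t-1}^2 + \beta \sigma_{t-1}^2$ plus an additive exogenous term; the way exogenous regressors enter the variance, the parameter names and the toy values are all assumptions of this sketch, not the paper's fitted model.

```python
def gjr_garch_variance(residuals, omega, alpha, gamma, beta,
                       exog=None, init_var=None):
    """Conditional variances of a GJR-GARCH(1, 1) filter.

    residuals : mean-equation residuals eps_t
    gamma     : extra weight on squared negative shocks (leverage effect)
    exog      : optional per-day additive term, e.g. an already lagged and
                coefficient-weighted predictor (assumed form)
    """
    n = len(residuals)
    if exog is None:
        exog = [0.0] * n
    if init_var is None:
        # Start the recursion at the sample second moment of the residuals.
        init_var = sum(e * e for e in residuals) / n
    sigma2 = [init_var]
    for t in range(1, n):
        prev = residuals[t - 1]
        leverage = gamma if prev < 0 else 0.0  # indicator I(eps_{t-1} < 0)
        sigma2.append(omega + (alpha + leverage) * prev * prev
                      + beta * sigma2[t - 1] + exog[t])
    return sigma2


# Hypothetical residuals and parameters, for illustration only.
variances = gjr_garch_variance([0.02, -0.02, 0.01], omega=1e-5, alpha=0.05,
                               gamma=0.1, beta=0.9, init_var=4e-4)
print(variances)
```

The only difference from a plain GARCH(1, 1) recursion is the `leverage` term, which lets a negative shock raise next-day variance more than a positive shock of the same size. Estimating the parameters by maximum likelihood is a separate step not shown here.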
In this section, we discuss the model confidence set (MCS) procedure (Hansen, Lunde, & Nason, 2011), which is used to test whether one of the candidate models has superior predictive power over the others. Let $\mathcal{M}$ denote the set of candidate models we consider. The forecasting efficiency of these models is evaluated under some loss function for the test set $\mathcal{P}$. Let $L_{m,t}$ denote the loss associated with $m \in \mathcal{M}$, $t \in \mathcal{P}$. The relative performance variables are then defined as $d_{m_1 m_2,t} = L_{m_1,t} - L_{m_2,t}$. Let $\mathbb{E}(d_{m_1 m_2,t})$ be the expectation of $d_{m_1 m_2,t}$ under the loss function. Then, the objective of the MCS procedure is to choose a model set $\mathcal{M}^*$ such that
$$\mathcal{M}^* = \{ m_1 \in \mathcal{M} : \mathbb{E}(d_{m_1 m_2,t}) \le 0 \ \text{for all} \ m_2 \in \mathcal{M} \}.$$
The procedure follows a sequence of significance tests with null hypothesis of the form
$$H_{0,\mathcal{M}_0} : \mathbb{E}(d_{m_1 m_2,t}) = 0 \ \text{for all} \ m_1, m_2 \in \mathcal{M}_0, \quad \mathcal{M}_0 \subseteq \mathcal{M}.$$
At every step, an equivalence test $\delta_{\mathcal{M}_0}$ is conducted to test whether all models in $\mathcal{M}_0$ perform equally well under the loss function. If $\delta_{\mathcal{M}_0}$ rejects the null hypothesis, an elimination rule $e_{\mathcal{M}_0}$ eliminates the model with poor performance. After repeating the tests as needed, the set of surviving models $\mathcal{M}^*$ is obtained. Throughout this paper, we use a significance level of 0.1 for the MCS procedure. So far as the test statistic is concerned, the authors suggested different choices and we use the range statistic $T_R$. If $\bar{d}_{mm'}$ and $v(d_{mm'})$ are, respectively, the mean and variance of $d_{mm',t}$ for $t \in \mathcal{P}$, then
$$T_R = \max_{m, m' \in \mathcal{M}} \frac{|\bar{d}_{mm'}|}{\sqrt{v(d_{mm'})}}.$$
Note that the asymptotic distribution of $T_R$ is nonstandard, and it depends on some nuisance parameters. However, as the authors pointed out, it can be estimated using bootstrap methods that implicitly solve the nuisance parameter problem. For implementation of the above procedure using bootstrap methods, the R package MCS by Catania and Bernardi (2017) is used. As a last piece of the MCS procedure, we need to choose appropriate loss functions.
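Returning briefly to the range statistic, the following Python sketch shows how a value of $T_R$ could be computed from per-day losses of several models. It is an illustration only: the variance of the mean loss differential is estimated here by the naive sample variance divided by the sample size, which is a simplifying assumption of this sketch, whereas the MCS procedure relies on bootstrap estimates. The function name and the toy losses are hypothetical.

```python
import math
from itertools import combinations


def range_statistic(losses):
    """Largest absolute t-type statistic over all pairs of models.

    losses : dict mapping a model name to its list of per-day losses on the
             test set (all lists must have the same length).
    """
    t_r = 0.0
    for m1, m2 in combinations(losses, 2):
        d = [a - b for a, b in zip(losses[m1], losses[m2])]
        n = len(d)
        mean_d = sum(d) / n
        var_d = sum((x - mean_d) ** 2 for x in d) / (n - 1)
        std_err = math.sqrt(var_d / n)  # naive estimate, not a bootstrap one
        if std_err == 0.0:
            continue  # identical loss differentials carry no information
        t_r = max(t_r, abs(mean_d) / std_err)
    return t_r


# Hypothetical per-day losses for two candidate models.
toy_losses = {"A": [1.0, 2.0, 3.0, 4.0], "B": [1.5, 2.0, 3.5, 4.5]}
print(range_statistic(toy_losses))
```

A large value of the statistic suggests that at least one pair of models differs in expected loss; the p-value that decides elimination still has to come from the bootstrap distribution.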
It has been discussed earlier (see Diebold & Mariano, 2002) why a single loss function is not the best way to distinguish the advantages and disadvantages of the predictive power of various models. In light of that, we use three different loss functions in our work. They are mean squared error (MSE), mean absolute error (MAE) and quasi-likelihood (QLIKE). Let $y_t$ be the daily volatility of a stock price and $\hat{y}_t$ be the corresponding forecast, for $t \in \mathcal{P}$, the test set. Then, the above loss functions are defined as follows:
Recall that all data before April 12, 2020 are used for training purposes and the rest are kept as the test set. We use the four candidate models mentioned in Section 3.2. For all of these models, the GARCH order is taken as (1, 1), whereas the ARMA order is chosen by the least value of AIC. In Table 3, the specifications and the in-sample AIC of these four types of candidate models are presented. A maximum order of 4 is used while choosing the best ARMA orders. It has been found that, for a fixed GARCH order and set of regressors, the AIC values for different ARMA specifications are indeed very close. Hence, the choice of ARMA order should not play a significant role in the forecasting performance. Also, the AIC values decrease only marginally after introducing the predictors. The conditional volatilities of the stock returns in the training set, as modelled by the four candidate models, are presented in Figure 4 (for AA), Figure 5 (for DA) and Figure 6 (for UA). It is clear that all models capture the peaks and troughs in the stock price movement in a comparable way. In particular, the high level of volatility towards the end of the training period can be explained by the models as well.
TABLE 3 List of ARMA-GARCH models used to analyse and predict the returns and volatility of three airlines stocks
Note: Predictors are used in both mean and variance structure.
FIGURE 4 Conditional volatility for AA stock returns, as modelled by four candidate models
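The three loss functions used in the comparison can be sketched in Python as follows. The MSE and MAE forms are standard; the QLIKE form used here, the average of $\log \hat{y}_t + y_t / \hat{y}_t$, is one common variant and is an assumption of this sketch, since the paper's exact expression is not reproduced in this text. Function names and toy values are hypothetical.

```python
import math


def mse(actual, forecast):
    """Mean squared error between observed and forecast volatility."""
    return sum((y - f) ** 2 for y, f in zip(actual, forecast)) / len(actual)


def mae(actual, forecast):
    """Mean absolute error between observed and forecast volatility."""
    return sum(abs(y - f) for y, f in zip(actual, forecast)) / len(actual)


def qlike(actual, forecast):
    """Quasi-likelihood loss (assumed variant); forecasts must be positive."""
    return sum(math.log(f) + y / f
               for y, f in zip(actual, forecast)) / len(actual)


# Hypothetical observed volatilities and forecasts on a two-day test set.
observed = [1.0, 2.0]
predicted = [1.0, 1.0]
print(mse(observed, predicted), mae(observed, predicted),
      qlike(observed, predicted))
```

MSE penalizes large misses heavily, MAE treats all misses linearly, and QLIKE penalizes under-prediction of volatility more than over-prediction, which is one reason to report all three rather than a single loss.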
Detailed results for the estimated coefficients and standard errors of all these models are provided in Appendix A, in Tables A1, A2 and A3. We now turn our attention to forecasting performance, the more important aspect of the analysis. First, the results of the MCS procedure are presented in Table 4. For every model, we obtain the predicted values of the volatility and compare them to the observed ones from the test set. Three different loss functions and a 10% significance level, as mentioned in the last section, are used in these tests. Thus, p-values greater than 0.1 indicate which models survive the MCS test under the specific loss function and the test statistic $T_R$, while a p-value of 1 indicates the model performing best out of all candidates. It can be observed that MICP survives the MCS test in all cases. In fact, it performs the best of the four in all cases. The baseline model, except for DA stock returns, does not survive the MCS test. The same is true for MCOVID and MCASES as well. These two, however, unlike the baseline model, perform as well as MICP for DA stock returns. These results clearly establish that MICP is the single best model across all loss functions. Next, we take a look at the improvement over the baseline model in the three loss functions for volatility forecasting for the three stock returns. Table 5 lists these values. For AA, only MICP performs better than the baseline model in terms of all three loss functions. For the other two airlines stock returns, all three models beat the baseline model. MCASES does very poorly for all loss functions in the case of AA. For DA stock returns, though, it shows a much better improvement than the other two models in terms of MAE and MSE, whereas under QLIKE loss the three models see comparable improvement.
Finally, for UA data, MICP beats MCASES and MCOVID across the board, but MCASES does better than MCOVID. Further, note that for DA stock returns, the improvements in MSE and MAE are much more substantial than for AA or UA stock returns. On the other hand, a closer look at the results shows that MCASES, which uses the daily number of COVID-19 cases as the predictor, always tends to predict high volatility, which is not the reality for AA stock returns, and hence this model fails badly there. Overall, we see that the proposed model outperforms the other candidate models. The actual number of daily COVID-19 cases is not a good predictor for volatility forecasting of airlines stock prices, as it suffers heavily from overforecasting issues. With the number of cases growing exponentially in the USA, a much better strategy is to quantify the effect of the pandemic on the general public by using the internet concern variable defined in this paper. Moreover, based on the above findings, one can say that the internet concern for different aspects, for example general, travel related and pandemic related, contains more useful information than the internet concern for only the pandemic related aspect.
FIGURE 5 Conditional volatility for DA stock returns, as modelled by four candidate models
This paper serves two main purposes. On one hand, it is an endeavour to understand the severity of the effect of COVID-19 on airlines stock prices, while on the other, the proposed model provides a new way to extract relevant information from people's behaviour on the internet to construct useful predictors for forecasting stock return volatility. It is found that the information obtained from the internet search volume on certain topics provides higher predictive accuracy. Thus, this study can provide a useful analytics tool for market participants, policymakers and market regulators to understand the movement of the airlines stock market in the future.
In fact, this type of approach is not limited to a global issue like the COVID-19 pandemic. It can also be used to understand and predict the stock market in relation to other events, such as political activities, terrorism, crashes or accidents; in short, anything that can be expected to leave a significant effect on the airlines stock prices. An attractive feature of the proposed method is its flexibility in incorporating more information. For instance, we aggregated the topics within a category to get a holistic view of the internet concern in the three broad aspects. Instead, one can choose to treat the IC_it series separately for every topic, and use those data in the model. Adding more topics within a category or considering other categories is also possible, to gather more information in the internet concern predictors. On that note, a limitation of the proposed method is that the Twitter data analysis has been done in a limited capacity. We collected approximately 30,000 tweets in this analysis, whereas millions of tweets are available for a more extensive study. Although it entails a much greater computational burden, a more detailed analysis of Twitter data can in fact be used to formulate a time-dependent set of topics, which can subsequently be used to create an insightful predictor that captures the internet concern on trending topics more appropriately. This has the potential to be used efficiently in short-term volatility forecasting, and is a good future direction for this work. Another interesting extension of the current framework is to use the GARCH-MIDAS model, which has been used in the context of volatility forecasting in recent times, in conjunction with the internet search predictors.
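To make the flexibility described above concrete, the sketch below shows how a set of internet concern series could enter a conditional variance recursion, whether there is one series per broad category or one per topic. This is a generic GARCH(1,1) recursion with lagged exogenous predictors, written under our own assumptions; it is not necessarily the exact specification used in this paper and it is not a GARCH-MIDAS model. The function name, argument names and the choice to lag the predictors by one period are all hypothetical.

```python
def garch_x_variance(residuals, predictors, omega, alpha, beta, gamma,
                     initial_variance):
    """Conditional variances from an assumed GARCH(1,1)-X recursion.

    sigma2[t] = omega
                + alpha * residuals[t-1] ** 2
                + beta * sigma2[t-1]
                + sum_k gamma[k] * predictors[t-1][k]

    residuals  : list of return innovations, one per time point
    predictors : list of rows, one per time point; each row holds the
                 internet concern values (one column per category or
                 per topic, depending on the chosen granularity)
    gamma      : one coefficient per predictor column
    """
    sigma2 = [initial_variance]
    for t in range(1, len(residuals)):
        # Contribution of the lagged exogenous predictors.
        exogenous = sum(g * x for g, x in zip(gamma, predictors[t - 1]))
        sigma2.append(
            omega
            + alpha * residuals[t - 1] ** 2
            + beta * sigma2[-1]
            + exogenous
        )
    return sigma2
```

Moving from aggregated categories to per-topic series only changes the number of columns in `predictors` and the length of `gamma`; the recursion itself is unchanged. In practice the coefficients would be estimated, for example by maximum likelihood, and constrained so that the variance stays positive.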
Finally, coming to the main data we analysed in this paper, one can see that there has been a much higher level of volatility in the stock prices of three of the biggest airlines companies from the USA, and that this can be explained by the internet concern on the relevant topics related to the pandemic. With COVID-19 cases increasing every day, there is a chance that in the near future governments will have to impose new policies to contain the virus. That is bound to cause problems for the aviation industry, and the major companies will incur losses once again. It can be hypothesized that the other big companies are also going to suffer similarly, while the smaller airlines will take a greater hit should that happen. Earlier, in March 2020, the British regional airline Flybe ran out of cash and entered administration. In the USA, two small-scale services, Compass Airlines and Trans States Airlines, suffered the same fate in April. Elsewhere in the world, Virgin Australia is another company that went into bankruptcy as an effect of the coronavirus carnage. Many other similar airlines companies are currently hanging by a thread and are hoping to see the pandemic come to an end.

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

No potential conflict of interest was reported by the author.

The data that support the findings of this study are openly available in JHU-CSSE (https://github.com/CSSEGISandData/COVID-19) and Yahoo Finance at https://in.finance.yahoo.com/.

Note: SEs are provided in parentheses. *Denotes significant effect at 5% level of significance.
How to cite this article: Deb S. Analyzing airlines stock price volatility during COVID-19 pandemic through internet search data.

ORCID: Soudeep Deb https://orcid.org/0000-0003-0567-7339