key: cord-0437730-py45odgq
authors: Corona, Francisco; González-Farías, Graciela; López-Pérez, Jesús
title: A nowcasting approach to generate timely estimates of Mexican economic activity: An application to the period of COVID-19
date: 2021-01-25
journal: nan
DOI: nan
sha: ec31723bd2bc829a38b5d72ba77e81b2701665d4
doc_id: 437730
cord_uid: py45odgq

In this paper, we present a new approach based on dynamic factor models (DFMs) to nowcast the annual percentage variation of the Mexican Global Economic Activity Indicator (IGAE in Spanish). The procedure consists of the following steps: i) build a timely and correlated database using economic and financial time series and real-time variables, such as social mobility and significant topics extracted from Google Trends; ii) estimate the common factors using the two-step methodology of Doz et al. (2011); iii) use the common factors in univariate time-series models evaluated on test data; and iv) based on the results of the previous step, combine the best nowcasts that are statistically equivalent according to the Diebold-Mariano test to generate the current nowcasts. We obtain timely and accurate nowcasts for the IGAE, including those for the current phase of drastic drops in the economy related to COVID-19 sanitary measures. Additionally, the approach allows us to identify the key variables in the DFM by estimating confidence intervals for both the factor loadings and the factor estimates. This approach can be used in official statistics to obtain preliminary estimates of the IGAE up to 50 days before the official results are released.

Currently, the large amount of economic and financial time series collected over several years by official statistical agencies allows researchers to implement statistical and econometric methodologies to generate accurate models to understand any macroeconomic phenomenon. One of the most important events to anticipate is the movement of the gross domestic product (GDP) because doing so allows policy to be carried out with more certainty, according to the expected scenario. For instance, if an economic contraction is foreseeable, businesses can adjust their investment or expansion plans, governments can apply countercyclical policy, and consumers can adjust their spending patterns.

As new economic and financial information is released, the forecasts for a given period are constantly updated; thus, different GDP estimates arise. A new, unexpected event can drastically affect short-term predictions; consequently, it may be necessary to use not only economic and financial information but also nontraditional and high-frequency indicators, such as news, search topics extracted from the Internet and social networks. The seminal work of Varian (2014) is an obligatory reference for economists incorporating high-frequency information, and Buono et al. (2018) is an important reference for characterizing the types of nontraditional data and the econometric methods usually employed to extract information from them.

Thus, the term "nowcast", or real-time estimation, is relevant because we can use a rich variety of information to model, from a multivariate point of view, macroeconomic and financial events, as well as specific incidents that can affect the dynamics of GDP in the short run. Econometrically and statistically, these facts relate to the literature on large dynamic factor models (DFMs), because a large number of time series is useful for estimating the underlying common factors.

First introduced in economics by Geweke (1977) and Sargent and Sims (1977), DFMs have recently become very attractive in practice given the current need to deal with large datasets of time series using high-dimensional DFMs; see, for example, Breitung and Eickmeier (2006), Bai and Ng (2008), Stock and Watson (2011), Breitung and Choi (2013) and Bai and Wang (2016) for reviews of the existing literature.

An open question in the literature on large DFMs is whether a large number of series is adequate for a particular forecasting objective. In that sense, preselecting variables has been shown to reduce the prediction error with respect to using the complete dataset (Boivin and Ng, 2006); that is, using a large set of variables does not always yield closer factor estimates than using fewer variables, especially in finite samples (Poncela and Ruiz, 2016). Even when the number of time series is moderate, approximately 15, the simulated common factors can be estimated accurately, as shown by Corona et al. (2020) in a Monte Carlo analysis. The latter also corroborates that the two-step (2SM) factor extraction method of Doz et al. (2011) performs better than other approaches available in the literature, especially when the data are nonstationary.

DFM methodology has already been used to nowcast or predict the Mexican economy. Corona et al. (2017a), one of the first works in this line, estimated common trends in a large and nonstationary DFM to predict the Global Economic Activity Indicator (IGAE in Spanish) two steps ahead and concluded that the prediction error was reduced with respect to some benchmark univariate and multivariate time-series models. Caruso (2018) focuses on international indicators, mainly for the US economy, to show that its nowcasts of quarterly GDP outperform the predictions of professional forecasters. Recently, Gálvez-Soriano (2020) concluded that bridge equations perform better than DFMs and static principal components (PCs) when nowcasting quarterly GDP. An important work related to timely GDP estimation is Guerrero et al. (2013), who, based on vector autoregression (VAR) models, generate rapid estimates of GDP (and of its three grand economic activities) with a delay of up to 15 days from the end of the reference quarter, while the official GDP is published around 52 days after the quarter closes. This work is the main reference for INEGI's "Estimación Oportuna del PIB Trimestral" (see footnote 1). Although prior studies are empirically relevant for the case of Mexico, our analysis goes further by including nontraditional information to capture more drastic frictions that occur in the very short run, one or two months. We note that previous works focus on traditional information, which limits their capacity to predict the recent historical declines attributed to COVID-19 and the associated economic closures since March 2020.
Our approach combines the structural explanation provided by the established macroeconomic and financial time series with the timeliness of other high-frequency variables commonly used in big data analysis.

In this tradition, this work estimates a flexible, trained DFM and verifies the assumptions that guarantee, from a statistical point of view, the consistency of the component estimates.

That is, we use previous knowledge and attempt to fill the identified gaps, focusing on the Mexican case, in the following ways: i) build a timely and correlated database using traditional economic and financial time series and real-time nontraditional information, selecting the relevant nontraditional variables with least absolute shrinkage and selection operator (LASSO) regression, a variable selection method; ii) estimate the common factors using the two-step methodology of Doz et al. (2011); iii) train univariate time-series models with the DFM's common factors to select the best nowcasts; iv) determine the confidence intervals for both the factor loadings and the factor itself to analyze the importance of each variable and the uncertainty attributed to the estimation; and v) combine the best nowcasts that are statistically equivalent to generate the current estimates.

In practice, we consider the benefits of this paper to be timeliness and openness. First, given the timely availability of the information our approach uses, we can generate nowcasts of the IGAE up to 50 days before the official data release; thus, our approach is an alternative for obtaining preliminary IGAE estimates, which are very important in official statistics. Second, this paper illustrates the empirical strategy for generating IGAE nowcasts step by step, so practitioners can replicate the results for other time series. Third, and very importantly, the nowcasting approach allows us to know which variables are the most relevant in the nowcasts; consequently, we emphasize the structural explanation of our results.

The remainder of this paper is structured as follows. Section 2 summarizes the evolution of the Mexican economy in the era of COVID-19. Section 3 presents the methodology used to generate the nowcasts. Section 4 describes the data and the descriptive analysis. Section 5 contains the empirical results. Finally, Section 6 concludes the paper.

1 https://www.inegi.org.mx/temas/pibo/

The first six months of the COVID-19 pandemic (through September 2020) have had severe impacts on the Mexican economy. The first case of coronavirus in Mexico was documented on February 27, 2020. Despite government efforts to cope with the effects of the mandatory halt of economic activity, GDP in the second quarter plummeted with a historic 18.7% yearly contraction. Moreover, the pandemic deepened an economic stagnation that was already under way, following three quarters of negative growth (contractions of 0.5, 0.8 and 2.1%) since the third quarter of 2019. However, at the start of 2020, the actual values were not foreseen by national and international institutions such as private and central banks. For example, the November 2019 Organisation for Economic Co-operation and Development (OECD) Economic Outlook estimated real GDP growth for 2020 at 1.29%, while the June 2020 report revised it to -8.6%, a difference of almost 10 percentage points. Moreover, even though the Mexican Central Bank expected barely zero economic growth for 2020, placing its November 2019 outlook between -0.2% and 0.2%, it did not anticipate a contraction like the one seen so far this year.

Between January 2019 and February 2020, before the COVID-19 outbreak started in Mexico, the annual growth of the IGAE already showed signs of slowing, fluctuating between -1.75 and 0.76%, and from May 2019 onward the economy exhibited nine consecutive months of negative growth.

Broken down by sector and using IGAE, the economy suffered devastating consequences in the secondary and tertiary sectors. Overall, the pandemic brought about -19.7, -21.6 and -14.5% contractions in total economic activity for April, May and June of 2020, respectively.

The industrial sector registered the deepest contractions, with annual declines of 30.1% in April and 29.6% in May, mainly driven by the closure of manufacturing and construction operations, which were considered nonessential businesses. It recovered slightly in June (-17.5%), when an important number of activities, including automobile manufacturing, resumed, although at low activity levels. The services sector also suffered from lockdown measures, falling by 15.9, 19 and 13.6% in the three months of the second quarter, respectively, especially in transportation, retail, lodging and food preparation, mainly because of the decrease in tourist activity, although restaurants and airports were not closed. The primary sector showed signs of resilience and even grew in April and May 2020, by 1.4 and 2.7%, respectively, and only shrank in June, by 1.5% on an annual basis.

The great confinement in Mexico, which officially lasted from March 23 to May 31 (named "Jornada Nacional de Sana Distancia"), had severe consequences for the components of aggregate demand: consumption, investment and foreign trade. Consumption had been on a deteriorating path since September 2019, and in May 2020, the last month for which data are available, it exhibited a 23.5% plunge compared with the same month of 2019. Similarly, investment, which peaked in June 2018, continued to deteriorate and registered a year-over-year drop of 38.4% in May 2020. Regarding international trade, exports began to abate in August 2019 and hit a record low in May 2020; despite a slight recovery in June, in July 2020 they were still 8.8% below their 2019 level. Similarly, imports peaked in November 2018, and despite improvements in May 2020, in July 2020 they were still 26.3% below their 2019 level.

Prices and employment, to round out the description of the Mexican economy, also suffered the ravages of the pandemic. Prices, unlike during other periods of economic instability in Mexico, do not seem to be in an inflationary spiral; in fact, the annual inflation rate in July 2020 was 3.6%, and the central bank expects it to hover around 3% over the next 12 to 24 months. Additionally, several job-related statistics reveal an underutilization of the labor force. For example, the number of IMSS-affiliated workers, who account for approximately 90% of the formal sector, fell by 1.3 million from its peak in November 2019 to July 2020. Similarly, the underemployment rate, an indicator of part-time employment, increased over twelve months from 7.6% to 20.1% in June 2020. In addition, the labor force participation rate showed a sharp decline in the first months of the social distancing policies, implying that 12 million people dropped out of the economy's active workforce because of COVID-19. Thus, the unemployment rate (people actively looking for a paid job) registered an annual increase of 1.32% in June 2020, to stand at 5.5%.

The journal EconomíaUNAM dedicated issue 51 of volume 17 entirely to studying the impacts of the pandemic in Mexico, covering a wide range of issues related mainly to health economics (Vanegas, 2020; Kershenobich, 2020), labor economics (Samaniego, 2020), inequality (Alberro, 2020), poverty (Fernández, 2020) and public policy (Sánchez, 2020; Moreno-Brid, 2020). None of these studies addresses short-term forecasting of economic activity.

The closest paper to ours is Meza (2020), who projects the economic impact of COVID-19 on twelve variables, including the IGAE, based on a Susceptible-Infectious-Recovered epidemic model and a novel method to handle a sequence of extreme observations when estimating a VAR model (Lenza and Primiceri, 2020). To make the forecasts, Meza (2020) first estimates the shocks that have hit the economy since March 2020 and then produces four forecasts: one that does not consider a path for the pandemic and three that do, under different scenarios. Unlike our work, its forecast horizon is the medium term, June 2020 to February 2023, whereas ours is the short term, one or two months ahead.

This section describes how we employ a DFM to generate the nowcasts of the IGAE. First, we describe how LASSO regression is used as a variable selection method to choose among various Google Trends topics. Then, we report how the stationary DFM summarizes the complete dataset, using the 2SM strategy to obtain the estimated factor loadings and common factors and the Onatski (2010) procedure to determine the number of common factors. Finally, we describe the nowcasting approach.

LASSO regression was introduced by Tibshirani (1996) as a method of estimation in linear models that minimizes the residual sum of squares (RSS) subject to the sum of the absolute values of the coefficients being less than a constant. In this sense, LASSO regression is related to ridge regression: both depend on a tuning parameter, λ, that controls the regularization effect, and, depending on its choice, both can yield better predictions than ordinary least squares (OLS) in a variety of scenarios.

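Let $W_t = (w_{1t}, \dots, w_{Kt})'$ be a $K \times 1$ vector of stationary and standardized variables, and let $y = (y_1, \dots, y_T)'$ be the target variable. Consider the following penalized RSS:

$$\hat{\beta} = \arg\min_{\beta} \; (y - W\beta)'(y - W\beta) \quad \text{subject to} \quad f(\beta) \le c,$$

where $W = (W_1, \dots, W_T)'$ is a $T \times K$ matrix and $c \ge 0$ is a tuning parameter that controls the shrinkage of the estimates; equivalently, in Lagrangian form, $(y - W\beta)'(y - W\beta) + \lambda f(\beta)$ is minimized for some $\lambda \ge 0$. Ridge regression uses $f(\beta) = \sum_{j=1}^{K} \beta_j^2$, which yields the closed-form solution $\hat{\beta}^{ridge}_{\lambda} = (W'W + \lambda I_K)^{-1} W'y$.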

In practice, the ridge solution never sets coefficients exactly to zero; therefore, ridge regression cannot perform variable selection in linear models, although its predictive ability can be better than that of OLS. Tibshirani (1996) considers the penalty function $f(\beta) = \sum_{j=1}^{K} |\beta_j| \le c$; in this case, the solution of the penalized problem has no closed form and is obtained by convex optimization techniques. The LASSO solution has the following implications: i) when $\lambda \to 0$, we obtain solutions similar to OLS, and ii) when $\lambda \to \infty$, $\hat{\beta}^{LASSO}_{\lambda} \to 0$. Because the absolute-value penalty sets some coefficients exactly to zero, LASSO regression can perform variable selection in linear models: the larger λ is, the more coefficients are set to zero, and λ is usually chosen to minimize the prediction error.
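The contrast between the two penalties can be illustrated with a small NumPy sketch. It assumes the scaled objective $(1/2T)\lVert y - W\beta\rVert^2 + \lambda \sum_j |\beta_j|$ with standardized columns of $W$ and solves it by cyclic coordinate descent, a standard LASSO algorithm (glmnet also relies on coordinate descent, but this is not its code). The simulated data and the helper names `lasso_cd` and `ridge` are hypothetical.

```python
import numpy as np

def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def lasso_cd(W, y, lam, n_iter=500):
    """Cyclic coordinate descent for
    (1 / (2T)) * ||y - W b||^2 + lam * sum_j |b_j|,
    with W (T x K) standardized and y centered."""
    T, K = W.shape
    b = np.zeros(K)
    col_ss = (W ** 2).sum(axis=0) / T        # equals 1 for standardized columns
    resid = y.copy()                          # y - W b with b = 0
    for _ in range(n_iter):
        for j in range(K):
            resid += W[:, j] * b[j]           # remove variable j from the fit
            z = W[:, j] @ resid / T
            b[j] = soft_threshold(z, lam) / col_ss[j]
            resid -= W[:, j] * b[j]           # add the updated contribution back
    return b

def ridge(W, y, lam):
    """Closed-form minimizer of (1 / (2T)) * ||y - W b||^2 + (lam / 2) * ||b||^2."""
    T, K = W.shape
    return np.linalg.solve(W.T @ W + lam * T * np.eye(K), W.T @ y)

# Hypothetical data: only the first two of six standardized regressors matter.
rng = np.random.default_rng(0)
T, K = 200, 6
W = rng.standard_normal((T, K))
W = (W - W.mean(axis=0)) / W.std(axis=0)
y = W @ np.array([1.5, -1.0, 0.0, 0.0, 0.0, 0.0]) + 0.5 * rng.standard_normal(T)
y = y - y.mean()

print("LASSO:", np.round(lasso_cd(W, y, lam=0.1), 3))  # noise coefficients typically exactly 0
print("ridge:", np.round(ridge(W, y, lam=0.1), 3))     # all coefficients nonzero
```

With a large λ every LASSO coefficient is exactly zero, with λ = 0 the coordinate-descent solution coincides with OLS, and the ridge coefficients stay nonzero for any λ, which is why only the LASSO can act as a variable selector.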

In macroeconomic applications, Aprigliano and Bencivelli (2013) use LASSO regression to select the relevant economic and financial variables in a large data set with the goal of estimating a new Italian coincident indicator.

We consider a stationary DFM in which the observations, $X_t$, are generated by the following process:

$$X_t = P F_t + \varepsilon_t, \qquad (2)$$

$$\Phi(L) F_t = \eta_t, \qquad (3)$$

$$\Gamma(L) \varepsilon_t = a_t, \qquad (4)$$

where $X_t = (x_{1t}, \dots, x_{Nt})'$ and $\varepsilon_t = (\varepsilon_{1t}, \dots, \varepsilon_{Nt})'$ are $N \times 1$ vectors of the variables and idiosyncratic noises observed at time $t$, and $P$ is the $N \times r$ matrix of factor loadings. The common factors, $F_t = (F_{1t}, \dots, F_{rt})'$, and the factor disturbances, $\eta_t = (\eta_{1t}, \dots, \eta_{rt})'$, are $r \times 1$ vectors, with $r$ ($r < N$) being the number of static common factors, which is assumed to be known. The $N \times 1$ vector of idiosyncratic disturbances, $a_t$, is distributed independently of the factor disturbances, $\eta_t$, for all leads and lags. $L$ denotes the lag operator, $LX_t = X_{t-1}$. Furthermore, $\eta_t$ and $a_t$ are assumed to be Gaussian white noises with positive definite covariance matrices $\Sigma_\eta = \mathrm{diag}(\sigma^2_{\eta_1}, \dots, \sigma^2_{\eta_r})$ and $\Sigma_a$, respectively.

The lag polynomials are $\Phi(L) = I_r - \Phi_1 L - \dots - \Phi_k L^k$ and $\Gamma(L) = I_N - \Gamma_1 L - \dots - \Gamma_s L^s$, where the $\Phi_i$ and $\Gamma_i$ are $r \times r$ and $N \times N$ matrices containing the VAR parameters of the factors and idiosyncratic components, of orders $k$ and $s$, respectively. For simplicity, we assume that the number of dynamic factors, $r_1$, is equal to $r$.

Alternative representations in the stationary case are given by Doz et al. (2011, 2012), who assume that $r$ can be different from $r_1$. Additionally, when $r = r_1$, Ng (2004), Choi (2017) and Corona et al. (2020) also allow for possible nonstationarity in the idiosyncratic noises. Barigozzi et al. (2016, 2017) allow for possible nonstationarity in $F_t$ and $\varepsilon_t$, with $r = r_1$.

The DFM in equations (2) to (4) is not identified. As noted in the Introduction, the factor extraction method used in this work is the 2SM; consequently, in the first step, we estimate the common factors by PCs and, to solve the identification problem and uniquely define the factors, we impose the restrictions $P'P/N = I_r$ and $FF'$ diagonal, where $F = (F_1, \dots, F_T)$ is $r \times T$.

For a review of restrictions in the context of PC factor extraction, see Bai and Ng (2013) . Giannone et al. (2008) popularized the usage of 2SM factor extraction to estimate the common factors by using monthly information with the goal of generating the nowcasts of quarterly GDP.

Doz et al. (2011) later proved the statistical consistency of the common factors estimated by the 2SM. In the first step, PC factor extraction consistently estimates the static common factors without assuming any particular distribution, allowing weak serial and cross-sectional correlation in the idiosyncratic noises; see, for example, Bai (2003). In the second step, we model the dynamics of the common factors via the Kalman smoother, allowing for idiosyncratic heteroskedasticity, a situation that occurs frequently in practice. In a finite-sample study, Corona et al. (2020) show that the 2SM of Doz et al. (2011), based on PCs and Kalman smoothing, yields closer estimates of the common factors under several data-generating processes that can occur in empirical analysis, such as heteroskedasticity and serial and cross-sectional correlation in the idiosyncratic noises. Additionally, following Giannone et al. (2008), this method is useful for nowcasting, given its flexibility to estimate the common factors when not all variables are updated at the same time.

The 2SM procedure is implemented according to the following steps:

1. Set $\hat{P}$ as $\sqrt{N}$ times the eigenvectors associated with the $r$ largest eigenvalues of $X'X$, where $X = (X_1, \dots, X_T)'$ is a $T \times N$ matrix. By regressing $X$ on $\hat{P}$ and using the identifiability restrictions, obtain $\hat{F} = X\hat{P}/N$ and $\hat{\varepsilon} = X - \hat{F}\hat{P}'$. Compute the asymptotic confidence intervals for both the factor loadings and the common factors as proposed by Bai (2003).
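Step 1 can be sketched in NumPy as follows, assuming `X` has already been standardized. The simulated one-factor panel and the function name `pc_factors` are illustrative only, and the Bai (2003) confidence intervals are not computed here.

```python
import numpy as np

def pc_factors(X, r):
    """PC factor extraction with the normalization P'P / N = I_r.

    X : T x N array of standardized observations.
    Returns loadings P_hat (N x r), factors F_hat (T x r) and residuals (T x N).
    """
    T, N = X.shape
    eigval, eigvec = np.linalg.eigh(X.T @ X)       # eigenvalues in ascending order
    top = np.argsort(eigval)[::-1][:r]             # indices of the r largest
    P_hat = np.sqrt(N) * eigvec[:, top]            # eigenvectors scaled by sqrt(N)
    F_hat = X @ P_hat / N                          # T x r factor estimates
    eps_hat = X - F_hat @ P_hat.T                  # idiosyncratic residuals
    return P_hat, F_hat, eps_hat

# Hypothetical one-factor panel with an AR(1) factor.
rng = np.random.default_rng(1)
T, N = 150, 20
f = np.zeros(T)
for t in range(1, T):
    f[t] = 0.7 * f[t - 1] + rng.standard_normal()
X = np.outer(f, rng.uniform(0.5, 1.5, N)) + rng.standard_normal((T, N))
X = (X - X.mean(axis=0)) / X.std(axis=0)

P_hat, F_hat, eps_hat = pc_factors(X, r=1)
print(abs(np.corrcoef(F_hat[:, 0], f)[0, 1]))     # close to 1; the sign is not identified
```

Because the loadings are orthogonal eigenvectors, $\hat{F}'\hat{F}$ is diagonal by construction, which is the second identification restriction.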

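2. Set the estimated covariance matrix of the idiosyncratic errors as $\hat{\Psi} = \mathrm{diag}(\hat{\Sigma}_\varepsilon)$, where $\hat{\Sigma}_\varepsilon$ is the sample covariance matrix of $\hat{\varepsilon}$; hence, the diagonal of $\hat{\Psi}$ contains the estimated idiosyncratic variances $\hat{\sigma}^2_i$, for $i = 1, \dots, N$.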

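3. Estimate a VAR(k) model by OLS on the estimated common factors, $\hat{F}$, and write the estimated autoregressive coefficients in VAR(1) (companion) form, denoted by $\hat{\Phi}$, with residual covariance matrix $\hat{\Sigma}_\eta$. Assuming that $f_0 \sim N(0, \Sigma_f)$, the unconditional covariance matrix of the factors can be estimated as $\mathrm{vec}(\hat{\Sigma}_f) = (I - \hat{\Phi} \otimes \hat{\Phi})^{-1} \mathrm{vec}(\hat{\Sigma}_\eta)$. Then write equations (2) to (4) in state-space form and, with the system matrices replaced by $\hat{P}$, $\hat{\Psi}$, $\hat{\Phi}$, $\hat{\Sigma}_\eta$ and $\hat{\Sigma}_f$, use the Kalman smoother to obtain an updated estimate of the factors, denoted by $\tilde{F}$.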
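For the case $k = 1$, which is the lag order used later in the application, the OLS VAR and the unconditional covariance of step 3 take only a few lines. This sketch assumes demeaned factors (no intercept) and uses the identity $\mathrm{vec}(\Phi S \Phi') = (\Phi \otimes \Phi)\,\mathrm{vec}(S)$; the helper names and the simulated factor are hypothetical.

```python
import numpy as np

def fit_var1(F):
    """OLS VAR(1) without intercept, F_t = Phi F_{t-1} + eta_t, for a T x r array F."""
    Y, Z = F[1:], F[:-1]
    B = np.linalg.lstsq(Z, Y, rcond=None)[0]       # solves Z B = Y, so B = Phi'
    Phi = B.T
    resid = Y - Z @ B
    Sigma_eta = resid.T @ resid / len(resid)
    return Phi, Sigma_eta

def unconditional_cov(Phi, Sigma_eta):
    """Solve Sigma_f = Phi Sigma_f Phi' + Sigma_eta via
    vec(Sigma_f) = (I - Phi kron Phi)^{-1} vec(Sigma_eta)."""
    r = Phi.shape[0]
    vec_eta = Sigma_eta.reshape(-1, order="F")     # column-stacking vec operator
    vec_f = np.linalg.solve(np.eye(r * r) - np.kron(Phi, Phi), vec_eta)
    return vec_f.reshape(r, r, order="F")

# Hypothetical demeaned factor following an AR(1) with coefficient 0.7.
rng = np.random.default_rng(4)
f = np.zeros((500, 1))
for t in range(1, 500):
    f[t] = 0.7 * f[t - 1] + rng.standard_normal()

Phi_hat, Sigma_eta_hat = fit_var1(f)
Sigma_f_hat = unconditional_cov(Phi_hat, Sigma_eta_hat)
print(Phi_hat, Sigma_f_hat)
```

For $k > 1$, the same functions apply to the stacked companion-form state, with $\hat{\Sigma}_\eta$ padded with zeros for the lagged blocks.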

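In practice, not all elements of $X_t$ are available for every $t$; in these cases, we apply the Kalman smoother,

$$\tilde{F}_t = E(F_t \mid \Omega_T),$$

where $\Omega_T$ is all the information available in the sample, and we take into account the following two cases for the idiosyncratic variance used at time $t$:

$$\hat{\Psi}_{i,t} = \begin{cases} \hat{\sigma}^2_i & \text{if } x_{it} \text{ is available}, \\ \infty & \text{if } x_{it} \text{ is not available}. \end{cases}$$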

Empirically, when specific data in $X_t$ are not available, Harvey and Phillips (1979) suggest using a diffuse value equal to $10^7$; however, following the R package nowcast (de Valk et al., 2019), we use $10^{32}$.
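The second step with a ragged edge can be sketched as a Kalman filter plus Rauch-Tung-Striebel smoother. This sketch assumes the system matrices are already known (in the 2SM they would be $\hat{P}$, $\hat{\Psi}$, $\hat{\Phi}$, $\hat{\Sigma}_\eta$ and $\hat{\Sigma}_f$) and, rather than plugging in a variance of $10^{32}$, it skips missing entries in the update step. This is the exact limit of the diffuse-variance device and avoids badly scaled matrices. Function names and simulated data are illustrative.

```python
import numpy as np

def kalman_smoother(X, P, psi, Phi, Q, Sigma_f):
    """Kalman filter plus Rauch-Tung-Striebel smoother for
        x_t = P f_t + e_t,          e_t ~ N(0, diag(psi)),
        f_t = Phi f_{t-1} + eta_t,  eta_t ~ N(0, Q),   f_0 ~ N(0, Sigma_f).
    NaN entries of X are skipped in the update step, which is the limit of
    giving them an infinite (diffuse) measurement variance.
    Returns the smoothed factors E(f_t | Omega_T) as a T x r array."""
    T = X.shape[0]
    r = Phi.shape[0]
    a_pred, V_pred = np.zeros((T, r)), np.zeros((T, r, r))
    a_filt, V_filt = np.zeros((T, r)), np.zeros((T, r, r))
    a, V = np.zeros(r), Sigma_f.copy()
    for t in range(T):
        # Prediction step.
        a, V = Phi @ a, Phi @ V @ Phi.T + Q
        a_pred[t], V_pred[t] = a, V
        # Update step using only the series released at time t.
        obs = ~np.isnan(X[t])
        if obs.any():
            Po = P[obs]
            S = Po @ V @ Po.T + np.diag(psi[obs])
            K = np.linalg.solve(S, Po @ V).T       # Kalman gain V Po' S^{-1}
            a = a + K @ (X[t, obs] - Po @ a)
            V = V - K @ Po @ V
        a_filt[t], V_filt[t] = a, V
    # Backward (RTS) pass.
    a_smooth = a_filt.copy()
    for t in range(T - 2, -1, -1):
        J = V_filt[t] @ Phi.T @ np.linalg.inv(V_pred[t + 1])
        a_smooth[t] = a_filt[t] + J @ (a_smooth[t + 1] - a_pred[t + 1])
    return a_smooth

# Hypothetical one-factor system with known parameters.
rng = np.random.default_rng(2)
T, N = 100, 5
Phi, Q = np.array([[0.7]]), np.array([[1.0]])
Sigma_f = Q / (1 - 0.7 ** 2)
f = np.zeros(T)
for t in range(1, T):
    f[t] = 0.7 * f[t - 1] + rng.standard_normal()
P, psi = np.ones((N, 1)), np.full(N, 0.5)
X = np.outer(f, P[:, 0]) + np.sqrt(0.5) * rng.standard_normal((T, N))
X[-2:, :3] = np.nan            # ragged edge: three series not yet released
F_smooth = kalman_smoother(X, P, psi, Phi, Q, Sigma_f)
```

When no series is released at the end of the sample, the smoothed factor simply follows the VAR forecast, and the nowcast then rests on the factor dynamics alone.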

To estimate the number of common factors, r, Onatski (2010) proposes a procedure for the case in which the proportion of the observed variance attributed to the factors is small relative to that attributed to the idiosyncratic term. The method determines a sharp threshold, δ, that consistently separates the bounded and diverging eigenvalues of the sample covariance matrix.

The author proposes the following algorithm to estimate δ and determine the number of factors:

1. Obtain the $N$ eigenvalues of the covariance matrix of the observations, $\Sigma_X$, and sort them in descending order, $\lambda_1 \ge \dots \ge \lambda_N$. Set $j = r_{max} + 1$.

2. Obtain $\hat{\gamma}$ as the OLS estimator of the slope of a simple linear regression, with a constant, of $\{\lambda_j, \dots, \lambda_{j+4}\}$ on $\{(j-1)^{2/3}, \dots, (j+3)^{2/3}\}$, and set $\delta = 2|\hat{\gamma}|$.

3. Let $r^{(N)}_{max}$ be any slowly increasing sequence (in the sense that it is $o(N)$). If $\lambda_k - \lambda_{k+1} < \delta$ for all $k \le r^{(N)}_{max}$, set $\hat{r} = 0$; otherwise, set $\hat{r} = \max\{k \le r^{(N)}_{max} : \lambda_k - \lambda_{k+1} \ge \delta\}$.

4. With $j = \hat{r} + 1$, repeat steps 2 and 3 until convergence.

This algorithm is known as the edge distribution estimator, and Onatski (2010) proves the consistency of $\hat{r}$ for any fixed $\delta > 0$. Corona et al. (2017b) show that this method works reasonably well in small samples. Two important features of this method are that the number of factors can be estimated without previously estimating the common components and that the common factors may be integrated.
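The four steps translate almost literally into NumPy. The sketch below assumes at least $r_{max} + 5$ eigenvalues are available and caps the iterations of step 4. The spiked spectrum is an invented example whose "noise" eigenvalues follow the edge law exactly, so the estimated slope is exactly -0.1 and δ = 0.2.

```python
import numpy as np

def onatski_r(eigvals, r_max, max_iter=100):
    """Edge-distribution estimator of the number of factors (steps 1-4 above).
    Needs at least r_max + 5 eigenvalues."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]  # lambda_1 >= lambda_2 >= ...
    j, r_hat = r_max + 1, None
    for _ in range(max_iter):
        # Step 2: OLS slope of lambda_j..lambda_{j+4} on (j-1)^{2/3}..(j+3)^{2/3}.
        idx = np.arange(j, j + 5)                  # 1-based indices j, ..., j+4
        slope = np.polyfit((idx - 1) ** (2 / 3), lam[idx - 1], 1)[0]
        delta = 2 * abs(slope)
        # Step 3: largest k <= r_max with lambda_k - lambda_{k+1} >= delta, else 0.
        gaps = lam[:r_max] - lam[1:r_max + 1]      # gaps[k-1] = lambda_k - lambda_{k+1}
        hits = np.nonzero(gaps >= delta)[0]
        new_r = int(hits[-1]) + 1 if hits.size else 0
        # Step 4: stop once r_hat no longer changes; otherwise restart at j = r_hat + 1.
        if new_r == r_hat:
            break
        r_hat, j = new_r, new_r + 1
    return r_hat

# Invented spectrum: two spikes plus eigenvalues lying exactly on the edge law.
spiked = [100.0, 60.0] + [3 - 0.1 * (j - 1) ** (2 / 3) for j in range(3, 21)]
print(onatski_r(spiked, r_max=8))                  # 2

# Simulated two-factor panel.
rng = np.random.default_rng(3)
T, N = 200, 50
X = rng.standard_normal((T, 2)) @ rng.standard_normal((2, N)) + rng.standard_normal((T, N))
eig = np.linalg.eigvalsh(np.cov(X, rowvar=False))
print(onatski_r(eig, r_max=8))
```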

In this subsection, we describe the nowcasting approach to estimate the annual percentage variation of the IGAE, denoted by $y^* = (y_1, \dots, y_{T^*})'$, where $T^* = T - 2$; hence, we focus on generating nowcasts two steps ahead.

Currently, Google Trends topics, an up-to-date source of information that provides an index of 

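Define the $H_g \times K$ matrix $\hat{\beta} = (\hat{\beta}_1, \dots, \hat{\beta}_K)$, where $\hat{\beta}_j = (\hat{\beta}_{j,1}, \dots, \hat{\beta}_{j,H_g})'$ is an $H_g \times 1$ vector.

5. Select the $l$ significant variables that satisfy the condition $\hat{\beta}_l = \{\hat{\beta}_j : \mathbf{1}\hat{\beta}_j > \phi\}$, where $\phi$ is the $1 - \alpha$ sample quantile of $\mathbf{1}\hat{\beta}$, with $\mathbf{1}$ being a $1 \times H_g$ vector of ones.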

With this procedure, we select the topics that frequently reduce the in-sample prediction error of the IGAE estimates during the last $H_g$ months. We estimate the optimal λ using the R package glmnet.
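Step 5 can be illustrated with NumPy. The text does not specify what enters $\hat{\beta}$ beyond the LASSO estimates, so this sketch assumes each entry is turned into a 0/1 indicator of a nonzero coefficient, making $\mathbf{1}\hat{\beta}_j$ the number of windows in which topic $j$ was selected. The counts and the extra topic names are invented for illustration. Note that the strict inequality `> phi` follows the text, so ties at the top quantile would select nothing.

```python
import numpy as np

def select_topics(B, names, alpha=0.10):
    """Keep the topics whose selection count over the H_g rolling LASSO fits
    exceeds the (1 - alpha) sample quantile of all counts.

    B : H_g x K array of LASSO coefficients, one row per estimation window.
    Assumption: the ones-vector product is applied to 0/1 nonzero indicators.
    """
    selected = (np.asarray(B) != 0).astype(float)
    counts = np.ones(selected.shape[0]) @ selected     # (1 x H_g) times (H_g x K)
    phi = np.quantile(counts, 1 - alpha)
    return [name for name, c in zip(names, counts) if c > phi]

# Invented selection counts for K = 20 topics over H_g = 36 windows.
H_g, K = 36, 20
counts_true = [35, 33, 12, 10, 8, 7, 6, 5, 4, 4, 3, 3, 2, 2, 1, 1, 0, 0, 0, 0]
B = np.zeros((H_g, K))
for k, c in enumerate(counts_true):
    B[:c, k] = 0.3                                     # nonzero coefficient in c windows
names = ["quarantine", "facemask"] + [f"topic_{k}" for k in range(2, K)]
print(select_topics(B, names))                         # ['quarantine', 'facemask']
```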

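In our case, to predict $y^*$, the time series $X_i = (x_{i1}, \dots, x_{iT^*})'$ are transformed such that they satisfy the following condition:

$$X_i^* = \arg\max_{f(X_i)} \mathrm{corr}\left(f(X_i), y^*\right). \qquad (5)$$

Hence, we select the $f(X_i)$ that maximizes the correlation with $y^*$. Consider $f(\cdot)$ as follows:

1. None (n): $f(x_{it}) = x_{it}$.

2. Monthly percentage variation (m): $f(x_{it}) = \frac{x_{it}}{x_{i,t-1}} \times 100 - 100$.

3. Annual percentage variation (a): $f(x_{it}) = \frac{x_{it}}{x_{i,t-12}} \times 100 - 100$.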

Note that these transformations are not intended to achieve stationarity, although intrinsically they are stationary transformations regardless of whether $y^*$ is stationary; in fact, the transformations m and a tend to be stationary when the time series are I(1), which is frequent in economics; see Corona et al. (2017b). Otherwise, $(f(X_i), y^*)$ must be cointegrated. This matters for equations (2) to (4), which require a stationary system: theoretically, even if some common factor, $F_t$, is nonstationary, consistent estimates are obtained regardless of whether the idiosyncratic errors are stationary; see Bai (2004). We therefore use the PANIC test (Bai and Ng, 2004) to verify this assumption. Additionally, an alternative for estimating nonstationary common factors using the 2SM when the time series are I(1) is given by Corona et al. (2020).
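The selection rule in equation (5) is easy to sketch for a single series. This is a minimal illustration assuming monthly data (12 lags for the annual variation) and the signed correlation, as in the text. The helper names and the simulated index are hypothetical.

```python
import numpy as np

def transform(x, kind):
    """Transformations n (none), m (monthly % variation) and a (annual % variation,
    12 lags for monthly data). Periods without enough history become NaN."""
    x = np.asarray(x, dtype=float)
    out = np.full_like(x, np.nan)
    if kind == "n":
        out[:] = x
    elif kind == "m":
        out[1:] = x[1:] / x[:-1] * 100 - 100
    elif kind == "a":
        out[12:] = x[12:] / x[:-12] * 100 - 100
    else:
        raise ValueError(f"unknown transformation {kind!r}")
    return out

def best_transformation(x, y):
    """Return (label, correlation) of the f(x) with the largest correlation with y."""
    best = (None, -np.inf)
    for kind in ("n", "m", "a"):
        fx = transform(x, kind)
        ok = ~np.isnan(fx) & ~np.isnan(y)          # common non-missing sample
        c = np.corrcoef(fx[ok], y[ok])[0, 1]
        if c > best[1]:
            best = (kind, c)
    return best

# Hypothetical monthly index whose annual variation tracks the target.
rng = np.random.default_rng(5)
x = 100 * np.cumprod(1 + rng.normal(0.2, 1.0, 120) / 100)
y = transform(x, "a") + 0.1 * rng.standard_normal(120)
print(best_transformation(x, y))                   # expected label: 'a'
```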

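Having estimated the common factors as described in subsection 3.2.1 using $X^*_t$ for $t = 1, \dots, T$, we estimate a linear regression model with autoregressive moving average (ARMA) errors to generate the nowcasts:

$$y^*_t = a + b\tilde{F}_t + u_t, \qquad \phi(L)u_t = \theta(L)v_t, \qquad t = 1, \dots, T^*, \qquad (6)$$

where $\phi(L) = 1 - \phi_1 L - \dots - \phi_p L^p$, $\theta(L) = 1 + \theta_1 L + \dots + \theta_q L^q$ and $v_t$ is white noise. The parameters are estimated by maximum likelihood. Consequently, the nowcasts are obtained by the following expression:

$$\hat{y}_{T^*+h} = \hat{a} + \hat{b}\tilde{F}_{T^*+h} + \hat{u}_{T^*+h}, \qquad h = 1, 2. \qquad (7)$$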
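For the special case $p = q = 0$, the error term is white noise, its forecast is zero, and expression (7) reduces to an OLS fit evaluated at the two factor values that run ahead of $y^*$. The sketch covers only that case. ARMA errors require a maximum-likelihood routine (for example, a state-space regression such as statsmodels' `SARIMAX` with an `exog` argument and `order=(p, 0, q)`), which is not shown. The function name and the numbers are hypothetical.

```python
import numpy as np

def regression_nowcast(y, F, h_max=2):
    """Nowcasts from y_t = a + b F_t + u_t with p = q = 0 (white-noise u_t).

    y : target observed for t = 1..T*.
    F : factor estimates for t = 1..T* + h_max (the smoothed factor runs ahead of y).
    With white-noise errors, the forecast of u_{T*+h} is zero."""
    T_star = len(y)
    Z = np.column_stack([np.ones(T_star), F[:T_star]])
    (a, b), *_ = np.linalg.lstsq(Z, y, rcond=None)
    return a + b * F[T_star:T_star + h_max], (a, b)

# Hypothetical example: the factor is known two months beyond the target.
F = np.array([0.5, 0.2, -0.1, -1.5, -2.0, -0.8, -0.6])
y = 1.0 + 2.0 * F[:5]
nowcasts, (a, b) = regression_nowcast(y, F)
print(nowcasts)    # approximately [-0.6, -0.2]
```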

Note that Giannone et al. (2008) propose using the model with $p = q = 0$, in which case the nowcasts are obtained directly from expression (7). In our case, we estimate models of orders $p = 0, \dots, p_{max}$ and $q = 0, \dots, q_{max}$; thus, the specification of Giannone et al. (2008) is a particular case. Our interest is in selecting the models with similar performance on the training data. To this end, we carry out the following procedure:

1. Start with $p = 0$ and $q = 0$.

2. Estimate the nowcasts for $T^* + 1$ and $T^* + 2$, namely, $\hat{y}^{0,0} = (\hat{y}_{T^*+1}, \hat{y}_{T^*+2})'$.

3. Split the data for $t = 1, \dots, T^* - H_t$.

4. For $h = 1$ and for the sample of size $T^* - H_t + h$, estimate equation (6), generate the one-step-ahead nowcast with expression (7), and calculate the nowcast error $e^{0,0}_h$ (observed minus nowcast value) and the absolute error $AE^{0,0}_h = |e^{0,0}_h|$.

5. Repeat steps 3 and 4 for $h = 2, \dots, H_t$. Hence, obtain $e^{0,0} = (e^{0,0}_1, \dots, e^{0,0}_{H_t})'$ and $AE^{0,0} = (AE^{0,0}_1, \dots, AE^{0,0}_{H_t})'$. Additionally, define the weighted AE (WAE) as $WAE^{0,0} = AE^{0,0\prime}\Upsilon$, where $\Upsilon$ is an $H_t \times 1$ vector of weights that penalizes the nowcasting errors, such that $\mathbf{1}'\Upsilon = 1$.

6. Repeat the previous steps for all combinations of $p$ and $q$ up to $p_{max}$ and $q_{max}$, and generate the following elements: $\hat{y}(p, q) = (\hat{y}^{0,0}, \hat{y}^{1,0}, \dots, \hat{y}^{p_{max},q_{max}})$, $e(p, q) = (e^{0,0}, e^{1,0}, \dots, e^{p_{max},q_{max}})$ and $WAE(p, q) = (WAE^{0,0}, WAE^{1,0}, \dots, WAE^{p_{max},q_{max}})$, where $\hat{y}(p, q)$ is a $2 \times (p_{max} + 1)(q_{max} + 1)$ matrix of nowcasts, $e(p, q)$ is an $H_t \times (p_{max} + 1)(q_{max} + 1)$ matrix that contains the nowcast errors in the training data, and $WAE(p, q)$ is a $1 \times (p_{max} + 1)(q_{max} + 1)$ vector of the weighted errors in the training data.

7. Select the best nowcast as a function of $p$ and $q$, denoted by $\hat{y}(p^*, q^*)$, where

$$(p^*, q^*) = \arg\min_{0 \le p \le p_{max},\; 0 \le q \le q_{max}} WAE(p, q).$$

8. To use models with similar performance, combine the nowcasts of $\hat{y}(p^*, q^*)$ with those of the models with equal forecast accuracy according to the Diebold and Mariano (1995) test, using $e(p, q)$ to carry out pairwise tests between the model with the minimum $WAE(p, q)$ and each of the others. Consequently, from the models with statistically equal performance, we select the median of the nowcasts, namely, $\tilde{y}$.
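Step 8 can be sketched with a Diebold-Mariano test on absolute-error loss, using the normal approximation and, for one-step-ahead errors, no autocovariance correction. The significance level `alpha = 0.05`, the dictionary layout keyed by `(p, q)` and all numbers are assumptions for illustration; the text does not state the level used.

```python
import math
import numpy as np

def dm_test(e1, e2, h=1):
    """Diebold-Mariano test with absolute-error loss and a normal approximation.
    Returns (statistic, two-sided p-value); h - 1 autocovariances enter the variance."""
    d = np.abs(np.asarray(e1, float)) - np.abs(np.asarray(e2, float))
    n, d_bar = d.size, d.mean()
    dc = d - d_bar
    lrv = dc @ dc / n
    for k in range(1, h):
        lrv += 2 * (dc[k:] @ dc[:-k]) / n
    if lrv <= 0:                                   # identical (or constant) loss differentials
        return (0.0, 1.0) if d_bar == 0 else (math.copysign(math.inf, d_bar), 0.0)
    stat = d_bar / math.sqrt(lrv / n)
    return stat, math.erfc(abs(stat) / math.sqrt(2))

def combine_nowcasts(nowcasts, errors, best, alpha=0.05):
    """Median nowcast over `best` and every model whose DM p-value against it exceeds alpha."""
    keep = [m for m in nowcasts
            if m == best or dm_test(errors[best], errors[m])[1] > alpha]
    return np.median(np.array([nowcasts[m] for m in keep]), axis=0), keep

# Hypothetical training errors and nowcasts for three (p, q) models.
e_best = np.array([0.5, -1.0, 0.3, 0.8, -0.2, 1.1, -0.7, 0.4, -0.9, 0.6, 0.2, -0.5])
errors = {(1, 0): e_best, (0, 1): -e_best, (2, 2): 3 * e_best}
nowcasts = {(1, 0): np.array([-3.0, -2.0]),
            (0, 1): np.array([-2.5, -1.5]),
            (2, 2): np.array([5.0, 6.0])}
combined, kept = combine_nowcasts(nowcasts, errors, best=(1, 0))
print(kept, combined)    # [(1, 0), (0, 1)] [-2.75 -1.75]
```

Model (0, 1) has the same absolute errors as the best model and is kept; model (2, 2) is clearly worse and is dropped before taking the median.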

This nowcasting approach generates nowcasts based on a trained process, taking advantage of the information in similar models. Note that $b$ must be statistically significant to exploit the relationship between the IGAE and the information summarized by the DFM. The weight vector $\Upsilon$ penalizes the nowcast errors; its most common form is $\Upsilon = (1/H_t, \dots, 1/H_t)'$, an $H_t \times 1$ vector in which all nowcast errors have equal weight, known in the literature as the mean absolute error (MAE). Therefore, by default we do not restrict ourselves to the traditional MAE, but rather consider a weighted (or equally weighted) average of the individual AEs. For example, we could place more weight on the most recent nowcast errors, that is, those in the COVID-19 period.
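A short sketch of the weighted criterion, assuming the weights are supplied as a vector that sums to one. The error values are hypothetical, with the last one mimicking a large COVID-19 miss, and the RMSE is included only for comparison.

```python
import numpy as np

def wae(errors, weights=None):
    """Weighted absolute error AE' Upsilon; equal weights (the default) give the MAE."""
    ae = np.abs(np.asarray(errors, dtype=float))
    w = np.full(ae.size, 1.0 / ae.size) if weights is None else np.asarray(weights, float)
    if not np.isclose(w.sum(), 1.0):
        raise ValueError("weights must sum to one (1' Upsilon = 1)")
    return ae @ w

def rmse(errors):
    e = np.asarray(errors, dtype=float)
    return np.sqrt(np.mean(e ** 2))

# Hypothetical training errors; the last one mimics a COVID-19-sized miss.
e = np.array([0.5, -0.5, 0.5, -4.0])
print(wae(e))                                # MAE = 1.375
print(wae(e, np.array([1, 2, 3, 4]) / 10))   # more weight on recent errors: about 1.9
print(rmse(e))                               # about 2.05, pulled up by the large error
```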

Note also that, from the matrix of absolute errors $AE(p, q)$, we can compute the median or some other quantile of each column instead of a weighted average.

Although root mean squared errors (RMSEs) are often used in the forecasting literature, we prefer a weighted function of the AEs; in this work, we use equal weights, i.e., the MAE.

The main advantages of the MAE over the RMSE are twofold: i) it is easy to interpret, since it represents the average deviation regardless of direction, whereas the RMSE averages the squared errors before taking the square root, which tends to inflate larger errors; and ii) RMSE

Boivin and Ng (2006) show that, in the context of DFMs, the forecast error can be reduced by estimating the common components from selected variables. Additionally, Poncela and Ruiz (2016) and Corona et al. (2020) show that with a relatively small sample size, for example, N = 12, a rotation of the common factors can be estimated accurately.

Consequently, the variables considered in this work, chosen for their timeliness and their possibly contemporaneous correlation with $y^*$, are described in Annex 1.

Hence, we start with 68 time series divided into three blocks. The social media mobility index is calculated from Twitter information: we select around 70,000 daily tweets georeferenced to Mexico, each associated with a bounding box.

Then, movement data analysis is performed by identifying users and their sequence of daily tweets: a trip is considered for each pair of consecutive geo-tagged tweets found in different bounding boxes. The total number of trips per day is obtained and divided by the average number of users in the month. The number obtained can be interpreted as the average number of trips that tweeters make per day.
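The trip count described in the two preceding paragraphs can be sketched in plain Python. The tuple layout `(user, timestamp, bbox)`, the toy tweets, and the interpretation of "average number of users in the month" as the mean of daily distinct users are assumptions, not details given in the text.

```python
from collections import defaultdict
from datetime import date, datetime

def mobility_index(tweets):
    """Daily trips per average user.

    tweets : iterable of (user_id, timestamp, bbox_id) tuples.
    A trip is counted for each pair of consecutive tweets of the same user on the
    same day that fall in different bounding boxes. Assumption: the "average number
    of users in the month" is the mean of the daily counts of distinct users."""
    by_user_day = defaultdict(list)
    for user, ts, bbox in tweets:
        by_user_day[(user, ts.date())].append((ts, bbox))
    trips, users = defaultdict(int), defaultdict(set)
    for (user, day), seq in by_user_day.items():
        seq.sort(key=lambda item: item[0])            # chronological order
        users[day].add(user)
        trips[day] += sum(b0 != b1 for (_, b0), (_, b1) in zip(seq, seq[1:]))
    daily_users = defaultdict(list)
    for day, u in users.items():
        daily_users[(day.year, day.month)].append(len(u))
    avg_users = {m: sum(v) / len(v) for m, v in daily_users.items()}
    return {day: trips[day] / avg_users[(day.year, day.month)] for day in sorted(users)}

# Hypothetical tweets: (user, time, bounding box).
tweets = [
    ("A", datetime(2020, 3, 1, 8), 1), ("A", datetime(2020, 3, 1, 12), 2),
    ("A", datetime(2020, 3, 1, 15), 2), ("A", datetime(2020, 3, 1, 20), 1),
    ("B", datetime(2020, 3, 1, 9), 3), ("B", datetime(2020, 3, 1, 10), 4),
    ("A", datetime(2020, 3, 2, 9), 1),
]
idx = mobility_index(tweets)
print(idx[date(2020, 3, 1)], idx[date(2020, 3, 2)])   # 2.0 0.0
```

On March 1 there are three trips (two by user A, one by user B), and the month averages 1.5 daily users, which gives 2.0 trips per average user.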

To select the relevant topics, we apply the methodology described in subsection 3.3.1 by using H g = 36 and α = 0.10; consequently, we select the topics that are relevant in 90% of cases in the training data. In this way, the significant topics are quarantine and facemask.

Once $X$ is defined, we apply the transformations given by equation (5) to define $X^*$. Figure 1 shows each $X^*_i$ ordered according to its correlation with $y^*$.

Figure 1: Blue indicates the specific $X^*_i$, and red indicates $y^*$. Numbers in parentheses indicate the linear correlation, and those in brackets indicate the transformation.

We can see the behavior of each variable; industrial production is the time series most correlated with the IGAE, followed by imports and industrial production in the according to their historical values. The first quartile ($\phi(X^*_i) < 0.25$) is shown in red, the second quartile ($0.25 < \phi(X^*_i) < 0.50$) in orange, the third quartile ($0.50 < \phi(X^*_i) < 0.75$) in yellow and the fourth quartile ($0.75 < \phi(X^*_i)$) in green. Gray indicates that information is not available.

We can see that during the 2009 financial crisis the variables are mainly red, including the Google Trends variables, which is reasonable because the A(H1N1) influenza pandemic also occurred in March and April 2009. Additionally, during 2016, some variables related to the international market were red, for example, the US industrial production index, the exchange rate and the S&P 500. Note that since 2019, all variables are orange or red, reflecting the weakening of the economy. Consequently, it is unsurprising that the estimated common factor summarizes these dynamics. Note that this graph has only a descriptive purpose; it cannot be used to generate policy recommendations because some variables may be nonstationary.

The nowcasts depend on the dates of the information released. Depending on the day of the current month, we can obtain nowcasts with a larger or smaller percentage of updated variables.

For example, the high-frequency variables are available in real time, but the traditional monthly time series, which are timely with respect to the IGAE, are published on different dates according to their official release calendars. Figure 3 shows the approximate day on which the information for $T^{*}+2$ is released after the current month $T^{*}$ closes.

Figure 3: Percentage of updated information available to carry out the nowcasts for $T^{*}+2$ once the current month $T^{*}$ is closed.

We can see that the traditional and nontraditional high-frequency variables, business confidence and fuel demand can be obtained on the day after month $T^{*}$ closes. Hence, on the first day of month $T^{*}+1$, we can generate the nowcasts for $T^{*}+2$ with approximately 50% of the information updated, and the nowcasts for the current month, $T^{*}+1$, with 81%. On day 12 the IMSS variable is updated, and on day 16 the IPI USA is updated. These variables are highly correlated with $y$, with linear correlations of 0.77 and 0.80, respectively. Consequently, in official statistics, we recommend carrying out the nowcasts on the first day of $T^{*}+1$ and updating them 16 days later, once these two timely and important traditional time series have been released.
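The share of updated information on a given day is the fraction of variables whose value for the target month has already been published. The sketch below uses only the release days for $T^{*}+2$ mentioned in this section: day 1 for the high-frequency variables, business confidence, fuel demand and IMO; day 12 for IMSS; day 16 for IPI USA; and day 25 for exports and imports. Because the calendar is partial and illustrative, it does not reproduce the 50% reported for the full database. The labels QUARANTINE, FACEMASK, EXPORTS and IMPORTS are hypothetical.

```python
from typing import Dict, List, Tuple

# Partial, illustrative release calendar for T*+2 (day of month T*+1 on which each series is published).
RELEASE_DAY: Dict[str, int] = {
    "QUARANTINE": 1, "FACEMASK": 1, "MOBILITY": 1, "CONF MANUF": 1, "GAS": 1, "IMO": 1,
    "IMSS": 12, "IPI USA": 16, "EXPORTS": 25, "IMPORTS": 25,
}


def share_updated(release_day: Dict[str, int], day: int) -> float:
    """Fraction of variables already published on the given day."""
    return sum(1 for d in release_day.values() if d <= day) / len(release_day)


def update_schedule(release_day: Dict[str, int]) -> List[Tuple[int, float]]:
    """Days on which new information arrives, with the cumulative share available."""
    return [(d, share_updated(release_day, d)) for d in sorted(set(release_day.values()))]
```

Such a schedule makes the recommendation above concrete: the days on which the share jumps, and especially the days on which highly correlated series arrive, are natural moments to rerun the nowcasts.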

In this work, the database was last updated on August 13, 2020; consequently, we generate the …

5 Nowcasting results

5.1 Estimating the common factors and the loading weights

By applying the Onatski (2010) procedure to the covariance matrix of $X^{*}$, we conclude that $r = 1$ common factor is adequate. Hence, the static common factor estimated by PCs from the set of variables $X^{*}$, its 95% confidence intervals, and the dynamic factor estimates obtained by applying the 2SM procedure with $k = 1$ lag are presented in …. The common factors summarize the elements discussed above and reflect the declines of the economy in 2009 and 2020. In the last period, the dynamic common factor shows a slight recovery of the economy because it incorporates more timely information than the static common factor: the static common factor has information until May 2020, while the dynamic factor has information until July 2020. The confidence intervals are close to the static common factor, which implies that the uncertainty attributed to the estimation is low. It is also important to analyze the contemporaneous correlation with the IGAE; Figure 5 shows the correlation coefficient of $\hat{F}_t$ with $y^{*}$ since 2008.

Having estimated the dynamic factor by the 2SM approach, we show the loading weight estimates, which capture the specific contribution of the common factor to each variable. Equivalently, given the PC restrictions, they can be seen as $N$ times the contribution of each variable to the common factor. We compute their 95% confidence intervals, denoted by $CI_{\hat{P},0.05}$. Once the dynamic factor is estimated by the Kalman smoother, it is necessary to reestimate the factor loadings so that $\hat{P} = f(\hat{F})$ and $\hat{F} = g(\hat{P})$.
To do so, we use Monte Carlo estimation with 1,000 replications and select the replication that best satisfies the following condition:

$$\hat{F} \approx X\hat{P}/N \quad \text{s.t.} \quad \hat{P} \in CI_{\hat{P},0.05}.$$
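The two steps of this subsection that do not depend on the Kalman smoother can be sketched with NumPy. The first is the principal-component estimate under the normalization $\hat{P}'\hat{P}/N = I_r$. This normalization yields $\hat{F} = X\hat{P}/N$, which is why the loadings can be read as $N$ times each variable's weight in the factor. The second is the Monte Carlo search over loadings inside $CI_{\hat{P},0.05}$. Two choices in this search are our assumptions, since the text specifies neither: candidates are drawn uniformly inside each interval, and each candidate is scored by the sum of squared deviations from the dynamic factor. The function names are hypothetical.

```python
import numpy as np


def pc_factors(x: np.ndarray, r: int = 1):
    """Static factors by principal components with P'P/N = I_r."""
    n = x.shape[1]
    z = (x - x.mean(axis=0)) / x.std(axis=0)        # standardize each series
    _, eigvec = np.linalg.eigh(z.T @ z)             # eigenvalues in ascending order
    loadings = np.sqrt(n) * eigvec[:, ::-1][:, :r]  # top-r eigenvectors, rescaled
    factors = z @ loadings / n                      # F = X P / N
    return factors, loadings


def mc_loadings(x, f_dyn, ci_low, ci_high, n_draws=1000, seed=0):
    """Pick the draw inside the CI whose implied factor X P / N is closest to f_dyn.

    x must be the same standardized panel used to estimate the factor.
    """
    rng = np.random.default_rng(seed)
    n = x.shape[1]
    draws = rng.uniform(ci_low, ci_high, size=(n_draws, n))  # every row lies in the CI
    implied = x @ draws.T / n                                  # T x n_draws candidate factors
    sse = ((implied - f_dyn[:, None]) ** 2).sum(axis=0)
    best = int(np.argmin(sse))
    return draws[best], float(sse[best])
```

The sign of principal components is not identified, so in practice the factor is usually flipped to correlate positively with the target before the loadings are interpreted.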

The estimated factor loadings are shown in Figure 6, ordered from the most positive contribution to the most negative.

Figure 6: Factor loadings. The blue point is each $\hat{P}_i$ with its respective 95% confidence interval. Red curves are the $\hat{P}_i$.

We observe several similarities with Figure 1. The most important variables in the factor estimates are the industrial production of Mexico and the U.S., exports and imports, along with Google Trends topics such as quarantine and facemask, which makes sense in the COVID-19 period. Hence, it is especially important to update the nowcasts when these variables are released. Note that the Google Trends topics are available in real time.

Other timely variables, such as IMO, CONF MANUF, GAS, S&P 500, MOBILITY and E, are also very relevant. Moreover, all variables are significant, since no confidence interval contains zero. The least important variables are M4, the business confidence of the construction sector and remittances. Note also that the most relevant variables are very timely with respect to the IGAE. Once the current month has closed, the industrial production indices of Mexico and the U.S. are updated around days 10 and 16 for $T^{*}+1$ and $T^{*}+2$, respectively; exports and imports are updated for $T^{*}+2$ by the 25th day, while IMO and IMSS are updated for $T^{*}+2$ on the first and 12th days, respectively. Consequently, we can obtain accurate and correlated estimates for both $T^{*}+1$ and $T^{*}+2$ from the first day of the current month.

As we have previously noted, a consistent estimation of $\hat{F}$ and $\hat{P}$ requires the idiosyncratic component $\hat{\varepsilon}$ to be stationary. We check this with the PANIC test of Bai and Ng (2004), obtaining a statistic of 6.6 with a p-value of 0.00; hence, $\hat{\varepsilon}$ does not have a unit root. Additionally, the augmented Dickey-Fuller test indicates that $\hat{F}$ is stationary (p-value of 0.026); consequently, we also achieved stationarity in $X^{*}$.
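For reference, one pooled statistic proposed by Bai and Ng (2004) combines the $N$ individual unit-root p-values $p_i$, obtained from ADF tests on the estimated idiosyncratic components, as $P = (-2\sum_i \ln p_i - 2N)/\sqrt{4N}$. This statistic is asymptotically standard normal under the null and rejects the unit root for large values. The sketch below computes only this pooling step; the individual ADF tests are not shown, and the text does not state which PANIC variant produced the statistic of 6.6.

```python
import math
from statistics import NormalDist
from typing import List, Tuple


def panic_pooled(p_values: List[float]) -> Tuple[float, float]:
    """Pooled PANIC statistic from individual unit-root p-values, with its one-sided p-value."""
    n = len(p_values)
    fisher = -2.0 * sum(math.log(p) for p in p_values)  # Fisher-type combination
    stat = (fisher - 2.0 * n) / math.sqrt(4.0 * n)      # standardized; N(0, 1) under the null
    p_value = 1.0 - NormalDist().cdf(stat)              # large values reject the unit root
    return stat, p_value
```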

We apply the procedure described in subsection 3.3.3 using $\Upsilon = (1/H_t, 1/H_t, \ldots, 1/H_t)$; that is, we assume that each AE has equal weight over time in step 5. Additionally, we fix $p_{max} = q_{max} = 4$. The results indicate that the optimal orders $p^{*}$ and $q^{*}$ are both equal to 4. Consequently, the best model is the linear regression with ARMA(4, 4) errors.

The nowcast model performs well: in 92% of cases, the observed values lie within the 95% confidence interval. The MAE (with equal weights in $\Upsilon$) is 0.65, compared with a mean absolute annual growth of the IGAE of 2.55%, and the median of the AEs is 0.36. These statistics are very competitive with those of the model estimated by Statistics Netherlands (see Kuiper and Pijpers, 2020), which also estimates common factors to generate nowcasts of the annual variation of quarterly Netherlands GDP.

To contrast the results of our approach with those obtained by other procedures, we consider the following two alternative models:

• Naive model: We assume that all variables have equal weights in the factor. Consequently, we standardize the variables used in the DFM, $X^{*}_t$, and average their rows to obtain $F^{*}_t$. Then, we use this naive factor in a linear regression to obtain the nowcasts for the last $H_t = 36$ months.

• DFM without nontraditional information: We estimate a traditional DFM similar to Corona et al. (2017a) or Gálvez-Soriano (2020), but using only economic and financial time series, i.e., without the social mobility index and the relevant topics extracted from Google Trends. Again, we carry out the last $H_t = 36$ nowcasts.

On the training data, the naive model has the weakest performance, followed by the traditional DFM. Specifically, the MAE is 1.02 for the naive model, 0.74 for the DFM without nontraditional information and, as noted above, 0.65 for the incumbent model, which includes this type of information. The use of nontraditional information does not affect the MAEs before the COVID-19 pandemic, and it reduces the error during the pandemic. Consequently, the suggested approach is highly competitive when compared with i) similar models for nowcasting GDP, ii) models that estimate the levels of the target variable and iii) alternative models that can be used in practice.
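The naive benchmark and the accuracy measures used in this comparison can be sketched as follows. The naive factor averages the standardized series and feeds an ordinary least-squares regression. As a simplification of ours, the sketch fits it once on a training sample rather than re-estimating it for each of the $H_t$ nowcasts. The evaluation function computes the weighted MAE, with $\Upsilon$ defaulting to equal weights $1/H_t$, along with the median AE and the share of observed values inside the confidence interval. All function names are hypothetical.

```python
from statistics import median
from typing import Dict, List, Optional

import numpy as np


def naive_nowcast(x_train: np.ndarray, y_train: np.ndarray, x_new: np.ndarray) -> np.ndarray:
    """Equal-weight factor F*_t plus a linear regression of y on F*_t."""
    mu, sd = x_train.mean(axis=0), x_train.std(axis=0)
    f_train = ((x_train - mu) / sd).mean(axis=1)  # average of the standardized rows
    f_new = ((x_new - mu) / sd).mean(axis=1)
    design = np.column_stack([np.ones_like(f_train), f_train])
    coef, *_ = np.linalg.lstsq(design, y_train, rcond=None)
    return coef[0] + coef[1] * f_new


def evaluate(observed: List[float], nowcast: List[float], lower: List[float],
             upper: List[float], weights: Optional[List[float]] = None) -> Dict[str, float]:
    """Weighted MAE, median AE and interval coverage over the H_t test nowcasts."""
    h_t = len(observed)
    weights = weights or [1.0 / h_t] * h_t  # Upsilon = (1/H_t, ..., 1/H_t)
    ae = [abs(o - f) for o, f in zip(observed, nowcast)]
    inside = sum(lo <= o <= hi for o, lo, hi in zip(observed, lower, upper))
    return {
        "mae": sum(w * e for w, e in zip(weights, ae)),
        "median_ae": median(ae),
        "coverage": inside / h_t,
    }
```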

Having verified in the previous section that our approach is highly competitive at capturing the observed values, we present the final nowcasts of the IGAE annual percentage variation for June and July 2020 in Figure 9. These are obtained by combining the models that are statistically equal to the best model under the approach described above with the traditional nowcasting model of Giannone et al. (2008), weighting both nowcasts according to their MAEs. We expect a slight recovery of the economy in June and July 2020, with nowcasts of -15.2% and -13.2%, respectively, and confidence intervals of (-16.3, -14.1) and (-14.1, -12.4). The observed annual percentage change of the IGAE for June, released by INEGI on August 25, was -14.5%. The model is therefore very accurate: the deviation from the observed value was 0.7 percentage points, and the observed value falls within the confidence interval.
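The selection and combination steps can be sketched as follows. The sketch makes three assumptions: the Diebold-Mariano test uses absolute-error loss, consistent with the MAE criterion used throughout; the long-run variance is rectangular, with $h-1$ autocovariances; and the combination weights are inversely proportional to each model's MAE. The text states only that the nowcasts are weighted according to their MAEs, so the inverse-MAE rule is our assumption, and the function names are hypothetical.

```python
import math
from statistics import NormalDist, fmean
from typing import List, Tuple


def diebold_mariano(e1: List[float], e2: List[float], h: int = 1) -> Tuple[float, float]:
    """DM statistic and two-sided p-value for equal accuracy under absolute loss."""
    d = [abs(a) - abs(b) for a, b in zip(e1, e2)]  # loss differential
    t = len(d)
    d_bar = fmean(d)

    def autocov(k: int) -> float:
        return sum((d[i] - d_bar) * (d[i - k] - d_bar) for i in range(k, t)) / t

    lrv = autocov(0) + 2.0 * sum(autocov(k) for k in range(1, h))
    if lrv <= 0:
        raise ValueError("non-positive long-run variance")
    stat = d_bar / math.sqrt(lrv / t)
    return stat, 2.0 * (1.0 - NormalDist().cdf(abs(stat)))


def combine(nowcasts: List[float], maes: List[float]) -> float:
    """Weighted average of nowcasts with weights proportional to 1 / MAE."""
    inv = [1.0 / m for m in maes]
    return sum(w * f for w, f in zip(inv, nowcasts)) / sum(inv)
```

Under this reading, a candidate model counts as "statistically equal" to the best one when the DM p-value of their comparison exceeds the chosen significance level. The retained nowcasts are then averaged with `combine`.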

The procedure described in the previous subsection allows us to generate nowcasts using databases with different cut-off dates. Accordingly, we carried out the procedure during the COVID-19 period, updating the databases twice a month. Table 1 summarizes the nowcast results and compares them with the observed values. On June 4, 2020, the nowcasts were very accurate: they captured the drastic drops that occurred in April (the previous month was -2.5%) and May, with absolute discrepancies of 1.4 and 1.2 percentage points, respectively. The update of June 18, 2020 shows a slight improvement in accuracy. The following two updates also produce estimates close to the observed value for May, and the update of July 7, 2020 is the most accurate of them. The last updates generate nowcasts for June of around -16.6% and -15.2%. The most accurate is the last nowcast described in this work, with an absolute error of 0.7 percentage points. Considering these results, our approach anticipates the drop attributed to COVID-19 and foresees a slight, albeit weak, recovery from June onward. According to Gálvez-Soriano (2020), accurate and timely estimates of the IGAE can drastically improve nowcasts of quarterly GDP; consequently, the benefits of our approach also extend to quarterly nowcasting models.

In this paper, we contribute to the nowcasting literature by focusing on two-step-ahead nowcasts of the annual percentage variation of the IGAE, the equivalent of Mexican monthly GDP, during the COVID-19 period. For this purpose, we use statistical and econometric tools to obtain accurate and timely estimates up to around 50 days before the official data are released. The suggested approach consists of four steps. First, LASSO regression selects the relevant topics that affect the IGAE in the short term. Second, a correlated and timely database is built to exploit the correlation between the variables and the IGAE. Third, a dynamic factor is estimated using the 2SM approach. Finally, linear regressions with ARMA errors are trained to select the best models and generate the current nowcasts.

We highlight the following key results. Our approach is highly competitive compared with other models, such as naive regressions or traditional DFMs. It also frequently captures the observed value, both on test data and in real time, with absolute errors between 0.2 and 1.4 percentage points during the COVID-19 period. Another contribution of this paper is statistical: we compute the confidence intervals of the factor loadings and of the factor estimates, which verifies the significance of the factor for each variable and quantifies the uncertainty attributed to the factor estimates. Additionally, we address econometric issues that guarantee the consistency of the estimates, such as the stationarity of the idiosyncratic noises and uncorrelated errors in the nowcasting models. Finally, it is of interest to assess, through in-sample performance, whether the nowcast error increases when monthly rather than quarterly data are used.

Several topics for future research emerged from this work. One is the implementation of an algorithm, such as the one developed in Corona et al. (2020), that allows nonstationary common factors to be estimated and makes the selection of the number of factors flexible so as to minimize a measure of nowcasting errors. Another interesting line of research is to incorporate machine learning techniques to select the potentially relevant Google Trends topics automatically. It would also be interesting to incorporate IPI information as restrictions on the nowcasts, by exploring techniques that impose such restrictions when official accounting information is available. Finally, it is worth examining in more depth the effects of timely monthly estimates versus quarterly time series in nowcasting models. This could be done by Monte Carlo analysis with different data-generating processes that can occur in practice, comparing the increase in estimation error when time series of different frequencies are used.

La pandemia que perjudica a casi todos, pero no por igual/The pandemic that harms almost everyone, but not equally

The impact of COVID-19 on the US child care market: Evidence from stay-at-home orders

Ita-coin: a new coincident indicator for the Italian economy. Banca D'Italia. Working papers

Inferential theory for factor models of large dimensions

Estimating cross-section common stochastic trends in nonstationary panel data

A PANIC attack on unit roots and cointegration

Large dimensional factor analysis

Principal components estimation and identification of static factors

Econometric analysis of large factor models

Non-Stationary Dynamic Factor Models for Large Datasets. Finance and Economics Discussion Series Divisions of Research & Statistics and Monetary Affairs Federal Reserve Board

Dynamic factor models, cointegration, and error correction mechanisms

Are more data always better for factor analysis

Handbook of Research Methods and Applications in Empirical Macroeconomics

Dynamic factor models

Big data econometrics: Now casting and early estimates

How Has Labor Demand Been Affected by the COVID-19 Pandemic? Evidence from Job Ads in Mexico

Googling unemployment during the pandemic: Inference and nowcast using search data

Nowcasting with the help of foreign indicators: The case of Mexico

Efficient estimation of nonstationary factor models

A dynamic factor model for the Mexican economy: are common trends useful when predicting economic activity?

Determining the number of factors after stationary univariate transformations

Estimating Non-stationary Common Factors: Implications for Risk Sharing

Nowcasting: An R Package for Predicting Economic Variables Using Dynamic Factor Models

Comparing predictive accuracy

A two-step estimator for large approximate dynamic factor models based on Kalman filtering

A quasi maximum likelihood approach for large, approximate dynamic factor models

La pandemia del Covid-19: los sistemas y la seguridad alimentaria en América Latina/Covid-19 pandemic: systems and food security in Latin America

Nowcasting Mexico's quarterly GDP using factor models and bridge equations

The dynamic factor analysis of economic time series

Nowcasting: The real-time informational content of macroeconomic data

Predicting Initial Unemployment Insurance Claims Using Google Trends

Rapid Estimates of Mexico's Quarterly GDP

Maximum Likelihood Estimation of Regression Models With Autoregressive-Moving Averages Disturbances

Fortalezas, deficiencias y respuestas del sistema nacional de salud frente a la Pandemia del Covid-19/Strengths, weaknesses and responses of the national health system to the Covid-19 Pandemic

Nowcasting GDP growth rate: a potential substitute for the current flash estimate

How to Estimate a VAR after

The Impact

Lockdowns and Expanded Social Assistance on Inequality, Poverty and Mobility in Argentina, Brazil, Colombia and Mexico

Forecasting the impact of the COVID-19 shock on the Mexican economy

Pandemia, política pública y panorama de la economía mexicana en 2020/Pandemic, public policy and the outlook for the Mexican economy in 2020

Determining the number of factors from empirical distribution of eigenvalues

Small versus big data factor extraction in Dynamic Factor Models: An empirical assessment in dynamic factor models

El Covid-19 y el desplome del empleo en México/The Covid-19 and the Collapse of Employment in Mexico

México en la pandemia: atrapado en la disyuntiva salud vs economía/Mexico in the pandemic: caught in the disjunctive health vs economy

Business cycle modeling without pretending to have too much a priori economic theory

A hands-on guide to Google data

Dynamic factor models

Regression shrinkage and Selection via the Lasso

Los desafíos del sistema de salud en México/The health system challenges in Mexico

Big data: New tricks for econometrics

Topic (English) | Description | Source
Calderón | Online search index | Google
Cártel (cartel) | Online search index | Google
Casa Blanca (White House) | Online search index | Google
Chapo | Online search index | Google
China | Online search index | Google
Coronavirus | Online search index | Google
Corrupción (corruption) | Online search index | Google
Crisis económica (economic crisis) | Online search index | Google
Crisis sanitaria (health crisis) | Online search index | Google
Cuarentena (quarantine) | Online search index | Google
Cubrebocas (face mask) | Online search index | Google
Desempleo (unemployment) | Online search index | Google
Dólar (dollar) | Online search index | Google
Elecciones (elections) | Online search index | Google
EPN | Online search index | Google
Gasolina (gasoline) | Online search index | Google
Homicidios (homicides) | Online search index | Google
Inflación (inflation) | Online search index | Google
Inseguridad (insecurity) | Online search index | Google
Mascarilla N95 (N95 mask) | Online search index | Google
Medidas económicas (economic measures) | Online search index | Google
Migración (migration) | Online search index | Google
Migrantes (migrants) | Online search index | Google
Morena | Online search index | Google
Muertos (deaths) | Online search index | Google
Muro (wall) | Online search index | Google
Pandemia (pandemic) | Online search index | Google
PEMEX | Online search index | Google
Peso | Online search index | Google
Petróleo (oil) | Online search index | Google
PRI | Online search index | Google
Recesión (recession) | Online search index | Google
Reformas (reforms) | Online search index | Google
Salario (wage) | Online search index | Google
Sismo (earthquake) | Online search index | Google
Tipo de cambio (exchange rate) | Online search index | Google
Trump | Online search index | Google
Violencia (violence) | Online search index | Google

The authors thankfully acknowledge the comments and suggestions carried out by the authorities