key: cord-0544898-ibwwt8uq authors: Monmousseau, Philippe; Marzuoli, Aude; Feron, Eric; Delahaye, Daniel title: Putting the Air Transportation System to sleep: a passenger perspective measured by passenger-generated data date: 2020-04-29 journal: nan DOI: nan sha: 3b93e2cb4b2cc22b16c879cde56fb7b1c33c67dd doc_id: 544898 cord_uid: ibwwt8uq

This paper aims at analyzing the effect on the US air transportation system of the travel restriction measures implemented during the COVID-19 pandemic from a passenger perspective. Flight-centric data are not yet publicly and widely available, therefore the traditional metrics used to measure the state of this system are not yet available. Seven metrics based on three different passenger-generated datasets are proposed here. They aim to measure in close to real-time how the travel restriction measures impacted the relation between major stakeholders of the US air transportation system, namely passengers, airports and airlines.

States regarding international and domestic air transportation (as of April 27th 2020), so there are no unified means of measuring this impact. Most traditional metrics to measure the state of the air transportation system are centered on the performance of flights in terms of delay, cancellation and number of passengers transported using data gathered by the Bureau of Transportation Statistics (BTS) [4]. These data first have to be provided by airlines and airports to the BTS before being published as a monthly report. The usual latency for on-time flight data to be published by BTS is two months. This frequency is not adapted to the monitoring of situations such as the COVID-19 pandemic. This paper proposes to take an alternative approach, and to consider data generated by the core of the business of the air transportation system: passengers. Passengers generate various sorts of data throughout their journey, as well as before and after their flight should have taken place.
Some of these passenger-generated data are publicly available in close-to real-time and could be used in an aggregated and anonymized fashion to assess the state of the air transportation system. Three such data sources are considered in this paper in order to build passenger-centric metrics: data passively emitted by passenger passports at immigration, data passively emitted by their phones and data actively emitted within social media. The rest of this paper is structured as follows: Section II first presents an overview of what has previously been done from a passenger perspective to monitor the state of the air transportation system. Section III analyses the impact of the COVID-19 pandemic on airports from a passenger perspective. Section IV then focuses on the impact on airlines and Section V summarizes the passenger-centric metrics proposed and discusses future research direction. A passenger approach to analyzing flight delays was first introduced by Bratu and Barnhart [5] who developed a Passenger Delay Calculator to show that flight-centric metrics do not accurately reflect passenger delays, especially due to flight cancellations. Later in [6] they calculated passenger delay using monthly data from a major airline operating a hub-and-spoke network. They show that disrupted passengers, whose journey was interrupted by a capacity reduction, are only 3% of the total passengers, but suffer 39% of the total passenger delay. Wang et al. in [7, 8] showed that high passenger trip delays are disproportionately generated by canceled flights and missed connections. 9 of the busiest 35 airports cause 50% of total passenger trip delays. Congestion, flight delay, load factor, flight cancellation time and airline cooperation policy are the most significant factors affecting total passenger trip delay. 
These studies, based on BTS or airline data, have highlighted the disproportionate impact of airside disruptions on passenger door-to-door journeys, already showing that traditional flight-centric metrics do not capture the full picture. This led NextGen [9] in the United States and ACARE Flightpath 2050 [10] to advocate a shift from flight-centric metrics to passenger-centric metrics to evaluate the performance of the Air Transportation System. Both the USA and Europe aim to take a more passenger-centric approach, with ACARE Flightpath 2050 setting some ambitious goals, including some that are not measurable yet due to lack of available data. In the US, the Joint Planning and Development Office has proposed and tested metrics regarding NextGen's goals, but there are still metrics missing from the passenger's viewpoint, especially regarding door-to-door travel times and passenger handling [11] . The shift from flight-centric information to passenger-centric metrics was then explored by Cook et al. [12] within the project POEM -Passenger Oriented Enhanced Metrics, where they designed propagation-centric and passenger-oriented performance metrics using both complexity and data science approaches. They simulated air transportation networks and analyzed their resilience from a flight-centric perspective and from a passenger-oriented perspective, highlighting the need for the implementation of passenger centric metrics. Taking the passenger objectives during decision making was then proposed within the concept of Multimodal, Efficient Transportation in Airports and Collaborative Decision Making (META-CDM) by Laplace et al. [13] . This concept proposes to link both airside CDM and landside CDM, taking into account the passenger perspective. Within this framework, Kim et al. [14] proposed to improve airport gate scheduling by implementing a decision model that balances aircraft, operator and passenger objectives. Dray et al. 
[15] highlighted the need for a passenger-centered, multi-modal approach when handling major disturbances of the air transportation system in order to offer better solutions to passengers. Taking a multi-modal approach implies having access to different sources of data and being able to link them together. Data generated by passengers throughout their trip are diverse and scattered across different sensors. Airports gather customs or security records, shuttle traffic, parking occupancy, and sometimes measure queue lengths, while third parties collect online traces through WiFi hotspots and Bluetooth beacons [16]. This real-time information, combined with historical data, was used to analyze and predict passenger flow to an Australian immigration booth [17] or within several Dutch train stations [18], as well as for the analysis and prediction of passenger occupancy in a Chinese airport [19]. These studies are restricted to a small part of the full system (one or two airport terminals), indicating the difficulty of gathering a system-wide data-driven picture of passenger behavior. Considering passengers as sensors was made easier with the increase in the use of smartphones. Marzuoli et al. were able in [20] and [21] to use mobile phone data in order to analyze the performance of US airports from a passenger perspective. These studies were a first validation that passenger-centric data can be used to obtain a view of the overall health of the Air Transportation System that is complementary to the traditional flight-centric approach. The impact of a major weather perturbation on passenger experience in airports was studied using this same approach, and the study was complemented by an additional passenger-generated data source, i.e. social media, in [21]. In Europe, a similar approach was conducted within the BigData4ATM project * by Garcia-Albertos et al.
[22] who were able to measure door-to-door travel times of air passengers between two Spanish cities, Madrid and Barcelona, thanks to mobile phone data. However, mobile phone data is proprietary and is not often publicly available for research. In the special case of research about the COVID-19 pandemic, SafeGraph [23] gave access to an aggregated version of their database, consisting of various sets of data generated by mobile phone users. One important source of user-generated data regularly used to study large-scale behaviors, with the advantage of real-time availability, is social media, and Twitter † more specifically. With more than 68 million active users in the United States [24], Twitter is an important pool of user-created data. Its real-time availability has already made Twitter the main focus of multiple studies of large-scale events, with several works by Palen et al. on how to help emergency responders during US natural disasters [25] [26] [27]. In Europe, Terpstra et al. also studied how a real-time Twitter analysis could provide valuable information for the operational response of natural disaster crisis management, with the case of a storm hitting a festival in Belgium [28]. Regarding the air transportation field, most works mining Twitter data focus on airline sentiment analysis, with Breen [29] explaining how to mine Twitter textual data and create sentiment classifiers, or Wang et al. [30] proposing an improved airline sentiment classification method. These works focused essentially on improving the available methods for sentiment analysis without proposing any direct use of their results to improve airline service or passenger satisfaction. Monmousseau et al. in [31] used publicly available social media data created by passengers to accurately estimate and predict the hourly aggregated status of the US air transportation system.
This method was further improved in [32] to reliably estimate the hourly delays at departure and at arrival per airport.

This section leverages two different user-generated datasets to analyze the effect on US airports of the travel restrictions presented in Section I.A from a passenger and visitor perspective. Travel restrictions do not entirely ban international travel, and there are still passengers arriving at most US airports of entry after the implementation of these travel restrictions. However, starting March 13th 2020, US citizens who have been in high-risk areas and are returning to the United States have to arrive through one of the thirteen following airports of entry: [1]
• ATL: Hartsfield-Jackson Atlanta International Airport

The effect of these travel restrictions on international travel coming to the US can be studied thanks to the "Airport Wait Times" data from the Customs and Border Protection (CBP) website [33]. These data are aggregated at an hourly level and are usually available the day after they are generated. The readiness of the data is due to the fact that CBP directly measures the signal emitted by passengers, the signal here being emitted through passports once passengers clear the immigration process, and does not have to wait for an airline or airport to process and provide the data. Among other information, the dataset contains the number of passengers arriving at immigration per hour, the average wait time at immigration per hour, and the number of open immigration booths per hour. For a more detailed presentation of the available dataset, the authors recommend reading [34], which also proposes an analysis of these wait times from January 2013 to January 2019. The data considered here range from January 1st 2020 to April 22nd 2020.
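As an illustration of how hourly records of this kind could be turned into daily totals and period-to-period comparisons, a minimal Python sketch follows. The record layout (a timestamp paired with a passenger count) and all function names are assumptions made for this example; they are not the CBP data format nor the authors' code.

```python
from collections import defaultdict
from datetime import date, datetime


def daily_totals(hourly_records):
    """Sum hourly passenger counts into one total per calendar day.

    hourly_records: iterable of (timestamp, passengers) pairs, where
    timestamp is a datetime marking the hour (assumed layout).
    """
    totals = defaultdict(int)
    for timestamp, passengers in hourly_records:
        totals[timestamp.date()] += passengers
    return dict(totals)


def period_mean(totals, start, end):
    """Average daily total over the inclusive date range [start, end]."""
    values = [v for day, v in totals.items() if start <= day <= end]
    if not values:
        raise ValueError("no data in the requested period")
    return sum(values) / len(values)


def relative_drop(before, after):
    """Decrease from `before` to `after`, as a fraction of `before`."""
    return (before - after) / before
```

Applied to two daily averages expressed in the same unit, `relative_drop` returns the fraction by which the second period is lower than the first; multiplying by 100 gives a percentage drop.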
Looking first at the evolution of the total number of passengers arriving at US immigration booths per day across all airports, this number drops from an average of 218.7 thousand passengers per day between February 23rd 2020 and March 15th 2020 to an average of only 5.0 thousand passengers per day between April 1st 2020 and April 22nd 2020. This represents a drop of 97.7% in two weeks. The day-by-day evolution of the total number of passengers arriving at US immigration from March 1st to April 22nd is shown in Figure 1. This figure also indicates, for each airport with no CBP immigration data available on April 22nd, the last date on which immigration data are available. This corresponds to 22 airports. Only Raleigh-Durham International Airport (RDU) closed its immigration service between the US ban of EU travellers and the US entering a Level 4 travel advisory. It is to be noted that John Wayne Airport (SNA) has had no immigration data since January 5th 2020. Starting March 22nd 2020, the number of airports not generating any immigration data steadily increases, with nine airports shutting down their immigration services in ten days. Another nine airports then stop generating immigration data in ten days starting April 12th 2020. Figure 1 indicates that BOS does not have any immigration data on April 22nd 2020 even though it is one of the selected airports of entry for US citizens coming from high-risk areas. This illustrates the fact that the influx of international passengers is so low that BOS had no international passenger arriving at its immigration service for at least one day.

From a domestic perspective, and thanks to SafeGraph's willingness to provide aggregated data for research on how to better understand and better handle the COVID-19 pandemic, weekly patterns at specific points of interest (POI) are available ‡ .
From these patterns, it is possible to estimate the number of airport visitors per hour by considering all available POI associated with an airport. Airport visitors are a broader category than air passengers, since this category also encompasses airport staff and people dropping off or picking up passengers. The data available for this study range from February 27th 2020 to April 18th 2020. Looking first at the evolution of the total number of airport visitors per day across all airports, this number drops from an average of 176.8 thousand visitors per day between February 27th 2020 and March 15th 2020 to an average of only 20.2 thousand visitors per day between April 1st 2020 and April 18th 2020. This represents a drop of 88.6% in two weeks. The day-by-day evolution of the total number of airport visitors from March 1st to April 18th is shown in Figure 2. Similarly to Figure 1, this figure also indicates, for each airport with no CBP immigration data available on April 22nd, the last date on which immigration data are available. From Figure 2, it is clear that US domestic travel was already impacted before the rise to a Level 4 travel advisory: the number of airport visitors contained within the SafeGraph data drops from 152.4 thousand on March 11th 2020 down to 71.6 thousand on March 19th 2020, which represents a 53% drop.

‡ https://docs.safegraph.com/docs/weekly-patterns

Figure 1 showed that all US airports of entry are impacted by these lockdown and travel ban measures, and the next step is now to look into this impact at an airport level. Figure 3 compares the individual airport situation of the first two weeks of April 2020 with the first two weeks of April 2019. Figure 3(a) shows the boxplots of the number of passengers arriving at immigration per day for each airport over the period of April 1st-22nd 2019.
The median number of arriving passengers is indicated in green, and the lower and upper bounds of each box represent respectively the 1st and 3rd quartiles. The whiskers above and below each box give a visualization of the full range of the considered data even though extreme values are not drawn. The airports are ordered by their median daily number of passengers arriving at immigration over that period. Figure 3 between the median number of passengers arriving at immigration of these two periods. For all the considered airports, the corresponding drop is between 70.7% for Sacramento International Airport (SMF) and 100% for the eleven airports with no immigration data between April 1st 2020 and April 22nd 2020. Looking at the airport ranking based on the median number of passengers arriving at immigration per day over the period of April 1st-22nd, Figure 3(b) shows that it has been reshuffled from 2019 to 2020: JFK dropped to sixth place and IAD climbed to the place right behind LAX. IAD has however the highest average number of passengers arriving at immigration per day over the period of April 1st-22nd 2020, with 726 passengers a day on average, LAX being second with 658 passengers a day on average. JFK was the airport with the highest number of passengers arriving at immigration per year since 2013 [33], which makes its drop from first place to sixth place all the more impressive.

A similar comparison of the number of airport visitors before and after the travel restriction measures can be conducted based on the SafeGraph data. Due to data availability, this comparison has to take place between March 2020 and April 2020. Figure 4 shows the boxplots of the number of airport visitors per day for 40 US airports with available SafeGraph data over the first two weeks of March 2020 (Figure 4(a)) and April 2020 (Figure 4(b)).
The airports on these two plots are sorted by their median daily number of airport visitors over the period of March 1st-15th 2020. Please note that the y-axes are not the same between Figure 4(a) and Figure 4(b). As for the number of passengers arriving at immigration, the airport with the highest median daily number of airport visitors is the airport with the largest drop in volume. Regarding the median daily number of airport visitors, ATL has a drop of 11.1 thousand airport visitors in the SafeGraph data between these two weeks, which represents an 89.3% drop. Unlike for immigration, no airport completely stops receiving visitors, though the drop is large for all 40 considered airports, ranging from 72.5% for Fresno Yosemite International Airport (FAT) to 93.8% for IAD.

With 35 airports having a drop in the median number of passengers arriving at immigration greater than 90%, all airports are severely impacted by the COVID-19 measures from a passenger-volume perspective. A question to be asked is: since there are far fewer passengers arriving at immigration, does the immigration process go faster? The number of agents operating immigration booths has also decreased due to the coronavirus, but it is possible to consider an immigration load factor. This load factor indicates the load in terms of passengers for each immigration booth per hour. A lower load indicates that each immigration booth has fewer passengers to process per hour. From a passenger perspective, a lower load for a given number of passengers indicates that there are more immigration booths open, so the average processing time should be lower and thus a passenger at immigration would have to wait less to be processed. Based on this reasoning, an immigration quality score is proposed: it measures how well Assumption 1 is verified for an airport immigration service over a selected period of days.
Proposed passenger-centric metric 1: The immigration quality score for an airport of entry is defined as the correlation between the daily average wait time for passengers at its immigration service and the daily average immigration load factor of the airport over a given period.

This immigration quality score is equal to 1 if Assumption 1 is perfectly verified, to 0 if the daily average wait time for passengers at immigration is uncorrelated with the daily average immigration load factor, and to -1 if the opposite of Assumption 1 occurs over the considered period, i.e. a decrease in the daily average immigration load factor implies an increase in the daily average wait time for passengers at immigration. This proposed passenger-centric metric is applied to the pre-COVID period (January 1st 2020 to February 29th 2020) and to the post-COVID period (March 1st 2020 to April 22nd 2020) for 40 US airports of entry. Table 1 shows the associated partial ranking (top ten best airports and top ten worst airports) for these two periods.

Table 1 Airport partial ranking based on the proposed passenger-centric metric 1 (immigration quality score) applied to the pre-COVID period of January 1st 2020 to February 29th 2020 and to the post-COVID period of March 1st 2020 to April 22nd 2020 for the 40 considered US airports of entry. Columns: Airport and Score, pre-COVID and post-COVID, for the top ten best and top ten worst airports.

The airport still generating immigration data on April 24th with the worst drop between the pre-COVID period and the post-COVID period is IAD, going from 17th down to 36th, and the airport still generating immigration data on April 24th with the best increase in rank is JFK, with 30 places gained and with an increase in score from the negative value of -0.32 to the positive value of +0.81.
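The immigration quality score described above can be sketched in a few lines of Python. Two assumptions are made here, since the text does not spell them out: the hourly load factor is taken to be the number of arriving passengers divided by the number of open booths, and "correlation" is taken to be the Pearson correlation coefficient. All function names are hypothetical.

```python
from math import sqrt


def hourly_load_factor(passengers, open_booths):
    """Passengers per open immigration booth for one hour (assumed definition)."""
    if open_booths <= 0:
        return None  # no booth open: load factor left undefined for that hour
    return passengers / open_booths


def daily_mean(hourly_values):
    """Mean of the defined (non-None) hourly values of one day."""
    defined = [v for v in hourly_values if v is not None]
    return sum(defined) / len(defined) if defined else None


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def immigration_quality_score(daily_wait_times, daily_load_factors):
    """Correlation between daily average wait time and daily average load factor."""
    return pearson(daily_wait_times, daily_load_factors)
```

With this sketch, a period in which the wait time rises and falls together with the load factor yields a score close to 1, and a period in which they move in opposite directions yields a score close to -1.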
With 38 airports having a drop in the median number of airport visitors greater than 80%, all airports are also severely impacted by the COVID-19 travel restriction measures from a visitor-volume perspective. Visitors in general avoid airports, but some are still going to the airports after the travel restriction measures. The same question as for the immigration process can be asked: are these visitors processed faster since there are fewer visitors? The data available for visitors are different from the data available for passengers arriving at immigration, therefore a different approach has to be considered here. The SafeGraph data contain weekly bucketed dwell times for each considered location. The dwell time is the time spent at that location, be it waiting, shopping, walking, etc. The buckets are: less than 5 minutes, between 5 and 20 minutes, between 21 and 60 minutes, between 61 minutes and 240 minutes, and more than 240 minutes. From these weekly bucketed dwell times, two complementary passenger-centric metrics are proposed to measure an airport's efficiency in processing visitors.

Proposed passenger-centric metric 2: The weekly airport visitor efficiency score for an airport is defined as the weekly proportion of airport visitors spending less than 60 minutes at an airport.

Proposed passenger-centric metric 3: The weekly airport visitor slugginess score for an airport is defined as the weekly proportion of airport visitors spending more than 240 minutes at an airport.

The time limits within these two metrics are chosen due to the format of the data, and could be adjusted if less aggregated data were available. The idea behind the airport visitor efficiency score is to incentivize airports to keep the flow of people coming in and out of their airports as fast as possible.
The time limit of 60 minutes concerns essentially visitors dropping off or picking up a passenger, and hopefully some passengers on domestic flights, where the overall security screening process is faster than for international flights. Most airlines and airports recommend their passengers on international flights to arrive two to three hours ahead of their flight's scheduled departure time, therefore the idea behind the airport visitor slugginess score is to measure the validity of this recommendation. Airport staff can be counted as airport visitors using this dataset and they are likely to stay more than 240 minutes at the airport, increasing the number of airport visitors staying longer than this threshold. Therefore, an airport with a high airport visitor slugginess score could either be an airport with many passengers taking more than four hours to clear their entire airport process, or an airport with a disproportionate number of airport staff compared to the number of airport visitors. Since there are several locations per airport within the SafeGraph data, e.g. "LAX Terminal 4" and "LAX Terminal South" for LAX, an estimation of the proposed airport visitor efficiency score is calculated by taking the minimum weekly proportion of airport visitors spending less than 60 minutes at a location within an airport over all considered airport locations. Similarly, an estimation of the proposed airport visitor slugginess score is calculated by taking the maximum weekly proportion of airport visitors spending more than 240 minutes at a location within an airport over all considered airport locations. These proposed passenger-centric metrics are applied to the period pre-COVID (March 1 st 2020 to March 15 th 2020) and to the period post-COVID (April 5 th 2020 to April 19 th 2020) for 44 US airports. These periods contain 2 weeks each and therefore 2 points of data each. The scores are calculated for each week and then averaged over the period. 
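The estimation procedure described above (per-location proportions, minimum or maximum over an airport's locations, then an average over the weeks of a period) can be sketched as follows. The bucket labels and the dictionary layout are hypothetical and only mirror the five buckets listed in the text; they are not guaranteed to match the keys used in the SafeGraph files.

```python
# Hypothetical bucket labels mirroring the five dwell-time buckets in the text.
SHORT_BUCKETS = ("<5", "5-20", "21-60")  # visitors staying at most 60 minutes
LONG_BUCKET = ">240"                     # visitors staying more than 240 minutes


def bucket_share(buckets, keys):
    """Share of visitors of one location falling into the given buckets."""
    total = sum(buckets.values())
    if total == 0:
        return None  # no visitor captured at this location that week
    return sum(buckets.get(key, 0) for key in keys) / total


def weekly_efficiency(locations):
    """Minimum over an airport's locations of the share staying under 60 minutes."""
    shares = [bucket_share(b, SHORT_BUCKETS) for b in locations.values()]
    shares = [s for s in shares if s is not None]
    return min(shares) if shares else None


def weekly_slugginess(locations):
    """Maximum over an airport's locations of the share staying over 240 minutes."""
    shares = [bucket_share(b, (LONG_BUCKET,)) for b in locations.values()]
    shares = [s for s in shares if s is not None]
    return max(shares) if shares else None


def period_average(weekly_scores):
    """Average of the defined weekly scores over the considered period."""
    defined = [s for s in weekly_scores if s is not None]
    return sum(defined) / len(defined) if defined else None
```

Taking the minimum for the efficiency score and the maximum for the slugginess score makes both estimates conservative: a single slow location is enough to pull an airport's scores in the unfavorable direction, which also explains why a location with very few captured visitors can dominate the result.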
Table 2 shows the partial ranking (top ten best airports and top ten worst airports) associated with the proposed passenger-centric metric 2 for these two periods.

Table 2 Airport partial ranking using the proposed metric based on the proportion of airport visitors staying less than 60 minutes applied to the pre-COVID period of March 1st 2020 to March 15th 2020 and to the post-COVID period of April 5th 2020 to April 19th 2020 for the 44 considered US airports based on SafeGraph data. Columns: Airport and Score, pre-COVID and post-COVID, for the top ten best and top ten worst airports.

In Table 2, a score of 1 indicates that all airport visitors within the SafeGraph data spend less than one hour at the same location within the airport, while a score of 0 indicates that all airport visitors within the SafeGraph data spend more than one hour at the same location within the airport. Some airports have a score of 0 due to locations receiving very few visitors (fewer than 5) over the considered week that were captured within the SafeGraph data, and all those visitors stayed more than one hour at that same airport location. Table 3 shows the partial ranking (top ten best airports and top ten worst airports) associated with the proposed passenger-centric metric 3 for the same two considered periods. In Table 3, a score of 0 indicates that all airport visitors within the SafeGraph data spend less than four hours at the same location within the airport, while a score of 1 indicates that all airport visitors within the SafeGraph data spend more than four hours at the same location within the airport. Similarly to the airport visitor efficiency score, some airports have a score of 1 due to locations receiving very few visitors (fewer than 5) over the considered week that were captured within the SafeGraph data, and all those visitors stayed more than four hours at that same airport location.

Table 3 Airport partial ranking using the proposed metric based on the proportion of airport visitors staying more than 240 minutes applied to the pre-COVID period of March 1st 2020 to March 15th 2020 and to the post-COVID period of April 5th 2020 to April 19th 2020 for the 44 considered US airports based on SafeGraph data. Columns: Airport and Score, pre-COVID and post-COVID, for the top ten best and top ten worst airports.

Following the results of the metric proposed in Section III.C.1, this section focuses on the two airports with the most important change in behavior linked to the COVID-19 travel restriction measures, JFK and IAD.

JFK had the best increase in rank using the proposed immigration quality score presented in Table 1, and this section aims at analyzing the available CBP immigration data. The effect of the travel ban measures presented in Section I.A on passengers arriving at JFK's immigration is presented in Figure 5 through four different views by comparing data from 2020 with CBP data from the years 2018 and 2019 between January 1st and April 22nd. Figure 5(a) shows the daily evolution of the number of passengers arriving at JFK's immigration and confirms the huge drop in the number of international passengers arriving in the US, from an average of 35.6 thousand passengers arriving at immigration a day down to barely 360 passengers a day. Compared to the years 2018 and 2019, with an average of 45.9 thousand passengers, the difference is even larger, since the number of passengers arriving at JFK's immigration is usually higher in April than in March. Figure 5(c) shows the evolution of the daily average load factor (Definition 1). After the lockdown and travel ban measures, the load factor drops significantly from an average of 42.5 before the measures down to around 8.5, which represents an 80% drop.
This indicates that after the measures, an immigration booth has about five times fewer passengers to process per hour. Or, from the passenger perspective, each passenger has about five times more open immigration booths to take care of them. This has a direct positive impact on the average wait time at immigration for passengers.

IAD had the worst drop in rank using the proposed immigration quality score presented in Table 1, and is the focus of this section. Figure 6 shows the impact of the travel restriction measures for passengers arriving at IAD's immigration through the same four perspectives as the analysis of JFK. Figure 6(a) shows the daily evolution of the number of passengers arriving at IAD immigration and confirms that, even though in 2020 that number has dropped from an average of 7.2 thousand in February 2020 to an average of 726 in April 2020 after the implementation of the travel ban measures, the drop is less important than for JFK (Figure 5(a)). Though this is still a 93% drop for the number of passengers arriving at immigration in April between the years 2019 and 2020, with a daily average of 10.4 thousand passengers in 2019, the number of open immigration booths was not impacted as much as for JFK.

This section leverages a dataset actively generated by passengers to observe the effects of the travel restriction measures presented in Section I.A on seven major US airlines and proposes several passenger-centric metrics to analyze their reactions with respect to their customers. Airlines operate over more than one airport, and in each airport there is usually more than one airline operating at the same time. There is thus no straightforward way to determine which airline a passenger is flying with using only geolocation data without excessive passenger tracking.
The datasets presented and used for the evaluation of the COVID-19 travel restriction measures on airports cannot be directly used to evaluate the impact of these same measures to airlines. Another approach is thus necessary to evaluate the reaction of airlines to this pandemic situation from a passenger perspective. The importance of airport experience in customer, i.e. passenger, satisfaction towards both airline and airport services was already highlighted in the study of Pruyn and Smidts [35] , where they show that customer satisfaction is largely affected by their experience at waiting areas, both in terms of wait times and wait environment. This implies that waiting at airports (or any other transit station) can be acceptable for passengers if they are taken care of accordingly. Watkins et al. [36] confirmed this conclusion by showing that the perceived wait time for transit riders was lower for riders receiving real-time information than for passengers without that information. One means of real-time information is social media. In particular, Twitter is an important means of direct communication between airlines and passengers, with an average of more than 300 tweets a day over the month of January 2020 written by the customer services of four major US carriers (Southwest Airlines, Delta Airlines, American Airlines and United Airlines) and an average of more than 800 tweets a day written by their customers. The use of Twitter as a real-time estimator of the air transportation system has already been investigated in [32] in order to estimate flight-centric values per airport before they were released by BTS, and the same data extraction process is used here. 
Seven airlines, and their associated Twitter handles, are considered in the following sections: American Airlines (@AmericanAir), Delta Air Lines (@Delta), United Airlines (@united), Alaska Airlines (@AlaskaAir), Southwest Airlines (@SouthwestAir), JetBlue Airways (@JetBlue) and Spirit Airlines (@SpiritAirlines). The first four are legacy airlines, and the last three are low-cost carriers. All tweets written from these airlines' Twitter accounts were scraped from February 1st 2020 to April 12th 2020 and are categorized as "customer service tweets". All tweets written over that same period that mention at least one of the airline handles and were not written from the associated airline Twitter account were also scraped and categorized as "passenger tweets". The same classifiers as in the study conducted in [32] are used here to estimate the mood expressed within tweets in order to monitor the real-time evolution of the moods expressed by passengers and by airline customer services. Each classifier gives a score of 1 if it considers that the tweet expresses a positive sentiment and a score of 0 if it expresses a negative sentiment. In effect, each classifier calculates the probability for a tweet of being positive, and then rounds that probability to the closest integer (0 or 1). The classifiers are here transformed into regressors by considering the probability for a tweet of being classified as positive. The outputs of all trained regressors are then averaged into one single score, going from 0 for a negative mood to 1 for a positive mood. Using the mean sentiment expressed within each tweet aggregated on a daily level, it is possible to compare the effect of the lockdown from both a passenger perspective and an airline perspective. Figure 7 shows the evolution of the expressed mood from February 1st 2020 to April 12th 2020 for the four legacy airlines considered.
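The scoring step described above (keeping each classifier's positive-class probability instead of its rounded label, averaging these probabilities per tweet, then averaging per day) can be sketched as follows. The function names and the input layout are hypothetical; the trained classifiers of [32] are not reproduced here and are represented only by the probabilities they would output.

```python
from collections import defaultdict


def hard_label(probability):
    """Classifier view: round a positive-class probability to 0 or 1."""
    return 1 if probability >= 0.5 else 0


def tweet_mood(probabilities):
    """Regressor view: average the positive-class probabilities of all classifiers.

    The result lies between 0 (negative mood) and 1 (positive mood).
    """
    if not probabilities:
        raise ValueError("at least one classifier output is required")
    return sum(probabilities) / len(probabilities)


def daily_mood(scored_tweets):
    """Average the per-tweet mood over each day.

    scored_tweets: iterable of (day, mood) pairs, one pair per tweet.
    Returns a dictionary mapping each day to its mean mood.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for day, mood in scored_tweets:
        sums[day] += mood
        counts[day] += 1
    return {day: sums[day] / counts[day] for day in sums}
```

Keeping the probabilities rather than the rounded labels preserves how confident each classifier is, so two tweets that would both be labeled positive can still receive different mood scores.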
From Figure 7 (a), a drop in the mood expressed within passenger tweets can be observed right after the US travel ban for the three major airlines (Delta Air Lines, United Airlines and American Airlines). Delta has the steepest descent but also the sharpest recovery. The case of Alaska Airlines is particular: a #AlaskaHappyHour campaign, which gave the opportunity of winning free flights to Alaska, took place at the beginning of March 2020. This campaign could explain the increase in the mood expressed in passenger tweets between March 1st 2020 and March 5th 2020, and could also have compensated for the effect of the travel ban announcement. This drop in the mood is less visible (or non-existent in the case of American Airlines) within the tweets written by the airline customer services, as shown in Figure 7 (b). Though Delta Air Lines and Alaska Airlines had the highest expressed mood on average within passenger tweets, the mood expressed by their customer services is the lowest on average of the four legacy airlines considered. The better mood expressed by their passengers could be explained by the fact that these companies expressed a mood closer to their passengers' actual mood. The gap between the mood expressed within tweets written by passengers and tweets written by airline customer services is visible from one panel to the other: airline customer service tweets express a mood about 0.2 points higher than passenger tweets. Similar observations can be drawn from a low-cost carrier perspective. Figure 8 shows the evolution of the expressed mood from February 1st 2020 to April 12th 2020 for the three low-cost carriers considered. Airlines express a significantly lower mood on average than the other two low-cost carriers considered over the months of February and March 2020.
A spike in the mood expressed in tweets written by JetBlue passengers can be seen around March 26th 2020, which corresponds to the period when JetBlue announced they would be offering free flights to health care workers in order to help the governor of New York handle the spread of COVID-19 in New York State, as well as the period when an update of their mobile application contained the message "Now, go wash your hands". The drop in the mood expressed in the tweets written by passengers of legacy airlines after Italy's lockdown is less visible for passengers of low-cost airlines. From a customer service perspective, Figure 8 (a), the gap between the mood expressed in the tweets written by the Southwest Airlines customer service and the mood expressed in the tweets written by the customer services of the other two carriers closes the day after Italy's lockdown. This could indicate a similar communication policy for these carriers regarding the COVID-19 pandemic. The same gap of about 0.2 points as for legacy airlines, between the mood expressed within tweets written by passengers and tweets written by airline customer services, is visible from Figure 8 (a) to Figure 8 (b).
Based on this sentiment data, two metrics are proposed here to compare airlines. A first metric aims to measure how well airlines are in phase with the mood of their passengers. For example, if the average passenger mood is decreasing, the mood expressed by the airline customer service should not be increasing.
Proposed passenger-centric metric 4 The airline empathy score is defined as the correlation between the evolution of the average mood expressed by passengers in their tweets and the evolution of the average mood expressed by the airline customer service in their tweets. This score goes from -1 to 1, with 1 meaning that the mood expressed by the airline customer service is perfectly in phase with the mood expressed by their passengers.
A score of 0 indicates that the mood expressed by the airline customer service is uncorrelated with the mood expressed by their passengers. A score of -1 indicates that the mood expressed by the airline customer service is in complete phase opposition with the mood expressed by their passengers. In other words, the mood expressed by airlines increases when the mood expressed by passengers decreases, and vice versa.
Proposed passenger-centric metric 5 The airline sentiment gap measures the average difference between the average mood expressed by passengers and the average mood expressed by airlines. This measure goes from -1 to 1, with 0 indicating that airlines and passengers express the same average mood. A measure of 1 indicates the worst case, i.e. when the mood expressed by airlines is equal to 1 (highest possible) and the mood expressed by passengers is equal to 0 (lowest possible) throughout the considered period. A measure of -1 indicates the opposite scenario.
Table 4 presents these two proposed passenger-centric metrics based on the sentiment expressed in tweets for the seven considered airlines and ranks the airlines for each score. The scores were calculated over the period from March 1st 2020 to April 11th 2020.
When an exceptional situation occurs, a spike in the use of certain keywords can be seen within the stream of tweets written by the affected Twitter users. For example, in the case of a large number of cancellations, many passengers will go on Twitter and use the keyword "cancel" to express their concerns directly to the airline they were flying with. Figure 9 shows the evolution of the normalized number of occurrences of the keyword "cancel" in tweets written by passengers from February 1st 2020 to April 12th 2020 for four US legacy airlines and three US low-cost carriers. The normalization is based on the total number of passengers transported by each considered carrier over the year 2018 using BTS data [37].
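Returning to the two sentiment-based metrics reported in Table 4 (metrics 4 and 5), the sketch below shows one possible way of computing them from two aligned daily mood series. It assumes a Pearson correlation, since the type of correlation is not specified in the text, and it takes the gap as airline mood minus passenger mood so that a value of 1 matches the worst case described above. The function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def empathy_score(passenger_mood, airline_mood):
    """Metric 4 (assumed Pearson): 1 = in phase, -1 = in phase opposition."""
    return pearson(passenger_mood, airline_mood)


def sentiment_gap(passenger_mood, airline_mood):
    """Metric 5: mean of (airline mood - passenger mood) over the period."""
    if len(passenger_mood) != len(airline_mood) or not passenger_mood:
        raise ValueError("series must be non-empty and of the same length")
    differences = [a - p for p, a in zip(passenger_mood, airline_mood)]
    return sum(differences) / len(differences)
```

With this sign convention, a customer service that always sounds 0.2 points more positive than its passengers while following the same ups and downs obtains an empathy score close to 1 and a sentiment gap close to 0.2.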
Figure 9 (a) indicates that the passengers of the four legacy airlines considered are reactive to the international situation; a large increase can already be seen around the date of Italy's lockdown announcement. A second spike then occurs once the US announces that it bans all travellers from the EU, China and Iran, with Delta Air Lines passengers being, in proportion, the most vocal on Twitter. This could indicate that Delta Air Lines had a greater proportion of its US passengers traveling in the EU at that time. Alaska Airlines had an early spike in the number of tweets containing the keyword "cancel" compared to the other legacy airlines. That early spike could be linked to the fact that most of the first US cases of COVID-19 were discovered on the US West Coast, which is where Alaska Airlines' main hub is located. Figure 9 (b) shows the evolution of the normalized number of occurrences of the keyword "cancel" in passenger tweets for the three considered low-cost carriers. Passengers of Southwest Airlines are the least vocal in proportion on the matter of cancellation, with a slight increase in the number of occurrences of the keyword "cancel" in their tweets almost entirely contained within the period between Italy's lockdown announcement and the rise to a Level y travel advisory for the US. Passengers of JetBlue Airways have a behavior similar to that of legacy airline passengers in this case. Spirit Airlines passengers waited for the US travel ban announcement to massively express their concerns using the word "cancel".
Figure 9: Number of occurrences of the keyword "cancel" in tweets written by passengers, normalized by the number of total passengers per carrier over the year 2018 using BTS data [37].
Figure 10 shows the evolution of the number of occurrences of the keyword "cancel" in tweets written by airline customer services from February 1st 2020 to April 12th 2020 for the same four US legacy airlines and three US low-cost carriers.
Please note that the y-axis scale is different between Figure 10 (a) and Figure 10 (b). For legacy airlines, the behavior shown in Figure 10 (a) is similar for three out of four of the considered airlines. There is a significant increase in the number of occurrences of the keyword "cancel" starting when Italy announced its lockdown, and that number then slowly decreases. For American Airlines, after a similar increase in the number of occurrences of the keyword "cancel", that number does not decrease but fluctuates at a higher level than during the period before the travel restriction measures were announced. Regarding low-cost carriers, Figure 10 (b) shows that each carrier has its own characteristics regarding the use of the keyword "cancel". Southwest Airlines has two large spikes around each of the US announcements referenced in the plot. JetBlue has a single massive spike on March 13th 2020. Both of these carriers then spent more than two weeks with a higher level of occurrences of the keyword "cancel" than in February 2020. Spirit Airlines barely uses the keyword "cancel" in its communication except on March 23rd 2020. Based on the observations from the plots in Figure 9, it is possible to consider that a large increase in the normalized number of passenger tweets containing the keyword "cancel" represents a situation that airlines have to deal with in order for that volume to return to normal values. Two metrics to measure the airline reaction to such a situation are proposed here. The aim of the first metric is to measure the effectiveness of the airline response to these keyword-related situations.
Proposed passenger-centric metric 6 The keyword-related Twitter situation quality response score of an airline is the time needed for the airline to bring the normalized number of occurrences of the keyword within passenger tweets back to a predefined level.
This proposed quality metric measures the time needed for the airline to bring the number of keyword occurrences back to a normal state. When measuring the response to a long-term perturbation, such as the COVID-19 pandemic, this time is measured in days. The number of keyword occurrences in the passenger tweets is normalized by the total number of passengers over the year 2018 in this case, similarly to the data presented in Figure 9, and this normalization should be updated with the most recent numbers once they are available. The aim of the second metric is to measure the communication effort produced by the airline in order to handle the situation linked to the increase in the number of occurrences of the keyword under consideration.
Proposed passenger-centric metric 7 The keyword-related Twitter situation quantity response score of an airline is calculated by integrating the number of occurrences of the keyword in tweets written by the airline over the number of days associated with the situation.
The number of days used to calculate this quantity response score corresponds to the number of days found using the quality response score associated with the same situation. Table 5 presents these two proposed metrics in the case of the keyword "cancel", considering that the predefined threshold indicating when a situation starts and ends is 1. Table 5 illustrates the necessity of considering the quality response score and the quantity response score hand in hand. Southwest Airlines has the best scores from both perspectives, whereas Spirit Airlines has the second-best quality response score but by far the worst quantity response score.
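The two keyword-related response scores can be sketched as follows. This is an illustration under several assumptions that are not stated in the text: keyword occurrences are counted as case-insensitive substring matches (so that "cancelled" also counts), the normalized counts are expressed per `scale` annual passengers (the default scaling is hypothetical), a situation starts on the first day the normalized passenger count exceeds the threshold and ends on the first following day it is back at or below it, and the integration is a plain daily sum. All function names are hypothetical.

```python
from collections import Counter


def normalized_daily_counts(tweets, keyword, annual_passengers, scale=1_000_000):
    """Daily keyword occurrences per `scale` annual passengers.

    `tweets` is an iterable of (day, text) pairs.
    """
    counts = Counter()
    for day, text in tweets:
        counts[day] += text.lower().count(keyword.lower())
    return {day: scale * n / annual_passengers for day, n in sorted(counts.items())}


def situation_window(passenger_series, threshold=1.0):
    """Return (start, end) day indices of the first situation, end exclusive,
    or None if the threshold is never exceeded."""
    start = next((i for i, v in enumerate(passenger_series) if v > threshold), None)
    if start is None:
        return None
    end = next(
        (i for i in range(start, len(passenger_series)) if passenger_series[i] <= threshold),
        len(passenger_series),  # situation still ongoing at the end of the data
    )
    return start, end


def quality_response_score(passenger_series, threshold=1.0):
    """Metric 6: number of days needed to return to the predefined level."""
    window = situation_window(passenger_series, threshold)
    return 0 if window is None else window[1] - window[0]


def quantity_response_score(airline_series, passenger_series, threshold=1.0):
    """Metric 7: airline keyword occurrences summed over the situation days."""
    window = situation_window(passenger_series, threshold)
    return 0 if window is None else sum(airline_series[window[0]:window[1]])
```

Only the first crossing of the threshold is handled here; a series that fluctuates around the threshold, as observed for some carriers, would need an additional rule for merging or separating successive situations.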
Figure 11 shows the evolution of the normalized number of occurrences of the keyword "refund" in tweets written by passengers from February 1st 2020 to April 12th 2020 for the same seven US airlines, using the same normalization process. From a passenger perspective, the situation linked to the "refund" keyword is similar to the situation linked to the "cancel" keyword, but on a smaller scale. Figure 11 (a) shows that legacy airlines all have a steep increase in the number of occurrences of the keyword "refund" in passenger tweets at the announcement of Italy's lockdown and then a very slow decrease, with Alaska Airlines having an earlier spike at the beginning of March 2020. Figure 11 (b) shows that the increase in the occurrences of the keyword "refund" for Southwest Airlines is still lower and subsides faster than for the other low-cost carriers, and that the spike for Spirit Airlines passengers starts only at the announcement of the US travel ban. Figure 12 shows the evolution of the number of occurrences of the keyword "refund" in tweets written by airline customer services from February 1st 2020 to April 12th 2020 for the same seven US airlines. Figure 12 (a) shows the evolution of the number of occurrences of the keyword "refund" within tweets written by the customer services of the four considered legacy airlines. The initial increase is similar to that for the keyword "cancel" (Figure 10 (a)); however, there is then a second increase towards the end of March 2020, this increase being most visible within the tweets written by American Airlines. From a low-cost carrier perspective, Figure 12 (b) illustrates the same idiosyncrasies as in Figure 10 (b): two spikes around the US announcements for Southwest Airlines, this time with higher fluctuations afterwards, one major spike on March 13th 2020 for JetBlue and not even one occurrence of the keyword "refund" over the whole period for Spirit Airlines.
The same two metrics associated with the "cancel"-related Twitter situation presented in Section IV.C.1, i.e. the quality response score and the quantity response score, can be used for this "refund"-related Twitter situation. Table 6 presents these two proposed metrics in the case of the keyword "refund" using the same predefined threshold of 1 for delimiting a Twitter situation.
In order to complement existing flight-centric metrics, which are not broadly and readily available, this paper proposed to consider seven different passenger-centric metrics based on passenger-generated data, three of them focusing on airports and the other four focusing on airlines. They are grouped and listed here.
Proposed passenger-centric metric 1 The immigration quality score for an airport of entry is defined as the correlation between the daily average wait time for passengers at its immigration service and the daily average immigration load factor of the airport over a given period.
Proposed passenger-centric metric 2 The weekly airport visitor efficiency score for an airport is defined as the weekly proportion of airport visitors spending less than 60 minutes at an airport.
Proposed passenger-centric metric 3 The weekly airport visitor slugginess score for an airport is defined as the weekly proportion of airport visitors spending more than 240 minutes at an airport.
Proposed passenger-centric metric 4 The airline empathy score is defined as the correlation between the evolution of the average mood expressed by passengers in their tweets and the evolution of the average mood expressed by the airline customer service in their tweets.
Proposed passenger-centric metric 5 The airline sentiment gap measures the average difference between the average mood expressed by passengers and the average mood expressed by airlines.
The following two proposed metrics can be used with selected keywords, e.g. "cancel" and "refund".
Proposed passenger-centric metric 6 The keyword-related Twitter situation quality response score of an airline is the time needed for the airline to bring the normalized number of tweets containing the specific keyword back to a predefined level.
Proposed passenger-centric metric 7 The keyword-related Twitter situation quantity response score of an airline is calculated by integrating the number of tweets written by the airline and containing the keyword over the number of days associated with the situation.
These proposed passenger-centric metrics were built using data available either in real-time or due to the exceptional circumstances, and can still be further tuned to meet the expectations of both federal agencies and passengers. Several limitations and possible improvements should be noted here for a better understanding of these proposed metrics. The data used to estimate the number of visitors at airports and the proportion of time spent per airport location was graciously provided by SafeGraph in order to better understand the COVID-19 situation, and is not usually as easily available. In order to implement the associated metrics, agreements should be established between the different data providers and the group in charge of such metrics. Furthermore, an analysis of the categories of people most likely to be within the gathered data should be undertaken to better tune the final score. The proposed Twitter-related metrics have the unique advantage, among the presented datasets, that they can easily be updated on an hourly basis if needed. They also enable each passenger and airline to actively influence their scores. However, they essentially measure the communication quality and quantity between airlines and passengers, and should therefore still be complemented with traditional flight-centric measures for completeness.
This study focused on the effects of the travel restriction measures linked to a major disruption unfolding over a large number of days and tailored the proposed metrics to this timespan. Future studies could also investigate the adaptation of some of these proposed passenger-centric metrics to measure effects on a smaller scale, e.g. over a single day or a few hours.
[1] Coronavirus Travel Restrictions, Across the Globe
[2] Which countries are in mandatory lockdown due to COVID-19?
[3] Global Level 4 Health Advisory - Do Not Travel
[4] Bureau of Transportation Statistics
[5] An Analysis of Passenger Delays Using Flight Operations and Passenger Booking Data
[6] Flight Operations Recovery: New Approaches Considering Passenger Recovery
[7] Passenger Trip Time Metric for Air Transportation
[8] Methods for Analysis of Passenger Trip Performance in a Complex Networked Transportation System
[9] NextGen Integration, and Implementation Office
[10] Flightpath 2050: Europe's Vision for Aviation; Maintaining Global Leadership and Serving Society's Needs
[11] NextGen Metrics for the Joint Planning and Development Office
[12] Passenger-Oriented Enhanced Metrics
[13] META-CDM: Multimodal, Efficient Transportation in Airports and Collaborative Decision Making
[14] Airport Gate Scheduling for Passengers, Aircraft, and Operation
[15] Air Transportation and Multimodal, Collaborative Decision Making during Adverse Events
[16] The passenger IT trends survey
[17] Passenger Flow Predictions at Sydney International Airport: A Data-Driven Queuing Approach
[18] Advances in Measuring Pedestrians at Dutch Train Stations Using Bluetooth, Wifi and Infrared Technology
[19] Modeling and Predicting the Occupancy in a China Hub Airport Terminal Using Wi-Fi Data
[20] Implementing and Validating Air Passenger-Centric Metrics Using Mobile Phone Data
[21] Passenger-Centric Metrics for Air Transportation Leveraging Mobile Phone and Twitter Data
[22] Understanding Door-to-Door Travel Times from Opportunistically Collected Mobile Phone Records
[23] SafeGraph COVID-19 data consortium
[24] Monthly Active Twitter Users in the United States
[25] Twitter-Based Information Distribution during the 2009 Red River Valley Flood Threat
[26] Microblogging during Two Natural Hazards Events: What Twitter May Contribute to Situational Awareness
[27] Applications of Topics Models to Analysis of Disaster-Related Twitter Data
[28] Towards a Realtime Twitter Analysis during Crises for Operational Crisis Management
[29] Practical text mining and statistical analysis for non-structured text data applications
[30] An Ensemble Sentiment Classification System of Twitter Data for Airline Services Analysis
[31] Predicting and Analyzing US Air Traffic Delays Using Passenger-Centric Data-Sources
[32] Passengers on Social Media: A Real-Time Estimator of the State of the US Air Transportation System
[33] United States Customs and Border Protection
[34] Doorway to the United States: An Exploration of Customs and Border Protection Data
[35] Effects of Waiting on the Satisfaction with the Service: Beyond Objective Time Measures
[36] Where Is My Bus? Impact of Mobile Real-Time Information on the Perceived and Actual Wait Time of Transit Riders
The authors would like to thank Nikunj Oza from NASA-Ames, the French École Nationale de l'Aviation Civile and the King Abdullah University of Science and Technology for their financial support, as well as SafeGraph for making their data available for this study. The authors would also like to deeply thank all workers and researchers involved in the fight against the COVID-19 pandemic, with a special thought to health-care workers and providers.