key: cord-0701683-6t29xi73
authors: Kutela, Boniphace; Novat, Norris; Langa, Neema
title: Exploring Geographical Distribution of Transportation Research Themes Related to COVID-19 using Text Network Approach
date: 2021-01-23
journal: Sustain Cities Soc
DOI: 10.1016/j.scs.2021.102729
sha: 163621e6dc775ade12077d073eec7d705de93dbc
doc_id: 701683
cord_uid: 6t29xi73

The COVID-19 outbreak has severely affected the globe through travel restrictions and lockdowns. Geographically, COVID-19 has had disproportionate impacts; however, the distribution of the research themes is yet to be explored. Thus, this study explored the geographical distribution of the research themes that relate to COVID-19 and the transportation sector. The study applied a text network approach to the bibliometric data of over 400 articles published between December 2019 and December 2020. It was found that the research and the associated themes were geographically distributed based on the events that took place in the respective countries. Most of the articles were published by authors from four countries: the USA, China, Japan, and the UK. The text network results revealed that the USA-based studies mainly focused on international travelers, monitoring, travel impacts of COVID-19, and social-distancing measures. The Japan-based studies focused on the Diamond Princess cruise ship incident. Chinese authors, on the other hand, published articles related to travel to Wuhan and China, passenger health, and public transportation. The UK-based studies had diverse topics of interest. Lastly, studies from the remaining 62 countries focused on returning travelers from China, public transportation, and the global spread of COVID-19. The findings are crucial to the transportation sector's researchers for various applications.

It has been about a year since the first asymptomatic case of COVID-19 was reported on 8th December 2019 in Wuhan, China (WHO 2020b).
The COVID-19 outbreak was later declared a pandemic in March 2020, after slightly more than 150,000 people were infected. As of December 2020, statistics show that

Journal Pre-proof

Being a communicable disease, COVID-19 is easily transmitted through different transportation modes. Thus, the imposed spread-control measures, such as lockdowns and travel restrictions, have greatly affected the transportation sector. Such impacts have prompted many researchers worldwide to explore the implications of COVID-19 for transportation. Understanding how the transportation sector has aided the spread and developing preventive measures are among the research foci, to mention a few (Clifford et al. 2020; Craig, Heywood, and Hall 2020; Gostic et al. 2020). Craig, Heywood, and Hall (2020) analyzed how travelers may introduce COVID-19 into the Pacific islands, applying a scoring tool to travel and Global Health Security Index data to produce quantitative estimates. They found that the air routes with the highest risk to the Pacific islands are from East Asian countries (specifically, China, Korea, and Japan). Also, Clifford et al. (2020) evaluated the effectiveness of thermal passenger screening at airport exit and entry to inform public health decision-making. They estimated that 46% (95% confidence interval: 36 to 58) of infected travelers would not be detected, depending on the incubation period, the sensitivity of exit and entry screening, and the proportion of asymptomatic cases.

Although the numbers of infections, deaths, and recoveries are geographically disproportionate, limited literature describes the impact of this disproportion on the transportation sector. In other sectors, however, a few studies show the distribution of research in response to the geographical impacts of COVID-19. A study by (Rose-Redwood et al.
2020) discussed the geography of COVID-19, with a view to the extent of its worldwide spread, by exploring 42 commentaries written by contributors across the globe. Their study's findings are influenced by the contributors' perceptions, which are limited to each contributor's understanding and may differ from those of others. Also, Sun et al. (2020) explored the interconnections between countries and spatial structures in the prevalence of the pandemic by using a dataset of confirmed cases as of June 28th. They found that spatial models can partially explain the geographic disparities in COVID-19 period prevalence. Another study (Kuchler et al. 2020) relied on social media analysis to forecast the geographic spread of communicable diseases such as COVID-19. The data set appears to be limited to two countries, the USA and Italy; thus, it cannot be concluded whether the results can be extrapolated to other countries, specifically countries with little exposure to social media (Kuchler et al. 2020).

Therefore, this study applies the text mining approach to evaluate the themes of the published COVID-19-transportation articles and to explore their geographical distribution. The study also tries to answer a question regarding the influence of COVID-19 on collaboration among authors, as well as the role played by the major developed countries in supporting COVID-19 research in developing countries. The practical importance of the methodology adopted for this research lies in the visualization of text networks, which gives a thorough analysis of what is covered in the literature. This study views the pandemic through a wide spectrum of approaches and perspectives from different authors and countries. The geographical extent of this research will expose the global scholarly gap in the transportation field amidst the pandemic.
It therefore helps track how transportation is affected in different parts of the world and clarifies what is currently known and what is yet to be studied globally as far as transportation is concerned.

The following sections present the methodology, which summarizes the approach used to explore the geographical distribution of publications on COVID-19 and transportation. The data description section then introduces the data sources used in the analysis, followed by a discussion of the results. The last section is the conclusion, which summarizes the manuscript and presents the limitations and other areas of exploration.

This study utilized a total of 488 transportation-COVID-19 publications retrieved from reliable and up-to-date publication depositories, namely, LitCovid (Chen, Allot, and Lu 2020), the Stephen B. Thacker Center for Disease Control and Prevention (CDC) library (CDC 2020), and Elsevier (Elsevier 2020). LitCovid (Chen, Allot, and Lu 2020) has articles that are updated daily and are further categorized by research topic and geographic location for improved access. The Stephen B. Thacker Center for Disease Control and Prevention (CDC) library (CDC 2020) and Elsevier (Elsevier 2020) hold more than 80,000 freely available articles related to COVID-19. The relevance of these depositories supports their credibility for use in the study. The textual data were published between 8th December 2019 and 8th December 2020; publications after the latter date were therefore not considered. COVID-19 and transportation-related manuscripts were explored in this study; a collective set of relevant keywords was assembled to extract transportation-related articles. Keywords such as traveling, travelers, and mobility, which are general transportation terms, were used to search for manuscripts that related COVID-19 to transportation.
Further, airways, airport, airplane, flight, aircraft, and air travel were used to search for the airways travel mode. Additionally, ship, cruise, and boat were the keywords used to search for publications relating COVID-19 to the waterways mode. Subway, bus, train, rideshare, taxi, and bike were used to search for surface transportation. The search was followed by stratification by the authors' country of affiliation and the respective countries of the case studies to narrow down the focus geographically.

This study applied the text network approach to explore the geographical distribution as well as the differences in the contents of the COVID-19-transportation-related literature per geographical area. Text network analysis makes it easy to extract and interpret textual data by considering the connections among the keywords; thus, it outperforms frequency-based approaches to textual data analysis. Creating a text network involves a few more steps than other text mining approaches. In general, text mining approaches involve text normalization and the creation of structured data from unstructured data. During text normalization, all stop words, signs, and symbols are removed, and capital letters are converted to lowercase for uniform analysis. The creation of structured data involves the identification of the individual keywords on which several analyses can be performed. The extra step added for the text network is the network creation using a two-word gap or five-word gap approach. In this study, the two-word gap is used. In this approach, predefined algorithms scan the sentence to determine consecutive keywords within a two-word window (Paranyushkin 2011). The first obtained pair is mapped on the network, and then the algorithm continues to search for the next pair. If a previously mapped pair is found again, the algorithm adds a weight/frequency of one to the existing pair of keywords (Figure 3).
[Figure 3 labels: Edge; Node/Cluster]

Otherwise, if a new pair is obtained, the algorithm assigns a new node and an edge whose frequency is one (Figure 3). It should be noted that the algorithm does not connect words from two different sentences (Paranyushkin 2011). A complete text network has several nodes of different sizes and edges of different lengths and thicknesses. The nodes, also known as clusters, are labeled with their respective keywords. The larger the node/cluster, the more frequent the keyword, while the thicker the edge, the higher the frequency of the connected keywords. The length of the edge represents the distance between the keywords in a sentence: the shorter the edge in the text network, the closer the keywords in the sentence. Moreover, keywords with a similar pattern tend to form a group of interconnected clusters called a community. Completion of the network allows the extraction of quantitative information for in-depth analysis and inferences (Yoon and Park 2004). In this study, the interpretation is based on the size of the clusters and edges, as well as the topology of the entire network. Further, since the focus is on the major themes studied so far, the degree centrality, which quantifies the extent of connections between nodes, was used to draw insights. It is computed by counting the edges that connect one node to the others (Hansen et al. 2020), as shown in Equation 1.

$$C_D(i) = \sum_{j \neq i} a_{ij} \qquad (1)$$

Whereby $a_{ij}$ takes a value of 1 if nodes i and j are connected, and 0 otherwise. The entire analysis was performed in R statistical software (R Core Team 2020), using the quanteda and igraph packages, respectively (Benoit et al. 2018; Csárdi 2020). Considering the complexity of the network, only the top 50 keywords were used to draw insights.

This section presents a thorough discussion of the text networks produced from the text mining of the

The text network in (Worldometer 2020).
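The network construction and degree centrality described in the methodology can be illustrated with a minimal sketch. It is written in plain Python rather than the R packages (quanteda, igraph) used in the study, and it is not the authors' implementation. Several details are assumptions made for illustration only: the stop-word list, the regular-expression tokenizer, the reading of the "two-word gap" as linking each keyword to the keyword that immediately follows it, the treatment of edges as undirected, and the three example titles, which are invented.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list for illustration; the study does not list its own.
STOP_WORDS = {"a", "an", "and", "the", "of", "on", "in", "to", "for"}

# Invented example titles, not taken from the study's dataset.
TITLES = [
    "Travel restrictions and the Diamond Princess cruise ship",
    "Cruise ship passengers and travel restrictions",
    "Travel history of cruise passengers",
]


def normalize(sentence):
    """Lowercase the text, drop signs and symbols, and remove stop words."""
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def build_network(sentences, window=2):
    """Count keyword pairs that fall within `window` consecutive keywords.

    Pairs are formed inside each sentence only, so words from two different
    sentences are never connected. A repeated pair adds one to its weight.
    """
    edges = Counter()
    for sentence in sentences:
        tokens = normalize(sentence)
        for i, first in enumerate(tokens):
            for second in tokens[i + 1 : i + window]:
                if first != second:
                    # Sort the pair so (a, b) and (b, a) share one undirected edge.
                    edges[tuple(sorted((first, second)))] += 1
    return edges


def degree_centrality(edges):
    """Degree of each node: the number of distinct nodes it is connected to."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {node: len(adjacent) for node, adjacent in neighbours.items()}


def top_keywords(degree, n=50):
    """Return the n keywords with the highest degree (ties broken alphabetically)."""
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))[:n]


edges = build_network(TITLES)
degree = degree_centrality(edges)
print(top_keywords(degree, n=5))
```

In this sketch the edge weight plays the role of edge thickness in the drawn network, and the degree plays the role of the centrality values reported for each country's network. Passing `window=5` would approximate the five-word gap variant under the same assumptions.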
The degree centrality measures for the most-published author countries also indicate that the USA (32) is leading, followed by the UK, China, Germany, and Japan, which reflects the USA's extensive collaboration with authors from other countries. Although Germany has a greater number of collaborative studies (Figure 4), it has fewer total studies than Japan (Figure 2).

The text network of the titles of the Chinese-based publications is presented in Figure 6. The network is dominated by two communities. The one to the far right of the network is formed by the keywords (health, public, transport, guideline, stations, passenger, protection). This shows that some researchers had a particular interest in public transportation connection to COVID-19 (Mizumoto et al. 2020 in the quest to understand the pandemic (Lai et al. 2020; Staff 2020; Zhong, Guo, and Chen 2020). Also, the link (China-province) shows that these researchers focused their search on the provinces to which the outbreak is traced, such as Wuhan and Macao (Lio et al. 2020). The heavy node travel has heavy links travel-high-risk and travel-history, signifying that some researchers tried to understand the pandemic by studying passengers' travel history and their prominent risks (Ceder and Jiang 2020; Shi et al. 2020). The degree centrality values also explain the linkages of these nodes: the keyword travel (21) is leading, followed by China (20), epidemic (20), travelers (17)

The text network presented in Figure 7 describes the keywords of the titles of articles whose authors are based in Japan. A prominent community with the highest degree centrality (cruise-ship, princess-diamond, and Japan) appears at the center of this network topology. The implication is that the authors focused more on the well-known Diamond Princess cruise ship incident that occurred in Japan (Anan et al. 2020; Sekizuka et al. 2020; Zhang et al. 2020).
Moreover, the bottom of the network presents a community formed by the keywords (acute, environmental, syndrome, mild, respiratory). The implication here is that some researchers tried to link the mild respiratory symptoms to environmental sanitation, and some tried to explore the possibility of limiting the spread by sanitizing the environment (Hirotsu et al. 2020). Nevertheless, the use of such keywords highlights that some studies in Japan had initially perceived the pandemic as a mild respiratory syndrome (Arashiro, Furukawa, and Nakamura 2020). This network's important keywords also have the highest degree centrality measures, showing their higher level of association with other keywords and their in-depth coverage. The keyword cruise-ship (38) is leading, followed by princess-diamond (37), japan (24), passengers (18), and respiratory (17).

Keyword             cruise-ship   princess-diamond   japan   passengers   respiratory
Degree centrality   38            37                 24      18           17

[Figure: Japan-based Article Titles' Network]

Given the extensive authorship collaboration across countries shown in Figure 4, the text network in Figure 9 describes the mixed-countries context. The text network reveals several heavy nodes (international, travel, China, and spread) that are well connected to the rest of the network. First, the heaviest node (international) has a dense connection to the keyword (travel-restrictions). The connection shows that researchers explored the effects of the travel restrictions that have been issued worldwide (Devi 2020). The node international is also linked to the nodes China, Wuhan, spread, and potential; it is therefore fair to say that a significant number of studies explored the causes of the potential spread of the outbreak in Wuhan, China, and internationally. Additionally, the network presents another heavy keyword, travel, which has a central connection to the keywords global, pandemic, and health, among others.
The deduction here is that many of these publications centered on safe traveling (Bonilla-Aldana et al. 2020; Errett, Sauer, and Rutkow 2020; Nakazawa, Ino, and Akabayashi 2020). Further, the themes related to the Diamond Princess cruise ship appear in this network, as shown by the node princess-diamond. The node is connected to Japan, where the Diamond Princess cruise ship incident occurred (Anan et al. 2020; Sekizuka et al. 2020; Zhang et al. 2020). The heaviest nodes of the network also have high degree centrality values, reflecting their in-depth coverage. In these articles, the keyword china (22) is leading, followed by the keywords international (22), travel (22), risk (20), and spread (18).

The text network in Figure 10 presents keywords for the publications whose authors are from countries other than those discussed above. First, the majority of these studies seek to explore COVID-19 infection among travelers returning from Wuhan, China (Hoehl et al. 2020; Ng et al. 2020; Tian et al. 2020). Second, some of these publications focused on the screening process and its efficiency at airports (Quilty et al. 2020). Further, lockdown and mobility also appear in the network, which suggests that a significant number of researchers focused on the impact of lockdown on the mobility of people (Beria and Lunkar 2020). The degree centrality measures of the heaviest nodes of the network also signify their in-depth coverage by researchers from other countries, with the keyword travel (31) leading, followed by disease (29), travelers (28), cases (23), and risk (23).

The literature on COVID-19 has been growing at a rather alarming rate since its outbreak in December circumstances in developing countries. The study found that the majority of these publications had a common interest, which is the risk of spreading the disease through human mobility.
Therefore, most researchers chose to explore the effectiveness of travel restrictions in the hope of limiting the spread. It is also commonly observed that most of these publications linked their findings to China, and particularly the city of Wuhan. The common conception here is that the pandemic originated in China. Consequently, some of the authors chose to link the travel history of each confirmed COVID-19 case to China, as well as to estimate the extent of spread through person-to-person contact with regard to travelers returning from China. Additionally, many authors explored the absolute risk of contagion and the restrictions or regulations on public transportation use, as this risk is highly dependent on the disease prevalence in the community at any specific time and on the phase of the outbreak.

Moreover, a geographical distribution of the research themes is observed. Authors from the USA focused on travelers with respect to their origins and destinations, and on protective mechanisms such as screening and social distancing. The Japanese-based authors, in contrast, largely explored the circumstances of COVID-19 on the Diamond Princess cruise ship. The Chinese-based studies centered their research on Wuhan, China, as well as on spread-prevention guidelines and public transportation. Furthermore, authors from the remaining countries (62) expressed great concern about the global spread of the pandemic, traveling to and from China, and the risks of air transportation. The geographical theme distribution is attributed to the nature of the impact a particular country has endured with respect to the pandemic.

While this study's results offer some insights into the geographical distribution of the themes of COVID-19-transportation research, some limitations could be addressed in further research.
Firstly, this study focused on exploring the publications through text mining and visualization of content sourced from some of the most renowned depositories. The results, however, cannot be extrapolated to content from other depositories. Secondly, this research is bounded within the transportation sector's reach, as it is undoubtedly the sector most linked to the pandemic after the health, financial, and socioeconomic fields. Nevertheless, little to none of the published literature covers in depth all modes of transportation as well as the adversity that they are enduring. Furthermore, regardless of the geographical theme distribution, less has been covered on sustainable transportation means such as bikeshare systems.

There is no conflict of interest.

The Effect of Uncontrolled Travelers and Social Distancing on the Spread of Novel Coronavirus Disease (COVID-19) in Colombia
Estimated Effectiveness of Traveller Screening to Prevent International Spread of 2019 Novel Coronavirus (2019-NCoV). medRxiv: the preprint server for health sciences
Social Network Analysis: Measuring, Mapping, and Modeling Collections of Connections
Environmental Cleaning Is Effective for the Eradication of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) in Contaminated Hospital Rooms: A Patient from the Diamond Princess Cruise Ship
Evidence of SARS-CoV-2 Infection in Returning Travelers from Wuhan, China
The Geographic Spread of COVID-19 Correlates with Structure of Social Networks as Measured by Facebook
Assessing Spread Risk of Wuhan Novel Coronavirus within and beyond China
Coronavirus Impacts on Post-Pandemic Planned Travel Behaviours
The Common Personal Behavior and Preventive Measures among 42 Uninfected Travelers from the Hubei Province, China during COVID-19 Outbreak: A Cross-Sectional Survey in Macao SAR, China
Africa Braces for Coronavirus, but Slowly - The New York Times 2020.
medRxiv
Estimating the Asymptomatic Proportion of 2019 Novel Coronavirus Onboard the Princess Cruises Ship
Impact of COVID-19 on Transportation in Lagos, Nigeria
Identification and Monitoring of International Travelers During the Initial Phase of an Outbreak of COVID-19 - California
Chronology of COVID-19 Cases on the Diamond Princess Cruise Ship and Ethical Considerations: A Report from Japan
Travel Risk Perception and Travel Behaviour during the COVID-19 Pandemic 2020: A Case Study of the DACH Region
SARS-CoV-2 Infection among Travelers Returning from Wuhan, China
Using Observational Data to Quantify Bias of Traveller-Derived COVID-19 Prevalence Estimates in Wuhan, China
Assessing the Impact of COVID-19 on Bike-Sharing Usage: The Case of Thessaloniki, Greece
Identifying the Pathways for Meaning Circulation Using Text Network Analysis
Effectiveness of Airport Screening at Detecting Travellers Infected with Novel Coronavirus (2019-NCoV)
R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing
Geographies of the COVID-19 Pandemic
Limiting Spread of COVID-19 from Cruise Ships: Lessons to Be Learnt from Japan
Can You Catch It? Lessons Learned and Modification of ED Triage Symptom- and Travel-Screening Strategy
Haplotype Networks of SARS-CoV-2 Infections in the Diamond Princess Cruise Ship Outbreak
Impact of the COVID-19 Pandemic on Travel Behavior in Istanbul: A Panel Data Analysis
Travel Restrictions and Sars-Cov-2 Transmission: An Effective Distance Approach to Estimate Impact
Contextualising Coronavirus Geographically
Pressure Grows on China for Independent Investigation into Pandemic's Origins
A Spatial Analysis of COVID-19 Period Prevalence in US Counties through
Presumptive Asymptomatic COVID-19 Carriers' Estimation and Expected Person-to-Person Spreading among Repatriated Passengers Returning from China
Early Evaluation of the Wuhan City Travel Restrictions in Response to the 2019 Novel Coronavirus Outbreak
COVID-19 and Public Transportation: Current Assessment, Prospects, and Research Needs
Travel Restrictions as a Disease Control Measure: Lessons from Yellow Fever
COVID-19 Cases Top 10 000 in Africa | WHO | Regional Office for Africa
Coronavirus Update (Live): 20,521,644 Cases and 745,918 Deaths from COVID-19 Virus Pandemic - Worldometer
Lessons and Suggestions to Travelers and Cruise Ships in the Fight against COVID-19
A Text-Mining-Based Patent Network: Analytical Tool for High-Technology Trend
Health Protection Guideline of Passenger Transport Stations and Transportation Facilities during COVID-19 Outbreak
Health Protection Guideline of Public Transport during the Novel Coronavirus Pneumonia (NCP) Outbreak
Estimation of the Reproductive Number of Novel Coronavirus (COVID-19) and the Probable Outbreak Size on the Diamond Princess Cruise Ship: A Data-Driven Analysis
Transmission of Respiratory Viruses When Using Public Ground Transport: A Rapid Review to Inform Public Health Recommendations during the COVID-19 Pandemic
Correlation between Travellers Departing from