key: cord-302648-16aq6ai4
authors: Iovanovici, Alexandru; Avramoni, Dacian; Prodan, Lucian
title: A dataset of urban traffic flow for 13 Romanian cities amid lockdown and after ease of COVID19 related restrictions
date: 2020-09-17
journal: Data Brief
DOI: 10.1016/j.dib.2020.106318
sha:
doc_id: 302648
cord_uid: 16aq6ai4

This dataset comprises street-level traces of traffic flow as reported by Here Maps™ for 13 cities of Romania from 15 May 2020 until 5 June 2020. This period starts two days before the lifting of the mobility restrictions imposed by the COVID19 nation-wide State of Emergency and ends four days after the second wave of relaxation, announced for 1 June 2020. Data were sampled at a 15-minute interval, consistent with the Here API update time. The data are annotated with relevant political decisions and religious events which might influence the traffic flow. Considering the relative scarcity of real-life traffic data, one can use this dataset for micro-simulation during the development and validation of Intelligent Transportation Solutions (ITS) algorithms; another use lies in the social and political sciences, when discussing the effectiveness and impact of statewide restrictions during the COVID19 pandemic.

Transportation
Traffic flow demand data
Table, Figure, CSV data files
How data were acquired: Software application (available in the dataset, as part of the article), developed in Python, using the Here API for gathering raw data regarding live traffic, and a set of custom-developed scripts for cleaning the data (detailed below) and plotting visual representations of the instantaneous traffic flow. Hand annotation was used for providing supplementary data and information about specific events regarding the national policy against COVID19, and also for the description of the cities.
The dataset covers the period from 15 May 2020 until 5 June 2020, with a sampling period of 15 minutes, using the standard Here Maps Traffic API.
There are 3 software scripts used: one is responsible for job automation and runs the grabbing script at a 15-minute interval; the grabbing script in turn launches the API requests for each of the cities and writes the XML files with raw data to disk. Later, the third script iterates over the XML files and extracts the road information and traffic flow data, discarding the geometrical properties of the road.

• There is a scarcity of available data regarding traffic flow and road use demand. Even if larger cities in highly developed nations have near real-time data from ITS, in other cases those data are practically impossible to gather with good quality and at decent cost. This dataset covers a broad range of demands and loads, from almost empty roads (during COVID19 restrictions) up to full traffic (after the second set of relaxation rules).
• This dataset is directly useful for practitioners in the field of ITS design, for assessing transportation capacity and developing algorithms and policies for congestion prediction and mitigation, and also for sociologists researching the impact of COVID19 restrictions and the reaction of the public to the restrictions and their gradual lifting.
• The main usage of the data, in the field of ITS, is to provide real-life data from a variety of Romanian cities (ranging from small to large in population, area and road network size), useful for training machine learning algorithms for congestion prediction and for simulating the impact of traffic incidents on the traffic flow. Practitioners in the field of social sciences can benefit from the data in the analysis of specific reactions of the population to COVID19 restrictions.
• Descriptive statistics could be used for simple analysis of the data and for detecting anomalies in the traffic flow, which in turn can be used for inferring hidden events such as an incident on a minor street which feeds a major artery.
• Machine learning methods and tools can be used for identifying signature features of traffic flow which predict congestion, with high spatial resolution.
• Qualitative analysis of the impact of COVID19 transportation restrictions can be made, with ramifications for both the economic and the epidemiological sectors.

In the field of Transportation there is a distinct subfield of Intelligent Transportation Systems (ITS), characterized by the use of methods and tools from computation, mathematics and control theory to maximize the usability of the existing infrastructure (transportation capacity and quality) or to support the decision to develop new infrastructure [3]. One of the current important topics in this field is congestion prediction [1, 11], and many of the approaches rely on machine learning to leverage the value of the past (historic data) in order to predict the future (when congestion will arise) [2]. Another subject of interest, directly connected to the problem of congestion, is traffic incident management [1, 4]. Many of the rules, policies and systems are designed for, and work well in, stable nominal conditions (when all the participants obey the traffic laws and everything works as intended). Analysis of the root cause of major gridlocks showed that the complex dynamics of road traffic allow minor incidents (e.g. a car not giving way when changing lanes) to become major sources of trouble, lasting dozens of minutes and spanning a radius of a few blocks (hundreds of meters) [10]. Both problems can be studied in a virtual environment using what is called traffic micro-simulation [5].
When fed high-quality data and a good description of the existing infrastructure, current software tools for micro-simulation are capable of mirroring actual traffic conditions over a time span ranging from dozens of minutes to hours [1]. The topology of the road infrastructure, the placement of road signs and the traffic signaling plans are core components of the simulation scenarios and can be obtained either from local authorities or from open data [2, 12], supplemented by some initial legwork (for collecting data regarding signaling plans). The missing component is the actual conditions on the road. These can be obtained from the existing infrastructure (car counting loops and equipment), which is costly to deploy and provides low spatial resolution, or by deploying human observers to make assessments, which is costly and provides low temporal resolution [1, 2]. Over the last decades, with the development of mobile applications targeted at assisting drivers on the road, a new set of sources has appeared in the form of traces from the mobile devices of drivers (or passengers), but most of them still do not provide means of accessing historical data [13]. Major players in the field provide current data inside their applications, and most of the time historic data are provided in an aggregate manner, which suffices for the average user but is not of good enough quality for practitioners in the field of ITS [7, 9]. We selected Here Maps™ [7] for gathering data because it provides data access via an API, allowing scripted automation, and because automated collection of the data is allowed by its Terms and Conditions. Data provided by the API always describe current conditions, but they may be inferred by the Here Maps engine when the actual number of traffic participants is low; this is expressed by the Confidence Level (see below) [7].
We have chosen a sampling frequency of 4 times per hour (once every 15 minutes) based on empirical observations regarding when the data change and on limitations of the software license we used. A sampling period shorter than 5 minutes is not useful because the Here Traffic API does not update the data that often. Each of the cities was defined through a rectangular bounding box with the geo-coordinates described in Table 1. The time span covered by this dataset ranges from 15 May 2020 until 5 June 2020, during the mobility restrictions imposed by the Romanian authorities for containing the COVID19 pandemic, and provides the opportunity for capturing a diverse and broad spectrum of scenarios in terms of traffic demand data. The cities for which we provide traffic data also represent a diverse set in terms of demographics, urban development and geographical placement in Romania. A detailed description is provided in Table 1.

su, ty. The naming of the fields follows the notation presented in Table 2.

2. Raw XML files as provided by the Here Maps API web service. Each file corresponds to a unique city and a specific moment in time. These are stored in the ./xml.zip archive and follow the naming structure _--

For a more in-depth and complete analysis, taking into account the context of the data (the transportation and traffic restrictions imposed at the national level because of the SARS-CoV-2/COVID19 pandemic), we present in Table 3 the most important events with an impact on the traffic flow. These data can be augmented by the user of the dataset with supplementary data (such as weather), based on their own avenue of investigation.

There is no need for any documents when travelling inside national borders.

The data collection is done by a dedicated Python script, adapted from the one available at [9]. Using the Here Maps Traffic Flow API, we query the web service for the data regarding each of the cities, defined by the bounding boxes presented in Table 1.
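The combination of per-city bounding boxes and a fixed 15-minute sampling period can be illustrated with a minimal sketch. It is not the authors' grab.py: the names `BoundingBox`, `CITIES`, `sample_times` and `planned_requests` are hypothetical, the coordinates are placeholders rather than the values of Table 1, and the "lat,lon;lat,lon" corner encoding is an assumption that should be checked against the API guide [7]. No network request is made; the sketch only enumerates what would be requested and when.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class BoundingBox:
    """Rectangular city area given by its top-left and bottom-right corners."""
    north: float
    west: float
    south: float
    east: float

    def as_param(self) -> str:
        # Assumed "lat,lon;lat,lon" corner encoding; verify against the API guide [7].
        return f"{self.north},{self.west};{self.south},{self.east}"


# Placeholder coordinates for illustration only; the real boxes are in Table 1.
CITIES = {"example-city": BoundingBox(45.80, 21.15, 45.70, 21.30)}


def sample_times(start, end, period_minutes=15):
    """Yield every sampling instant in [start, end) at the given period."""
    step = timedelta(minutes=period_minutes)
    current = start
    while current < end:
        yield current
        current += step


def planned_requests(start, end):
    """List (timestamp, city label, bbox parameter) for each planned request."""
    return [
        (moment, name, box.as_param())
        for moment in sample_times(start, end)
        for name, box in CITIES.items()
    ]


if __name__ == "__main__":
    day = planned_requests(datetime(2020, 5, 15), datetime(2020, 5, 16))
    print(len(day), "requests planned for one city over one day")
```

With a 15-minute period, one full day corresponds to 96 sampling instants per city, which is the granularity at which the files in the dataset were produced.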
For each of the queries, we get a list of items as an XML-formatted response. The structure of each item, as exemplified by the record from Figure 3, consists of a field which describes the static structural characteristics of the road, a list of shapes describing the road segments through the geographical coordinates of the start and stop points of each segment, together with its functional class, and a field with traffic flow related information. The Python script, with detailed comments, is available in the data repository under the name grab.py. The Here Maps API keys were removed and should be replaced by the user's keys.

The next level of automation is provided by the UNIX cron tool (but can also be implemented with Microsoft Windows Scheduled Tasks) and consists of a shell script which calls the grabbing script for each of the cities to be monitored. This file is available in the repository under the name command-line argument is the label of the city (city name) and the third argument is represented by the base folder path for the output (where the post-processed CSV is to be stored). For each of the XML files found in the basePath, the script extracts the metadata encoded in the file name (city, date and time) and iterates over the items, extracting the relevant traffic flow information. The structure of the data is described in the Data Description section. For defensive programming reasons, values are checked against None and default values are stored whenever the actual data are corrupted or missing (e.g. when the "DE" field, representing the street/road name, is missing, it is replaced by "N/A"). For each folder (set of records about a specific city), parse.py produces a concatenated .csv file with all the available records, one per line. These files, one for each of the cities, represent the core element of this dataset and are provided in the data repository either separately per city or as an archive with all the cities and all the records.
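The extraction step with its defensive defaults can be sketched as follows. This is an illustration under stated assumptions, not the authors' parse.py: the element names (`FI`, `TMC`, `CF`), the attribute list in `FLOW_FIELDS` and the inline `SAMPLE_XML` are assumed for the example and carry no namespace, whereas a real API response may differ; only the "DE" field and its "N/A" fallback are taken from the description above. The functions `extract_records` and `to_csv` are hypothetical helpers.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Illustrative sample only: element and attribute names are assumptions,
# and the second item deliberately lacks the "DE" and "SP" attributes.
SAMPLE_XML = """<RWS><RW><FIS>
  <FI><TMC DE="Example Street" LE="1.2"/>
      <CF CN="0.9" FF="50.0" JF="1.5" SP="42.0" SU="42.0" TY="TR"/></FI>
  <FI><TMC LE="0.4"/>
      <CF CN="0.7" FF="30.0" JF="0.0" SU="30.0"/></FI>
</FIS></RW></RWS>"""

# Assumed traffic flow attributes; the dataset's actual fields are in Table 2.
FLOW_FIELDS = ("CN", "FF", "JF", "SP", "SU", "TY")


def extract_records(xml_text, missing="N/A"):
    """Return one dict per flow item, storing a default for absent values."""
    root = ET.fromstring(xml_text)
    records = []
    for item in root.iter("FI"):
        tmc = item.find("TMC")
        flow = item.find("CF")
        # Explicit None checks: a missing element or attribute yields the default.
        record = {"DE": missing if tmc is None else tmc.get("DE", missing)}
        for field in FLOW_FIELDS:
            record[field] = missing if flow is None else flow.get(field, missing)
        records.append(record)
    return records


def to_csv(records):
    """Serialize the records as CSV text, one record per line after the header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=("DE",) + FLOW_FIELDS, lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv(extract_records(SAMPLE_XML)))
```

Road shapes are simply never read in this sketch, which mirrors the decision to drop geometry from the CSV output while keeping it in the raw XML.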
The data regarding the shapes of the roads are discarded in the CSV files but are available in the raw XML files, stored under the ./raw path in the dataset.

This work did not include any human subjects nor animal experiments.

Traffic flow prediction for road transportation networks with limited traffic data
Traffic flow prediction with big data: a deep learning approach
Traffic and emissions impact of congestion charging in the central Beijing urban area: A simulation analysis
SUMO-simulation of urban mobility: an overview
Ptv vissim 7 user manual
Traffic flow in 12 Romanian cities during and around lifting of COVID19 restrictions
Guide - Traffic API, Bounding-box
Visualizing Real-Time Traffic Patterns Using HERE Traffic Api
Traffic flow dynamics
Dataset on the road traffic noise measurements in the municipality of Thessaloniki
Multi-source dataset for urban computing in a Smart City
Extracting traffic safety knowledge from historical accident data

This work was supported by research grant GNaC2018 - ARUT, no. 1349/01.02.2019, financed by Politehnica University of Timisoara.

The authors declare that they have no known competing financial interests or personal relationships which have, or could be perceived to have, influenced the work reported in this article.