key: cord-0499491-g56ydww8 authors: Bazzan, Ana L. C. title: Improving Urban Mobility: using artificial intelligence and new technologies to connect supply and demand date: 2022-03-18 journal: nan DOI: nan sha: e900b53e069b5b33edacc28456c5c93c1542ee55 doc_id: 499491 cord_uid: g56ydww8

As the demand for mobility in our society seems to increase, the various issues centered on urban mobility are among those that most worry city inhabitants on this planet. For instance, how to go from A to B in an efficient (but also less stressful) way? These questions and concerns have not changed even during the covid-19 pandemic; on the contrary, as things currently stand, people who avoid public transportation are only contributing to an increase in vehicular traffic. The area of intelligent transportation systems (ITS) investigates how to apply information and communication technologies to problems related to transportation. This may mean monitoring and managing the infrastructure (e.g., roads, traffic signals, etc.). However, currently, ITS is also targeting the management of demand. In this panorama, artificial intelligence plays an important role, especially given the advances in machine learning that translate into the use of computer vision, connected and autonomous vehicles, agent-based simulation, among others. In the present work, several works developed by our group are surveyed from a holistic perspective, i.e., they cover not only the supply side (as commonly found in ITS works), but also the demand side and, in a novel perspective, the integration of both.

Despite climate change issues, it seems that our society is becoming more and more mobile. Covid-19 has its share of responsibility, as many people are avoiding public transit.
arXiv:2204.03570v1 [cs.CY] 18 Mar 2022

While cycling may be an option in developed countries that already provide bike lanes and the regulation for a safe bike trip, this is not the case in developing countries, where the number of trips using private vehicles increased even more during the pandemic. For obvious reasons, car sharing is not an option under such circumstances. Even before the pandemic, there had been a growing demand for mobility. For example, according to INRIX, congestion costs each American nearly 100 hours, which amounts to US$1,400 per year 1 . In 2017, costs were of the order of 300 billion dollars, an increase of 10 billion compared to 2016. The direct and indirect impact of congestion in urban and interurban areas is immense and results in costs that can reach up to 1% of GNP. According to experts, these costs are of various types. Among them, two are prevailing: the opportunity cost and the monetary expenses imposed not only on the driver(s), but also on society. Examples of the latter are expenses related to the consumption of fuel, as well as health expenses caused by various kinds of pollutants. Besides, the negative impact is also felt in the country's economic growth, health, and quality of life. "Solutions" such as tolls, license plate rotation, etc., currently practiced in Brazil, are extremely unpopular. Citizens need to see the return on their sacrifice, whether monetary or not. Thus, there is a great demand for solutions that involve intelligence and information as a way to offer a counterpart to the population. From a practical point of view, the question of how to efficiently move from A to B is a topic that is on the agenda of most inhabitants of the cities of the planet. This can be seen in the number of applications that assist in choosing a route to drive or in planning the use of public transportation.
A way to mitigate traffic congestion is to make better use of the existing infrastructure. Fortunately, scientific and technological advances today allow us to be optimistic about this task. On the scientific side, recent advances in artificial intelligence (AI) research facilitate optimizing the use of the existing infrastructure. This ranges from smarter traffic control to services that not only indicate less congested routes to the users of the transportation system, but do so in order to balance the use of the road network as a whole. Such an agenda also involves: microscopic, agent-based simulation; computer vision (the basis not only for monitoring roads, but also for making autonomous vehicles a reality); deep neural networks; and other techniques that emerged from AI. On the technological side, AI has been associated with the use of solutions and products provided by the recent advances in the area of data networks, the Internet and, more specifically, the IoT (Internet of Things). The latter allows sensors to be embedded both in vehicles and in the road infrastructure, thus forming the basis for connected and autonomous vehicles. It is in this context that the agenda around smart cities emerges, where one focus is on smart urban mobility, as for instance the rational use of different means of transport, integrating them and adapting them to demand. In more general terms, several researchers point to a scenario where the Internet will be prevalent in vehicles and replace, at least in part, the Internet as we know it today. For example, in "Reinventing the Automobile", Mitchell and colleagues [1] claim that the so-called mobility internet will enable vehicles to have the same abilities that computers now have regarding the standard Internet: exchanging a huge amount of geo-referenced information in real time, which will allow integrating vehicles into the IoT. This will potentially influence the way one manages and optimizes travel on a road network.
However, many of these services have conflicting objectives when studied at the level of individual components of the system. For example, it is known that a naive diffusion of information (e.g., the same information for most of the drivers) can have negative consequences [2, 3]. Further, even intelligent traffic signal management has to be carefully designed, as the performance of each signal is strongly linked to that of adjacent ones.

1 https://inrix.com/press-releases/2019-traffic-scorecard-us/

Although the scenario imagined by Mitchell and colleagues is not yet deployed, it is already being employed in prototypes of various magnitudes in research labs, such as ours. In fact, for the last two decades, the author has proposed, developed, and applied AI techniques to various problems around the agenda of urban mobility. One expectation is to contribute to the development of public policies that lead to smarter cities. Quoting Martin Wachs, "Mobility and increased access to transportation are two of the most important global forces for the alleviation of poverty." [4]. As an example (others will be presented in Section 3), our laboratory has developed an open source traffic simulation infrastructure. Its advantage becomes evident when one considers that traffic is a complex and dynamic system. As such, the best way to investigate the emergence of patterns is by using simulation tools. Indeed, there is an urgency for acting on traffic systems due to: (i) climate change and the need to reduce carbon emissions, and (ii) the pandemic caused by the Sars-CoV-2 virus. If the adoption of measures for reducing carbon emissions was already on the agenda, the pandemic shed light on the need for new policies.
For instance, Anne Hidalgo (mayor of Paris) proposed an important change in the city by blocking private vehicular traffic in the Rue de Rivoli, allowing only bicycle traffic in three lanes, the fourth being reserved for buses and taxis. The mayor of New York is focusing on outdoor activities, temporarily excluding vehicular traffic from over 110 kilometers of streets, and allowing bicycles and outdoor tables for restaurants. These examples just shed light on the need to optimize existing resources in the city. Thus, the purpose of this text is to discuss the various works carried out by the author. Such works aim both at optimizing the supply (for example, with intelligent control of traffic signals) and at distributing the demand (with non-trivial dissemination of information and recommendations for travelers). The rest of this text focuses on AI's contribution and how it is likely to become even more prominent, especially considering large volumes of data collected from personal mobile devices and/or embedded sensors. The vision is that there will be a truly intelligent system, with individuals, traffic signals, and vehicles connected and working together. In this view, traffic signals receive information about the state of the traffic at network level, in order to assess how recurrent and non-recurrent events may influence the policies regarding signal timings. In addition, there will be an exchange of information among connected and autonomous vehicles, improving their distribution in the transport network. Finally, from the point of view of the system as a whole, the current research seeks a multiobjective reinforcement learning approach, aiming not only at reducing travel times, but also at reducing emissions, opportunity costs, and the effects of negative externalities. The problem of how to move from A to B efficiently seems increasingly complex and is among the main concerns of a typical urban citizen. How did we get to this scenario?
Putting it simply, whenever the demand exceeds the supply, congestion arises. For the sake of illustration, we can think of the following decision scheme: in the past, when there were few users in the road network, decisions were easily made and implemented, resulting in rare or no congestion. One can say that the (traffic) system had a low coupling. From the second half of the twentieth century, this situation changed with an increase on the demand side and, to a lesser extent, on the supply side. With such an increase in complexity, it is expected that users no longer know the entire road network. This way, the system's coupling has increased significantly, causing many decisions to affect the decisions of others. Hence, methods of traffic control have emerged. With time, though, their efficiency seems to be decreasing. This is the phase we are currently experiencing. New technologies aim at returning the system to the situation in which users have a higher level of information. This agenda is manifold and multidisciplinary in nature. Thus, this section briefly presents some fundamental concepts that underlie the works discussed in Section 3. More details on concepts about transport systems and traffic simulation can be found in [5, 6, 7, 8]. Sections 2.1 and 2.2 address issues directly related to demand: how to go from A to B, which involves urban mobility, intelligent transportation, and the problem of how to allocate trips to the existing infrastructure. Following that, sections 2.3 to 2.6 discuss simulation techniques (necessary to study the effects of the use of certain control measures), as well as concepts linked to AI and the use of new technologies. In turn, sections 2.7 and 2.8 discuss the state of knowledge in the areas of traffic signal control and guided navigation, respectively. Section 2.9 presents a summary of the open questions.
Transportation and traffic experts have long been working with computational tools for demand estimation, as well as for optimizing the use of the infrastructure. The ITS (Intelligent Transportation Systems) area has a multidisciplinary character and arose precisely, among other objectives, to encourage the use of new technologies. Among these, in the last years, AI has influenced the ITS area by increasing the performance of optimization and control processes. The goal of ITS is to develop correct, safe, scalable, persistent, and ubiquitous control systems. However, traffic control alone cannot solve the aforementioned problems. Control is just one facet of ITS; others involve providing up-to-date and reliable information to the various users of these systems, in a human-in-the-loop manner. As aforementioned, ITS involves the application of modern technologies from the information and communication technology area. It can be said that ITS comprises two major areas: ATMS (advanced traffic management systems) and ATIS (advanced traveler information systems). While the first refers to the infrastructure and engineering side (supply), the second is directly related to the user of the transportation system (demand). An ATMS aims to safely and efficiently manage the technologies linked to traffic monitoring, communication, and control devices. An ATIS aims at providing information to drivers and other users of the transportation system. This information is generated mostly by an ATMS, eventually handled by the engineering team, and then transmitted to users by diverse means: from radio broadcast to private services developed for users who subscribe to such a service or have dedicated devices embedded in their vehicles. These may include proper route guidance or, most commonly, just a panorama of the traffic level on the network. These technologies can potentially help mitigate the effects of the growing demand for mobility.
However, there is a well-known feedback phenomenon: the decisions of the travelers generate a demand, which translates into route choices, which in turn create a traffic pattern. Travelers may then review their decisions (or, directly, their choices of routes) and the cycle is repeated (see Figure 1). This is a complex process that requires that any measure be first tested in a simulation environment. For this, there are classic simulation methods for traffic engineering. However, aspects related to an ITS, such as ATIS, are not trivial to simulate, as they involve complex human behaviors. In a transportation network, travelers' trips are distributed among a set of origin-destination (OD) pairs. Each of these pairs is associated with several paths or routes that connect its respective origin and destination. The traffic assignment problem (TAP) consists of assigning trips to such routes in an optimal way, given restrictions of capacity and others. In general, each user knows the best route between two points, under the assumption that the links in the network are not congested. However, this is not realistic. In situations such as peak hours, the traffic pattern changes and routes that were not optimal before may become attractive alternatives. Commuters, who are familiar with the conditions of the network, tend to perform an individual optimization process based on their own experience. In a situation in which each traveler has found the route that has the shortest travel time, none has an incentive to change their choice. This state is called user (or Nash) equilibrium, as formulated by Wardrop [9]: no user can improve their performance by changing routes, which means that all used routes have equal costs (Wardrop's First Principle). This state is maintained if neither the demand nor the network changes.
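To make these notions concrete, the following sketch computes the user equilibrium and the system optimum on a hypothetical two-route network (a variant of Pigou's classic example, not taken from the works surveyed here): route A always takes 1.0 time unit, while route B takes a time equal to the fraction of drivers using it.

```python
# Hypothetical two-route network (Pigou's classic example; illustrative, not
# from the paper): route A has a fixed travel time of 1.0, while route B's
# travel time equals the fraction of drivers using it.
def cost_a(frac_b):
    return 1.0

def cost_b(frac_b):
    return frac_b

def user_equilibrium(step=0.001):
    """Shift drivers toward the cheaper route until no one wants to switch."""
    frac_b = 0.0
    while frac_b < 1.0 and cost_b(frac_b) < cost_a(frac_b):
        frac_b += step
    return frac_b

def avg_time(frac_b):
    return frac_b * cost_b(frac_b) + (1 - frac_b) * cost_a(frac_b)

ue = user_equilibrium()
# System optimum: the split that minimises the average travel time (grid search).
so = min((x / 1000 for x in range(1001)), key=avg_time)
print(f"user equilibrium: all drivers on B, average time {avg_time(ue):.2f}")
print(f"system optimum: {so:.2f} on B, average time {avg_time(so):.2f}")
```

At the equilibrium every driver experiences time 1.0, while the system optimum puts only half of the flow on B and lowers the average to 0.75: exactly the gap between the user equilibrium and the system optimum that the next paragraphs discuss.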
Although it appears to be an interesting result, two questions are relevant. First, variations in demand conditions, special events (sports, cultural), or weather can change traffic conditions. The equilibrium state can be dissolved, leading, for example, to highly congested arterials while other parts of the network are underused. It is in this circumstance that an ATIS can be used to improve the efficiency at network level, via direct or indirect recommendation of alternative routes. A second problem is that the user equilibrium (henceforth UE), as formulated by Wardrop's First Principle, is not necessarily efficient from the point of view of the system as a whole because, in general, it leads to a greater total travel time. This stems from the fact that a myopic or greedy route choice can cause an increase in travel time and a consequent degradation of the performance of the network as a whole. This way, an ATIS can be used so that the traffic manager can align the optimum of the system (the desired situation from the point of view of the system) with the UE (a not necessarily efficient situation). In a more recent view, this problem has been formulated as a congestion game, thus using game theory techniques [10].

Transport systems are complex systems, whose behavior is defined by interactions between various types of entities. As tests in the real world are costly, time-consuming, or may pose safety issues, simulation tools are widely used. In transportation, simulation models are usually classified according to the level of detail with which they represent the traffic flow. In general, macroscopic models are highly sensitive to the initial parameters, which can have an impact on the overall behavior of the system. In turn, microscopic models are generally complex and costly to develop.
In addition, they involve a much larger number of parameters and thus require more data. Yet, given the need to model individual behaviors, the trend is toward the use of microscopic models, especially in tasks that involve simulating the effects of traffic control measures and/or involve human behavior. At the microscopic modeling level, there has been a growing interest in the use of AI techniques. One of the most interesting paradigms is that of agent-based simulation (ABS). This is due to the fact that, in ABS, it is possible to model an individual driver's decision-making, increasing the possibilities of modeling heterogeneity and individuality (for example, for route recommendations). This section briefly introduces concepts about distributed AI and multiagent systems; for details see [11, 12]. Up to the 1970s, AI focused on problem-solving techniques involving a single entity (a robot, a vehicle, an expert, and so on). However, strictly speaking, none of these entities exists in isolation. Moreover, these individuals may have complex interrelationships and coupling. This means that, in real-world applications, it might be expensive to consider such complex systems in their entirety and/or use a centralized modeling approach. On the other hand, when decomposing a given problem, the interactions among the components must be carefully handled. This was the initial motivation that led to the emergence of a new subarea of AI, initially called distributed artificial intelligence (DAI). Later, the focus of DAI turned to multiagent systems, a view of DAI in which the components are not necessarily collaborative and do not necessarily have a common goal. A multiagent system is a system that consists of a number of agents that interact with each other.
One of the motivations for the development of multiagent systems is the possibility to solve problems that cannot be treated in a centralized way, due to issues such as limitation of computation and/or communication resources, performance, or privacy.

Machine learning techniques have found more and more applications in transportation. In particular, reinforcement learning is one of the most used techniques because it allows different classes of agents (for example, traffic signal controllers and vehicles) to adapt to the state of the traffic, by building a model that tells the agent what action to take in each observed state. In this way, the system designer does not need to provide the agent with models that require domain knowledge, or with training instances that are difficult to obtain. There are two major classes of reinforcement learning techniques: model-based and model-free. In the former, the agent has (or learns) models that explain how the environment behaves (possibly under different conditions), as well as which reward to expect when selecting a given action in a given state. In the latter, the agent learns directly from experience, without constructing such models. In both cases, agents aim at maximizing their expected rewards.

Traffic control techniques have existed for several decades and derive mainly from the areas of operations research and control. A classical approach is the synchronization of traffic signals, so that vehicles can cross an arterial in one direction with a constant speed and without stops (the so-called "green wave"). Well-known (generally commercial) software packages in this category are TRANSYT [13, 14], SCOOT [15, 16], SCATS [17], and TUC [18]. More recently, AI and multiagent systems techniques have been employed, especially in connection with reinforcement learning.
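As a minimal illustration of this reinforcement learning view, the sketch below trains a tabular Q-learning controller for a single intersection in a toy environment. Everything here is an assumption for illustration (arrival probabilities, discretized queue states, the two green-phase actions, and the queue-length penalty); none of it comes from the works surveyed.

```python
import random

# Toy single-intersection environment: stochastic arrivals on the NS and EW
# approaches; the approach holding green discharges up to DEPART vehicles
# per step. All parameters are illustrative assumptions.
random.seed(0)
ARRIVAL_P = {"ns": 0.7, "ew": 0.3}
DEPART, MAX_Q = 2, 5
ACTIONS = ["ns", "ew"]

def step(queues, action):
    for d in queues:                                  # arrivals
        queues[d] = min(MAX_Q, queues[d] + (random.random() < ARRIVAL_P[d]))
    queues[action] = max(0, queues[action] - DEPART)  # departures on green
    return -(queues["ns"] + queues["ew"])             # reward: minus total queue

# Tabular Q-learning over (queue_ns, queue_ew) states.
q_table, alpha, gamma, eps = {}, 0.1, 0.9, 0.1
queues, state = {"ns": 0, "ew": 0}, (0, 0)
for _ in range(20000):
    if random.random() < eps:                         # epsilon-greedy exploration
        action = random.choice(ACTIONS)
    else:
        action = max(ACTIONS, key=lambda a: q_table.get((state, a), 0.0))
    reward = step(queues, action)
    next_state = (queues["ns"], queues["ew"])
    best_next = max(q_table.get((next_state, a), 0.0) for a in ACTIONS)
    old = q_table.get((state, action), 0.0)
    q_table[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    state = next_state

print(f"learned values for {len(q_table)} state-action pairs")
```

The state here is the discretized queue pair and the action is which approach gets green, which is exactly the mapping (queues to actions) that the traffic signal control literature discussed next tends to use.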
In fact, the problem of traffic control can be addressed from the point of view of reinforcement learning. In these cases, reinforcement learning is used by traffic signals to learn a policy that maps states (usually queues at intersections) to actions. Due to the number of works that employ reinforcement learning for traffic control, the reader is referred to surveys [19, 20, 21, 22]. It is worth mentioning that few studies employ reinforcement learning simultaneously for traffic signal control and for route choice, as in our work (see Section 3.4). In fact, this integration, as obvious as it seems, has received little attention in the literature. In the work of [23], drivers and traffic signals learn simultaneously. Traffic signal controllers receive specific information about drivers' routes (for example, the destination) to calculate the expected waiting time. This, however, can be considered a strong assumption. In addition, the underlying model in [23] is not entirely microscopic. The work of [24] does not use reinforcement learning, but, rather, a strategy based on back-pressure to integrate traffic signals and route choice. Understanding how a driver selects routes is key in an ATIS. Some milestones in this area are [25, 26, 27, 28], among others. However, in these, the travelers' response to such systems is not considered. This is only possible when one employs a microscopic, agent-based simulation, as described in Section 2.3. As previously mentioned (Section 2.2), it is essential to consider both the global and individual costs. One way to do this is through tolling mechanisms devised to penalize some users, such as those using roads with higher traffic. In this sense, the authors in [29] use adaptive tolls to optimize the chosen routes, as the toll fees change.
The authors focus on the optimum of the system, which can be achieved by imposing costs on drivers. Similarly, [30] deals with the TAP from a centralized perspective to find an assignment by imposing tolls on some parts of the network. To deal with fluctuations in the flow of vehicles, so-called congestion tolls can improve the efficiency of the network and yield a better traffic distribution [31, 32]. However, tolls are unpopular and, in general, unfair. The work described in Section 3.3 aims at achieving similar results through the dissemination of information to users. There are not many works that consider AI in this context. Neural networks were used by [33] to predict route choices. However, the authors focus only on the impact of the messages and not on the impact on traffic distribution and travel time. Neural networks were also used in [34]; there, the network parameters are determined in a preliminary training, and the output of the neural network is the action to be selected by the agent: stay on or modify the route. The work of [35] uses an ant colony optimization algorithm. The difference is that, instead of using the pheromone to attract ants, it aims at repelling them. The approach proposed by [36] is also based on ant colony optimization, combined with traffic prediction. However, agents there also have centralized information. This section provided a brief dive into key topics, as well as an overview of related work. First, the formulation and modeling relate to multiagent reinforcement learning (MARL), that is, a setting where the presence of more than one agent makes the problem more complex [37, 38, 39, 40, 41]. The fact that there are several agents learning simultaneously in the same environment makes the problem inherently non-stationary, as several agents are making changes in the environment while others try to learn.
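The congestion-toll idea discussed earlier in this section can be illustrated with marginal-cost pricing on a hypothetical two-route network (Pigou's example again; an illustrative sketch, not the mechanism of the cited works): charging each driver on the congestible route a toll equal to the externality x * c'(x) moves the user equilibrium onto the system optimum.

```python
# Hypothetical Pigou network: route A costs a fixed 1.0; route B costs x,
# the fraction of drivers on it. A marginal-cost toll of x * c_B'(x) = x
# is added to B's perceived cost. Illustrative sketch, not from the paper.
def perceived_cost_b(x, tolled):
    latency = x
    toll = x if tolled else 0.0   # externality each extra driver imposes
    return latency + toll

def equilibrium_share(tolled, step=0.001):
    """Drivers join B while it is (perceived as) cheaper than A's cost of 1.0."""
    x = 0.0
    while x < 1.0 and perceived_cost_b(x, tolled) < 1.0:
        x += step
    return x

print(f"share on B without toll: {equilibrium_share(False):.2f}")  # user equilibrium
print(f"share on B with toll:    {equilibrium_share(True):.2f}")   # system optimum
```

Without the toll, self-interested drivers all pile onto B; with the toll, the equilibrium split is 0.5, the system-optimal assignment. This also makes the unpopularity of tolls tangible: half of the drivers are being priced off their preferred route.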
In addition, there are only a few works that consider co-learning between two classes of agents, such as traffic signals and drivers, an issue that is covered in Section 3.4. A second issue is that reinforcement learning requires the deployment of sensors of various types, something that can be expensive. In this sense, our work (Section 3.3.4) calls for using connected vehicles and C2X as a way of collecting information, besides the use of traditional sensors already employed in traffic engineering. Finally, an important issue is the fact that, in the reinforcement learning literature, there are only a few studies that deal with learning and decision-making based on multiple objectives. In fact, the vast majority of works dealing with multiagent systems in general, and with MARL in particular, consider that agents seek to formulate policies that relate to a single objective (for example, objective functions that consider only travel time). However, most real-world problems involve more than one objective to be considered. This problem is starting to receive more attention [42, 43]. However, in many cases, the solution found is to use scalarization over the multiple objectives, creating a single objective function, which may not find all Pareto-efficient solutions. These questions are discussed in Section 3.5, where we present our ongoing work. Given that in the real world both traffic signal controllers and road users learn simultaneously, this leads to co-learning, which is much more challenging. Section 3.4 then discusses works that address this scenario. Lastly, Section 3.5 reports work in progress, with emphasis on the topic of multiobjective reinforcement learning. The map of the contents presented in this section, whose color scheme is consistent throughout the text, is shown in the corresponding figure. One of the tools underlying these works is our simulator ITSUMO, which pre-dates SUMO [44].
Whereas SUMO is based on car-following, which may be computationally demanding, the development of ITSUMO was centered on keeping computational costs low. Therefore, ITSUMO is based on a combination of an agent-based model with the Nagel and Schreckenberg cellular automaton model [45]. While the latter is computationally efficient because it is discrete in time and in space, the agent paradigm allows great flexibility for modeling the behavior of the various entities of the system, from the traffic signal controller to the drivers, including their decision strategies. Further, while one of SUMO's strengths is demand modeling, ITSUMO has focused on traffic signal control. ITSUMO was the basis that allowed the development of some of the works mentioned ahead, in which several innovative methods related to AI and reinforcement learning have been proposed, tested, and applied to problems related to traffic control. Details can be found in [46, 47].

The classic approaches described in Section 2.7 have some disadvantages. In an attempt to address them, the following describes the main approaches proposed and developed by the author. In [48], the author proposed the first approach to the formation of green waves in which each traffic signal is modeled as an agent that learns. Each has pre-defined plans for synchronization / coordination with adjacent agents in different directions according to the traffic situation. This approach employed evolutionary game theory techniques, having the following benefits: agents can create synchronization subgroups to form green waves and prioritize the flow in a given direction; there is no need for central control; and there is neither communication nor direct negotiation between agents. The approach was tested in an arterial, obtaining better performance than a classic centralized approach.
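The Nagel and Schreckenberg cellular automaton that ITSUMO builds on, mentioned above, can be sketched in its standard single-lane circular form. The parameters below are the usual textbook ones (cell-based positions, maximum speed 5, dawdling probability 0.3), not values taken from ITSUMO.

```python
import random

# Standard Nagel-Schreckenberg cellular automaton on a circular single-lane
# road: each car accelerates, brakes to the gap ahead, randomly dawdles,
# then moves. Parameters are the usual textbook ones, not ITSUMO's.
def nasch_step(positions, velocities, road_len, v_max=5, p_slow=0.3):
    order = sorted(range(len(positions)), key=lambda i: positions[i])
    new_pos, new_vel = positions[:], velocities[:]
    for k, i in enumerate(order):
        ahead = order[(k + 1) % len(order)]
        gap = (positions[ahead] - positions[i] - 1) % road_len
        v = min(velocities[i] + 1, v_max)           # rule 1: accelerate
        v = min(v, gap)                             # rule 2: brake to avoid collision
        if v > 0 and random.random() < p_slow:      # rule 3: random dawdling
            v -= 1
        new_vel[i] = v
        new_pos[i] = (positions[i] + v) % road_len  # rule 4: move
    return new_pos, new_vel

random.seed(1)
road_len, n_cars = 100, 20
pos = random.sample(range(road_len), n_cars)
vel = [0] * n_cars
for _ in range(200):
    pos, vel = nasch_step(pos, vel, road_len)
print(f"mean speed after 200 steps: {sum(vel) / n_cars:.2f} cells/step")
```

Because the update is parallel and each car moves at most as far as the gap in front of it, positions remain collision-free; this discreteness in time and space is what keeps the model so cheap compared to continuous car-following.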
In [49] an approach to the dynamic formation of green waves was presented, in an attempt to address some disadvantages both of classic approaches and of those that are fully decentralized. This new approach is based on swarm intelligence. Each traffic signal behaves like an insect whose decision-making process is inspired by the task specialization process. It also uses the pheromone metaphor, i.e., it assumes that vehicles leave traces where they travel. However, pheromone here is used to repel other vehicles, rather than to attract them. The approach was tested with an arterial route located in the city of Porto Alegre. It has been shown that the system tends to remain stable and adapted to the traffic, without losing its ability to adapt to changes in the environment. The approach described in [50], unlike those in [48] and [49], is based on explicit communication between agents, as well as on cooperative mediation. In this model, the problem of forming green waves was approached as a compromise between coordination with only implicit communication ([49]) and a classic centralized solution (Section 2.7). An algorithm for distributed constraint optimization was used, in which a mediator helps to assign values to decision variables. The scenario used for testing extrapolates an arterial route, being a grid and therefore having a high number of constraints. In the experiments, two coordination directions were considered: horizontal and vertical. The mediation-based approach computes more appropriate and flexible coordination groups, as compared to groups found by classical methods. When dealing with non-stationary environments, where vehicle flow is not constant, both model-free and model-based reinforcement learning approaches (see Section 2.5) have drawbacks.
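The inverted pheromone metaphor described above (pheromone that repels rather than attracts) can be sketched in a toy form. Everything here is an illustrative assumption, not the mechanism of [49]: two symmetric routes, pheromone deposited by each chosen trip, exponential evaporation, and a choice probability inversely related to the pheromone level.

```python
import random

# Toy inverted-pheromone route choice: vehicles mark the route they use, and
# the choice is biased *away* from heavily marked (i.e. congested) routes.
# Parameters and names are illustrative assumptions.
random.seed(2)
pheromone = {"route_a": 0.0, "route_b": 0.0}
EVAPORATION, DEPOSIT = 0.9, 1.0

def choose_route():
    # Repulsion: selection weight inversely related to pheromone level.
    weights = {r: 1.0 / (1.0 + pheromone[r]) for r in pheromone}
    pick = random.random() * sum(weights.values())
    for r, w in weights.items():
        pick -= w
        if pick <= 0:
            return r
    return r

counts = {"route_a": 0, "route_b": 0}
for _ in range(5000):
    r = choose_route()
    counts[r] += 1
    for k in pheromone:
        pheromone[k] *= EVAPORATION   # pheromone evaporates over time
    pheromone[r] += DEPOSIT           # the chosen route gets marked
print(counts)
```

With symmetric routes, the repulsion keeps the flow roughly balanced: whichever route was used more recently becomes less attractive, which is the stabilizing effect the text attributes to the approach.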
Specifically, when the environment changes, they both need to relearn everything from scratch, since the policy calculated for a given environment is no longer valid if there is a change in vehicle flow dynamics. This causes reinforcement learning algorithms to experience performance drops. Further, these algorithms potentially have to relearn policies even for situations that have already been experienced before. As discussed before, model-based approaches for reinforcement learning assume the existence of a fixed number of models that supposedly describe the behavior of an environment. Since this assumption is not always realistic, an alternative is the incremental construction of models. In the specific case of traffic signal control, such a method was proposed in [51], where a traffic signal controller is able to automatically handle different flow dynamics. An optimal policy is associated with each model, i.e., there is a mapping between the traffic conditions and the corresponding traffic signal plan to be chosen. This way, experiments have shown that performance drops can be avoided, as there is no need for constant relearning from scratch, thus opening the way for the use of model-based approaches in traffic signal optimization. The presence of many signal controllers presents a challenge for reinforcement learning methods. To deal with large-scale road networks that may have a high number of learning controller agents, some approaches have been proposed by the author. In [52], an approach was proposed that considers that these agents are organized in groups, each under the supervision of a manager agent. This manager thus controls a group containing several intersections, computing and recommending joint actions, as opposed to actions that are locally computed and selected by the agents at the intersections. These, in turn, try to balance the actions recommended by the manager with actions that are optimal from their myopic and local point of view.
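The model-reuse idea behind [51], described above, can be sketched as a library of (traffic pattern, policy) pairs: the controller matches the observed flow against stored patterns and reuses the closest policy instead of relearning from scratch. All names, signatures, and numbers below are illustrative assumptions, not the actual method of [51].

```python
# Illustrative sketch of policy reuse across flow regimes: stored flow
# signatures index previously learned signal plans; an unseen regime
# triggers learning a new model. Names and numbers are assumptions.
library = []  # list of (flow_signature, plan_name) pairs

def distance(sig_a, sig_b):
    return sum(abs(a - b) for a, b in zip(sig_a, sig_b))

def select_policy(observed_flow, threshold=0.2):
    """Reuse the closest stored plan, or signal that a new model is needed."""
    if library:
        d, plan = min((distance(observed_flow, sig), pol) for sig, pol in library)
        if d <= threshold:
            return plan
    return None  # no stored model fits: build a new one incrementally

# Two previously learned regimes: morning peak (heavy NS flow) and off-peak.
library.append(((0.8, 0.2), "favour_ns_plan"))
library.append(((0.4, 0.4), "balanced_plan"))

print(select_policy((0.75, 0.25)))  # close to the morning-peak signature
print(select_policy((0.1, 0.9)))    # unseen regime, no stored plan fits
```

The payoff is exactly what the text describes: when a previously experienced regime returns, the controller switches to the corresponding plan immediately instead of suffering a performance drop while relearning.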
In a similar direction, [53] proposes the use of a holonic multi-agent system to model a road network partitioned into regions (holons). The distinguishing feature of this method was the extension of the Q-learning method [54] to the region level. In [55], a hierarchical multi-agent system is proposed, which includes two levels of traffic signal control. At the first level, each signal is controlled by an agent. At the second level, the traffic network is divided into a number of regions, each controlled by a region agent that supervises the group. First-level agents employ reinforcement learning to find the best policy and then send their local information to the agent in charge of their region. Moreover, the local information is used to train a long short-term memory (LSTM) neural network for traffic status prediction. The agents at the upper level can then control the traffic signals by finding the best joint policy using the predicted traffic information. Experimental results show the effectiveness of the proposed method in a traffic network with 16 intersections. In all these works, a significant acceleration of the learning was achieved, which is to say that traffic signals adapt much more quickly to new flow conditions. In summary, several high-performance methods were developed for adaptive and decentralized traffic control. The problem of route choice has also received several approaches. In the most recent research, the focus is on: disseminating information in a smart way; understanding the effect of such information, as well as behavioral changes by the drivers; how drivers learn to choose routes; how to disseminate information in order to guarantee a certain level of system performance; and using vehicular communication.
To achieve these objectives, the author has proposed several methods, some of them pioneering in addressing the dissemination of information via mobile devices at a time when smartphones as we know them today did not yet exist. Other methods involved: game theory, C2X communication, route choice via reinforcement learning, and the effect of route recommendation on the alignment of the user and system optima. Section 2.8 discussed the benefits of using an ATIS. However, the dissemination of this type of technology brings with it the need to consider the human being when dealing with the loop between traffic control and allocation of resources. This question has received less attention, mainly due to computational issues. With the increase in the computational power of processors, the advent of agent-based modeling, and also of several multidisciplinary projects, several attempts to model the problem of route choice were made, including those carried out by the author. In particular, agent-based modeling tries to take into account the heterogeneity of such decisions; after all, each agent has its own particular way of making decisions. To do this, it was necessary to develop models that can represent the behavior of the drivers, such as mental state models based on the BDI (Beliefs, Desires, and Intentions) logic. Components of these models can be: desires related to minimizing travel time, and beliefs about the status and cost of each route or part of the road network used by the agent. An application can be found in [56]. Understanding human beings' reasoning about route choice is an open research question. There are no precise models for this. In order to investigate this process, a useful tool is experimental game theory.
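To make the BDI driver model mentioned above concrete, a minimal sketch follows. The belief-update rule, weights, and route names are illustrative assumptions, not the model used in [56].

```python
class BDIDriver:
    """Minimal BDI-flavored driver: beliefs are estimated costs per route,
    the desire is to minimize travel time, and the intention is the route
    the agent commits to for the next trip."""

    def __init__(self, routes):
        self.beliefs = {r: 0.0 for r in routes}  # believed cost of each route
        self.intention = None

    def update_belief(self, route, observed_time, weight=0.5):
        # blend the old belief with the newly observed travel time
        self.beliefs[route] = (1 - weight) * self.beliefs[route] + weight * observed_time

    def deliberate(self):
        # desire: minimize travel time -> intend the cheapest believed route
        self.intention = min(self.beliefs, key=self.beliefs.get)
        return self.intention

driver = BDIDriver(["A", "B"])
driver.update_belief("A", 10.0)
driver.update_belief("B", 6.0)
chosen = driver.deliberate()  # route with the lowest believed cost
```

Heterogeneity across drivers can then be obtained by varying the weights or the components of the belief and desire structures.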
While classical game theory provides several tools for modeling congestion games (Section 2.2), in experiments with human subjects it is possible to observe whether and how they deviate from the results of the classical theory. In an interdisciplinary project, experiments were designed in which human subjects iteratively chose between two routes. After each iteration, they received information about the corresponding outcomes. The main objective was to study the effect of the dissemination of different types of information through mobile devices. It should be noted that this idea preceded the actual use of these devices, which would only come to the market some years later. The data collected in these experiments were the basis for the formulation of heuristics for iterative route choice, published in [57, 58], where a simple form of reinforcement learning simulated the choices that were indeed made by the human subjects. This work had important implications because, at that time, the typical means of disseminating traffic information were radio broadcasts or variable message panels on the road. It needs to be stressed that there was no explicit recommendation of routes. More recently, other means of information dissemination have appeared, such as the Internet and geo-positioning, through services such as Waze and similar ones. These services, in possession of the locations of a significant number of their users, recommend a route to each user. One problem here is that the recommendation, when it is the same for all users, can lead to overuse of a route if too many users follow the recommendation. Simulations of this type of situation appear in [59, 60, 57]. These are known issues in game theory. In problems related to minority games, as in [60], it is known that in systems where each participant tries to optimize his performance in an individual, myopic, and greedy way, the overall performance is suboptimal.
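The flavor of these iterative two-route settings can be reproduced with a deterministic toy model: the fraction of drivers on one route drifts toward whichever route is currently cheaper. The linear cost functions and the adaptation rate are illustrative assumptions, not the heuristics of [57, 58].

```python
def iterate_route_split(rounds=100, alpha=0.2):
    """Toy day-to-day dynamics: x is the fraction of drivers on route 0;
    each route's cost grows with its usage, and each round some drivers
    switch toward the currently cheaper route."""
    x = 0.9  # start with most drivers concentrated on route 0
    for _ in range(rounds):
        cost0, cost1 = x, 1 - x  # congestion cost proportional to usage
        if cost0 > cost1:
            x -= alpha * (cost0 - cost1)  # drivers leave route 0
        elif cost1 > cost0:
            x += alpha * (cost1 - cost0)  # drivers leave route 1
    return x

final_split = iterate_route_split()  # converges toward the even 50/50 split
```

With symmetric costs, the adaptive process settles at the user equilibrium (an even split), which is also where no driver can unilaterally improve.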
In the case of transportation networks, assuming that vehicles try to avoid route A by opting for a route B, the latter will suffer a drastic performance loss, and there may be fluctuation and deterioration for all participants, as in [2]. Specifically, this question underlies the so-called Braess paradox, originally presented in [61], which represents a counter-intuitive phenomenon: in a road network, when a new route connecting two points is built, it is possible that the overall travel time increases rather than decreases. This happens because each driver's decision (based on their cost estimate) ignores the effects of the other drivers' decisions. That is, the drivers, when trying to reduce their travel times individually, end up increasing the global travel time. As shown in [60], the use of reinforcement learning makes drivers adapt and learn how to distribute themselves among the available routes, thus improving the performance of the system as a whole. In the works mentioned in the previous section, reinforcement learning was used in a simplified way, without considering factors such as change of route during the trip, agents' regret for inefficient choices, simulation granularity (whether microscopic or macroscopic), and the search for optimal system performance. These points were addressed by the author with methods such as [62] (use of learning automata); [63] (modeling route choice as a stochastic game, where the agent can change his route at each intersection); [64] (route choice considering past regret); and [65, 66, 67]. In the case of these last three works, the objective was to align the optimum of the system with the UE. This problem, as mentioned before, and in particular in the previous section, is linked to the Braess paradox, where adding resources to a system may degrade its performance.
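The paradox can be verified with the standard textbook instance: 4000 drivers travel between two points over two symmetric routes, each composed of a variable link (costing t/100 minutes when carrying t drivers) and a fixed 45-minute link. The computation below reproduces that well-known example, not any specific network from the works cited above.

```python
def equilibrium_times(n_drivers=4000):
    """Classic Braess instance: variable links cost t/100 minutes for t
    drivers; fixed links cost 45 minutes. Returns the equilibrium travel
    time before and after a zero-cost shortcut is added."""
    # Without the shortcut, the two routes are symmetric, so at the user
    # equilibrium the drivers split evenly between them.
    split = n_drivers / 2
    time_without = split / 100 + 45  # 20 + 45 = 65 minutes
    # With a zero-cost shortcut joining the two variable links, choosing a
    # variable link is never worse than a fixed link (t/100 <= 40 < 45 for
    # any t <= 4000), so every driver ends up using both variable links.
    time_with = n_drivers / 100 + n_drivers / 100  # 40 + 40 = 80 minutes
    return time_without, time_with

before, after = equilibrium_times()  # adding the link raises everyone's time
```

Every driver's selfish best response leads all 4000 onto the new route, raising the equilibrium travel time from 65 to 80 minutes.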
To mitigate this problem, [65] proposed the use of a distributed genetic algorithm. In the "islands" model, each vehicle can communicate with others and exchange solutions. In the case of [66], a genetic algorithm exchanges information with a reinforcement learning process that runs at the level of the vehicle agents. This synergy leads to more diverse solutions, which not only accelerate the convergence of the learning, but also lead to more efficient solutions at the global system level. Aligning the optimum of the system with the UE is achieved in [67] by means of drivers sharing real-time information through an app used by the members of a social network. These works have shown that it is possible to tackle route choice from a decentralized point of view by means of reinforcement learning. Our experience with this approach has shown that the performance of a reinforcement learning method varies greatly according to the road network topology, the demand distribution, and other factors. Motivated by this, we have pursued the investigation of metrics that can characterize a traffic assignment problem according to the difficulty it poses to reinforcement learning. These metrics were, at first, based simply on the centrality of nodes and edges of the network [68, 69]. Later, we proposed measures that compute the coupling between shortest routes [70, 71]. Since none of these takes into account the actual traffic assignment, a further measure was devised, based on the entropy of the distribution of routes, thus taking into account the actual demand on each route [72]. A scheme that is becoming more popular is the congestion toll, i.e., charging for a resource according to its current use, as, for example, in Singapore, where panels on some main roads indicate the price to be paid for their use. In order to study the effects of this scheme, [73] proposed an approach that employs reinforcement learning in two classes of agents.
The first class is the infrastructure: road managers learn how much to charge each vehicle. More congested roads should impose a price that discourages their excessive use, so that traffic gets distributed over the network. On the other hand, driver agents learn to use routes that balance their monetary cost and travel time. In the proposed model, the road network is treated as a multiagent system, where autonomous vehicles (or vehicles plus their drivers) have their local perception expanded through C2X communication. Road managers perceive the flow of vehicles on the roads they manage and try to maximize this flow by setting tolls. Further, in this work, a fraction of the vehicle agents may act maliciously and try to reduce their travel times by disseminating false information to keep other drivers away from their routes. The experiments were carried out in a microscopic simulator, which replicates the behavior of drivers at various levels, especially the one related to route choice. It was shown that even a small fleet of malicious agents is capable of harming the other drivers. In short, this work anticipates scenarios that are likely to occur in the near future. Three other, more recent works by the author also address communication, either among vehicles or between vehicles and the infrastructure. In [74], connected vehicles can communicate in order to coordinate their route choices through an app. This app, having the information collected from the vehicles that subscribe to this service, recommends routes that lead the system close to its optimum, rather than to the UE. Both [75] and [76] aim to investigate how to speed up the agents' learning of route choices using C2X communication. In the former, vehicles inform the road infrastructure (link managers) how much time it took them to cross a link or a portion of the network.
Managers aggregate such information and distribute it to vehicles that are yet to decide which link to use to reach their destinations. In the latter work, vehicles inform their travel times to an app that shares the rewards (travel times) obtained when a particular route was used. In both works, it was shown that the learning task is accomplished more quickly through communication. Note that, contrary to previous works such as [2], these two most recent works are about sharing only local information. This way, such a scheme avoids the aforementioned phenomenon in which giving the same information to all agents can lead to a drop in performance. In summary, various methods were developed and tested that model agents individually in order to investigate the effects of disseminating information for route choice, as well as aligning user and global optima; this is done through different schemes, including congestion charges and synergies between machine learning techniques. The previous sections dealt with intelligent control of traffic signals and with route choice, respectively. However, in the real world, these two tasks not only occur simultaneously but are also highly coupled. Clearly, the learning task carried out by traffic signal controllers affects the learning task of drivers and vice versa. Therefore, it is important to consider the simultaneous adaptation of the two classes of agents, i.e., co-learning. In [77], the theoretical basis for using reinforcement learning in both classes of agents was proposed. However, the underlying simulation model used there was not fully microscopic. Thus, in [78, 79], the original proposal was extended to include a microscopic simulation environment, which led to some challenges. First, having two classes of agents, representing the supply and the demand, adapting simultaneously makes the problem computationally more complex, since several convergence guarantees are lost.
Second, the learning tasks become more complex because the actions of the agents are highly coupled. A further challenge is the fact that the natures of these learning tasks are different; for instance, a driver's goal is to minimize his individual travel time, while the goal of a traffic signal controller is to reduce queues locally. Therefore, in [78, 79], an approach was proposed where the drivers' learning task is based on repeated games, while, for the traffic signals, the learning task is based on stochastic games. While the former is based on episodes that are not synchronous, traffic signals have an infinite learning horizon, i.e., there are no proper episodes. Finally, another challenge regards the microscopic simulation itself. The experiments were carried out using a microscopic simulator and a grid network with 32 traffic signals. It was shown that co-learning reduced travel times and the queues of vehicles at the intersections. In summary, a non-trivial mechanism was developed that integrates reinforcement learning performed by two classes of agents with distinct characteristics and objectives. As mentioned in Section 2.9, in the real world it is rarely the case that one deals with only one objective to be optimized (such as travel time or queue length at an intersection). Typically, an optimization process involves several variables. Therefore, there is a clear gap in the literature. The work in [80] uses a traffic signal control (single intersection) scenario to illustrate the use of a multiobjective approach, where the objectives are to maximize the traffic flow and minimize waiting times. However, the paper does not give all the details, since this scenario is used, together with others, merely as an illustration. In short, it seems that there are only a few works capable of dealing with multiobjective reinforcement learning in urban mobility scenarios.
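The co-learning setting discussed earlier in this section, i.e., two classes of agents whose adaptations change each other's environment, can be caricatured with a toy loop. All dynamics below (the green-split update, the load-dependent delays, and the drivers' switching rule) are illustrative assumptions, not the model of [78, 79].

```python
def co_learn(rounds=50):
    """Toy co-learning loop: a signal agent adapts its green-time split
    while driver agents adapt their route split; each side's action is
    part of the other side's (non-stationary) environment."""
    green_ns = 0.5  # fraction of green time given to the north-south route
    frac_ns = 0.8   # fraction of drivers using the north-south route
    for _ in range(rounds):
        # signal agent: shift green time toward the more loaded approach
        green_ns += 0.2 * (frac_ns - green_ns)
        # delay on a route grows with its load and shrinks with green time
        delay_ns = frac_ns / max(green_ns, 1e-9)
        delay_ew = (1 - frac_ns) / max(1 - green_ns, 1e-9)
        # driver agents: a few switch toward the route with the smaller delay
        if delay_ns > delay_ew:
            frac_ns -= 0.05
        elif delay_ew > delay_ns:
            frac_ns += 0.05
        frac_ns = min(max(frac_ns, 0.0), 1.0)
    return green_ns, frac_ns

green, frac = co_learn()  # the two quantities track each other
```

Even in this caricature, neither side faces a stationary problem: the signal's best split depends on the drivers' current distribution and vice versa, which is exactly what costs the usual convergence guarantees.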
In the case of vehicles, the existing methods only address drivers who seek to minimize their travel times, disregarding other factors such as tolls, emissions, or battery consumption. In the case of traffic signal controllers, it is clear that there are several measures that could be optimized simultaneously, even if some of them are correlated. This is where our ongoing work is currently focusing. Figure 3 illustrates this vision with a scheme in which most of the elements have the ability to communicate, either through C2X communication, through the cloud, or through another protocol. With regard to traffic signal controllers (red circle in the figure), in addition to managing traffic at the local level, they must: (i) communicate in order to interact cooperatively, and (ii) make decisions that involve multiple, sometimes conflicting objectives. The figure shows some of these objectives (central circle, in gray): minimize queues, reduce waiting times at the intersection, reduce travel times (not only locally but throughout the entire trip), give priority to emergency vehicles and public transportation, and allocate green times among the phases in a fair way. In the case of vehicles (blue circle in the figure), which are equipped with communication devices (all of them or only a fraction), it is necessary to formulate communication protocols that are effective and efficient. Finally, as already mentioned, there is a gap with respect to reinforcement learning when agents must make decisions based on multiple objectives. In the case of vehicles, in the real world, their drivers aim not only at minimizing their travel times (as commonly assumed), but also at other goals (green circle in the figure): minimizing the costs involved with tolls, increasing battery life (in the case of electric vehicles) by selecting routes that can regenerate battery capacity, etc.
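Choosing among routes under several such objectives can be framed as keeping only the Pareto-optimal options, i.e., those not dominated on all objectives at once. Below is a minimal sketch with hypothetical route names and (travel time, toll) cost vectors; it illustrates Pareto dominance in general, not the algorithms under development.

```python
def pareto_front(options):
    """Return the options not dominated in (travel_time, toll): an option
    is dominated if some other option is no worse on every objective and
    strictly better on at least one."""
    front = []
    for name, costs in options.items():
        dominated = any(
            all(o <= c for o, c in zip(other, costs))
            and any(o < c for o, c in zip(other, costs))
            for other_name, other in options.items()
            if other_name != name
        )
        if not dominated:
            front.append(name)
    return sorted(front)

# hypothetical routes with (travel time in minutes, toll in currency units)
routes = {
    "highway":  (10.0, 5.0),  # fast but tolled
    "arterial": (15.0, 0.0),  # slower, toll-free
    "detour":   (20.0, 2.0),  # slower AND costlier than arterial: dominated
}
front = pareto_front(routes)
```

No linear weighting of the two objectives is needed here, which matters because a fixed scalarization can miss solutions on non-convex parts of the Pareto front.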
Further, it is necessary to consider agents that are heterogeneous with respect to their goals and preferences, which is an advantage of agent-based modeling. In work in progress, algorithms for multiobjective reinforcement learning are being proposed that address part of the aforementioned agenda. Such algorithms are not trivial, since they do not simply formulate the various objectives as a (linear or non-linear) combination. This is important in order to guarantee that all solutions on the Pareto front will be found. Specifically, classical algorithms for single-agent, single-objective reinforcement learning, such as Q-learning [54] and UCB [81], are being extended and tested in route choice scenarios where the objectives are to minimize not only travel time but also toll expenditure. Preliminary results show a good performance of the new algorithms, as they achieve the same result as centralized algorithms such as [82], while being faster. In summary, reinforcement learning algorithms are being extended to multi-objective scenarios, especially for route choice. The agenda around improving urban mobility is a priority of municipal authorities. In this paper, some among the various aspects of this agenda were discussed. It has been shown that it is possible to improve the overall efficiency of the transportation system through artificial intelligence methods and technologies. This refers both to traffic control measures and to measures related to the dissemination of information to drivers, as well as to connected and autonomous vehicles. Specifically, this work has contributed and continues to contribute to the development of the new techniques described here. The highlights are: 1. New techniques have been developed involving MARL.
This allows agents to adapt in a decentralized way, that is, without central control. Recall that centralization can be costly both computationally and in terms of infrastructure, since it may require cabling and/or communication with a central entity. The fact that learning involves dozens or even thousands of agents (if traffic signals or drivers are involved, respectively) makes the problem more complex due to the non-stationary nature of the learning tasks. To mitigate these problems, the author developed hierarchical methods and synergies with other methods, which led to a significant increase in the speed of learning, something that is a critical factor in dynamic environments, where agents must adapt as quickly as possible. 2. Methods have been developed and tested that simulate and anticipate the effects of policies for the dissemination of information to travelers. These methods showed the feasibility of obtaining better performance in terms of the system as a whole. 3. Methods based on reinforcement learning were able to reduce queues (in the case of traffic signal controllers) and travel times (in the case of vehicles).

References

[1] Reinventing the Automobile
[2] Decision dynamics in a traffic scenario
[3] The impact of real time information in a two route scenario using agent based simulation
[4] Transportation policy, poverty, and sustainability: History and future
[5] Sistemas inteligentes de transporte e tráfego: uma abordagem de tecnologia da informação
[6] Introduction to Intelligent Systems in Traffic and Transportation
[7] A review on agent-based technology for traffic and transportation
[8] Agent-based modeling and simulation
[9] Some theoretical aspects of road traffic research
[10] How bad is selfish routing?
[11] An Introduction to MultiAgent Systems
[12] IA multiagente: Mais inteligência, mais desafios
[13] TRANSYT: A traffic network study tool
[14] TRANSYT-7F User's Manual. Transportation Research Center
[15] SCOOT - a traffic responsive method of coordinating signals. TRRL Lab
[16] Optimizing networks of traffic signals in real time - the SCOOT method
[17] The Sydney coordinate adaptive traffic system - principles, methodology, algorithms
[18] Extensions and new applications of the traffic signal control strategy TUC
[19] Opportunities for multiagent systems and multiagent reinforcement learning in traffic control
[20] An experimental review of reinforcement learning algorithms for adaptive traffic signal control
[21] Recent advances in reinforcement learning for traffic signal control: A survey of models and evaluation
[22] A survey on reinforcement learning models and algorithms for traffic signal control
[23] Multi-agent reinforcement learning for traffic light control
[24] Integrated signal control and route guidance based on back-pressure principles
[25] Toward the design of intelligent traveller information systems
[26] Dynamic network models and driver information systems
[27] The influence of route guidance advice on route choice in urban networks
[28] Comparative assessment of origin-based and en route real-time information under alternative user behavior rules
[29] Real-time adaptive tolling scheme for optimized social welfare in traffic networks
[30] A biased random-key genetic algorithm for road congestion minimization
[31] Departure time and route choice for the morning commute. Transportation Research
[32] The informational impacts of congestion tolls upon route traffic demands
[33] Intelligent Transport Systems: Neural Agent (Neugent) Models of Driver Behaviour
[34] A dynamic behavioural traffic assignment model with strategic agents
[35] An inverted ant colony optimization approach to traffic
[36] A decentralized approach for anticipatory vehicle routing using delegate multiagent systems
[37] Coordinated reinforcement learning
[38] Multiagent reinforcement learning: Theoretical framework and an algorithm
[39] Coordinated learning in multiagent MDPs with infinite state-space
[40] Multiagent learning is not the answer. It is the question
[41] Efficient multi-agent reinforcement learning through automated supervision (extended abstract)
[42] Multi-objective multi-agent decision making: a utility-based analysis and survey
[43] A practical guide to multi-objective reinforcement learning and planning
[44] Microscopic traffic simulation using SUMO
[45] A cellular automaton model for freeway traffic
[46] ITSUMO: an agent-based simulator for ITS applications
[47] ITSUMO: an intelligent transportation system for urban mobility
[48] A distributed approach for coordination of traffic signal agents
[49] A swarm-based approach for selection of signal plans in urban scenarios
[50] Using cooperative mediation to coordinate traffic lights: a case study
[51] Dealing with non-stationary environments using context detection
[52] Learning in groups of traffic signals
[53] Holonic multi-agent system for traffic signals control
[54] Q-learning
[55] Hierarchical traffic signal optimization using reinforcement learning and traffic prediction with long-short term memory
[56] Agents in traffic modelling - from reactive to social behavior. Extended version appeared in Proc. of the U.K. Special Interest Group on Multi-Agent Systems (UKMAS)
[57] Route decision behaviour in a commuting scenario
[58] Simulation studies on adaptive route decision and the influence of information on commuter scenarios
[59] Learning to coordinate in a network of social drivers: The role of information
[60] Case studies on the Braess paradox: simulating route recommendation and learning in abstract and microscopic models
[61] Über ein Paradoxon aus der Verkehrsplanung
[62] An improved learning automata approach for the route choice problem
[63] A multiagent reinforcement learning approach to en-route trip building
[64] Analysing the impact of travel information for minimising the regret of route choice
[65] Traffic optimization on islands
[66] Aligning individual and collective welfare in complex socio-technical systems by combining metaheuristics and reinforcement learning
[67] A multiagent solution to overcome selfish routing in transportation networks
[68] Identification of central points in road networks using betweenness centrality combined with traffic demand
[69] Analysis of traffic behavior in regular grid and real world networks
[70] Using topological statistics to bias and accelerate route choice: preliminary findings in synthetic and real-world road networks
[71] Extending a coupling metric for characterization of traffic networks: an application to the route choice problem
[72] How hard is it for agents to learn the user equilibrium? Characterizing traffic networks by means of entropy
[73] A multiagent based road pricing approach for urban traffic management
[74] Traffic optimization using a coordinated route updating mechanism
[75] Sharing diverse information gets driver agents to learn faster: an application in en route trip building
[76] Experience sharing in a traffic scenario
[77] To adapt or not to adapt - consequences of adapting driver and traffic light agents
[78] Combining adaptation at supply and demand levels in microscopic simulation: a multiagent learning approach
[79] Co-adaptive reinforcement learning in microscopic traffic systems
[80] Pareto-DQN: Approximating the Pareto front in complex multi-objective decision problems
[81] Finite-time analysis of the multiarmed bandit problem
[82] Solving multi-objective traffic assignment

Acknowledgments. Ana Bazzan is partially supported by CNPq (grant 307215/2017-2). This work was partially supported by CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil) - Finance Code 001.