Leveraging Multi-Agent Reinforcement Learning for safe, personalized escape routes

Date: November 07, 2023

An effective response is paramount in the event of unforeseen calamities such as natural disasters or attacks. EXTRACT seeks to help prepare effective responses to emergency situations by extracting actionable knowledge from extreme volumes of data. The project is developing a platform that generates personalized escape routes for the city of Venice, guiding inhabitants and visitors to safety. At the heart of this initiative lies an intricate interplay between a real-time Urban Digital Twin (UDT), a specialized simulator, and a Reinforcement Learning (RL) solution.

The UDT is a cutting-edge digital representation of the city, meticulously tracking and reflecting its real-time state. Its primary role is to provide a comprehensive and up-to-date digital snapshot of the city, helping authorities grasp the on-ground situation instantaneously.

A dedicated simulator complements the UDT. It is an essential tool that replicates various potential disaster scenarios and, more specifically, the different behavioural patterns of people in response to them. Its primary function is to generate data-rich scenarios simulating how different catastrophes could unfold, facilitating the training of the RL models. This vast amount of simulated data ensures that the RL system is prepared for a wide spectrum of possible events.
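To give a flavour of what "generating data-rich scenarios" might look like, the sketch below samples randomized disaster configurations for training. The scenario fields (hazard type, origin node, severity, crowd density) are illustrative assumptions, not the project's actual simulator schema.

```python
import random

def sample_scenario(rng):
    """Sample one illustrative disaster scenario for offline RL training.

    The field names and value ranges are assumptions for this sketch,
    not the EXTRACT simulator's real interface.
    """
    return {
        "hazard": rng.choice(["flood", "fire", "structural_collapse"]),
        "origin_node": rng.randrange(100),           # node where the event starts
        "severity": round(rng.uniform(0.1, 1.0), 2), # how intense the event is
        "crowd_density": round(rng.uniform(0.0, 1.0), 2),
    }

# A seeded generator gives reproducible training batches.
rng = random.Random(42)
scenarios = [sample_scenario(rng) for _ in range(1000)]
```

Sampling thousands of such configurations is what lets the RL system see a wide spectrum of events before any of them occur in reality.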

Our central focus in this use case is the Reinforcement Learning model. The core of the RL system is grounded in a mapped segment of Venice, represented through matrices indicating civilian presence and nodal points delineating potential movement paths, including crucial safety zones to which people can be directed. This RL system is trained with a straightforward yet vital objective: when a disaster strikes, it should offer optimal directions to individuals, prioritizing their well-being by maximizing its reward function. In essence, the RL model is conditioned to value human life above all: its directives always lean towards maximizing survival rates, valuing each life equally, without prejudice or discrimination.
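The environment described above can be sketched as a small grid world: a matrix of civilian counts, movements between adjacent cells, and a reward that pays off on reaching a safety zone. The class name, grid size, and reward values are assumptions for illustration, not the project's actual implementation.

```python
import numpy as np

class GridEvacEnv:
    """A mapped city segment: a matrix of civilian counts plus safety zones.

    This is a minimal sketch; the real system operates on Venice's actual
    street graph, not a toy grid.
    """

    def __init__(self, width=5, height=5, safety_zones=((0, 0), (4, 4))):
        self.width, self.height = width, height
        self.safety_zones = set(safety_zones)
        # Matrix indicating civilian presence (headcount per cell).
        self.civilians = np.zeros((height, width), dtype=int)

    def place(self, row, col, count=1):
        self.civilians[row, col] += count

    def step(self, row, col, move):
        """Move one civilian from (row, col); return new position and reward."""
        dr, dc = {"up": (-1, 0), "down": (1, 0),
                  "left": (0, -1), "right": (0, 1)}[move]
        nr = min(max(row + dr, 0), self.height - 1)
        nc = min(max(col + dc, 0), self.width - 1)
        self.civilians[row, col] -= 1
        self.civilians[nr, nc] += 1
        # The reward values each life equally: a large positive reward on
        # reaching any safety zone, a small per-step penalty otherwise.
        reward = 10.0 if (nr, nc) in self.safety_zones else -0.1
        return (nr, nc), reward
```

The step penalty nudges the policy towards short routes, while the uniform safety-zone reward encodes the "each life counts equally" principle directly in the objective.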

More concretely, our model consists of Multi-Agent Reinforcement Learning (MARL), an extension of standard reinforcement learning. Unlike traditional RL, which typically involves a single agent learning to navigate an environment, MARL encompasses a group of agents that simultaneously learn and make decisions. In the case of the PER use case, each individual in Venice functions as an autonomous agent with unique attributes (gender, age, physical condition, etc.) who strives to reach safety during a crisis. The MARL framework facilitates a complex interplay where agents must not only learn from their own experiences but also consider the actions and safety of others in a shared environment. This collective approach enables our system to dynamically adapt to the evolving conditions of an emergency, optimizing escape routes in real-time while considering the city’s constantly updated digital twin state.
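In code, the multi-agent setup amounts to a population of agents, each carrying personal attributes that shape how it can move, all stepped simultaneously in the shared environment. The sketch below uses a one-dimensional corridor of nodes and a hypothetical mobility threshold; every name and rule here is an illustrative assumption.

```python
from dataclasses import dataclass

@dataclass
class Agent:
    """One individual modelled as an autonomous agent (sketch)."""
    agent_id: int
    age: int
    mobility: float   # 0..1; lower for reduced physical condition (assumption)
    position: int     # node index on a toy 1-D movement graph

def greedy_policy(agent, safety_node=4):
    """Illustrative policy: step one node towards the safety zone."""
    if agent.position < safety_node:
        return +1
    if agent.position > safety_node:
        return -1
    return 0

def joint_step(agents, policy):
    """All agents act simultaneously in the shared environment.

    Agents below the mobility threshold fail to move this step (a crude
    stand-in for physical condition), so faster agents' learned behaviour
    must account for slower neighbours blocking or lagging behind.
    """
    for a in agents:
        if a.mobility >= 0.5:   # simplistic mobility gate (assumption)
            a.position += policy(a)
    return [a.position for a in agents]
```

In a full MARL system each agent would learn its own policy (or share one conditioned on its attributes); the key structural point is the simultaneous joint step over a shared state.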

An essential aspect of the RL model’s operation is its training regimen. Most of its foundational training is conducted offline, well in advance of any potential disaster. The simulator plays a pivotal role here, providing the RL system with a diverse array of disaster scenarios, enabling it to formulate diverse response strategies and evaluate the consequences of the actions taken. Yet, adaptability is key: when a disaster becomes a reality, the system fine-tunes its strategies, responding with precision to the unique characteristics of the emergency.
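The two-phase regimen can be illustrated with tabular Q-learning on a toy corridor: a long offline phase with broad exploration over simulated starting conditions, followed by a short online phase that mostly exploits what was learned while adapting to the concrete situation. All hyperparameters and the environment itself are assumptions for this sketch; the real system would use far richer function approximation.

```python
import random

N_STATES, ACTIONS = 6, (-1, +1)   # nodes 0..5; node 5 is the safety zone

def q_update(Q, s, a, r, s2, alpha=0.5, gamma=0.9):
    """Standard one-step Q-learning update."""
    best_next = max(Q[(s2, b)] for b in ACTIONS)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

def run_episode(Q, start, epsilon):
    """One episode with epsilon-greedy action selection."""
    s = start
    for _ in range(20):
        a = random.choice(ACTIONS) if random.random() < epsilon \
            else max(ACTIONS, key=lambda b: Q[(s, b)])
        s2 = min(max(s + a, 0), N_STATES - 1)
        r = 10.0 if s2 == N_STATES - 1 else -0.1
        q_update(Q, s, a, r, s2)
        s = s2
        if r > 0:   # reached safety
            break

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
random.seed(0)
# Offline phase: many simulated episodes, varied starts, broad exploration.
for _ in range(200):
    run_episode(Q, start=random.randrange(N_STATES - 1), epsilon=0.3)
# Online fine-tuning: few episodes from the live situation, mostly exploiting.
for _ in range(20):
    run_episode(Q, start=0, epsilon=0.05)
```

The structural point is the split: the expensive exploration happens offline on simulator output, so the online phase only needs small corrections.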

A continuous line of communication is maintained between the RL system, the UDT, and the simulator. The UDT’s real-time updates on the city’s status are vital for the RL system’s operations. By integrating real-time geo-location data from individuals’ mobile devices, the UDT provides the RL system with a dynamic picture of civilian movements. This constant flow of data ensures the RL system’s instructions remain aligned with the evolving ground situation, optimizing the safety and efficiency of the escape routes.
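The UDT-to-RL data flow described above might look like the following: geo-location pings from mobile devices refresh the twin's civilian-presence matrix, and the RL system reads a snapshot of that matrix as its observation at each decision step. The class, method names, and ping schema are all illustrative assumptions.

```python
import numpy as np

class DigitalTwin:
    """Sketch of the UDT's civilian-tracking layer (names are assumptions)."""

    def __init__(self, height=4, width=4):
        self.civilians = np.zeros((height, width), dtype=int)

    def ingest_pings(self, pings):
        """Rebuild the presence matrix from (device_id, row, col) pings."""
        self.civilians[:] = 0
        for _, row, col in pings:
            self.civilians[row, col] += 1

    def observation(self):
        """Snapshot handed to the RL system at each decision step."""
        return self.civilians.copy()

udt = DigitalTwin()
udt.ingest_pings([("a", 0, 0), ("b", 0, 0), ("c", 2, 3)])
obs = udt.observation()
```

Handing out a copy rather than the live matrix keeps each RL decision step consistent even as new pings arrive mid-computation.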