Wubing Chen - Authorea

Effective management of multi-intersection traffic signal control (MTSC) is vital for intelligent transportation systems. Multi-agent reinforcement learning (MARL) has shown promise for MTSC. However, existing MARL-based MTSC algorithms have focused primarily on capturing the spatial relationships among multi-intersection traffic signals and have overlooked the importance of the temporally stable traffic pattern. This pattern refers to the fixed positions of intersections and the relatively stable traffic flow between them over short periods in real-world MTSC scenarios, which indicates that the learned spatial relationships between traffic signals should co-evolve over time. To this end, we propose a novel algorithm called Coevolutionary Multi-Agent Reinforcement Learning (CoevoMARL). CoevoMARL employs a graph neural network to capture the complex spatial interaction network among traffic signals. Furthermore, we propose a relationship-driven progressive LSTM (RDP-LSTM) that dynamically evolves the learned spatial interaction network over time by leveraging insights from the temporally stable traffic pattern. To accelerate convergence, we also propose the mutual information reward optimization (MIRO) technique, which strengthens the correlation between policy learning and high-performance samples by using a mutual information-based intrinsic reward. Experimental results on both synthetic and real-world datasets demonstrate the superiority of CoevoMARL over existing MTSC algorithms, providing valuable insights into incorporating the temporally stable traffic pattern.
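
As a rough illustration of the idea that spatial relationships between signals should change gradually rather than be re-learned at every step, the following toy sketch evolves a relation matrix over time and then uses it for one relation-weighted aggregation step. It is not the CoevoMARL implementation: an exponential moving average stands in for the learned RDP-LSTM update, a weighted average stands in for the graph neural network, and all function names, the `keep` parameter, and the numbers are hypothetical.

```python
def normalize_rows(matrix):
    """Scale each row to sum to 1 (rows that sum to 0 become all zeros)."""
    out = []
    for row in matrix:
        total = sum(row)
        out.append([v / total if total > 0 else 0.0 for v in row])
    return out


def evolve_relations(prev, observed, keep=0.8):
    """Blend the previous relation matrix with a newly observed one.

    Assumption: a simple exponential moving average replaces the learned
    recurrent update described in the abstract.
    """
    return [[keep * p + (1.0 - keep) * o for p, o in zip(prev_row, obs_row)]
            for prev_row, obs_row in zip(prev, observed)]


def aggregate(features, relations):
    """One message-passing step: each intersection receives a
    relation-weighted average of the scalar features of all intersections."""
    weights = normalize_rows(relations)
    return [sum(w * f for w, f in zip(row, features)) for row in weights]


# Three intersections in a line, 0 - 1 - 2, with self-loops (hypothetical).
relations = [[1.0, 1.0, 0.0],
             [1.0, 1.0, 1.0],
             [0.0, 1.0, 1.0]]
queue_lengths = [4.0, 10.0, 1.0]  # hypothetical per-intersection observations

# A newly estimated relation matrix in which link 0-1 has become stronger.
observed = [[1.0, 3.0, 0.0],
            [3.0, 1.0, 1.0],
            [0.0, 1.0, 1.0]]

relations = evolve_relations(relations, observed)
context = aggregate(queue_lengths, relations)
```

Because `keep` is close to 1, the relation between intersections 0 and 1 moves only part of the way toward its newly observed value, which mirrors the assumption that traffic flow between fixed intersections is stable over short periods.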