Multi-agent trajectory planning in safety-critical systems needs to ensure safety while scaling to many agents. Sampling and optimization methods often adapt slowly and scale poorly. Reinforcement learning can improve adaptability, but it often violates safety constraints and suffers from sample inefficiency. This work proposes IDDPG-MAF, which integrates Independent Deep Deterministic Policy Gradient (IDDPG) with a pre-trained Multi-head Action Filter Network (MAF-Net). We first cast the problem as a constrained mixed-integer nonlinear program and then reformulate it as a constrained decentralized Markov decision process for real-time adaptability and coordination. IDDPG enables scalable learning, while MAF-Net acts as a differentiable safety filter that masks unsafe actions and penalizes suboptimal behaviors. The IDDPG-MAF method is applied to a complex multi-aircraft trajectory planning task under dynamic thunderstorm cells. Experimental results show that IDDPG-MAF achieves over 99% safe separation (vs. 82% for the state-of-the-art baseline), 95.5% task success even under moderate uncertainty, and scales safely to 45 aircraft in a compact spatiotemporal window, effectively doubling the maximum capacity of current operations.
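The idea of filtering a policy's proposed action through a safety check can be illustrated with a small sketch. This is not MAF-Net: the abstract describes a learned, differentiable, multi-head network, whereas the code below is a hand-written, non-differentiable stand-in that only shows the masking step and a deviation penalty. The function names (`step`, `is_safe`, `filter_action`), the 2D kinematics, the discrete candidate set, and the `MIN_SEP` value are all assumptions made for illustration.

```python
import math

MIN_SEP = 5.0  # hypothetical minimum separation, arbitrary units


def step(pos, heading, speed=1.0):
    """Advance a 2D position one time step along `heading` (radians)."""
    return (pos[0] + speed * math.cos(heading),
            pos[1] + speed * math.sin(heading))


def is_safe(next_pos, obstacles, min_sep=MIN_SEP):
    """True if next_pos keeps at least min_sep from every obstacle point.

    Obstacles stand in for other aircraft or storm-cell centres.
    """
    return all(math.dist(next_pos, o) >= min_sep for o in obstacles)


def filter_action(pos, heading, proposed_turn, obstacles, candidates):
    """Mask unsafe turns and return (chosen_turn, penalty).

    If the proposed turn is safe it is returned with zero penalty.
    Otherwise the safe candidate closest to the proposal is returned,
    with the deviation as a penalty. Returns (None, inf) when every
    candidate is unsafe.
    """
    if is_safe(step(pos, heading + proposed_turn), obstacles):
        return proposed_turn, 0.0
    safe = [c for c in candidates
            if is_safe(step(pos, heading + c), obstacles)]
    if not safe:
        return None, math.inf
    best = min(safe, key=lambda c: abs(c - proposed_turn))
    return best, abs(best - proposed_turn)


if __name__ == "__main__":
    turns = [-math.pi / 2, -math.pi / 4, 0.0, math.pi / 4, math.pi / 2]
    # One obstacle almost straight ahead: small turns are masked out.
    print(filter_action((0.0, 0.0), 0.0, 0.1, [(5.5, 0.0)], turns))
```

In a learning loop, a penalty of this kind could be subtracted from the agent's reward so that the policy gradually stops proposing actions that the filter has to override; how IDDPG-MAF actually combines the two is described in the paper itself, not here.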

Pang, B., Zhang, M., Hu, X., Pham, D., Alam, S., Lulli, G. (2026). Constrained Multi-Agent Reinforcement Learning with MAF-Net for Safe Trajectory Planning. In AAMAS 2026 Conference Proceedings (pp. 1928-1937).

Constrained Multi-Agent Reinforcement Learning with MAF-Net for Safe Trajectory Planning

Lulli, G
2026

Type: paper
Keywords: Multi-agent system; planning under uncertainty; decentralized decision making; deep reinforcement learning; action masking
Language: English
Conference: 25th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2026), 25-29 May 2026
Year: 2026
Proceedings: AAMAS 2026 Conference Proceedings
Pages: 1928-1937
URL: https://ifaamas.org/Proceedings/aamas2026/forms/contents.htm
Access: open
Files in this item:
File: Pang et al-2026-AAMAS-VoR.pdf
Access: open access
Attachment type: Publisher’s Version (Version of Record, VoR)
License: Creative Commons
Size: 1.53 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/611324