Article Highlight | 31-Oct-2025

A graph reinforcement learning framework for real-time distributed multi-robot task allocation

Shanghai Jiao Tong University Journal Center

The Structure and Performance of the STFRL Model — **image:**
**The overall architecture of the STFRL policy network and the asynchronous decision-making process.**
view more

Credit: Dian Zhang , Peng Dong, Pai Peng, Yubo Dong.

Shanghai Jiao Tong University Research Team Proposes Novel Graph Reinforcement Learning for Real-Time Multi-Robot Rescue Missions

Effective task allocation has become a critical challenge for multi-robot systems operating in dynamic environments like search and rescue. Traditional methods, often based on static data and centralized control, frequently fail in real-world scenarios where conditions change rapidly. The lack of real-time, decentralized decision-making capabilities severely limits their practical application.

This work's highlights:

A novel Graph Reinforcement Learning (GRL) framework, STFRL, is proposed for real-time, distributed Multi-Robot Task Allocation (MRTA).

The model features a unique Spatial-Temporal Fusing Encoder for dynamic feature extraction and an attention-based decoder for efficient decision-making.

STFRL demonstrates superior performance in path efficiency and inference speed, outperforming state-of-the-art methods, and shows excellent scalability to larger, unseen problems.

Content Summary
A research team from the School of Aeronautics and Astronautics at Shanghai Jiao Tong University has proposed a groundbreaking Graph Reinforcement Learning (GRL) framework to address the complex problem of real-time distributed Multi-Robot Task Allocation (MRTA). This new framework, named Spatial-Temporal Fusing Reinforcement Learning (STFRL), is specifically designed for dynamic scenarios such as unmanned search and rescue, where survivors are moving and robots must make fast, independent decisions.

The core of the STFRL model is an encoder-decoder architecture. The encoder employs a novel Temporal-Spatial Fusing Encoder (TSFE) that uses parallel attention blocks to independently capture spatial relationships and temporal trends of moving targets, followed by a lightweight convolutional network (FuseCNN) to fuse these features effectively. The decoder then uses a multi-head attention mechanism to generate probability distributions for task selection, enabling distributed decision-making for each robot. The entire policy network is trained end-to-end using the REINFORCE algorithm with a baseline, ensuring stable and efficient learning.

Experimental results demonstrate that STFRL achieves shorter total path lengths and significantly faster inference times compared to classical methods like Ant Colony Optimization (ACO) and other learning-based approaches. Notably, the model exhibits strong generalization, performing effectively on problem scales much larger than those it was trained on.

Research Insights

I. Problem Formulation and the STFRL Framework
The MRTA problem in search and rescue is formulated as a combinatorial optimization on a dynamic graph, where both robots and moving survivors are nodes. The objective is to minimize the total time to rescue all survivors. The STFRL framework (Fig. 1) processes observational data over a time window. The encoder extracts and fuses spatio-temporal features, which the decoder then uses, along with the robot's current context, to select the next task.

II. Model Architecture and Performance Analysis
The TSFE encoder is key to handling dynamic environments. It uses Spatial-Temporal Attention (STA) blocks to process information, followed by a FuseCNN to integrate temporal features into a compact representation (Fig. 2). This design allows the model to anticipate target movements.

Extensive tests were conducted comparing STFRL against baselines, including the commercial solver Gurobi, heuristic ACO, a Dynamic Greedy algorithm, and another attention-based RL method (AM-RL). As shown in Table 3 and Fig. 3, STFRL consistently achieves lower path costs. Crucially, as illustrated in Fig. 4, its inference time is orders of magnitude faster than ACO and comparable to other learning methods, making it suitable for real-time applications.

III. Generalizability and Ablation Studies
The model demonstrates remarkable scalability. As shown in Table 5, a model trained on only 20 nodes performed effectively on test scenarios with 50 and even 100 nodes, outperforming the simple greedy algorithm. This indicates that STFRL learns robust allocation strategies rather than just memorizing patterns.

Ablation studies on the time window parameter (TW) confirmed the importance of the FuseCNN module. Models without it performed worse, and a time window that is too long (TW=10) can slow convergence compared to an optimal one (TW=5), as seen in the training curves (Fig. 5).

IV. Application Potential in Real-World Scenarios
The proposed STFRL framework, with its superior path efficiency, real-time inference speed, and robust scalability, presents a significant advancement for deploying multi-robot systems in critical dynamic missions. Its ability to enable a fleet of robots to make coordinated, decentralized decisions in complex and unpredictable environments like search and rescue highlights its immense practical application potential.

Disclaimer: AAAS and EurekAlert! are not responsible for the accuracy of news releases posted to EurekAlert! by contributing institutions or for the use of any information through the EurekAlert system.