News Release

A reinforcement learning framework for guiding the agent to perform exploration based on clustering

Peer-Reviewed Publication

Higher Education Press

Fig. 1

Comparison between clustering-based bonus rewards using novelty alone (η = 1.0) and clustering-based bonus rewards combining novelty and quality (η = 0.5). Here, the collected states (blue dots) are grouped into 5 clusters; the agent receives a bonus of 1 in the orange area and no bonus elsewhere.

Credit: Xiao MA, Shen-Yi ZHAO, Zhao-Heng YIN, Wu-Jun LI

Exploration strategy design is a challenging problem in reinforcement learning (RL), especially when the environment has a large state space or sparse rewards. During exploration, the agent tries to discover unexplored (novel) areas or high-reward (quality) areas. However, most existing methods perform exploration using only the novelty of states.
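As a rough illustration of such a novelty-only strategy, the sketch below implements a classic count-based exploration bonus in Python. The state discretization, the bonus scale beta, and the 1/sqrt(N) form are illustrative assumptions for a generic count-based method, not the method proposed in this work.

```python
from collections import defaultdict

import numpy as np

class CountBasedBonus:
    """Novelty-only exploration bonus: the reward is inversely
    proportional to how often a (discretized) state was visited."""

    def __init__(self, beta=0.1, bin_size=0.5):
        self.beta = beta          # bonus scale (illustrative choice)
        self.bin_size = bin_size  # discretization granularity (illustrative)
        self.counts = defaultdict(int)

    def bonus(self, state):
        # Discretize the continuous state so visits can be counted.
        key = tuple(np.floor(np.asarray(state) / self.bin_size).astype(int))
        self.counts[key] += 1
        # Classic count-based bonus: beta / sqrt(N(s)).
        return self.beta / np.sqrt(self.counts[key])
```

Such a bonus drives the agent toward rarely visited states but, as the paragraph above notes, it ignores how rewarding those states actually are.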

To solve this problem, a research team led by Prof. Wu-Jun LI published new research on 15 Apr 2025 in Frontiers of Computer Science, co-published by Higher Education Press and Springer Nature.

The team proposed a novel reinforcement learning framework, clustered reinforcement learning (CRL), for efficient exploration in RL. The framework is evaluated on four continuous control tasks and six hard-exploration Atari-2600 games. Compared with existing methods, CRL effectively guides the agent to perform efficient exploration.

In the research, they analyze the limited effectiveness of existing exploration strategies, which use only the novelty of states to guide exploration. To exploit the novelty and the quality of states simultaneously, they adopt clustering to divide the collected states into several clusters, based on which a bonus reward reflecting both novelty and quality in the neighboring area (cluster) of the current state is given to the agent (see the sketch below). Furthermore, because the bonus rewards employed by existing exploration strategies capture only the novelty of states, the proposed method can be combined with these strategies to boost their performance. Experiments on four continuous control tasks and six hard-exploration Atari-2600 games show that the proposed method outperforms existing exploration strategies.
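To make the idea concrete, here is a minimal Python sketch of a clustering-based bonus along the lines described above. The use of k-means, the mixing weight eta between novelty and quality (eta = 1.0 reducing to a novelty-only bonus, as in Fig. 1), and the exact bonus form are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def clustered_bonus(states, rewards, query_state, n_clusters=5, eta=0.5):
    """Illustrative clustering-based exploration bonus.

    states      : (N, d) array of collected states
    rewards     : (N,) array of extrinsic rewards observed at those states
    query_state : (d,) current state for which the bonus is computed
    eta         : assumed weight between novelty and quality
    """
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(states)
    labels = km.labels_

    # Cluster-level statistics summarize the neighborhood of each state.
    counts = np.bincount(labels, minlength=n_clusters).astype(float)
    mean_reward = np.array([
        rewards[labels == c].mean() if counts[c] > 0 else 0.0
        for c in range(n_clusters)
    ])

    # Novelty: rarely visited clusters get a larger bonus.
    novelty = 1.0 / np.sqrt(counts)
    # Quality: normalized mean extrinsic reward inside each cluster.
    span = mean_reward.max() - mean_reward.min()
    quality = (mean_reward - mean_reward.min()) / (span + 1e-8)

    # Bonus for the cluster containing the current state.
    c = km.predict(np.asarray(query_state).reshape(1, -1))[0]
    return eta * novelty[c] + (1.0 - eta) * quality[c]
```

In practice, such a bonus would be recomputed periodically as new states are collected and added to the agent's extrinsic reward during training.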

