News Release

Breaking through the dilemma of reinforcement learning by microwave photonics

A new photonic accelerator based on a nonlinear optoelectronic oscillator is proposed and demonstrated for speeding up reinforcement learning

Peer-Reviewed Publication

Chinese Society for Optical Engineering

Decision-making machine based on microwave photonics

Schematic diagram of the NOEO-based photonic accelerator. (a) Experimental setup of the NOEO. (b) Evolution of the temporal sequences generated by the NOEO as the net gain β increases. (c) MAB problem accelerated by the photonic accelerator. (d) TTT game simulated by the photonic accelerator.

Credit: University of Electronic Science and Technology of China

Today, high-performance decision-making machines are urgently needed to accelerate reinforcement learning, an indispensable branch of artificial intelligence (AI). Decision-making machines are generally implemented with integrated circuits. As Moore's law approaches its end, the operating speed and energy consumption of even advanced integrated circuits are increasingly unable to keep pace with the ever-growing scale of reinforcement learning. New solutions are therefore urgently needed.

Recently, researchers addressed these problems by designing a photonic accelerator based on a nonlinear optoelectronic oscillator (NOEO). Placed at the front end of a decision-making machine, the NOEO-based photonic accelerator helped solve reinforcement learning problems more efficiently and accurately. The accelerator was activated by adjusting the gain and the nonlinearity in the broadband NOEO cavity, which then generated four orthogonal, extremely complex chaotic temporal sequences. These sequences featured a 6-dB bandwidth of up to 18.18 GHz, a permutation entropy as high as 0.9983 and a periodicity as low as 0.081. These figures are far superior to those obtained with other methods such as oscillation circuits, field-programmable gate arrays and semiconductor lasers (SCLs). Benefiting from this high-performance noise entropy source, the exploration-exploitation dilemma in reinforcement learning could be resolved at high speed.

Two well-known reinforcement learning applications were explored with the proposed method: the multi-armed bandit (MAB) problem and the Tic Tac Toe (TTT) game. For the MAB problem, the broadband NOEO-based decision-making machine expanded the number of slot machines to 512. It achieved the fastest speed and the highest accuracy compared with existing decision-making machines based on SCLs, a random algorithm and a 0.02-Greedy algorithm. Notably, the proposed scheme reached a correct decision rate (CDR) of 100% after 80 cycles for the 64-armed MAB problem, whereas the 0.02-Greedy method never exceeded 98%. In addition, the number of cycles needed for successful exploration (SE), defined as reaching a CDR above 95%, was fitted as SE = 10.58N^1.023 for the proposed scheme, where N is the number of slot machines. The corresponding exponents for the other three methods were 1.032, 1.033 and 1.091. The NOEO-based photonic accelerator therefore had the lowest growth rate, making it well suited to large-scale MAB problems.
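
To give a feel for the permutation entropy figure quoted above, here is a minimal sketch of how the normalized Bandt-Pompe permutation entropy of a sampled sequence is commonly computed. It is not the authors' analysis code. The release does not state the embedding order or delay used in the study, so the defaults order=3 and delay=1 are assumptions, and the function name permutation_entropy is hypothetical. A value close to 1 means that all local ordinal patterns occur almost equally often, as they would in ideal noise.

```python
import math
from collections import Counter
from typing import Sequence


def permutation_entropy(x: Sequence[float], order: int = 3, delay: int = 1) -> float:
    """Normalized Bandt-Pompe permutation entropy in [0, 1] (illustrative sketch).

    Assumption: order=3 and delay=1 are generic defaults, not parameters
    reported in the release.
    """
    if order < 2:
        raise ValueError("order must be at least 2")
    n_windows = len(x) - (order - 1) * delay
    if n_windows <= 0:
        raise ValueError("sequence too short for the chosen order and delay")

    counts: Counter = Counter()
    for i in range(n_windows):
        # Take `order` samples spaced `delay` apart.
        window = [x[i + j * delay] for j in range(order)]
        # Ordinal pattern: the indices that would sort the window
        # (Python's sort is stable, so ties are broken by position).
        pattern = tuple(sorted(range(order), key=lambda k: window[k]))
        counts[pattern] += 1

    # Shannon entropy of the ordinal-pattern histogram.
    h = sum(-(c / n_windows) * math.log(c / n_windows) for c in counts.values())
    # Normalize by the maximum possible entropy, log(order!).
    return h / math.log(math.factorial(order))
```

A monotonic ramp such as list(range(1000)) gives exactly 0.0, because only one ordinal pattern ever occurs. A strictly alternating 0, 1, 0, 1 sequence gives log 2 / log 6 (about 0.387), because only two of the six order-3 patterns appear. Long independent uniform noise approaches 1.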

For the TTT game, the two players staged exciting offensive and defensive battles under the guidance of the broadband NOEO-based decision-making machine. When both players were assisted by the photonic accelerator, neither could win the game. When only one player used it, that player won with a probability of 100%.

The potential impact of this study extends beyond academia, because it addresses a critical industrial need for highly efficient decision-making machines. Conventional decision-making machines have typically been limited to MAB problems, which restricts their industrial applications. In this work, the proposed broadband NOEO-based decision-making machine also accelerated the TTT game, a typical board game, in addition to the MAB problem. The battle process in such a game is more closely connected to reinforcement learning, so this work represents a large step forward in the application of photonic accelerators in AI. Moreover, the broadband NOEO is multifunctional and has a loop-based architecture. The proposed photonic accelerator could therefore find applications in other AI fields, such as deep learning, reservoir computing and combinatorial optimization.

