Chaos-based reinforcement learning (CBRL) is a method in which the agent's
internal chaotic dynamics drives exploration. This approach offers a model for
considering how the biological brain can create variability in its behavior and
learn in an exploratory manner. It is also a learning model that can
automatically switch between exploration and exploitation modes and that has
the potential to realize higher-level exploration reflecting what it has
learned so far. However, learning algorithms for CBRL have not been well
established in previous studies and have yet to incorporate recent advances in
reinforcement learning. This study introduces Twin Delayed Deep Deterministic
Policy Gradients (TD3), a state-of-the-art deep reinforcement learning
algorithm for deterministic policies over continuous action spaces, into CBRL.
The validation results provide several insights. First,
TD3 works as a learning algorithm for CBRL in a simple goal-reaching task.
Second, CBRL agents with TD3 can autonomously suppress their exploratory
behavior as learning progresses and resume exploration when the environment
changes. Finally, examining the effect of the agent's chaoticity on learning
shows that extremely strong chaos negatively impacts the flexible switching
between exploration and exploitation.
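The following minimal Python sketch illustrates what "internal chaotic dynamics drives exploration" can mean. It is not the network used in this study. It assumes a hypothetical random recurrent network x_{t+1} = tanh(g W x_t) with weights drawn from N(0, 1/n), where the gain g stands in for the agent's chaoticity. The helper names (make_network, step, run, exploratory_action) and the fixed, untrained readout are illustrative assumptions. With a small gain the internal activity dies out, leaving nothing to explore with. With a large gain the activity is sustained and irregular (in the large-network limit, gains above roughly 1 are associated with chaotic activity), and a linear readout of that activity perturbs an otherwise deterministic action.

```python
import math
import random

def make_network(n, seed=0):
    """Hypothetical recurrent weights with W_ij ~ N(0, 1/n)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(n)
    return [[rng.gauss(0.0, scale) for _ in range(n)] for _ in range(n)]

def step(W, x, g):
    """One update x <- tanh(g * W x); the gain g sets how chaotic the dynamics are."""
    return [math.tanh(g * sum(w * xj for w, xj in zip(row, x))) for row in W]

def run(W, g, steps, seed=1):
    """Iterate the network from a small random initial state; return the final state."""
    rng = random.Random(seed)
    x = [rng.uniform(-0.1, 0.1) for _ in range(len(W))]
    for _ in range(steps):
        x = step(W, x, g)
    return x

def exploratory_action(greedy_action, x, readout, max_action=1.0):
    """Add a linear readout of the internal state to a deterministic action, then clip.

    The perturbation comes from the network's own dynamics, not from injected noise.
    The readout weights are a hypothetical fixed vector here, not learned.
    """
    a = greedy_action + sum(r * xi for r, xi in zip(readout, x))
    return max(-max_action, min(max_action, a))
```

For example, run(make_network(50), 0.5, 300) returns a state that is essentially zero, so exploratory_action reduces to the greedy action. With g = 3.0 the state stays far from zero and keeps perturbing the action.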

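Chaos-based reinforcement learning (CBRL) is a method in which exploration is driven by internal chaotic dynamics. This study introduces one of the latest deep reinforcement learning algorithms, Twin Delayed Deep Deterministic Policy Gradients (TD3), into CBRL and validates it. TD3 works effectively as a learning algorithm in a simple goal-reaching task, and CBRL agents can autonomously suppress their exploratory behavior as learning progresses and resume exploration when the environment changes. The study also finds that strong chaoticity negatively affects flexible switching between exploration and exploitation.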
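The following minimal Python sketch shows the three ingredients that distinguish TD3 from DDPG, for a single transition with a scalar action: target policy smoothing (clipped Gaussian noise on the target action), clipped double Q-learning (the smaller of two target critics), and delayed actor updates. The target actor and critics are passed in as plain callables, an illustrative assumption standing in for neural networks. The default hyperparameters are the ones commonly used in the original TD3 paper, not settings reported for this study.

```python
import random

def td3_target(reward, next_state, done, target_actor, target_q1, target_q2,
               gamma=0.99, policy_noise=0.2, noise_clip=0.5, max_action=1.0,
               rng=random):
    """Critic target y for one transition with a scalar action (illustrative sketch)."""
    # Target policy smoothing: clipped Gaussian noise on the target action.
    noise = max(-noise_clip, min(noise_clip, rng.gauss(0.0, policy_noise)))
    next_action = max(-max_action, min(max_action, target_actor(next_state) + noise))
    # Clipped double Q-learning: take the smaller of the two target critics.
    q_min = min(target_q1(next_state, next_action), target_q2(next_state, next_action))
    # Terminal transitions do not bootstrap from the next state.
    return reward + gamma * (1.0 - float(done)) * q_min


def should_update_actor(critic_step, policy_delay=2):
    """Delayed policy updates: update the actor and the target networks only
    once every `policy_delay` critic updates."""
    return critic_step % policy_delay == 0
```

In a full implementation, both critics are regressed toward this same target y. The actor is updated by ascending the first critic's value of its own action, and only on steps where should_update_actor returns True.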

Chaos-based deep reinforcement learning with the TD3 algorithm

Chaos-based reinforcement learning with TD3

The advent of artificial intelligence technology has paved the way for a great
deal of research in the air combat sector. Academics and other researchers have
studied a prominent research direction: autonomous maneuver decision-making for
UAVs. Extensive research has produced a range of outcomes, but approaches that
incorporate Reinforcement Learning (RL) have proven more efficient. Many studies
and experiments have sought to make an agent reach its target optimally; the
most prominent techniques used include the Genetic Algorithm (GA), A*, RRT, and
various other optimization methods. Reinforcement learning, however, is best
known for its success. In the DARPA AlphaDogfight Trials, a reinforcement
learning agent prevailed against a veteran human F-16 pilot who had been
trained by Boeing. The winning model was developed by Heron Systems. This
accomplishment drew tremendous attention to reinforcement learning. In this
research, we aim to move a UAV with Dubins vehicle dynamics to a target in
two-dimensional space along an optimal path using Twin Delayed Deep
Deterministic Policy Gradients (TD3), with Hindsight Experience Replay (HER) as
the experience replay method. We tested the approach in simulation on two
different environments.
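The abstract does not give the UAV model's equations. As an assumption, the sketch below uses the standard planar Dubins vehicle: constant forward speed v, heading theta, and a bounded turn rate as the only control, integrated with a simple Euler step. The function name and the values of v, max_turn_rate, and dt are hypothetical. In an RL setup like the one described, the TD3 actor's continuous output could serve as the turn-rate command.

```python
import math

def dubins_step(x, y, theta, turn_rate, v=1.0, max_turn_rate=1.0, dt=0.1):
    """Advance a planar Dubins vehicle by one Euler step (hypothetical parameters).

    The vehicle always moves forward at constant speed v; the only control is
    the turn rate, clipped to [-max_turn_rate, max_turn_rate] (bounded curvature).
    """
    u = max(-max_turn_rate, min(max_turn_rate, turn_rate))
    x += v * math.cos(theta) * dt
    y += v * math.sin(theta) * dt
    # Integrate the heading and wrap it into [-pi, pi).
    theta = (theta + u * dt + math.pi) % (2 * math.pi) - math.pi
    return x, y, theta
```

Because the vehicle cannot stop or turn in place, an optimal path to the target must respect the minimum turning radius v / max_turn_rate, which is what makes the path-planning problem non-trivial.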

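This study proposes a method that uses a deep reinforcement learning technique (TD3) and experience replay (HER) to optimize the path along which a UAV with Dubins vehicle dynamics reaches a target in two-dimensional space. Simulation experiments were conducted in two different environments, and the method can be applied to areas such as autonomous UAV maneuver decision-making.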
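Hindsight Experience Replay (HER) lets a goal-conditioned agent learn from failed episodes by replaying each transition with goals it actually achieved later. The following minimal Python sketch illustrates the "future" relabeling strategy with k = 4 extra goals per transition, the variant reported to work best in the original HER paper. The abstract does not say which strategy this study used. The tuple layout, the sparse reward (0 on reaching the goal, -1 otherwise), and the 0.5 distance tolerance are all assumptions.

```python
import math
import random

def sparse_reward(achieved, goal, tol=0.5):
    """0 if the achieved 2-D position is within `tol` of the goal, else -1 (assumed)."""
    return 0.0 if math.dist(achieved, goal) <= tol else -1.0

def her_future_relabel(episode, k=4, tol=0.5, rng=random):
    """Return each original transition plus k relabeled copies.

    episode: list of (obs, action, next_obs, achieved, goal) tuples, where
    `achieved` is the position reached after the step (hypothetical layout).
    Output tuples are (obs, action, next_obs, goal, reward).
    """
    out = []
    n = len(episode)
    for t, (obs, action, next_obs, achieved, goal) in enumerate(episode):
        # Keep the original goal and its (usually failing) reward.
        out.append((obs, action, next_obs, goal, sparse_reward(achieved, goal, tol)))
        for _ in range(k):
            # "future" strategy: pick a position achieved at step t or later.
            new_goal = episode[rng.randint(t, n - 1)][3]
            out.append((obs, action, next_obs, new_goal,
                        sparse_reward(achieved, new_goal, tol)))
    return out
```

Relabeling turns episodes that never reached the original goal into transitions that also carry informative 0 rewards. This is what makes sparse-reward goal reaching tractable for an off-policy learner such as TD3.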