This paper compares two deep reinforcement learning approaches for cybersecurity
in software-defined networking: Neural Episodic Control to Deep Q-Network and
Double Deep Q-Networks. Both algorithms are implemented in a format similar to a
zero-sum game. A two-tailed t-test is performed on the results of the two games,
which record the number of turns the defender takes to win, and a second
comparison is made on the agents' scores in their respective games. The analysis
aims to determine which algorithm performs better in the game and whether the
difference between them is statistically significant, i.e. whether one should be
preferred over the other. No statistically significant difference was found
between the two approaches.
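
The two-tailed t-test on turns-to-win can be sketched in plain Python. This is a minimal sketch, not the paper's analysis code. It assumes Welch's unequal-variance form of the test, because the abstract does not say whether variances were pooled. It computes the two-tailed p-value by numerically integrating the Student's t density, which avoids a SciPy dependency. The turn counts and the names `nec2dqn_turns` and `ddqn_turns` are hypothetical placeholders, not data from the study.

```python
import math
from statistics import mean, variance


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t: float, df: float, steps: int = 2000) -> float:
    """P(|T| >= |t|), using Simpson's rule on [0, |t|] (steps must be even)."""
    b = abs(t)
    if b == 0.0:
        return 1.0
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = 2.0 * total * h / 3.0  # P(-|t| < T < |t|), by symmetry
    return min(1.0, max(0.0, 1.0 - central_mass))


def welch_t_test(x: list[float], y: list[float]) -> tuple[float, float, float]:
    """Return (t statistic, Welch-Satterthwaite df, two-tailed p-value)."""
    se2_x = variance(x) / len(x)  # squared standard error of mean(x)
    se2_y = variance(y) / len(y)  # squared standard error of mean(y)
    t = (mean(x) - mean(y)) / math.sqrt(se2_x + se2_y)
    df = (se2_x + se2_y) ** 2 / (
        se2_x ** 2 / (len(x) - 1) + se2_y ** 2 / (len(y) - 1)
    )
    return t, df, two_tailed_p(t, df)


# Hypothetical turns-to-win per game; NOT data from the paper.
nec2dqn_turns = [14, 17, 12, 19, 15, 16, 13, 18]
ddqn_turns = [15, 18, 14, 17, 16, 19, 13, 17]

t, df, p = welch_t_test(nec2dqn_turns, ddqn_turns)
print(f"t = {t:.3f}, df = {df:.1f}, p = {p:.3f}")
print("significant at 0.05" if p < 0.05 else "no significant difference at 0.05")
```

A p-value above the chosen significance level (0.05 here, also an assumption) corresponds to the "no significant difference" outcome reported above. The sketch requires each sample to have at least two distinct values, since a zero variance in both samples would make the standard error zero.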

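This paper compares the application of two deep reinforcement learning algorithms, Neural Episodic Control to Deep Q-Network and Double Deep Q-Networks, to network security in software-defined networking. The algorithms are implemented and compared in a format similar to a zero-sum game, and a two-tailed t-test is used to analyze both the game results and the agents' game scores. No statistically significant difference in performance was found between the two algorithms.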

Model-Free Deep Reinforcement Learning in Software-Defined Networks

Deep reinforcement learning methods attain super-human performance in a wide
range of environments. However, such methods are grossly inefficient, often
requiring orders of magnitude more data than humans to achieve reasonable performance.
We propose Neural Episodic Control: a deep reinforcement learning agent that is
able to rapidly assimilate new experiences and act upon them. Our agent uses a
semi-tabular representation of the value function: a buffer of past experience
containing slowly changing state representations and rapidly updated estimates
of the value function. We show across a wide range of environments that our
agent learns significantly faster than other state-of-the-art, general purpose
deep reinforcement learning agents.
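
The semi-tabular value function described above can be illustrated with a small key-value memory. The sketch below is hypothetical and in plain Python. It stores state embeddings as keys and value estimates as values. A query returns a kernel-weighted average over its k nearest keys, and an existing estimate moves toward a new target with a tabular-style step size. The class name `EpisodicValueMemory`, the inverse-squared-distance kernel, the parameters `k`, `delta`, `alpha` and `capacity`, and FIFO eviction are all assumptions chosen for illustration, since the abstract does not specify them. In a full agent, the keys would come from a slowly trained neural encoder instead of being supplied by hand.

```python
class EpisodicValueMemory:
    """Hypothetical semi-tabular value memory for a single action (illustrative sketch)."""

    def __init__(self, k=3, delta=1e-3, alpha=0.1, capacity=10_000):
        self.keys = []            # state embeddings (slowly changing representations)
        self.values = []          # value estimates (rapidly updated)
        self.k = k                # number of neighbours used per lookup
        self.delta = delta        # keeps the kernel finite when the distance is 0
        self.alpha = alpha        # step size for updating an existing entry
        self.capacity = capacity  # maximum number of stored entries

    def _kernel(self, a, b):
        # Inverse squared Euclidean distance: closer keys get larger weights.
        d2 = sum((x - y) ** 2 for x, y in zip(a, b))
        return 1.0 / (d2 + self.delta)

    def lookup(self, key):
        """Kernel-weighted average of the values of the k nearest stored keys."""
        if not self.keys:
            return 0.0  # assumption: an empty memory defaults to a value of 0
        nearest = sorted(
            ((self._kernel(key, k_i), v_i) for k_i, v_i in zip(self.keys, self.values)),
            reverse=True,
        )[: self.k]
        total_weight = sum(w for w, _ in nearest)
        return sum(w * v for w, v in nearest) / total_weight

    def write(self, key, target):
        """Move an existing entry toward `target`, or append a new entry."""
        key = list(key)
        for i, k_i in enumerate(self.keys):
            if k_i == key:
                self.values[i] += self.alpha * (target - self.values[i])
                return
        if len(self.keys) >= self.capacity:
            # Simple FIFO eviction (an assumption for this sketch).
            self.keys.pop(0)
            self.values.pop(0)
        self.keys.append(key)
        self.values.append(target)


memory = EpisodicValueMemory(k=2)
memory.write([0.0, 0.0], 1.0)
memory.write([1.0, 0.0], 3.0)
memory.write([0.0, 0.0], 2.0)  # existing key: 1.0 moves to 1.1 with alpha = 0.1
print(memory.lookup([0.5, 0.0]))  # equidistant query averages the two estimates
```

The split between slowly learned keys and directly overwritten values is the point of the design. A new return can change a stored estimate in one step, without waiting for gradient descent to propagate it through a network.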

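This paper proposes a deep reinforcement learning agent, Neural Episodic Control, which can rapidly assimilate new experiences and act upon them. The agent uses a semi-tabular representation of the value function: a buffer of past experience containing slowly changing state representations and rapidly updated estimates of the value function. The results show that, compared with other state-of-the-art, general-purpose deep reinforcement learning agents, this agent learns significantly faster across a wide range of environments.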