Spiking Neural Networks (SNNs) contain more biologically realistic structures
and biologically inspired learning principles than standard Artificial
Neural Networks (ANNs). SNNs are considered the third generation of ANNs,
capable of robust computation at a low computational cost. The neurons
in SNNs are non-differentiable: they retain decayed historical states and
generate event-based spikes once their states reach the firing threshold.
These dynamics make SNNs difficult to train directly
with standard backpropagation (BP), which is also considered
biologically implausible. In this paper, a Biologically-plausible Reward
Propagation (BRP) algorithm is proposed and applied to an SNN architecture
with both spiking-convolution layers (with 1D and 2D convolutional kernels) and
fully connected layers. Unlike standard BP, which propagates error signals
from postsynaptic to presynaptic neurons layer by layer, BRP propagates target
labels, rather than errors, directly from the output layer to all preceding hidden
layers. This approach is more consistent with the top-down reward-guided
learning in cortical columns of the neocortex. Synaptic modifications that use only
local gradient differences are induced with pseudo-BP, which might also be
replaced with Spike-Timing Dependent Plasticity (STDP). The performance of
the proposed BRP-SNN is further verified on spatial (including MNIST and
CIFAR-10) and temporal (including TIDigits and DvsGesture) tasks, where the SNN
using BRP reaches accuracy similar to that of other state-of-the-art
BP-based SNNs and saves 50% more computational cost than ANNs. We think that
introducing biologically plausible learning rules into the training procedure
of biologically realistic SNNs will offer more hints and inspiration toward
a better understanding of the intelligent nature of biological systems.
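
The neuron dynamics and the label-propagation idea described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the leaky integrate-and-fire update, the `decay` and `threshold` values, the reset-to-zero rule, the fixed random `feedback` matrix, the rate-based update (standing in for pseudo-BP), and all function names are hypothetical choices made for this example.

```python
import numpy as np

def lif_step(v, input_current, decay=0.5, threshold=1.0):
    """Advance a layer of leaky integrate-and-fire neurons by one time step."""
    v = decay * v + input_current               # decayed history plus new input
    spikes = (v >= threshold).astype(float)     # event-based spike at threshold
    v = v * (1.0 - spikes)                      # assumed rule: reset to zero after firing
    return v, spikes

def label_signal(hidden_rate, one_hot_label, feedback):
    """Local learning signal: label projected to the hidden layer minus its activity."""
    return feedback @ one_hot_label - hidden_rate

def local_update(weights, pre_rate, signal, lr=0.01):
    """Update one layer from its own signal only; nothing passes through other layers."""
    return weights + lr * np.outer(signal, pre_rate)

rng = np.random.default_rng(0)
n_in, n_hidden, n_classes, steps = 4, 3, 2, 10
weights = rng.normal(scale=0.5, size=(n_hidden, n_in))
feedback = rng.normal(size=(n_hidden, n_classes))   # assumed fixed, never trained
x = rng.random(n_in)                                # constant input current
label = np.array([1.0, 0.0])                        # one-hot target label

v = np.zeros(n_hidden)
spike_count = np.zeros(n_hidden)
for _ in range(steps):
    v, spikes = lif_step(v, weights @ x)
    spike_count += spikes

hidden_rate = spike_count / steps                   # firing rate over the window
weights = local_update(weights, x, label_signal(hidden_rate, label, feedback))
```

The point of the sketch is that the hidden layer receives the label itself through `feedback`, so its update needs no error signal relayed from the layers above it.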

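This paper proposes a reward-propagation-based algorithm that is applied to the spiking-convolution and fully connected layers of a spiking neural network (SNN) architecture and can replace standard backpropagation for training SNNs. SNNs trained with this algorithm have been validated on spatial and temporal tasks, reaching accuracy similar to that of BP-based SNNs while saving 50% of the computational cost.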

Tuning Convolutional Spiking Neural Network with Biologically-plausible Reward Propagation

We propose a novel training algorithm for reinforcement learning which
combines the strength of deep Q-learning with a constrained optimization
approach to tighten optimality and encourage faster reward propagation. Our
novel technique makes deep reinforcement learning more practical by drastically
reducing the training time. We evaluate the performance of our approach on the
49 games of the challenging Arcade Learning Environment, and report significant
improvements in both training time and accuracy.
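
One way to picture combining Q-learning with a constraint is a penalty term that pushes a Q-value above a bound computed from rewards already observed. The sketch below is only an assumed form, since the abstract does not specify the constraints: the k-step lower bound, the quadratic penalty, the `penalty` weight, and the function names are all hypothetical.

```python
import numpy as np

def lower_bound(rewards, bootstrap_q, gamma=0.99):
    """Discounted sum of k observed rewards plus a bootstrapped tail value."""
    k = len(rewards)
    discounts = gamma ** np.arange(k)
    return float(np.dot(discounts, rewards) + gamma ** k * bootstrap_q)

def penalized_loss(q, td_target, bound, penalty=4.0):
    """Squared TD error plus a quadratic penalty when q falls below the bound."""
    td_error = (q - td_target) ** 2
    violation = max(bound - q, 0.0) ** 2    # zero when the constraint holds
    return td_error + penalty * violation

# Three observed rewards, then a bootstrapped estimate of the remaining value.
bound = lower_bound([1.0, 0.0, 2.0], bootstrap_q=8.0, gamma=0.5)
loss = penalized_loss(q=1.0, td_target=1.0, bound=bound)
```

Because the bound folds several future rewards into a single target, a reward observed late in a trajectory can influence earlier Q-values in one update rather than one step at a time.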

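The paper proposes a new reinforcement learning algorithm that combines deep Q-learning with a constrained optimization approach to tighten optimality and encourage faster reward propagation. Its evaluation on the Arcade Learning Environment shows that the method significantly shortens training time and improves accuracy.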