Perturbation and operator adjoint methods are used to derive the correct adjoint
form rigorously. The derivation yields the following results: 1) The loss
gradient is not an ODE but an integral, and we show why; 2) The traditional
adjoint form is not equivalent to the backpropagation (BP) results; 3) The
adjoint operator analysis shows that the adjoint form gives the same results as
BP if and only if the discrete adjoint uses the same scheme as the discrete
neural ODE.

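Using perturbation and operator adjoint methods, we rigorously derive the correct adjoint form. The derivation gives the following results: 1) the loss gradient is not an ODE but an integral, and we show why; 2) the traditional adjoint form is not equivalent to the backpropagation results; 3) the adjoint operator analysis shows that the adjoint form gives the same results as BP only when the discrete adjoint uses the same scheme as the discrete neural ODE.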
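The third result can be checked numerically on a toy problem. The sketch below is not taken from the paper; it assumes a scalar linear vector field `f(z, theta) = theta * z`, a forward Euler discretization, and the loss `0.5 * z_N**2`, all chosen only for illustration. `grad_matched_adjoint` transposes the Euler step itself, which is what backpropagation through the discrete network does. `grad_mismatched_adjoint` instead applies Euler to the continuous adjoint system backward in time, reconstructing the state along the way, which is one common reading of the "traditional" adjoint form. A central finite difference of the discretized loss stands in for the BP reference.

```python
# Toy comparison of two discrete adjoints for a forward Euler "neural ODE".
# The vector field, loss, and all function names are illustrative assumptions.

def f(z, theta):
    return theta * z          # assumed toy vector field dz/dt = theta * z

def df_dz(z, theta):
    return theta              # partial derivative of f with respect to z

def df_dtheta(z, theta):
    return z                  # partial derivative of f with respect to theta

def forward_euler(z0, theta, h, n_steps):
    """Return the full trajectory [z_0, ..., z_N] of the discrete network."""
    zs = [z0]
    for _ in range(n_steps):
        zs.append(zs[-1] + h * f(zs[-1], theta))
    return zs

def loss(z0, theta, h, n_steps):
    """Assumed loss: 0.5 * z_N**2 on the final discrete state."""
    return 0.5 * forward_euler(z0, theta, h, n_steps)[-1] ** 2

def grad_matched_adjoint(z0, theta, h, n_steps):
    """Discrete adjoint using the same scheme as the forward pass (i.e. BP)."""
    zs = forward_euler(z0, theta, h, n_steps)
    a = zs[-1]                                  # a_N = dL/dz_N
    grad = 0.0
    for n in reversed(range(n_steps)):
        grad += h * a * df_dtheta(zs[n], theta)   # uses a_{n+1} and stored z_n
        a *= 1.0 + h * df_dz(zs[n], theta)        # a_n = a_{n+1} * dz_{n+1}/dz_n
    return grad

def grad_mismatched_adjoint(z0, theta, h, n_steps):
    """Euler applied to the continuous adjoint system, integrated backward."""
    z = forward_euler(z0, theta, h, n_steps)[-1]
    a = z                                       # a(T) = dL/dz(T)
    grad = 0.0
    for _ in range(n_steps):
        grad += h * a * df_dtheta(z, theta)       # quadrature of a * df/dtheta
        # Step the adjoint and the reconstructed state backward in time.
        a, z = a + h * a * df_dz(z, theta), z - h * f(z, theta)
    return grad

def grad_finite_difference(z0, theta, h, n_steps, eps=1e-6):
    """Central difference of the discretized loss, used as the BP reference."""
    up = loss(z0, theta + eps, h, n_steps)
    down = loss(z0, theta - eps, h, n_steps)
    return (up - down) / (2.0 * eps)

z0, theta, h, n_steps = 1.0, 0.7, 0.1, 10
print("reference :", grad_finite_difference(z0, theta, h, n_steps))
print("matched   :", grad_matched_adjoint(z0, theta, h, n_steps))
print("mismatched:", grad_mismatched_adjoint(z0, theta, h, n_steps))
```

In this toy setting the matched adjoint agrees with the reference to rounding error, while the mismatched one differs by an amount that shrinks only as the step size `h` shrinks. Swapping in a nonlinear `f` requires updating `df_dz` and `df_dtheta` accordingly.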

A note on the adjoint method for neural ordinary differential equation network

Label noise and class imbalance are two major issues coexisting in real-world
datasets. To alleviate the two issues, state-of-the-art methods reweight each
instance by leveraging a small amount of clean and unbiased data. Yet, these
methods overlook class-level information within each instance, which can be
further utilized to improve performance. To this end, in this paper, we propose
Generalized Data Weighting (GDW) to simultaneously mitigate label noise and
class imbalance by manipulating gradients at the class level. To be specific,
GDW unrolls the loss gradient to class-level gradients by the chain rule and
reweights the flow of each gradient separately. In this way, GDW achieves
remarkable performance improvement on both issues. Aside from the performance
gain, GDW efficiently obtains class-level weights without introducing any extra
computational cost compared with instance weighting methods. Specifically, GDW
performs a gradient descent step on class-level weights, which only relies on
intermediate gradients. Extensive experiments in various settings verify the
effectiveness of GDW. For example, GDW outperforms state-of-the-art methods by
$2.56\%$ under the $60\%$ uniform noise setting on CIFAR10. Our code is
available at this https URL

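This paper proposes a method called Generalized Data Weighting (GDW), which manipulates gradients at the class level to mitigate label noise and class imbalance at the same time, achieving effective performance improvements without introducing extra computational cost.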
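To make "reweighting gradients at the class level" concrete, here is a minimal sketch of one plausible reading of the abstract. It is not the authors' implementation. It assumes a linear classifier with softmax cross-entropy, so the chain rule splits the parameter gradient into one term per class logit, and each term is scaled by its own weight. The names `class_weighted_gradient` and `class_weights` are hypothetical, and the meta-learning step that updates the class-level weights from clean data is not shown.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def logit_gradient(logits, label):
    """Gradient of cross-entropy w.r.t. logits: softmax(logits) - one_hot(label)."""
    probs = softmax(logits)
    return [p - (1.0 if j == label else 0.0) for j, p in enumerate(probs)]

def class_weighted_gradient(x, weight_matrix, label, class_weights):
    """Gradient w.r.t. a linear classifier's weights, reweighted per class.

    Assumed model: logits[j] = sum_i weight_matrix[j][i] * x[i].
    Chain rule: dL/dW[j][i] = dL/dlogits[j] * x[i], so row j carries the
    gradient flowing through class j and is scaled by class_weights[j].
    """
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in weight_matrix]
    d_logits = logit_gradient(logits, label)
    return [
        [class_weights[j] * d_logits[j] * xi for xi in x]
        for j in range(len(weight_matrix))
    ]

# Hypothetical example: 3 classes, 2 features, possibly noisy label 0.
x = [1.0, -2.0]
W = [[0.1, 0.2], [0.0, -0.3], [0.5, 0.4]]
print(class_weighted_gradient(x, W, label=0, class_weights=[1.0, 1.0, 1.0]))
print(class_weighted_gradient(x, W, label=0, class_weights=[0.2, 1.0, 1.0]))
```

Setting every entry of `class_weights` to the same value recovers ordinary instance weighting under this reading, since the whole gradient is then scaled by a single factor. Unequal weights let the gradient through a suspect class be damped while the others pass through unchanged.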