Current training pipelines for object recognition omit Hue Jittering from data
augmentation because it not only introduces appearance changes that are
detrimental to classification but is also inefficient to implement in
practice. In this study, we investigate the effect of hue variance in the
context of video recognition and find this variance to be beneficial, since
static appearances are less important in videos that contain motion
information. Based on this observation, we propose a data augmentation method
for video recognition, named Motion Coherent Augmentation (MCA), that
introduces appearance variation in videos and implicitly encourages the model
to prioritize motion patterns rather than static appearances. Concretely, we
propose an operation, SwapMix, that efficiently modifies the appearance of
video samples, and introduce Variation Alignment (VA) to resolve the
distribution shift caused by SwapMix, forcing the model to learn
appearance-invariant representations. Comprehensive empirical evaluation across
various architectures and datasets validates the effectiveness and
generalization ability of MCA, as well as the applicability of VA to other
augmentation methods. Code is available at this https URL
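
The abstract does not spell out how SwapMix modifies appearance. The following is a minimal sketch under stated assumptions: the appearance change comes from permuting the RGB channels of every frame, and the permuted clip is blended with the original. The function name `swap_mix`, the blending weight `lam`, and the nested-list video layout are illustrative choices, not the authors' implementation.

```python
import random

def swap_mix(video, lam, perm=None, rng=random):
    """Blend a clip with a channel-permuted copy of itself (assumed scheme).

    video: list of frames; each frame is a list of (r, g, b) pixel tuples.
    lam:   weight of the original clip in [0, 1] (hypothetical parameter).
    perm:  channel order to apply, e.g. (2, 0, 1); sampled if omitted.
    """
    if perm is None:
        # Sample one non-identity channel order and reuse it for every frame,
        # so the appearance shift is identical across time.
        orders = [(0, 2, 1), (1, 0, 2), (1, 2, 0), (2, 0, 1), (2, 1, 0)]
        perm = rng.choice(orders)
    mixed = []
    for frame in video:
        new_frame = []
        for pixel in frame:
            swapped = tuple(pixel[c] for c in perm)
            new_frame.append(tuple(
                lam * p + (1.0 - lam) * s for p, s in zip(pixel, swapped)
            ))
        mixed.append(new_frame)
    return mixed
```

Because the same channel order is applied to every frame in this sketch, frame-to-frame changes are transformed consistently while the colors shift, which is the property the abstract attributes to appearance variation that leaves motion intact. Setting `lam=1.0` returns the original clip.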

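This study examines how hue variation affects video recognition and proposes a
data augmentation method named Motion Coherent Augmentation (MCA), which
introduces appearance variation in videos and implicitly encourages the model
to prioritize motion patterns over static appearances. We propose an operation
named SwapMix to efficiently modify the appearance of video samples and
introduce Variation Alignment (VA) to resolve the distribution shift caused by
SwapMix, forcing the model to learn appearance-invariant representations.
Comprehensive empirical evaluation validates the effectiveness and
generalization ability of MCA, as well as the application of VA to other
augmentation methods.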

Don't Judge by the Look: A Motion Coherent Augmentation for Video Recognition

Training state-of-the-art neural networks is costly in terms of compute and
time. Model scale is recognized as a critical factor in achieving and
improving the state of the art. Increasing the scale of a neural network
normally requires restarting from scratch by randomly initializing all the
parameters of the model, because changing the architecture's parameters does
not allow for a straightforward transfer of knowledge from smaller models. In
this work, we propose six composable transformations to incrementally increase
the size of transformer-based neural networks while preserving functionality,
allowing the capacity of the model to be expanded as needed. We provide a
proof of exact function preservation under minimal
initialization constraints for each transformation. The proposed methods may
enable efficient training pipelines for larger and more powerful models by
progressively expanding the architecture throughout training.
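
To illustrate what function preservation means, here is a minimal sketch of one possible expansion: widening the hidden layer of a two-layer ReLU MLP. It assumes that new hidden units may receive arbitrary input weights as long as the weights reading from them start at zero. The helper names `mlp` and `expand_hidden` are hypothetical, and the sketch is not claimed to match any of the paper's six transformations as specified.

```python
import random

def mlp(x, w1, b1, w2, b2):
    """Two-layer ReLU MLP: w1 has shape hidden x in, w2 has shape out x hidden."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(w2, b2)]

def expand_hidden(w1, b1, w2, extra, rng):
    """Add `extra` hidden units without changing the MLP's outputs."""
    n_in = len(w1[0])
    # New units may read the input with any weights and biases...
    new_w1 = w1 + [[rng.gauss(0.0, 1.0) for _ in range(n_in)]
                   for _ in range(extra)]
    new_b1 = b1 + [rng.gauss(0.0, 1.0) for _ in range(extra)]
    # ...as long as the weights reading *from* them start at zero,
    # so their activations contribute nothing to the output.
    new_w2 = [row + [0.0] * extra for row in w2]
    return new_w1, new_b1, new_w2
```

Calling `mlp` with the same input before and after `expand_hidden` gives the same output, while the added units are free to become useful once training updates their zero-initialized output weights.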

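This study proposes six composable transformations that incrementally increase
the size of transformer-based neural networks while preserving functionality,
with proofs of exact function preservation under minimal initialization
constraints. These transformations may enable efficient training pipelines for
larger and more powerful models by progressively expanding the architecture.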

Composable Function-preserving Expansions for Transformer Architectures

This paper presents a simple recipe to train state-of-the-art multilingual
Grammatical Error Correction (GEC) models. We achieve this by first proposing a
language-agnostic method to generate a large number of synthetic examples. The
second ingredient is to use large-scale multilingual language models (up to 11B
parameters). Once fine-tuned on language-specific supervised sets, we surpass
the previous state-of-the-art results on GEC benchmarks in four languages:
English, Czech, German and Russian. Having established a new set of baselines
for GEC, we make our results easily reproducible and accessible by releasing the
cLang-8 dataset. It is produced by using our best model, which we call gT5, to
clean the targets of the widely used yet noisy lang-8 dataset. cLang-8 greatly
simplifies typical GEC training pipelines composed of multiple fine-tuning
stages -- we demonstrate that performing a single fine-tuning step on cLang-8
with the off-the-shelf language models yields further accuracy improvements
over an already top-performing gT5 model for English.
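
The abstract does not describe the synthetic data procedure in detail. As a minimal sketch, assuming that synthetic examples are made by corrupting clean sentences with rules that need no language-specific resources, a generator might look like the following. The functions `corrupt` and `make_pairs`, the corruption rate `p`, the specific edit operations, and whitespace tokenization are all illustrative assumptions rather than the authors' method.

```python
import random

def corrupt(sentence, rng, p=0.15):
    """Return a noisy copy of `sentence` using token- and character-level edits."""
    out = []
    for tok in sentence.split():
        r = rng.random()
        if r < p / 3:
            continue  # drop the token
        if r < 2 * p / 3 and len(tok) > 1:
            # Swap two adjacent characters inside the token.
            i = rng.randrange(len(tok) - 1)
            tok = tok[:i] + tok[i + 1] + tok[i] + tok[i + 2:]
        elif r < p and out:
            # Swap the token with the previous one.
            out[-1], tok = tok, out[-1]
        out.append(tok)
    return " ".join(out)

def make_pairs(clean_sentences, seed=0):
    """Build (corrupted source, clean target) pairs for training."""
    rng = random.Random(seed)
    return [(corrupt(s, rng), s) for s in clean_sentences]
```

Each pair keeps the clean sentence as the target, so a model trained on the pairs learns to map noisy text back to correct text. Whitespace splitting is a simplification here and would need replacing for languages written without spaces.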

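This paper presents a simple recipe for training state-of-the-art multilingual
grammatical error correction models using large-scale multilingual language
models. It also establishes and releases the cLang-8 dataset, on which a single
fine-tuning step yields accuracy improvements for English.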