We present a study of text-based 3D human motion retrieval, with a particular focus on cross-dataset generalization. For practical reasons such as dataset-specific human body representations, existing works typically benchmark by training and testing on partitions of the same dataset. Here, we adopt a unified SMPL body format for all datasets, which allows us to train on one dataset and test on another, as well as to train on a combination of datasets. Our results suggest that standard text-motion benchmarks such as HumanML3D, KIT Motion-Language, and BABEL exhibit dataset biases. We show that text augmentations help close the domain gap to some extent, but the gap remains. We further provide the first zero-shot action recognition results on BABEL, obtained without using categorical action labels during training, opening up a new avenue for future research.
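
To make the evaluation setting concrete, the following is a minimal sketch of how text-to-motion retrieval and zero-shot action recognition could be scored from a matrix of text-motion similarities. It is an illustration under stated assumptions, not the implementation used in this study: the abstract does not name its metrics, so Recall@k is assumed here as a common retrieval measure, and the function names `recall_at_k` and `zero_shot_predict` are hypothetical.

```python
from typing import Sequence


def recall_at_k(sim: Sequence[Sequence[float]], k: int) -> float:
    """Fraction of text queries whose paired motion ranks in the top k.

    Assumption: sim[i][j] is the similarity between text i and motion j,
    and the ground-truth motion for text i is motion i (the diagonal).
    """
    hits = 0
    for i, row in enumerate(sim):
        # Rank of the ground truth = number of motions scoring strictly higher.
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return hits / len(sim)


def zero_shot_predict(scores: Sequence[float], labels: Sequence[str]) -> str:
    """Pick the action label whose text is most similar to a given motion.

    Assumption: scores[j] is the similarity between the motion and the text
    of labels[j]; no categorical classifier is trained.
    """
    best = max(range(len(scores)), key=lambda j: scores[j])
    return labels[best]


# Toy similarity matrix: 3 text queries (rows) against 3 motions (columns).
toy_sim = [
    [0.9, 0.1, 0.0],
    [0.8, 0.5, 0.2],
    [0.1, 0.7, 0.3],
]
r_at_1 = recall_at_k(toy_sim, 1)
r_at_2 = recall_at_k(toy_sim, 2)
predicted = zero_shot_predict([0.2, 0.6, 0.1], ["walk", "jump", "sit"])
```

In a cross-dataset setting, the similarity matrix would come from a model trained on one dataset and applied to the test split of another, so the same scoring function can be reused for every train/test pair.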

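We present the results of a study on text-based 3D human motion retrieval, focusing on cross-dataset generalization. By adopting a unified SMPL body format, we can train on one dataset and test on another, or train on several datasets together. The results indicate that standard text-motion benchmark datasets (such as HumanML3D, KIT Motion-Language, and BABEL) exhibit dataset biases. We show that text augmentation narrows the domain gap to some extent, but a gap remains. In addition, we provide the first zero-shot action recognition results on BABEL, obtained without using categorical action labels during training, opening a new direction for future research.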

A Cross-Dataset Study of Text-Based 3D Human Motion Retrieval