This paper explores the effectiveness of model-generated signals in improving the zero-shot generalization of text-to-text Transformers such as T5. We study various designs for pretraining T5 with an auxiliary model that constructs more challenging token replacements for the main model to denoise. Key aspects under study include the decoding target, the location of the replaced token detection (RTD) head, and the masking pattern. Based on these studies, we develop a new model, METRO-T0, which is pretrained using the redesigned ELECTRA-style pretraining strategies and then prompt-finetuned on a mixture of NLP tasks. METRO-T0 outperforms all similar-sized baselines on prompted NLP benchmarks, such as T0 Eval and MMLU, and rivals the state-of-the-art T0-11B model with only 8% of its parameters. Our analysis of the model's neural activation and parameter sensitivity reveals that the effectiveness of METRO-T0 stems from a more balanced contribution of parameters and better utilization of their capacity. The code and model checkpoints are available at https://github.com/gonglinyuan/metro_t0.
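
As a rough illustration of the idea of model-generated token replacements, the toy sketch below corrupts a token sequence with a stand-in "auxiliary model" and derives per-token replaced/original labels of the kind an RTD head would be trained on. It is not the METRO-T0 implementation: the function names, the tiny vocabulary, the random sampler standing in for the auxiliary model, the uniform choice of masked positions, and the use of the original sequence as the decoding target are all assumptions made for illustration.

```python
import random


def corrupt_with_auxiliary(tokens, aux_sample, mask_rate=0.15, seed=0):
    """Return (corrupted_tokens, rtd_labels) for one sequence.

    aux_sample is a hypothetical callable standing in for the auxiliary
    model: given the original tokens, a position, and an RNG, it proposes
    a token for that position.
    rtd_labels[i] is 1 where the corrupted token differs from the original.
    """
    rng = random.Random(seed)
    n_mask = max(1, round(mask_rate * len(tokens)))
    # Assumption: masked positions are chosen uniformly at random.
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    corrupted = list(tokens)
    for i in positions:
        corrupted[i] = aux_sample(tokens, i, rng)
    # A proposal that happens to equal the original counts as "not replaced".
    labels = [int(c != t) for c, t in zip(corrupted, tokens)]
    return corrupted, labels


def toy_aux_sample(tokens, i, rng):
    # Hypothetical stand-in: a real auxiliary model would sample from its
    # predicted distribution at position i instead of a fixed toy vocabulary.
    vocab = ["the", "cat", "dog", "sat", "ran", "on", "mat", "rug"]
    return rng.choice(vocab)


original = "the cat sat on the mat".split()
corrupted, labels = corrupt_with_auxiliary(original, toy_aux_sample, mask_rate=0.3)
# Assumed decoding target for the main model: the uncorrupted sequence.
target = original
print(corrupted, labels)
```

In this sketch the main model would receive `corrupted` as input, with `labels` supervising an RTD head and `target` supervising the decoder. Which decoding target, RTD head location, and masking pattern work best is what the paper studies.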

Model-Generated Pretraining Signals Improve Zero-Shot Generalization of Text-to-Text Transformers