Jun, 2024
CoLoR-Filter: Conditional Loss Reduction Filtering for Targeted Language Model Pre-training
David Brandfonbrener, Hanlin Zhang, Andreas Kirsch, Jonathan Richard Schwarz, Sham Kakade
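TL;DR: Uses the CoLoR-Filter method and an empirical Bayes-inspired approach to select high-quality data, improving language model performance on downstream tasks.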