Nov, 2023

Claire French Dialogue Dataset

TL;DR: The Claire French Dialogue Dataset (CFDD) is a multilingual, open-source corpus of roughly 160 million words drawn from French transcripts and stage plays, created to further the development of language models. It is accompanied by descriptions of its composition, a breakdown of its subcorpora, and its standardization process.