This study introduces the leveled-text generation task, aiming to rewrite
educational materials to specific readability levels while preserving meaning.
We assess the capability of GPT-3.5, LLaMA-2 70B, and Mixtral 8x7B to generate
content at various readability levels through zero-shot and few-shot prompting.
Evaluating 100 processed educational materials reveals that few-shot prompting
significantly improves performance in readability manipulation and information
preservation. LLaMA-2 70B performs better in achieving the desired difficulty
range, while GPT-3.5 maintains original meaning. However, manual inspection
highlights concerns such as misinformation introduction and inconsistent edit
distribution. These findings emphasize the need for further research to ensure
the quality of generated educational content.

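This study introduces the leveled-text generation task, which aims to rewrite
educational materials to a specific readability level while keeping their
meaning unchanged. Using zero-shot and few-shot prompting, we evaluate the
ability of GPT-3.5, LLaMA-2 70B, and Mixtral 8x7B to generate content at
different readability levels. An evaluation on 100 processed educational
materials shows that few-shot prompting significantly improves performance in
readability manipulation and information preservation. LLaMA-2 70B performs
better at reaching the desired difficulty range, while GPT-3.5 maintains the
original meaning. However, manual inspection also reveals problems such as the
introduction of misinformation and an inconsistent distribution of edits. These
findings underline the need for further research to ensure the quality of
generated educational content.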

Generating Educational Materials with Different Levels of Readability using LLMs

We introduce Mixtral 8x7B, a Sparse Mixture of Experts (SMoE) language model.
Mixtral has the same architecture as Mistral 7B, with the difference that each
layer is composed of 8 feedforward blocks (i.e. experts). For every token, at
each layer, a router network selects two experts to process the current state
and combine their outputs. Even though each token only sees two experts, the
selected experts can be different at each timestep. As a result, each token has
access to 47B parameters, but only uses 13B active parameters during inference.
Mixtral was trained with a context size of 32k tokens and it outperforms or
matches Llama 2 70B and GPT-3.5 across all evaluated benchmarks. In particular,
Mixtral vastly outperforms Llama 2 70B on mathematics, code generation, and
multilingual benchmarks. We also provide a model fine-tuned to follow
instructions, Mixtral 8x7B - Instruct, that surpasses GPT-3.5 Turbo,
Claude-2.1, Gemini Pro, and Llama 2 70B - chat model on human benchmarks. Both
the base and instruct models are released under the Apache 2.0 license.

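This paper introduces Mixtral 8x7B, a Sparse Mixture of Experts (SMoE) language
model. It uses the same architecture as Mistral 7B, except that each layer
consists of 8 feedforward blocks (i.e. experts), and a router network selects
two experts to process the current state and combine their outputs. The result
is a 47B-parameter model that uses 13B active parameters. It performs strongly
on mathematics, code generation, and multilingual benchmarks. An
instruction-fine-tuned model, Mixtral 8x7B - Instruct, is also provided; it
surpasses GPT-3.5 Turbo, Claude-2.1, Gemini Pro, and Llama 2 70B - chat model
on human benchmarks.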
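
The routing step described in the Mixtral abstract (8 experts per layer, two
selected per token, outputs combined) can be illustrated with a small
pure-Python sketch. This is not the released implementation. The toy
dimensions, the random weights, and the names `make_expert`, `route_top_k`,
and `moe_layer` are hypothetical, and the sketch assumes the two outputs are
combined with weights given by a softmax over the two largest router logits.

```python
import math
import random

NUM_EXPERTS = 8   # from the abstract: 8 feedforward blocks per layer
TOP_K = 2         # from the abstract: the router selects two experts per token


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def make_expert(dim, hidden, rng):
    """Hypothetical toy expert: a two-layer feedforward block with ReLU."""
    w_in = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(hidden)]
    w_out = [[rng.gauss(0, 0.1) for _ in range(hidden)] for _ in range(dim)]

    def expert(x):
        h = [max(0.0, v) for v in matvec(w_in, x)]
        return matvec(w_out, h)

    return expert


def route_top_k(router_weights, x, k=TOP_K):
    """Return (expert_indices, gate_weights) for one token.

    Assumption: gates are a softmax over the k largest router logits.
    """
    logits = matvec(router_weights, x)
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    gates = softmax([logits[i] for i in top])
    return top, gates


def moe_layer(x, router_weights, experts):
    """Combine the outputs of the selected experts, weighted by their gates."""
    top, gates = route_top_k(router_weights, x)
    out = [0.0] * len(x)
    for idx, gate in zip(top, gates):
        y = experts[idx](x)  # only the selected experts are evaluated
        out = [o + gate * v for o, v in zip(out, y)]
    return out, top


rng = random.Random(0)
DIM, HIDDEN = 4, 16  # toy sizes chosen for illustration only
experts = [make_expert(DIM, HIDDEN, rng) for _ in range(NUM_EXPERTS)]
router_weights = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]

token = [0.5, -1.0, 0.25, 2.0]
output, chosen = moe_layer(token, router_weights, experts)
print("experts used:", chosen)
```

Because only two of the eight experts run for a given token, the per-token
compute depends on the active parameters rather than on the full parameter
count, which is the distinction the abstract draws between 13B active and 47B
total parameters.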