Recent large language models (LLMs) have been shown to be effective for
misinformation detection. However, the choice of LLMs for experiments varies
widely, leading to uncertain conclusions. In particular, GPT-4 is known to be
strong in this domain, but it is closed source, potentially expensive, and can
show instability between different versions. Meanwhile, alternative LLMs have
given mixed results. In this work, we show that Zephyr-7b presents a
consistently viable alternative, overcoming key limitations of commonly used
approaches like Llama-2 and GPT-3.5. This provides the research community with
a solid open-source option and shows open-source models are gradually catching
up on this task. We then highlight how GPT-3.5 exhibits unstable performance,
such that this very widely used model could provide misleading results in
misinformation detection. Finally, we validate new tools including approaches
to structured output and the latest version of GPT-4 (Turbo), showing they do
not compromise performance, thus unlocking them for future research and
potentially enabling more complex pipelines for misinformation mitigation.
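
The abstract does not say which structured-output approaches were validated. As a rough illustration only, here is one way a pipeline might check each reply before scoring it. It assumes the model was prompted to answer with a JSON object containing hypothetical `verdict` and `explanation` fields. The field names, the label set, and `parse_verdict` are all assumptions, not the paper's format.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Hypothetical label set; the abstract does not specify one.
ALLOWED_VERDICTS = {"true", "false"}


@dataclass
class Verdict:
    label: str
    explanation: str


def parse_verdict(raw_response: str) -> Optional[Verdict]:
    """Parse a model reply that was asked to answer in JSON.

    Returns None instead of raising, so malformed replies can be counted
    separately rather than silently scored as wrong answers.
    """
    try:
        data = json.loads(raw_response)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    label = data.get("verdict")
    explanation = data.get("explanation", "")
    # Reject non-string fields (e.g. a JSON boolean instead of "true").
    if not isinstance(label, str) or not isinstance(explanation, str):
        return None
    label = label.strip().lower()
    if label not in ALLOWED_VERDICTS:
        return None
    return Verdict(label, explanation)
```

Returning `None` rather than guessing a label keeps formatting failures visible. This matters when comparing models whose reliability at following an output format may differ.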

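This study examines the use of large language models for detecting misinformation. It discusses the strengths and weaknesses of models such as GPT-4 and Zephyr-7b, points out that open-source models are gradually catching up on this task, and shows that GPT-3.5 performs unstably. The study also validates structured-output approaches and the latest version of GPT-4 (Turbo), showing that they do not compromise performance. This makes them available for future research and potentially enables more complex misinformation-mitigation pipelines.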

Comparing GPT-4 and Open-Source Language Models in Misinformation Mitigation

We aim to produce a smaller language model that is aligned to user intent.
Previous research has shown that applying distilled supervised fine-tuning
(dSFT) on larger models significantly improves task accuracy; however, these
models are unaligned, i.e., they do not respond well to natural prompts. To
distill this property, we experiment with the use of preference data from AI
Feedback (AIF). Starting from a dataset of outputs ranked by a teacher model,
we apply distilled direct preference optimization (dDPO) to learn a chat model
with significantly improved intent alignment. The approach requires only a few
hours of training without any additional sampling during fine-tuning. The final
result, Zephyr-7B, sets the state of the art on chat benchmarks for 7B-parameter
models, and requires no human annotation. In particular, results on
MT-Bench show that Zephyr-7B surpasses Llama2-Chat-70B, the best open-access
RLHF-based model. Code, models, data, and tutorials for the system are
available at this https URL
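
The following is a rough, stdlib-only sketch of what a dDPO step optimizes. It assumes the standard direct preference optimization (DPO) loss applied to pairs built from teacher-ranked outputs. Each pair is scored using the summed log-probabilities of the chosen and rejected responses under the policy being trained. The same quantities are also computed under a frozen reference model, assumed here to be the dSFT starting point. The loss only needs log-probabilities of fixed, pre-collected responses, so no sampling from the policy is required during fine-tuning. The pairing rule in the hypothetical `build_pair` helper and β = 0.1 are illustrative assumptions: the rule takes the highest-scored output as chosen and a random lower-scored one as rejected. Neither detail is stated above. A real run would compute the log-probabilities with a model library rather than pass them in as floats.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class PreferencePair:
    prompt: str
    chosen: str
    rejected: str


def build_pair(prompt, scored_outputs, rng):
    """Turn teacher-scored completions [(text, score), ...] into one pair.

    Hypothetical rule: the best-scored completion is 'chosen' and a randomly
    picked lower-ranked completion is 'rejected'.
    """
    if len(scored_outputs) < 2:
        raise ValueError("need at least two scored outputs to form a pair")
    ranked = sorted(scored_outputs, key=lambda item: item[1], reverse=True)
    chosen = ranked[0][0]
    rejected = rng.choice(ranked[1:])[0]
    return PreferencePair(prompt, chosen, rejected)


def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one pair, given summed log-probs of each response.

    loss = -log sigmoid(beta * [(pi_c - ref_c) - (pi_r - ref_r)])
    """
    chosen_margin = policy_chosen - ref_chosen
    rejected_margin = policy_rejected - ref_rejected
    return -log_sigmoid(beta * (chosen_margin - rejected_margin))
```

When the policy and reference assign identical log-probabilities, the loss equals log 2 (about 0.693). It drops below that as the policy shifts probability toward the chosen response, relative to the reference, and away from the rejected one.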

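Using preference data from AI Feedback (AIF), we apply distilled direct preference optimization (dDPO) to train Zephyr-7B, a chat model with significantly improved intent alignment. The method needs only a few hours of training and no additional sampling. The model sets the state of the art on chat benchmarks for 7B-parameter models, and it requires no human annotation. On MT-Bench, it surpasses Llama2-Chat-70B, the best open-access model trained with RLHF (reinforcement learning from human feedback).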