This paper describes and analyzes our participation in the 2023 Eval4NLP shared task, which assesses how effectively prompt-based techniques enable Large Language Models to perform quality estimation, particularly for evaluating machine translations and summaries. We conducted systematic experiments with several prompting techniques, including standard prompting, prompts informed by annotator instructions, and chain-of-thought prompting. We also combined these approaches with zero-shot and one-shot learning to improve the effectiveness of our evaluation procedures. Our results show that combining these approaches with a "small", open-source model (orca_mini_v3_7B) yields competitive results.

Little Giants: Exploring the Potential of Small LLMs as Evaluation Metrics in Summarization in the Eval4NLP 2023 Shared Task