While LLMs have shown great success in understanding and generating text in
traditional conversational settings, their potential for performing ill-defined
complex tasks is largely under-studied. Indeed, we have yet to conduct
comprehensive benchmarking studies that evaluate multiple LLMs exclusively on
a complex task. Conducting such studies is challenging, however, because LLM
performance varies widely depending on the prompt type or style used and on
the degree of detail provided in the prompt. To address this issue, this paper
proposes a general taxonomy for designing prompts with specific properties in
order to perform a wide range of complex tasks. This taxonomy will allow future benchmarking
studies to report the specific categories of prompts used as part of the study,
enabling meaningful comparisons across different studies. Also, by establishing
a common standard through this taxonomy, researchers will be able to draw more
accurate conclusions about LLMs' performance on a specific complex task.
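The reporting idea above can be sketched in a few lines. This abstract does not list the taxonomy's actual categories, so the sketch treats a prompt category as an opaque label. Every name in it is hypothetical and not part of TELeR itself: `BenchmarkResult`, `comparable`, `group_for_comparison` and label strings such as "category-1". It shows only how tagging each reported score with its prompt category lets results be compared within matching categories across studies.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class BenchmarkResult:
    """One reported score, tagged with the prompt category it was obtained under.

    `prompt_category` is a hypothetical free-form label standing in for a
    taxonomy category; it is not an identifier defined by the taxonomy.
    """
    study: str
    model: str
    task: str
    prompt_category: str
    score: float


def comparable(a: BenchmarkResult, b: BenchmarkResult) -> bool:
    # Treat two results as directly comparable only when they address the
    # same task under the same prompt category.
    return a.task == b.task and a.prompt_category == b.prompt_category


def group_for_comparison(
    results: Iterable[BenchmarkResult],
) -> Dict[Tuple[str, str], List[BenchmarkResult]]:
    # Bucket results by (task, prompt_category). Each bucket then holds
    # scores that can be compared across models and studies.
    groups: Dict[Tuple[str, str], List[BenchmarkResult]] = defaultdict(list)
    for r in results:
        groups[(r.task, r.prompt_category)].append(r)
    return dict(groups)


if __name__ == "__main__":
    # Placeholder values for illustration only; not results from any study.
    results = [
        BenchmarkResult("study-1", "model-a", "task-x", "category-1", 0.5),
        BenchmarkResult("study-2", "model-b", "task-x", "category-1", 0.6),
        BenchmarkResult("study-2", "model-b", "task-x", "category-2", 0.7),
    ]
    for key, bucket in group_for_comparison(results).items():
        print(key, [(r.study, r.model, r.score) for r in bucket])
```

With this design, results from different studies are placed side by side only when they share both the task and the reported prompt category. That is the kind of like-for-like comparison a shared taxonomy is meant to enable.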

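Proposes a general taxonomy for designing prompts with specific properties to perform a wide range of complex tasks. This addresses the variation in LLM performance that arises when different prompt types/styles and different levels of detail are used in prompts. The taxonomy allows future benchmarking studies to report the specific categories of prompts they used, which enables meaningful comparisons across studies. By establishing a common standard through this taxonomy, researchers will be able to draw more accurate conclusions about LLM performance on a specific complex task.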

TELeR: A General Taxonomy of LLM Prompts for Benchmarking Complex Tasks

Detecting sarcasm and verbal irony is critical for understanding people's
actual sentiments and beliefs. Thus, sarcasm analysis has become a
popular research problem in natural language processing. As the community
working on computational approaches for sarcasm detection grows, it is
imperative to conduct benchmarking studies that analyze the current
state of the art and facilitate progress in this area. We report on the shared
task on sarcasm detection we conducted as a part of the 2nd Workshop on
Figurative Language Processing (FigLang 2020) at ACL 2020.

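Detecting sarcasm and verbal irony is critical for understanding people's actual sentiments and beliefs, which has made sarcasm analysis a popular research problem in natural language processing. This paper reports on a sarcasm detection shared task held as part of the FigLang 2020 workshop at ACL 2020. The task was conducted as a benchmarking study to analyze the state of the art and advance the field.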

A Report on the 2020 Sarcasm Detection Shared Task

In recent years, the power systems research community has seen an explosion
of novel methods for formulating the AC power flow equations. Consequently,
benchmarking studies using the seminal AC Optimal Power Flow (AC-OPF) problem
have become the primary means of evaluating these new methods.
However, it is often difficult to directly compare these studies due to subtle
differences in the AC-OPF problem formulation as well as the network,
generation, and loading data that are used for evaluation. To help address
these challenges, this IEEE PES Task Force report proposes a standardized
AC-OPF mathematical formulation and the PGLib-OPF networks for benchmarking
AC-OPF algorithms. A motivating study demonstrates some limitations of the
established network datasets in the context of benchmarking AC-OPF algorithms
and a validation study demonstrates the efficacy of using the PGLib-OPF
networks for this purpose. In the interest of scientific discourse and future
additions, the PGLib-OPF benchmark library is open-access and all of the
network data is provided under a Creative Commons license.
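The AC power flow equations referred to above are reproduced below in their standard textbook polar (bus-injection) form as background. This abstract does not state the exact standardized AC-OPF formulation proposed in the report, so the block should not be read as that formulation. Here $P_i$ and $Q_i$ are the net active and reactive power injections at bus $i$, $|V_i|$ and $\theta_i$ are its voltage magnitude and angle, $\theta_{ik} = \theta_i - \theta_k$, and $G_{ik} + jB_{ik}$ is entry $(i,k)$ of the bus admittance matrix of an $N$-bus network.

```latex
P_i = \sum_{k=1}^{N} |V_i|\,|V_k| \left( G_{ik}\cos\theta_{ik} + B_{ik}\sin\theta_{ik} \right),
\qquad
Q_i = \sum_{k=1}^{N} |V_i|\,|V_k| \left( G_{ik}\sin\theta_{ik} - B_{ik}\cos\theta_{ik} \right)
```

An AC-OPF problem minimizes a cost, typically generation cost, subject to these equality constraints together with operating limits. The "subtle differences" between studies mentioned above often lie in how such objectives and limits are specified.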

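This report proposes a standardized AC Optimal Power Flow (AC-OPF) mathematical formulation and the PGLib-OPF networks for benchmarking AC-OPF algorithms. Existing studies are difficult to compare directly because of subtle differences in the problem formulation and in the network, generation, and loading data used for evaluation. To foster scientific discourse and future additions, the PGLib-OPF benchmark library and its network data are openly available.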