Instruction-tuned Large Language Models (LLMs) have exhibited impressive language understanding and the capacity to generate responses that follow specific instructions. However, because training these models is computationally demanding, their applications often rely on zero-shot settings. In this paper, we evaluate the zero-shot performance of two publicly accessible LLMs, ChatGPT and OpenAssistant, on Computational Social Science classification tasks, and we investigate the effects of various prompting strategies. Our experiments consider the impact of prompt complexity, including the effect of incorporating label definitions into the prompt, using synonyms for label names, and integrating past memories from the foundation model training. The findings indicate that, in a zero-shot setting, current LLMs are unable to match the performance of smaller, fine-tuned baseline transformer models (such as BERT). Additionally, we find that different prompting strategies can significantly affect classification accuracy, with variations in accuracy and F1 scores exceeding 10%.
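
To make the notion of prompt complexity concrete, the following is a minimal sketch of how a zero-shot classification prompt might be assembled with optional label definitions and label synonyms. The function name `build_prompt`, the template wording, and the example labels are hypothetical illustrations, not the prompts used in the paper.

```python
from typing import Mapping, Optional, Sequence


def build_prompt(
    text: str,
    labels: Sequence[str],
    definitions: Optional[Mapping[str, str]] = None,
    synonyms: Optional[Mapping[str, str]] = None,
) -> str:
    """Assemble a zero-shot classification prompt (hypothetical template)."""
    # Optionally replace each label name with a synonym, if one is given.
    shown = [synonyms.get(label, label) if synonyms else label for label in labels]
    lines = ["Classify the text into one of: " + ", ".join(shown) + "."]
    # Optionally add one definition line per label to increase prompt complexity.
    if definitions:
        for label, name in zip(labels, shown):
            if label in definitions:
                lines.append(f"{name}: {definitions[label]}")
    lines.append(f"Text: {text}")
    lines.append("Label:")
    return "\n".join(lines)


# Illustrative labels and text (assumptions, not taken from the paper's datasets).
labels = ["offensive", "not offensive"]
basic = build_prompt("so rude", labels)
with_defs = build_prompt("so rude", labels, definitions={"offensive": "contains insults"})
with_syns = build_prompt("so rude", labels, synonyms={"offensive": "abusive"})
```

Keeping the variants behind one function means each prompting strategy differs from the basic prompt in exactly one respect, which is what allows their effects on accuracy and F1 to be compared.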

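This work evaluates the zero-shot performance of two publicly accessible LLMs, ChatGPT and OpenAssistant, on Computational Social Science classification tasks and studies the effects of various prompting strategies. It finds that, in a zero-shot setting, current LLMs cannot match the performance of smaller, fine-tuned baseline transformer models (such as BERT). It also finds that different prompting strategies can significantly affect classification accuracy, with differences in accuracy and F1 scores exceeding 10%.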

Navigating Prompt Complexity for Zero-Shot Classification: A Study of Large Language Models in Computational Social Science