Dialogue systems controlled by predefined or rule-based scenarios derived from counseling techniques, such as cognitive behavioral therapy (CBT), play an important role in mental health apps. Although such apps must respond responsibly, using newly emerging LLMs to generate contextually relevant utterances could plausibly enhance them. In this study, we construct dialogue modules based on a CBT scenario focused on conventional Socratic questioning using two kinds of LLMs: a Transformer-based dialogue model further trained with a social media empathetic counseling dataset provided by Osaka Prefecture (OsakaED), and GPT-4, a state-of-the-art LLM created by OpenAI. By comparing systems that use LLM-generated responses with those that do not, we investigate the impact of generated responses on subjective evaluations such as mood change, cognitive change, and dialogue quality (e.g., empathy). No notable improvements are observed when using the OsakaED model. When using GPT-4, mood change, empathy, and other dialogue qualities improve significantly. These results suggest that GPT-4 possesses a high counseling ability. However, they also indicate that a dialogue model trained with a human counseling dataset does not necessarily yield better outcomes than scenario-based dialogues. Although presenting LLM-generated responses, including those from GPT-4, and having them interact directly with users in real-life mental health care services may raise ethical issues, human professionals can still use LLMs in advance to produce example responses or response templates for systems that rely on rules, scenarios, or example responses.

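By comparing systems that use LLM-generated responses with systems that do not, the study examines the impact of generated responses on subjective evaluations such as mood change, cognitive change, and dialogue quality. The results show that with GPT-4, mood change, empathy, and other dialogue qualities improve significantly, suggesting that GPT-4 has a high counseling ability. However, the study also finds that a dialogue model trained on a human counseling dataset does not necessarily produce better results than scenario-based dialogue. Presenting LLM-generated responses and having them interact directly with users in real-world mental health services may raise ethical issues; in systems that use rules, scenarios, or example responses, human professionals can instead use LLMs in advance to produce example responses or response templates.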

Large Language Models for Response Generation in Cognitive Behavioral Therapy: A Comparative Study with Socratic Questioning