Large language models (LLMs) are emerging as promising tools for mental health care, offering scalable support through their ability to generate human-like responses. However, the effectiveness of these models in clinical settings remains unclear. This scoping review aimed to assess the current generative applications of LLMs in mental health care, focusing on studies where these models were tested with human participants in real-world scenarios. A systematic search across APA PsycNet, Scopus, PubMed, and Web of Science identified 726 unique articles, of which 17 met the inclusion criteria. These studies encompassed applications such as clinical assistance, counseling, therapy, and emotional support. However, the evaluation methods were often non-standardized, with most studies relying on ad hoc scales that limit comparability and robustness. Privacy, safety, and fairness were also frequently underexplored. Moreover, reliance on proprietary models, such as OpenAI's GPT series, raises concerns about transparency and reproducibility. While LLMs show potential in expanding mental health care access, especially in underserved areas, the current evidence does not fully support their use as standalone interventions. More rigorous, standardized evaluations and ethical oversight are needed to ensure these tools can be safely and effectively integrated into clinical practice.

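This paper examines the application of large language models (LLMs) in mental health care, assessing their effectiveness with human participants and their clinical applicability. It finds that although LLMs have the potential to expand mental health care services, most studies use non-standardized methods and do not explore privacy, safety, and fairness in depth, indicating that more rigorous evaluation and ethical oversight are needed to ensure their safe and effective integration into clinical practice.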

Applying Large Language Models to Mental Health Care: A Scoping Review of Human-Assessed Generative Tasks