Large language models (LLMs) for code are typically trained to align with natural language instructions so that they closely follow the intentions and requirements those instructions express. In many practical scenarios, however, it becomes increasingly challenging for these models to navigate the intricate boundary between helpfulness and safety, especially when faced with highly complex yet potentially malicious instructions. In this work, we introduce INDICT: a new framework that empowers LLMs with Internal Dialogues of Critiques for both safety and helpfulness guidance. The internal dialogue is a dual cooperative system between a safety-driven critic and a helpfulness-driven critic. Each critic analyzes the given task and the corresponding generated response, equipped with external knowledge queried through relevant code snippets and tools such as web search and a code interpreter. We engage the dual critic system in both the code generation stage and the code execution stage, providing preemptive and post-hoc guidance, respectively, to LLMs. We evaluated INDICT on 8 diverse tasks across 8 programming languages from 5 benchmarks, using LLMs from 7B to 70B parameters. We observed that our approach can provide advanced critiques covering both safety and helpfulness, significantly improving the quality of the output code ($+10\%$ absolute improvements in all models).
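As a rough illustration of the two-stage dual-critic flow described above, the following minimal Python sketch wires a safety critic and a helpfulness critic around a generator, once before execution (preemptive) and once after it (post-hoc). Every name here (`Critic`, `Generator`, `Executor`, `dual_critic_round`, `run_dual_critic_sketch`) is a hypothetical placeholder chosen for this sketch and is not taken from the INDICT implementation; the critics, generator, and executor are assumed to be plain callables that a caller would back with an LLM, tools, or a sandbox.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical signatures for illustration only; none come from the INDICT code.
Critic = Callable[[str, str, str], str]      # (task, code, context) -> critique
Generator = Callable[[str, List[str]], str]  # (task, critiques so far) -> code
Executor = Callable[[str], str]              # code -> execution observation


@dataclass
class DialogueResult:
    code: str
    critiques: List[str] = field(default_factory=list)


def dual_critic_round(task: str, code: str, context: str,
                      safety_critic: Critic, helpful_critic: Critic) -> List[str]:
    """One exchange: the safety critic speaks first, then the helpfulness
    critic responds with the safety critique appended to its context."""
    safety_note = safety_critic(task, code, context)
    helpful_note = helpful_critic(task, code, context + "\n" + safety_note)
    return [safety_note, helpful_note]


def run_dual_critic_sketch(task: str, generate: Generator,
                           safety_critic: Critic, helpful_critic: Critic,
                           execute: Executor) -> DialogueResult:
    critiques: List[str] = []
    code = generate(task, critiques)  # initial draft, no guidance yet

    # Preemptive stage: critique the draft before anything is executed.
    critiques += dual_critic_round(task, code, "", safety_critic, helpful_critic)
    code = generate(task, critiques)

    # Post-hoc stage: execute, then critique with the observation as context.
    observation = execute(code)
    critiques += dual_critic_round(task, code, observation,
                                   safety_critic, helpful_critic)
    code = generate(task, critiques)

    return DialogueResult(code=code, critiques=critiques)
```

In this sketch the helpfulness critic sees the safety critique, which is one simple way to model the cooperation between the two critics; the actual dialogue protocol, the retrieval of external knowledge, and the number of rounds are left open here.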

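LLMs are trained to align with natural language instructions so as to satisfy users' intentions and requirements, but in practice it is increasingly challenging for them to strike the delicate balance between safety and helpfulness. To address this, this work proposes the INDICT framework, which gives LLMs safety and helpfulness guidance through a cooperative internal dialogue system built on dialogue-based analysis by a safety-driven critic and a helpfulness-driven critic. INDICT was evaluated on 8 different tasks, 8 programming languages, and 5 benchmarks, using LLMs with 7B to 70B parameters. The approach was observed to provide high-level critiques in both safety and helpfulness analysis, significantly improving the quality of the output code (a 10% absolute improvement across all models).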

INDICT: Code Generation with Internal Dialogues for Safety and Helpfulness