Prior research in representation engineering has shown that LLMs encode concepts within their representation spaces, but this work has centered predominantly on English. In this study, we extend this line of inquiry to a multilingual setting and examine multilingual human value concepts in LLMs. Through a comprehensive exploration covering 7 types of human values, 16 languages and 3 LLM series with differing degrees of multilinguality, we empirically substantiate the existence of multilingual human values in LLMs. Further cross-lingual analysis of these concepts reveals 3 traits arising from language resource disparities: cross-lingual inconsistency, distorted linguistic relationships, and unidirectional cross-lingual transfer between high- and low-resource languages, all in terms of human value concepts. Additionally, we validate the feasibility of cross-lingual control over the value alignment capabilities of LLMs, using the dominant language as the source language. Drawing on our findings on multilingual value alignment, we offer cautious suggestions on the composition of multilingual data for LLM pre-training: include a limited number of dominant languages to enable cross-lingual alignment transfer while avoiding their excessive prevalence, and keep a balanced distribution of non-dominant languages. We hope our findings contribute to enhancing the safety and utility of multilingual AI.
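To make the notions of a "concept in the representation space", cross-lingual consistency and cross-lingual control more concrete, the sketch below uses a difference-of-means concept vector, a common representation-engineering choice; the abstract does not state the exact extraction procedure, so this is an assumption rather than the authors' method. The hidden states are synthetic stand-ins for LLM activations on value-positive and value-negative prompts, and the language codes, signal strengths and the helper names `concept_vector`, `cross_lingual_similarity` and `steer` are all hypothetical.

```python
from __future__ import annotations

import numpy as np


def concept_vector(pos_acts: np.ndarray, neg_acts: np.ndarray) -> np.ndarray:
    """Unit-norm difference of mean hidden states (shape: n_samples x d)
    between value-positive and value-negative prompts."""
    v = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)
    return v / np.linalg.norm(v)


def cross_lingual_similarity(vectors: dict[str, np.ndarray]) -> dict[tuple[str, str], float]:
    """Cosine similarity between every pair of per-language concept vectors
    (vectors are unit-norm, so the dot product is the cosine)."""
    langs = sorted(vectors)
    sims = {}
    for i, a in enumerate(langs):
        for b in langs[i + 1:]:
            sims[(a, b)] = float(vectors[a] @ vectors[b])
    return sims


def steer(hidden: np.ndarray, source_vec: np.ndarray, alpha: float) -> np.ndarray:
    """Shift hidden states along a source-language concept vector by alpha."""
    return hidden + alpha * source_vec


# --- Synthetic demo (hypothetical data, not real model activations) ---
rng = np.random.default_rng(0)
d = 64
shared = rng.normal(size=d)
shared /= np.linalg.norm(shared)  # assumed language-shared value direction


def simulate(signal: float, n: int = 200) -> tuple[np.ndarray, np.ndarray]:
    """Noisy activations; a weaker signal mimics a less well-encoded language."""
    pos = rng.normal(size=(n, d)) + signal * shared
    neg = rng.normal(size=(n, d)) - signal * shared
    return pos, neg


vectors = {
    "en": concept_vector(*simulate(2.0)),  # assumed high-resource
    "zh": concept_vector(*simulate(1.5)),  # assumed high-resource
    "sw": concept_vector(*simulate(0.3)),  # assumed low-resource
}
sims = cross_lingual_similarity(vectors)
for pair, s in sims.items():
    print(pair, round(s, 3))
```

In this toy setup, a weakly encoded language yields a concept vector that agrees less with the others, one way cross-lingual inconsistency could surface, and `steer` illustrates control by adding a source-language (here `"en"`) vector to hidden states of another language.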

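Through a comprehensive study, we confirm that multilingual human value concepts exist in multilingual LLMs. Further cross-lingual analysis reveals three traits caused by language resource disparities: cross-lingual inconsistency, distorted linguistic relationships, and unidirectional cross-lingual transfer between high- and low-resource languages. We also verify the feasibility of controlling the value alignment capabilities of multilingual LLMs by using the dominant language as the source language. Our findings yield recommendations on the composition of multilingual LLM pre-training data: include a limited number of dominant languages to enable cross-lingual alignment transfer while avoiding their excessive prevalence, and maintain a balanced distribution of non-dominant languages. We hope our findings contribute to improving the safety and utility of multilingual AI.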

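Exploring Multilingual Human Value Concepts in Large Language Models: Is Value Alignment Consistent, Transferable and Controllable across Languages?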