The evolution of artificial intelligence (AI) has profoundly impacted human
society, driving significant advancements in multiple sectors. Yet, the
escalating demands on AI have highlighted the limitations of AI's current
offerings, catalyzing a movement towards Artificial General Intelligence (AGI).
AGI, distinguished by its ability to execute diverse real-world tasks with
efficiency and effectiveness comparable to human intelligence, represents a
major milestone in AI evolution. While existing works have summarized
specific recent advancements in AI, they lack a comprehensive discussion of
AGI's definitions, goals, and developmental trajectories. Unlike
existing survey papers, this paper examines two pivotal questions, how close
we are to AGI and what strategies are needed to realize it, through
extensive surveys, discussions, and original perspectives. We start by
articulating the requisite capability frameworks for AGI, integrating the
internal, interface, and system dimensions. As the realization of AGI requires
more advanced capabilities and adherence to stringent constraints, we further
discuss necessary AGI alignment technologies to harmonize these factors.
Notably, we emphasize the importance of approaching AGI responsibly: we first
define the key levels of AGI progression, then present an evaluation
framework that situates the status quo, and finally give our roadmap for
reaching the pinnacle of AGI. Moreover, to give tangible insights into the
ubiquitous impact of the integration of AI, we outline existing challenges and
potential pathways toward AGI in multiple domains. In sum, serving as a
pioneering exploration into the current state and future trajectory of AGI,
this paper aims to foster a collective understanding of AGI and catalyze
broader public discussion among researchers and practitioners.

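The development of artificial intelligence has profoundly affected human society and driven major advances in many fields. However, the growing demands on AI have exposed the limits of its current capabilities, motivating a shift toward Artificial General Intelligence (AGI). AGI, able to carry out diverse real-world tasks with efficiency and effectiveness comparable to human intelligence, represents an important milestone in the evolution of AI. Unlike existing surveys, this paper examines how close we are to AGI and which strategies are needed to realize it, through extensive surveys, discussions, and original perspectives. We first lay out the capability framework that AGI requires, integrating the internal, interface, and system dimensions. Because realizing AGI requires more advanced capabilities and adherence to strict constraints, we further discuss the AGI alignment technologies needed to reconcile these factors. Notably, we stress approaching AGI responsibly: first defining the key levels of AGI progression, then presenting an evaluation framework that situates the status quo, and finally proposing a roadmap for reaching the pinnacle of AGI. In addition, to give concrete insight into the pervasive impact of AI integration, we outline the challenges and possible pathways toward AGI in multiple domains. In sum, as a pioneering exploration of the current state and future trajectory of AGI, this paper aims to foster a collective understanding among researchers and practitioners and to spark broader public discussion.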

How Far Are We From AGI

Big models have achieved revolutionary breakthroughs in the field of AI, but
they also raise potential concerns. To address these concerns, alignment
technologies were introduced to make these models conform to human preferences
and values. Despite considerable advancements in the past year, establishing
the optimal alignment strategy still faces various challenges, such as data
cost and scalable oversight, and how to align remains an open question. In this
survey paper, we comprehensively investigate value alignment approaches. We
first unpack the historical context of alignment tracing back to the 1920s
(where it comes from), then delve into the mathematical essence of alignment
(what it is), shedding light on the inherent challenges. Following this
foundation, we provide a detailed examination of existing alignment methods,
which fall into three categories: Reinforcement Learning, Supervised
Fine-Tuning, and In-context Learning, and demonstrate their intrinsic
connections, strengths, and limitations, helping readers better understand this
research area. In addition, two emerging topics, personal alignment and
multimodal alignment, are discussed as novel frontiers in this field.
Looking forward, we discuss potential alignment paradigms and how they could
handle remaining challenges, outlining where future alignment may go.

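Large models have achieved revolutionary breakthroughs in AI but may also raise potential concerns. This survey comprehensively studies value alignment approaches, examining the historical context, the mathematical essence, and the connections, strengths, and limitations of existing alignment methods (Reinforcement Learning, Supervised Fine-Tuning, and In-context Learning), and discusses personal alignment and multimodal alignment as emerging directions in the field. Finally, it looks ahead to future alignment paradigms and how they could handle the remaining challenges.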

On the Essence and Prospect: An Investigation of Alignment Approaches for Big Models

Large language models (LLMs) have revolutionized the role of AI, yet also
pose potential risks of propagating unethical content. Alignment technologies
have been introduced to steer LLMs towards human preference, gaining increasing
attention. Despite notable breakthroughs in this direction, existing methods
heavily rely on high-quality positive-negative training pairs, suffering from
noisy labels and the marginal distinction between preferred and dispreferred
response data. Given recent LLMs' proficiency in generating helpful responses,
this work pivots towards a new research focus: achieving alignment using solely
human-annotated negative samples, preserving helpfulness while reducing
harmfulness. For this purpose, we propose Distributional Dispreference
Optimization (D$^2$O), which maximizes the discrepancy between the generated
responses and the dispreferred ones to effectively eschew harmful information.
We theoretically demonstrate that D$^2$O is equivalent to learning a
distributional instead of instance-level preference model reflecting human
dispreference against the distribution of negative responses. In addition, D$^2$O
integrates an implicit Jeffrey Divergence regularization to balance the
exploitation and exploration of reference policies and converges to a
non-negative one during training. Extensive experiments demonstrate that our
method achieves comparable generation quality and surpasses the latest
baselines in producing less harmful and more informative responses with better
training stability and faster convergence.

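Using only human-annotated negative samples, the paper proposes Distributional Dispreference Optimization (D$^2$O) to align large language models with human preferences and reduce the generation of harmful content. Experiments show that the method achieves comparable generation quality while producing less harmful responses and training more stably than the latest baselines.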
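
The abstract above mentions an implicit Jeffrey Divergence regularization between the learned policy and reference policies. As a rough illustration of that quantity alone, and not of the D$^2$O objective itself, the sketch below computes the Jeffreys divergence, i.e. the symmetrized KL divergence KL(p || q) + KL(q || p), for two small discrete distributions. The function names and the toy distributions `policy` and `reference` are hypothetical and do not come from the paper; the code also assumes both distributions share the same support, so no zero denominators occur.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as equal-length lists.

    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute zero.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jeffreys_divergence(p, q):
    """Symmetrized KL divergence: KL(p || q) + KL(q || p)."""
    return kl_divergence(p, q) + kl_divergence(q, p)


# Hypothetical toy distributions over three candidate responses.
policy = [0.7, 0.2, 0.1]
reference = [0.5, 0.3, 0.2]

print(jeffreys_divergence(policy, reference))
```

Unlike plain KL, this divergence is symmetric in its two arguments and is zero only when the two distributions coincide, which is one plausible reason to use it when a regularizer should penalize deviation in both directions.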