Aligning large language models (LLMs) to human values has become increasingly
important as it enables sophisticated steering of LLMs, e.g., making them
follow given instructions while keeping them less toxic. However, it requires a
significant amount of human demonstrations and feedback. Recently, open-source
models have attempted to replicate the alignment learning process by distilling
data from already aligned LLMs like InstructGPT or ChatGPT. While this process
reduces human effort, constructing these datasets depends heavily on
the teacher models. In this work, we propose a novel framework for alignment
learning with almost no human labor and no dependency on pre-aligned LLMs.
First, we perform reward modeling (RM) with synthetic feedback by contrasting
responses from vanilla LLMs of various sizes and with various prompts. Then, we use the RM
for simulating high-quality demonstrations to train a supervised policy and for
further optimizing the model with reinforcement learning. Our resulting model,
Aligned Language Model with Synthetic Training dataset (ALMoST), outperforms
open-source models, including Alpaca, Dolly, and OpenAssistant, which are
trained on the outputs of InstructGPT or human-annotated instructions. Our
7B-sized model outperforms the 12-13B models in A/B tests using GPT-4 as
the judge, with an average win rate of about 75%.
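
The reward-modeling step can be illustrated with a minimal sketch. It assumes that responses from a larger model given more demonstrations are preferred, so a list of responses ordered best to worst yields synthetic comparison pairs without human labels. The configuration tuples, the function names, and the Bradley-Terry style ranking loss are illustrative assumptions here, not details taken from the abstract.

```python
import math
from itertools import combinations

# Hypothetical generator configurations: (model size in billions, few-shot demos).
# Assumption for this sketch: they are already ordered from best to worst.
CONFIGS = [(30, 3), (13, 3), (7, 1), (1, 0)]


def synthetic_comparisons(responses):
    """Turn responses listed best-to-worst into (chosen, rejected) pairs."""
    # combinations() preserves input order, so the first element of each
    # pair always comes from the configuration assumed to be better.
    return list(combinations(responses, 2))


def pairwise_loss(score_chosen, score_rejected):
    """Ranking loss -log(sigmoid(chosen - rejected)) for one comparison."""
    diff = score_chosen - score_rejected
    return math.log1p(math.exp(-diff))


if __name__ == "__main__":
    # Placeholder response strings, one per hypothetical configuration.
    responses = [f"response from {size}B model, {shots} demos"
                 for size, shots in CONFIGS]
    pairs = synthetic_comparisons(responses)
    print(len(pairs), "synthetic comparisons")
    # The loss shrinks as the reward model scores the chosen response higher.
    print(pairwise_loss(2.0, 0.0), pairwise_loss(0.0, 2.0))
```

A reward model trained to minimize this loss over such pairs learns to score the assumed-better response higher; in practice the scores would come from a neural network rather than hand-set numbers.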

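This work proposes a new framework that trains an aligned language model using reward modeling (RM) and simulated high-quality demonstrations, avoiding any dependency on already aligned LLMs. The resulting model, ALMoST, performs well against open-source models trained on the outputs of InstructGPT or on human-annotated instructions. The 7B-sized model does well in A/B tests using GPT-4 as the judge, with an average win rate of about 75%.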

Aligning Large Language Models through Synthetic Feedback

In this paper, we introduce SDM-UniPS, a groundbreaking Scalable, Detailed,
Mask-free, and Universal Photometric Stereo network. Our approach can recover
astonishingly intricate surface normal maps, rivaling the quality of 3D
scanners, even when images are captured under unknown, spatially-varying
lighting conditions in uncontrolled environments. We have extended previous
universal photometric stereo networks to extract spatial-light features,
utilizing all available information in high-resolution input images and
accounting for non-local interactions among surface points. Moreover, we
present a new synthetic training dataset that encompasses a diverse range of
shapes, materials, and illumination scenarios found in real-world scenes.
Through extensive evaluation, we demonstrate that our method not only surpasses
calibrated, lighting-specific techniques on public benchmarks, but also excels
with a significantly smaller number of input images even without object masks.
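
For context, the calibrated setting that the abstract compares against can be sketched in its classical form. This is not SDM-UniPS. It is a minimal illustration of textbook Lambertian photometric stereo for a single pixel, assuming exactly three known, non-coplanar light directions and no shadows. All function names are hypothetical.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve_normal(lights, intensities):
    """Recover (unit normal, albedo) for one pixel under the Lambertian model.

    lights: three known unit light directions (rows), assumed calibrated.
    intensities: the pixel's observed intensity under each light.
    Solves lights @ b = intensities with Cramer's rule, where b = albedo * normal.
    """
    d = det3(lights)
    if abs(d) < 1e-12:
        raise ValueError("light directions must not be coplanar")
    b = []
    for col in range(3):
        # Replace one column with the observations, per Cramer's rule.
        m = [row[:] for row in lights]
        for r in range(3):
            m[r][col] = intensities[r]
        b.append(det3(m) / d)
    albedo = math.sqrt(sum(x * x for x in b))
    if albedo == 0.0:
        raise ValueError("pixel is black under all lights")
    return [x / albedo for x in b], albedo


if __name__ == "__main__":
    lights = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.8], [0.0, 0.6, 0.8]]
    # Intensities for a surface facing the camera (normal along z), albedo 0.5.
    print(solve_normal(lights, [0.5, 0.4, 0.4]))
```

The dependence on known light directions is what "calibrated" refers to; a universal method, as described above, has to work without that information.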

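This paper introduces SDM-UniPS, a groundbreaking scalable, detailed, mask-free, and universal photometric stereo network. The method can recover remarkably intricate surface normal maps, even under unknown, spatially-varying lighting conditions in uncontrolled environments. It extends previous universal photometric stereo networks to extract spatial-light features, using all available information in high-resolution input images and accounting for non-local interactions among surface points. The paper also presents a new synthetic training dataset covering a diverse range of shapes, materials, and illumination scenarios found in real-world scenes. Extensive evaluation shows that the method not only surpasses calibrated, lighting-specific techniques on public benchmarks, but also performs well with significantly fewer input images, even without object masks.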

Scalable, Detailed and Mask-Free Universal Photometric Stereo

Interactive object cutout tools are the cornerstone of the image editing
workflow. Recent deep-learning-based interactive segmentation algorithms have
made significant progress in handling complex images, and rough binary
selections can typically be obtained with just a few clicks. Yet, deep learning
techniques tend to plateau once this rough selection has been reached. In this
work, we interpret this plateau as the inability of current algorithms to
sufficiently leverage each user interaction and also as the limitations of
current training/testing datasets.
We propose a novel interactive architecture and a novel training scheme that
are both tailored to better exploit the user workflow. We also show that
significant improvements can be further gained by introducing a synthetic
training dataset that is specifically designed for complex object boundaries.
Comprehensive experiments support our approach, and our network achieves
state-of-the-art performance.

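This paper proposes a new interactive architecture and training scheme designed to better exploit the user workflow, and shows that introducing a synthetic training dataset specifically designed for complex object boundaries yields further significant improvements. The network achieves state-of-the-art performance.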