In-context learning (ICL) allows LLMs to learn from examples without changing
their weights, which is a particularly promising capability for long-context
LLMs that can potentially learn from many examples. Recently, Lin et al. (2024)
proposed URIAL, a method using only three in-context examples to align base
LLMs, achieving non-trivial instruction following performance. In this work, we
show that, while effective, ICL alignment with URIAL still underperforms
compared to instruction fine-tuning on established benchmarks such as MT-Bench
and AlpacaEval 2.0 (LC), especially with more capable base LMs. Unlike for
tasks such as classification, translation, or summarization, adding more ICL
demonstrations for long-context LLMs does not systematically improve
instruction following performance. To address this limitation, we derive a
greedy selection approach for ICL examples that noticeably improves
performance, yet without bridging the gap to instruction fine-tuning. Finally,
we provide a series of ablation studies to better understand the reasons behind
the remaining gap, and we show how some aspects of ICL depart from existing
knowledge and are specific to the instruction tuning setting. Overall, our work
advances the understanding of ICL as an alignment technique. We provide our
code at this https URL
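The abstract does not spell out the greedy selection procedure, so the following is only a minimal sketch of generic greedy forward selection, not the authors' implementation. The names `greedy_select` and `coverage_score` are hypothetical. In practice the scoring function would be something like a benchmark score obtained when prompting the base model with the chosen demonstrations; here it is replaced by a toy coverage measure so that the sketch runs on its own.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def greedy_select(
    candidates: Sequence[T],
    score: Callable[[List[T]], float],
    k: int,
) -> List[T]:
    """Greedy forward selection: repeatedly add the candidate that raises
    `score` the most, stopping after k picks or when nothing improves."""
    selected: List[T] = []
    remaining = list(candidates)
    best_score = score(selected)
    for _ in range(k):
        if not remaining:
            break
        # Score every candidate when appended to the current selection.
        scored = [(score(selected + [c]), i) for i, c in enumerate(remaining)]
        top_score, top_idx = max(scored)
        if top_score <= best_score:
            break  # no remaining candidate improves the score
        selected.append(remaining.pop(top_idx))
        best_score = top_score
    return selected


def coverage_score(demos: List[frozenset]) -> float:
    """Toy stand-in for a real evaluation (assumption): counts the distinct
    'skills' covered by the selected demonstrations."""
    return float(len(set().union(*demos)))


# Each hypothetical demonstration is tagged with the skills it exhibits.
pool = [frozenset("ab"), frozenset("b"), frozenset("cde"), frozenset("a")]
chosen = greedy_select(pool, coverage_score, k=2)
```

Each greedy step evaluates every remaining candidate, so the cost is roughly `k` times the pool size in calls to `score`, which matters when each call means running a model on a validation set.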

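By greedily selecting in-context learning examples for long-context LLMs, we improve on ICL alignment with URIAL, although the gap to instruction fine-tuning remains. Further ablation studies reveal how ICL behaves differently in the instruction tuning setting, advancing the understanding of ICL as an alignment technique.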

Is In-Context Learning Sufficient for Instruction Following in LLMs?

Reinforcement Learning from Human Feedback (RLHF) has proven to be a strong
method to align Pretrained Large Language Models (LLMs) with human preferences.
However, training models with RLHF is computationally expensive and, overall,
a complex process. In this work, we study RLHF where the underlying models are
trained using the parameter efficient method of Low-Rank Adaptation (LoRA)
introduced by Hu et al. [2021]. We investigate the setup of "Parameter
Efficient Reinforcement Learning" (PERL), in which we perform reward model
training and reinforcement learning using LoRA. We compare PERL to conventional
fine-tuning (full-tuning) across various configurations on 7 reward modeling
and reinforcement learning benchmarks, including 2 novel datasets. We
find that PERL performs on par with the conventional RLHF setting, while
training faster, and with less memory. This enables the high performance of
RLHF, while reducing the computational burden that limits its adoption as an
alignment technique for Large Language Models. We also release 2 novel thumbs
up/down preference datasets, "Taskmaster Coffee" and "Taskmaster Ticketing", to
promote research around RLHF.
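To make the parameter savings of LoRA concrete, here is a dependency-free sketch of a single LoRA-adapted linear layer in the spirit of Hu et al. [2021]: the pretrained weight stays frozen and only two small low-rank matrices are trained. This is an illustration under stated assumptions, not the PERL implementation. The class name `LoRALinear`, the helper `matvec`, and the initialization scale are all hypothetical choices.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Linear layer y = W x + (alpha / rank) * B (A x) with W frozen."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen pretrained weight, d_out x d_in
        # A gets a small random init, B starts at zero, so the adapted
        # layer initially computes exactly the same output as W alone.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)]
                  for _ in range(rank)]               # rank x d_in
        self.B = [[0.0] * rank for _ in range(d_out)]  # d_out x rank
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self):
        """Only A and B are trained: rank * d_in + d_out * rank values."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


# Hypothetical layer size: 64 x 32 frozen weight, adapted at rank 4.
frozen = [[0.0] * 32 for _ in range(64)]
layer = LoRALinear(frozen, rank=4)
full_params = 64 * 32
```

With these sizes the adapter trains 4 * 32 + 64 * 4 = 384 values instead of the 2048 in the full weight matrix, which is the source of the memory and speed savings that parameter-efficient training targets.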

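This work studies reinforcement learning from human feedback using Parameter Efficient Reinforcement Learning (PERL), which reduces computational cost while performing on par with conventional RLHF, making RLHF more practical as an alignment technique for large language models.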

PERL: Parameter Efficient Reinforcement Learning from Human Feedback

Recently, there has been considerable interest in new tiered network cellular
architectures, which would likely use many more cell sites than found today.
Two major challenges will be i) providing backhaul to all of these cells and
ii) finding efficient techniques to leverage higher frequency bands for mobile
access and backhaul. This paper proposes the use of outdoor millimeter wave
communications for backhaul networking between cells and mobile access within a
cell. To overcome the outdoor impairments found in millimeter wave propagation,
this paper studies beamforming using large arrays. However, such systems will
require narrow beams, increasing sensitivity to movement caused by pole sway
and other environmental concerns. To overcome this, we propose an efficient
beam alignment technique using adaptive subspace sampling and hierarchical beam
codebooks. A wind sway analysis is presented to establish a notion of beam
coherence time. This highlights a previously unexplored tradeoff between array
size and wind-induced movement. Generally, it is not possible to use larger
arrays without risking a corresponding performance loss from wind-induced beam
misalignment. The performance of the proposed alignment technique is analyzed
and compared with other search and alignment methods. The results show
significant performance improvement with reduced search time.
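The reduced search time of a hierarchical beam codebook can be illustrated with a minimal sketch. It covers only a generic binary-tree search over idealized, noise-free sector beams; it does not implement the adaptive subspace sampling proposed in the paper, and the names `hierarchical_beam_search` and `make_ideal_channel` are hypothetical. The measurement model (gain inversely proportional to beam width inside the sector, zero outside) is an assumption made for illustration.

```python
def hierarchical_beam_search(measure, num_beams):
    """Locate the best narrow beam by repeatedly halving the search sector.

    `measure(lo, hi)` returns the received power for a wide beam covering
    the narrow-beam indices [lo, hi). Returns (beam index, measurements used).
    """
    lo, hi = 0, num_beams
    measurements = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        left_power = measure(lo, mid)
        right_power = measure(mid, hi)
        measurements += 2
        # Descend into the half-sector with the stronger response.
        if left_power >= right_power:
            hi = mid
        else:
            lo = mid
    return lo, measurements


def make_ideal_channel(true_beam):
    """Idealized, noise-free channel (assumption): a sector beam sees gain
    1 / width if it covers the true direction and zero otherwise."""
    def measure(lo, hi):
        return 1.0 / (hi - lo) if lo <= true_beam < hi else 0.0
    return measure


# Hypothetical codebook of 64 narrow beams with the true direction at beam 37.
best_beam, used = hierarchical_beam_search(make_ideal_channel(37), 64)
```

In this idealized setting the search needs two measurements per level, so about 2 log2(N) in total for N narrow beams, compared with N for an exhaustive sweep. With noise or movement during the search, a wrong decision at a wide-beam level propagates to all finer levels, which is one reason beam coherence time matters.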

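This study proposes using outdoor millimeter wave communications for backhaul networking between cells and for mobile access within a cell. It studies beamforming with large arrays to overcome the outdoor impairments of millimeter wave propagation, and proposes an efficient beam alignment technique to cope with pole sway and other environmental effects.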