We propose MoE-F -- a formalised mechanism for combining $N$ pre-trained
expert Large Language Models (LLMs) in online time-series prediction tasks by
adaptively forecasting the best weighting of LLM predictions at every time
step. Our mechanism leverages the conditional information in each expert's
running performance to forecast the best combination of LLMs for predicting the
time series at its next step. Diverging from static (learned) Mixture of
Experts (MoE) methods, MoE-F employs time-adaptive stochastic filtering
techniques to combine experts. By framing the expert selection problem as a
finite state-space, continuous-time Hidden Markov model (HMM), we can leverage
the Wonham-Shiryaev filter. Our approach first constructs $N$ parallel filters
one for each of the $N$ individual LLMs. Each filter proposes its best
combination of LLMs, given the information it has access to.
The $N$ filter outputs are then aggregated by optimizing a lower bound
on the loss of the aggregated LLMs, which admits a closed-form solution,
yielding our ensemble predictor. Our contributions here are: (I) the
MoE-F algorithm -- deployable as a plug-and-play filtering harness, (II)
theoretical optimality guarantees of the proposed filtering-based gating
algorithm, and (III) empirical evaluation and ablative results using
state-of-the-art foundational and MoE LLMs on a real-world Financial Market
Movement task, where MoE-F attains a remarkable 17% absolute and 48.5% relative
improvement in F1 measure over the next-best-performing individual LLM expert.
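
To make the filtering-based gating idea concrete, the following is a minimal sketch of a single filter that tracks which expert is currently "best" as the hidden state of a finite-state HMM. It rests on several assumptions that do not come from the abstract: it uses a discrete-time predict/correct step as a stand-in for the continuous-time Wonham-Shiryaev filter, it turns each expert's observed loss into a pseudo-likelihood via exp(-loss / temperature), and the names forward_filter_step, loss_likelihoods, mix, and the sticky transition matrix are all hypothetical. It covers only one filter and omits the aggregation of the $N$ parallel filters.

```python
import math


def loss_likelihoods(losses, temperature=1.0):
    """Map per-expert losses to pseudo-likelihoods exp(-loss / temperature).

    Assumption for illustration: a lower loss makes an expert more likely
    to be the currently "best" hidden state.
    """
    return [math.exp(-loss / temperature) for loss in losses]


def forward_filter_step(prior, transition, likelihoods):
    """One discrete-time HMM filter step over N hidden states.

    predict: push the prior through the row-stochastic transition matrix;
    correct: reweight by the likelihoods and renormalise.
    """
    n = len(prior)
    predicted = [
        sum(prior[i] * transition[i][j] for i in range(n)) for j in range(n)
    ]
    unnormalised = [predicted[j] * likelihoods[j] for j in range(n)]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


def mix(weights, predictions):
    """Combine scalar expert predictions with the filtered weights."""
    return sum(w * p for w, p in zip(weights, predictions))


# Hypothetical setup: 3 experts, a "sticky" transition matrix, uniform prior.
transition = [
    [0.90, 0.05, 0.05],
    [0.05, 0.90, 0.05],
    [0.05, 0.05, 0.90],
]
weights = [1.0 / 3.0] * 3

# Made-up running losses per time step; expert 1 is consistently the best.
losses_per_step = [
    [1.0, 0.1, 0.8],
    [0.9, 0.2, 0.7],
    [1.1, 0.1, 0.9],
]
for losses in losses_per_step:
    weights = forward_filter_step(weights, transition, loss_likelihoods(losses))

# Weighted ensemble prediction for the next step (expert outputs are made up).
ensemble_prediction = mix(weights, [0.2, 0.7, 0.4])
```

In this toy run the weights shift toward the expert with the lowest running loss, while the transition matrix keeps some mass on the other experts so the gate can react if the best expert changes later.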
