In question answering (QA), different questions are best addressed with different answering strategies. Some require a simple lookup, while others need complex, multi-step reasoning to be answered adequately. This observation motivates a dynamic method that adaptively selects the most suitable QA strategy for each question, enabling more efficient and effective systems capable of addressing a broader range of question types. To this end, we build on recent advances in the orchestration of multiple large language models (LLMs) and formulate adaptive QA as a dynamic orchestration problem. We cast it as a contextual multi-armed bandit problem, where the context is given by the characteristics of the incoming question and the action space consists of candidate communication graph configurations among the LLM agents. We then train a linear upper confidence bound model to learn a mapping from question types to the best-performing multi-LLM communication graph for each type. Our experiments show that the proposed solution is viable for adaptive orchestration of a QA system with multiple modules: it retains the superior performance of more complex strategies while avoiding their costs when simpler strategies suffice.
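To make the bandit formulation concrete, the following is a minimal sketch of a linear upper confidence bound learner choosing between two orchestration strategies. It is an illustration under stated assumptions, not the authors' implementation: the disjoint (one model per arm) LinUCB variant, the `LinUCB` class, the two-dimensional one-hot question context, the two arms, and the toy reward values (answer quality minus a cost penalty) are all hypothetical choices made for this example.

```python
import math


class LinUCB:
    """Disjoint LinUCB (assumed variant): one ridge-regression model per arm."""

    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha
        # Store A^{-1} per arm directly; A starts as the identity matrix.
        self.A_inv = [[[1.0 if i == j else 0.0 for j in range(dim)]
                       for i in range(dim)] for _ in range(n_arms)]
        self.b = [[0.0] * dim for _ in range(n_arms)]

    @staticmethod
    def _matvec(M, x):
        return [sum(m * xj for m, xj in zip(row, x)) for row in M]

    def _score(self, arm, x, alpha):
        theta = self._matvec(self.A_inv[arm], self.b[arm])  # ridge estimate
        mean = sum(t * xi for t, xi in zip(theta, x))
        A_inv_x = self._matvec(self.A_inv[arm], x)
        var = max(0.0, sum(xi * v for xi, v in zip(x, A_inv_x)))
        return mean + alpha * math.sqrt(var)  # estimate + exploration bonus

    def select(self, x, explore=True):
        alpha = self.alpha if explore else 0.0
        scores = [self._score(a, x, alpha) for a in range(len(self.b))]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, x, reward):
        # Sherman-Morrison rank-one update of A^{-1} after A += x x^T.
        A_inv = self.A_inv[arm]
        v = self._matvec(A_inv, x)  # A_inv is symmetric, so x^T A_inv = v^T
        denom = 1.0 + sum(xi * vi for xi, vi in zip(x, v))
        for i in range(len(x)):
            for j in range(len(x)):
                A_inv[i][j] -= v[i] * v[j] / denom
        self.b[arm] = [bi + reward * xi for bi, xi in zip(self.b[arm], x)]


# Hypothetical contexts: one-hot features for "simple" vs "complex" questions.
SIMPLE, COMPLEX = [1.0, 0.0], [0.0, 1.0]
# Hypothetical arms: 0 = single-LLM lookup, 1 = multi-step agent graph.
# Toy rewards = answer quality minus a cost penalty (made-up numbers).
REWARD = {
    (0, 0): 0.9,  # simple question, cheap strategy: accurate and cheap
    (0, 1): 0.6,  # simple question, graph strategy: accurate but costly
    (1, 0): 0.2,  # complex question, cheap strategy: mostly wrong
    (1, 1): 0.6,  # complex question, graph strategy: accurate but costly
}

bandit = LinUCB(n_arms=2, dim=2, alpha=1.0)
for t in range(200):
    kind = t % 2  # alternate simple and complex questions
    x = SIMPLE if kind == 0 else COMPLEX
    arm = bandit.select(x)
    bandit.update(arm, x, REWARD[(kind, arm)])

print("simple  ->", bandit.select(SIMPLE, explore=False))
print("complex ->", bandit.select(COMPLEX, explore=False))
```

With these toy rewards, the greedy choice after training is the cheap arm for simple questions and the graph arm for complex ones, which mirrors the trade-off described above. In a real system the context would be a learned question representation and each arm a full communication graph configuration.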

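This work addresses the problem that different types of questions in question answering call for different answering strategies, and proposes a method that dynamically selects the most suitable strategy for each question. The method models adaptive question answering as a contextual multi-armed bandit problem and leverages the cooperation of multiple large language models; experiments show that it effectively improves the efficiency and performance of multi-module QA systems.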

Adaptive Question Answering: Applying Contextual Multi-Armed Bandits in a Society of Large Language Models