In multi-agent systems built on Large Language Models (LLMs), agents traditionally communicate in natural language. Each message often carries the full context of the query so far, and the receiving agent must process it again, which introduces significant prefill-phase latency, especially with long contexts. We introduce DroidSpeak, a novel framework that accelerates this cross-LLM communication by reusing intermediate data, such as input embeddings (E-cache) and key-value caches (KV-cache). When the agents are fine-tuned versions of the same foundation model, DroidSpeak bypasses the need to reprocess the entire context. This allows faster context integration while maintaining task performance quality. Experimental evaluations show that DroidSpeak significantly accelerates inter-agent communication, achieving up to a 2.78x speedup in prefill latency with negligible loss in accuracy. Our findings underscore the potential for building more efficient and scalable multi-agent systems.
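To make the prefill saving concrete, here is a toy Python sketch. It assumes that prefill work is proportional to the number of tokens an agent must process and that the receiver can accept the sender's cache directly. `ToyAgent`, `prefill`, and `scale` are hypothetical names, and the "KV entries" are placeholder numbers rather than real attention projections; this is not DroidSpeak's implementation.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Toy KV entry: one (key, value) pair per token (hypothetical simplification).
KV = Tuple[float, float]


@dataclass
class ToyAgent:
    """Stand-in for one LLM agent. `scale` loosely mimics fine-tuned weights."""

    name: str
    scale: float
    tokens_prefilled: int = 0  # per-token prefill work this agent has done

    def _kv_for(self, token: int) -> KV:
        # Placeholder for the per-token key/value projections computed in prefill.
        self.tokens_prefilled += 1
        return (token * self.scale, token * self.scale + 1.0)

    def prefill(self, tokens: List[int], cache: Optional[List[KV]] = None) -> List[KV]:
        """Return a KV cache covering `cache` (reused as-is) plus `tokens`."""
        kv = list(cache) if cache is not None else []
        kv.extend(self._kv_for(t) for t in tokens)
        return kv


context = list(range(1000))  # shared conversation history
query = [7, 8, 9]            # receiver's new tokens

sender = ToyAgent("sender", scale=1.0)
shared_cache = sender.prefill(context)  # sender already processed the context

# Baseline: natural-language handoff, so the receiver re-prefills everything.
baseline = ToyAgent("receiver", scale=1.01)
baseline_cache = baseline.prefill(context + query)

# Cache reuse: the receiver starts from the sender's cache and prefills only the query.
reuse = ToyAgent("receiver", scale=1.01)
reused_cache = reuse.prefill(query, cache=shared_cache)

print(baseline.tokens_prefilled, reuse.tokens_prefilled)  # 1003 3
```

Because the two agents use different `scale` values, the reused context entries are not identical to what the receiver would compute itself. In real fine-tuned models this mismatch is where accuracy can be lost, which is the trade-off the abstract reports as negligible.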

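This work addresses the efficiency of communication between large language models (LLMs) in multi-agent systems, particularly the latency incurred when handling long contexts. It introduces the DroidSpeak framework, which uses intermediate data (such as input embeddings and key-value caches) to speed up cross-LLM communication, achieving up to a 2.78x speedup in prefill latency while maintaining task performance quality. This finding opens new possibilities for building more efficient and scalable multi-agent systems.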

DroidSpeak: Enhancing Cross-LLM Communication