Avihu Dekel, Slava Shechtman, Raul Fernandez, David Haws, Zvi Kons...
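TL;DR: The LLM2Speech architecture generates speech through an LLM, reducing the notable latency of synthesizing LLM output and enabling natural conversation.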
Abstract
Large language models (LLMs) demonstrate impressive capabilities, yet interaction with these models is mostly facilitated through text. Using text-to-speech (TTS) to synthesize LLM outputs typically results in notable latency, which is impractical for fluent voice conversations. We propose <