Self-driving vehicles rely on multimodal motion forecasts to effectively interact with their environment and plan safe maneuvers. We introduce SceneMotion, an attention-based model for forecasting scene-wide motion modes of multiple traffic agents. Our model transforms local agent-centric embeddings into scene-wide forecasts using a novel latent context module. This module learns a scene-wide latent space from multiple agent-centric embeddings, enabling joint forecasting and interaction modeling. The competitive performance in the Waymo Open Interaction Prediction Challenge demonstrates the effectiveness of our approach. Moreover, we cluster future waypoints in time and space to quantify the interaction between agents. We merge all modes and analyze each mode independently to determine which clusters are resolved through interaction or result in conflict. Our implementation is available at: https://github.com/kit-mrt/future-motion
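
To make the interaction analysis more concrete, the following is a minimal sketch of one way to group future waypoints in time and space and then inspect each mode. It is not the authors' implementation: the array layout `(modes, agents, timesteps, 2)`, the helper names `comes_close` and `analyze_interactions`, the pairwise distance threshold `radius`, and the labels "conflict" and "resolved" are all assumptions chosen for illustration. A simple same-timestep proximity test stands in for whatever clustering the paper actually uses.

```python
import numpy as np


def comes_close(traj_a, traj_b, radius):
    """True if two (T, 2) trajectories are within `radius` at the same timestep."""
    distances = np.linalg.norm(traj_a - traj_b, axis=-1)
    return bool((distances < radius).any())


def analyze_interactions(forecasts, radius=2.0):
    """Label agent pairs per mode (hypothetical scheme, see lead-in).

    forecasts: array of shape (M, A, T, 2) holding M scene-wide modes
    for A agents over T future timesteps.
    """
    n_modes, n_agents = forecasts.shape[:2]
    report = {}
    for i in range(n_agents):
        for j in range(i + 1, n_agents):
            # Merged view: compare every mode of agent i with every mode of agent j.
            merged = any(
                comes_close(forecasts[m, i], forecasts[n, j], radius)
                for m in range(n_modes)
                for n in range(n_modes)
            )
            if not merged:
                continue  # the pair never shares a space-time region
            # Per-mode view: within one joint mode, do the two agents still meet?
            report[(i, j)] = [
                "conflict"
                if comes_close(forecasts[m, i], forecasts[m, j], radius)
                else "resolved"
                for m in range(n_modes)
            ]
    return report


# Toy scene: 2 modes, 3 agents, 3 timesteps.
agent1 = [[2.0, 5.0], [2.0, 3.0], [2.0, 0.5]]      # crosses agent 0's path
agent2 = [[100.0, 100.0]] * 3                      # parked far away
forecasts = np.array([
    [[[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]], agent1, agent2],  # mode 0: agent 0 keeps speed
    [[[0.0, 0.0], [0.5, 0.0], [1.0, 0.0]], agent1, agent2],  # mode 1: agent 0 slows down
])
report = analyze_interactions(forecasts, radius=1.0)
```

In this toy scene only the pair `(0, 1)` shares a space-time region in the merged view. Mode 0 keeps the overlap, while mode 1 separates the two agents because agent 0 slows down.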

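This work addresses the need of self-driving vehicles for multimodal motion forecasts when interacting with their environment and planning safe maneuvers. The proposed SceneMotion model uses a novel latent context module to transform local agent-centric embeddings into scene-wide forecasts, enabling joint forecasting and interaction modeling. Experimental results show that the method performs well and offers a new perspective on quantifying interactions between traffic agents.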

SceneMotion: From Agent-Centric Embeddings to Scene-Wide Forecasts