This paper tackles the critical challenge of object navigation in autonomous navigation systems, focusing on target approach and episode termination in environments with long optimal episode lengths for Deep Reinforcement Learning (DRL) based methods. While effective at environment exploration and object localization, conventional DRL methods often struggle with optimal path planning and termination recognition due to a lack of depth information. To overcome these limitations, we propose a novel approach, the Depth-Inference Termination Agent (DITA), which incorporates a supervised model called the Judge Model to implicitly infer object-wise depth and decide termination jointly with reinforcement learning. We train the Judge Model in parallel with reinforcement learning and supervise it efficiently using the reward signal. Our evaluation shows that the method outperforms our baseline: it achieves a 9.3% gain in success rate across all room types and a 51.2% improvement in long-episode environments, while maintaining a slightly better Success Weighted by Path Length (SPL). Code, resources, and visualizations are available at: https://github.com/HuskyKingdom/DITA_acml2023

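This paper examines a key challenge of object navigation in autonomous navigation systems, focusing on target approach and episode termination over long optimal trajectories in Deep Reinforcement Learning (DRL) methods. It proposes a novel method, the Depth-Inference Termination Agent (DITA), which combines a supervised model, called the Judge Model, with reinforcement learning to implicitly infer the target's depth and decide when to terminate. The evaluation shows a 9.3% gain in success rate across all room types and a 51.2% improvement in long-trajectory environments, while maintaining a slightly better Success Weighted by Path Length (SPL).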
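
For readers unfamiliar with the SPL metric reported above, the following is a minimal sketch of its standard definition from the embodied-navigation literature: each successful episode contributes the ratio of the shortest-path length to the longer of the agent's path and the shortest path, and failed episodes contribute zero. The function name `spl` and its argument names are illustrative assumptions and are not taken from the DITA code base.

```python
from typing import Sequence


def spl(
    successes: Sequence[bool],
    shortest_lengths: Sequence[float],
    path_lengths: Sequence[float],
) -> float:
    """Success weighted by Path Length, averaged over episodes.

    successes[i]        -- whether episode i ended successfully
    shortest_lengths[i] -- shortest-path length from start to target
    path_lengths[i]     -- length of the path the agent actually took
    (Hypothetical argument names, chosen for illustration only.)
    """
    n = len(successes)
    if n == 0:
        raise ValueError("at least one episode is required")
    if not (n == len(shortest_lengths) == len(path_lengths)):
        raise ValueError("all inputs must have the same length")

    total = 0.0
    for success, shortest, taken in zip(successes, shortest_lengths, path_lengths):
        denom = max(taken, shortest)
        # Failed episodes add nothing; a zero denominator is skipped
        # to avoid dividing by zero on degenerate episodes.
        if success and denom > 0:
            total += shortest / denom
    return total / n


if __name__ == "__main__":
    # Three episodes: optimal success, success with a 2x detour, failure.
    print(spl([True, True, False], [10.0, 10.0, 5.0], [10.0, 20.0, 5.0]))
```

Because the ratio is capped at one and failures score zero, SPL can never exceed the plain success rate, which is why a success-rate gain paired with a roughly unchanged SPL indicates that the additional successes come at some cost in path efficiency.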

Learning to Terminate in Object Navigation