Small object detection in aerial imagery is a significant challenge in computer vision: small objects occupy few pixels and therefore carry little visual information, and they are easily obscured by larger objects and background noise. Traditional transformer-based approaches are further limited by the lack of specialized datasets, which degrades their performance on objects of varying orientations and scales. This underscores the need for more adaptable, lightweight models. In response, this paper introduces two approaches that enhance detection and segmentation of small aerial objects. First, we explore the SAHI (Slicing Aided Hyper Inference) framework on the recently introduced lightweight YOLO v9 architecture, which uses Programmable Gradient Information (PGI) to reduce the substantial information loss that typically occurs in sequential feature extraction. Second, we employ the Vision Mamba model, which incorporates position embeddings for precise location-aware visual understanding and pairs them with a novel bidirectional State Space Model (SSM) for effective visual context modeling. This SSM combines the linear computational complexity of CNNs with the global receptive field of Transformers, making it particularly effective for remote sensing image classification. Our experimental results show substantial improvements in detection accuracy and processing efficiency, supporting the applicability of these approaches to real-time small object detection across diverse aerial scenarios. The paper also discusses how these methods could serve as foundational models for future advances in aerial object recognition. The source code will be made accessible here.
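The claim that a bidirectional SSM offers linear complexity together with a global receptive field can be illustrated with a deliberately simplified model. The sketch below assumes a diagonal, time-invariant linear SSM written in NumPy; it is not the selective (input-dependent) SSM that Vision Mamba actually uses, which also adds discretization, convolutions, and gating. The function names, parameter ranges, and the random "position embeddings" are hypothetical, chosen only to show the two properties: each scan is a single O(L) pass over the tokens, and summing a forward and a backward scan lets every output depend on every token.

```python
import numpy as np


def ssm_scan(x, A, B, C):
    """Linear-time scan of a simplified diagonal, time-invariant SSM (illustrative only).

    x: (L, D) token sequence; A, B, C: (D, N) per-channel diagonal parameters.
    Per channel: h_t = A * h_{t-1} + B * x_t,  y_t = sum_n C * h_t.
    """
    L, D = x.shape
    h = np.zeros_like(A)                      # hidden state, shape (D, N)
    y = np.empty((L, D))
    for t in range(L):                        # one pass over the sequence: O(L) cost
        h = A * h + B * x[t][:, None]
        y[t] = (C * h).sum(axis=1)
    return y


def bidirectional_ssm(x, params_fwd, params_bwd):
    """Sum a forward scan and a backward scan so each output sees all tokens."""
    y_fwd = ssm_scan(x, *params_fwd)
    y_bwd = ssm_scan(x[::-1], *params_bwd)[::-1]  # scan reversed tokens, restore order
    return y_fwd + y_bwd


def random_params(rng, D, N):
    """Hypothetical random parameters; 0 < A < 1 keeps the recurrence stable."""
    A = rng.uniform(0.5, 0.95, size=(D, N))
    B = rng.normal(size=(D, N))
    C = rng.normal(size=(D, N))
    return A, B, C


rng = np.random.default_rng(0)
L, D, N = 16, 4, 8                            # e.g. 16 flattened image patches
patches = rng.normal(size=(L, D))
pos_embed = 0.02 * rng.normal(size=(L, D))    # hypothetical position embeddings
tokens = patches + pos_embed

fwd, bwd = random_params(rng, D, N), random_params(rng, D, N)
y = bidirectional_ssm(tokens, fwd, bwd)
print(y.shape)                                # (16, 4)

# Perturb only the LAST token and compare outputs at the FIRST position.
perturbed = tokens.copy()
perturbed[-1] += 1.0
y2 = bidirectional_ssm(perturbed, fwd, bwd)
fwd_only_same = np.allclose(ssm_scan(perturbed, *fwd)[0], ssm_scan(tokens, *fwd)[0])
print(fwd_only_same)                          # forward scan alone cannot see it
print(np.abs(y2[0] - y[0]).max() > 0)         # bidirectional output does
```

A forward-only scan is causal, so the first patch never "sees" the last one; adding the reversed scan is what gives the simplified model a sequence-wide receptive field while keeping the cost linear in the number of patches.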

Advancements in Small Body Object Detection for Aerial Imagery Using State Space Models and Programmable Gradients