Detection of water surface targets based on improved Deformable DETR

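  • Summary:
    Objective  To propose an object detection algorithm based on an improved Deformable DETR for intelligent recognition of water surface targets, one that substantially increases model inference and training speed while also raising detection accuracy, so as to achieve more efficient and robust water surface target detection.
    Methods  A new water surface target dataset was constructed. Deformable DETR was improved by replacing its original feature extraction network with the lightweight MobileNetV3 and introducing the CBAM attention module. Ablation experiments and cross-algorithm comparison tests on the self-constructed dataset and the public ABOships dataset were used to verify the effectiveness of the improved algorithm.
    Results  Ablation results on the self-constructed dataset and ABOships show that, relative to the original model, the improved model's parameter count and size were reduced to 1/3, inference speed increased by 52.0% and 82.7% respectively, mAP0.5:0.95 increased by 2.4% and 7.5% respectively, and training time was 41.7% and 51.9% of the original respectively. Comparison tests of different algorithms on ABOships further show that the proposed algorithm is superior in combined inference speed and detection accuracy.
    Conclusions  DETR-class algorithms have application potential in water surface target detection.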

     

    Abstract:
    Objective  With advances in technology and growing demand for water resource exploration, water surface target detection plays a crucial role in applications such as ship navigation and maritime safety. However, traditional detection methods face challenges, and existing deep-learning-based algorithms have limitations in this field, including limited datasets and insufficient detection speed even after improvement. This study aims to develop an improved object detection algorithm based on Deformable DETR for intelligent recognition of water surface targets. The algorithm is intended to significantly increase the model's inference and training speed while improving detection accuracy, thus achieving more efficient and robust water surface target detection.
    Methods  First, a new water surface target dataset was constructed. Second, the original feature extraction network of Deformable DETR was replaced with the lightweight MobileNetV3. MobileNetV3 is a family of lightweight networks that achieve high recognition accuracy with few parameters; the MobileNetV3-Small version was chosen as the feature extraction backbone. It is built from depthwise separable convolutions and also includes SE modules and the hard-swish activation function. To further reduce the model size and enhance detection ability, three output feature maps from specific modules of MobileNetV3-Small were used directly for multi-scale feature extraction. Third, the CBAM attention module was introduced. CBAM is a lightweight, general-purpose module that combines channel attention and spatial attention and can be easily integrated into a network. Replacing the SE module in MobileNetV3 with CBAM further improved the model's feature extraction ability. The channel attention module in CBAM applies average pooling and max pooling to the input feature map, passes both results through a shared neural network, and applies a sigmoid function to generate channel attention features. The spatial attention module takes the feature map refined by channel attention, pools it along the channel dimension, and then applies a convolution and sigmoid activation to obtain spatial attention features. Finally, the improved Deformable DETR network was obtained by integrating MobileNetV3 and the CBAM attention module. The input image passes through the MobileNetV3-Small network with embedded CBAM, which extracts feature maps at three scales. These feature maps are processed and then fed into the Transformer structure of Deformable DETR.
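The channel and spatial attention steps described above can be sketched as a plain NumPy forward pass. This is only an illustration under stated assumptions, not the authors' implementation: the reduction ratio `r = 4`, the 7×7 spatial kernel, the ReLU inside the shared network, the single-image `(C, H, W)` layout, and the random weights are all choices made here for the example and are not given in the abstract.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat, w1, w2):
    """feat: (C, H, W); w1: (C//r, C); w2: (C, C//r). Returns (C, 1, 1) weights."""
    avg = feat.mean(axis=(1, 2))  # global average pooling -> (C,)
    mx = feat.max(axis=(1, 2))    # global max pooling -> (C,)

    def shared_mlp(v):
        # The same two-layer network is applied to both pooled descriptors.
        # The ReLU between the layers is an assumption of this sketch.
        return w2 @ np.maximum(w1 @ v, 0.0)

    return sigmoid(shared_mlp(avg) + shared_mlp(mx))[:, None, None]


def spatial_attention(feat, kernel):
    """feat: (C, H, W); kernel: (2, k, k) with odd k. Returns (1, H, W) weights."""
    # Pool along the channel dimension and stack the two maps -> (2, H, W).
    pooled = np.stack([feat.mean(axis=0), feat.max(axis=0)])
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))  # zero padding
    h, w = feat.shape[1:]
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            # Zero-padded cross-correlation that keeps the spatial size.
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return sigmoid(out)[None]


def cbam(feat, w1, w2, kernel):
    """Channel attention first, then spatial attention on the refined map."""
    refined = feat * channel_attention(feat, w1, w2)
    return refined * spatial_attention(refined, kernel)


# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
C, H, W, r, k = 16, 8, 8, 4, 7
feat = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
kernel = rng.standard_normal((2, k, k)) * 0.1

out = cbam(feat, w1, w2, kernel)
print(out.shape)  # (16, 8, 8): attention rescales the map, it does not resize it
```

Because both attention maps are sigmoid outputs, each element of the result is the input element scaled by a factor between 0 and 1, so the module can drop into a backbone block without changing tensor shapes.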
    Results  Ablation experiments were carried out on the self-constructed dataset and the ABOships dataset. On the self-constructed dataset, compared with the original Deformable DETR model, the improved algorithm reduced the model's parameter count and size to about one-third, increased inference speed by 52.0%, increased mAP0.5:0.95 by 2.4%, and reduced training time to 41.7% of the original. On the ABOships dataset, inference speed increased by 82.7%, mAP0.5:0.95 increased by 7.5%, and training time was 51.9% of the original. The training loss also converged faster and more stably. In comparison tests against other common algorithms (YOLOv3, Faster R-CNN, Mask R-CNN) on the ABOships dataset, the improved algorithm showed superior overall performance. Its mAP0.5 reached 50.0%, higher than the other algorithms, and its mAP0.5:0.95 was 21.7%, again the highest under this stricter metric. The model had only 12.9M parameters, far fewer than the other algorithms, indicating high parameter efficiency. Although its frame rate was slightly lower than that of YOLOv3 and Faster R-CNN, it was significantly higher than that of Mask R-CNN, so the model maintains a reasonable processing speed while achieving high detection accuracy.
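The reported percentages mix two kinds of quantity: speed gains (inference) and time ratios (training). As a rough reading aid, the short sketch below converts them to a common footing. It assumes that "inference speed increased by X%" means a relative throughput gain and that training time scales inversely with training speed; the function and key names are hypothetical, and only the four input figures come from the abstract.

```python
def speedup_from_time_ratio(ratio):
    """Training time at `ratio` of the baseline -> speed-up factor."""
    return 1.0 / ratio


def latency_ratio_from_speed_gain(gain):
    """Throughput up by `gain` (0.52 = 52%) -> per-image latency relative to baseline."""
    return 1.0 / (1.0 + gain)


# Figures taken from the ablation results reported in the abstract.
reported = {
    "self-constructed": {"speed_gain": 0.520, "train_time_ratio": 0.417},
    "ABOships": {"speed_gain": 0.827, "train_time_ratio": 0.519},
}

summary = {
    name: {
        "train_speedup": round(speedup_from_time_ratio(r["train_time_ratio"]), 2),
        "latency_ratio": round(latency_ratio_from_speed_gain(r["speed_gain"]), 3),
    }
    for name, r in reported.items()
}

for name, s in summary.items():
    print(f"{name}: training about {s['train_speedup']}x faster, "
          f"per-image latency about {s['latency_ratio']:.1%} of baseline")
```

Under these assumptions, training is roughly 2.4 times faster on the self-constructed dataset and roughly 1.9 times faster on ABOships.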
    Conclusions  The improved Deformable DETR algorithm proposed in this paper effectively improves water surface target detection performance. It reduces the model's parameter count and storage requirements, speeds up training and inference, and improves recognition accuracy. The experimental results on two datasets verify the effectiveness of the algorithm. This study explores a new path for applying DETR-class algorithms to water surface target detection and indicates their potential in this field.

     
