Paper Title

How to Track and Segment Fish without Human Annotations: A Self-Supervised Deep Learning Approach

Paper Authors

Alzayat Saleh, Marcus Sheaves, Dean Jerry, Mostafa Rahimi Azghadi

Paper Abstract

Tracking the movements and sizes of fish is crucial to understanding their ecology and behaviour. Knowing where fish migrate, how they interact with their environment, and how their size affects their behaviour can help ecologists develop more effective conservation and management strategies to protect fish populations and their habitats. Deep learning is a promising tool for analyzing fish ecology from underwater videos. However, training deep neural networks (DNNs) for fish tracking and segmentation requires high-quality labels, which are expensive to obtain. We propose an alternative unsupervised approach that relies on spatial and temporal variations in video data to generate noisy pseudo-ground-truth labels. We train a multitask DNN using these pseudo-labels. Our framework consists of three stages: (1) an optical flow model generates the pseudo-labels using spatial and temporal consistency between frames, (2) a self-supervised model refines the pseudo-labels incrementally, and (3) a segmentation network uses the refined labels for training. We perform extensive experiments to validate our method on three public underwater video datasets and demonstrate its effectiveness for video annotation and segmentation. We also evaluate its robustness to different imaging conditions and discuss its limitations.
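The core idea of stage (1) is that temporal variation between frames can stand in for human annotation: pixels that change as a fish moves become noisy foreground labels. The paper uses an optical flow model for this; the sketch below is only a minimal illustration of the same principle using simple frame differencing with NumPy (the function name, threshold value, and synthetic video are all assumptions, not the authors' implementation).

```python
import numpy as np

def pseudo_labels_from_motion(frames, threshold=0.1):
    """Generate noisy pseudo-ground-truth masks from temporal variation.

    frames: array of shape (T, H, W), grayscale video scaled to [0, 1].
    Returns (T-1, H, W) boolean masks: pixels whose intensity changes
    between consecutive frames by more than `threshold` are labelled
    foreground (a moving fish); everything else is background.
    """
    diffs = np.abs(np.diff(frames, axis=0))  # temporal variation per pixel
    return diffs > threshold

# Tiny synthetic example: a bright 2x2 "fish" moving one pixel to the
# right in each of 3 frames of an 8x8 scene.
video = np.zeros((3, 8, 8))
for t in range(3):
    video[t, 3:5, t + 2:t + 4] = 1.0

masks = pseudo_labels_from_motion(video)  # shape (2, 8, 8)
```

A real pipeline would replace the frame difference with dense optical flow and then pass these noisy masks to the self-supervised refinement stage; the point here is only that no human annotation enters the loop.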
