Paper Title

An Empirical Study on Activity Recognition in Long Surgical Videos

Authors

Zhuohong He, Ali Mottaghi, Aidean Sharghi, Muhammad Abdullah Jamal, Omid Mohareri

Abstract

Activity recognition in surgical videos is a key research area for developing next-generation devices and workflow monitoring systems. Since surgeries are long procedures with highly variable lengths, deep learning models for surgical videos often consist of a two-stage setup: a backbone followed by a temporal sequence model. In this paper, we investigate many state-of-the-art backbones and temporal models to find the architectures that yield the strongest performance for surgical activity recognition. We first benchmark the models' performance on a large-scale activity recognition dataset containing over 800 surgery videos captured in multiple clinical operating rooms. We further evaluate the models on two smaller public datasets, Cholec80 and Cataract-101, containing only 80 and 101 videos, respectively. We empirically find that a Swin Transformer backbone with a BiGRU temporal model yields strong performance on both datasets. Finally, we investigate the adaptability of the model to new domains by fine-tuning it on data from a new hospital and experimenting with a recent unsupervised domain adaptation approach.
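To make the two-stage setup in the abstract concrete, below is a minimal PyTorch sketch of a backbone + temporal-model pipeline: per-frame features from a Swin Transformer are fed to a BiGRU that produces per-frame activity logits. This is an illustration under assumptions, not the authors' code; the timm model variant, hidden size, and class count are all hypothetical choices.

```python
# Minimal sketch of the backbone + temporal-model setup from the abstract.
# Assumptions (not from the paper): the timm library, the swin_tiny variant,
# hidden_size=256, and end-to-end wiring for readability.
import torch
import torch.nn as nn
import timm

class TwoStageActivityRecognizer(nn.Module):
    def __init__(self, num_classes, hidden_size=256):
        super().__init__()
        # Stage 1: per-frame backbone; num_classes=0 returns pooled features.
        self.backbone = timm.create_model(
            "swin_tiny_patch4_window7_224", pretrained=True, num_classes=0
        )
        feat_dim = self.backbone.num_features  # 768 for swin_tiny
        # Stage 2: bidirectional GRU over the sequence of frame features.
        self.temporal = nn.GRU(
            feat_dim, hidden_size, batch_first=True, bidirectional=True
        )
        self.classifier = nn.Linear(2 * hidden_size, num_classes)

    def forward(self, frames):
        # frames: (batch, time, 3, 224, 224)
        b, t = frames.shape[:2]
        feats = self.backbone(frames.flatten(0, 1))  # (b*t, feat_dim)
        feats = feats.view(b, t, -1)                 # (b, t, feat_dim)
        out, _ = self.temporal(feats)                # (b, t, 2*hidden_size)
        return self.classifier(out)                  # per-frame logits

# Example: 7 hypothetical surgical phases, a clip of 16 frames.
model = TwoStageActivityRecognizer(num_classes=7)
logits = model(torch.randn(2, 16, 3, 224, 224))  # shape: (2, 16, 7)
```

In practice, two-stage pipelines of this kind are typically trained in separate passes, first training the backbone on individual frames, then caching frame features and training the temporal model on full-length videos, since surgeries are too long to backpropagate through end to end; the single module above is only meant to show the data flow.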
