Paper title
Video-based assessment of intraoperative surgical skill
Paper authors
Paper abstract
Purpose: The objective of this investigation is to provide a comprehensive analysis of state-of-the-art methods for video-based assessment of surgical skill in the operating room. Methods: Using a data set of 99 videos of capsulorhexis, a critical step in cataract surgery, we evaluate feature-based methods previously developed for surgical skill assessment, mostly in benchtop settings. In addition, we present and validate two deep learning methods that assess skill directly from RGB videos. In the first method, we predict instrument tips as keypoints and learn surgical skill using temporal convolutional neural networks. In the second method, we propose a novel architecture for surgical skill assessment that consists of a frame-wise encoder (2D convolutional neural network) followed by a temporal model (recurrent neural network), both of which are augmented by visual attention mechanisms. For each method, we report the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and predictive values, estimated through 5-fold cross-validation. Results: For the task of binary skill classification (expert vs. novice), deep neural network based methods exhibit higher AUC than the classical spatiotemporal interest point based methods. The neural network approach using attention mechanisms also shows high sensitivity and specificity. Conclusion: Deep learning methods are necessary for video-based assessment of surgical skill in the operating room. The internal validity we found for an attention-based network that assesses skill directly from RGB videos should be tested for external validity in other data sets.
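The evaluation measures named in the abstract (AUC, sensitivity, specificity, and predictive values under 5-fold cross-validation) can be illustrated with a minimal sketch. This is not the authors' code. The label convention (expert = 1, novice = 0), the function names `confusion_metrics`, `auc`, and `k_fold_indices`, and the unshuffled contiguous fold split are all assumptions made for illustration; the abstract does not state how folds were formed or how scores were thresholded.

```python
from typing import Dict, List, Sequence


def confusion_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity, PPV and NPV for binary labels.

    Assumed convention (not from the paper): 1 = expert, 0 = novice.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(num: int, den: int) -> float:
        # An empty denominator means the measure is undefined.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half (Mann-Whitney formulation)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def k_fold_indices(n: int, k: int = 5) -> List[List[int]]:
    """Split range(n) into k contiguous folds of near-equal size.

    No shuffling or stratification is applied; that is a simplification.
    """
    folds: List[List[int]] = []
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds
```

In use, each fold would serve once as the held-out test set while a model is trained on the remaining four, and the measures above would be computed on the held-out predictions. With 99 videos, `k_fold_indices(99, 5)` gives four folds of 20 and one of 19.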