Paper Title
A Multi-task Learning Framework for Grasping-Position Detection and Few-Shot Classification
Authors
Abstract
A major problem with deep learning models for picking robots is that they require many labeled images. The operating cost of retraining a model becomes very high because the shapes of products and parts change frequently in a factory. It is therefore important to reduce the number of labeled images required to train a model for a picking robot. In this study, we propose a multi-task learning framework for few-shot classification that uses feature vectors from an intermediate layer of a model that detects grasping positions. In the manufacturing field, picking robots often require both shape classification and grasping-position detection as a multi-task setting. Prior multi-task learning studies include methods that learn one task using feature vectors from a deep neural network (DNN) trained for another task. However, a DNN used to detect grasping positions poses two problems for extracting feature vectors from a layer for shape classification: (1) because every layer of the grasping-position detection DNN is activated by all objects in the input image, the features must be refined for each grasping position; and (2) a layer must be selected from which to extract features suitable for shape classification. To tackle these issues, we propose a method that refines the features for each grasping position and selects features from the optimal layer of the DNN. We then evaluated shape classification accuracy using these features at the grasping positions. Our results confirm that the proposed framework can classify object shapes even when the input image contains multiple objects and only a small number of training images are available.
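To make the two ideas in the abstract concrete, here is a minimal sketch of (a) refining an intermediate-layer feature map around a single grasping position and (b) few-shot shape classification from the resulting vectors. All names, shapes, and positions below are illustrative assumptions, not the paper's actual network or data; nearest-centroid matching stands in here for whatever few-shot classifier the authors use.

```python
import numpy as np

# Hypothetical feature map from an intermediate layer of a grasping-position
# detection DNN: height x width x channels (shape chosen only for illustration).
rng = np.random.default_rng(0)
feature_map = rng.normal(size=(32, 32, 64))

def refine_feature(feature_map, position, window=3):
    """Pool the feature map locally around one grasping position, so the
    resulting vector reflects that object rather than the whole image."""
    y, x = position
    h, w, _ = feature_map.shape
    y0, y1 = max(0, y - window), min(h, y + window + 1)
    x0, x1 = max(0, x - window), min(w, x + window + 1)
    return feature_map[y0:y1, x0:x1].mean(axis=(0, 1))

def nearest_centroid(query, support_vectors, support_labels):
    """Few-shot classification: assign the label of the nearest class centroid
    computed from a small labeled support set."""
    classes = sorted(set(support_labels))
    centroids = {c: np.mean([v for v, l in zip(support_vectors, support_labels)
                             if l == c], axis=0) for c in classes}
    return min(classes, key=lambda c: np.linalg.norm(query - centroids[c]))

# A few labeled support examples: one refined vector per grasping position
# (labels "bolt"/"nut" are made up for this sketch).
support = [refine_feature(feature_map, (5, 5)),
           refine_feature(feature_map, (20, 20))]
labels = ["bolt", "nut"]

# A query grasping position near the first support example.
pred = nearest_centroid(refine_feature(feature_map, (6, 6)), support, labels)
print(pred)
```

The local pooling step is what addresses problem (1) from the abstract: without it, a global feature vector would mix activations from every object in the image. Problem (2), choosing the best layer, would correspond to repeating this procedure per candidate layer and keeping the one with the highest validation accuracy.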