Paper title
Active Learning of Ordinal Embeddings: A User Study on Football Data
Authors
Abstract
Humans innately measure the distance between instances in an unlabeled dataset using an unknown similarity function. Distance metrics can only serve as a proxy for that similarity when retrieving similar instances. Learning a good similarity function from human annotations improves the quality of retrievals. This work uses deep metric learning to learn these user-defined similarity functions from few annotations for a large football trajectory dataset. We adapt an entropy-based active learning method with recent work from triplet mining to collect easy-to-answer but still informative annotations from human participants, and use them to train a deep convolutional network that generalizes to unseen samples. Our user study shows that our approach improves the quality of information retrieval compared to a previous deep metric learning approach that relies on a Siamese network. Specifically, we shed light on the strengths and weaknesses of passive sampling heuristics and active learners alike by analyzing the participants' response efficacy. To this end, we measure accuracy, algorithmic time complexity, the participants' fatigue and time-to-response, and qualitative self-assessments and statements, and we examine the effects of mixed-expertise annotators and their consistency on model performance and transfer learning.
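
The abstract names two building blocks, triplet-based metric learning and entropy-based query selection, without giving formulas. The following is a minimal sketch of how such pieces are commonly put together, not the authors' implementation. It assumes Euclidean distances on already-computed embeddings, a hinge triplet loss with a margin, and a logistic model for the probability that an annotator picks one candidate over the other; the function names `triplet_loss`, `answer_entropy`, and `select_queries` are hypothetical.

```python
import math


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge triplet loss: zero once the negative is at least `margin` farther than the positive."""
    return max(0.0, math.dist(anchor, positive) - math.dist(anchor, negative) + margin)


def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def answer_entropy(anchor, cand_a, cand_b):
    """Binary entropy (bits) of the predicted answer to 'is cand_a closer to anchor than cand_b?'.

    Assumption: p(cand_a closer) = sigmoid(d(anchor, cand_b) - d(anchor, cand_a)).
    """
    p = _sigmoid(math.dist(anchor, cand_b) - math.dist(anchor, cand_a))
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # avoid log2(0)
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def select_queries(embeddings, triplets, k):
    """Return the k (entropy, triplet) pairs the current embedding is least certain about.

    `triplets` holds index tuples (anchor, cand_a, cand_b) into `embeddings`.
    """
    scored = [
        (answer_entropy(embeddings[a], embeddings[i], embeddings[j]), (a, i, j))
        for a, i, j in triplets
    ]
    scored.sort(key=lambda s: s[0], reverse=True)  # highest entropy first
    return scored[:k]


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for learned trajectory embeddings.
    emb = [(0.0, 0.0), (1.0, 0.0), (-1.0, 0.0), (5.0, 0.0)]
    for entropy, triplet in select_queries(emb, [(0, 1, 3), (0, 1, 2)], k=2):
        print(triplet, round(entropy, 3))
```

In this toy setup the triplet whose two candidates are equally far from the anchor gets the maximum entropy of one bit and is queried first. A purely entropy-driven selection tends to favor such near-ties, which can be hard for people to judge; the abstract's emphasis on easy-to-answer yet informative queries suggests the actual method tempers this, though how it does so is not stated here.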