Title
Multi-View Fusion Transformer for Sensor-Based Human Activity Recognition
Authors
Abstract
As a fundamental problem in ubiquitous computing and machine learning, sensor-based human activity recognition (HAR) has drawn extensive attention and made great progress in recent years. HAR aims to recognize human activities from the rich time-series data collected by multi-modal sensors such as accelerometers and gyroscopes. However, recent deep learning methods focus on a single view of the data, i.e., the temporal view, while shallow methods tend to rely on hand-crafted features for recognition, e.g., the statistical view. In this paper, to extract better features and advance performance, we propose a novel method, the multi-view fusion transformer (MVFT), along with a novel attention mechanism. First, MVFT encodes three views of information, i.e., the temporal, frequency, and statistical views, to generate multi-view features. Second, the novel attention mechanism uncovers intra- and cross-view clues to catalyze mutual interactions among the three views for detailed relation modeling. Moreover, extensive experiments on two datasets demonstrate the superiority of our method over several state-of-the-art methods.
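For intuition, the three views named in the abstract can be sketched as transforms of a single sensor window. This is a hypothetical illustration only: the paper's MVFT uses learned transformer encoders and a novel attention mechanism, whereas the function below (`multi_view_features`, an assumed name) produces the views with fixed transforms (raw samples, FFT magnitudes, summary statistics).

```python
import numpy as np

def multi_view_features(window: np.ndarray):
    """Derive the three views of one sensor window (T samples x C channels).

    A minimal sketch, assuming fixed transforms in place of the paper's
    learned encoders:
      - temporal view:    the raw time-series samples
      - frequency view:   magnitudes of the real FFT along the time axis
      - statistical view: simple hand-crafted per-channel statistics
    """
    temporal = window
    frequency = np.abs(np.fft.rfft(window, axis=0))
    statistical = np.concatenate([
        window.mean(axis=0),
        window.std(axis=0),
        window.min(axis=0),
        window.max(axis=0),
    ])
    return temporal, frequency, statistical

# Example: a 128-sample window of 3-axis accelerometer data.
window = np.random.default_rng(0).standard_normal((128, 3))
t, f, s = multi_view_features(window)
# t: (128, 3) temporal, f: (65, 3) frequency, s: (12,) statistical
```

In the actual model, each view would be embedded and fed to its own transformer branch, with inner- and cross-view attention fusing them; here the transforms merely show what information each view carries.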