Paper Title
The INESC-ID Multi-Modal System for the ADReSS 2020 Challenge
Paper Authors
Abstract
This paper describes a multi-modal approach to the automatic detection of Alzheimer's disease, proposed in the context of the INESC-ID Human Language Technology Laboratory's participation in the ADReSS 2020 challenge. Our classification framework takes advantage of both acoustic and textual feature embeddings, which are extracted independently and later combined. Speech signals are encoded into acoustic features using DNN speaker embeddings extracted from pre-trained models. For textual input, contextual embedding vectors are first extracted using an English BERT model and then used either to compute sentence embeddings directly or to feed a bidirectional LSTM-RNN with attention. Finally, an SVM classifier with a linear kernel is used for the individual evaluation of the three systems. Our best system, based on the combination of linguistic and acoustic information, attained a classification accuracy of 81.25%. Results show the importance of linguistic features in the classification of Alzheimer's disease, as they outperform the acoustic ones in terms of accuracy. Early-stage feature fusion did not provide additional improvements, confirming that, in this case, the discriminant ability conveyed by speech is smoothed out by the linguistic data.
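To make the final classification stage concrete, the following is a minimal sketch of fitting a linear-kernel SVM on pre-computed utterance embeddings, as the abstract describes. It assumes the acoustic or textual embeddings have already been extracted; here random 768-dimensional vectors (a shape typical of BERT sentence embeddings) stand in for real features, and the labels and separation shift are synthetic, purely for illustration.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 100 utterances with 768-dim "embeddings"
# and binary labels (0 = control, 1 = Alzheimer's disease).
X = rng.normal(size=(100, 768))
y = rng.integers(0, 2, size=100)

# Shift class-1 vectors so the toy task is learnable; real embeddings
# would come from the BERT / speaker-embedding extractors instead.
X[y == 1] += 0.5

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# SVM classifier with a linear kernel, evaluated per system.
clf = SVC(kernel="linear")
clf.fit(X_tr, y_tr)
accuracy = clf.score(X_te, y_te)
```

In the paper's setup, such a classifier is trained separately on each embedding type (acoustic, BERT sentence embeddings, BiLSTM-attention outputs), with late combination of the modalities; the snippet above only illustrates one such individual system.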