Paper Title
Deep Convolutional Neural Network for Roadway Incident Surveillance Using Audio Data
Authors
Abstract
Identification and prediction of crash events play a vital role in understanding the safety conditions of transportation systems. While existing systems classify and train such models using traffic parameters correlated with crash data, we propose a novel sensory unit that can also accurately identify crash events: the microphone. Audio can be collected and analyzed to classify events such as crashes. In this paper, we demonstrate the use of a deep Convolutional Neural Network (CNN) for road event classification. Important audio parameters such as Mel-Frequency Cepstral Coefficients (MFCC), the log mel-filterbank energy spectrum, and the Fourier spectrum were used as the feature set. Additionally, the dataset was augmented with further samples using audio augmentation techniques such as time and pitch shifting. Together with the feature extraction, this data augmentation achieves reasonable accuracy. Four event classes, namely crash, tire skid, horn, and siren sounds, can be accurately identified, indicating road hazards in a way that is useful to traffic operators or paramedics. The proposed methodology reaches an accuracy of up to 94%. Such an audio system can be implemented as part of an Internet of Things (IoT) platform that complements video-based sensors where coverage is incomplete.
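
The abstract names three feature types (MFCC, log mel-filterbank energies, Fourier spectrum) and two augmentation operations (time and pitch shifting). The following is a minimal sketch of how such features and augmentations could be computed with librosa; the sample rate, coefficient counts, and shift amounts are assumptions for illustration and are not specified in the abstract.

```python
import numpy as np
import librosa

SR = 22050      # sample rate (assumption; not stated in the abstract)
N_MFCC = 40     # number of MFCC coefficients (assumption)
N_MELS = 64     # mel bands for the log mel-filterbank energies (assumption)

def extract_features(y, sr=SR):
    """Compute the three feature types named in the abstract for one audio clip."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=N_MFCC)            # MFCCs
    log_mel = librosa.power_to_db(
        librosa.feature.melspectrogram(y=y, sr=sr, n_mels=N_MELS))    # log mel-filterbank energies
    fourier = np.abs(librosa.stft(y))                                 # magnitude Fourier spectrum
    return mfcc, log_mel, fourier

def augment(y, sr=SR):
    """Time- and pitch-shifting augmentation, as mentioned in the abstract (amounts assumed)."""
    stretched = librosa.effects.time_stretch(y, rate=0.9)             # stretch in time by ~10%
    shifted = librosa.effects.pitch_shift(y, sr=sr, n_steps=2)        # shift up two semitones
    return [stretched, shifted]

# Demo on a synthetic 1-second tone so the snippet runs without an audio file.
t = np.linspace(0, 1.0, SR, endpoint=False)
clip = 0.5 * np.sin(2 * np.pi * 440.0 * t).astype(np.float32)
for variant in [clip] + augment(clip):
    mfcc, log_mel, fourier = extract_features(variant)
    print(mfcc.shape, log_mel.shape, fourier.shape)
```

In practice, `clip` would be replaced by road-side recordings of crash, tire-skid, horn, and siren events, with the augmented variants added to the training set.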
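
The classifier itself is described only as a deep CNN with four output classes. Below is a minimal Keras sketch of such a network, assuming log-mel patches of shape 64x128 as input; the paper's actual architecture, depth, and hyperparameters are not given in the abstract.

```python
import tensorflow as tf

NUM_CLASSES = 4  # crash, tire skid, horn, siren

# Small illustrative CNN; the real model in the paper may differ substantially.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64, 128, 1)),  # 64 mel bands x 128 frames, single channel
    tf.keras.layers.Conv2D(16, 3, activation="relu", padding="same"),
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.Conv2D(32, 3, activation="relu", padding="same"),
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.Conv2D(64, 3, activation="relu", padding="same"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

Such a model would be trained on the extracted feature maps (original plus augmented clips) and could run on an IoT edge device alongside video-based sensors, as the abstract suggests.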