Paper title
A convolutional neural-network model of human cochlear mechanics and filter tuning for real-time applications
Paper authors
Paper abstract
Auditory models are commonly used as feature extractors for automatic speech-recognition systems or as front-ends for robotics, machine-hearing and hearing-aid applications. Although auditory models can capture the biophysical and nonlinear properties of human hearing in great detail, these biophysical models are computationally expensive and cannot be used in real-time applications. We present a hybrid approach where convolutional neural networks are combined with computational neuroscience to yield a real-time end-to-end model for human cochlear mechanics, including level-dependent filter tuning (CoNNear). The CoNNear model was trained on acoustic speech material and its performance and applicability were evaluated using (unseen) sound stimuli commonly employed in cochlear mechanics research. The CoNNear model accurately simulates human cochlear frequency selectivity and its dependence on sound intensity, an essential quality for robust speech intelligibility at negative speech-to-background-noise ratios. The CoNNear architecture is based on parallel and differentiable computations and has the power to achieve real-time human performance. These unique CoNNear features will enable the next generation of human-like machine-hearing applications.
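To make the phrase "level-dependent" concrete, the following is a minimal sketch, not the CoNNear architecture: it assumes a single hypothetical output channel built from one 1-D convolution followed by a tanh nonlinearity, with an arbitrary, untrained kernel. All names (`conv1d`, `toy_channel`, `effective_gain`, `KERNEL`) are illustrative and do not come from the paper. The sketch only shows that a convolution followed by a saturating nonlinearity yields a gain that shrinks as the input level grows, which a purely linear filter cannot do; it does not reproduce the changes in filter bandwidth that the abstract refers to.

```python
import math


def conv1d(signal, kernel):
    """Same-length 1-D convolution with zero padding at both edges."""
    half = len(kernel) // 2
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = n + k - half
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out


def toy_channel(signal, kernel):
    """One hypothetical output channel: convolution, then tanh."""
    return [math.tanh(v) for v in conv1d(signal, kernel)]


def rms(x):
    """Root-mean-square value of a sequence."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def effective_gain(amplitude, kernel, n_samples=256, cycles=8):
    """Output RMS divided by input RMS for a pure tone of given amplitude."""
    tone = [
        amplitude * math.sin(2 * math.pi * cycles * n / n_samples)
        for n in range(n_samples)
    ]
    return rms(toy_channel(tone, kernel)) / rms(tone)


# Arbitrary smoothing weights chosen for illustration, not trained values.
KERNEL = [0.25, 0.5, 0.25]

gain_soft = effective_gain(0.01, KERNEL)  # low-level tone, near-linear regime
gain_loud = effective_gain(2.0, KERNEL)   # high-level tone, tanh saturates

print(f"gain at low level:  {gain_soft:.3f}")
print(f"gain at high level: {gain_loud:.3f}")
```

Because every output sample depends only on a fixed window of input samples, the per-sample loop could be evaluated in parallel, and both the convolution and tanh are differentiable; those are the two properties the abstract attributes to the CoNNear architecture.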