Paper title
Speaker Diarization: Using Recurrent Neural Networks
Paper authors
Paper abstract
Speaker diarization is the problem of separating the speakers in an audio recording. There can be any number of speakers, and the final result should state when each speaker starts and stops speaking. In this project, we analyze a given audio file with 2 channels and 2 speakers (each on a separate channel). We train neural networks to learn when a person is speaking. We use several types of neural networks, specifically Single Layer Perceptron (SLP), Multi Layer Perceptron (MLP), Recurrent Neural Network (RNN), and Convolutional Neural Network (CNN), and we achieve $\sim$92\% accuracy with the RNN. The code for this project is available at https://github.com/vishalshar/SpeakerDiarization_RNN_CNN_LSTM
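The abstract asks that the final result state when each speaker starts and ends. As a minimal sketch of that last step only, and not the authors' implementation, the snippet below assumes a network has already produced one binary label per fixed-length frame for each channel (1 for speech, 0 for silence) and merges consecutive speech frames into (start, end) times. The function name `frames_to_segments`, the frame length of 0.5 seconds, and the example label sequences are all hypothetical.

```python
from typing import Dict, List, Tuple


def frames_to_segments(labels: List[int], frame_sec: float) -> List[Tuple[float, float]]:
    """Merge runs of speech frames (label 1) into (start, end) times in seconds."""
    segments: List[Tuple[float, float]] = []
    start = None  # index of the first frame of the current speech run, if any
    for i, label in enumerate(labels):
        if label == 1 and start is None:
            start = i  # a speech run begins at this frame
        elif label != 1 and start is not None:
            segments.append((start * frame_sec, i * frame_sec))  # the run ended
            start = None
    if start is not None:
        # Speech continues through the last frame; close the run at the end.
        segments.append((start * frame_sec, len(labels) * frame_sec))
    return segments


# Hypothetical per-frame predictions, one list per channel (one speaker each).
channel_a = [0, 1, 1, 1, 0, 0, 0, 1]
channel_b = [0, 0, 0, 0, 1, 1, 0, 0]

diarization: Dict[str, List[Tuple[float, float]]] = {
    "speaker_1": frames_to_segments(channel_a, 0.5),
    "speaker_2": frames_to_segments(channel_b, 0.5),
}

for speaker, segments in diarization.items():
    print(speaker, segments)
```

Because each speaker is on a separate channel in this setup, the sketch treats the two channels independently. A single-channel recording would need a different approach.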