Paper Title
PriMock57: A Dataset Of Primary Care Mock Consultations
Authors
Abstract
Recent advances in Automatic Speech Recognition (ASR) have made it possible to reliably produce automatic transcripts of clinician-patient conversations. However, access to clinical datasets is heavily restricted due to patient privacy, which slows down normal research practices. We detail the development of a publicly accessible, high-quality dataset comprising 57 mocked primary care consultations, including audio recordings, their manual utterance-level transcriptions, and the associated consultation notes. Our work illustrates how the dataset can be used as a benchmark for conversational medical ASR as well as consultation note generation from transcripts.