Accent Estimation of Japanese Words from Their Surfaces and Romanizations for Building Large Vocabulary Accent Dictionaries

Paper Title

Accent Estimation of Japanese Words from Their Surfaces and Romanizations for Building Large Vocabulary Accent Dictionaries

Authors

Hideyuki Tachibana, Yotaro Katayama

Abstract

In Japanese text-to-speech (TTS), accent information must be added to the input sentence. However, few accent dictionaries are publicly available, and those that are, e.g. UniDic, lack many of the compound words, proper nouns, etc. that a practical TTS system requires. To build a large-scale accent dictionary that contains those words, the authors developed an accent estimation technique that predicts the accent of a word from limited information, namely its surface (e.g. kanji) and its yomi (simplified phonetic information). Experiments show that the technique estimates accents with high accuracy, especially for some categories of words. The authors applied this technique to NEologd, an existing large-vocabulary Japanese dictionary, and obtained a large-vocabulary Japanese accent dictionary. In many observed cases, using this dictionary yields more appropriate phonetic information than UniDic.
