Towards Speaker Age Estimation with Label Distribution Learning

Paper Title

Towards Speaker Age Estimation with Label Distribution Learning

Authors

Shijing Si, Jianzong Wang, Junqing Peng, Jing Xiao

Abstract

Existing methods for speaker age estimation usually treat it as a multi-class classification or a regression problem. However, precise age identification remains challenging because of label ambiguity, i.e., utterances from the same person at adjacent ages are often indistinguishable. To address this, we exploit the ambiguous information among age labels: we convert each age label into a discrete label distribution and use label distribution learning (LDL) to fit the data. For each audio sample, our method produces an age distribution for its speaker, and on top of this distribution we perform two further tasks: age prediction and age uncertainty minimization. Our method therefore naturally combines the classification and regression approaches to age estimation, which enhances its robustness. Experiments on the public NIST SRE08-10 dataset and a real-world dataset show that our method outperforms baseline methods by a relatively large margin, yielding a 10% reduction in mean absolute error (MAE) on the real-world dataset.
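The abstract combines three objectives on one predicted age distribution. The following is a minimal sketch of how such a setup could look, not the authors' implementation. It assumes (none of this is stated in the abstract) a discretized Gaussian as the label distribution, KL divergence as the LDL loss, the distribution mean as the predicted age, and the distribution variance as the uncertainty term. The integer age range 0 to 99, `sigma`, and the loss weights `lam_reg` and `lam_unc` are hypothetical choices, as are all function names.

```python
import math
from typing import List, Sequence

# Hypothetical age range: integer ages 0..99 (an assumption, not from the paper).
AGES: List[int] = list(range(100))


def age_to_distribution(age: float, sigma: float = 2.0,
                        ages: Sequence[int] = AGES) -> List[float]:
    """Convert a scalar age label into a discrete label distribution.

    Assumes a Gaussian centred on the true age with std `sigma`
    (hypothetical), normalised to sum to 1 over `ages`.
    """
    weights = [math.exp(-((a - age) ** 2) / (2 * sigma ** 2)) for a in ages]
    total = sum(weights)
    return [w / total for w in weights]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the model's per-age scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(target: Sequence[float], pred: Sequence[float],
                  eps: float = 1e-12) -> float:
    """KL(target || pred): the label distribution learning term."""
    return sum(t * math.log((t + eps) / (p + eps))
               for t, p in zip(target, pred) if t > 0)


def expected_age(pred: Sequence[float], ages: Sequence[int] = AGES) -> float:
    """Age prediction: the mean of the predicted distribution."""
    return sum(p * a for p, a in zip(pred, ages))


def age_variance(pred: Sequence[float], ages: Sequence[int] = AGES) -> float:
    """Spread of the predicted distribution, used here as the uncertainty term."""
    mu = expected_age(pred, ages)
    return sum(p * (a - mu) ** 2 for p, a in zip(pred, ages))


def combined_loss(logits: Sequence[float], true_age: float,
                  lam_reg: float = 1.0, lam_unc: float = 0.05,
                  sigma: float = 2.0, ages: Sequence[int] = AGES) -> float:
    """Hypothetical weighted sum of LDL, regression and uncertainty terms."""
    pred = softmax(logits)
    target = age_to_distribution(true_age, sigma, ages)
    ldl = kl_divergence(target, pred)
    reg = abs(expected_age(pred, ages) - true_age)  # L1 error of the mean age
    unc = age_variance(pred, ages)
    return ldl + lam_reg * reg + lam_unc * unc


def mean_absolute_error(preds: Sequence[float], truths: Sequence[float]) -> float:
    """MAE, the evaluation metric reported in the abstract."""
    return sum(abs(p - t) for p, t in zip(preds, truths)) / len(truths)


if __name__ == "__main__":
    # Logits whose softmax is exactly the sigma=2 Gaussian centred on age 35.
    logits = [-((a - 35) ** 2) / 8 for a in AGES]
    pred = softmax(logits)
    print(round(expected_age(pred), 3), round(age_variance(pred), 3))
    print(round(combined_loss(logits, 35), 3))
```

Under these assumptions, the mean of the distribution supplies the regression output, while the KL term keeps classification-style supervision over discrete ages. This is one way to read the abstract's claim that the method combines both approaches. The variance penalty pushes the model toward sharper, more confident distributions.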