Paper title

Clustering consistency with Dirichlet process mixtures

Authors

Filippo Ascolani, Antonio Lijoi, Giovanni Rebaudo, Giacomo Zanella

Abstract

Dirichlet process mixtures are flexible non-parametric models, particularly suited to density estimation and probabilistic clustering. In this work we study the posterior distribution induced by Dirichlet process mixtures as the sample size increases, and more specifically focus on consistency for the unknown number of clusters when the observed data are generated from a finite mixture. Crucially, we consider the situation where a prior is placed on the concentration parameter of the underlying Dirichlet process. Previous findings in the literature suggest that Dirichlet process mixtures are typically not consistent for the number of clusters if the concentration parameter is held fixed and data come from a finite mixture. Here we show that consistency for the number of clusters can be achieved if the concentration parameter is adapted in a fully Bayesian way, as commonly done in practice. Our results are derived for data coming from a class of finite mixtures, with mild assumptions on the prior for the concentration parameter and for a variety of choices of likelihood kernels for the mixture.
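As a rough illustration of why the concentration parameter matters for the number of clusters, the sketch below looks only at the prior side of a Dirichlet process, using the standard Chinese restaurant process representation. It is not the paper's posterior analysis and does not reproduce its results. The function names, the Gamma prior on the concentration parameter, and its shape and scale values are assumptions made for this example; the abstract only states that some prior is placed on the concentration parameter.

```python
import random


def expected_num_clusters(n, alpha):
    """Prior mean of the number of clusters among n observations
    under a Dirichlet process with fixed concentration alpha."""
    # Observation i+1 opens a new cluster with probability alpha / (alpha + i).
    return sum(alpha / (alpha + i) for i in range(n))


def sample_num_clusters(n, alpha, rng):
    """Draw the number of clusters for n observations from the
    Chinese restaurant process with fixed concentration alpha."""
    k = 0
    for i in range(n):
        if rng.random() < alpha / (alpha + i):
            k += 1
    return k


def sample_num_clusters_random_alpha(n, rng, shape=1.0, scale=1.0):
    """Same draw, but alpha is first sampled from a Gamma(shape, scale)
    prior. The Gamma choice and its parameters are assumptions made
    for this illustration only."""
    alpha = rng.gammavariate(shape, scale)
    return sample_num_clusters(n, alpha, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (100, 1000, 10000):
        fixed = expected_num_clusters(n, alpha=1.0)
        draws = [sample_num_clusters_random_alpha(n, rng) for _ in range(200)]
        print(n, round(fixed, 2), sum(draws) / len(draws))
```

With alpha held fixed, the prior mean number of clusters keeps growing with n, roughly like alpha times log n. Placing a prior on alpha lets the data inform its value, which is the setting the abstract studies.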
