Generalized Identifiability Bounds for Mixture Models with Grouped Samples

Paper Title

Generalized Identifiability Bounds for Mixture Models with Grouped Samples

Authors

Vandermeulen, Robert A., Saitenmacher, René

Abstract

Recent work has shown that finite mixture models with $m$ components are identifiable, while making no assumptions on the mixture components, so long as one has access to groups of samples of size $2m-1$ which are known to come from the same mixture component. In this work we generalize that result and show that, if every subset of $k$ mixture components of a mixture model is linearly independent, then that mixture model is identifiable with only $(2m-1)/(k-1)$ samples per group. We further show that this value cannot be improved. We prove an analogous result for a stronger form of identifiability known as "determinedness" along with a corresponding lower bound. This independence assumption almost surely holds if mixture components are chosen randomly from a $k$-dimensional space. We describe some implications of our results for multinomial mixture models and topic modeling.
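
To make the bound in the abstract concrete, the following minimal sketch computes the smallest whole number of samples per group that satisfies $n \ge (2m-1)/(k-1)$. The function name `min_group_size`, the rounding up to an integer, and the input checks ($m \ge 2$, $2 \le k \le m$) are assumptions made for illustration; they are not taken from the paper.

```python
def min_group_size(m: int, k: int) -> int:
    """Smallest integer n with n >= (2m - 1) / (k - 1).

    m: number of mixture components.
    k: every subset of k components is assumed linearly independent.
    Rounding up to a whole group size is an illustrative assumption.
    """
    if m < 2 or not (2 <= k <= m):
        raise ValueError("expected m >= 2 and 2 <= k <= m")
    # Ceiling division on integers avoids floating-point rounding.
    return -(-(2 * m - 1) // (k - 1))


if __name__ == "__main__":
    m = 5
    for k in (2, 3, 5):
        print(f"m={m}, k={k}: group size {min_group_size(m, k)}")
```

With $k = 2$ the expression reduces to $2m-1$, the group size quoted in the abstract for the earlier result that makes no assumptions on the components. Larger $k$ lowers the required group size.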
