Paper title
Bayesian clustering of multiple zero-inflated outcomes
Paper authors
Paper abstract
Several applications involving counts present a large proportion of zeros (excess-of-zeros data). A popular model for such data is the Hurdle model, which explicitly models the probability of a zero count, while assuming a sampling distribution on the positive integers. We consider data from multiple count processes. In this context, it is of interest to study the patterns of counts and cluster the subjects accordingly. We introduce a novel Bayesian nonparametric approach to cluster multiple, possibly related, zero-inflated processes. We propose a joint model for zero-inflated counts, specifying a Hurdle model for each process with a shifted Negative Binomial sampling distribution. Conditionally on the model parameters, the different processes are assumed independent, leading to a substantial reduction in the number of parameters as compared to traditional multivariate approaches. The subject-specific probabilities of zero-inflation and the parameters of the sampling distribution are flexibly modelled via an enriched finite mixture with a random number of components. This induces a two-level clustering of the subjects based on the zero/non-zero patterns (outer clustering) and on the sampling distribution (inner clustering). Posterior inference is performed through tailored MCMC schemes. We demonstrate the proposed approach on an application involving the use of the messaging service WhatsApp.
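
To make the likelihood described in the abstract concrete, the following is a minimal sketch of a Hurdle model with a shifted Negative Binomial on the positive integers, together with the joint log-likelihood of several processes that are independent given their parameters. The parameterisation is an assumption for illustration, not taken from the paper: `zero_prob` is the probability of a zero count, and the Negative Binomial counts failures before the `r`-th success with success probability `p`, shifted by one so that its support starts at 1. The function names are hypothetical, and the mixture prior and MCMC scheme are not shown.

```python
import math


def neg_binomial_pmf(k, r, p):
    """Negative Binomial pmf on k = 0, 1, 2, ... (assumed parameterisation:
    number of failures before the r-th success, success probability 0 < p < 1)."""
    if k < 0:
        return 0.0
    log_pmf = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
               + r * math.log(p) + k * math.log(1.0 - p))
    return math.exp(log_pmf)


def hurdle_pmf(y, zero_prob, r, p):
    """Hurdle pmf: a point mass at zero, and a Negative Binomial shifted by one
    (so supported on 1, 2, ...) for the positive counts."""
    if y == 0:
        return zero_prob
    return (1.0 - zero_prob) * neg_binomial_pmf(y - 1, r, p)


def joint_log_lik(counts, zero_probs, rs, ps):
    """Joint log-likelihood of one subject's counts across several processes,
    assuming the processes are independent given their parameters."""
    return sum(math.log(hurdle_pmf(y, z, r, p))
               for y, z, r, p in zip(counts, zero_probs, rs, ps))
```

Because the processes are conditionally independent, the joint log-likelihood is a plain sum of per-process terms, so each process contributes only its own zero probability and Negative Binomial parameters rather than entries of a full multivariate dependence structure.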