Paper Title

Conjugate Mixture Models for Clustering Multimodal Data

Paper Authors

Vasil Khalidov, Florence Forbes, Radu Horaud

Paper Abstract

The problem of multimodal clustering arises whenever the data are gathered with several physically different sensors. Observations from different modalities are not necessarily aligned in the sense that there is no obvious way to associate or to compare them in some common space. A solution may consist in considering multiple clustering tasks independently for each modality. The main difficulty with such an approach is to guarantee that the unimodal clusterings are mutually consistent. In this paper we show that multimodal clustering can be addressed within a novel framework, namely conjugate mixture models. These models exploit the explicit transformations that are often available between an unobserved parameter space (objects) and each one of the observation spaces (sensors). We formulate the problem as a likelihood maximization task and we derive the associated conjugate expectation-maximization algorithm. The convergence properties of the proposed algorithm are thoroughly investigated. Several local/global optimization techniques are proposed in order to increase its convergence speed. Two initialization strategies are proposed and compared. A consistent model-selection criterion is proposed. The algorithm and its variants are tested and evaluated within the task of 3D localization of several speakers using both auditory and visual data.
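The abstract formulates clustering as likelihood maximization solved by an expectation-maximization (EM) algorithm. The paper's conjugate EM additionally ties each observation space back to a common object parameter space via known transformations; the sketch below shows only the standard single-modality building block, a plain EM loop for a 1D Gaussian mixture, not the paper's conjugate variant. All names and the toy data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1D data: two well-separated clusters (illustrative, not from the paper).
data = np.concatenate([rng.normal(-2.0, 0.5, 100), rng.normal(2.0, 0.5, 100)])

K = 2
means = np.array([-1.0, 1.0])    # initial component means
sigmas = np.array([1.0, 1.0])    # initial standard deviations
weights = np.full(K, 1.0 / K)    # mixing proportions

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

for _ in range(50):
    # E-step: posterior responsibility of each component for each point.
    resp = np.stack([w * gaussian_pdf(data, m, s)
                     for w, m, s in zip(weights, means, sigmas)], axis=1)
    resp /= resp.sum(axis=1, keepdims=True)

    # M-step: re-estimate parameters from responsibility-weighted data.
    Nk = resp.sum(axis=0)
    means = (resp * data[:, None]).sum(axis=0) / Nk
    sigmas = np.sqrt((resp * (data[:, None] - means) ** 2).sum(axis=0) / Nk)
    weights = Nk / len(data)

print(np.sort(means))  # recovered means, near the true values -2 and 2
```

The conjugate setting replaces the single E-step with per-modality E-steps whose responsibilities are coupled through the shared object-space parameters, which is what makes the unimodal clusterings mutually consistent.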
