Semi-supervised Learning with the EM Algorithm: A Comparative Study between Unstructured and Structured Prediction

Paper Title

Semi-supervised Learning with the EM Algorithm: A Comparative Study between Unstructured and Structured Prediction

Authors

He, Wenchong; Jiang, Zhe

Abstract

Semi-supervised learning aims to learn prediction models from both labeled and unlabeled samples, and the area has been studied extensively. Among existing work, generative mixture models fitted with Expectation-Maximization (EM) are a popular approach because of their clear statistical properties. However, the existing literature on EM-based semi-supervised learning largely focuses on unstructured prediction, which assumes that samples are independent and identically distributed. Studies of EM-based semi-supervised approaches in structured prediction are limited. This paper aims to fill the gap through a comparative study of unstructured and structured methods in EM-based semi-supervised learning. Specifically, we compare their theoretical properties and find that both methods can be viewed as generalizations of self-training with soft class assignment of unlabeled samples, but the structured method additionally considers structural constraints in the soft class assignment. We conduct a case study on real-world flood mapping datasets to compare the two methods. Results show that, in the context of the flood mapping application, structured EM is more robust to class confusion caused by noise and obstacles in the features.
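
The unstructured case described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the mixture family, so the Gaussian mixture with diagonal covariances used here is an assumption, as are the function name semi_supervised_em, its parameters, and the synthetic data; this is not the authors' implementation. Labeled samples keep fixed one-hot class assignments, while unlabeled samples receive soft class assignments in the E-step, which is the sense in which EM generalizes self-training. The structured variant, which would additionally constrain the soft assignments by dependencies between samples, is not covered.

```python
import numpy as np

def semi_supervised_em(X_lab, y_lab, X_unl, n_classes, n_iter=50):
    """Semi-supervised EM for a diagonal-covariance Gaussian mixture.

    Hypothetical helper for illustration: assumes i.i.d. samples and at
    least one labeled sample per class.
    """
    X = np.vstack([X_lab, X_unl])
    R_lab = np.eye(n_classes)[y_lab]  # one-hot labels, never updated
    R_unl = np.full((len(X_unl), n_classes), 1.0 / n_classes)  # uniform start

    for _ in range(n_iter):
        # M-step: weighted maximum-likelihood estimates from all samples
        R = np.vstack([R_lab, R_unl])
        Nk = R.sum(axis=0)
        prior = Nk / Nk.sum()
        mean = (R.T @ X) / Nk[:, None]
        var = (R.T @ X**2) / Nk[:, None] - mean**2 + 1e-6  # small floor for stability

        # E-step: posterior class probabilities (soft assignments) for unlabeled samples
        diff = X_unl[:, None, :] - mean[None, :, :]
        log_p = (np.log(prior)
                 - 0.5 * np.sum(np.log(2 * np.pi * var), axis=1)
                 - 0.5 * np.sum(diff**2 / var[None, :, :], axis=2))
        log_p -= log_p.max(axis=1, keepdims=True)  # avoid overflow in exp
        R_unl = np.exp(log_p)
        R_unl /= R_unl.sum(axis=1, keepdims=True)

    return prior, mean, var, R_unl

# Synthetic two-class data (an assumption for demonstration, not the flood mapping data)
rng = np.random.default_rng(0)
X0 = rng.normal(-2.0, 1.0, size=(200, 2))
X1 = rng.normal(2.0, 1.0, size=(200, 2))
X_lab = np.vstack([X0[:5], X1[:5]])          # 5 labeled samples per class
y_lab = np.array([0] * 5 + [1] * 5)
X_unl = np.vstack([X0[5:], X1[5:]])          # remaining samples are unlabeled
y_unl_true = np.array([0] * 195 + [1] * 195)  # held back, used only for checking

prior, mean, var, R_unl = semi_supervised_em(X_lab, y_lab, X_unl, n_classes=2)
accuracy = np.mean(R_unl.argmax(axis=1) == y_unl_true)
```

Replacing the soft assignments R_unl with their hard argmax at every iteration would reduce this loop to a form of self-training, which is the relationship the abstract refers to.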
