Paper Title
Fitting Sparse Markov Models to Categorical Time Series Using Convex Clustering
Paper Authors
Paper Abstract
Higher-order Markov chains are frequently used to model categorical time series. However, a major problem with fitting such models is that the number of parameters grows exponentially with the model order. A popular approach to parsimonious modeling is to use a Variable Length Markov Chain (VLMC), which determines relevant contexts (recent pasts) of variable orders and forms a context tree. A more general parsimonious modeling approach is given by Sparse Markov Models (SMMs), where all possible histories of order $m$ are partitioned such that the transition probability vectors are identical for the histories belonging to any particular group. In this paper, we develop an elegant method of fitting SMMs based on convex clustering and regularization. The regularization parameter is selected using the BIC. Theoretical results establish model selection consistency of our method for large sample sizes. Extensive simulation results under different set-ups are presented to study the finite-sample performance of the method. Real data analysis on modeling and classifying disease sub-types also demonstrates the applicability of our method.
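To make the parameter-count problem and the SMM idea concrete, the following is a minimal Python sketch, not the paper's fitting procedure. It counts the free parameters of a full order-$m$ chain, tabulates next-symbol counts for each observed history, and pools those counts over a partition of histories so that each group shares one transition probability vector. The function names, the toy sequence, and the partition are all hypothetical; in particular the partition is chosen by hand here, whereas the abstract describes selecting it via convex clustering with a BIC-tuned regularization parameter.

```python
from collections import Counter, defaultdict


def n_free_params_full(k, m):
    """Free parameters of a full order-m Markov chain on k states:
    k**m histories, each with a probability vector of k - 1 free entries."""
    return (k ** m) * (k - 1)


def n_free_params_smm(n_groups, k):
    """Free parameters when histories are partitioned into n_groups groups
    that each share one transition probability vector."""
    return n_groups * (k - 1)


def transition_counts(seq, m):
    """Count next-symbol occurrences following each observed order-m history."""
    counts = defaultdict(Counter)
    for t in range(m, len(seq)):
        history = tuple(seq[t - m:t])
        counts[history][seq[t]] += 1
    return counts


def pooled_probs(counts, partition, alphabet):
    """Pool counts within each group of histories and normalize,
    giving one empirical transition probability vector per group."""
    probs = []
    for group in partition:
        total = Counter()
        for history in group:
            total.update(counts.get(history, Counter()))
        n = sum(total.values())
        probs.append({a: total[a] / n for a in alphabet} if n else None)
    return probs


# Toy example (assumed data): binary alphabet, order m = 2.
alphabet = ["A", "B"]
seq = list("AABABBAB")
counts = transition_counts(seq, m=2)

# Hypothetical hand-picked partition of the four order-2 histories into two groups.
partition = [
    [("A", "A"), ("B", "A")],
    [("A", "B"), ("B", "B")],
]
probs = pooled_probs(counts, partition, alphabet)

full_params = n_free_params_full(k=4, m=5)       # e.g. a 4-letter alphabet, order 5
smm_params = n_free_params_smm(n_groups=2, k=2)  # the toy partition above
```

With a 4-symbol alphabet and order 5, the full model already has 3072 free parameters, while an SMM needs only (number of groups) × (k − 1), which is the saving the partition is meant to deliver.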