Paper title
Estimation of the Order of Non-Parametric Hidden Markov Models using the Singular Values of an Integral Operator
Paper authors
Paper abstract
We are interested in assessing the order of a finite-state Hidden Markov Model (HMM) under only two assumptions: that the transition matrix of the latent Markov chain has full rank, and that the density functions of the emission distributions are linearly independent. We introduce a new procedure for estimating this order by investigating the rank of a well-chosen integral operator that depends on the distribution of a pair of consecutive observations. This method circumvents the usual limits of the spectral method when it is used for estimating the order of an HMM: it avoids the choice of basis functions; it does not require any knowledge of an upper bound on the order of the HMM (for the spectral method, such an upper bound is defined by the number of basis functions); and it makes it easy to handle different types of data (including continuous data, circular data or multivariate continuous data) with a suitable choice of kernel. The method relies on the fact that the order of the HMM can be identified from the distribution of a pair of consecutive observations and that this order is equal to the rank of an integral operator (i.e., the number of its non-zero singular values). Since only the empirical counterpart of the singular values of the operator can be obtained, we propose a data-driven thresholding procedure. An upper bound on the probability of overestimating the order of the HMM is established. Moreover, sufficient conditions on the bandwidth used for kernel density estimation and on the threshold are stated to obtain the consistency of the estimator of the order of the HMM. The procedure is easily implemented since the values of all the tuning parameters are determined by the sample size.
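The idea summarized in the abstract (estimate the joint density of consecutive observations with a kernel, compute the singular values of the associated operator, and count those above a threshold) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the two-state Gaussian HMM, the Gaussian kernel, the grid discretization of the operator, and the fixed values `bandwidth=0.3` and `threshold=0.04`. In the paper these tuning parameters are determined by the sample size, and its rules are not reproduced here. The function names `simulate_hmm`, `operator_singular_values` and `estimate_order` are hypothetical.

```python
import numpy as np


def simulate_hmm(n, trans, means, rng):
    """Simulate n observations from an HMM with unit-variance Gaussian emissions."""
    k = trans.shape[0]
    states = np.empty(n, dtype=int)
    states[0] = rng.integers(k)  # uniform start (stationary for a symmetric chain)
    for t in range(1, n):
        states[t] = rng.choice(k, p=trans[states[t - 1]])
    return means[states] + rng.standard_normal(n)


def operator_singular_values(y, grid, bandwidth):
    """Singular values of a grid discretization of the kernel-smoothed operator.

    The operator has as kernel the estimated joint density of (Y_t, Y_{t+1}).
    Discretizing on an evenly spaced grid is an assumption of this sketch.
    """
    step = grid[1] - grid[0]
    # Gaussian kernel evaluated at every (grid point, observation) pair.
    z = (grid[:, None] - y[None, :]) / bandwidth
    k = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2.0 * np.pi))
    # Product-kernel estimate of the joint density on grid x grid.
    joint = k[:, :-1] @ k[:, 1:].T / (len(y) - 1)
    # Scaling by the grid step makes the matrix a Riemann-sum approximation
    # of the integral operator, so its singular values are comparable.
    return np.linalg.svd(joint * step, compute_uv=False)


def estimate_order(singular_values, threshold):
    """Estimated order: number of singular values above the threshold."""
    return int(np.sum(singular_values > threshold))


# Illustrative two-state model (assumed, not from the paper).
rng = np.random.default_rng(0)
trans = np.array([[0.9, 0.1], [0.1, 0.9]])
means = np.array([-2.0, 2.0])
y = simulate_hmm(5000, trans, means, rng)

grid = np.linspace(-6.0, 6.0, 121)
sv = operator_singular_values(y, grid, bandwidth=0.3)
order = estimate_order(sv, threshold=0.04)
print("leading singular values:", sv[:4])
print("estimated order:", order)
```

In this toy setting the two leading singular values should stand well clear of the remaining ones, which are driven by estimation noise, so a threshold placed in the gap recovers the number of hidden states. A fixed threshold is used only to keep the sketch short; a data-driven choice, as proposed in the paper, would replace it in practice.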