Paper title

Analytical and statistical properties of local depth functions motivated by clustering applications

Authors

Francisci, Giacomo; Agostinelli, Claudio; Nieto-Reyes, Alicia; Vidyashankar, Anand N.

Abstract

Local general depth ($LGD$) functions are used to describe local geometric features and modes of multivariate distributions. In this paper, we undertake a rigorous, systematic study of $LGD$ and establish several of its analytical and statistical properties. First, we show that, when the underlying probability distribution is absolutely continuous with density $f(\cdot)$, the scaled version of $LGD$ (referred to as the $\tau$-approximation) converges to $f(\cdot)$, both uniformly and in $L^d(\mathbb{R}^p)$, as $\tau$ converges to zero. Second, we establish that, as the sample size diverges to infinity, the centered and scaled sample $LGD$ converges in distribution to a centered Gaussian process, uniformly in the space of bounded functions on $\mathcal{H}_G$, a class of functions yielding $LGD$. Third, using the sample version of the $\tau$-approximation ($S\tau A$) and gradient system analysis, we develop a new clustering algorithm. The validity of this algorithm requires several results concerning the uniform finite difference approximation of the gradient system associated with $S\tau A$. For this reason, we establish a Bernstein-type inequality for deviations between the centered and scaled sample $LGD$, which is also of independent interest. Finally, invoking the above results, we establish the consistency of the clustering algorithm. Applications of the proposed methods to mode estimation and upper level set estimation are also provided. The finite sample performance of the methodology is evaluated using numerical experiments and data analysis.
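To make the first result (a scaled local depth converging to the density) concrete, here is a minimal Python sketch under loudly stated assumptions; it does not reproduce the paper's definitions. We assume, for illustration only, that the local depth is the one-dimensional local simplicial depth $LSD_\tau(x)$, the probability that the interval spanned by two independent draws contains $x$ and has length at most $\tau$, and that the scaling is $\sqrt{LSD_\tau(x)}/\tau$. The pairs $(a,b)$ with $a \le x \le b$ and $b-a \le \tau$ form a triangle of area $\tau^2/2$, and the two orderings of a pair double it, so $LSD_\tau(x) \approx f(x)^2\tau^2$ for small $\tau$ and the scaled quantity tends to $f(x)$. The function names `local_simplicial_depth_1d` and `tau_approximation_1d` are hypothetical.

```python
import bisect
import math
import random


def local_simplicial_depth_1d(sample, x, tau):
    """Sample local simplicial depth of x in one dimension (illustrative assumption).

    In 1-D the 'simplex' spanned by a pair (X_i, X_j) is the closed interval
    between them. The local version only counts pairs whose interval contains x
    AND has length <= tau. The result is the U-statistic average over all
    n * (n - 1) / 2 unordered pairs. Observations exactly equal to x may be
    miscounted, which happens with probability zero for continuous data.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    left = [v for v in sample if v <= x]        # possible left endpoints
    right = sorted(v for v in sample if v > x)  # possible right endpoints, sorted
    count = 0
    for a in left:
        # Right endpoints b in (x, a + tau] give an interval [a, b] that covers x
        # and has length b - a <= tau; bisect counts them in O(log n).
        count += bisect.bisect_right(right, a + tau)
    return count / (n * (n - 1) / 2)


def tau_approximation_1d(sample, x, tau):
    """Hypothetical scaling sqrt(LSD_tau(x)) / tau.

    For a continuous density f, LSD_tau(x) ~ f(x)**2 * tau**2 as tau -> 0,
    so this quantity approximates f(x) for small tau and large samples.
    """
    return math.sqrt(local_simplicial_depth_1d(sample, x, tau)) / tau


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.gauss(0.0, 1.0) for _ in range(200_000)]
    estimate = tau_approximation_1d(data, x=0.0, tau=0.2)
    truth = 1.0 / math.sqrt(2.0 * math.pi)  # standard normal density at 0
    print(f"tau-approximation at 0: {estimate:.4f}  (true f(0) = {truth:.4f})")
```

With a seeded standard normal sample of 200,000 points, $\tau = 0.2$ and $x = 0$, the estimate should land close to $f(0) = 1/\sqrt{2\pi} \approx 0.399$; shrinking $\tau$ reduces bias but, for fixed $n$, increases variance. Sorting the right endpoints and counting with `bisect` keeps the U-statistic at $O(n \log n)$ instead of looping over all $n(n-1)/2$ pairs.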
