Paper Title
A Generalization of Ripley's K Function for the Detection of Spatial Clustering in Areal Data
Paper Authors
Abstract
Spatial clustering detection has a variety of applications in diverse fields, including identifying infectious disease outbreaks, assessing land use patterns, pinpointing crime hotspots, and identifying clusters of neurons in brain imaging applications. While spatial clustering analysis is commonly performed on point process data, applications to areal data are frequently of interest. For example, researchers might wish to know whether census tracts with a case of a rare medical condition or an outbreak of an infectious disease tend to cluster together spatially. Since few spatial clustering methods are designed for areal data, researchers often reduce the areal data to point process data (e.g., using the centroid of each areal unit) and apply methods designed for point process data, such as Ripley's K function or the average nearest neighbor method. However, since these methods were not designed for areal data, a number of issues can arise. For example, we show that they can result in loss of power and/or a significantly inflated type I error rate. To address these issues, we propose a generalization of Ripley's K function designed specifically to detect spatial clustering in areal data. We compare its performance to that of the traditional Ripley's K function, the average nearest neighbor method, and the spatial scan statistic in an extensive simulation study. We then evaluate the real-world performance of the method by using it to detect spatial clustering in land parcels containing conservation easements and US counties with high pediatric overweight/obesity rates.
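To make the centroid-reduction workflow mentioned in the abstract concrete, the following is a minimal Python sketch of the traditional approach: each areal unit (here a simple polygon) is collapsed to its centroid, and a naive Ripley's K estimate without edge correction is computed on the resulting points. This illustrates the point-process baseline that the abstract critiques, not the proposed generalization, whose definition the abstract does not give. The function names `polygon_centroid` and `ripley_k`, the omission of edge correction, and the toy square "tracts" are all assumptions made for illustration.

```python
import math
from itertools import combinations


def polygon_centroid(vertices):
    """Area-weighted centroid of a simple polygon via the shoelace formula.

    `vertices` is a list of (x, y) tuples in order around the boundary.
    """
    signed_area = cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        signed_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    signed_area *= 0.5
    return (cx / (6.0 * signed_area), cy / (6.0 * signed_area))


def ripley_k(points, r, study_area):
    """Naive Ripley's K estimate at distance r (no edge correction).

    K(r) = A / (n * (n - 1)) * (number of ordered pairs i != j with d_ij <= r)
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    # Each unordered pair counts twice in the ordered-pair sum.
    close_pairs = sum(
        1 for p, q in combinations(points, 2) if math.dist(p, q) <= r
    )
    return study_area * 2 * close_pairs / (n * (n - 1))


def unit_square(x, y):
    """Hypothetical areal unit: a 1x1 square with lower-left corner (x, y)."""
    return [(x, y), (x + 1, y), (x + 1, y + 1), (x, y + 1)]


# Toy example: four flagged "tracts" in a 10 x 10 study region,
# three of them adjacent and one far away.
flagged_units = [unit_square(0, 0), unit_square(1, 0),
                 unit_square(0, 1), unit_square(8, 8)]
centroids = [polygon_centroid(u) for u in flagged_units]
k_hat = ripley_k(centroids, r=1.5, study_area=100.0)

# Under complete spatial randomness, K(r) is approximately pi * r^2,
# so values well above that suggest clustering at scale r.
k_csr = math.pi * 1.5 ** 2
```

In practice, `k_hat` would be compared with a reference distribution (for example, from Monte Carlo relabeling of units) rather than with the single value `k_csr`. Note that the centroid step discards the size and shape of each unit, which is the kind of information loss the abstract identifies as a source of reduced power and inflated type I error.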