Title

A Double Machine Learning Trend Model for Citizen Science Data

Authors

Fink, Daniel; Johnston, Alison; Strimas-Mackey, Matt; Auer, Tom; Hochachka, Wesley M.; Ligocki, Shawn; Jaromczyk, Lauren Oldham; Robinson, Orin; Wood, Chris; Kelling, Steve; Rodewald, Amanda D.

Abstract

1. Citizen and community-science (CS) datasets have great potential for estimating interannual patterns of population change, given the large volumes of data collected globally every year. Yet the flexible protocols that enable many CS projects to collect large volumes of data typically lack the structure needed to keep sampling consistent across years. This leads to interannual confounding: changes in the observation process over time are confounded with changes in species population sizes.

2. Here we describe a novel modeling approach designed to estimate species population trends while controlling for the interannual confounding common in citizen science data. The approach is based on Double Machine Learning (DML), a statistical framework that uses machine learning methods to estimate both population change and the propensity scores used to adjust for confounding discovered in the data. Additionally, we develop a simulation method to identify and adjust for residual confounding missed by the propensity scores. Using this new method, we can produce spatially detailed trend estimates from citizen science data.

3. To illustrate the approach, we estimated species trends using data from the CS project eBird. We used a simulation study to assess the ability of the method to estimate spatially varying trends in the face of real-world confounding. Results showed that the trend estimates distinguished between spatially constant and spatially varying trends at a 27 km resolution. Error rates for the estimated direction of population change (increasing/decreasing) were low, and correlations between the estimated and true trend magnitudes were high.

4. The ability to estimate spatially explicit trends while accounting for confounding in citizen science data has the potential to fill important information gaps, helping to estimate population trends for species, regions, or seasons that lack rigorous monitoring data.
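The core DML idea in point 2 can be illustrated with the generic partially linear DML estimator: fit a propensity model for "which year" and an outcome model for counts, both from the confounding covariates and with cross-fitting, then regress outcome residuals on year residuals. This is a minimal sketch under stated assumptions, not the authors' implementation. The binned-mean learner, the single "effort" covariate, the binary "later year" indicator, the function names and the simulated data are all hypothetical stand-ins. The paper's simulation-based adjustment for residual confounding is not represented.

```python
import random
from statistics import mean


def binned_mean_fit(xs, ys, n_bins=20):
    """Hypothetical nuisance learner: piecewise-constant regression on x in [0, 1]."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for x, y in zip(xs, ys):
        b = min(int(x * n_bins), n_bins - 1)
        sums[b] += y
        counts[b] += 1
    overall = sum(ys) / len(ys)
    # Bins with no training data fall back to the overall mean.
    return [s / c if c else overall for s, c in zip(sums, counts)]


def binned_mean_predict(table, x):
    n_bins = len(table)
    return table[min(int(x * n_bins), n_bins - 1)]


def dml_plr(x, d, y, n_folds=2, seed=0):
    """Cross-fitted DML estimate of theta in y = theta * d + g(x) + noise."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    rng.shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    d_res = [0.0] * len(y)
    y_res = [0.0] * len(y)
    for k, test in enumerate(folds):
        # Nuisance models are fit on the other folds only (cross-fitting).
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        xs = [x[i] for i in train]
        m_hat = binned_mean_fit(xs, [d[i] for i in train])  # propensity P(d=1 | x)
        l_hat = binned_mean_fit(xs, [y[i] for i in train])  # outcome model E[y | x]
        for i in test:
            d_res[i] = d[i] - binned_mean_predict(m_hat, x[i])
            y_res[i] = y[i] - binned_mean_predict(l_hat, x[i])
    # Residual-on-residual least squares (no intercept) gives theta.
    return sum(a * b for a, b in zip(d_res, y_res)) / sum(a * a for a in d_res)


def simulate(n=20000, theta=-0.5, seed=1):
    """Hypothetical data: effort rises in the later year and also inflates counts."""
    rng = random.Random(seed)
    x, d, y = [], [], []
    for _ in range(n):
        effort = rng.random()
        later = 1 if rng.random() < 0.2 + 0.6 * effort else 0
        x.append(effort)
        d.append(later)
        y.append(theta * later + 2.0 * effort + rng.gauss(0.0, 0.5))
    return x, d, y


if __name__ == "__main__":
    x, d, y = simulate()
    naive = mean(yi for yi, di in zip(y, d) if di) - mean(yi for yi, di in zip(y, d) if not di)
    print(f"naive difference: {naive:.3f}")
    print(f"DML estimate:     {dml_plr(x, d, y):.3f}  (true theta = -0.5)")
```

In this simulated setup the naive difference in means should land near -0.1, because later-year observations carry more effort (mean effort 0.6 versus 0.4) and effort inflates counts. The cross-fitted estimate should land near the true decline of -0.5. Cross-fitting keeps each observation's residuals from being computed with models trained on that same observation.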