Paper Title
Dual-Mandate Patrols: Multi-Armed Bandits for Green Security
Paper Authors
Paper Abstract
Conservation efforts in green security domains to protect wildlife and forests are constrained by the limited availability of defenders (i.e., patrollers), who must patrol vast areas to protect from attackers (e.g., poachers or illegal loggers). Defenders must choose how much time to spend in each region of the protected area, balancing exploration of infrequently visited regions and exploitation of known hotspots. We formulate the problem as a stochastic multi-armed bandit, where each action represents a patrol strategy, enabling us to guarantee the rate of convergence of the patrolling policy. However, a naive bandit approach would compromise short-term performance for long-term optimality, resulting in animals poached and forests destroyed. To speed up performance, we leverage smoothness in the reward function and decomposability of actions. We show a synergy between Lipschitz-continuity and decomposition as each aids the convergence of the other. In doing so, we bridge the gap between combinatorial and Lipschitz bandits, presenting a no-regret approach that tightens existing guarantees while optimizing for short-term performance. We demonstrate that our algorithm, LIZARD, improves performance on real-world poaching data from Cambodia.
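The abstract's key technical idea is that Lipschitz-continuity of the reward lets observations at one patrol-effort level bound the confidence intervals of nearby levels, speeding up convergence. The sketch below is not the paper's LIZARD algorithm; it is a minimal, hypothetical illustration of that tightening on a standard UCB bandit, where the effort grid, Lipschitz constant `L`, reward function, and noise level are all assumed for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

efforts = np.linspace(0.0, 1.0, 11)  # discretized patrol-effort levels (assumed)
L = 1.0    # assumed Lipschitz constant of the expected reward
T = 2000   # number of patrol rounds (assumed horizon)

def true_reward(x):
    # Hypothetical smooth reward: detections rise with patrol effort.
    return 1.0 - np.exp(-3.0 * x)

counts = np.zeros_like(efforts)
sums = np.zeros_like(efforts)

for t in range(1, T + 1):
    pulled = counts > 0
    means = np.where(pulled, sums / np.maximum(counts, 1), 0.0)
    bonus = np.where(pulled,
                     np.sqrt(2 * np.log(t) / np.maximum(counts, 1)),
                     np.inf)
    ucb = means + bonus
    # Lipschitz tightening: effort x_i's upper bound cannot exceed any
    # x_j's bound plus L * |x_i - x_j|, so one observation constrains
    # every neighbouring effort level as well.
    dist = np.abs(efforts[:, None] - efforts[None, :])
    tight = np.min(ucb[None, :] + L * dist, axis=1)
    arm = int(np.argmax(tight))
    reward = true_reward(efforts[arm]) + 0.1 * rng.standard_normal()
    counts[arm] += 1
    sums[arm] += reward
```

The `np.min(... + L * dist, axis=1)` step is where smoothness pays off: an unexplored effort level never keeps an infinite confidence bound once any nearby level has been sampled, which is the synergy between Lipschitz structure and per-arm estimation that the abstract describes.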