论文标题
分区学习的花朵过滤器
Partitioned Learned Bloom Filter
论文作者
论文摘要
Bloom过滤器是空间效率的概率数据结构,用于测试元素是否是集合的成员,并且可能会返回误报。最近,通过为代表集合使用学习的模型,开发了被称为学习的Bloom过滤器,可以从假阳性的速度上提供改进的性能。但是,以前学习的Bloom过滤器的方法并不能充分利用学习模型。在这里,我们展示了如何将最佳模型利用的问题构建为优化问题,并使用我们的框架得出算法,这些算法在许多情况下可以实现近乎最佳的性能。来自模拟数据集和现实世界数据集的实验结果表明,我们对原始学习的Bloom滤波器结构和先前提出的启发式改进的优化方法都有显着的性能提高。
Bloom filters are space-efficient probabilistic data structures that are used to test whether an element is a member of a set, and may return false positives. Recently, variations referred to as learned Bloom filters were developed that can provide improved performance in terms of the rate of false positives, by using a learned model for the represented set. However, previous methods for learned Bloom filters do not take full advantage of the learned model. Here we show how to frame the problem of optimal model utilization as an optimization problem, and using our framework derive algorithms that can achieve near-optimal performance in many cases. Experimental results from both simulated and real-world datasets show significant performance improvements from our optimization approach over both the original learned Bloom filter constructions and previously proposed heuristic improvements.