Top-K Onshelf实用程序采矿的通用算法

论文标题

Top-K Onshelf实用程序采矿的通用算法

A Generic Algorithm for Top-K On-Shelf Utility Mining

论文作者

Chen, Jiahui, Guo, Xu, Gan, Wensheng, Wan, Shichen, Yu, Philip S.

论文摘要

搁置实用工具采矿（OSUM）是数据挖掘的新兴研究方向。它旨在发现在其销售时间内具有相对效用的物品集。与传统的公用事业开采相比，Osum可以在现实生活中找到更多实用和有意义的模式。但是，传统的Osum有一个主要的缺点。对于普通用户而言，很难定义最低阈值细节来挖掘适量的货架上高公用项目集。一方面，如果设置阈值太高，则模式的数量将不够。另一方面，如果设定阈值太低，则会发现太多模式，并导致不必要的时间和记忆消耗。为了解决此问题，用户通常直接指定一个参数k，其中仅考虑顶级相对实用项目集。因此，在本文中，我们提出了一种通用算法TOIT，该算法用于挖掘Top-K Onshelf高耗时模式来解决此问题。 TOIT采用了一种新颖的策略来根据架子上的数据集提高细节。此外，还采用了两种名为Subtree实用程序的新型上限策略，并应用了本地实用程序来修剪搜索空间。通过采用上述策略，TOIT算法可以尽早缩小搜索空间，提高采矿效率并降低记忆消耗，以便比其他算法获得更好的性能。在具有不同样式的真实数据集上进行了一系列实验，以将效果与最新的Koshu算法进行比较。实验结果表明，在运行时间和内存消耗中，TOIT都优于Koshu。

On-shelf utility mining (OSUM) is an emerging research direction in data mining. It aims to discover itemsets that have high relative utility in their selling time period. Compared with traditional utility mining, OSUM can find more practical and meaningful patterns in real-life applications. However, there is a major drawback to traditional OSUM. For normal users, it is hard to define a minimum threshold minutil for mining the right amount of on-shelf high utility itemsets. On one hand, if the threshold is set too high, the number of patterns would not be enough. On the other hand, if the threshold is set too low, too many patterns will be discovered and cause an unnecessary waste of time and memory consumption. To address this issue, the user usually directly specifies a parameter k, where only the top-k high relative utility itemsets would be considered. Therefore, in this paper, we propose a generic algorithm named TOIT for mining Top-k On-shelf hIgh-utility paTterns to solve this problem. TOIT applies a novel strategy to raise the minutil based on the on-shelf datasets. Besides, two novel upper-bound strategies named subtree utility and local utility are applied to prune the search space. By adopting the strategies mentioned above, the TOIT algorithm can narrow the search space as early as possible, improve the mining efficiency, and reduce the memory consumption, so it can obtain better performance than other algorithms. A series of experiments have been conducted on real datasets with different styles to compare the effects with the state-of-the-art KOSHU algorithm. The experimental results showed that TOIT outperforms KOSHU in both running time and memory consumption.

下载PDF全文

下载文献需遵守相关版权规定

论文标题