NEMA: Automatic Integration of Large Network Management Databases

Paper title

NEMA: Automatic Integration of Large Network Management Databases

Authors

Fubao Wu, Han Hee Song, Jiangtao Yin, Lixin Gao, Mario Baldi, Narendra Anand

Abstract

Network management, whether for malfunction analysis, failure prediction, or performance monitoring and improvement, generally involves large amounts of data from different sources. To integrate and manage these sources effectively, it is crucial to find semantic matches among their schemas or ontologies automatically. Existing approaches to database matching fall mainly into two categories. The first performs schema-level matching based on schema properties such as field names, data types, constraints, and schema structures. Network management databases, however, contain massive numbers of tables (e.g., network products, incidents, security alerts, and logs) from different departments and groups, with nonuniform field names and schema characteristics, so matching them by schema properties is unreliable. The second category performs instance-level matching using general string similarity techniques, which are not applicable to matching large network management databases. In this paper, we develop a matching technique for large NEtwork MAnagement databases (NEMA) that deploys instance-level matching for effective data integration and connection. We design matching metrics and scores for both numerical and non-numerical fields and propose algorithms for matching these fields. The effectiveness and efficiency of NEMA are evaluated through experiments based on ground-truth field pairs in large network management databases. Measurements on large databases with 1,458 fields, each containing over 10 million records, show that NEMA reaches accuracies of up to 95%. It achieves 2%-10% higher accuracy and a 5x-14x speedup over baseline methods.
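
To make the idea of instance-level matching concrete, the following is a minimal illustrative sketch in Python. It is not NEMA's actual method: the abstract does not specify its metrics, so the two scores used here (Jaccard similarity of distinct values for non-numerical fields, and overlap of value ranges for numerical fields) are assumptions chosen for simplicity, and the function names `jaccard`, `range_overlap`, and `field_score` are hypothetical.

```python
def jaccard(values_a, values_b):
    """Jaccard similarity between the distinct values of two fields (assumed metric)."""
    set_a, set_b = set(values_a), set(values_b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def range_overlap(nums_a, nums_b):
    """Length of the shared [min, max] range divided by the combined range (assumed metric)."""
    if not nums_a or not nums_b:
        return 0.0
    lo_a, hi_a = min(nums_a), max(nums_a)
    lo_b, hi_b = min(nums_b), max(nums_b)
    union = max(hi_a, hi_b) - min(lo_a, lo_b)
    if union == 0:
        return 1.0  # both fields hold the same single value
    inter = min(hi_a, hi_b) - max(lo_a, lo_b)
    return max(0.0, inter) / union


def field_score(values_a, values_b):
    """Score a candidate field pair from its instances, not from its schema."""
    all_values = list(values_a) + list(values_b)
    numeric = all(isinstance(v, (int, float)) for v in all_values)
    if numeric:
        return range_overlap(values_a, values_b)
    return jaccard(values_a, values_b)


if __name__ == "__main__":
    # Two differently named fields that hold overlapping device identifiers.
    print(field_score(["r1", "r2", "r3"], ["r2", "r3", "r4"]))
    # Two numerical fields with partially overlapping value ranges.
    print(field_score([0, 10], [5, 15]))
```

The sketch compares fields purely by the values they contain, which is why it does not depend on field names or other schema properties. Scoring every field pair this way on tables with millions of records would be expensive, so a practical system would need sampling or indexing on top of it.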
