Paper title

Anomaly detection using data depth: multivariate case

Authors

Mozharovskyi, Pavlo; Valla, Romain

Abstract

Anomaly detection is a branch of data analysis and machine learning that aims to identify observations exhibiting abnormal behaviour. Whether these are measurement errors, disease development, severe weather, production quality defects (defective items), failed equipment, financial fraud, or crisis events, their timely identification, isolation, and explanation are an important task in almost any branch of science and industry. By providing a robust ordering, data depth, a statistical function that measures how strongly any point of the space belongs to a data set, becomes a particularly useful tool for detecting anomalies. Already known for its theoretical properties, data depth has undergone substantial computational developments in the last decade, and particularly in recent years, which have made it applicable to contemporary-sized problems of data analysis and machine learning. In this article, data depth is studied as an efficient anomaly detection tool that assigns abnormality labels to observations with lower depth values in a multivariate setting. The article discusses practical questions: whether invariances are necessary and reasonable, the shape of the depth function, its robustness and computational complexity, and the choice of the threshold. Illustrations include use cases that underline the advantageous behaviour of data depth in various settings.
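The core idea of the abstract, labelling observations with low depth as anomalies, can be illustrated with a minimal sketch. It makes several assumptions that do not come from the article: it uses the Mahalanobis depth, D(x) = 1 / (1 + (x - mu)^T Sigma^{-1} (x - mu)), simply because it is easy to compute; it estimates mu and Sigma with the plain (non-robust) sample mean and covariance; and it sets the threshold at an empirical quantile of the depths. The function names `mahalanobis_depth` and `flag_anomalies`, the `contamination` parameter, and the synthetic data are all hypothetical.

```python
import numpy as np


def mahalanobis_depth(points, data):
    """Mahalanobis depth of each row of `points` with respect to `data`.

    Uses the sample mean and covariance of `data` (a non-robust choice,
    assumed here only for simplicity).
    """
    mu = data.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(data, rowvar=False))
    diff = points - mu
    # Squared Mahalanobis distance of every point, computed row by row.
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return 1.0 / (1.0 + d2)


def flag_anomalies(data, contamination=0.05):
    """Flag observations whose depth falls below an empirical quantile.

    `contamination` is an assumed share of anomalies, not a value
    recommended by the article.
    """
    depths = mahalanobis_depth(data, data)
    threshold = np.quantile(depths, contamination)
    return depths < threshold, depths, threshold


# Synthetic example: a correlated Gaussian cloud plus three far-away points.
rng = np.random.default_rng(0)
inliers = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], size=200)
outliers = np.array([[6.0, -6.0], [7.0, -5.0], [-6.0, 6.0]])
data = np.vstack([inliers, outliers])

labels, depths, threshold = flag_anomalies(data, contamination=0.05)
print("flagged indices:", np.flatnonzero(labels))
```

A fixed quantile always flags roughly the same share of points whether or not anomalies are present, which is one reason the choice of threshold is a practical question in its own right. Other depth notions (for example halfspace or projection depth) differ in robustness, invariance, and computational cost, and could replace `mahalanobis_depth` in this sketch.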
