A Hierarchy of Limitations in Machine Learning

Paper Title

A Hierarchy of Limitations in Machine Learning

Author

Malik, Momin M.

Abstract

"All models are wrong, but some are useful", wrote George E. P. Box (1979). Machine learning has focused on the usefulness of probability models for prediction in social systems, but is only now coming to grips with the ways in which these models are wrong, and with the consequences of those shortcomings. This paper attempts a comprehensive, structured overview of the specific conceptual, procedural, and statistical limitations of models in machine learning when applied to society. Machine learning modelers themselves can use the described hierarchy to identify possible failure points and think through how to address them, and consumers of machine learning models can know what to question when confronted with the decision about if, where, and how to apply machine learning. The limitations go from commitments inherent in quantification itself, through to showing how unmodeled dependencies can lead to cross-validation being overly optimistic as a way of assessing model performance.
