Paper Title
Intelligence, physics and information -- the tradeoff between accuracy and simplicity in machine learning
Authors
Abstract
How can we enable machines to make sense of the world, and become better at learning? To approach this goal, I believe that viewing intelligence in terms of many integral aspects, together with a universal two-term tradeoff between task performance and complexity, provides two feasible perspectives. In this thesis, I address several key questions in some aspects of intelligence, and study the phase transitions in the two-term tradeoff, using strategies and tools from physics and information theory. First, how can we make learning models more flexible and efficient, so that agents can learn quickly from fewer examples? Inspired by how physicists model the world, we introduce a paradigm and an AI Physicist agent for simultaneously learning many small specialized models (theories) and the domains in which they are accurate, which can then be simplified, unified and stored, facilitating few-shot learning in a continual way. Second, for representation learning, when can we learn a good representation, and how does learning depend on the structure of the dataset? We approach this question by studying phase transitions when tuning the tradeoff hyperparameter. For the information bottleneck, we theoretically show that these phase transitions are predictable and reveal structure in the relationships between the data, the model, the learned representation and the loss landscape. Third, how can agents discover causality from observations? We address part of this question by introducing an algorithm that combines prediction with minimization of information from the input, for exploratory causal discovery from observational time series. Fourth, to make models more robust to label noise, we introduce Rank Pruning, a robust algorithm for classification with noisy labels. I believe that, building on the work of this thesis, we will be one step closer to enabling more intelligent machines that can make sense of the world.
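The two-term tradeoff between task performance and complexity that the abstract refers to can be made concrete with the standard information bottleneck objective from the literature, where the hyperparameter β being tuned is the one whose variation produces the phase transitions studied in the thesis:

```latex
% Information bottleneck objective for a stochastic representation Z of
% input X used to predict target Y; beta is the tradeoff hyperparameter.
% I(X;Z) penalizes complexity; I(Y;Z) rewards predictive performance.
\mathcal{L}_{\mathrm{IB}}\big[p(z\mid x)\big] \;=\; I(X;Z) \;-\; \beta\, I(Y;Z)
```

As β is increased from 0, the optimal representation passes through discrete transitions at which it begins to capture additional structure in the data; the thesis shows these transition points are predictable.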
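To illustrate the flavor of the fourth contribution, below is a highly simplified, hypothetical sketch of rank-based pruning for binary classification with noisy labels. It is not the thesis' exact Rank Pruning algorithm (which also estimates the label-noise rates); it only shows the core ranking intuition: given a probabilistic classifier's scores, the lowest-scored examples labeled positive and the highest-scored examples labeled negative are the most likely to be mislabeled, so they are pruned before retraining. The function name, `prune_frac` parameter, and toy data are all invented for illustration.

```python
def rank_prune(scores, noisy_labels, prune_frac=0.2):
    """Remove the prune_frac least-confident examples within each noisy class.

    scores: classifier estimates of P(y = 1 | x) for each example.
    noisy_labels: the possibly-corrupted 0/1 labels.
    For examples labeled 1, a low score suggests a flipped label;
    for examples labeled 0, a high score suggests a flipped label.
    Returns the sorted indices of the examples kept for retraining.
    """
    pos = [i for i, y in enumerate(noisy_labels) if y == 1]
    neg = [i for i, y in enumerate(noisy_labels) if y == 0]

    # Rank positives ascending by score: lowest-scored ones look mislabeled.
    pos_sorted = sorted(pos, key=lambda i: scores[i])
    # Rank negatives descending by score: highest-scored ones look mislabeled.
    neg_sorted = sorted(neg, key=lambda i: -scores[i])

    k_pos = int(prune_frac * len(pos))
    k_neg = int(prune_frac * len(neg))
    kept = set(pos_sorted[k_pos:]) | set(neg_sorted[k_neg:])
    return sorted(kept)

# Toy data: one suspicious example in each noisy class (indices 2 and 5).
scores       = [0.95, 0.90, 0.10, 0.05, 0.08, 0.88]
noisy_labels = [1,    1,    1,    0,    0,    0]
kept = rank_prune(scores, noisy_labels, prune_frac=0.34)
print(kept)  # -> [0, 1, 3, 4]: the two likely-mislabeled examples are pruned
```

In the actual algorithm, the fraction to prune is not a free parameter but is derived from estimated noise rates; this sketch fixes it by hand only to keep the example self-contained.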