Paper title
Is a Modular Architecture Enough?
Paper authors
Abstract
Inspired by human cognition, machine learning systems are gradually revealing the advantages of sparser and more modular architectures. Recent work demonstrates that some modular architectures not only generalize well in-distribution, but also lead to better out-of-distribution generalization, scaling properties, learning speed, and interpretability. A key intuition behind the success of such systems is that the data-generating system in most real-world settings is considered to consist of sparsely interacting parts, so endowing models with similar inductive biases should be helpful. However, the field has lacked a rigorous quantitative assessment of such systems because these real-world data distributions are complex and unknown. In this work, we provide a thorough assessment of common modular architectures through the lens of simple and known modular data distributions. We highlight the benefits of modularity and sparsity and reveal insights into the challenges faced while optimizing modular systems. In doing so, we propose evaluation metrics that highlight the benefits of modularity, the regimes in which these benefits are substantial, and the sub-optimality of current end-to-end learned modular systems relative to their claimed potential.