Paper Title

Modeling Transformative AI Risks (MTAIR) Project -- Summary Report

Authors

Sam Clarke, Ben Cottier, Aryeh Englander, Daniel Eth, David Manheim, Samuel Dylan Martin, Issa Rice

Abstract

This report outlines work by the Modeling Transformative AI Risk (MTAIR) project, an attempt to map out the key hypotheses, uncertainties, and disagreements in debates about catastrophic risks from advanced AI, and the relationships between them. This builds on an earlier diagram by Ben Cottier and Rohin Shah which laid out some of the crucial disagreements ("cruxes") visually, with some explanation. Based on an extensive literature review and engagement with experts, the report explains a model of the issues involved, and the initial software-based implementation that can incorporate probability estimates or other quantitative factors to enable exploration, planning, and/or decision support. By gathering information from various debates and discussions into a single more coherent presentation, we hope to enable better discussions and debates about the issues involved. The model starts with a discussion of reasoning via analogies and general prior beliefs about artificial intelligence. Following this, it lays out a model of different paths and enabling technologies for high-level machine intelligence, and a model of how advances in the capabilities of these systems might proceed, including debates about self-improvement, discontinuous improvements, and the possibility of distributed, non-agentic high-level intelligence or slower improvements. The model also looks specifically at the question of learned optimization, and whether machine learning systems will create mesa-optimizers. The impact of different safety research on the previous sets of questions is then examined, to understand whether and how research could be useful in enabling safer systems. Finally, we discuss a model of different failure modes and loss of control or takeover scenarios.
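
The report's software implementation combines probability estimates across many linked hypotheses. The toy Python sketch below is purely illustrative and is not the project's actual model or tooling: the function name and its three parameters (probability of high-level machine intelligence this century, probability of misalignment given HLMI, probability of takeover given misalignment) are hypothetical stand-ins for a handful of cruxes, whereas the real model links many more nodes and debates.

# A minimal, hypothetical sketch (not the MTAIR project's actual software):
# chaining subjective probability estimates for a few "cruxes" into a single
# rough risk figure, to illustrate how quantitative inputs can support
# exploration and decision support.

def combined_risk(p_hlmi_this_century: float,
                  p_misaligned_given_hlmi: float,
                  p_takeover_given_misaligned: float) -> float:
    # Multiply conditional estimates along one path through the model.
    return (p_hlmi_this_century
            * p_misaligned_given_hlmi
            * p_takeover_given_misaligned)

# Explore how the headline estimate shifts as one crux varies.
for p_misaligned in (0.1, 0.3, 0.5):
    print(f"P(misaligned | HLMI) = {p_misaligned}: "
          f"risk ~ {combined_risk(0.7, p_misaligned, 0.4):.3f}")

Varying one input at a time in this way is the kind of sensitivity exploration the report describes, though the actual implementation handles far richer structure than a single multiplied chain.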
