Paper title
Achilles Heels for AGI/ASI via Decision Theoretic Adversaries
Paper authors
Abstract
As progress in AI continues, it is important to know how advanced systems will make choices and in what ways they may fail. Machines can already outsmart humans in some domains, and understanding how to safely build ones that may have capabilities at or above the human level is of particular concern. One might suspect that artificially generally intelligent (AGI) and artificially superintelligent (ASI) systems will be ones that humans cannot reliably outsmart. As a challenge to this assumption, this paper presents the Achilles Heel hypothesis, which states that even a potentially superintelligent system may nonetheless have stable decision-theoretic delusions that cause it to make irrational decisions in adversarial settings. In a survey of key dilemmas and paradoxes from the decision theory literature, a number of these potential Achilles Heels are discussed in the context of this hypothesis. Several novel contributions are made toward understanding the ways in which these weaknesses might be implanted into a system.