Paper Title
Artificial Intelligence, Values and Alignment
Paper Authors
Paper Abstract
This paper looks at philosophical questions that arise in the context of AI alignment. It defends three propositions. First, normative and technical aspects of the AI alignment problem are interrelated, creating space for productive engagement between people working in both domains. Second, it is important to be clear about the goal of alignment. There are significant differences between AI that aligns with instructions, intentions, revealed preferences, ideal preferences, interests, and values. A principle-based approach to AI alignment, which combines these elements in a systematic way, has considerable advantages in this context. Third, the central challenge for theorists is not to identify 'true' moral principles for AI; rather, it is to identify fair principles for alignment that receive reflective endorsement despite widespread variation in people's moral beliefs. The final part of the paper explores three ways in which fair principles for AI alignment could potentially be identified.