Paper Title

Aligning to Social Norms and Values in Interactive Narratives

Authors

Prithviraj Ammanabrolu, Liwei Jiang, Maarten Sap, Hannaneh Hajishirzi, Yejin Choi

Abstract

We focus on creating agents that act in alignment with socially beneficial norms and values in interactive narratives or text-based games -- environments wherein an agent perceives and interacts with a world through natural language. Such interactive agents are often trained via reinforcement learning to optimize task performance, even when such rewards may lead to agent behaviors that violate societal norms -- causing harm either to the agent itself or other entities in the environment. Social value alignment refers to creating agents whose behaviors conform to expected moral and social norms for a given context and group of people -- in our case, it means agents that behave in a manner that is less harmful and more beneficial for themselves and others. We build on the Jiminy Cricket benchmark (Hendrycks et al. 2021), a set of 25 annotated interactive narratives containing thousands of morally salient scenarios covering everything from theft and bodily harm to altruism. We introduce the GALAD (Game-value ALignment through Action Distillation) agent that uses the social commonsense knowledge present in specially trained language models to contextually restrict its action space to only those actions that are aligned with socially beneficial values. An experimental study shows that the GALAD agent makes decisions efficiently enough to improve state-of-the-art task performance by 4% while reducing the frequency of socially harmful behaviors by 25% compared to strong contemporary value alignment approaches.
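
The idea of contextually restricting an action space can be illustrated with a minimal sketch. This is not the GALAD implementation: the keyword-based `toy_harm_score` below is a hypothetical stand-in for a trained social commonsense model, and the function names, the threshold, and the fallback behavior are all assumptions made for illustration.

```python
from typing import Callable, List

# Hypothetical word list used only by the toy scorer below.
HARMFUL_WORDS = {"steal", "hit", "kill", "burn"}


def toy_harm_score(context: str, action: str) -> float:
    """Toy stand-in for a learned model: 1.0 if the action contains a
    flagged word, else 0.0. The context is ignored in this toy version."""
    words = set(action.lower().split())
    return 1.0 if words & HARMFUL_WORDS else 0.0


def restrict_actions(
    context: str,
    candidates: List[str],
    harm_score: Callable[[str, str], float] = toy_harm_score,
    threshold: float = 0.5,
) -> List[str]:
    """Keep only candidate actions whose harm score is below the threshold.

    Assumption: if every candidate is filtered out, return the original
    candidates so the agent can still act.
    """
    allowed = [a for a in candidates if harm_score(context, a) < threshold]
    return allowed if allowed else list(candidates)


if __name__ == "__main__":
    observation = "You are standing in a crowded market."
    candidates = ["open door", "steal wallet", "give coin to beggar"]
    print(restrict_actions(observation, candidates))
```

In a reinforcement learning loop, a policy would then choose only among the returned actions, so the value filter shapes behavior without changing the task reward.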
