Paper title
Mitigating Covertly Unsafe Text within Natural Language Systems
Paper authors
Paper abstract
An increasingly prevalent problem for intelligent technologies is text safety, as uncontrolled systems may generate recommendations to their users that lead to injury or life-threatening consequences. However, the degree of explicitness of a generated statement that can cause physical harm varies. In this paper, we distinguish types of text that can lead to physical harm and establish one particularly underexplored category: covertly unsafe text. Then, we further break down this category with respect to the system's information and discuss solutions to mitigate the generation of text in each of these subcategories. Ultimately, our work defines the problem of covertly unsafe language that causes physical harm and argues that this subtle yet dangerous issue needs to be prioritized by stakeholders and regulators. We highlight mitigation strategies to inspire future researchers to tackle this challenging problem and help improve safety within smart systems.