Paper Title
Privacy-Preserving Directly-Follows Graphs: Balancing Risk and Utility in Process Mining
Authors
Abstract
Process mining techniques enable organizations to analyze business process execution traces in order to identify opportunities for improving their operational performance. Such execution traces often contain private information. For example, the execution traces of a healthcare process are likely to be privacy-sensitive. In such cases, organizations need to deploy Privacy-Enhancing Technologies (PETs) to strike a balance between the benefits they get from analyzing these data and the requirements imposed on them by privacy regulations, particularly that of minimizing re-identification risks when data are disclosed to a process analyst. Among the many available PETs, differential privacy stands out for its ability to prevent predicate singling-out attacks and for its composable privacy guarantees. A drawback of differential privacy is the lack of interpretability of the main privacy parameter it relies upon, namely epsilon. This leads to the recurrent question: how much epsilon is enough? This article proposes a method to determine the epsilon value to be used when disclosing the output of a process mining technique in terms of two business-relevant metrics: absolute percentage error, which captures the loss of accuracy (a.k.a. utility loss) resulting from adding noise to the disclosed data, and guessing advantage, which captures the increase in the probability that an adversary may guess information about an individual as a result of a disclosure. The article specifically studies the problem of protecting the disclosure of so-called Directly-Follows Graphs (DFGs), a process mining artifact produced by most process mining tools. The article reports on an empirical evaluation of the utility-risk trade-offs that the proposed approach achieves on a collection of 13 real-life event logs.
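To make the utility side of this trade-off concrete, the following is a minimal sketch, not the article's method. It builds a DFG from a toy event log, perturbs each arc frequency with Laplace noise, and reports the mean absolute percentage error for several epsilon values. The toy traces, the function names, and the choice of a Laplace mechanism with sensitivity 1 are all assumptions made for illustration. In particular, a single case can contribute to several arcs, so a real deployment would need a more careful sensitivity analysis. Guessing advantage is not modeled here.

```python
import random
from collections import Counter


def directly_follows_graph(traces):
    """Count how often activity a is directly followed by activity b."""
    dfg = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def add_laplace_noise(dfg, epsilon, sensitivity=1.0, seed=None):
    """Perturb every observed arc frequency.

    Assumption: sensitivity=1.0 is a simplification for illustration only.
    Arcs that never occur in the log are left out, which a full
    mechanism would also have to address.
    """
    rng = random.Random(seed)
    scale = sensitivity / epsilon
    return {arc: count + laplace_noise(scale, rng) for arc, count in dfg.items()}


def mean_absolute_percentage_error(true_dfg, noisy_dfg):
    """Average of |noisy - true| / true over all arcs, in percent."""
    errors = [abs(noisy_dfg[arc] - count) / count for arc, count in true_dfg.items()]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical healthcare-style traces (one list of activities per case).
traces = [
    ["register", "triage", "treat", "discharge"],
    ["register", "triage", "discharge"],
    ["register", "treat", "discharge"],
]

dfg = directly_follows_graph(traces)
for epsilon in (0.1, 1.0, 10.0):
    noisy = add_laplace_noise(dfg, epsilon, seed=42)
    mape = mean_absolute_percentage_error(dfg, noisy)
    print(f"epsilon={epsilon}: MAPE={mape:.1f}%")
```

Because the noise scale is sensitivity / epsilon, smaller epsilon values (stronger privacy) yield larger errors on the disclosed frequencies. This is the tension that the two metrics in the abstract are meant to quantify.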