Paper title
Unique in the Smart Grid - The Privacy Cost of Fine-Grained Electrical Consumption Data
Paper authors
Paper abstract
The collection of electrical consumption time series through smart meters is growing with ambitious nationwide smart grid programs. This data is both highly sensitive and highly valuable: it is protected by strong personal-data laws, while open-data laws aim to make it public after a privacy-preserving data publishing process. In this work, we study the uniqueness of large-scale, real-life, fine-grained electrical consumption time series and show its link to privacy threats. Our results show a worryingly high uniqueness rate in such datasets. In particular, we show that knowing 5 consecutive electricity measurements is enough to re-identify, on average, more than 90% of the households in our dataset of 2.5M half-hourly electrical time series. Moreover, uniqueness remains high even when the data is severely degraded. For example, when the data is rounded to the nearest 100 watts, knowing 7 consecutive measurements re-identifies, on average, more than 40% of the households in the same dataset. We also study the relationships between uniqueness and entropy, between uniqueness and electrical consumption, and between electrical consumption and temperature, showing that they are strongly correlated.
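A minimal sketch of the kind of uniqueness measurement the abstract describes. It rests on an assumption that is ours and not necessarily the authors' protocol: the adversary knows k consecutive readings of a target household together with their timestamps, and the household counts as re-identified when no other household has exactly the same (rounded) readings at those timestamps. The helper names `uniqueness_rate` and `round_to_step` are hypothetical and are used only for illustration.

```python
import random
from typing import Sequence


def round_to_step(value: float, step: float) -> float:
    """Round a reading to the nearest multiple of `step` (e.g. 100 W).

    Python's round() sends exact ties to the even multiple.
    """
    return round(value / step) * step


def uniqueness_rate(
    series: Sequence[Sequence[float]],
    k: int,
    step: float = 1.0,
    seed: int = 0,
) -> float:
    """Fraction of households whose k consecutive rounded readings, taken
    from a randomly chosen window with known timestamps, match no other
    household at those same timestamps (hypothetical metric definition)."""
    length = len(series[0])
    if not 1 <= k <= length:
        raise ValueError("k must be between 1 and the series length")
    rng = random.Random(seed)
    # Degrade every series in the same way before comparing them.
    rounded = [tuple(round_to_step(v, step) for v in s) for s in series]
    unique = 0
    for target in rounded:
        # The adversary's known window: k consecutive time slots.
        start = rng.randrange(length - k + 1)
        window = target[start:start + k]
        # Count households (the target included) with identical values there.
        matches = sum(1 for other in rounded if other[start:start + k] == window)
        if matches == 1:
            unique += 1
    return unique / len(rounded)
```

With three toy households, full-precision windows are all distinct, but rounding to 100 W makes two of them collide. This illustrates why coarser data lowers uniqueness without removing it.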