Paper title
On Multi-Session Website Fingerprinting over TLS Handshake
Paper authors
Paper abstract
Analyzing users' Internet traffic and activities affects their experience in several ways, from maintaining quality of service and powering high-quality recommendation systems to anomaly detection and secure connections. Because the Internet is a complex network, packets cannot simply be separated by activity. We therefore need a model that can identify all the activities an Internet user performs in a given period of time. In this paper, we propose a deep learning approach that produces a multi-label classifier to predict the websites a user visits in a given period. The model works by extracting the server names that appear, in chronological order, in TLSv1.2 and TLSv1.3 Client Hello packets. We compare its results on the test data with those of a simple fully connected neural network developed for the same purpose, to show that using time-sequential information improves performance. For further evaluation, we test the model on a human-made dataset and a modified dataset to check its accuracy under different circumstances. Our proposed model achieves an accuracy of 95% on the test dataset and above 90% on both the modified and human-made datasets.
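To make the feature-extraction step concrete, the following is a minimal sketch of how server names might be read from Client Hello messages and arranged in chronological order. It is not the authors' pipeline. It assumes each payload starts at the TLS record header and that the whole Client Hello fits in a single record. The names `extract_sni`, `server_name_sequence`, and `build_client_hello` are hypothetical, and the last one exists only to produce sample input.

```python
import struct
from typing import Iterable, List, Optional, Tuple


def extract_sni(record: bytes) -> Optional[str]:
    """Return the SNI host name from one TLS record holding a ClientHello, else None.

    Assumption: the ClientHello is not fragmented across several records.
    """
    try:
        if record[0] != 0x16:        # record content type 0x16 = handshake
            return None
        if record[5] != 0x01:        # handshake message type 0x01 = ClientHello
            return None
        pos = 5 + 4                  # skip record header (5) and handshake header (4)
        pos += 2 + 32                # client_version + random
        pos += 1 + record[pos]       # session_id (1-byte length prefix)
        (suites_len,) = struct.unpack_from("!H", record, pos)
        pos += 2 + suites_len        # cipher_suites (2-byte length prefix)
        pos += 1 + record[pos]       # compression_methods (1-byte length prefix)
        (ext_total,) = struct.unpack_from("!H", record, pos)
        pos += 2
        end = pos + ext_total
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", record, pos)
            pos += 4
            if ext_type == 0x0000:   # server_name extension
                # Layout: list length (2), name type (1), name length (2), name.
                name_type = record[pos + 2]
                (name_len,) = struct.unpack_from("!H", record, pos + 3)
                name = record[pos + 5 : pos + 5 + name_len]
                if name_type != 0 or len(name) != name_len:
                    return None
                return name.decode("ascii")
            pos += ext_len
    except (IndexError, struct.error, UnicodeDecodeError):
        pass                         # truncated or malformed input
    return None


def server_name_sequence(packets: Iterable[Tuple[float, bytes]]) -> List[str]:
    """Return the server names seen in (timestamp, payload) pairs, in time order."""
    names = []
    for _, payload in sorted(packets, key=lambda p: p[0]):
        name = extract_sni(payload)
        if name is not None:
            names.append(name)
    return names


def build_client_hello(host: str) -> bytes:
    """Hypothetical helper: build a minimal ClientHello record carrying only SNI."""
    name = host.encode("ascii")
    entry = b"\x00" + struct.pack("!H", len(name)) + name
    sni = struct.pack("!H", len(entry)) + entry
    ext = struct.pack("!HH", 0, len(sni)) + sni
    body = (
        b"\x03\x03" + bytes(32) + b"\x00"        # version, random, empty session_id
        + struct.pack("!H", 2) + b"\x13\x01"     # one cipher suite
        + b"\x01\x00"                            # one compression method (null)
        + struct.pack("!H", len(ext)) + ext
    )
    handshake = b"\x01" + len(body).to_bytes(3, "big") + body
    return b"\x16\x03\x01" + struct.pack("!H", len(handshake)) + handshake
```

The ordered list returned by `server_name_sequence` is the kind of time-sequential input the abstract describes. How the names are then encoded for the classifier is not specified here and is left open.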