Paper Title

XtraLibD: Detecting Irrelevant Third-Party Libraries in Java and Python Applications

Authors

Kapur, Ritu; Rao, Poojith U; Dewan, Agrim; Sodhi, Balwinder

Abstract

Software development involves the use of multiple Third-Party Libraries (TPLs). However, irrelevant libraries present in a software application's distributable often lead to excessive consumption of resources such as CPU cycles, memory, and mobile devices' battery power. Therefore, identifying and removing the unused TPLs present in an application is desirable. We present a rapid, storage-efficient, obfuscation-resilient method to detect irrelevant TPLs in Java and Python applications. The novel aspects of our approach are: i) computing a vector representation of a .class file using a model that we call Lib2Vec, which is trained using the Paragraph Vector Algorithm; ii) converting a .class file to a normalized form via semantics-preserving transformations before using it to train the Lib2Vec models; and iii) an eXtra Library Detector (XtraLibD), developed and tested with 27 different language-specific Lib2Vec models. These models were trained using different parameters and >30,000 .class and >478,000 .py files taken from >100 different Java libraries and 43,711 Python available at MavenCentral.com and Pypi.com, respectively. XtraLibD achieves an accuracy of 99.48% with an F1 score of 0.968 and outperforms the existing tools, viz., LibScout, LiteRadar, and LibD, with an accuracy improvement of 74.5%, 30.33%, and 14.1%, respectively. Compared with LibD, XtraLibD achieves a response time improvement of 61.37% and a storage reduction of 87.93% (99.85% over JIngredient). Our program artifacts are available at https://www.doi.org/10.5281/zenodo.5179747.
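The general idea of normalizing code before vectorizing it, so that renamed identifiers do not prevent a match, can be illustrated with a small sketch. This is not the authors' Lib2Vec implementation: it swaps the trained Paragraph Vector model for a plain bigram-count vector compared by cosine similarity. The instruction listings, the `<ref>` placeholder rule, the function names, and the 0.9 threshold are all assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Dict, List, Optional


def normalize(listing: str) -> List[str]:
    """Replace every symbolic reference (any token containing '/' or '.')
    with a fixed placeholder, so renamed classes and methods look alike.
    This rule is a hypothetical stand-in for a real normalization step."""
    return ["<ref>" if ("/" in tok or "." in tok) else tok
            for tok in listing.lower().split()]


def embed(tokens: List[str]) -> Counter:
    """Build a sparse vector of adjacent-token (bigram) counts.
    A trained Paragraph Vector model would take this role in practice."""
    return Counter(zip(tokens, tokens[1:]))


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[key] for key, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def best_match(query: str, reference: Dict[str, str],
               threshold: float = 0.9) -> Optional[str]:
    """Return the reference library most similar to the query listing,
    or None when no similarity reaches the (assumed) threshold."""
    query_vec = embed(normalize(query))
    best_name, best_score = None, 0.0
    for name, listing in reference.items():
        score = cosine(query_vec, embed(normalize(listing)))
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


# Hypothetical reference listings, written to resemble disassembled bytecode.
REFERENCE = {
    "libA": "aload_0 invokevirtual com/foo/Bar.baz iload_1 iadd ireturn",
    "libB": "new java/util/ArrayList dup invokespecial "
            "java/util/ArrayList.<init> astore_1 return",
}

# Same instructions as libA, but with the class and method renamed.
OBFUSCATED = "aload_0 invokevirtual a/b.c iload_1 iadd ireturn"
```

Calling `best_match(OBFUSCATED, REFERENCE)` identifies `libA` despite the renaming, because both listings normalize to the same token sequence. A listing that shares no bigrams with any reference falls below the threshold and yields `None`.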
