Paper Title

PhilaeX: Explaining the Failure and Success of AI Models in Malware Detection

Paper Authors

Zhi Lu, Vrizlynn L. L. Thing

Paper Abstract

The explanation of an AI model's prediction used to support decision making in cyber security is of critical importance. This is especially true when the model's incorrect prediction can lead to severe damage, or even the loss of lives and critical assets. However, despite their strong performance in most scenarios, most existing AI models cannot explain their prediction results. In this work, we propose a novel explainable AI method, called PhilaeX, that provides a heuristic means to identify the optimized subset of features forming the complete explanation of an AI model's prediction. It identifies the features that lead to the model's borderline prediction and extracts those with positive individual contributions. The feature attributions are then quantified by optimizing a Ridge regression model. We verify the explanation fidelity through two experiments. First, we assess the method's ability to correctly identify the activated features in adversarial samples of Android malware using the feature attribution values produced by PhilaeX. Second, deduction and augmentation tests are used to assess the fidelity of the explanations. The results show that PhilaeX correctly explains different types of classifiers and produces higher-fidelity explanations than state-of-the-art methods such as LIME and SHAP.
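As a rough illustration of the Ridge-based attribution step described above (not the paper's actual algorithm), the minimal sketch below assumes binary feature vectors, a hypothetical scoring function `score_fn` that returns the classifier's positive-class scores, and a pre-selected candidate feature subset `candidate_idx`; it fits a Ridge regression over random on/off perturbations of that subset and reads the coefficients as attribution estimates.

```python
import numpy as np
from sklearn.linear_model import Ridge


def ridge_attributions(score_fn, x, candidate_idx, n_samples=500, alpha=1.0, seed=0):
    """Illustrative helper (hypothetical, not PhilaeX's exact optimization):
    fit a Ridge surrogate over random perturbations of a candidate feature
    subset and return its coefficients as per-feature attribution estimates."""
    rng = np.random.default_rng(seed)
    # Randomly enable/disable the candidate features to probe the model locally.
    masks = rng.integers(0, 2, size=(n_samples, len(candidate_idx)))
    X_pert = np.tile(np.asarray(x), (n_samples, 1))
    X_pert[:, candidate_idx] = masks
    y = score_fn(X_pert)                        # classifier's positive-class scores
    reg = Ridge(alpha=alpha).fit(masks, y)      # linear surrogate restricted to the subset
    return dict(zip(candidate_idx, reg.coef_))  # larger coefficient = larger estimated contribution
```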
