Paper Title

CausalRCA: Causal Inference based Precise Fine-grained Root Cause Localization for Microservice Applications

Authors

Ruyue Xin, Peng Chen, Zhiming Zhao

Abstract

Effectively localizing the root causes of performance anomalies is crucial to enabling rapid recovery and loss mitigation for microservice applications in the cloud. Depending on the granularity of the causes that can be localized, a service operator may take different actions, e.g., restarting or migrating services if only faulty services can be localized (coarse-grained), or scaling resources if specific indicative metrics on the faulty service can be localized (fine-grained). Prior research mainly focuses on coarse-grained faulty service localization, and there is now growing interest in fine-grained root cause localization that identifies both faulty services and metrics. Causal inference (CI) based methods have recently gained popularity for root cause localization, but currently used CI methods have limitations, such as the assumption of linear causal relations and strict data distribution requirements. To tackle these challenges, we propose a framework named CausalRCA to implement fine-grained, automated, and real-time root cause localization. CausalRCA uses a gradient-based causal structure learning method to generate weighted causal graphs and a root cause inference method to localize root cause metrics. We conduct coarse- and fine-grained root cause localization to evaluate the localization performance of CausalRCA. Experimental results show that CausalRCA significantly outperforms baseline methods in localization accuracy, e.g., the average AC@3 of fine-grained root cause metric localization in the faulty service is 0.719, an average increase of 10% compared with baseline methods. In addition, the average Avg@5 has improved by 9.43%.
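
The abstract reports AC@3 and Avg@5 without defining them. The following is a minimal sketch, assuming the definitions commonly used in root cause localization work: AC@k is the fraction of anomaly cases whose true root cause appears among the top k ranked candidates, and Avg@k is the mean of AC@1 through AC@k. The function names and the example data are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def ac_at_k(rankings: Sequence[Sequence[str]], truths: Sequence[str], k: int) -> float:
    """Fraction of cases whose true root cause is within the top-k ranked candidates.

    Assumed definition; the abstract does not spell it out.
    """
    hits = sum(1 for ranked, truth in zip(rankings, truths) if truth in ranked[:k])
    return hits / len(truths)


def avg_at_k(rankings: Sequence[Sequence[str]], truths: Sequence[str], k: int) -> float:
    """Mean of AC@1 .. AC@k (assumed definition)."""
    return sum(ac_at_k(rankings, truths, j) for j in range(1, k + 1)) / k


# Hypothetical example: three anomaly cases, each with a ranked list of metrics.
rankings = [
    ["cpu", "mem", "net"],  # true cause ranked 1st
    ["mem", "cpu", "net"],  # true cause ranked 2nd
    ["net", "mem", "cpu"],  # true cause ranked 3rd
]
truths = ["cpu", "cpu", "cpu"]

ac1 = ac_at_k(rankings, truths, 1)
ac3 = ac_at_k(rankings, truths, 3)
avg3 = avg_at_k(rankings, truths, 3)
```

Under these assumed definitions, Avg@k rewards methods that place the true cause near the top of the ranking, while AC@k only checks whether it falls within the first k entries.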
