Paper title
Traveler: Navigating Task Parallel Traces for Performance Analysis
Paper authors
Paper abstract
Understanding the behavior of software in execution is a key step in identifying and fixing performance issues. This is especially important in high-performance computing contexts, where even minor performance tweaks can translate into large savings in computational resource use. To aid performance analysis, developers may collect an execution trace, a chronological log of program activity during execution. Because traces represent the full history, developers can discover a wide array of performance issues, possibly previously unknown, making traces an important artifact for exploratory performance analysis. However, interactive trace visualization is difficult due to issues of data size and complexity of meaning. Traces represent nanosecond-level events across many parallel processes, so the collected data is often large and difficult to explore. The rise of asynchronous task parallel programming paradigms complicates the relation between events and their probable cause. To address these challenges, we conduct a continuing design study in collaboration with high-performance computing researchers. We develop diverse and hierarchical ways to navigate and represent execution trace data in support of their trace analysis tasks. Through an iterative design process, we developed Traveler, an integrated visualization platform for task parallel traces. Traveler provides multiple linked interfaces to help navigate trace data from multiple contexts. We evaluate the utility of Traveler through feedback from users and a case study, finding that integrating multiple modes of navigation in our design supported performance analysis tasks and led to the discovery of previously unknown behavior in a distributed array library.