Paper Title
Emblaze: Illuminating Machine Learning Representations through Interactive Comparison of Embedding Spaces
Authors
Abstract
Modern machine learning techniques commonly rely on complex, high-dimensional embedding representations to capture underlying structure in the data and improve performance. In order to characterize model flaws and choose a desirable representation, model builders often need to compare across multiple embedding spaces, a challenging analytical task supported by few existing tools. We first interviewed nine embedding experts in a variety of fields to characterize the diverse challenges they face and techniques they use when analyzing embedding spaces. Informed by these perspectives, we developed a novel system called Emblaze that integrates embedding space comparison within a computational notebook environment. Emblaze uses an animated, interactive scatter plot with a novel Star Trail augmentation to enable visual comparison. It also employs novel neighborhood analysis and clustering procedures to dynamically suggest groups of points with interesting changes between spaces. Through a series of case studies with ML experts, we demonstrate how interactive comparison with Emblaze can help gain new insights into embedding space structure.
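As a rough illustration of what comparing neighborhoods between two embedding spaces can look like, the sketch below scores each point by how much its k-nearest-neighbor set differs between two spaces, using Jaccard distance. This is a generic technique chosen for illustration, not Emblaze's actual procedure or API; the function names `knn` and `neighborhood_change` and the toy data are hypothetical.

```python
import math

def knn(points, i, k):
    """Return the indices of the k nearest neighbors of point i (Euclidean distance)."""
    dists = [(math.dist(points[i], p), j) for j, p in enumerate(points) if j != i]
    dists.sort()
    return {j for _, j in dists[:k]}

def neighborhood_change(space_a, space_b, k=2):
    """Jaccard distance between each point's k-NN sets in the two spaces.

    0.0 means the neighbor set is identical; 1.0 means it changed completely.
    """
    changes = []
    for i in range(len(space_a)):
        na, nb = knn(space_a, i, k), knn(space_b, i, k)
        changes.append(1.0 - len(na & nb) / len(na | nb))
    return changes

# Toy data (assumed for illustration): the same five items embedded in two 2-D spaces.
space_a = [(0, 0), (1, 0), (0, 2), (5, 5), (6, 5)]
space_b = [(0, 0), (1, 0), (0, 2), (5, 5), (0.5, 0.5)]  # item 4 has moved

changes = neighborhood_change(space_a, space_b, k=2)

# Rank items so those whose neighborhoods changed most come first.
ranked = sorted(range(len(changes)), key=lambda i: changes[i], reverse=True)
print(ranked[0], changes)
```

Points with a high score are candidates for closer inspection. A real tool would typically work on high-dimensional vectors and use an approximate nearest-neighbor index rather than this brute-force search.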