Paper title
scatteR: Generating instance space based on scagnostics
Paper authors
Abstract
Traditional synthetic data generation methods rely on model-based approaches that tune the parameters of a model rather than focusing on the structure of the data itself. In contrast, Scagnostics is an exploratory graphical method that captures the structure of bivariate data using graph-theoretic measures. This paper presents a novel data generation method, scatteR, that uses Scagnostics measurements to control the characteristics of the generated dataset. By using an iterative Generalized Simulated Annealing optimizer, scatteR finds the optimal arrangement of data points that minimizes the distance between current and target Scagnostics measurements. The results demonstrate that scatteR can generate 50 data points in under 30 seconds with an average Root Mean Squared Error of 0.05, making it a useful pedagogical tool for teaching statistical methods. Overall, scatteR provides an entry point for generating datasets based on the characteristics of instance space, rather than relying on model-based simulations.
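The optimization loop described in the abstract can be illustrated with a minimal Python sketch. It is not the scatteR implementation: it assumes a plain simulated annealing loop as a stand-in for the Generalized Simulated Annealing optimizer, and it targets a single measure, Monotonic (taken here as the squared Spearman rank correlation), instead of the full set of Scagnostics. The function names `monotonic`, `rmse`, and `generate`, and all parameter values, are hypothetical choices made for this illustration.

```python
import math
import random


def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if degenerate)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def monotonic(points):
    """Stand-in scagnostic: squared Spearman correlation of the x and y values."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return pearson(ranks(xs), ranks(ys)) ** 2


def rmse(current, target):
    """Root mean squared error between current and target measure vectors."""
    return math.sqrt(sum((c - t) ** 2 for c, t in zip(current, target)) / len(target))


def generate(target, n_points=50, steps=5000, seed=0):
    """Anneal random points in the unit square toward a target Monotonic value."""
    rng = random.Random(seed)
    points = [(rng.random(), rng.random()) for _ in range(n_points)]
    loss = rmse([monotonic(points)], [target])
    best_points, best_loss = list(points), loss
    for step in range(steps):
        # Assumed linear cooling schedule; GSA uses a different visiting distribution.
        temperature = 0.01 * (1 - step / steps) + 1e-9
        i = rng.randrange(n_points)
        old = points[i]
        # Propose a small Gaussian move of one point, clipped to the unit square.
        x = min(1.0, max(0.0, old[0] + rng.gauss(0, 0.1)))
        y = min(1.0, max(0.0, old[1] + rng.gauss(0, 0.1)))
        points[i] = (x, y)
        new_loss = rmse([monotonic(points)], [target])
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if new_loss <= loss or rng.random() < math.exp((loss - new_loss) / temperature):
            loss = new_loss
            if loss < best_loss:
                best_points, best_loss = list(points), loss
        else:
            points[i] = old  # reject the move and restore the point
    return best_points, best_loss
```

Calling `generate(0.9)` returns 50 points together with the lowest error reached between their Monotonic value and the target of 0.9. Extending the sketch to several measures would mean passing longer vectors to `rmse`, which is why the error is written over vectors even though only one measure is used here.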