Paper Title
Generation and Simulation of Synthetic Datasets with Copulas
Paper Authors
Abstract
This paper proposes a new method for generating synthetic datasets based on copula models. Our goal is to produce surrogate data that resemble real data in terms of their marginal and joint distributions. We present a complete and reliable algorithm for generating a synthetic dataset comprising numeric or categorical variables. Applied to two datasets, our method performs better than other methods such as SMOTE and autoencoders.
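The abstract does not say which copula family or marginal estimators the algorithm uses, so the following is only a rough illustration of the general copula idea, not the paper's method. It assumes a Gaussian copula with empirical marginals. Numeric columns are turned into rank-based pseudo-uniforms. Categorical columns are handled by drawing a uniform point inside each category's cumulative-frequency interval, which is a jittering choice assumed here. The function names `fit_gaussian_copula` and `sample_gaussian_copula` are hypothetical.

```python
import numpy as np
from statistics import NormalDist

# Standard normal N(0, 1); vectorized CDF and inverse CDF over NumPy arrays.
_STD_NORMAL = NormalDist()
_norm_cdf = np.vectorize(_STD_NORMAL.cdf, otypes=[float])
_norm_inv_cdf = np.vectorize(_STD_NORMAL.inv_cdf, otypes=[float])


def _uniform_scores(col, is_categorical, rng):
    """Map one column to pseudo-uniforms in (0, 1) using its empirical CDF."""
    n = len(col)
    if is_categorical:
        # Each category owns an interval [lower, upper) of the cumulative
        # frequencies; draw a uniform point inside it (assumed jittering).
        _, codes, counts = np.unique(col, return_inverse=True, return_counts=True)
        width = counts / n
        lower = np.cumsum(width) - width
        u = lower[codes] + rng.uniform(size=n) * width[codes]
    else:
        # Rank-based pseudo-observations; ties are broken by position.
        ranks = np.argsort(np.argsort(col, kind="stable"), kind="stable") + 1
        u = ranks / (n + 1)
    # NormalDist.inv_cdf requires 0 < p < 1.
    return np.clip(u, 1e-9, 1 - 1e-9)


def fit_gaussian_copula(columns, categorical, rng):
    """Estimate the Gaussian-copula correlation matrix from normal scores."""
    z = np.column_stack([
        _norm_inv_cdf(_uniform_scores(np.asarray(c), cat, rng))
        for c, cat in zip(columns, categorical)
    ])
    return np.corrcoef(z, rowvar=False)


def sample_gaussian_copula(columns, categorical, corr, n_samples, rng):
    """Draw correlated normals and push them through each empirical marginal."""
    z = rng.multivariate_normal(np.zeros(len(columns)), corr, size=n_samples)
    u = _norm_cdf(z)  # uniforms carrying the fitted dependence structure
    synthetic = []
    for j, (c, cat) in enumerate(zip(columns, categorical)):
        c = np.asarray(c)
        if cat:
            # Invert the step CDF: pick the category whose interval contains u.
            cats, counts = np.unique(c, return_counts=True)
            upper = np.cumsum(counts) / len(c)
            idx = np.searchsorted(upper, u[:, j], side="right")
            synthetic.append(cats[np.minimum(idx, len(cats) - 1)])
        else:
            # Empirical quantile function (linear interpolation between order statistics).
            synthetic.append(np.quantile(c.astype(float), u[:, j]))
    return synthetic
```

Because every marginal is the empirical one, synthetic numeric values stay within each column's observed range, and only observed categories can appear. A Gaussian copula has no tail dependence, so a different copula family may be needed when joint extremes matter.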