Paper Title

Numeric Encoding Options with Automunge

Author

Teague, Nicholas J.

Abstract

Mainstream practice in machine learning with tabular data may take for granted that any feature engineering beyond scaling of numeric sets is superfluous in the context of deep neural networks. This paper offers arguments for the potential benefits of extended encodings of numeric streams in deep learning by way of a survey of the numeric transformation options available in the Automunge open source Python library platform for tabular data pipelines, where transformations may be applied to distinct columns in "family tree" sets with generations and branches of derivations. Automunge transformation options include normalization, binning, noise injection, derivatives, and more. The aggregation of these methods into family tree sets of transformations is demonstrated as a way to present numeric features to machine learning in multiple configurations of varying information content, as may be applied to encode numeric sets of unknown interpretation. Experiments demonstrate the realization of a novel generalized solution to data augmentation by noise injection for tabular learning, which may materially benefit model performance in applications with underserved training data.
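
To make the transformation types named in the abstract concrete, the following is a minimal sketch of z-score normalization, binning by standard deviation, and augmentation by noise injection on a single numeric column. It uses only the Python standard library and does not use or reproduce the Automunge API. The function names, bin edges, noise scale, and number of noisy copies are all illustrative assumptions, not values taken from the paper.

```python
import bisect
import random
import statistics


def zscore(values):
    """Normalize a numeric column to zero mean and unit (population) variance."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def bin_by_std(z_values, edges=(-2.0, -1.0, 0.0, 1.0, 2.0)):
    """Map z-scored values to ordinal bin indices (edges are an assumption)."""
    return [bisect.bisect_right(edges, z) for z in z_values]


def augment_with_noise(z_values, copies=2, sigma=0.03, seed=0):
    """Return the original rows followed by `copies` noisy duplicates.

    Gaussian noise with standard deviation `sigma` is added after
    normalization; both `copies` and `sigma` are hypothetical defaults.
    """
    rng = random.Random(seed)
    rows = list(z_values)
    for _ in range(copies):
        rows.extend(z + rng.gauss(0.0, sigma) for z in z_values)
    return rows


column = [1.0, 2.0, 3.0, 4.0, 5.0]
z = zscore(column)                 # normalized feature
bins = bin_by_std(z)               # coarse-grained view of the same feature
augmented = augment_with_noise(z)  # training rows with injected noise
```

Presenting the normalized values and their bin indices side by side gives a model the same column at two levels of information content, while the noisy duplicates enlarge a small training set without altering the original rows.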
