Paper Title
Plant Species Recognition with Optimized 3D Polynomial Neural Networks and Variably Overlapping Time-Coherent Sliding Window
Authors
Abstract
The EAGL-I system was recently developed to rapidly create massive labeled datasets of plants, intended for farmers and researchers building AI-driven solutions in agriculture. To demonstrate its capabilities, the system was used to create a publicly available plant species recognition dataset of 40,000 images of varying sizes covering 8 plant species. This paper proposes a novel method, called Variably Overlapping Time-Coherent Sliding Window (VOTCSW), that transforms a dataset of variable-size images into a fixed-size 3D representation suitable for convolutional neural networks, and demonstrates that this representation is more informative than resizing the images to a given size. We formalize the use cases of the method and its inherent properties, and prove that it has an oversampling and a regularization effect on the data. By combining VOTCSW with the 3D extension of a recently proposed machine learning model called 1-Dimensional Polynomial Neural Networks, we created a model that achieves a state-of-the-art accuracy of 99.9% on the dataset created by the EAGL-I system, surpassing well-known architectures such as ResNet and Inception. In addition, we created a heuristic algorithm that reduces the degree of any pre-trained N-Dimensional Polynomial Neural Network, compressing it without altering its performance and thus making the model faster and lighter. Furthermore, we established that the currently available dataset cannot be used for machine learning in its present form because of a substantial class imbalance between the training set and the test set. Hence, we created a specific preprocessing and model development framework that improved the accuracy from 49.23% to 99.9%.
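The abstract does not spell out how VOTCSW is computed, so the following is only a minimal sketch of the general idea it describes: a fixed number of fixed-size windows slides over an image of any size, with the overlap between windows adapting to the image dimensions, and the crops are stacked into a volume of constant shape. The function names (`window_starts`, `to_fixed_volume`), the 64x64 window, the 4x4 grid, and the row-major scan order are all illustrative assumptions, not the authors' definitions.

```python
import numpy as np


def window_starts(length: int, window: int, count: int) -> np.ndarray:
    """Return `count` evenly spaced start offsets covering [0, length).

    Hypothetical helper: the spacing (and hence the overlap between
    consecutive windows) depends on `length`, which is the "variable
    overlap" idea in its simplest form.
    """
    if length < window:
        raise ValueError("image side is smaller than the window")
    # The first window starts at 0 and the last one ends exactly at `length`.
    return np.rint(np.linspace(0, length - window, count)).astype(int)


def to_fixed_volume(image: np.ndarray, window=(64, 64), grid=(4, 4)) -> np.ndarray:
    """Stack grid[0] * grid[1] crops of size `window` along a new leading axis.

    Assumed scan order: row-major, so consecutive slices of the volume are
    spatially adjacent crops. The output shape does not depend on the
    input image size.
    """
    height, width = image.shape[:2]
    rows = window_starts(height, window[0], grid[0])
    cols = window_starts(width, window[1], grid[1])
    patches = [
        image[r:r + window[0], c:c + window[1]]
        for r in rows
        for c in cols
    ]
    return np.stack(patches, axis=0)


if __name__ == "__main__":
    small = np.zeros((100, 150, 3))
    large = np.zeros((300, 200, 3))
    # Both images map to the same fixed shape despite their different sizes.
    print(to_fixed_volume(small).shape, to_fixed_volume(large).shape)
```

In this sketch a small image yields heavily overlapping crops while a large one yields sparse crops, which is one way to see why such a transform could act like oversampling; the paper's formal treatment of that property is not reproduced here.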