Ensemble Machine Learning Model Trained on a New Synthesized Dataset Generalizes Well for Stress Prediction Using Wearable Devices

Paper Title

Ensemble Machine Learning Model Trained on a New Synthesized Dataset Generalizes Well for Stress Prediction Using Wearable Devices

Authors

Vos, Gideon; Trinh, Kelly; Sarnyai, Zoltan; Azghadi, Mostafa Rahimi

Abstract

Introduction. We investigate the generalization ability of models built on datasets containing a small number of subjects, recorded in single study protocols. Next, we propose and evaluate methods for combining these datasets into a single, large dataset. Finally, we propose and evaluate the use of ensemble techniques by combining gradient boosting with an artificial neural network to measure predictive power on new, unseen data. Methods. Sensor biomarker data from six public datasets were utilized in this study. To test model generalization, we developed a gradient boosting model trained on one dataset (SWELL) and tested its predictive power on two datasets previously used in other studies (WESAD, NEURO). Next, we merged four small datasets (SWELL, NEURO, WESAD, and UBFC-Phys) to provide a combined total of 99 subjects. In addition, we utilized random sampling combined with another dataset (EXAM) to build a larger training dataset consisting of 200 synthesized subjects. Finally, we developed an ensemble model that combines our gradient boosting model with an artificial neural network, and tested it on two additional, unseen publicly available stress datasets (WESAD and Toadstool). Results. Our method delivers a robust stress measurement system capable of achieving 85% predictive accuracy on new, unseen validation data, a 25% performance improvement over single models trained on small datasets. Conclusion. Models trained on small, single study protocol datasets do not generalize well for use on new, unseen data and lack statistical power. Machine learning models trained on a dataset containing a larger number of varied study subjects capture physiological variance better, resulting in more robust stress detection.
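The two central steps in the Methods, building synthesized subjects by random sampling and combining two models into an ensemble, can be illustrated with a minimal sketch. The abstract does not specify how sampling is performed or how the two models' outputs are combined, so everything here is an assumption: the window representation, sampling with replacement from a pooled set, probability averaging with a 0.5 threshold, and all function names (`synthesize_subjects`, `soft_vote`, `accuracy`) are hypothetical. The models are plain callables standing in for a trained gradient boosting model and a neural network.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

# One labeled window of sensor features: (feature vector, stress label 0 or 1).
Window = Tuple[List[float], int]
# Hypothetical model interface: features -> probability of stress in [0, 1].
Model = Callable[[List[float]], float]


def synthesize_subjects(
    pool: Dict[str, List[Window]],
    n_subjects: int,
    windows_per_subject: int,
    seed: int = 0,
) -> List[List[Window]]:
    """Build synthetic subjects by sampling windows, with replacement,
    from the pooled windows of all real subjects (an assumed scheme)."""
    rng = random.Random(seed)
    all_windows = [w for windows in pool.values() for w in windows]
    return [
        [rng.choice(all_windows) for _ in range(windows_per_subject)]
        for _ in range(n_subjects)
    ]


def soft_vote(models: Sequence[Model], features: List[float]) -> int:
    """Average the models' stress probabilities and threshold at 0.5."""
    mean_p = sum(m(features) for m in models) / len(models)
    return int(mean_p >= 0.5)


def accuracy(models: Sequence[Model], windows: Sequence[Window]) -> float:
    """Fraction of held-out windows the ensemble labels correctly."""
    correct = sum(soft_vote(models, x) == y for x, y in windows)
    return correct / len(windows)
```

In use, the pool would hold windows from the merged training datasets, and `accuracy` would be computed only on windows from datasets that contributed nothing to training, which mirrors the unseen-data evaluation the abstract describes.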
