Title
Parallelization of Machine Learning Algorithms Respectively on Single Machine and Spark
Authors
Abstract
With the rapid development of big data technologies, extracting useful information from massive data has become an essential problem. However, analyzing large datasets with machine learning algorithms can be time-consuming and inefficient on a traditional single machine. To address this, this paper studies the parallelization of several classic machine learning algorithms, both on a single machine and on the big data platform Spark. On each platform, we compare the runtime and efficiency of the traditional algorithms with those of their parallelized versions. The results show that parallelization significantly improves both runtime and efficiency.
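The abstract does not name the algorithms studied, so the following is only an illustrative sketch, under stated assumptions, of the data-parallel pattern that single-machine parallelism and Spark both build on: partition the training data, compute a partial result on each partition (map), and combine the partials (reduce). Linear least squares, the hypothetical helper names `partial_gradient`, `split`, and `parallel_gradient`, and the use of Python's `ThreadPoolExecutor` are all assumptions for illustration, not the paper's method. Because the least-squares gradient is a sum over samples, the per-partition sums add up to the full-data gradient.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

# Assumption: each sample is (feature vector x, target y) for linear least squares.
Sample = Tuple[List[float], float]


def partial_gradient(w: Sequence[float], chunk: Sequence[Sample]) -> List[float]:
    """'Map' step: gradient of 0.5 * sum((w.x - y)^2) over one partition."""
    grad = [0.0] * len(w)
    for x, y in chunk:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for j, xj in enumerate(x):
            grad[j] += err * xj
    return grad


def split(data: Sequence[Sample], n_parts: int) -> List[Sequence[Sample]]:
    """Split data into n_parts contiguous partitions of near-equal size (n_parts >= 1)."""
    size, rem = divmod(len(data), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < rem else 0)
        parts.append(data[start:end])
        start = end
    return parts


def parallel_gradient(w: Sequence[float], data: Sequence[Sample], n_workers: int = 4) -> List[float]:
    """Compute partial gradients concurrently, then sum them ('reduce' step)."""
    parts = split(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(lambda chunk: partial_gradient(w, chunk), parts))
    # Element-wise sum of the per-partition gradients equals the full-data gradient.
    return [sum(col) for col in zip(*partials)]


if __name__ == "__main__":
    # Toy data generated from y = 2 + 3x, with a bias feature of 1.0.
    data = [([1.0, x], 2.0 + 3.0 * x) for x in [0.0, 1.0, 2.0, 3.0, 4.0]]
    print(parallel_gradient([0.0, 0.0], data, n_workers=2))  # [-40.0, -110.0]
    print(partial_gradient([0.0, 0.0], data))                # serial result: [-40.0, -110.0]
```

In CPython, threads do not speed up pure-Python arithmetic because of the global interpreter lock, so a real single-machine speedup would need a `ProcessPoolExecutor` or vectorized native code. On Spark, the same map-then-reduce structure is typically expressed with RDD operations such as `map`, `reduce`, or `treeAggregate`.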