Paper Title
Performance Evaluation of Linear Regression Algorithm in Cluster Environment
Paper Authors
Abstract
Cluster computing was introduced as an alternative to supercomputers, and it can address problems that supercomputers cannot handle effectively. In this paper, we evaluate the performance of cluster computing by running a data mining technique in a cluster environment. The experiment predicts flight delays using a linear regression algorithm, with Apache Spark as the cluster computing framework. The results show that a cluster of 5 PCs with equal specifications improves computation performance by up to 39.76% compared with a standalone machine. Attaching more nodes to the cluster makes the process significantly faster.