Twitter Attribute Classification with Q-Learning on Bitcoin Price Prediction

Paper Title

Twitter Attribute Classification with Q-Learning on Bitcoin Price Prediction

Authors

Otabek, Sattarov, Choi, Jaeyoung

Abstract

Achieving accurate Bitcoin price prediction from opinions expressed on Twitter usually requires collecting millions of tweets, applying text mining techniques (preprocessing, tokenization, stemming, stop word removal), and developing a machine learning model to perform the prediction. These steps consume significant computing resources: central processing unit (CPU) utilization, random-access memory (RAM) usage, and time. To address this issue, in this paper we classify tweet attributes by their effect on price changes and on computer resource usage, while still obtaining an accurate price prediction. To identify the tweet attributes with a high effect on price movement, we collect all Bitcoin-related tweets posted in a certain period and divide them into four categories based on the following tweet attributes: $(i)$ the number of followers of the tweet poster, $(ii)$ the number of comments on the tweet, $(iii)$ the number of likes, and $(iv)$ the number of retweets. We train and test a Q-learning model separately on each of the four categorized sets of tweets and find the most accurate prediction among them. In particular, we design several reward functions to improve the prediction accuracy of Q-learning. We compare our approach with a classic approach, in which all Bitcoin-related tweets are used as input data for the model, by analyzing CPU workload, RAM usage, memory, time, and prediction accuracy. The results show that tweets posted by users with the most followers have the most influence on the future price, and that using them takes 80% less time, consumes 88.8% less CPU, and yields 12.5% more accurate predictions than the classic approach.
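
The abstract does not give the state encoding, the reward functions, or the hyperparameters, so the following is only a minimal sketch of the general idea: keep the tweets that rank highest on one attribute, then apply a standard tabular Q-learning update with a direction-based reward. The function names, the `fraction` cutoff, the +1/-1 reward, and the `alpha`, `gamma`, and `epsilon` values are all illustrative assumptions, not the authors' design.

```python
import random

ACTIONS = ("down", "up")  # assumed action set: predicted price direction


def top_by_attribute(tweets, attribute, fraction=0.25):
    """Keep the tweets ranking highest on one attribute (e.g. 'followers').

    The 25% cutoff is a hypothetical choice for illustration.
    """
    ranked = sorted(tweets, key=lambda t: t[attribute], reverse=True)
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]


def direction_reward(action, price_change):
    """Hypothetical reward: +1 for a correct direction call, -1 otherwise."""
    actual = "up" if price_change > 0 else "down"
    return 1.0 if action == actual else -1.0


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update; q maps (state, action) to a value."""
    best_next = max(q.get((next_state, a), 0.0) for a in ACTIONS)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


def train(steps, epochs=50, epsilon=0.1, seed=0):
    """Train on a time-ordered list of (state, price_change) pairs.

    A state could be, for example, a discretized sentiment score of the
    selected tweets in one time window (an assumption, not from the paper).
    """
    rng = random.Random(seed)
    q = {}
    for _ in range(epochs):
        for i in range(len(steps) - 1):
            state, change = steps[i]
            next_state = steps[i + 1][0]
            if rng.random() < epsilon:  # explore
                action = rng.choice(ACTIONS)
            else:  # exploit the current estimate
                action = max(ACTIONS, key=lambda a: q.get((state, a), 0.0))
            reward = direction_reward(action, change)
            q_update(q, state, action, reward, next_state)
    return q
```

Under these assumptions, one would call `top_by_attribute` once per attribute (followers, comments, likes, retweets), build a separate sequence of states from each subset, train a separate Q-table on each, and compare the resulting prediction accuracy and resource usage.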
