The goal of the paper is to build a model that predicts stock price movement using a knowledge graph constructed from financial news about renowned companies.

1. Introduction

In finance, a knowledge graph is designed to capture the relationships between entities such as company management, news events, and user preferences. These entities and their relationships can support decision making and yield business insights for predicting the stock market.

Knowledge graph embedding represents the entities and relationships of a knowledge graph as vectors. Several algorithms in the translation distance family model how entities and relationships map to one another:

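  1. TransE – models a relationship as a translation from the head entity to the tail entity (h + r ≈ t); best suited to one-to-one relationships

  2. TransH – projects the head and tail entity vectors onto a relationship-specific hyperplane

  3. TransR – embeds entities and relationships in separate spaces. For each relationship, the model has a matrix that maps the entity vectors into the relationship space

  4. TransD – an improvement on TransR that replaces the single per-relationship matrix with mapping matrices built dynamically from both the entity and the relationship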
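To make the translation idea concrete, here is a minimal sketch of the TransE and TransH scoring functions in plain Python. It is not the paper's implementation: the function names, the choice of the L2 norm, and the two-dimensional toy vectors are all assumptions made for illustration. A lower score means the (head, relationship, tail) triple is more plausible.

```python
import math


def l2_norm(vector):
    """Euclidean length of a vector (the L2 norm is an assumption here)."""
    return math.sqrt(sum(x * x for x in vector))


def transe_score(head, relation, tail):
    """TransE distance ||h + r - t||; lower means a more plausible triple."""
    return l2_norm([h + r - t for h, r, t in zip(head, relation, tail)])


def project_to_hyperplane(vector, normal):
    """Remove the component of `vector` along the unit normal: v - (w.v) w."""
    dot = sum(a * b for a, b in zip(vector, normal))
    return [a - dot * b for a, b in zip(vector, normal)]


def transh_score(head, relation, tail, normal):
    """TransH: project head and tail onto the hyperplane, then translate."""
    head_projected = project_to_hyperplane(head, normal)
    tail_projected = project_to_hyperplane(tail, normal)
    return transe_score(head_projected, relation, tail_projected)


# Toy 2-d embeddings, invented for illustration only.
head = [1.0, 0.0]
relation = [0.0, 1.0]
tail = [1.0, 1.0]        # equals head + relation, so the triple fits exactly
wrong_tail = [3.0, -2.0]  # far from head + relation

good_score = transe_score(head, relation, tail)
bad_score = transe_score(head, relation, wrong_tail)
hyperplane_score = transh_score(head, relation, tail, normal=[1.0, 0.0])
```

In training, embeddings would be adjusted so that observed triples score lower than corrupted ones; that step is omitted here.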

For stock market prediction, our knowledge graph embedding-driven approach has 4 main steps:

  1. Data retrieval

  2. Preprocessing

  3. Model creation & prediction

  4. Model application

Data retrieval is where we collect financial news on our selected stocks from Thomson Reuters or CNN. We developed a web crawler to obtain the news and match it with the corresponding stock data. Once the data is consolidated, we move to preprocessing, which covers exploratory data analysis (EDA), text normalisation, tokenisation, label tagging, and word2vec transformation. We then generate a feature vector using machine learning and knowledge graph embedding. The third phase is model creation, where stock movement labels (binary: up or down) are assigned to financial news to train a classification model. The final phase, model application, involves evaluating the model, since the final financial decision relies on the predictive performance of this framework.
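The normalisation, tokenisation, and label tagging steps can be sketched roughly as below. This is only an illustration under stated assumptions: the headline, dates, and prices are made up, the helper names are hypothetical, and treating an unchanged close as "down" is an arbitrary choice rather than the paper's rule.

```python
import re


def normalise(text):
    """Lowercase the text and replace anything that is not a-z, 0-9 or whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def tokenise(text):
    """Split normalised text on whitespace."""
    return normalise(text).split()


def label_movement(previous_close, close):
    """Binary label: 1 if the price went up, otherwise 0 (assumption: flat counts as down)."""
    return 1 if close > previous_close else 0


# Made-up news item and closing prices, keyed by date, for illustration only.
news_item = {"date": "2020-01-02", "headline": "Acme Corp beats forecasts; shares rise 2%!"}
closing_prices = {"2020-01-01": 100.0, "2020-01-02": 102.0}

tokens = tokenise(news_item["headline"])
label = label_movement(closing_prices["2020-01-01"], closing_prices[news_item["date"]])
training_example = {"tokens": tokens, "label": label}
```

Each training example then pairs the tokens of a news item with the movement label for the matching trading day.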

2. Literature Review

ML techniques are becoming popular in stock market prediction. A common procedure among ML models is to concatenate features from different sources into a single feature vector. In early work, most ML models used only structured stock data and ignored unstructured data. When unstructured text was included, it was usually represented as a bag of words (BoW), which ignores word order and does not address word sparsity. Research around the efficient market hypothesis has found that the emotional impulses of investors in renowned companies often lead to abnormal fluctuations in those companies' stocks. This has led to a lot of work applying sentiment analysis to news data to predict stock movements. However, sentiment analysis is limited when the sentiment is implicit rather than expressed through directly emotional words. Some work has also proposed topic-based sentiment to improve the performance of stock market prediction.
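A small sketch shows why a bag-of-words representation loses word order, and what concatenating features from different sources looks like. The sentences, the stock feature values, and the helper name `bow_vector` are invented for illustration and are not taken from the paper.

```python
from collections import Counter


def bow_vector(tokens, vocabulary):
    """Count how often each vocabulary word occurs in the token list."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


# Two made-up sentences with opposite meanings but identical words.
sentence_a = "company a acquires company b".split()
sentence_b = "company b acquires company a".split()

vocabulary = sorted(set(sentence_a) | set(sentence_b))
bow_a = bow_vector(sentence_a, vocabulary)
bow_b = bow_vector(sentence_b, vocabulary)

# BoW cannot tell who acquired whom: both sentences get the same vector.
same_representation = bow_a == bow_b

# Concatenating text features with (made-up) structured stock features.
stock_features = [0.012, 1.5]  # e.g. a daily return and a volume ratio, hypothetical values
feature_vector = bow_a + stock_features
```

Because both sentences map to the same vector, a model trained on BoW features alone cannot distinguish the acquirer from the target, which is the kind of information a structured representation of the news event is meant to keep.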

We use syntax analysis: specifically, we extract a structured tuple from unstructured text based on the semantic structure of each piece of news. A knowledge graph can enrich the structured representation of the news event and retain the feature vectors for the news event. In summary, our research focuses on syntax analysis of financial news, combined with other extracted features such as stock data, technical indicators, and BoW.

Ryan


Data Scientist
