Objectives and Contributions

The paper uses a BERT-based model to perform sentiment analysis and key entity detection on financial texts. It treats key entity detection as a sentence matching task. It also proposes an ensemble learning approach that outperformed SVM, LR, NBM, and BERT on two financial sentiment analysis and key entity detection datasets.

We combine sentiment analysis and key entity detection in a unified approach based on RoBERTa. We fine-tune RoBERTa in different ways for sentiment analysis and for key entity detection. For sentiment analysis, we focus specifically on negative sentiment. For key entity detection, we match each financial entity against its text to detect the key entities of the negative information.

Methodology

Existing entity detection methods cannot detect key entities, let alone select the most relevant entities given tags. Therefore, we propose a RoBERTa-based sentiment analysis and key entity detection approach for online financial texts, as follows:

  1. Analyse the sentiment of text using RoBERTa (classic)

  2. Get financial entity list and select key entities

  3. Select the key entity with tags

For step 2, we run NER on each piece of financial text to obtain a list of candidate entities. In the coarse-grained task, we detect the key entities related to the financial text from this list. As shown in the figure below, we feed each entity together with the financial text into RoBERTa and use RoBERTa as a sentence matching model to determine whether that entity is a key entity. The number of models is determined by the number of entities. The entities that RoBERTa predicts as key form the list of key entities.
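To make the sentence matching setup concrete, here is a minimal sketch in Python. It is not the paper's implementation: the names `build_pairs`, `select_key_entities`, `toy_scorer`, and the 0.5 threshold are assumptions made for illustration, and the scorer is a trivial stand-in for a fine-tuned RoBERTa sentence-pair classifier.

```python
from typing import Callable, List, Tuple

# Hypothetical scorer type: takes (entity, text) and returns a score in [0, 1]
# for "this entity is a key entity of this text". In the approach described
# above, a fine-tuned RoBERTa sentence-pair classifier would play this role.
PairScorer = Callable[[str, str], float]


def build_pairs(entities: List[str], text: str) -> List[Tuple[str, str]]:
    """Pair every candidate entity from NER with the full financial text."""
    return [(entity, text) for entity in entities]


def select_key_entities(
    entities: List[str],
    text: str,
    scorer: PairScorer,
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> List[str]:
    """Keep the entities whose (entity, text) pair is scored as a match."""
    return [
        entity
        for entity, paired_text in build_pairs(entities, text)
        if scorer(entity, paired_text) >= threshold
    ]


def toy_scorer(entity: str, text: str) -> float:
    """Stand-in scorer: treats an entity mentioned at least twice as key."""
    return 1.0 if text.count(entity) >= 2 else 0.0
```

With a text such as "Acme Bank was fined for fraud. Regulators said Acme Bank misled clients of Beta Fund." and the candidates "Acme Bank" and "Beta Fund", the toy scorer keeps only "Acme Bank". Swapping `toy_scorer` for a real model changes the scores but not the pairing logic.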

For step 3, certain fine-grained tasks require us to select the most relevant key entities based on the tags of the financial text. Tags can be, for example, corruption or fraud. This resembles a machine reading comprehension (MRC) task, so we treat each financial text as an MRC article and rewrite its tags as MRC questions. The answers predicted by the MRC model are taken to be the key entities.
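The MRC framing can be sketched in the same way. The question template, the function names, and the reader interface below are all hypothetical: the summary does not say how tags are rewritten into questions, so the wording in `tag_to_question` is only a placeholder, and `reader` stands in for a fine-tuned extractive MRC model that returns a character span.

```python
from typing import Callable, Dict, Tuple

# Hypothetical reader type: takes (question, context) and returns the
# (start, end) character offsets of the predicted answer span in the context.
SpanReader = Callable[[str, str], Tuple[int, int]]


def tag_to_question(tag: str) -> str:
    """Rewrite a tag as a question (placeholder template, not from the paper)."""
    return f"Which entity is involved in {tag}?"


def build_mrc_example(text: str, tag: str) -> Dict[str, str]:
    """Treat the financial text as the article and the tag as the question."""
    return {"question": tag_to_question(tag), "context": text}


def extract_key_entity(text: str, tag: str, reader: SpanReader) -> str:
    """Return the answer span predicted by the reader as the key entity."""
    example = build_mrc_example(text, tag)
    start, end = reader(example["question"], example["context"])
    return example["context"][start:end]
```

For instance, a reader that returns the offsets of "Acme Bank" in "Acme Bank was accused of fraud by investors in Beta Fund." yields "Acme Bank" as the key entity for the tag "fraud".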

Experiments and Results

Dataset

We used two datasets: Negative Financial Information and Subject Determination 2019, and Event Subject Extraction for Financial Field. The table below shows the descriptive statistics of the datasets. Our evaluation metrics are accuracy for sentiment analysis and F1 score for key entity detection.
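As a rough illustration of the two metrics, the sketch below computes accuracy over sentiment labels and an F1 score over key entities. The summary does not state how F1 is aggregated, so this assumes a micro-averaged, set-based F1 in which each text has a set of gold entities and a set of predicted entities. The function names are illustrative.

```python
from typing import Hashable, List, Sequence, Set


def accuracy(gold: Sequence[Hashable], pred: Sequence[Hashable]) -> float:
    """Fraction of texts whose predicted sentiment label matches the gold label."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        return 0.0
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def entity_f1(gold_sets: List[Set[str]], pred_sets: List[Set[str]]) -> float:
    """Micro-averaged F1 over per-text entity sets (assumed aggregation)."""
    if len(gold_sets) != len(pred_sets):
        raise ValueError("gold_sets and pred_sets must have the same length")
    # Count correctly predicted entities, all predictions, and all gold entities.
    true_pos = sum(len(g & p) for g, p in zip(gold_sets, pred_sets))
    n_pred = sum(len(p) for p in pred_sets)
    n_gold = sum(len(g) for g in gold_sets)
    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Under this definition, an entity counts as correct only if it exactly matches a gold entity for the same text. A different aggregation, such as averaging F1 per text, would give different numbers.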

Results

Ryan

Data Scientist
