Convolutional Neural Network (CNN)
An early neural architecture for relation extraction. A CNN is applied to relation extraction as follows:
- Construct a sentence representation (a matrix) as the input. It consists of the concatenation of three components per token:
  - Vector encodings of the words themselves (word embeddings)
  - Vector encodings of positional offsets from candidate argument a1 (positional encodings)
  - Vector encodings of positional offsets from candidate argument a2 (positional encodings)
- Input the sentence matrix to the convolutional layer, followed by a max-pooling layer
- The final scoring function is as follows:
- The model can be trained using a margin-based objective
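The pipeline above can be sketched in NumPy as follows. All sizes, the embedding tables, the tanh non-linearity, and the margin value are illustrative assumptions, not taken from a specific paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions): vocabulary, embedding dims, filters, window
n_words, d_word, d_pos, max_dist = 100, 50, 5, 30
n_relations, n_filters, window = 4, 64, 3

W_word = rng.normal(0, 0.1, (n_words, d_word))          # word embeddings
W_pos1 = rng.normal(0, 0.1, (2 * max_dist + 1, d_pos))  # offsets from a1
W_pos2 = rng.normal(0, 0.1, (2 * max_dist + 1, d_pos))  # offsets from a2

def sentence_matrix(token_ids, a1_idx, a2_idx):
    """Concatenate word embedding + two positional encodings per token."""
    n = len(token_ids)
    off1 = np.clip(np.arange(n) - a1_idx, -max_dist, max_dist) + max_dist
    off2 = np.clip(np.arange(n) - a2_idx, -max_dist, max_dist) + max_dist
    return np.concatenate(
        [W_word[token_ids], W_pos1[off1], W_pos2[off2]], axis=1
    )  # shape (n, d_word + 2 * d_pos)

d_in = d_word + 2 * d_pos
W_conv = rng.normal(0, 0.1, (n_filters, window * d_in))
W_score = rng.normal(0, 0.1, (n_relations, n_filters))

def score(token_ids, a1_idx, a2_idx):
    """Convolve over token windows, max-pool over time, score each relation."""
    X = sentence_matrix(token_ids, a1_idx, a2_idx)
    wins = np.stack([X[i:i + window].ravel()
                     for i in range(len(X) - window + 1)])
    H = np.tanh(wins @ W_conv.T)   # (n_windows, n_filters)
    f = H.max(axis=0)              # max pooling -> (n_filters,)
    return W_score @ f             # one score per candidate relation

def margin_loss(scores, gold, margin=1.0):
    """Hinge loss: gold relation should beat the best wrong one by `margin`."""
    wrong = np.delete(scores, gold)
    return max(0.0, margin - scores[gold] + wrong.max())

s = score([3, 17, 42, 8, 55, 2], a1_idx=1, a2_idx=4)
loss = margin_loss(s, gold=0)
```

The margin objective only penalises the model when the gold relation's score fails to exceed the best competing relation by the margin, so confidently correct examples contribute zero loss.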
Recurrent Neural Network (RNN)
A bi-directional LSTM is used to encode the words, or the dependency path between the two arguments. Using Xu et al. (2015) as a case study: the paper segments each dependency path into a left and a right sub-path. Along each sub-path, an RNN is run from the argument to the root word. The final representation is computed by max-pooling all the recurrent states along each path. This process can be applied across different channels, for example part-of-speech tags, dependency relations, etc. The model is defined as below:
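As a rough single-channel sketch of this encoder (the LSTM cell parameters, all dimensions, and the example paths are illustrative assumptions, not Xu et al.'s exact formulation):

```python
import numpy as np

rng = np.random.default_rng(1)
d_emb, d_hid = 8, 16                                 # illustrative sizes

E = rng.normal(0, 0.1, (50, d_emb))                  # toy embedding table
W = rng.normal(0, 0.1, (4 * d_hid, d_emb + d_hid))   # gates i, f, o, g stacked
b = np.zeros(4 * d_hid)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_states(token_ids):
    """Run an LSTM along one sub-path, returning all hidden states."""
    h, c, states = np.zeros(d_hid), np.zeros(d_hid), []
    for t in token_ids:
        z = W @ np.concatenate([E[t], h]) + b
        i, f, o = (sigmoid(z[k * d_hid:(k + 1) * d_hid]) for k in range(3))
        g = np.tanh(z[3 * d_hid:])
        c = f * c + i * g
        h = o * np.tanh(c)
        states.append(h)
    return np.stack(states)           # (path_len, d_hid)

def encode_path(left_path, right_path):
    """Max-pool the recurrent states of each sub-path, then concatenate.
    Each sub-path runs from an argument toward the root word."""
    left = lstm_states(left_path).max(axis=0)
    right = lstm_states(right_path).max(axis=0)
    return np.concatenate([left, right])   # (2 * d_hid,)

rep = encode_path([4, 12, 7], [9, 7])      # hypothetical token-id paths
```

For multiple channels (words, part-of-speech tags, dependency relations), one would repeat this per channel with separate embedding tables and concatenate the pooled outputs.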
The final scores are passed through a softmax layer to obtain a probability distribution over relations, and the model is trained using regularised cross-entropy.
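This training objective can be sketched as below; the L2 coefficient and the example scores are illustrative assumptions:

```python
import numpy as np

def softmax(scores):
    z = scores - scores.max()       # shift for numerical stability
    p = np.exp(z)
    return p / p.sum()

def regularised_cross_entropy(scores, gold, params, l2=1e-4):
    """Negative log-likelihood of the gold relation plus an L2 penalty."""
    p = softmax(scores)
    nll = -np.log(p[gold])
    penalty = l2 * sum(np.sum(w ** 2) for w in params)
    return nll + penalty

scores = np.array([2.0, 0.5, -1.0])   # hypothetical relation scores
params = [np.ones((2, 2))]            # hypothetical weight matrices
loss = regularised_cross_entropy(scores, gold=0, params=params)
```

The regulariser sums the squared weights of every parameter matrix, discouraging large weights and reducing overfitting.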