A feature-based approach to relation extraction would decompose each dependency path into features that capture its individual edges. With this method, however, we would need to design local features for every way in which two dependency paths can be similar or different.
Instead, we can define a similarity function that computes a similarity score between two instances. The score is large when the two instances are similar. If the similarity function obeys a few key properties (notably symmetry and positive semi-definiteness), it is a valid kernel function.
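To make the idea of a similarity function concrete, here is a minimal sketch of a toy kernel over dependency paths. Everything in it is an assumption made for illustration: the name path_kernel, the representation of a path as a list of feature sets (one set per position), and the example paths. The score is zero when the paths differ in length and otherwise is the product, over positions, of the number of shared features. The last lines check the two properties mentioned above on a small Gram matrix.

```python
import numpy as np

def path_kernel(p, q):
    """Toy similarity between two dependency paths (illustrative only).

    Each path is a list of feature sets, one set per position on the path.
    """
    if len(p) != len(q):
        return 0
    score = 1
    for a, b in zip(p, q):
        score *= len(a & b)  # number of features shared at this position
    return score

# Hypothetical paths: each position holds a word or relation plus a coarse tag.
paths = [
    [{"nsubj", "<-"}, {"acquired", "VERB"}, {"dobj", "->"}],
    [{"nsubj", "<-"}, {"bought", "VERB"}, {"dobj", "->"}],
    [{"nsubj", "<-"}, {"born", "VERB"}, {"prep_in", "->"}],
    [{"poss", "<-"}, {"founder", "NOUN"}],
]

# Gram matrix: the kernel evaluated on every pair of instances.
gram = np.array([[path_kernel(p, q) for q in paths] for p in paths], dtype=float)

is_symmetric = bool(np.allclose(gram, gram.T))
# Positive semi-definite means no eigenvalue is negative (up to rounding).
is_psd = bool(np.linalg.eigvalsh(gram).min() >= -1e-9)
```

Checking one Gram matrix does not prove that a function is a valid kernel in general; it only catches obvious violations on the data at hand.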
Binary classification with a kernel function
With a valid kernel function, we can build a non-linear classifier without explicitly defining a feature vector or neural network architecture. For a binary classification problem, we have the following decision function:
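$$\hat{y} = \operatorname{sign}\left(\beta + \sum_{i=1}^{N} \alpha_i \, y_i \, \kappa(x_i, x)\right)$$
where the weights $\alpha_i$ and the bias $\beta$ are parameters learned from the training set, $y_i \in \{-1, +1\}$ is the label of training instance $x_i$, and $\kappa$ is the kernel function. Each $\alpha_i$ tells us how important the training instance $x_i$ is to the classification task. Kernel-based classification can therefore be viewed as a weighted form of the nearest-neighbour classifier: training instances that are most similar to $x$ contribute most to the decision.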
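The sketch below shows how such a decision function could be evaluated once the parameters are known. It assumes the common form sign(beta + sum of alpha_i * y_i * kernel(x_i, x)) with labels in {-1, +1}. The RBF kernel, the 2-D points standing in for real instances, and the hand-set values of alpha and beta are all assumptions for illustration; in practice the parameters would be learned, for example by training a support vector machine.

```python
import numpy as np

def rbf_kernel(x, z, gamma=1.0):
    """Stand-in kernel (an assumption); a dependency kernel would go here."""
    d = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return float(np.exp(-gamma * (d @ d)))

def predict(x, train_x, train_y, alpha, beta, kernel):
    """Return +1 or -1 from a weighted sum of similarities to training instances."""
    score = beta + sum(
        a * y * kernel(xi, x) for a, y, xi in zip(alpha, train_y, train_x)
    )
    return 1 if score >= 0 else -1

# Hypothetical training set: two negative and two positive instances.
train_x = [[0.0, 0.0], [0.0, 1.0], [3.0, 3.0], [4.0, 3.0]]
train_y = [-1, -1, 1, 1]
alpha = [1.0, 1.0, 1.0, 1.0]  # hand-set, NOT learned
beta = 0.0                    # hand-set, NOT learned

near_negatives = predict([0.2, 0.5], train_x, train_y, alpha, beta, rbf_kernel)
near_positives = predict([3.5, 3.0], train_x, train_y, alpha, beta, rbf_kernel)
```

Because the kernel value decays with distance, each prediction is dominated by the nearby training instances, which is the sense in which the classifier resembles a weighted nearest-neighbour rule.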
To perform multi-class classification, we can either:
Train separate binary classifiers for each label (one vs all)
Train binary classifiers for each pair of possible labels (one vs one)
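Both strategies can be sketched on top of any binary kernel learner. The example below uses a kernel perceptron as the binary learner purely because it is short; it is a swapped-in stand-in, not the method described in these notes. The function names, the RBF kernel, the 2-D points, and the labels "A", "B", "C" are hypothetical.

```python
from itertools import combinations
import numpy as np

def rbf(x, z, gamma=0.5):
    d = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return float(np.exp(-gamma * (d @ d)))

def train_binary(xs, ys, kernel, epochs=10):
    """Kernel perceptron for labels in {-1, +1}; returns a scoring function."""
    alpha = [0.0] * len(xs)

    def score(x):
        return sum(a * y * kernel(xi, x) for a, y, xi in zip(alpha, ys, xs))

    for _ in range(epochs):
        for i, (x, y) in enumerate(zip(xs, ys)):
            if y * score(x) <= 0:  # misclassified: give this instance more weight
                alpha[i] += 1.0
    return score

def one_vs_all(xs, labels, kernel):
    """One scorer per label; predict the label whose scorer is most confident."""
    scorers = {}
    for lab in sorted(set(labels)):
        ys = [1 if l == lab else -1 for l in labels]
        scorers[lab] = train_binary(xs, ys, kernel)
    return lambda x: max(scorers, key=lambda lab: scorers[lab](x))

def one_vs_one(xs, labels, kernel):
    """One scorer per pair of labels; predict by majority vote."""
    scorers = {}
    for a, b in combinations(sorted(set(labels)), 2):
        pairs = [(x, 1 if l == a else -1) for x, l in zip(xs, labels) if l in (a, b)]
        sub_x = [x for x, _ in pairs]
        sub_y = [y for _, y in pairs]
        scorers[(a, b)] = train_binary(sub_x, sub_y, kernel)

    def predict(x):
        votes = {}
        for (a, b), s in scorers.items():
            winner = a if s(x) >= 0 else b
            votes[winner] = votes.get(winner, 0) + 1
        return max(votes, key=votes.get)

    return predict

# Hypothetical data: three well-separated groups of 2-D points.
xs = [[0.0, 0.0], [0.5, 0.2], [5.0, 0.0], [5.2, 0.4], [0.0, 5.0], [0.3, 5.1]]
labels = ["A", "A", "B", "B", "C", "C"]

ova = one_vs_all(xs, labels, rbf)
ovo = one_vs_one(xs, labels, rbf)
```

With K labels, one vs all trains K classifiers on the full training set, while one vs one trains K(K-1)/2 classifiers, each on the subset of instances carrying one of its two labels.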
Dependency kernels are effective for relation extraction because they capture syntactic properties of the path between the two candidate arguments. One class of dependency tree kernels is defined recursively: the score of a pair of trees is the similarity of their root nodes plus the sum of the similarities of matched pairs of child subtrees.
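The recursion can be sketched as follows. This is a simplified illustration under stated assumptions, not a specific published kernel: the Node class, the node_sim function (which counts matching word and part-of-speech attributes), and the rule that two children are "matched" when they share a dependency label are all hypothetical choices. The sketch also returns zero when the roots share nothing, and it makes no claim to satisfy the validity conditions for a kernel.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    word: str
    pos: str
    dep: str  # dependency label linking this node to its parent
    children: list = field(default_factory=list)

def node_sim(a, b):
    """Similarity of two nodes: number of matching attributes (assumed)."""
    return int(a.word == b.word) + int(a.pos == b.pos)

def tree_kernel(a, b):
    """Root similarity plus the scores of matched pairs of child subtrees."""
    score = node_sim(a, b)
    if score == 0:
        return 0  # roots share nothing, so the subtrees are not compared
    for ca in a.children:
        for cb in b.children:
            if ca.dep == cb.dep:  # assumed matching rule: same dependency label
                score += tree_kernel(ca, cb)
    return score

# Hypothetical dependency trees for three short phrases.
t1 = Node("acquired", "VERB", "root", [
    Node("Google", "PROPN", "nsubj"),
    Node("YouTube", "PROPN", "dobj"),
])
t2 = Node("bought", "VERB", "root", [
    Node("Microsoft", "PROPN", "nsubj"),
    Node("Skype", "PROPN", "dobj"),
])
t3 = Node("founder", "NOUN", "root", [
    Node("Apple", "PROPN", "poss"),
])

same_structure = tree_kernel(t1, t2)  # roots and both children match on POS
different_root = tree_kernel(t1, t3)  # roots share neither word nor POS
```

Here t1 and t2 score above zero because their roots and both matched children agree on part of speech, while t1 and t3 score zero because the roots have nothing in common.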