23.4 Representation learning models for local coherence
What is lexical cohesion?
It’s the idea that the degree of semantic relatedness between words in nearby sentences indicates how cohesive the discourse is.
What is the TextTiling algorithm of Hearst?
The algorithm computes the cosine similarity between neighbouring text spans, based on the observation that sentences in the same subtopic have a higher lexical similarity to each other than sentences in different subtopics.
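A minimal sketch of the core TextTiling computation, using simple bag-of-words count vectors (the function name `span_similarities` and the span width `k` are illustrative choices, not from Hearst’s original description): a dip in the similarity curve suggests a subtopic boundary.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def span_similarities(sentences, k=2):
    """At each gap, compare the k sentences before it with the k after it."""
    sims = []
    for gap in range(k, len(sentences) - k + 1):
        before = Counter(w for s in sentences[gap - k:gap] for w in s.lower().split())
        after = Counter(w for s in sentences[gap:gap + k] for w in s.lower().split())
        sims.append(cosine(before, after))
    return sims
```

Within a subtopic the similarity stays high; it drops where the vocabulary shifts.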
Describe the LSA coherence method.
It was the first coherence model to use embeddings. It models the coherence of two sentences by computing the cosine similarity of their LSA sentence embeddings, where a sentence embedding is the sum of its word embeddings. The overall coherence of a text is the average similarity of all the adjacent sentence pairs.
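The LSA coherence computation can be sketched as follows. The toy 2-dimensional word vectors here stand in for real LSA embeddings, which would be learned from a term–document matrix:

```python
import numpy as np

def sentence_embedding(sentence, emb, dim):
    """Sentence embedding = sum of the embeddings of its known words."""
    v = np.zeros(dim)
    for w in sentence.lower().split():
        if w in emb:
            v += emb[w]
    return v

def lsa_coherence(sentences, emb, dim):
    """Average cosine similarity over all adjacent sentence pairs."""
    vecs = [sentence_embedding(s, emb, dim) for s in sentences]
    sims = []
    for a, b in zip(vecs, vecs[1:]):
        na, nb = np.linalg.norm(a), np.linalg.norm(b)
        sims.append(float(a @ b / (na * nb)) if na and nb else 0.0)
    return sum(sims) / len(sims)
```

A text whose adjacent sentences share related vocabulary scores higher than one that jumps between unrelated topics.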
Describe the local coherence discriminator (LCD).
LCD is very similar to earlier models in that it computes the coherence of a text by averaging the similarity scores of all the adjacent sentence pairs. Where it differs is that LCD is a self-supervised model, trained to distinguish consecutive sentence pairs in the training documents from constructed incoherent pairs. The training data therefore has consecutive pairs as positive examples and incoherent pairs as negative examples. The architecture is shown in the figure below. It takes a sentence pair and returns a coherence score. The model has the following steps:
Compute sentence embeddings for sentences s and t
Compute four features: the concatenation of the two sentence embeddings, the difference of the embeddings, the absolute value of the difference of the embeddings, and the element-wise product of the embeddings
These features are fed into a one-layer FFNN to output a coherence score
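The steps above can be sketched as a forward pass. The randomly initialised weights here stand in for trained parameters, and the sizes `d` and `h` are toy values chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
d, h = 4, 8  # toy embedding and hidden sizes (assumed values)

# Randomly initialised parameters standing in for the trained FFNN weights.
W1 = rng.normal(size=(5 * d, h))
b1 = np.zeros(h)
w2 = rng.normal(size=h)
b2 = 0.0

def lcd_score(s, t, W1, b1, w2, b2):
    """Coherence score for the sentence-embedding pair (s, t)."""
    # Four features: concatenation (2d), difference (d),
    # absolute difference (d), and element-wise product (d).
    feats = np.concatenate([s, t, s - t, np.abs(s - t), s * t])  # length 5*d
    hidden = np.maximum(0.0, feats @ W1 + b1)  # one hidden layer with ReLU
    return float(hidden @ w2 + b2)
```

In training, the score of a consecutive (positive) pair is pushed above the score of a constructed incoherent (negative) pair via a margin loss.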