Morphology is the study of word-internal structure. Many words have prefixes and suffixes (word-internal structure) that shape their meaning. There are two main types of morphology:

  1. Derivational – the use of affixes to convert a word from one grammatical category to another, or to change the meaning of the word. For example, grace (noun) to graceful (adjective) or disgrace (different meaning).

  2. Inflectional – the addition of details such as gender, number, person and tense. For example, the “-ed” suffix for past tense.

Derivational Morphological Analysis

In this post, we will use a finite state acceptor (FSA) to do derivational morphological analysis. We will use the FSA to represent morphological rules explicitly, so that it can apply existing morphological rules to new words and names.

The brute-force approach would be to implement an FSA whose vocabulary is the entire English vocabulary, with a transition from the start state to an accepting state for each word. However, this approach would fail to generalise and would not capture anything about the morphotactic rules that govern the creation of new words.

A more general approach would be to implement an FSA whose vocabulary contains morphemes, which include stem words and affixes (e.g. dis-, -ing, -ly). The FSA would consist of a set of paths that begin at the start state, with derivational affixes added along each path. This is shown in the figure below. All the states on the paths are final and will be accepted by the FSA (except q_neg).
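To make the idea concrete, here is a minimal Python sketch of such a morpheme-level FSA. Since the figure is not reproduced here, the stems, the paths and every state name other than q_0 and q_neg are our own illustrative assumptions. We also assume that the start state q_0 is not final, so that the empty sequence is rejected.

```python
# A toy morpheme-level FSA. Everything except the names q_0 and q_neg is a
# hypothetical reconstruction for illustration, not the original figure.
START = "q_0"

TRANSITIONS = {
    ("q_0", "grace"): "q_grace",
    ("q_grace", "-ful"): "q_graceful",
    ("q_graceful", "-ly"): "q_gracefully",
    ("q_0", "dis-"): "q_neg",
    ("q_neg", "grace"): "q_disgrace",
    ("q_disgrace", "-ful"): "q_disgraceful",
    ("q_disgraceful", "-ly"): "q_disgracefully",
    ("q_0", "allure"): "q_allure",
    ("q_allure", "-ing"): "q_alluring",
    ("q_alluring", "-ly"): "q_alluringly",
}

# q_neg is non-final because "dis-" alone is not a word.
# Treating q_0 as non-final is our assumption.
NON_FINAL = {"q_0", "q_neg"}


def accepts(morphemes):
    """Return True if the morpheme sequence ends in a final state."""
    state = START
    for morpheme in morphemes:
        state = TRANSITIONS.get((state, morpheme))
        if state is None:  # no transition defined: reject
            return False
    return state not in NON_FINAL


print(accepts(["dis-", "grace", "-ful"]))  # a path through q_neg
print(accepts(["dis-"]))                   # stops in q_neg
print(accepts(["grace", "-ing"]))          # no such transition
```

Note that the input is a sequence of morphemes, not characters, so the acceptor only checks which morphemes appear and in what order.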

The FSA can be minimised to the figure shown below. This shows, for example, that the transition from q_0 to q_j2 can be made to accept any single-morpheme (monomorphemic) adjective that takes the -ness and -ly suffixes. It also means that the FSA can easily be extended as new word stems are added to the vocabulary.
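As a rough sketch of why the minimised form is easy to extend, the q_0 to q_j2 transition can be labelled with a whole class of stems rather than a single word. In the Python below, the stem list and the two target state names (q_noun, q_adverb) are hypothetical; only q_0 and q_j2 come from the description above.

```python
# Sketch of one fragment of a minimised FSA: a single q_0 -> q_j2 arc that
# accepts any stem in a class. Stems and the states after q_j2 are assumptions.
ADJ_STEMS = {"fair", "kind"}            # monomorphemic adjectives taking -ness / -ly
SUFFIX_TARGETS = {"-ness": "q_noun", "-ly": "q_adverb"}
FINAL = {"q_j2", "q_noun", "q_adverb"}


def step(state, morpheme):
    """Return the next state, or None if no transition applies."""
    if state == "q_0" and morpheme in ADJ_STEMS:
        return "q_j2"
    if state == "q_j2":
        return SUFFIX_TARGETS.get(morpheme)
    return None


def accepts(morphemes):
    state = "q_0"
    for morpheme in morphemes:
        state = step(state, morpheme)
        if state is None:
            return False
    return state in FINAL


print(accepts(["fair", "-ness"]))
print(accepts(["quick", "-ly"]))   # "quick" is not yet in the vocabulary

# Extending the vocabulary needs no new states or arcs, only a new stem.
ADJ_STEMS.add("quick")
print(accepts(["quick", "-ly"]))
```

The design choice here is that the machine's shape encodes the rule ("adjective, then optionally -ness or -ly"), while the vocabulary lives in a set that can grow independently.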

One “weakness” of FSAs like the one above is that they accept “allureing” rather than “alluring”. This illustrates the difference between morphology (which morphemes to use and in what order), orthography (how morphemes change in written language) and phonology (how words are rendered in speech). These issues can be solved with finite state transducers, which are finite state automata that take inputs and produce outputs.
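To give a feel for what a transducer adds, here is a small hand-rolled sketch in Python. It reads a string in which "+" marks a morpheme boundary (a notation we are assuming for illustration) and writes the surface spelling, dropping a stem-final "e" when the next morpheme starts with a vowel. This is a single toy rule with three states, not a full model of English spelling, and it will over-apply in cases such as "see+ing".

```python
# A toy deterministic transducer for one orthographic rule (e-deletion).
# The "+" boundary symbol and the state names are assumptions for illustration.
VOWELS = set("aeiou")


def transduce(tape):
    """Map e.g. 'allure+ing' to its surface spelling."""
    out = []
    state = "normal"  # other states: "e" (holding back an e), "e_plus" (e then +)
    for ch in tape:
        if state == "e_plus":
            # Boundary follows the held-back e: keep it only before a consonant.
            if ch not in VOWELS:
                out.append("e")
            state = "normal"
        elif state == "e":
            if ch == "+":
                state = "e_plus"
                continue
            out.append("e")  # the e was not at a boundary, so release it
            state = "normal"
        # Behaviour of the normal state.
        if ch == "e":
            state = "e"      # hold the e back until we see what follows
        elif ch != "+":
            out.append(ch)   # boundaries are never written to the output
    if state in ("e", "e_plus"):
        out.append("e")      # flush a held-back e at the end of the input
    return "".join(out)


print(transduce("allure+ing"))
print(transduce("grace+ful"))
```

The acceptor still decides which morpheme sequences are valid; a transducer like this one would sit after it and handle how the sequence is spelled.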


