Recap: Inflectional morphology adds grammatical information such as gender, number, person and tense to a lemma, for example the English “-ed” suffix marking past tense.

Morphological analysis involves converting the surface form “canto” (“I sing”) into its lemma plus morphological features, “CANTAR + VERB + PRESIND + 1P + SING”.

  • CANTAR: lemma (the infinitive, “to sing”)
  • VERB: part of speech
  • PRESIND: present indicative tense
  • 1P: first person
  • SING: singular number

Morphological generation is the reverse process: it maps “CANTAR + VERB + PRESIND + 1P + SING” back to the surface form “canto”.
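
Analysis and generation can be viewed as the two directions of one relation between surface forms and analyses. A minimal Python sketch, assuming a tiny hard-coded list of pairs: the `PAIRS` table and the helpers `analyse` and `generate` are hypothetical, the 2P tag extends the notation above, and tags are written without spaces.

```python
# Hypothetical toy relation between surface forms and analyses.
# A real analyser cannot list every form like this; it is for illustration only.
PAIRS = [
    ("canto", "CANTAR+VERB+PRESIND+1P+SING"),
    ("cantas", "CANTAR+VERB+PRESIND+2P+SING"),
    ("canta", "CANTAR+VERB+PRESIND+3P+SING"),
]

def analyse(surface):
    """Morphological analysis: surface form -> every matching analysis."""
    return [a for s, a in PAIRS if s == surface]

def generate(analysis):
    """Morphological generation: analysis -> every matching surface form."""
    return [s for s, a in PAIRS if a == analysis]

print(analyse("canto"))
print(generate("CANTAR+VERB+PRESIND+2P+SING"))
```

Running it prints `['CANTAR+VERB+PRESIND+1P+SING']` and then `['cantas']`. Listing every inflected form does not scale, which is where a transducer that encodes the pattern once becomes useful.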

FST on Inflectional Morphology

Finite state transducers (FSTs) are an attractive solution because the same machine can perform both morphological analysis and morphological generation. Below is part of an FST for Spanish inflectional morphology.

The input vocabulary contains all the letters of the Spanish alphabet. The output vocabulary is the input vocabulary plus the morphological features (e.g. +SING, +VERB). In the figure, two paths accept “canto” as input: one outputs the verb analysis and the other the noun analysis (“canto”, “song”). The transducer returns both analyses; choosing between them depends on the context.
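
The two-path idea can be sketched as a toy non-deterministic transducer in Python. This is a sketch under stated assumptions: the state numbers, the helpers `transduce` and `invert`, and the extra tags +NOUN and +MASC are hypothetical, and the lemma is written in lowercase. An arc with an empty input string is an epsilon arc that emits output without consuming a letter.

```python
from collections import defaultdict

# Arcs are (source, input_symbol, output_symbol, target); "" means epsilon.
# Hypothetical states and tags, chosen only to mirror the two paths for "canto".
ARCS = [
    (0, "c", "c", 1), (1, "a", "a", 2), (2, "n", "n", 3), (3, "t", "t", 4),
    # Verb path: surface "o" is rewritten to the infinitive ending "ar" + features.
    (4, "o", "a", 5), (5, "", "r", 6), (6, "", "+VERB", 7),
    (7, "", "+PRESIND", 8), (8, "", "+1P", 9), (9, "", "+SING", 10),
    # Noun path: "canto" ("song") keeps its spelling and gains noun features.
    (4, "o", "o", 11), (11, "", "+NOUN", 12), (12, "", "+MASC", 13),
    (13, "", "+SING", 14),
]
START, FINALS = 0, {10, 14}

def transduce(arcs, symbols):
    """Return every output string produced on an accepting path for `symbols`."""
    out_arcs = defaultdict(list)
    for src, i, o, dst in arcs:
        out_arcs[src].append((i, o, dst))
    results = set()
    stack = [(START, 0, "")]  # (state, input position, output so far)
    while stack:
        state, pos, output = stack.pop()
        if state in FINALS and pos == len(symbols):
            results.add(output)
        for i, o, dst in out_arcs[state]:
            if i == "":  # epsilon input: emit output, consume nothing
                stack.append((dst, pos, output + o))
            elif pos < len(symbols) and symbols[pos] == i:
                stack.append((dst, pos + 1, output + o))
    return sorted(results)

def invert(arcs):
    """Swap input and output labels, turning the analyser into a generator."""
    return [(src, o, i, dst) for src, i, o, dst in arcs]

print(transduce(ARCS, list("canto")))
print(transduce(invert(ARCS), list("cantar") + ["+VERB", "+PRESIND", "+1P", "+SING"]))
```

Analysing “canto” returns both `cantar+VERB+PRESIND+1P+SING` and `canto+NOUN+MASC+SING`, while the inverted arcs map the verb analysis back to `canto`. The same arc list serves both directions, which is the property that makes FSTs convenient for analysis and generation.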

Overgeneration vs Undergeneration

Finite state morphological analysers can be designed by hand, but the designer needs to avoid:

  1. Overgeneration – accepting strings or making transductions that are not valid in the language

  2. Undergeneration – failing to accept strings or transductions that are valid in the language

For example, if a transducer for pluralising nouns does not accept foot / feet, it is undergenerating. However, if we “fix” it with a general rule that changes “oo” to “ee”, it may start accepting boot / beet, which makes it overgenerate.
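
The trade-off can be made concrete by treating each pluraliser as an acceptor of (singular, plural) pairs and scoring it against small lists of valid and invalid pairs. A minimal Python sketch; the word lists and the function names `accepts_regular`, `accepts_patched` and `diagnose` are hypothetical, chosen only for illustration.

```python
# Hypothetical test pairs, used only to expose the two failure modes.
VALID = [("cat", "cats"), ("boot", "boots"), ("foot", "feet"), ("goose", "geese")]
INVALID = [("boot", "beet"), ("moon", "meen")]

def accepts_regular(singular, plural):
    """Regular rule only: plural = singular + 's'."""
    return plural == singular + "s"

def accepts_patched(singular, plural):
    """Regular rule plus a blanket 'oo' -> 'ee' vowel change (the naive 'fix')."""
    return accepts_regular(singular, plural) or (
        "oo" in singular and plural == singular.replace("oo", "ee")
    )

def diagnose(accepts):
    """Return (valid pairs rejected, invalid pairs accepted)."""
    under = [pair for pair in VALID if not accepts(*pair)]
    over = [pair for pair in INVALID if accepts(*pair)]
    return under, over

print(diagnose(accepts_regular))
print(diagnose(accepts_patched))
```

The regular rule rejects foot / feet and goose / geese (undergeneration), while the patched rule accepts boot / beet and moon / meen (overgeneration). One way to avoid both is to list irregular plurals as exceptions in the lexicon instead of adding a blanket spelling rule.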

Ryan

Data Scientist
