Designing a single finite state transducer that captures the full range of morphological phenomena in a language is a huge task. However, finite state transducers can be modularised through composition: the output of one transducer is fed as the input to another transducer.
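To make the idea of composition concrete, here is a minimal sketch in Python. It assumes a simplified model in which a transducer is just a function from an input string to the set of output strings it allows; the names `Transducer`, `compose`, `upper` and `exclaim` are made up for illustration and do not come from any finite state library.

```python
from typing import Callable, Set

# Simplifying assumption: a transducer is a function that maps one input
# string to the set of output strings it relates to that input.
Transducer = Callable[[str], Set[str]]


def compose(first: Transducer, second: Transducer) -> Transducer:
    """Build a transducer that feeds every output of `first` into `second`."""
    def composed(text: str) -> Set[str]:
        results: Set[str] = set()
        for intermediate in first(text):
            results |= second(intermediate)
        return results
    return composed


# Two toy transducers, invented purely for illustration.
def upper(text: str) -> Set[str]:
    return {text.upper()}


def exclaim(text: str) -> Set[str]:
    return {text + "!"}


shout = compose(upper, exclaim)
print(shout("hello"))  # {'HELLO!'}
```

Returning a set rather than a single string keeps the sketch honest about the fact that a transducer may relate one input to several outputs, or to none.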

Case study: Morphology and Orthography

In English morphology, the suffix “-ed” is added to indicate past tense for many verbs. For example, cook –> cooked and book –> booked. However, English orthography has a constraint that this process cannot produce a spelling with consecutive e’s where the stem meets the suffix. For example, bake –> baked, and not bakeed.

A finite state composition solution would be to build separate transducers for morphology and orthography. The morphological transducer T_m takes a word from the lexicon together with a morphological feature (e.g. PAST) as input, and outputs the characters a-z and the “+” boundary marker. So, using our example above, T_m transduces bake+PAST to bake+ed. The orthographic transducer T_o is responsible for transducing cook+ed –> cooked, and bake+ed –> baked. The input alphabet of T_o must be the same as the output alphabet of T_m, and the output of T_o is just the characters a-z.

Overall, the composed transducer (of T_m and T_o) can transduce bake+PAST to baked.
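The case study can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `t_m`, `t_o` and `past_tense` are hypothetical names, T_m is written as a plain string rewrite that knows only the PAST feature, and T_o is a hand-written state machine that covers just the two cases discussed above (dropping the boundary marker, and dropping the suffix's “e” after a stem that ends in “e”).

```python
def t_m(analysis: str) -> str:
    """Morphology (T_m): rewrite the PAST feature as the suffix 'ed'."""
    stem, feature = analysis.split("+")
    if feature != "PAST":
        raise ValueError(f"unsupported feature: {feature}")
    return stem + "+ed"


def t_o(form: str) -> str:
    """Orthography (T_o): remove '+' and avoid 'ee' at the boundary."""
    # States (an assumption of this sketch):
    #   stem        reading the stem, last character was not 'e'
    #   stem_e      reading the stem, last character was 'e'
    #   boundary    just read '+', stem did not end in 'e'
    #   boundary_e  just read '+', stem ended in 'e'
    #   suffix      reading the rest of the suffix
    state = "stem"
    output = []
    for ch in form:
        if ch == "+":
            # The boundary marker is never written to the output.
            state = "boundary_e" if state == "stem_e" else "boundary"
            continue
        if state == "boundary_e" and ch == "e":
            # e-deletion: skip the suffix's 'e' after a stem-final 'e'.
            state = "suffix"
            continue
        output.append(ch)
        if state in ("stem", "stem_e"):
            state = "stem_e" if ch == "e" else "stem"
        else:
            state = "suffix"
    return "".join(output)


def past_tense(analysis: str) -> str:
    """Composition of T_m and T_o: the output of T_m is the input of T_o."""
    return t_o(t_m(analysis))


print(t_m("bake+PAST"))         # bake+ed
print(past_tense("bake+PAST"))  # baked
print(past_tense("cook+PAST"))  # cooked
```

Because each transducer handles one concern, the spelling rule in `t_o` can be changed or extended without touching `t_m`, which is the point of modularising through composition.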

Ryan

Data Scientist
