FSAs are a hard switch to determine whether a word is in a regular language. WFSAs are extension of FSAs that computes score for every word over a given set of alphabet inputs. Finite State Transducers (FSTs) further extended both by adding an output symbol to each transition. A FST differs from WFSA in two ways:

  1. It contains an addition argument, the output vocabulary

  2. The transition function maps from states, input symbols, and output symbols to states

This means that each path through the FST transduces the input string into an output.

Some application of FST

String edit distance

The edit distance of two strings is a measure of how many operations are required to transform one string to another. The most popular method is the Levenshtein edit distance, which counts the minimum number of insertions, deletions, and substitutions. This can be computed using a one-state weighted FST, in which the input and output alphabets are the same. The transition and state diagram is shown below. For a given string pair, there are multiple paths through the transducer.

Porter stemmer

Stemming involves stripping suffixes from English words, using a sequence of character-level rules. The Porter stemming algorithm is an example. Each rule can be described by an unweighted FST. One of the stemming rules are shown below:

The final two lines conflicted each other. Each rule are supposed to apply to a terminal -s. This is handled by the state diagram as shown below:

Ryan

Ryan

Data Scientist

Leave a Reply