There are two arguments for the relevance of context-free formal languages to linguistics:
- Centre embedding
In the figure above, the level 1 centre embedding of “the dog” represents a single dog. In the level 2 centre embedding, we have “the cat the dog chased”, which represents a particular cat that is chased by the dog. In the level 3 centre embedding, we have “the goat the cat the dog chased kissed”, which represents the goat who was kissed by the cat which was chased by the dog.
Chomsky (1957) argues that, to be grammatical, a centre-embedded construction must be balanced: if it contains n noun phrases (in our example n = 3: the goat, the cat and the dog), then the noun phrases must be followed by exactly n – 1 verbs (here, chased and kissed).
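No FSA can recognise centre-embedded constructions of arbitrary depth, because a machine with a fixed, finite number of states cannot keep count of an unbounded n. So if English includes centre embeddings, then English grammar cannot be regular.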
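The balance condition can be sketched as a small recursive recogniser. This is only an illustration, assuming a toy lexicon and the hypothetical context-free rule E -> NP | NP E V; the names `NOUN_PHRASES`, `VERBS` and `is_balanced_embedding` are invented for this example and are not taken from Chomsky (1957).

```python
# Toy lexicon (hypothetical, for illustration only).
NOUN_PHRASES = {("the", "dog"), ("the", "cat"), ("the", "goat")}
VERBS = {"chased", "kissed"}


def is_balanced_embedding(phrase: str) -> bool:
    """Recognise n noun phrases followed by n - 1 verbs (n >= 1).

    Follows the assumed rule  E -> NP | NP E V,
    where every NP is a two-word phrase from the toy lexicon.
    """
    tokens = phrase.split()

    def parse(lo: int, hi: int) -> bool:
        # tokens[lo:hi] must begin with a two-word noun phrase.
        if hi - lo < 2 or (tokens[lo], tokens[lo + 1]) not in NOUN_PHRASES:
            return False
        if hi - lo == 2:
            return True  # E -> NP
        # E -> NP E V: the last token is a verb, the middle is an embedding.
        return tokens[hi - 1] in VERBS and parse(lo + 2, hi - 1)

    return parse(0, len(tokens))


if __name__ == "__main__":
    for example in (
        "the dog",
        "the cat the dog chased",
        "the goat the cat the dog chased kissed",
        "the goat the cat the dog chased",  # unbalanced: 3 NPs, 1 verb
    ):
        print(f"{example!r}: {is_balanced_embedding(example)}")
```

The recursion depth grows with the level of embedding, which is exactly the unbounded memory that a finite set of states cannot provide.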
- Long-distance dependencies
Many linguistic phenomena, especially in syntax, involve constraints that apply at long distance. In the example below, it is appropriate to say “the coffee” and “these coffees”, but not *these coffee. This rule can easily be modelled by an FSA. However, complex modifying expressions can appear between the determiner and the noun, as shown below.
One can argue that an FSA can also be designed to accept these modifying expressions. However, this is inconvenient, as the only way to build an FSA that accepts all the phrases above is to create two identical copies of the same FSA for the modifying expressions, one for singular determiners and one for plurals. This is very inefficient.
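This is where context-free languages come in: a context-free grammar can describe the modifying expression once and reuse it for both singular and plural phrases, which facilitates modularity across long-range dependencies.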
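The contrast can be sketched in code. The example below is a rough illustration under several assumptions: the vocabulary ("this", "these", "very", "strong", "hot") and the modifier pattern (ADV* ADJ)* are invented here, since the original figures are not reproduced, and all function names are hypothetical. The FSA has to instantiate the modifier states twice, once per number value, whereas the grammar-style recogniser mirrors rules such as NP_sg -> Det_sg Mods N_sg and NP_pl -> Det_pl Mods N_pl, which share a single Mods definition.

```python
# Hypothetical toy vocabulary, for illustration only.
ADVERBS = {"very"}
ADJECTIVES = {"strong", "hot"}
DETERMINERS = {"this": "sg", "these": "pl"}
NOUNS = {"coffee": "sg", "coffees": "pl"}


def modifier_arcs(prefix):
    """Arcs for the modifier pattern (ADV* ADJ)*, copied under a state prefix."""
    m0, m1 = f"{prefix}_m0", f"{prefix}_m1"
    arcs = {}
    for adv in ADVERBS:
        arcs[(m0, adv)] = m1
        arcs[(m1, adv)] = m1
    for adj in ADJECTIVES:
        arcs[(m0, adj)] = m0
        arcs[(m1, adj)] = m0
    return arcs


# FSA: the only way to remember the determiner's number is to be in the
# singular copy or the plural copy of the modifier states.
FSA = {
    ("start", "this"): "sg_m0",
    ("start", "these"): "pl_m0",
    ("sg_m0", "coffee"): "accept",
    ("pl_m0", "coffees"): "accept",
}
FSA.update(modifier_arcs("sg"))  # first copy
FSA.update(modifier_arcs("pl"))  # second, identical copy


def fsa_accepts(phrase):
    state = "start"
    for word in phrase.split():
        state = FSA.get((state, word))
        if state is None:
            return False
    return state == "accept"


def skip_modifiers(tokens, i):
    """Mods -> (ADV* ADJ)*; return the index just past the modifier run."""
    while i < len(tokens):
        j = i
        while j < len(tokens) and tokens[j] in ADVERBS:
            j += 1
        if j < len(tokens) and tokens[j] in ADJECTIVES:
            i = j + 1
        else:
            break
    return i


def grammar_accepts(phrase):
    """NP -> Det Mods N, with Det and N agreeing in number; Mods is shared."""
    tokens = phrase.split()
    if len(tokens) < 2:
        return False
    det, noun = tokens[0], tokens[-1]
    if det not in DETERMINERS or noun not in NOUNS:
        return False
    if DETERMINERS[det] != NOUNS[noun]:
        return False
    return skip_modifiers(tokens, 1) == len(tokens) - 1


if __name__ == "__main__":
    for example in ("these very strong coffees", "these very strong coffee"):
        print(example, fsa_accepts(example), grammar_accepts(example))
```

Both recognisers accept the same phrases in this toy setting; the difference is that adding a new kind of modifier means editing one definition in the grammar-style version, while the FSA grows by one more set of states and arcs in each copy.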