A single context-free language can be expressed by more than one context-free grammar(CFG). There are two types of equivalent when comparing grammars:

  1. Two grammars are weakly equivalent if they generate the same strings

  2. Two grammars are strongly equivalent if they generate the same strings via the same derivations

Chomsky Normal Form

In Chomsky Normal Form (CNF), the RHS of every production rule includes either two non-terminals or a single terminal symbol: A –> BC or A –> a.

All CFGs can be turned into a weakly equivalent CNF grammar. To do so, we would need to create a new “dummy” non-terminals to address production rules with more than two non-terminals. For example, converting W –> X Y Z to W –> X W\X and W\X –> Y Z.

In this case, W\X is the new dummy non-terminal. This step binarises the grammar, which is important for bottom-up parsing. Production rules with RHS that contains a mix of terminal and non-terminal symbols can be replaced using similar approach. Unary non-terminal productions A –> B are replaced as follows: for each production B –> alpha in the grammar, add a new production rule A –> alpha. However, we would still keep the production rule A –> B since it is a valid binary production.

Ryan

Ryan

Data Scientist

Leave a Reply