Chapter Fourteen:
The Context-Free Frontier
At this point we have two major language categories, the regular languages and the context-free languages, and we have seen that the CFLs include the regular languages, like this:
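[Figure: the regular languages drawn as a region inside the CFLs; L(a*b*) lies inside the regular languages, and {a^n b^n} lies in the CFLs but outside the regular languages]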
Are there languages outside of the CFLs? In this chapter we will see that the answer is yes, and we will see some simple examples of languages that are not CFLs.
We have already seen that there are many closure properties for regular languages. Given any two regular languages, there are many ways to combine them—intersections, unions, and so on—that are guaranteed to produce another regular language. The context-free languages also have some closure properties, though not as many as the regular languages. If regular languages are a safe and settled territory, context-free languages are more like frontier towns. Some operations like union get you safely to another context-free language; others like complement and intersection just leave you in the wilderness.
Pumping Parse Trees
- A pumping parse tree for a CFG G = (V, Σ, S, P) is a parse tree with two properties:
  1. There is a node for some nonterminal symbol A, which has that same nonterminal symbol A as one of its descendants
  2. The terminal string generated from the ancestor A is longer than the terminal string generated from the descendant A
- Like every parse tree, a pumping parse tree shows that a certain string is in the language
- Unlike other parse trees, it identifies an infinite set of other strings that must also be in the language…
Lemma 14.1.
If a grammar G generates a pumping parse tree with yield as shown, then L(G) includes uv^i wx^i y for all i.
[Figure: a parse tree rooted at S with yield uvwxy; an ancestor node A yields vwx, and a descendant node A below it yields w]
- As shown:
  - uvwxy is the whole derived string
  - A is the nonterminal that is its own descendant
  - vwx is the string derived from the ancestor A
  - w is the string derived from the descendant A
  - |vwx| > |w|, so v and x are not both ε
- There are two subtrees rooted at A
- We can make other legal parse trees by substitution…
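At the level of strings, the lemma can be illustrated with a small sketch. The helper names `pump` and `in_anbn`, the grammar S → aSb | ε, and the particular decomposition of "aabb" are assumptions chosen for illustration; they are not taken from the slides.

```python
def pump(u, v, w, x, y, i):
    """Return the string u v^i w x^i y."""
    return u + v * i + w + x * i + y


def in_anbn(s):
    """Membership test for {a^n b^n}."""
    n = len(s) // 2
    return s == "a" * n + "b" * n


# Assumed example: in the parse tree for "aabb" under S -> aSb | ε,
# take the second S as the ancestor A (it yields vwx = "ab") and the
# third S as the descendant A (it yields w = "").
u, v, w, x, y = "a", "a", "", "b", "b"

for i in range(5):
    s = pump(u, v, w, x, y, i)
    print(i, repr(s), in_anbn(s))
```

Every pumped string stays in {a^n b^n}, which is what the lemma predicts for a language that really is generated by the grammar.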
Cut And Paste, i = 2
- We can replace the w subtree with the vwx subtree
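- That makes a parse tree for uvvwxxy
- That is, uv^i wx^i y for i = 2
[Figure: the original tree with yield uvwxy, and the tree obtained by replacing the w subtree with a copy of the vwx subtree, with yield uvvwxxy]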
Cut And Paste, i = 3
- We can replace the w subtree with the vwx subtree again
- That makes a parse tree for uvvvwxxxy
- That is, uv^i wx^i y for i = 3
[Figure: the i = 2 tree, and the tree obtained by again replacing the w subtree with a copy of the vwx subtree, with yield uvvvwxxxy]
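The cut-and-paste step can be sketched directly on trees. The representation here is an assumption made for illustration: a nonterminal node is a pair `(symbol, children)`, a terminal leaf is a plain string, and a node with an empty child list stands for an ε-production. The function names and the example tree for S → aSb | ε are likewise hypothetical.

```python
def tree_yield(t):
    """Concatenate the terminal leaves of a tree, left to right."""
    if isinstance(t, str):
        return t
    _, children = t
    return "".join(tree_yield(c) for c in children)


def subtree(t, path):
    """Follow a list of child indices down from t."""
    for k in path:
        t = t[1][k]
    return t


def replace(t, path, new):
    """Return a copy of t with the subtree at path replaced by new."""
    if not path:
        return new
    sym, children = t
    k = path[0]
    patched = replace(children[k], path[1:], new)
    return (sym, children[:k] + [patched] + children[k + 1:])


def pump_tree(t, anc, desc, i):
    """Build the parse tree for u v^i w x^i y.

    anc  -- path from the root to the ancestor A
    desc -- path from the ancestor A to the descendant A
    """
    ancestor = subtree(t, anc)
    result = subtree(ancestor, desc)      # the w subtree alone (i = 0)
    for _ in range(i):
        # wrap the current result in one more copy of the vwx subtree
        result = replace(ancestor, desc, result)
    return replace(t, anc, result)


# Assumed example: parse tree for "aabb" under S -> aSb | ε
t = ("S", ["a", ("S", ["a", ("S", []), "b"]), "b"])

for i in range(4):
    print(i, tree_yield(pump_tree(t, [1], [1], i)))
```

With i = 1 the function rebuilds the original tree; i = 0 cuts the vwx subtree down to the w subtree, and larger i pastes in more copies, matching the substitution shown on the slides.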
Useful Trees
- If we can find a pumping parse tree, we can conclude that for all i, uv^i wx^i y ∈ L(G)
- And note that all these uv^i wx^i y are distinct, because v and x are not both ε
- The next lemma shows that pumping parse trees are not at all hard to find
Height Of A Parse Tree
- The height of a parse tree is the number of edges in the longest path from the start symbol to any leaf
- For example, consider this grammar:
S → S | S+S | S*S | a | b | c
- These are parse trees of heights 1, 2, and 3:
[Figure: three parse trees for this grammar, of heights 1, 2, and 3]
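The definition of height translates into a short recursive function. This is a sketch under an assumed tree representation (a nonterminal node is a pair `(symbol, children)`, a terminal leaf is a plain string); the three example trees are hypothetical trees for the grammar above, not necessarily the ones pictured.

```python
def height(t):
    """Number of edges on the longest path from the root of t to a leaf."""
    if isinstance(t, str):
        return 0                      # a leaf has no edges below it
    _, children = t
    # An empty child list models an ε-production; the ε leaf is counted
    # as one edge below the node.
    return 1 + max((height(c) for c in children), default=0)


# Assumed example trees for S -> S | S+S | S*S | a | b | c
t1 = ("S", ["a"])
t2 = ("S", [("S", ["a"]), "+", ("S", ["b"])])
t3 = ("S", [("S", [("S", ["a"]), "*", ("S", ["b"])]), "+", ("S", ["c"])])

print(height(t1), height(t2), height(t3))   # prints: 1 2 3
```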
Lemma 14.1.
Every CFG G = (V, Σ, S, P) that generates an infinite language generates a pumping parse tree.
- Proof: let G = (V, Σ, S, P) be any CFG, L(G) infinite
- G generates infinitely many minimum-size parse trees, since each string in L(G) has at least one
- Only finitely many can have height |V| or less, so G generates a minimum-size parse tree of height > |V|
- Such a tree must be a pumping parse tree:
  - Property 1: it has a path with more than |V| edges; some nonterminal A must occur at least twice on such a path
  - Property 2: replacing the ancestor A with the descendant A makes a tree with fewer nodes; this can't be a tree yielding the same string, because our tree was minimum-size
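The two properties used in the proof can be checked mechanically on a given tree. The sketch below searches every root-to-leaf path for a nonterminal that repeats (Property 1) and whose ancestor occurrence has a strictly longer yield than the descendant occurrence (Property 2). The tree representation (pairs `(symbol, children)` with string leaves) and the name `find_pumping_pair` are assumptions for illustration.

```python
def tree_yield(t):
    """Concatenate the terminal leaves of a tree, left to right."""
    if isinstance(t, str):
        return t
    _, children = t
    return "".join(tree_yield(c) for c in children)


def find_pumping_pair(t, path=(), ancestors=()):
    """Return (ancestor_path, descendant_path), or None if there is none.

    Both paths are tuples of child indices counted from the root.
    """
    if isinstance(t, str):
        return None
    sym, children = t
    for anc_sym, anc_path, anc_tree in ancestors:
        same_symbol = anc_sym == sym
        longer_yield = len(tree_yield(anc_tree)) > len(tree_yield(t))
        if same_symbol and longer_yield:
            return anc_path, path
    here = ancestors + ((sym, path, t),)
    for k, child in enumerate(children):
        found = find_pumping_pair(child, path + (k,), here)
        if found is not None:
            return found
    return None


# Assumed example trees for S -> S | S+S | S*S | a | b | c, where |V| = 1
short = ("S", ["a"])                                # height 1
tall = ("S", [("S", ["a"]), "+", ("S", ["b"])])     # height 2 > |V|

print(find_pumping_pair(short))   # None
print(find_pumping_pair(tall))    # ((), (0,))
```

In the second tree the root S yields "a+b" and its first child S yields "a", so the pair qualifies with u = ε, v = ε, w = "a", x = "+b", y = ε.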
Outline
- 14.1 Pumping Parse Trees
- 14.2 The Language {a^n b^n c^n}
- 14.3 Closure Properties For CFLs
- 14.4 Non-Closure Properties
- 14.5 A Pumping Lemma
- 14.6 Pumping-Lemma Proofs
- 14.7 The Languages { xx }
The Insight
- Suppose some CFG G had L(G) = {a^n b^n c^n}; that language is infinite, so there must be some string in L(G) with a pumping parse tree: a^k b^k c^k = uvwxy
- But no matter how you break up a^k b^k c^k into those substrings uvwxy (where v and x are not both ε) you can show uv^2 wx^2 y ∉ {a^n b^n c^n}
- Either:
  - v or x has more than one kind of symbol
  - v and x have at most one kind of symbol each
- If v or x has more than one kind of symbol:
  - uv^2 wx^2 y would have a's after b's and/or b's after c's
  - Not even in L(a*b*c*), so certainly not in {a^n b^n c^n}
  - Example: [Figure: a division of aaaaabbbbbccccc into u, v, w, x, y]
- If v and x have at most one kind each:
  - uv^2 wx^2 y has more of one or two kinds of symbol, but not all three
  - Not in {a^n b^n c^n}
  - Example: [Figure: a division of aaaaabbbbbccccc into u, v, w, x, y]
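For small k the case analysis can be confirmed by brute force. The sketch below enumerates every way of splitting a^k b^k c^k into uvwxy with v and x not both ε and checks whether uv^2 wx^2 y lands back in the language. The function names are hypothetical, and a finite check like this only illustrates the argument; it does not replace the proof.

```python
def in_anbncn(s):
    """Membership test for {a^n b^n c^n}."""
    n = len(s) // 3
    return s == "a" * n + "b" * n + "c" * n


def decompositions(s):
    """Yield every (u, v, w, x, y) with uvwxy == s and v, x not both empty."""
    n = len(s)
    for i in range(n + 1):
        for j in range(i, n + 1):
            for k in range(j, n + 1):
                for m in range(k, n + 1):
                    u, v, w, x, y = s[:i], s[i:j], s[j:k], s[k:m], s[m:]
                    if v or x:
                        yield u, v, w, x, y


def surviving_decompositions(s):
    """Decompositions for which u v^2 w x^2 y is still in {a^n b^n c^n}."""
    return [
        (u, v, w, x, y)
        for u, v, w, x, y in decompositions(s)
        if in_anbncn(u + v * 2 + w + x * 2 + y)
    ]


for k in range(1, 5):
    s = "a" * k + "b" * k + "c" * k
    print(k, len(surviving_decompositions(s)))   # the count is 0 for each k
```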
Closure Properties
- CFLs are closed under some of the same common operations as regular languages:
  - Union
  - Concatenation
  - Kleene star
  - Intersection with a regular language
- For the first three, we can make simple proofs using CFGs…
Theorem 14.3.
If L1 and L2 are any context-free languages, L1 ∪ L2 is also context free.
- Proof is by construction using CFGs
- Given G1 = (V1, Σ1, S1, P1) and G2 = (V2, Σ2, S2, P2), with L(G1) = L1 and L(G2) = L2
- Assume V1 and V2 are disjoint (without loss of generality, because symbols could be renamed)
- Construct G = (V, Σ, S, P), where S is a new nonterminal not in V1 or V2, and
  - V = V1 ∪ V2 ∪ {S}
  - Σ = Σ1 ∪ Σ2
  - P = P1 ∪ P2 ∪ {(S → S1), (S → S2)}
- L(G) = L1 ∪ L2, so L1 ∪ L2 is a CFL
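The union construction is mechanical enough to write down as code. In this sketch a grammar is a 4-tuple `(V, Sigma, S, P)` of Python sets and strings, and a production is a pair `(lhs, rhs)` with `rhs` a tuple of symbols; this encoding, the function name `union_grammar`, and the two example grammars are assumptions for illustration.

```python
def union_grammar(g1, g2, new_start="S"):
    """Build a grammar for L(g1) ∪ L(g2) by adding a fresh start symbol."""
    v1, sigma1, s1, p1 = g1
    v2, sigma2, s2, p2 = g2
    # The construction assumes disjoint nonterminal sets and a start
    # symbol that is not already in use; rename symbols first if needed.
    assert v1.isdisjoint(v2), "rename nonterminals so V1 and V2 are disjoint"
    assert new_start not in v1 | v2 | sigma1 | sigma2, "start symbol must be new"
    v = v1 | v2 | {new_start}
    sigma = sigma1 | sigma2
    p = p1 | p2 | {(new_start, (s1,)), (new_start, (s2,))}
    return v, sigma, new_start, p


# Assumed example grammars: G1 for {a^n b^n}, G2 for L(c*)
g1 = ({"S1"}, {"a", "b"}, "S1", {("S1", ("a", "S1", "b")), ("S1", ())})
g2 = ({"S2"}, {"c"}, "S2", {("S2", ("c", "S2")), ("S2", ())})

v, sigma, start, p = union_grammar(g1, g2)
print(sorted(v), sorted(sigma), start)
for lhs, rhs in sorted(p):
    print(lhs, "->", " ".join(rhs) or "ε")
```

The resulting grammar keeps every production of G1 and G2 unchanged and adds only S → S1 and S → S2, so any derivation must commit to one of the two original grammars in its first step.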