Back to all posts
Notes

Context-Free Language

Context-free grammars define context-free languages using production rules and parse trees, with pushdown automata as their recognizing machines, forming a superset of regular languages.

可能的重点:

必会:

  1. CFG 定义(四元组)
  2. 写 grammar(给语言写 CFG)
  3. 画 parse tree
  4. 判断 ambiguity
  5. leftmost / rightmost derivation

进阶:

  1. CFL vs Regular 区别
  2. PDA 基本理解
  3. 简单构造 grammar

Chomsky Hierarchy

类型 名称 工具
Type 3 Regular Language FSM
⭐ Type 2 Context-Free Language CFG + PDA
Type 1 Context-Sensitive LBA
Type 0 Recursively Enumerable Turing Machine

🔹包含关系:$L₀ ⊃ L₁ ⊃ L₂ ⊃ L₃$

🔹 若一个 grammar 是 regular,则 it must be context-free

Definition of Context-Free Grammar

A CFG is a 4-tuple:$G = (Vₙ, Vₜ, P, S)$

符号 含义 示例
$Vₙ$ Non-terminal variables(非终结符) $A, B, S, <expr>$
$Vₜ$ Terminal symbols(终结符) $a, b, 0, 1, +, id$
$P$ Productions(产生式规则) $A → 0A1 | B$
$S$ Start variable(开始符号) $S ∈ Vₙ$

约束条件:$V = Vₙ ∪ Vₜ,Vₙ ∩ Vₜ = ∅$

最关键的是 $A \rightarrow \beta$ (左边只有一个非终结符

Derivation

  1. Write down the start variable
  2. Find a variable + matching rule,replace with RHS
  3. Repeat until no variables remain

符号表示

  • :direct derivation(直接推导)
  • ⇒*:derives(零步或多步推导)

CFL 属于一种生成语言 (generate strings) CFG 这种特殊的递归语法结构可以推导出任意符合相应语法的字符串

Parse Tree(语法分析树)

  • Root:starting symbol
  • Leaf:terminal symbols(no children)
  • Interior node:non-terminals(with children)

Parse tree 是 derivation 的 alternative representation

两种推导方式

  1. LL → leftmost 每次展开最左边的非终结符
  2. LR → rightmost 每次展开最右边的非终结符

Ambiguity Problem

歧义问题,顾名思义是,有歧义 针对相同的字符串,可能有不同的理解 定义是:A CFG G is ambiguous if it generates some strings with more than one distinct parse tree

对于相同的字符串有多个语法分析树,语法分析树不同,编译的时候执行的先后顺序就不同,就会有 Ambiguity Problem

例如:

<EXPR> → <EXPR>+<EXPR> | <EXPR>×<EXPR> | a

对于 a+a×a,存在 two different parse trees:

Tree 1: (+ at root)        Tree 2: (× at root)
     +                           ×
    / \\                         / \\
   a   ×                       +   a
      / \\                     / \\
     a   a                   a   a

编译器必须避免 ambiguity

在编译的时候,我们要保证 Each program must have a unique meaning, Interpretation depends on parse tree 所以我们不能有这种 Ambiguity Problem

消除 Ambiguity Problem 的方法

引入优先级层次

🔸 注意:No algorithm 能将 arbitrary ambiguous CFG 转为 unambiguous

🔸 某些 CFL 是inherently ambiguous(本质上二义)

Pushdown Automaton (PDA) 下推自动机

CFL 对应的机器: PDA(带栈的自动机)

Formal Definition(6-tuple)

$PDA = (S, Σ, Γ, δ, s₀, F)$

符号 含义
$S$ finite set of states
$Σ$ input alphabet
$Γ$ stack alphabet
$δ$ transition: $S×Σₑ×Γₑ → P(S×Γₑ)$
$s₀$ start state
$F$ accept states

核心思想是:多了一个 stack

所以可以:

  • 记住数量
  • 处理嵌套结构

PDA for $L = \{0ⁿ1ⁿ | n≥0\}$

策略:

  • 读 0 → push 0 onto stack
  • 读 1 → pop 0 from stack
  • 结束且 stack empty → accept
  • 否则 → reject

注意这个地方字符串的特点是,前半部分是 0 后半部分是 1,两部分字符的个数相同 核心思想就是处理字符串的时候用栈来记录 0 的数量,遇到 0 入栈,等到轮到 1 的时候遇到 1 出栈就能保证两部分数量相同

CFG 与 PDA 的等价性

For every CFG, ∃ PDA that recognizes the same language For every PDA, ∃ CFG that generates the same language

关键结论

  1. Every finite automaton is a special case of PDA(stack not used)
  2. Regular languages $⊆$ CFL
  3. But $∃$ CFL that are not regular → Regular $⊂$ CFL(proper subset)
  4. Nondeterministic PDA $>$ Deterministic PDA(expressive power)

Share this post

Back to home

Comments