Unformatted text preview:

11-722: Grammar FormalismsParsing OverviewAlon LavieJanuary 23, 2006References:Hopcroft and Ullman, “Introduction to Automata Theory, Languages and Computation”.James Allen, “Natural Language Understanding”, 2nd edition.Jurafsky and Martin, “Speech and Language Processing”.Formal Languages• A mathematical abstraction of real languages: Natural Languages andComputer Languages• A language is no more than a set of items called words (the equivalent ofsentences in a natural language)• Languages can be defined declaratively, descriptively or computationally• Formal Language Theory: The study of properties of the various types andclasses of languages using formal mathematical proofs• Fundamental problem - word membership:Given a word w and a language L - is w ∈ L?• what algorithm or computational device is necessary to answer this questiondepends on the class of the language1 11-722 Grammar FormalismsBasic Definitions• Σ: the alphabet - a finite (non-empty) set of atomic symbols– each symbol σ in the set is a letter– letters are denoted by lower case Latin letters a, b, c,...• a word is a string of letters from a given alphabet Σ• |w| denotes the length of word w• ² denotes the empty word: |²| = 0• we only consider words of finite length• Def: a language L is a set (finite or infinite) of words constructed from a given alphabetΣ• Examples: L1= {²} L2= Ø L3= {anbn|n ≥ 0 }• Set Theory and operations apply to formal languages:– union, intersection, complementation, membership–L = {w ∈ Σ∗|w 6∈ L}• Important notation:Σ∗= set of all finite words over the alphabet ΣΣi= set of all words of length i over the alphabet Σ2 11-722 Grammar FormalismsLanguage Classes• Sets of formal languages that can be defined using a particular descriptivedefinition or abstraction of a computational framework• Examples:– The set of languages that can be described by Regular Expressions– The set of languages for which we can construct a Finite State Automaton– The set of languages that can be defined using a Context-free Grammar• Knowing the class to which a language belongs will allow us to developefficient algorithms for processing the language or deciding membership inthe language3 11-722 Grammar FormalismsDeterministic FSA• Formal Definition of a DFSA: A = (Q, Σ, δ, q0, F ) where:– Q is a finite set of states– Σ is a finite alphabet– q0∈ Q is an initial (start) state– F ⊆ Q is a set of final states– δ : Q × Σ → Q is the complete transition function• The language accepted by a DFSA A is defined to be:L(A) = {w ∈ Σ∗| after computing on w, A is in a state q ∈ F }• based on the function δ, we defineˆδ, the function on words that models the computationof a DFSA recursively as follows:–ˆδ : Q × Σ∗→ Q–ˆδ(q, ²) = q ∀q ∈ Q–ˆδ(q, xσ) = δ(ˆδ(q, x), σ)• Formal definition of L(A): L(A) = {w ∈ Σ∗|ˆδ(q0, w) ∈ F }• Def: Regular Language: a language L ⊆ Σ∗is called regular if there exists someDFSA A such that L = L(A)• Examples of regular languages: L = Σ∗L = {²} L = (aab)∗4 11-722 Grammar FormalismsContext-Free Grammars• A descriptive generative formalism for specifying the set of words in alanguage using production rules• Formal Definition: a context-free grammar G = (V, T, P, S)– V is a finite set of variables– T is a finite set of terminal symbols (similar to Σ for FSAs)– P is a set of context-free production rules, each of the formA → α, where α ∈ (V ∪ T )∗– S is a start non-terminal (S ∈ V )• Notations:– we denote elements of V by S, A, B, C...– we denote elements of T by a, b, c...– we denote strings over T∗by w, x, y...– we denote strings over (T ∪ V )∗by α, β, γ...– we denote single variables or terminals by X, Y, Z...5 11-722 Grammar FormalismsContext-Free Grammars• Example: L = {anbn| n ≥ 1}G: S --> a S bS --> a b• in this case the language L(G) could be specified in a succinct mathematicalform - often this is difficult or not possible6 11-722 Grammar FormalismsCFG Derivations• derivations describe the process of using the context-free rules to derive astring of terminal symbols• Definition: let ϕ1, ϕ2∈ (V ∪ T )∗.ϕ1directly derives ϕ2, denoted by: ϕ1=⇒Gϕ2,if ϕ1= αAβ, ϕ2= αγβ and A → γ is a rule in PG• ϕ1derives ϕ2, denoted by ϕ1∗=⇒Gϕ2,if there exists a finite sequence of direct derivations such thatϕ1=⇒Gϕ01=⇒Gϕ02=⇒Gϕ03=⇒G· · · =⇒Gϕ2• ϕ1i=⇒Gϕ2denotes that ϕ1derives ϕ2in exactly i derivation steps• a rightmost derivation is a derivation in which at each step, the rightmostnon-terminal in the string is picked for expansion• similarly for a leftmost derivation7 11-722 Grammar FormalismsContext Free Languages (CFLs)• Formal Definition: the language of a CFG G is defined as:L(G) = {w ∈ T∗| S∗=⇒Gw}• a language L is context-free if there exists a grammar G such that L = L(G)• the set of all such languages is called the set of context-free languages (CFLs)• two grammars G1and G2are called equivalent if L(G1) = L(G2)8 11-722 Grammar FormalismsParse Trees• a Parse Tree is a graphical representation of a derivation• the leaves (yield) of the tree correspond to a terminal string in L(G)• the tree does not represent the derivation order of the non-terminals• the tree does reflect the structure of the input string - what rules were used toderive the various substrings of the input• a parse tree constitutes a proof that a given input string is in L(G)• a grammar G is called ambiguous if there exists a word w ∈ L(G) that hastwo or more different parse trees• There exist CFLs that are inherently ambiguous9 11-722 Grammar FormalismsPushdown Automata• An extension of a FSA that is powerful enough to accept CFLs• The FSA is augmented with a memory storage device in the form of a stack• Formal Definition: a PDA M = (Q, Σ, Γ, δ, q0, Z0, F ) where:– Q, Σ, q0, F are similar to those of a FSA– Γ is a finite set of stack symbols– Z0is a start stack symbol– δ : Q × (Σ ∪ {²}) × Γ → 2Q×Γ∗δ(q, σ, Z) = {(q1, γ1), (q2, γ2), ..., (qm, γm)}• Note that a PDA is non-deterministic: it can make ²-moves on the input• It can also: replace the top element of the stack, “push” an element onto thestack, and “pop” an element from the stack10 11-722 Grammar FormalismsRecognition and


View Full Document

CMU 11722 Grammar Fomalism - Parsing Overview

Download Parsing Overview
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Parsing Overview and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Parsing Overview 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?