UVA CS 4610 - Programming Language Syntax - D2137868

Home> Schools> University Of Virginia> Computer Science (CS) > CS 4610> Programming Language Syntax

UVA CS 4610 - Programming Language Syntax

Pages 12

Download Save

Unformatted text preview:

CD_Ch02-P374514 [10:55 2009/2/25] SCOTT: Programming Language Pragmatics Page: 13 1–8672Programming LanguageSyntax2.4 Theoretical FoundationsAs noted in the main text, scanners and parsers are based on the ﬁnite automataand pushdown automata that form the bottom two levels of the Chomsky languagehierarchy. At each level of the hierarchy, machines can be either deterministic ornondeterministic. A deterministic automaton always performs the same operationin a given situation. A nondeterministic automaton can perform any of a set ofoperations. A nondeterministic machine is said to accept a string if there existsa choice of operation in each situation that will eventually lead to the machinesaying “yes.” As it turns out, nondeterministic and deterministic ﬁnite automataare equally powerful: every DFA is, by deﬁnition, a degener ate NFA, and theconstruction in Example 2.14 (page 56) demonstrates that for any NFA we cancreate a DFA that accepts the same language. The same is not true of push-downautomata: there are context-free languages that are accepted by an NPDA but notby any DPDA. Fortunately, DPDAs sufﬁce in practice to accept the syntax of realprogramming languages. Practical scanners and parsers are always deterministic.2.4.1 Finite AutomataPrecisely deﬁned, a deterministic ﬁnite automaton (DFA) M consists of (1) aﬁnite set Q of states, (2) a ﬁnite alphabet Σ of input symbols, (3) a distinguishedinitial state q1∈ Q, (4) a set of distinguished ﬁnal states F ⊆ Q, and (5) atransition function δ : Q × Σ → Q that chooses a new state for M based on thecurrent state and the current input symbol. M begins in state q1. One by one itconsumes its input symbols, using δ to move from state to state. When the ﬁnalsymbol has been consumed, M is interpreted as saying “yes” if it is in a state inF; otherwise it is interpreted as saying “no.” We can extend δ in the obvious wayto take st rings, rather than symbols, as inputs, allowing us to say that M acceptsstring x if δ(q1, x) ∈ F. We can then deﬁne L(M ), the language accepted by M ,to be the set {x | δ(q1, x) ∈ F}. In a nondeterministic ﬁnite automaton (NFA),Copyrightc 2009 by Elsevier Inc. All rig hts reserved. 13CD_Ch02-P374514 [10:55 2009/2/25] SCOTT: Programming Language Pragmatics Page: 14 1–86714 Chapter 2 Programming Language Sy ntax. .Startddddq3q4q1q2Figure 2.32 Minimal DFA for the language consisting of all strings of decimal digits containinga single decimal point. Adapted from Figure 2.10 in the main text. The symbol d here is shortfor “0, 1, 2, 3, 4, 5, 6, 7, 8, 9”.the transition function, δ, is multivalued: the automaton can move to any of a setof possible states from a given state on a given input. In addition, it may movefrom one state to another “spontaneously”; such transitions are said to take inputsymbol .We can illustrate these deﬁnitions with an example. Consider the circles-and-EXAMPLE 2.56Formal DFA ford *(.d | d. ) d *arrows automaton of Figure 2.32 (adapted from Figure 2.10 in the main text).This is the minimal DFA accepting strings of decimal digits containing a singledecimal point. Σ={0, 1, 2, 3, 4, 5, 6, 7, 8, 9, .} is the machine’s input alpha-bet. Q = {q1, q2, q3, q4} is the set of states; q1is the initial state; F = {q4} (asingleton in this case) is the set of ﬁnal states. The t ransition function can be rep-resented by a set of triples δ = {(q1, 0, q2), ...,(q1, 9, q2), (q1, ., q3), (q2, 0, q2),...,(q2, 9, q2), (q2, ., q4), (q3, 0, q4), ...,(q3, 9, q4), (q4, 0, q4), ...,(q4, 9, q4)}.In each triple (qi, a, qj), δ(qi, a)=qj.Given the constructions of Examples 2.12 and 2.14, we know that there existsan NFA that accepts the language generated by any given regular expression, anda DFA equivalent to any given NFA. To show that regular expressions and ﬁniteautomata are of equivalent expressive power, all that remains is to demonstratethat there exists a regular expression that generates the language accepted by anygiven DFA. We illustrate the required construction below for our decimal stringsexample (Figure2.32). More for mal and general treatment of all the regularlanguage constructions can be found in standard automata theory texts [HMU01,Sip97].From a DFA to a Regular ExpressionTo construct a regular expression equivalent to a given DFA, we employaadynamicprogramming algorithm that builds solutions to successively more complicatedsubproblems from a table of solutions to simpler subproblems. We begin with aset of simple regular expressions that describe the transition function, δ. For allstates i, we deﬁner0ii= a1| a2| ... | am| Copyrightc 2009 by Elsevier Inc. All rights reserved.CD_Ch02-P374514 [10:55 2009/2/25] SCOTT: Programming Language Pragmatics Page: 15 1–8672.4.1 Finite Automata 15where {a1| a2| ... | am} = {a | δ(qi, a)=qi} is the set of characters labelingthe “self-loop” from state qiback to itself. If there is no such self-loop, r0ij= .Similarly, for i = j, we deﬁner0ij= a1| a2| ... | amwhere{a1| a2| ... | am} = {a | δ(qi, a)=qj} is the set of characters labelingthe arc from qito qj. If there is no such arc, r0ijis the empty regular expression.(Note the difference here: we can stay in state qiby not accepting any input, so is always one of the alternatives in r0ii, but not in r0ijwhen i = j.)Given these r0expressions, the dynamic programming algorithm inductivelycalculates expressions rkijwith larger superscripts. In each, k names the highest-numbered state through which control may pass on the way from qito qj.Weassume that states are numbered starting with q1, so when k = 0 we must transi-tion directly from qito qj, with no intervening states.In our small example DFA, r011= r033= , and r022= r044= 0 | 1 | 2 | 3 | 4 |EXAMPLE 2.57Reconstructing a regularexpression for the decimalstring DFA5 | 6 | 7 | 8 | 9 | , which we will abbreviate d | . Similarly, r013= r024= ., andr012= r034= d. Expressions r014, r021, r023, r031, r032, r041, r042, and r043are all empt y.For k > 0, the rkijexpressions will generally generate multicharacter strings. Ateach step of the dynamic programming algorithm, we letrkij= rk−1ij| rk−1ikrk−1kk*rk−1kjThat is, to get from qito qjwithout going through any states numbered higherthan k, we can either go from qito qjwithout going through any state numberedhigher than k − 1 (which we already know how to do), or else we can go from qitoqk(without

View Full Document


School:
Email:
New Password:
Confirm Password:

UVA CS 4610 - Programming Language Syntax

Sign up for free to view:

Please select your school