CSE 5317Lecture'2:'Language'implementa1on21'Jan'2010Nate'NystromUniversity'of'Texas'at'ArlingtonFriday, February 5, 2010Language definitionSyntax'defines'the'structure'of'a'program◾set'of'rules'defining'which'symbols'are'a'legall y'structured'programSeman)cs'defines'the'“meaning”'of'a'program◾without'seman1cs,'programs'are'just'sequences'of'charactersNext'week:'syntaxMost'of'the'rest'of'the'course:'seman1cs2Friday, February 5, 2010SyntaxSyntax'defined'by'a'conte xt-free'grammarExample:◾Exp'::='Exp'+'Term'|'Term◾Term'::='Term'*'Factor'|'Factor◾Factor'::='INTEGER◾Expressions'with'+'and'*'operators'over'integers◾*"has"higher"precedence"than"+◾1+2*3"parses"as"1+(2*3)"not"as"(1+2)*33Friday, February 5, 2010SemanticsDefines'the'meaning'of'programsOWen'described'informally◾in'a'book'(e.g.,'the'Java'Language'Specifica1on)◾or,'by'the'implementa1on'(e.g.,'Python)Be]er'is'to'describe'the'seman1cs'formally◾Unambiguous◾Allows'you'to'check'that'language'is'implemented'correctly◾Can'generate'some'(or'all!)'of'the'implementa1on'automa1cally◾Can'reason'formally'about'programs4Friday, February 5, 2010Formal semanticsSeveral'different'ways:◾opera1onal'seman1cs'a'rewrite'rules◾will"discuss"briefly"later"in"the"semester◾denota1onal'seman1cs'a'program'as'mathema1cal'formula◾requires"deep"mathema>cs"(domain"theory)–not"going"to"discuss◾axioma1c'seman1cs'a'program'as'logical'statements◾less"useful–not"going"to"discuss5Friday, February 5, 2010Operational semanticsSeman1cs'given'by'a'set'of'syntac1c'rewrite'rulese.g.,''''if'true'then's1'else's2'➝'s1e.g.,''''if'false'then's1'else's2'➝'s2e.g.,''''if'e'then's1'else's2'➝'if'e’'then's1'else's2''''''''''''''''''''''when'e'➝'e’e.g,.''''while'e'do's'➝,if'e'then'{'s;'while'e'do's'}Trivial'to'translate'opera1onal'seman1cs'into'an'implementa1on6Friday, February 5, 2010Language implementation◾Compiler◾a'program'that'translates'an'executable'program'in'one'language'into'an'executable'program'in'anoth er'language◾usually'higheralevel'to'loweralevel◾e.g.,"C"to"x86,"Java"to"JVM"bytecode,"C++"to"C◾“tran slator”'a'target'code'is'in'a' simil ar'lan guage◾“decompiler”'a'target'code'is'in'a'higheralevel'language◾“assembler”'a'source'is'assembly'code,'target'is'machine'code◾Interpreter◾a'program'that'reads'an'executable'pro gram'and'produces'the'results'of'running'that'program7Friday, February 5, 2010Compilers◾recognize'legal'(and'ill egal)'programs◾generate'correct'code◾manage'storage'of'all'variables'and'code◾agree'on'format'for'target'(object,'assembly)'code8compilersource"codetarget"codeerrorsFriday, February 5, 2010Interpreters◾recognize'legal'(and'ill egal)'programs◾eval uate'program9interpretersource"codeoutputerrorsFriday, February 5, 2010Traditional two-pass compiler◾intermediate'representa1on'(IR)◾front'end'maps'legal'code'to'IR◾back'end'maps'IR'onto'target'machine◾simplify'retarge1ng◾allow'mul1ple'front'ends10front"endsource"codetarget"codeerrorsback"enderrorsIRFriday, February 5, 2010Traditional interpreter◾intermediate'representa1on'(IR)◾front'end'maps'legal'code'to'IR◾eval uator'interprets'program,'producing'output11front"endsource"codeoutputerrorsevaluatorerrorsIRFriday, February 5, 2010Front end◾Responsibili1es:◾recognize'legal'code◾report'errors◾produce'IR◾preliminary'storage'map◾shape'the'code'for'the'back'end◾Much'of'the'front'end'construc1on'can'be'automated12scannersource"codeerrorsparsererrorstokensseman>c"analysiserrorsIR IRFriday, February 5, 2010Scanner◾maps'characters'into'tokens◾x'='x'+'y◾=>◾ID(“x”)'='ID(“x”) '+'ID(“ y”)◾character'string'value'for'a'token'is'lexeme◾typical'tokens:'NUMBER,'ID,'+,'a,'*,'/,'if,'while◾eliminates'whitespace◾key'issue'is'speed◾oWen'use'specialized'scanner'vs.'generated'one'(via'lex)13Friday, February 5, 2010ScanningPreaprocess'the'input'before'parsingBreak'stream'of'bytes'into'a'stream'of'tokens◾IF,"WHILE,"INT,"STRING,"INTScanner"eliminates"whitespace,"commentsSimplifies"the"parser◾deal"with"~100"tokens"vs."65536"characters◾smaller"state"machineUses"less"memory,"faster14Friday, February 5, 2010Parser◾recognize'context afree'syntax◾guide'contextasensi1ve'analysis◾construct'IR◾produce'meaningful'error'messages◾a]empt'error'correc1on◾parser'generators'mechanize'much'of'this'work15Friday, February 5, 2010Parsing16Turns'stream'of'bytes'into'an'inamemory'representa1on'of'the'programMany'approaches:◾parser4generators'for'different'classes'of'grammars◾top\down:"LL(1),"LL(k),"LL(*),"PEG◾bo_om\up:"LR(1),"LALR(1),"LR(k),"GLR◾implement'by'hand◾LL(k)"grammars"can"be"implemented"by"a"recursive\descent"parserFriday, February 5, 2010Context-free syntax◾Specify'with'a'grammar◾SheepNoise'::='baa'|'baa'SheepNoise◾Format'is'called'BackusaNaur'Form'(BNF)◾Formally,'a'grammar'G'='(S,N,T,P)◾S'is'the'start'symbol'(in'N)◾N'is'the'set'of'nonterminals'(e.g.,'SheepNoise)◾T'is'the'set'of'terminals'(e.g.,'baa,'NUMBER,'+)◾P'is'the'set'of'prod uc1on s'or'rewrite'rules'(in'Na>'(N'U'T))17Friday, February 5, 2010Backus-Naur FormSet'of'terminals,'a1,'...,'am◾characters'or'tokensSet'of'nonterminals,'A1,'... ,'AnDesignate'one'nonterminal'as'the'start4symbolSet'of'rules'of'the'form,'A'::='X1'...'Xk◾X'is'a'symbol,'either'a'terminal'or'nonterminal18Friday, February 5, 2010A more useful grammar◾Goal'::='Expr◾Expr'::='Expr'Op'Term◾Expr'::='Term◾Term'::='NUMBER◾Term'::='ID◾Op'::='+◾Op'::='a◾S'='Goal◾N'='{Goal,'Expr,'Term,'Op}◾T'='{NUMBER,'ID,'+,'a}◾P'='{Goal'::='Expr,'...,'Op'::='a}19Friday, February 5, 2010Derivations◾Given'a'grammar,'valid4sentences'can'be'derived'by'repeated'subs1tu1on◾Goal◾Expr◾Expr'Op'Term◾Expr'Op'y◾Expr'a'y◾Expr'Op'Term'a'y◾Expr'Op'2'a'y◾Expr'+'2'a'y◾Term'+'2'a'y◾x'+'2'a'y◾To'recognize'a'valid'sentence'in'some'CFG,'we'reverse'the'process'and'build'up'a'parse20Friday, February 5, 2010Front endA parse can be represented by a tree called a pars e or syntax tree2><num:<id:x><id:>ygoaloptermopexprexpr termexprterm-+Obviously, this contains a lot of unnecessary information19Parse trees◾Can'represent'a'parse'with'a'tree◾Contains'a'lot'of'unnecessary'informa1on21Friday, February 5, 2010Abstract syntax trees◾Much'more'concise'than'p arse'tree◾Abstract'syntax'trees'(ASTs)'are'oWen'used'as'an'IR'between'front'end'and'back'end22Front endSo, compilers often use an abstract syntax
View Full Document