FSU COP 4342 - Flex and lexical analysis

COP 4342, Fall 2006: Program Development 4

Flex and lexical analysis

From the area of compilers, we get a host of tools to convert text files into programs. The first part of that process is often called lexical analysis, particularly for such languages as C.

A good tool for creating lexical analyzers is flex. It takes a specification file and creates an analyzer, usually called lex.yy.c.

Lexical analysis terms

☞ A token is a group of characters having collective meaning.
☞ A lexeme is an actual character sequence forming a specific instance of a token, such as num.
☞ A pattern is a rule expressed as a regular expression and describing how a particular token can be formed. For example, [A-Za-z][A-Za-z_0-9]* is a rule.
☞ Characters between tokens are called whitespace; these include spaces, tabs, newlines, and formfeeds. Many people also count comments as whitespace, though since some tools such as lint/splint look at comments, this conflation is not perfect.

Attributes for tokens

Tokens can have attributes that can be passed back to the calling function.

Constants could have the value of the constant, for instance.

Identifiers might have a pointer to a location where information is kept about the identifier.

Some general approaches to lexical analysis

Use a lexical analyzer generator tool, such as flex.
Write a one-off lexical analyzer in a traditional programming language.
Write a one-off lexical analyzer in assembly language.

Flex - our lexical analyzer generator

Is linked with its library (libfl.a) by passing -lfl when compiling and linking.
Can be called as yylex().
It is easy to interface with bison/yacc.

    .l file → lex → lex.yy.c
    lex.yy.c and other files → gcc → lexical analyzer
    input stream → lexical analyzer → actions taken when rules applied

Flex specifications

Lex source:

    { definitions }
    %%
    { rules }
    %%
    { user subroutines }

Definitions

☞ Declarations of ordinary C variables and constants.
☞ flex definitions

Rules

Rules have the form:

    regularexpression    action

The actions are C/C++ code.

Flex regular expressions

    s         string s literally
    \c        character c literally, where c would normally be a lex operator
    [s]       character class
    ^         indicates beginning of line
    [^s]      characters not in character class
    [s-t]     range of characters
    s?        s occurs zero or one time

Flex regular expressions, continued

    .         any character except newline
    s*        zero or more occurrences of s
    s+        one or more occurrences of s
    r|s       r or s
    (s)       grouping
    $         end of line
    s/r       s iff followed by r (not recommended) (r is *NOT* consumed)
    s{m,n}    m through n occurrences of s

Examples of regular expressions in flex

    a*          zero or more a's
    .*          zero or more of any character except newline
    .+          one or more of any character except newline
    [a-z]       a lowercase letter
    [a-zA-Z]    any alphabetic letter
    [^a-zA-Z]   any non-alphabetic character
    a.b         a followed by any character followed by b
    rs|tu       rs or tu
    a(b|c)d     abd or acd
    ^start      the literal characters start at the beginning of a line
    END$        the characters END followed by an end-of-line

Flex actions

Actions are C source fragments. If an action is compound, or takes more than one line, enclose it in braces ('{' '}').

Example rules:

    [a-z]+          printf("found word\n");
    [A-Z][a-z]*     { printf("found capitalized word:\n");
                      printf("    '%s'\n", yytext);
                    }

Flex definitions

The form is simply

    name    definition

The name is just a word beginning with a letter (or an underscore, but I don't recommend those for general use) followed by zero or more letters, digits, underscores, or dashes. The definition actually goes from the first non-whitespace character to the end of line. You can refer to it via {name}, which will expand to (definition). (cite: this is largely from "man flex".)

For example:

    DIGIT    [0-9]

Now if you have a rule that looks like

    {DIGIT}*\.{DIGIT}+

that is the same as writing

    ([0-9])*\.([0-9])+

An example Flex program

/* either indent or use %{ %} */
%{
int num_lines = 0;
int num_chars = 0;
%}
%%
\n      ++num_lines; ++num_chars;
.       ++num_chars;
%%
int main(int argc, char **argv)
{
    yylex();
    printf("# of lines = %d, # of chars = %d\n", num_lines, num_chars);
    return 0;
}

Another example program

digits      [0-9]
ltr         [a-zA-Z]
alphanum    [a-zA-Z0-9]
%%
(-|\+)*{digits}+        printf("found number: '%s'\n", yytext);
{ltr}(_|{alphanum})*    printf("found identifier: '%s'\n", yytext);
'.'                     printf("found character: {%s}\n", yytext);
.                       { /* absorb others */ }
%%
int main(int argc, char **argv)
{
    yylex();
    return 0;
}
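
For comparison with the second flex program, here is a minimal sketch of the other approach listed earlier: a one-off lexical analyzer written by hand in C. It tries to recognize the same three kinds of lexemes (numbers, identifiers, quoted characters) and skips everything else. The names token_kind, struct token, next_token, and print_tokens are hypothetical, invented for this illustration; they are not part of flex or of the original slides. One assumed difference: this sketch silently skips newlines, whereas the flex version leaves them to flex's default rule, which echoes unmatched input.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical token kinds, one per printing rule in the flex example. */
enum token_kind { TOK_END, TOK_NUMBER, TOK_IDENTIFIER, TOK_CHARACTER };

/* A token plus its lexeme (start pointer and length), playing the
   role that yytext plays in a flex action. */
struct token {
    enum token_kind kind;
    const char *start;
    size_t length;
};

/* Return the next token at or after *cursor and advance *cursor past it.
   Characters that match no rule are skipped, like the catch-all '.' rule. */
struct token next_token(const char **cursor)
{
    assert(cursor != NULL && *cursor != NULL);
    const char *p = *cursor;

    while (*p != '\0') {
        const char *q = p;

        /* (-|\+)*{digits}+ : any run of signs, then at least one digit */
        while (*q == '-' || *q == '+')
            q++;
        if (isdigit((unsigned char)*q)) {
            while (isdigit((unsigned char)*q))
                q++;
            *cursor = q;
            return (struct token){ TOK_NUMBER, p, (size_t)(q - p) };
        }

        /* {ltr}(_|{alphanum})* : a letter, then letters, digits, or '_' */
        if (isalpha((unsigned char)*p)) {
            q = p + 1;
            while (*q == '_' || isalnum((unsigned char)*q))
                q++;
            *cursor = q;
            return (struct token){ TOK_IDENTIFIER, p, (size_t)(q - p) };
        }

        /* '.' : quote, any one character except newline, quote */
        if (p[0] == '\'' && p[1] != '\0' && p[1] != '\n' && p[2] == '\'') {
            *cursor = p + 3;
            return (struct token){ TOK_CHARACTER, p, 3 };
        }

        p++; /* absorb others */
    }

    *cursor = p;
    return (struct token){ TOK_END, p, 0 };
}

/* Print each token in the same format as the flex actions. */
void print_tokens(const char *src)
{
    for (;;) {
        struct token t = next_token(&src);
        switch (t.kind) {
        case TOK_NUMBER:
            printf("found number: '%.*s'\n", (int)t.length, t.start);
            break;
        case TOK_IDENTIFIER:
            printf("found identifier: '%.*s'\n", (int)t.length, t.start);
            break;
        case TOK_CHARACTER:
            printf("found character: {%.*s}\n", (int)t.length, t.start);
            break;
        case TOK_END:
            return;
        }
    }
}
```

Calling print_tokens("x1 = -42 + 'c';") from a main function reports the identifier x1, the number -42, and the character 'c', and skips the rest. The hand-written version has to spell out the scanning loops that flex derives from the patterns, which is the main argument for using the generator.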

