1AWK• a language for pattern scanning and processing– Al Aho, Brian Kernighan, Peter Weinberger– Bell Labs, ~1977• Intended for simple data processing:• selection, validation:"Print all lines longer than 80 characters"length > 80• transforming, rearranging:"Replace the 2nd field by its logarithm"{ $2 = log($2); print }• report generation:"Add up the numbers in the first field,then print the sum and average"{ sum += $1 }END { print sum, sum/NR }Structure of an AWK program:• a sequence of pattern-action statementspattern { action }pattern { action }…• "pattern" is a regular expression, numeric expression, string expression or combination• "action" is executable code, similar to C• Operation:for each filefor each input linefor each patternif pattern matches input linedo the action• Usage:awk 'program' [ file1 file2 ... ]awk -f progfile [ file1 file2 ... ]2AWK features:• input is read automatically– across multiple files– lines split into fields ($1, ..., $NF; $0 for whole line)• variables contain string or numeric values– no declarations– type determined by context and use– initialized to 0 and empty string– built-in variables for frequently-used values• operators work on strings or numbers– coerce type according to context• associative arrays (arbitrary subscripts)• regular expressions (like egrep)• control flow statements similar to C– if-else, while, for, do• built-in and user-defined functions– arithmetic, string, regular expression, text edit, ...• printf for formatted output• getline for input from files or processesBasic AWK programs:{ print NR, $0 }precede each line by line number{ $1 = NR; print }replace first field by line number{ print $2, $1 }print field 2, then field 1{ temp = $1; $1 = $2; $2 = temp; print }flip $1, $2{ $2 = ""; print }zap field 2{ print $NF }print last fieldNF > 0print non-empty linesNF > 4print if more than 4 fields$NF > 4print if last field greater than 4NF > 0 {print $1, $2}print two fields of non-empty lines/regexpr/print matching lines (egrep)$1 ~ /regexpr/print lines where first field matchesEND { print NR }line count{ nc += length($0) + 1; nw += NF } wc commandEND { print NR, "lines", nw, "words", nc, "characters" }$1 > max { max = $1; maxline = $0 } print longest lineEND { print max, maxline }3Awk text formatter#!/bin/sh# f - format text into 60-char linesawk '/./ { for (i = 1; i <= NF; i++)addword($i) }/^$/ { printline(); print "" }END { printline() }function addword(w) {if (length(line) + length(w) > 60)printline()line = line space wspace = " "}function printline() {if (length(line) > 0)print lineline = space = ""}' "$@"Arrays• Usual case: array subscripts are integers• Reverse a file:{ x[NR] = $0 } # put each line into array xEND { for (i = NR; i > 0; i--)print x[i] }• Making an array:n = split(string, array, separator)– splits "string" into array[1] ... array[n]– returns number of elements– optional "separator" can be any regular expression4Associative Arrays• array subscripts can have any value– not limited to integers• canonical example: adding up name-value pairsInput:pizza 200beer 100pizza 500beer 50Output:pizza 700beer 150• program:{ amount[$1] += $2 }END { for (name in amount)print name, amount[name] | "sort +1 -nr" }Assembler & simulator for toy machine• hypothetical RISC machine (tiny SPARC)• 10 instructions, 1 accumulator, 1K memory# print sum of input numbers (terminated by zero)ld zero # initialize sum to zerost sumloop get # read a numberjz done # no more input if number is zeroadd sum # add in accumulated sumst sum # store new value back in sumj loop # go back and read another numberdone ld sum # print sumputhaltzero const 0sum const• assignment: write an assembler and simulator5Assembler and simulator/intepreter# asm - assembler and interpreter for simple computer# usage: awk -f asm program-file data-files...BEGIN {srcfile = ARGV[1]ARGV[1] = "" # remaining files are datatempfile = "asm.temp"n = split("const get put ld st add sub jpos jz j halt", x)for (i = 1; i <= n; i++) # create table of op codesop[x[i]] = i-1# ASSEMBLER PASS 1FS = "[ \t]+"while (getline <srcfile > 0) {sub(/#.*/, "") # strip commentssymtab[$1] = nextmem # remember label locationif ($2 != "") { # save op, addr if presentprint $2 "\t" $3 >tempfilenextmem++}}close(tempfile)# ASSEMBLER PASS 2nextmem = 0while (getline <tempfile > 0) {if ($2 !~ /^[0-9]*$/) # if symbolic addr,$2 = symtab[$2] # replace by numeric valuemem[nextmem++] = 1000 * op[$1] + $2 # pack into word}# INTERPRETERfor (pc = 0; pc >= 0; ) {addr = mem[pc] % 1000code = int(mem[pc++] / 1000)if (code == op["get"]) { getline acc }else if (code == op["put"]) { print "\t" acc }else if (code == op["st"]) { mem[addr] = acc }else if (code == op["ld"]) { acc = mem[addr] }else if (code == op["add"]) { acc += mem[addr] }else if (code == op["sub"]) { acc -= mem[addr] }else if (code == op["jpos"]) { if (acc > 0) pc = addr }else if (code == op["jz"]) { if (acc == 0) pc = addr }else if (code == op["j"]) { pc = addr }else if (code == op["halt"]) { pc = -1 }else { pc = -1 }}}Anatomy of a compilerinputtokensintermediateformobjectfilelexicalanalysissyntaxanalysiscodegenerationsymboltableinputdataa.outoutputlinking6Anatomy of an interpreterinputtokensintermediateforminputdatalexicalanalysissyntaxanalysisexecutionsymboltableoutputParsing by recursive descentexpr: term | expr + term | expr - termterm: factor | term * factor | term / factorfactor: NUMBER | ( expr )NF > 0 {f = 1e = expr()if (f <= NF) printf("error at %s\n", $f)else printf("\t%.8g\n", e)}function expr( e) { # term | term [+-] terme = term()while ($f == "+" || $f == "-")e = $(f++) == "+" ? e + term() : e - term()return e}function term( e) { # factor | factor [*/] factore = factor()while ($f == "*" || $f == "/")e = $(f++) == "*" ? e * factor() : e / factor()return e}function factor( e) { # number | (expr)if ($f ~ /^[+-]?([0-9]+[.]?[0-9]*|[.][0-9]+)$/) {return $(f++)} else if ($f == "(") {f++e = expr()if ($(f++) != ")")printf("error: missing ) at %s\n", $f)return e} else {printf("error: expected number or ( at %s\n", $f)return 0}}7YACC and LEX• languages for building bigger languages• YACC: "yet another compiler compiler"(S. C. Johnson, ~ 1972)– converts a grammar and semantic actions into a parser for that grammar• LEX: lexical analyzer
View Full Document