Princeton COS 333 - Structure of an AWK program - D586984

Home> Schools> Princeton University> Computer Science (COS) > COS 333> Structure of an AWK program

DOC PREVIEW

Princeton COS 333 - Structure of an AWK program

School name Princeton University

Course Cos 333- Reasoning About Computation

Pages 12

This preview shows page 1-2-3-4 out of 12 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 12 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 12 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 12 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 12 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 12 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

1AWK• a language for pattern scanning and processing– Al Aho, Brian Kernighan, Peter Weinberger– Bell Labs, ~1977• Intended for simple data processing:• selection, validation:"Print all lines longer than 80 characters"length > 80• transforming, rearranging:"Replace the 2nd field by its logarithm"{ $2 = log($2); print }• report generation:"Add up the numbers in the first field,then print the sum and average"{ sum += $1 }END { print sum, sum/NR }Structure of an AWK program:• a sequence of pattern-action statementspattern { action }pattern { action }…• "pattern" is a regular expression, numeric expression, string expression or combination• "action" is executable code, similar to C• Operation:for each filefor each input linefor each patternif pattern matches input linedo the action• Usage:awk 'program' [ file1 file2 ... ]awk -f progfile [ file1 file2 ... ]2AWK features:• input is read automatically– across multiple files– lines split into fields ($1, ..., $NF; $0 for whole line)• variables contain string or numeric values– no declarations– type determined by context and use– initialized to 0 and empty string– built-in variables for frequently-used values• operators work on strings or numbers– coerce type according to context• associative arrays (arbitrary subscripts)• regular expressions (like egrep)• control flow statements similar to C– if-else, while, for, do• built-in and user-defined functions– arithmetic, string, regular expression, text edit, ...• printf for formatted output• getline for input from files or processesBasic AWK programs:{ print NR, $0 }precede each line by line number{ $1 = NR; print }replace first field by line number{ print $2, $1 }print field 2, then field 1{ temp = $1; $1 = $2; $2 = temp; print }flip $1, $2{ $2 = ""; print }zap field 2{ print $NF }print last fieldNF > 0print non-empty linesNF > 4print if more than 4 fields$NF > 4print if last field greater than 4NF > 0 {print $1, $2}print two fields of non-empty lines/regexpr/print matching lines (egrep)$1 ~ /regexpr/print lines where first field matchesEND { print NR }line count{ nc += length($0) + 1; nw += NF } wc commandEND { print NR, "lines", nw, "words", nc, "characters" }$1 > max { max = $1; maxline = $0 } print longest lineEND { print max, maxline }3Awk text formatter#!/bin/sh# f - format text into 60-char linesawk '/./ { for (i = 1; i <= NF; i++)addword($i) }/^$/ { printline(); print "" }END { printline() }function addword(w) {if (length(line) + length(w) > 60)printline()line = line space wspace = " "}function printline() {if (length(line) > 0)print lineline = space = ""}' "$@"Arrays• Usual case: array subscripts are integers• Reverse a file:{ x[NR] = $0 } # put each line into array xEND { for (i = NR; i > 0; i--)print x[i] }• Making an array:n = split(string, array, separator)– splits "string" into array[1] ... array[n]– returns number of elements– optional "separator" can be any regular expression4Associative Arrays• array subscripts can have any value– not limited to integers• canonical example: adding up name-value pairsInput:pizza 200beer 100pizza 500beer 50Output:pizza 700beer 150• program:{ amount[$1] += $2 }END { for (name in amount)print name, amount[name] | "sort +1 -nr" }Assembler & simulator for toy machine• hypothetical RISC machine (tiny SPARC)• 10 instructions, 1 accumulator, 1K memory# print sum of input numbers (terminated by zero)ld zero # initialize sum to zerost sumloop get # read a numberjz done # no more input if number is zeroadd sum # add in accumulated sumst sum # store new value back in sumj loop # go back and read another numberdone ld sum # print sumputhaltzero const 0sum const• assignment: write an assembler and simulator5Assembler and simulator/intepreter# asm - assembler and interpreter for simple computer# usage: awk -f asm program-file data-files...BEGIN {srcfile = ARGV[1]ARGV[1] = "" # remaining files are datatempfile = "asm.temp"n = split("const get put ld st add sub jpos jz j halt", x)for (i = 1; i <= n; i++) # create table of op codesop[x[i]] = i-1# ASSEMBLER PASS 1FS = "[ \t]+"while (getline <srcfile > 0) {sub(/#.*/, "") # strip commentssymtab[$1] = nextmem # remember label locationif ($2 != "") { # save op, addr if presentprint $2 "\t" $3 >tempfilenextmem++}}close(tempfile)# ASSEMBLER PASS 2nextmem = 0while (getline <tempfile > 0) {if ($2 !~ /^[0-9]*$/) # if symbolic addr,$2 = symtab[$2] # replace by numeric valuemem[nextmem++] = 1000 * op[$1] + $2 # pack into word}# INTERPRETERfor (pc = 0; pc >= 0; ) {addr = mem[pc] % 1000code = int(mem[pc++] / 1000)if (code == op["get"]) { getline acc }else if (code == op["put"]) { print "\t" acc }else if (code == op["st"]) { mem[addr] = acc }else if (code == op["ld"]) { acc = mem[addr] }else if (code == op["add"]) { acc += mem[addr] }else if (code == op["sub"]) { acc -= mem[addr] }else if (code == op["jpos"]) { if (acc > 0) pc = addr }else if (code == op["jz"]) { if (acc == 0) pc = addr }else if (code == op["j"]) { pc = addr }else if (code == op["halt"]) { pc = -1 }else { pc = -1 }}}Anatomy of a compilerinputtokensintermediateformobjectfilelexicalanalysissyntaxanalysiscodegenerationsymboltableinputdataa.outoutputlinking6Anatomy of an interpreterinputtokensintermediateforminputdatalexicalanalysissyntaxanalysisexecutionsymboltableoutputParsing by recursive descentexpr: term | expr + term | expr - termterm: factor | term * factor | term / factorfactor: NUMBER | ( expr )NF > 0 {f = 1e = expr()if (f <= NF) printf("error at %s\n", $f)else printf("\t%.8g\n", e)}function expr( e) { # term | term [+-] terme = term()while ($f == "+" || $f == "-")e = $(f++) == "+" ? e + term() : e - term()return e}function term( e) { # factor | factor [*/] factore = factor()while ($f == "*" || $f == "/")e = $(f++) == "*" ? e * factor() : e / factor()return e}function factor( e) { # number | (expr)if ($f ~ /^[+-]?([0-9]+[.]?[0-9]*|[.][0-9]+)$/) {return $(f++)} else if ($f == "(") {f++e = expr()if ($(f++) != ")")printf("error: missing ) at %s\n", $f)return e} else {printf("error: expected number or ( at %s\n", $f)return 0}}7YACC and LEX• languages for building bigger languages• YACC: "yet another compiler compiler"(S. C. Johnson, ~ 1972)– converts a grammar and semantic actions into a parser for that grammar• LEX: lexical analyzer

View Full Document