Introduction to Awk

Awk is a convenient and expressive programming language that can be applied to a wide variety of computing and data manipulation tasks.

Awk

- Works well on record-type data
- Reads input file(s) a line at a time
- Parses each line into fields
- Performs user-defined tests against each line and performs actions on matches

Other Common Uses

- Input validation
    - Does every record have the same number of fields?
    - Do values make sense (negative time, hourly wage > $1000, etc.)?
- Filtering out certain fields
- Searches
    - Who got a zero on lab 3?
    - Who got the highest grade?
- Many others

Invocation

- Can write little one-liners on the command line (very handy). Print the 3rd field of every line:

    $ awk '{ print $3 }' input.txt

- Execute an awk script file:

    $ awk -f script.awk input.txt

- Or, use this shebang as the first line, and give your script execute permissions:

    #!/bin/awk -f

Form of an AWK program

AWK programs are entries of the form:

    pattern { action }

- pattern: some test, either a regular expression or a C-like condition
    - if omitted, the action is applied to every line
- action: a statement or set of statements
    - if not provided, the default action is to print the entire line, much like grep

Form of an AWK program (continued)

- Input files are parsed a record (line) at a time
- Each line is checked against each pattern, in order
- There are 2 special patterns:
    - BEGIN: true before any records are read
    - END: true at end of input (after all records have been read)

Awk Features

- Patterns can be regular expressions or C-like conditions.
- Each line of the input is matched against the patterns, one after the next. If a match occurs, the corresponding action is performed.
- Input lines are parsed and split into fields, which are accessed by $1, ..., $NF, where NF is a variable set to the number of fields.
- The variable $0 contains the entire line, and by default lines are split on white space (blanks, tabs).

Variables

- Not declared, nor typed
- No character type
- Only strings and floats (support for ints)
- $n refers to the nth field (where n is some integer value)

    # prints each field on the line
    for( i=1; i<=NF; ++i )
        print $i

Some Built-in Variables

- FS: the input field separator
- OFS: the output field separator
- NF: the number of fields; changes with each record
- NR: the number of records read so far (so, the current record number)
- FNR: the number of records read so far, reset for each named file
- $0: the entire input line

Example

    $ cat emp.data
    Beth 4.00 0
    Dan 3.75 0
    Kathy 4.00 10
    Mark 5.00 20
    Mary 5.50 22
    Susie 4.25 18

Print pay for those employees who actually worked:

    $ awk '$3>0 {print $1, $2*$3}' emp.data
    Kathy 40
    Mark 100
    Mary 121
    Susie 76.5

Example: CSV file

    $ cat students.csv
    smith,john,js12
    jones,fred,fj84
    bee,sue,sb23
    fife,ralph,rf86
    james,jim,jj22
    cook,nancy,nc54
    banana,anna,ab67
    russ,sam,sr77
    loeb,lisa,guitarHottie

    $ cat getEmails.awk
    #!/bin/awk -f
    BEGIN { FS = "," }
    { printf( "%s's email is: %[email protected]\n", $2, $3 ); }

    $ getEmails.awk students.csv
    john's email is: [email protected]
    fred's email is: [email protected]
    sue's email is: [email protected]
    ralph's email is: [email protected]
    jim's email is: [email protected]
    nancy's email is: [email protected]
    anna's email is: [email protected]
    sam's email is: [email protected]
    lisa's email is: [email protected]

Example: output separator

    $ cat out.awk
    #!/bin/awk -f
    BEGIN { FS = ","; OFS = "-*-"; }
    { print $1, $2, $3; }

    $ out.awk students.csv
    smith-*-john-*-js12
    jones-*-fred-*-fj84
    bee-*-sue-*-sb23
    fife-*-ralph-*-rf86
    james-*-jim-*-jj22
    cook-*-nancy-*-nc54
    banana-*-anna-*-ab67
    russ-*-sam-*-sr77
    loeb-*-lisa-*-guitarHottie

Flow Control

- Awk syntax is much like C
    - Same loops, if statements, etc.
- AWK: Aho, Weinberger, Kernighan
    - Kernighan and Ritchie wrote The C Programming Language, the classic book on C

Associative Arrays

Awk also supports arrays that can be indexed by arbitrary strings.
They are implemented using hash tables.

    Total["Sue"] = 100;

It is possible to loop over all indices that have currently been assigned values. The order of traversal is unspecified.

    for (name in Total)
        print name, Total[name];

Example using Associative Arrays

    $ cat scores
    Fred 90
    Sue 100
    Fred 85
    Sam 70
    Sue 98
    Sam 50
    Fred 70

    $ cat total.awk
    { Total[$1] += $2 }
    END {
        for (i in Total)
            print i, Total[i];
    }

    $ awk -f total.awk scores
    Sue 198
    Sam 120
    Fred 245

Useful one-liners

- Line count:
    awk 'END {print NR}'
- grep:
    awk '/pat/'
- head:
    awk 'NR<=10'
- Add line numbers to a file:
    awk '{print NR, $0}'
    awk '{ printf( "%5d %s\n", NR, $0 )}'

Many more. See the resources tab on the course webpage for links to more.
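
The input-validation use mentioned under Other Common Uses has no example in the slides, so here is a minimal sketch of one way it might look. It assumes records in the emp.data layout (name, hourly rate, hours worked). The sample records, the temporary file, and the message wording are all made up for illustration, and the $1000 threshold is taken from the slide's own example.

```shell
#!/bin/sh
# Hypothetical sample data in the emp.data layout: name, hourly rate, hours.
# The last two records are deliberately bad.
data=$(mktemp)
cat > "$data" <<'EOF'
Beth 4.00 0
Kathy 4.00 10
Mark 5.00
Mary 5.50 -22
EOF

# Each pattern is a C-like condition. "next" skips the remaining rules,
# so a short record is not also tested on a field it does not have.
report=$(awk '
NF != 3   { printf("line %d: expected 3 fields, found %d\n", NR, NF); next }
$2 > 1000 { printf("line %d: hourly wage %s exceeds 1000\n", NR, $2) }
$3 < 0    { printf("line %d: negative hours %s\n", NR, $3) }
' "$data")

printf '%s\n' "$report"
rm -f "$data"
```

With the sample data above, this reports line 3 for having 2 fields and line 4 for negative hours. Well-formed records produce no output, which keeps the report short on large files.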