CSE 303: Concepts and Tools for Software Development
Dan Grossman
Winter 2006
Lecture 5: Regular Expressions (and more), grep, other utilities

Dan Grossman CSE303 Winter 2006, Lecture 5

Where are We
• We are done learning this bizarre pseudo-programming language called the shell.
• Today: Specifying string patterns for many utilities, particularly grep and sed.
• Monday: Homework 1 due, no class
• Wednesday: sed
  – needed in one place for homework 2
  – could do that one part manually for now (?)
• Friday: We start learning C.
Note: Start homework 2 early.

Globbing vs. Regular Expressions vs. ...
“Globbing” refers to filename expansion characters.
“Regular expressions” are a different but overlapping set of rules for specifying patterns to programs like grep. (Sometimes called “pattern matching”.)
More distinctions:
• Regular expressions a la CSE322
• “Regular expressions” in grep
• “Regular expressions” in egrep (same as grep -E)
• More subtle distinctions per program...

Real Regular Expressions
Some of the crispest, most elegant, most useful CS theory out there.
What computer scientists know and ill-educated hackers don’t (to their detriment).
A regular expression p may “match” a string s. If p =
• a, b, ..., then p matches that single character.
• p1p2, then p matches s if we can write s as s1s2 where p1 matches s1 and p2 matches s2.
• p1|p2, then p matches s if p1 matches s or p2 matches s (written | in egrep; for grep use \|).
• p1*, then p matches s if there is an i ≥ 0 such that p1 ... p1 (i copies of p1) matches s. (For i = 0, this matches the zero-character string.)
Lots of examples with egrep.

Why this language?
Amazing facts (see 322):
• Exactly the patterns that can be found by a program that can say before it sees its input how much space it needs. (Decide if a 1GB string has a substring that matches...)
• You can write a program that takes two regular expressions and decides if one matches every string the other does.
• ... see CSE322

Conveniences
Lots of “conveniences” do not make the language any more powerful:
• p1+ is just p1p1*
• p1? is just (|p1)
• [zd-h] is just z | d | e | f | g | h
• [^A-Z] and . are long but technically just conveniences.
• p1{n} is just p1 ... p1 (n copies)
• p1{n,} is just p1 ... p1 (n copies) followed by p1*
• p1{n,m} is just p1 ... p1 (n copies) followed by p1? ... p1? (m − n copies)

Beginning and end
Really grep is matching each line against .*p.*.
You can say that is not what you want with ^ (beginning) and $ (end) or both (match whole line exactly).
I can’t think of a good reason to put these characters in the middle of a pattern, but you can.
Fundamentally, we are still in the realm of “real” regular expressions.

Nasty gotchas
• Special characters for one program are not special for another.
• For example, I found \{ for grep but { for egrep.
• Must quote your patterns so the shell does not muck with them, and use single quotes if they contain $.
• Must escape special characters with \ if you need them literally: \. and . are very different.
  – But inside [] less quoting (so backslash becomes literal)!

Previous matches
• Up to 9 times in a pattern, you can group with (p) and refer to the matched text later! (Need backslashes in sed.)
• You can refer to the text (most recently) matched by the nth one with \n.
• Simple example: double-words ^\([a-zA-Z]*\)\1$
• You cannot do this with “real” regular expressions; the program must keep the previous strings.
  – Especially useful with sed because of substitutions.

Other Utilities
Some very useful programs you can learn on your own:
find (search for files, e.g., find /usr -name words)
diff (compare two files’ contents; output is easy for humans and programs to read (see also patch))
wc (word-count (also characters and lines))
Also:
For many programs the -r flag (-R for chmod) makes them recursive (apply to all files, subdirectories, subsubdirectories, ...).
Examples: chmod, cp, diff, rm.
So “delete everything on the computer” is cd /; rm -rf * (be careful!)
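
Trying it out
A small shell sketch of several points from the regular-expression slides: alternation, ^ and $, the p+ convenience, escaping ., and the double-word backreference. The sample words, the variable names, and the temporary file are made up for illustration and are not from the lecture; it assumes mktemp is available and a grep that accepts -E.

```shell
#!/bin/sh
# Sketch only: the sample words, variable names, and temp file are
# made up for illustration; they are not from the lecture.
sample=$(mktemp)
printf '%s\n' abab abc cat dog catalog a.c hellohello > "$sample"

# Alternation. grep really matches .*p.*, so catalog counts too.
alt=$(grep -E -c 'cat|dog' "$sample")            # 3: cat, dog, catalog

# ^ and $ together force a whole-line match.
whole=$(grep -E -c '^(cat|dog)$' "$sample")      # 2: cat, dog

# Convenience: p+ finds the same lines as pp*.
plus=$(grep -E -c '^(ab)+$' "$sample")           # 1: abab
star=$(grep -E -c '^(ab)(ab)*$' "$sample")       # 1: abab

# . matches any character; \. matches only a literal dot.
anychar=$(grep -c 'a.c' "$sample")               # 2: abc, a.c
dot=$(grep -c 'a\.c' "$sample")                  # 1: a.c

# Previous matches: the double-word pattern from the slide (plain grep).
double=$(grep -c '^\([a-zA-Z]*\)\1$' "$sample")  # 2: abab, hellohello

rm -f "$sample"
echo "$alt $whole $plus $star $anychar $dot $double"
```

Every pattern is in single quotes so the shell leaves $, |, and \ alone.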