UW CSE 303 - Lecture Notes


Outline
• HW 2A in focus
• HW 2A
• Editors
• ed/sed
• Example ed session [Wikipedia]
• sed: non-interactive ed
• What is a regular expression?
• Regular expressions
• egrep and regexes
• Basic regexes
• Wildcards and anchors
• Special characters
• Quantifiers: * + ?
• More quantifiers
• Character sets
• Character ranges
• sed
• more about sed
• Back-references
• loops
• Debugging
• Performance
• Algorithmic complexity
• Slide 24
• Wednesday
• Questions?

David Notkin, Autumn 2009, CSE303 Lecture 6

HW 2A in focus
    anagram                            raga Man
    computer science (&) engineering   epic reticence ensuring gnome
    notkin beard                       drank one bit

HW 2A
• Understanding vs. doing: this is not a straightforward barrier to overcome, and it doesn’t happen all at once
• Breaking down solutions into parts is crucial
  - Edsger W. Dijkstra, ACM Turing Lecture 1972, “The Humble Programmer”
• Some amount of “finding things on your own” is essential; perhaps I expected too much of this for this assignment
  - sed and regular expressions
  - loops
  - …
• Performance: not the high priority
  - Linear vs. quadratic (or worse) complexity of the script

Editors
• To write and change your programs you should be using an editor; what’s the alternative?
• The most common editors on Unix are pico, emacs and vi: pico is simple, emacs is (arbitrarily) complex, and vi is still loved by many old-time Unix users
• You don’t need to become an expert in these, but it’s worth an investment to become capable
CSE303 Au09

ed/sed
• But you didn’t mention sed on the previous slide? Isn’t it a “stream editor”?
  Indeed it is.
• It’s closely related to ed, a line editor from the first days of Unix
  - ed let you interactively edit lines: changing parts of specific lines (referred to by number and/or by content), inserting and deleting lines, etc.

Example ed session [Wikipedia]
    a
    ed is the standard Unix text editor.
    This is line number two.
    .
    2i

    .
    %l
    ed is the standard Unix text editor.$
    $
    This is line number two.$
    3s/two/three/
    ,l
    ed is the standard Unix text editor.$
    $
    This is line number three.$
    w text
    65
    q
The end result is a simple text file containing the following text:
    ed is the standard Unix text editor.

    This is line number three.

sed: non-interactive ed
• But sometimes you wanted to use ed-like features, in particular regular expression matching, non-interactively
• That’s what sed is for: using ed-like commands on a string to do transformations that are hard or impossible to do with tr, etc.
• A core feature is the use of regular expressions; these are powerful and found in other Unix tools, most notably grep

What is a regular expression?
• "[a-zA-Z_\-]+@(([a-zA-Z_\-])+\.)+[a-zA-Z]{2,4}"
• regular expression: a description of a pattern of text
  - can test whether a string matches the expression's pattern
  - can use a regex to search/replace characters in a string
  - regular expressions are powerful but can be tough to read
• the above regular expression matches basic email addresses

Regular expressions
• Appear throughout computer science, in tools, in theory, in practice
• Powerful enough to be very useful; other kinds of matching require more powerful languages than regular expressions, but they are more complex
• Lots of variations, but all have the same “power”; that is, they can match the same patterns, although the expressions themselves may be more or less complicated

egrep and regexes
    egrep "[0-9]{3}-[0-9]{3}-[0-9]{4}"

    command   description
    egrep     extended grep; uses regexes in its search patterns; equivalent to grep -E

Basic regexes
• The simplest regexes
  simply match a particular substring: "abc"
• Matches any line containing "abc"
  - YES: "abc", "abcdef", "defabc", ".=.abc.=.", ...
  - NO: "fedcba", "ab c", "AbC", "Bash", ...

Wildcards and anchors
• . (a dot) matches any character except \n
  - ".oo.y" matches "Doocy", "goofy", "LooPy", ...
  - use \. to literally match a dot . character
• ^ matches the beginning of a line; $ the end
  - "^fi$" matches lines that consist entirely of "fi"
• \< demands that pattern is the beginning of a word; \> demands that pattern is the end of a word
  - "\<for\>" matches lines that contain the word "for"

Special characters
• | means or
  - "abc|def|g" matches lines with "abc", "def", or "g"
• precedence of ^(Subject|Date): vs. ^Subject|Date:
• There's no and symbol. Why not?
• () are for grouping
  - "(Homer|Marge) Simpson" matches lines containing "Homer Simpson" or "Marge Simpson"
• \ starts an escape sequence: many characters must be escaped to match them: / \ $ . [ ] ( ) ^ * + ?

Quantifiers: * + ?
• * means 0 or more occurrences
  - "abc*" matches "ab", "abc", "abcc", "abccc", ...
  - "a(bc)*" matches "a", "abc", "abcbc", "abcbcbc", ...
  - "a.*a" matches "aa", "aba", "a8qa", "a!?_a", ...
• + means 1 or more occurrences
  - "a(bc)+" matches "abc", "abcbc", "abcbcbc", ...
  - "Goo+gle" matches "Google", "Gooogle", "Goooogle", ...
• ? means 0 or 1 occurrences
  - "Martina?" matches lines with "Martin", "Martina"
  - "Dan(iel)?"
    matches lines with "Dan" or "Daniel"

More quantifiers
• {min,max} means between min and max occurrences
  - "a(bc){2,4}" matches "abcbc", "abcbcbc", or "abcbcbcbc"
• min or max may be omitted to specify any number
  - "{2,}" means 2 or more
  - "{,6}" means up to 6
  - "{3}" means exactly 3

Character sets
• [ ] group characters into a character set; will match any single character from the set
  - "[bcd]art" matches strings containing "bart", "cart", or "dart"
  - equivalent to "(b|c|d)art"

Character ranges
• Specify a range of characters with -
  - "[a-z]" matches any lowercase letter
  - "[a-zA-Z0-9]" matches any lower- or uppercase letter or digit
• an initial ^ inside a character set negates it
  - "[^abcd]" matches any character other than a, b, c, d
• inside a character set, a literal - needs care: egrep and sed (POSIX syntax) treat \ inside [ ] as an ordinary character, so put the - first or last; many programming languages accept \- instead
  - "[+-]?[0-9]+" matches optional + or -, followed by at least one digit

sed
• Usage:
  - sed -r "s/REGEX/TEXT/g" filename
• substitutes (replaces) occurrence(s) of regex with the given text
• if filename is omitted, reads from standard input
• sed has other uses, but most can be emulated with substitutions
• Example (replaces all occurrences of 143 with 303):
  - sed -r "s/143/303/g" lecturenotes.txt

    command   description
    sed       stream editor; performs regex-based replacements and alterations on input

more about sed
• sed is line-oriented; processes input a line at a time
• -r option makes regexes work better
  - recognizes ( ) , [ ] , * , + the “right” way, etc.
• g flag after last / matches all
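
Several of the egrep and sed patterns from these slides can be tried directly in a shell. The following is a minimal sketch, assuming GNU grep and GNU sed (needed for sed -r). The helper name matches and all sample strings are invented for illustration and are not part of the lecture.

```shell
#!/bin/sh
# Hypothetical helper (not from the lecture): succeeds if the line in $2
# matches the extended regex in $1. grep -E is equivalent to egrep.
matches() {
    printf '%s\n' "$2" | grep -E -q -- "$1"
}

# Quantifiers and grouping
matches 'a(bc)+' 'abcbc'     && echo 'a(bc)+ matches abcbc'
matches 'a(bc){2,4}' 'abc'   || echo 'a(bc){2,4} does not match abc'

# Anchors: the whole line must be exactly "fi"
matches '^fi$' 'fig'         || echo '^fi$ does not match fig'

# Phone-number pattern from the egrep slide (sample number is made up)
matches '[0-9]{3}-[0-9]{3}-[0-9]{4}' 'call 206-555-0100' && echo 'phone number found'

# Substitution from the sed slide, reading from standard input;
# the g flag replaces every occurrence on the line.
replaced=$(printf '%s\n' 'CSE 143 and more 143' | sed -r 's/143/303/g')
echo "$replaced"   # prints: CSE 303 and more 303
```

Piping a single test string into grep -E -q keeps each check to one line and avoids needing a sample file; with a real file, the file name would simply follow the pattern, as in the sed usage line.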

