CSE 399-004, Spring 2006
Python Programming
Handout 4
www.seas.upenn.edu/~cse39905

Course Plan
• Part I: Fundamentals
  • Syntax
  • Data Structures
  • Basic Functional Programming
• Part II
  • Regular Expressions
  • Object-Oriented Programming
  • Lazy Functional Programming
• Part III: Special Topics
  • AI Search
  • Comparisons w/ Ruby/OCaml/Scheme
  • Unicode and Multilinguality

Today
• Command-line Arguments
• Basic Networking
• String Formatting


Command-Line Arguments

Simple Arguments
• sys.argv
• C: int main (int argc, char **argv)
• Java: public static void main (String argv[])

test.py:

    import sys
    for arg in sys.argv:
        print arg

    $ python test.py
    test.py
    $ python test.py abc def
    test.py
    abc
    def
    $ python test.py --help
    test.py
    --help
    $ python test.py -m kant.xml
    test.py
    -m
    kant.xml

• argv[0] is always the program itself (like C, but unlike Java)

Optional Arguments
• getopt
• example command line: turnin -c cse39905 -p hw1 a.py b.py

    >>> import getopt
    >>> arglist = '-a -b -cfoo -d bar a1 a2'.split()
    >>> arglist
    ['-a', '-b', '-cfoo', '-d', 'bar', 'a1', 'a2']
    >>> opts, args = getopt.getopt(arglist, 'abc:d:')
    >>> opts
    [('-a', ''), ('-b', ''), ('-c', 'foo'), ('-d', 'bar')]
    >>> args
    ['a1', 'a2']
    >>> getopt.getopt("a b -c d".split(), "abc:d")
    ([], ['a', 'b', '-c', 'd'])
    >>> getopt.getopt("-a b -c".split(),"a:c:")
    getopt.GetoptError: option -c requires argument

• opts holds the options; args holds the additional arguments
• options always precede additional arguments: parsing stops at the first non-option argument

Long Option Names

    >>> s = '--condition=foo --testing --output-file \
    ... abc.def -x a1 a2'
    >>> args = s.split()
    >>> args
    ['--condition=foo', '--testing', '--output-file', 'abc.def', '-x', 'a1', 'a2']
    >>> optlist, args = getopt.getopt(args, 'x',
    ...     ['condition=', 'output-file=', 'testing'])
    >>> optlist
    [('--condition', 'foo'), ('--testing', ''), ('--output-file', 'abc.def'), ('-x', '')]
    >>> args
    ['a1', 'a2']

Typical Example

    import getopt, sys

    def main():
        try:
            opts, args = getopt.getopt(sys.argv[1:], "ho:v",
                                       ["help", "output="])
        except getopt.GetoptError:
            # print help information and exit:
            usage()
            sys.exit(2)
        output = None
        verbose = False
        for o, a in opts:
            if o == "-v":
                verbose = True
            if o in ("-h", "--help"):
                usage()
                sys.exit()
            if o in ("-o", "--output"):
                output = a
        # ...

    if __name__ == "__main__":
        main()


Basic Networking

Fetching a Web Page

    >>> import urllib2
    >>> url = 'http://tycho.usno.navy.mil/cgi-bin/timer.pl'
    >>> for line in urllib2.urlopen(url):
    ...     if 'EDT' in line:
    ...         print line
    ...
    <BR>Apr. 03, 10:27:28 AM EDT

The page as rendered in a browser:

    US Naval Observatory Master Clock Time
    Apr. 03, 14:27:28 UTC
    Apr. 03, 10:27:28 AM EDT
    Apr. 03, 09:27:28 AM CDT
    Apr. 03, 08:27:28 AM MDT
    Apr. 03, 07:27:28 AM PDT
    Apr. 03, 06:27:28 AM AKDT
    Apr. 03, 04:27:28 AM HAST
    Time Service Department, US Naval Observatory

The page source:

    <html><body><TITLE>What time is it?</TITLE><H2> US Naval Observatory Master Clock
    <BR>Apr. 03, 14:27:28 UTC
    <BR>Apr. 03, 10:27:28 AM EDT
    <BR>Apr. 03, 09:27:28 AM CDT
    <BR>Apr. 03, 08:27:28 AM MDT
    <BR>Apr. 03, 07:27:28 AM PDT
    <BR>Apr. 03, 06:27:28 AM AKDT
    <BR>Apr. 03, 04:27:28 AM HAST
    </H3></B><P><A HREF="http://tycho.usnoObservatory</A></body></html>

Sending Emails

    >>> import smtplib
    >>> server = smtplib.SMTP('smtp.seas.upenn.edu')
    >>> server.sendmail('[email protected]','[email protected]', "hi")
    {}
    >>> server.quit()

Six Degrees of Separation
• HW 3 Problem 1, involving
  • parsing HTML
  • using regular expressions
  • depth-first search
• Input: command-line arguments:
  • [-d max] [-h] [--help] URL1 URL2
  • default max is 6
• Output:
  • shortest path within max links, or
  • “unreachable within max links”

[Diagram of a chain of linked pages. Labels: Penn, Directories, Personal Pages, SEAS Personal Pages, my homepage (www.cis.upenn.edu/~lhuang3), James W. (www.seas.upenn.edu/~jswalker)]


Regular Expressions
part of this is based on the “Regular Expression Howto”
http://www.amk.ca/python/howto/regex/

String Pattern Matching
• Unix shell patterns: ls *.txt or ls hw?.p*
• Python regular expressions are different: (), *, +, ?
• *: repeating 0 or more times
  • ab*d matches ad, abd, abbd, ...
  • a(bcd)*d matches ad, abcdd, abcdbcdd, ...
• +: repeating 1 or more times
  • ab+d matches abd, abbd, ...
  • a(bcd)+d matches abcdd, abcdbcdd, ...
• ?: 0 or 1 times

Character Class
• | means “or”: (aa|bb) matches aa or bb
• [abc] matches a, b, or c
  • or simply [a-c]
  • equivalent to (a|b|c)
• [abc]+ matches a, b, c, aa, ab, ac, ba, bb, bc, ca, cb, cc, ...
• [^5] matches any char except 5
• [^0-9] matches any char except a digit
• a[bcd]* matches many more strings than a(bcd)*

Matching is Greedy
• by “matching” we mean matching the beginning portion of a string
  • a(bc)+ matches the prefix abcbc of abcbcd
• greedy search with backtracking
  • a(bcd)*b matches all of abcdb, the prefix abcdb of abcdbcd, and the prefix ab of abcd
• exercise: try to match the pattern a[bcd]*b against the string abcbd

Escape Characters
• characters with special meanings
  • . ^ $ * + ? { } [ ] \ | ( )
• \(ab\) matches (ab)
• . matches any single character (except a newline, by default)
• .* matches any string
• \\ matches \
• ^ matches the beginning of a line or string
  • not the ^ inside char-classes [^...]
• $ matches the end of a line or string
• a[bcd]*b$ does not match the string abcbd

Special Char Classes
• [\s,.] matches any whitespace character, “,”, or “.”
• \b means word boundary (zero-length): \b\w+\b matches a single word (actually \b\w+ is enough)

Performing Matches

    >>> import re
    >>> re.match('[a-z]+', "")        # returns None, so nothing is echoed
    >>> p = re.compile('[a-z]+')
    >>> p
    <_sre.SRE_Pattern object at 80c3c28>
    >>> p.match("")
    >>> print p.match("")
    None
    >>> m = p.match( 'tempo')
    >>> print m
    <_sre.SRE_Match object at 80c4f68>

• the compiled version is faster for repeated use
• match() returns None if it fails, or a match object if it succeeds

Match vs. Search
• match() determines if the pattern matches at the beginning of a string
• search() scans through the string to see if any substring matches

    >>> print p.match('::: message')
    None
    >>> m = p.search('::: message')
    >>> print m
    <re.MatchObject instance at 80c9650>
    >>> m.group()
    'message'
    >>> m.span()
    (4, 11)

findall() vs. finditer()
• findall() returns a list of all substrings that match
• finditer() returns an iterator of match objects

    >>> p = re.compile('\d+')
    >>> s = '12 drummers, 11 pipers, 10 lords'
    >>> p.findall(s)
    ['12', '11', '10']
    >>> iterator = p.finditer(s)
    >>> iterator
    <callable-iterator object at 0x401833ac>
    >>> for match in iterator:
    ...     print match.span()
    ...
    (0, 2)
    (13, 15)
    (24, 26)

Groups

    >>> import re
    >>> p = re.compile(r'(\w+)\s+(\d+)')
    >>> s = " I teach cse 399 and cis 500. "
    >>> p.findall(s)
    [('cse', '399'), ('cis', '500')]
    >>> for m in p.finditer(s):
    ...     print m.group(), m.groups()
    ...
    cse
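
The regular-expression claims in this handout (greedy matching with backtracking, match() vs. search(), findall() vs. finditer(), and groups) can be checked in one short script. This is only a sketch and is not part of the original slides. It assumes Python 3, so print is a function rather than the Python 2 statement used in the transcripts, and the variable names are chosen here for illustration.

```python
import re

# Greedy matching with backtracking: a(bcd)*b tried against three strings.
# match() only reports the matched prefix, which may be shorter than the string.
pattern = re.compile(r'a(bcd)*b')
greedy_results = {s: pattern.match(s).group()
                  for s in ('abcdb', 'abcdbcd', 'abcd')}

# match() anchors at the start of the string; search() scans for any substring.
word = re.compile(r'[a-z]+')
at_start = word.match('::: message')     # no lowercase letter at position 0
anywhere = word.search('::: message')    # finds 'message' later in the string

# findall() returns the matched substrings; finditer() yields match objects.
digits = re.compile(r'\d+')
s = '12 drummers, 11 pipers, 10 lords'
numbers = digits.findall(s)
spans = [m.span() for m in digits.finditer(s)]

# With groups in the pattern, findall() returns one tuple of groups per match.
pairs = re.compile(r'(\w+)\s+(\d+)').findall(' I teach cse 399 and cis 500. ')

print(greedy_results)
print(at_start, anywhere.group(), anywhere.span())
print(numbers, spans)
print(pairs)
```

Because match() is anchored only at the start, a pattern can succeed on a prefix while leaving the rest of the string unconsumed; adding $ to the pattern, as in a[bcd]*b$, is one way to require that the whole string be matched.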