Toronto CSC 309H - Web Programming - Client-Side HTTP

CSC309: Web Programming
Greg Wilson

Web Programming: Client-Side HTTP
Greg Wilson, 2005

Small Pieces, Loosely Joined
- Unix command line was the world's first component object model
- Allowed programmers to build small pieces, then connect them in arbitrary ways
- Key features:
  - Low barriers to entry
  - Common data format
  - Communication protocol

…Loosely Joined
- The web succeeded (in part) because it followed the same model
  - Data format: HTML (now XML)
  - Communication protocol: HTTP
- This lecture looks at how to use HTTP to get data over the web
- Next one looks at how to provide information
- Next week, we'll look at what happens in between

HTTP Review
- Most common protocol on the web is HTTP
- Runs on top of TCP/IP, which provides a reliable stream connection between two endpoints
- HTTP cycle:
  - Client makes connection
  - Sends request (request line, headers, body)
  - Server sends response (similar format)
  - Connection is closed
- Cycle may repeat many times to display one logical page

Fetching Pages
- Opening sockets, constructing HTTP requests, parsing responses, etc. is tedious
- So most languages provide a library for doing it
- Python: urllib.urlopen(URL) does what your browser would do:
  - Parse the URL to figure out who to talk to, and what to ask for
  - Construct request
  - Give calling code something that looks like a file handle so that it can read the response

…Fetching Pages

    import urllib

    input = urllib.urlopen("http://www.third-bit.com/greeting.html")
    lines = input.readlines()
    input.close()
    for line in lines[:5]:
        print line,

…Fetching Pages
- Note: readlines() wouldn't do the right thing if the URL referred to an image
  - Use read() to grab bytes in that case
- Up to the client to do the right thing!

Baby Spiders
- Example: make a list of the links in a web page
  - The first step in building a web spider that can explore the internet on its own
  - That, and a search engine, and you're Google
- Fetch the page, then parse it to extract the links
  - Should use DOM, but many web pages are badly formatted
  - Use regular expressions instead

…Baby Spiders

    import urllib, re
    from sets import Set

    input = urllib.urlopen("http://www.third-bit.com/index.html")
    page = input.read()
    input.close()

    # Find every href="..." attribute, then strip the leading href=" and
    # the trailing quote, using a set to discard duplicates.
    links = re.findall(r'href=\"[^\"]+\"', page)
    temp = Set()
    for x in links:
        temp.add(x[6:-1])
    links = list(temp)
    links.sort()
    for x in links:
        print x

Passing Parameters
- Sometimes want to provide extra information as part of a URL
  - E.g., to specify search terms to Google
- Add parameters to the URL
  - http://www.google.com?q=Python searches for pages related to Python
  - "?" separates parameters from the rest of the URL
  - Each parameter is name=value
  - Multiple parameters separated by "&"
  - Space replaced by "+"

URL Encoding
- But what if you want to include "?" or "&" as part of a URL?
- Encode special characters
  - Yes, it's another escaping mechanism…
- Use %XX, where XX is the hex character code

    Character   Code
    =           %3D
    ;           %3B
    ,           %2C
    +           %2B
    :           %3A
    /           %2F
    @           %40
    ?           %3F
    &           %26
    %           %25

…Passing Parameters
- Example: to search Google for "grade = A+":
  - http://www.google.ca?grade+%3D+A%2B
- Helper functions:
  - urllib.quote(str) replaces special characters
  - urllib.unquote(str) converts back
  - urllib.urlencode(params) takes a list of pairs, or a dictionary, and constructs the entire query parameter string

Web Services
- Suppose you want to write a script that actually does search Google
  - Construct a URL: easy
  - Send it and read response: no problem
  - Parse the response: hm… there's a lot of junk on the page…
- Many first-generation web applications relied on screen scraping
- Problem: whenever the web site changes its layout, the application has to be rewritten

…Web Services
- A proto-solution is to give clients information twice
  - Once in the page body for humans to read
  - Once in the "meta" headers for machines to read
- Next step in evolution:
  - Client says, "I want machine-readable XML, not human-readable HTML"
  - Much easier to parse
  - Much less likely to change over time
- A form of remote procedure call

Let the Shouting Begin
- Two camps:
  - Use existing HTTP for request/response, or
  - Use a new protocol specifically for web services
- Most popular new protocol today is SOAP
  - Simple Object Access Protocol
  - Despite its name, it's anything but simple
  - Credentials, foreign objects, blah blah blah
- There are libraries to hide the details…
- …but debugging can be a nightmare

Let the Shouting Begin
- Two camps:
  - Use existing HTTP for request/response
    - Representational State Transfer (REST)
  - Use a new protocol specifically for web services
- Most popular new protocol today is SOAP
  - Simple Object Access Protocol
  - Despite its name, it's anything but simple
- Local proxies for remote objects
  - Like database abstraction layers
- Debugging can be a nightmare

Amazon
- Amazon was one of the first big players to define a web API
- You need a license key in order to use it
  - Free keys restrict you to one request per second
- Use functions in amazon.py module to search by various criteria
  - Result is a list of objects that match the criteria
- Can now maintain a wishlist programmatically

…Amazon

    import sys, amazon

    # Format multiple authors' names nicely.
    def prettyName(arg):
        if type(arg) in (list, tuple):
            arg = ', '.join(arg[:-1]) + ' and ' + arg[-1]
        return arg

    if __name__ == '__main__':

        # Get information.
        key, asin = sys.argv[1], sys.argv[2]
        amazon.setLicense(key)
        items = amazon.searchByASIN(asin)

…Amazon

        # Handle errors.
        if not items:
            print 'Nothing found for', asin
            sys.exit(1)  # Nothing to display, so stop before indexing items.
        if len(items) > 1:
            print len(items), 'items found for', asin

        # Display information.
        item = items[0]
        productName = item.ProductName
        ourPrice = item.OurPrice
        authors = prettyName(item.Authors.Author)
        print '%s: %s (%s)' % (authors, productName, ourPrice)

Everybody Can Play
- You can write similar code to talk to:
  - Google
  - FedEx
  - eBay/PayPal
  - And on, and on…
- Question 1 of Exercise 2 will ask you to do this
- Odds are good your next employer will as well

Summary
- Human activities have natural timescales
  - Sip of coffee, fresh pot, tomorrow, sometime…
- Real revolutions occur when we move something from one category to another
  - Spreadsheets
  - Desktop publishing
- Web services make it possible for ordinary programmers to create distributed applications without heroic effort
- So,
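
Supplementary sketch for the Passing Parameters and Baby Spiders slides. The slides use the Python 2 urllib module. This version assumes Python 3, where the corresponding helpers live in urllib.parse, and it uses quote_plus rather than quote so that spaces become "+" as the slides describe. The "hl" parameter, the sample HTML string, and the function names build_query and extract_links are made up for illustration. Nothing here touches the network.

```python
import re
from urllib.parse import quote_plus, unquote_plus, urlencode


def build_query(params):
    """Build a query string from a dict or a list of (name, value) pairs."""
    return urlencode(params)


def extract_links(page):
    """Return the sorted, de-duplicated href values found in an HTML string."""
    # Same idea as the slide's regular expression, but a capture group
    # pulls out the value directly instead of slicing off href=" and ".
    return sorted(set(re.findall(r'href="([^"]+)"', page)))


# Encode the slide's example search term, then convert it back.
encoded = quote_plus("grade = A+")
decoded = unquote_plus(encoded)

# "hl" is a hypothetical second parameter, included only to show "&".
query = build_query([("q", "grade = A+"), ("hl", "en")])

# A made-up page fragment with a duplicate link.
sample_page = (
    '<a href="b.html">B</a> '
    '<a href="a.html">A</a> '
    '<a href="b.html">B again</a>'
)
links = extract_links(sample_page)

print(encoded)
print(decoded)
print(query)
print(links)
```

Working on an in-memory string keeps the link extraction separate from the fetch, so the same function could be applied to whatever a page fetch returns once the bytes have been decoded to text.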

