GT LCC 6310 - Trading Processor Cycles for Communication

This preview shows page 1-2-3-4-5-6 out of 17 pages.

LCC 6310 Computation as an Expressive Medium
Lecture 10: Parsing HTML

Contents
• Parsing HTML
• Accessing an external class library
• The Parser class
• The try/catch block
• NodeLists
• Traversing the hierarchical structure
• Example: find all the images on a page
• Getting tag attributes
• Example: Drawing images you find on a web site
• Example: Only drawing images with the given alt text
• Following links
• Non-disturbing code
• Disturbing code
• Making our disturbing code work
• Example: Following links (recursively)
• Putting the pieces together

Parsing HTML
• Parsing HTML
    • Process and create visualizations of or manipulations of web information
• Learning HtmlParser
    • Available at htmlparser.sourceforge.net
    • A Java package for parsing HTML
• Using external Java libraries within Processing

Accessing an external class library
• Place jars (or class files) within the libraries folder in Processing
    • For example, if the class library is called htmlparser, create an htmlparser/library folder within the libraries folder and add the jar files
    • Jason will go over how to use external class libraries in your exported applets
• Use the import statement to bring the external classes into your program
    • Class libraries live in packages
    • import <packagename>.*; means “make all the classes in <packagename> available for use in my program”
    • Package names can be hierarchical (e.g. org.htmlparser.util)
    • If you get an error that an htmlparser class cannot be found, look in the documentation to see what package you need to import into your program
    • The imported packages in the example code should get you pretty far

The Parser class
• Parser is the main class for parsing the HTML file pointed to by a URL
• What is parsing? To parse an HTML file means to turn the raw text of the page into structured tags that you can process
    • Look for specific tags
    • Get attributes from tags
• One way to parse an HTML file:
    Parser parser = new Parser(<URL>);
    NodeList nodeList = parser.parse(null);
• Parser.parse(NodeFilter filter) returns a NodeList of all HTML nodes (tags) that satisfy the filter
    • The null filter returns a list containing all the tags

The try/catch block
• Exceptions are thrown whenever Java encounters an error situation
    • You’ve probably all run into exceptions, like the NullPointerException
    • A bunch of different exceptions are defined by Java, but programmers can define their own
• When an exception is thrown, it travels up the call stack (the stack of method calls)
• The default behavior for exceptions is for them to travel all the way to the top, where they terminate your program (and print out the exception)
• Sometimes, however, you want to handle an exception yourself and keep on going. You do this with the
    try { <statements> } catch (<ExceptionClass> e) { <exception code> }
    • The idea is that you want your program to keep going, so you write special code to clean up after the error
• You can declare that a method can throw specific exceptions. If you do, any caller of the method must handle the exception with a try/catch (or declare that it throws the same exception)
    • Parser.parse() does this, so we have to handle the exception

NodeLists
• NodeLists contain Nodes, where each node represents a tag or text
• Nodes are hierarchical, just like the structure of HTML documents
    • Let’s look at the top-level tag structure for the syllabus
    for (int i = 0; i < nodeList.size(); i++) {
        Node n = nodeList.elementAt(i);
        if (n instanceof Tag) {
            Tag t = (Tag) n;
            println(t.getTagName());
        }
    }
• Let’s take a look at reading documentation…

Traversing the hierarchical structure
• NodeList Node.getChildren()
    • Get a list of the children nodes of a node
    • So, to search the nodeList for specific nodes, you need to search the top level, then search within the children of the top level, and so forth
• NodeList NodeList.searchFor(Class classType, boolean recursive)
    • If the second parameter is true, it will look in the children lists for you
    • Need to tell it what class of node you’re looking for
    • In Java, classes are themselves objects
    • To get the Class object corresponding to the class for an object, call getClass() on any object

Example: find all the images on a page
• First create a “throw away” instance of the tag you’re looking for; you just need it to get your hands on the class
    ImageTag tempImageTag = new ImageTag();
• Use NodeList.searchFor() to create a list of just the image tags (searching recursively)
    NodeList imageList = nodeList.searchFor(tempImageTag.getClass(), true);

Getting tag attributes
• String Tag.getAttribute(<attribute name>)
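
The recursive search described under "Traversing the hierarchical structure" can be sketched in plain Java without the HtmlParser jar. This is only an illustration of the idea, not the library's implementation: SketchNode, SketchImage and RecursiveSearchSketch are hypothetical stand-ins invented here, and matching with Class.isInstance() is an assumption about how a class-based search might decide what counts as a match.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a parsed node; NOT an HtmlParser class.
class SketchNode {
    final String name;
    final List<SketchNode> children = new ArrayList<>();

    SketchNode(String name) {
        this.name = name;
    }

    // Adds a child and returns this node so calls can be chained.
    SketchNode add(SketchNode child) {
        children.add(child);
        return this;
    }
}

// Hypothetical stand-in for an image tag.
class SketchImage extends SketchNode {
    SketchImage(String name) {
        super(name);
    }
}

class RecursiveSearchSketch {
    // Collects every node that is an instance of classType. When recursive
    // is true, the children of each node are searched as well.
    static List<SketchNode> searchFor(List<SketchNode> nodes, Class<?> classType, boolean recursive) {
        List<SketchNode> found = new ArrayList<>();
        for (SketchNode n : nodes) {
            if (classType.isInstance(n)) {
                found.add(n);
            }
            if (recursive) {
                found.addAll(searchFor(n.children, classType, true));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        SketchNode body = new SketchNode("body")
            .add(new SketchImage("logo"))
            .add(new SketchNode("div").add(new SketchImage("photo")));
        List<SketchNode> topLevel = new ArrayList<>();
        topLevel.add(body);
        topLevel.add(new SketchImage("banner"));

        // A throw-away instance, used only to get hold of its Class object.
        SketchImage temp = new SketchImage("temp");
        for (SketchNode n : searchFor(topLevel, temp.getClass(), true)) {
            System.out.println(n.name);
        }
    }
}
```

Passing false as the last argument restricts the search to the top-level list, which is why the slides pass true when looking for every image on a page.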

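The try/catch points earlier in the lecture can also be tried out in plain Java. The sketch below is a minimal illustration under stated assumptions: PageLoadException, loadPage and loadPageOrDefault are hypothetical names invented here, and loadPage only pretends to do work. It shows a method that declares an exception with throws, and a caller that catches it and keeps going.

```java
// Hypothetical checked exception defined for this sketch; programmers
// can define their own exception classes like this one.
class PageLoadException extends Exception {
    PageLoadException(String message) {
        super(message);
    }
}

class TryCatchSketch {
    // Declares that it can throw a checked exception, so every caller
    // must either catch it or declare it too.
    static int loadPage(String url) throws PageLoadException {
        if (url == null || url.isEmpty()) {
            throw new PageLoadException("no URL given");
        }
        return url.length(); // placeholder for real work
    }

    // Handles the exception itself and keeps going with a fallback value.
    static int loadPageOrDefault(String url, int fallback) {
        try {
            return loadPage(url);
        } catch (PageLoadException e) {
            System.out.println("Could not load page: " + e.getMessage());
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(loadPageOrDefault("http://example.com", -1));
        System.out.println(loadPageOrDefault("", -1));
        System.out.println("Still running after the handled exception.");
    }
}
```

If loadPageOrDefault did not catch the exception and did not declare it either, the code would not compile, which is the same reason a call to Parser.parse() has to sit inside a try/catch.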
