ODU CS 791 - Lecture Notes

Outline
- Transparent Format Migration of Preserved Web Content
- Format Migration
- Format Obsolescence of Web Content
- Migration of Obsolete Formats
- Migration Issues
- The LOCKSS System
- LOCKSS Format Migration
- Proof of Concept
- HTTP Format Negotiation
- Format Negotiation Examples
- Format Negotiation Illustration
- Future Work for LOCKSS
- TOM (Typed Object Model)
- TOM
- TOM Applications
- JHOVE
- JHOVE Use in Repository
- JHOVE and LOCKSS
- Conclusion

Transparent Format Migration of Preserved Web Content
D. S. H. Rosenthal, T. Lipkis, T. S. Robertson, S. Morabito
D-Lib Magazine, 11(1), 2005
http://www.dlib.org/dlib/january05/rosenthal/01rosenthal.html
Slides by Frank McCown
Old Dominion University
March 17, 2005

Format Migration
- What is it?
  - Conversion of an older digital object (DO) format to a current format
- What other major digital preservation strategy could be used?
  - Emulation: the original DO format is preserved and presented to the user
- When should a DO be migrated to a new format?
  - Format change does not imply obsolescence

Format Obsolescence of Web Content
- A Web format is obsolete when widely used browsers can no longer present the content
- Backwards compatibility of browsers is a must
  - HTML 4 vs. XHTML
- Old Web formats die slowly
  - How many can you think of?
- Emulation is difficult to implement
  - Find older browser, original plug-in, etc.

Migration of Obsolete Formats
- Three migration points
  - Migration on ingest: convert all incoming objects into a selected format before preserving
  - Batch migration: convert all preserved objects into a new format when the preserved format is perceived to be obsolete
  - Migration on access: convert a preserved object into a new format on-the-fly when requested by a user

Migration Issues
- Keep the original format in case the conversion tool is later found to have a bug or to have lost vital information when converting
- The conversion tool should be preserved, both to document the original format and in case a bug is found in the tool
- Choose the migration format wisely: it can significantly reduce the need for, and cost of, future migrations

The LOCKSS System
- LOCKSS [1]: Lots Of Copies Keep Stuff Safe™
- Developed at Stanford University
- Open source, P2P software
- Used by libraries to ensure that web-accessible content (e-journals and open access material) remains available at all times
- Each peer collects material to preserve by crawling the publisher’s web site
- Peers continually perform content consistency checks and repair content when needed
- Preserved material is transparently presented to the user (using a web proxy) if the publisher’s copy is not available
- Currently used by 80 libraries worldwide
[1] http://lockss.stanford.edu

LOCKSS Format Migration
- Plug-in format converter registers input/output MIME types
  - IANA MIME types: http://www.iana.org/assignments/media-types/
- LOCKSS web proxy uses plug-in converters to perform on-the-fly conversion of obsolete formats (migration on access)
- Converters are preserved along with web content among peers

Proof of Concept
- Convert “obsolete” GIF images to PNG
- Proxy Web server prevents MIME type image/gif from matching any Accept: header
- Mismatch prompts conversion, so content is delivered using the original URL but with Content-Type: image/png
Images from Fig. 1 and 2 at http://www.dlib.org/dlib/january05/rosenthal/01rosenthal.html

HTTP Format Negotiation
- Browser can tell a web server a format is obsolete by telling it not to send that format
- HTTP/1.1 [1] defines how web servers and client browsers negotiate the format, language, and encoding of web content
- Browser sends request using Accept: header listing acceptable MIME types of content format
[1] http://www.w3.org/Protocols/rfc2616/rfc2616.html

Format Negotiation Examples
- Accept: text/plain;q=0.5, text/xml;q=0.8, text/html
  - “I prefer text/html first, text/xml second, and finally text/plain.”
- */*;q=0.1
  - “If you can’t give me what I want, give me what you have.”
- image/*, image/gif;q=0
  - “Send me any kind of image except GIFs.”
- NOTE: q=0 semantics are not actually defined in HTTP/1.1

Format Negotiation Illustration
[Diagram: Browser, LOCKSS Proxy, Web Server]
- HTTP Request from the browser: Accept: */*;q=0.1,image/gif;q=0
  - “I’ll take whatever you have except obsolete GIF images.”
- LOCKSS Proxy: GIF, GIF to PNG Converter, PNG
  - “All I have are GIFs. I’ll convert them to a format the browser can handle.”
- HTTP Response to the browser: Content-Type: image/png

Future Work for LOCKSS
- Replace the proof-of-concept implementation with a complete implementation with an API for plug-in converters
- Use a format migration service like TOM
- Use JHOVE format metadata extraction and validation technology to improve the quality of format metadata

TOM (Typed Object Model)
- Came from John Ockerbloom’s Ph.D. thesis at Carnegie Mellon [1]
- Currently managed by developers at the Univ. of Pennsylvania Library, led by Ockerbloom
- Addresses the growing number of new and obsolete data formats, which makes using digital information problematic
- TOM makes it possible to
  - Explain a data format
  - Interpret the format for proper data extraction
  - Convert the format into other formats
[1] http://tom.library.upenn.edu/pubs/thesis/

TOM
- Two components
  - Data model that describes data formats and the operations that can be performed on them
  - Networked software that supports the description and operations of the data formats
- Figure from http://tom.library.upenn.edu/intro.html

TOM Applications
- TOM example broker
  - http://tom.library.upenn.edu/cgi-bin/typebrowse/showtype?broker=tom%2elibrary%2eupenn%2eedu&
- TOM Conversion Service
  - http://tom.library.upenn.edu/convert/
  - Could be used by LOCKSS for format migration
- Fred (Format Registry Demonstration)
  - http://tom.library.upenn.edu/fred/

JHOVE
- JSTOR/Harvard Object Validation Environment [1]
- Provides functions to perform format-specific identification, validation, and characterization of digital objects
  - Identification: What is the format of my digital object?
  - Validation: Is my digital object really of type X?
  - Characterization: What are the significant properties of my digital object of type X?
- GIF example: http://hul.harvard.edu/jhove/gif-hul.html
[1] http://hul.harvard.edu/jhove/

JHOVE Use in Repository
- Figure from http://hul.harvard.edu/jhove/
- Submission Information Package (SIP) - OAIS

JHOVE and LOCKSS
- JHOVE generates reliable format metadata
- LOCKSS can use JHOVE to extract quality metadata about the contents of its repository
- What if the object to store is not valid?
- It may be easier to write a conversion tool using JHOVE to supply format metadata

Conclusion
- Goal is to ensure obsolete formats will not make current LOCKSS content

