ODU CS 791 - Long Term Preservation

Long Term Preservation of Digital Data
Raymond A. Lorie
JCDL '01, June 24-28, 2001

18 March 2004 ODU Spring 2004 CS-891 Digital Data Preservation

Overview
• A proposal (IBM to Koninklijke Bibliotheek)
  – Save the original "executable" object
  – Save a specification of how to extract data from the object
  – Encapsulate enough information to allow the creation of an extraction program in the future
• Provides a starting point

Size of the Problem to Address
• Multiple levels of document complexity
  – Simple linear data, single data type
  – Moderately complex data, multiple data types and some arbitrary structure
  – Complex data relationships requiring preservation of the environment
• Moderately complex data proposed for demonstration

Graphic of Proposed Solution
(graphic not reproduced in the text preview)

What Happens Now?
• Metadata are created that describe all data in the file (based on an XML model)
• Methods are added that, when given the file as input, produce the original output
• Methods are based on a "Universal Virtual Computer" (UVC)

What Happens in the Future?
• Specifications for the UVC are "well known"
• A UVC is created in accordance with some version level
• The UVC "reads" the file and creates the original output
• Allows future users to make queries against the document

So What Happened Next?
• Original reading was a proposal
• Follow-up reading was a test case
• "The UVC: A Method for Preserving Digital Documents, Proof of Concept"
  – IBM/KB Long Term Preservation Study
  – December 2002
  – Raymond Lorie
  – ISBN: 90-6259-157-4
  – http://www.kb.nl/kb/hrd/dd/dd_onderzoek/reports/4-uvc.pdf

PDF Document Type Was Selected
• "… because of its importance in the publishing community. …"
• Difficulty extracting textual information from the encoded file
  – Letter "A" is not stored as an ASCII A
  – Parameters are stored to allow an "A" to be drawn

Clever Solution to Solve Text Extraction
(graphic not reproduced in the text preview)

How it Works:
• GSview is a graphical interface for Ghostscript. Ghostscript is an interpreter for the PostScript page description language used by laser printers.
• PDF is printed to a PostScript file for GSview to read
• goBCL converts PDF files to HTML.
• Application merges GSview images with HTML-like tags.
• Allows text queries to display the related page.

How well did it work?
• Didn't state how many files were converted
• Identified a few bugs in goBCL
• Alluded to problems decoding JPEG files
• Executed queries
• Claimed success

Miscellanea
• Appendix with notational UVC architecture
• Appendix with macros to support UVC software development
• Appendix containing a logical view of a PDF document

Additional Links
• Lorie appears to have published a fair amount about relational database systems
• A list of Lorie's publications
  – http://www.informatik.uni-trier.de/~ley/db/indices/a-tree/l/Lorie:Raymond_A=.html
• Yet another UVC paper (15 June 2001)
  – http://www.rlg.org/preserv/diginews/diginews5-3.html
• Page with all sorts of preservation
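
Illustrative sketch (not from the slides or from Lorie's papers): the idea in "What Happens Now?" and "What Happens in the Future?" is that the archive stores the data together with a decoder written for a simple, well-specified virtual computer, so a future user only has to re-implement that small machine. The toy instruction set, the record format, and the names run_uvc, DECODER, and ARCHIVED below are all hypothetical; the real UVC instruction set is defined in the cited reports and is not reproduced here.

```python
# Toy stand-in for a "Universal Virtual Computer": a tiny interpreter that a
# future user could rebuild from a short written specification.
# Everything here (opcodes, record layout, names) is an assumption made for
# illustration; it is NOT the UVC architecture from the IBM/KB study.

def run_uvc(program, data):
    """Run a decoder program over archived bytes; return (tag, text) pairs."""
    regs = [0, 0]      # two general-purpose registers
    pos = 0            # read position in the archived data
    pc = 0             # program counter
    buf = []           # characters collected for the current element
    out = []           # logical view handed back to the caller
    while True:
        op, *args = program[pc]
        pc += 1
        if op == "READ":            # next input byte, or -1 at end of data
            if pos < len(data):
                regs[args[0]] = data[pos]
                pos += 1
            else:
                regs[args[0]] = -1
        elif op == "JNEG":          # jump if register is negative
            if regs[args[0]] < 0:
                pc = args[1]
        elif op == "JZ":            # jump if register is zero
            if regs[args[0]] == 0:
                pc = args[1]
        elif op == "JMP":           # unconditional jump
            pc = args[0]
        elif op == "DEC":           # decrement register
            regs[args[0]] -= 1
        elif op == "EMITC":         # append register value as a character
            buf.append(chr(regs[args[0]]))
        elif op == "FLUSH":         # emit the collected text under a tag
            out.append((args[0], "".join(buf)))
            buf.clear()
        elif op == "HALT":
            return out
        else:
            raise ValueError(f"unknown instruction: {op}")


# Decoder stored alongside the data. It understands one hypothetical format:
# a sequence of records, each a length byte followed by that many characters.
DECODER = [
    ("READ", 0),        # 0: r0 = record length
    ("JNEG", 0, 9),     # 1: end of data -> halt
    ("JZ", 0, 7),       # 2: record finished -> flush
    ("READ", 1),        # 3: r1 = next character
    ("EMITC", 1),       # 4: collect it
    ("DEC", 0),         # 5: one fewer character remaining
    ("JMP", 2),         # 6: loop over the record
    ("FLUSH", "line"),  # 7: emit the record as a tagged element
    ("JMP", 0),         # 8: next record
    ("HALT",),          # 9: done
]

# Archived bit stream in the hypothetical format above.
ARCHIVED = bytes([5]) + b"Hello" + bytes([3]) + b"UVC"

if __name__ == "__main__":
    for tag, text in run_uvc(DECODER, ARCHIVED):
        print(tag, text)
```

The design point of the sketch is the split of responsibilities: the format-specific knowledge lives in DECODER, which is archived today, while run_uvc is the only piece that has to be rewritten for a future machine. The sketch assumes well-formed input and does no error handling for truncated records.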

