ODU CS 791 - Long Term Preservation


"...let us save what remains: not by vaults and locks which fence them from the public eye and use in consigning them to the waste of time, but by such a multiplication of copies, as shall place them beyond the reach of accident."
Thomas Jefferson to Ebenezer Hazard, Philadelphia, February 18, 1791.

Long Term Preservation
Why It Needs to Be Cheap and How To Make It So.

Setting the Scale
• Years is not enough
• Generational time scale
• Paper can last hundreds or thousands of years
• Running hard drives last several years
• CDs may last decades

Paper
• Preserving paper for centuries
• Libraries
  – Build local collections
  – Share content
• Documents persist for centuries
  – Religious texts
  – Political documents

Digital Publications
• Increasingly are the version of record
• Often, sole version of record
• Change rapidly or disappear, with no warning
• Have adjuncts – e.g., hyperlinks, virtual models
Failure to collect digital artifacts will create a growing “dark age” of our times

Web’s Impact on Libraries
Libraries now
• Lease subscription materials
• Access free materials
Libraries unable to
• Own collections
• Fill memory organization role

Cost and Preservation
• Preservation is about planning for the future
• Cost is often tied up in the present
• Preservation must be “pre-paid”
• Budget cuts
  – Easier to justify cutting something not currently needed than something in use

Bottom Line
If a preservation solution is too expensive and not immediately useful, it will fail.

Make It Cheap

Preservation Costs
• Hardware
  – Storage space
  – Machines to access it
• People
  – Maintain the hardware
  – Deal with Security
• Software

Cut Hardware Cost
• Use cheap machines
  – Possible problems with reliability
• Be flexible in configuration
  – Purchased cheaply
  – Repurpose old machines

Cut People Costs
• Reduce the need for a Sys Admin
• Automate as much as possible
• Make the system secure out of the box
  – Security people are expensive
  – Security breaches are also expensive

Cut Software Costs
• Open Source
  – Users can maintain software
  – Easy to alter for local needs
  – Cost spread over institutions that support it

Share the Costs
• Build network of preservation machines
• One participant leaving the network won’t break it
• Each participant only pays a small amount of the total cost
• Shared development

How LOCKSS Works

LOCKSS
• Open source
• Peer to peer
• Persistent access preservation system
• Web delivered information
Production: Released April 2004
Support: Mellon, NSF, Stanford Libraries
Software: www.sourceforge.net

LOCKSS Caches
• Crawls and collects HTTP content
  – All formats (PDF, HTML, JPEG, TIF, Audio, Video)
• Preserves content integrity
  – Independent collection
  – Cooperate to audit and repair damage
• Provides access
  – Via web browser
  – Content is never “dark”

Approximate Data Flows
[Diagram: LOCKSS machines]

Approximate Data Flows
[Diagram: LOCKSS machines (proxy servers). Prevent the publisher from revoking access rights to back content]

Structure of Implementation
• Platform
  – PC into preservation appliance
  – Cheap to administer & run
• Daemon
  – Cooperate to detect & repair damage
  – Proxy cache gets content to readers
• Plug-ins
  – Adapts system to publisher

Platform Overview
• Specially configured OpenBSD:
  – Boots and runs from CD
  – Downloads updates automatically
• Security is top priority:
  – Signatures verified for all software
• Low cost to administer & run:
  – Less than 1 hour per month
  – 95% of systems patched in 48hrs

Hardware Costs
http://www.almaden.ibm.com/sst/html/leadership/g05.htm
HDD prices decline by 50% a year

Terabytes of E-Journals
Median e-journal size is approximately 0.5 GB per year
1 Terabyte (1000 GB) = 2000 journal years

Year   J-yr storage   TB/PC   J-yrs/PC
2003   $0.70           0.54      1,080
2004   $0.35           1.44      2,880
2005   $0.28           2.88      5,760
2006   $0.14           5.76     11,520
2007   $0.07          11.52     23,000

Daemon: What It Does
• Collect content:
  – Crawl publisher with help from plug-in
• Preserve content:
  – Compare content with other peers
  – Repair from other peers if damaged
• Distribute content:
  – Act as proxy cache for readers
  – Deliver publisher version if available

Publisher Permission
LOCKSS Crawler
Slowly collect e-journals
• Publisher manifest
  – List top level URLs/volume on a web page
  – Include URLs for ‘front matter’, etc.
  – Descriptive metadata
  – Grant permission volume by volume
[Slide labels: optional, required]

Permission - Publisher Manifest

Daemon: Preserve Content
• At intervals, ask peers to vote
• Vote = hash of content:
  – random challenges prevent cheating
• Tally agreeing & disagreeing votes
  – If agree votes win, copy is OK
  – If disagree votes win, copy damaged

Daemon: Locate Damage
• Top-level polls on entire volume:
  – E.g. a year of a journal
• If damage, divide-&-conquer:
  – Call name polls, then content polls
  – Walk down URL tree
  – Or shrink range for flat sites
• Eventually find damaged files

Daemon: Distribute Content
• Intercept request from browser
• Forward request to publisher
  – With IF_MODIFIED_SINCE header
  – Wait a short time
• If publisher sends new version
  – Forward to browser
• Otherwise send stored version

Collection Access
LOCKSS and Local Networks: publisher is available
[Diagram labels: LOCKSS, PUB, PAC File or Proxy]

Collection Access
LOCKSS and Local Networks: publisher is unavailable
[Diagram labels: LOCKSS, PUB, PAC File or Proxy]

Collection Access Controlled
LOCKSS caches
• Can collect authorized content
• Not able to “steal” content
Authorized readers
• Can access cached content when publisher is not available; not “open access”
LOCKSS prevents the publisher from revoking access rights to back content

Look and Feel to Readers
Configure LOCKSS as a web proxy
Example:
  – PNAS table of contents page
    • from LOCKSS cache
    • from web (9/11/02)

Plug-in Overview
• Adapts daemon to publisher:
  – Decide what/when to crawl
  – Handle publisher permission
  – Handle publisher authentication
  – Filter dynamic content for comparison
• From publisher or community:
  – Download as signed .jar file
  – Registry finds appropriate plug-in

Writing Plug-ins
• Identify publisher requirements:
  – Crawl restrictions, boundaries
  – Dynamic content
• Plug-in tool
  – Generates plug-ins using simple UI
  – ~30 minutes to write & test

Distributed Repository Model
Technology
• Uses many “unreliable repositories” (PCs)
  – Robustness through redundancy
  – Inexpensive consumer hardware
  – Low sys admin overhead (less than 1 hour/mo)
• Leverages web technology
  – HTTP delivered and displayed content, all formats
  – No
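
The voting step described under "Daemon: Preserve Content" can be sketched in Python. This is a minimal illustration under stated assumptions, not the LOCKSS implementation: the names `vote` and `tally`, the choice of SHA-256, the 16-byte challenge, and the "inconclusive" result for a tie are all hypothetical. The slides say only that a vote is a hash of the content, that random challenges prevent cheating, and that the majority decides. Peer copies are passed in as local byte strings to stand in for remote peers.

```python
import hashlib
import os


def vote(challenge: bytes, content: bytes) -> str:
    """A vote is a hash over a fresh challenge plus the content.

    Mixing in the challenge means a peer cannot replay an old hash;
    it must actually hold the content to answer. SHA-256 is an
    assumption made for this sketch.
    """
    return hashlib.sha256(challenge + content).hexdigest()


def tally(own_content: bytes, peer_contents: list[bytes]) -> str:
    """Compare our vote with each peer's vote and report the outcome."""
    challenge = os.urandom(16)  # new random challenge for every poll
    own = vote(challenge, own_content)
    agree = sum(1 for c in peer_contents if vote(challenge, c) == own)
    disagree = len(peer_contents) - agree
    if agree > disagree:
        return "ok"            # agreeing votes win: local copy is fine
    if disagree > agree:
        return "damaged"       # disagreeing votes win: repair from a peer
    return "inconclusive"      # tie handling is an assumption of this sketch
```

A cache holding a corrupted copy, for example `tally(b"corrupt", [b"good", b"good", b"good"])`, is outvoted and would then ask a peer for a repair.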

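The divide-and-conquer search under "Daemon: Locate Damage" can be illustrated with a small Python sketch. Several simplifications are assumed here: a volume is modelled as a nested dict (directory) of byte strings (files), the `agreed` tree stands in for what the majority of peers hold, and the names `subtree_hash` and `find_damage` are hypothetical. The real system runs polls among peers rather than comparing against a known-good tree.

```python
import hashlib


def subtree_hash(node) -> str:
    """Hash a file (bytes) or a directory (dict of name -> node)."""
    if isinstance(node, bytes):
        return hashlib.sha256(b"F" + node).hexdigest()
    h = hashlib.sha256(b"D")
    for name in sorted(node):
        h.update(name.encode() + b"\0" + subtree_hash(node[name]).encode())
    return h.hexdigest()


def find_damage(local, agreed, path: str = "") -> list[str]:
    """Walk down the URL tree, descending only where hashes differ."""
    if subtree_hash(local) == subtree_hash(agreed):
        return []                      # whole subtree matches: stop here
    if isinstance(local, bytes) or isinstance(agreed, bytes):
        return [path or "/"]           # reached a damaged file
    damaged = []
    for name in sorted(set(local) | set(agreed)):
        child = f"{path}/{name}"
        if name not in local or name not in agreed:
            damaged.append(child)      # missing or unexpected entry
        else:
            damaged.extend(find_damage(local[name], agreed[name], child))
    return damaged


agreed = {"v1": {"a.html": b"A", "b.pdf": b"B"}, "v2": {"c.html": b"C"}}
local = {"v1": {"a.html": b"A", "b.pdf": b"corrupt"}, "v2": {"c.html": b"C"}}
print(find_damage(local, agreed))
```

Only the subtree whose hash disagrees is explored, so an intact volume costs a single top-level comparison.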

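The capacity figures on the "Terabytes of E-Journals" slide follow from simple arithmetic, reproduced here as a short Python check. The constant and function names are chosen for this example; the per-PC capacities and the 0.5 GB median are the values given on the slide.

```python
GB_PER_TB = 1000                    # the slide uses 1 TB = 1000 GB
MEDIAN_GB_PER_JOURNAL_YEAR = 0.5    # median e-journal size from the slide


def journal_years(terabytes: float) -> float:
    """Number of journal-years that fit in the given storage."""
    return terabytes * GB_PER_TB / MEDIAN_GB_PER_JOURNAL_YEAR


# TB per PC by year, as listed on the slide.
tb_per_pc = {2003: 0.54, 2004: 1.44, 2005: 2.88, 2006: 5.76, 2007: 11.52}

for year, tb in tb_per_pc.items():
    print(year, round(journal_years(tb)))
```

For 2007 the calculation gives 23,040 journal-years, which the slide rounds to 23,000.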