Interoperability in Digital Libraries: Open Archives Initiative and the NSDL
CS 502, 2002-03-26
Carl Lagoze, Cornell University
Acknowledgements: Bill Arms, Herbert Van de Sompel
Cornell CS 502, 2002-03-07

Beyond the walls
  "The Library should selectively adopt the portal model for targeted program areas. By creating links from the Library's Web site, this approach would make available the ever-increasing body of research materials distributed across the Internet. The Library would be responsible for carefully selecting and arranging for access to licensed commercial resources for its users, but it would not house local copies of materials or assume responsibility for long-term preservation."
  LC21: A Digital Strategy for the Library of Congress, page 5

A portal should mean more than access
  - Traditional portal (e.g., Yahoo): linkage with limited responsibility
  - Hybrid portal
      - Asserting some semblance of curatorial role over linked resources
      - Providing a rich fabric of services across those resources

Interoperability standards enable service creation
  - Search and discovery: Z39.50
  - Metadata vocabularies and syntax: MARC, Dublin Core, XML, RDF
  - Object models: METS, FEDORA

Cost / Interoperability Trade-offs
  [Figure: labels include Metadata Harvesting, Z39.50, SGML, HTTP, Dublin Core, Google; axis label: Functionality]

Yes, it's about resource discovery over distributed collections
  - metadata: Author, Title, Abstract, Identifier

Facilitating / Monitoring Longevity of Distributed Content
  [Figure: labels include Preservation Service, Policy Enforcer, Event Records, actions, Selective Web Crawling, Metadata Harvesting, Web Site (two), Managed Repository (two), Preservation Metadata (two), P1/A1, P2/A2, P3/A3]

Personalization of Content
  - Portal A (View A)
      - View Slides
      - View Video
      - View synchronized presentation using applet
  - Portal B (View B)
      - Get Transcript of Audio
      - Search for keyword
      - Get Slides translated to French
  [Figure: labels include Tool, Repository, DigitalObject, structural metadata, Powerpoint presentation, SMIL synchronization metadata, Realaudio video]

Cross-Repository Reference Linking
  [Figure: a Linkage Service connected to five sources of citation metadata]

Origins of the OAI
  - Increasing interest in alternative scholarly publishing solutions (e.g., LANL arXiv)
  - Increasing impact through federation
  - UPS Mtg, Santa Fe, October 1999
      - Representatives of various ePrint, library, and publishing communities
      - Goal: definition of an interoperability framework among ePrint providers
      - Result: Santa Fe Convention, interoperability through metadata harvesting

Open Archives
  - Political agenda
      - Author self-archiving of E-Prints
      - Mission to reformulate scholarly publishing framework
  - Technical infrastructure to facilitate interoperability across multiple domains

Technical Umbrella for Practical Interoperability
  - ...that can be exploited by different communities
  [Figure: labels include Reference Libraries, Museums, Publishers, E-Print Archives]

OAI Technical Infrastructure
  Key technical features:
  - Deploy-now technology: 80/20 rule
  - Two-party model: providers (data providers) and consumers (service providers)
  - Simple HTTP encoding
  - XML schema for some degree of protocol conformance
  - Extensibility
      - Multiple item-level metadata
      - Collection-level metadata

The World According to OAI
  - Service Providers: Discovery, Current Awareness, Preservation
  - Metadata harvesting
  - Data Providers

Content and Metadata
  [Figure: labels include item, metadata record, resource (010010), repository]

OAI-PMH History
  - Version 1.0: January 21, 2001
  - Version 1.1: July 2, 2001
      - W3C XML schema changes
  - Version 2.0a: March 1, 2002
  - Production release: June 3, 2002
      - No major functionality changes
      - Numerous functional tweaks: harvesting granularity, flow control, error handling

Key Features of the OAI Metadata Harvesting Protocol
  - definitions / concepts: repository, record, identifier, datestamp, set
  - protocol features: HTTP encoding, metadata prefix / schema, flow control
  - protocol requests: supporting requests, harvesting requests

repository
  [Figure: a harvester and a repository connected by the oai protocol; labels: support data, harvesting data]

record
  [Figure: a repository holding items; each item is exposed as records]

    <record>
      <header>
        <identifier>oai:eg:001</identifier>
        <datestamp>1999-01-01</datestamp>
      </header>
      <metadata>
        <dc xmlns="http://purl.org/dc">
          <title>My Example</title>
        </dc>
      </metadata>
      <about>
        <ea xmlns="http://www.arXiv.org/ea">
          <usage>No restrictions</usage>
        </ea>
      </about>
    </record>

  Annotations, in order: protocol support; format-specific metadata; community-specific record data

identifiers
  - locally unique key for extracting a record from a repository
  - oai-identifier: oai:<archive-identifier>:<record-identifier>
  - example: oai:ncstrl:ncstrl.cornellcs/TR94-1418
      - oai: registered URI scheme
      - ncstrl: archive identifier, registered within OAI
      - ncstrl.cornellcs/TR94-1418: unique ID within archive; syntax is archive-specific

selective harvesting: datestamps
  - harvest within date range
  [Figure: a repository with the records inside the date range selected]

selective harvesting: sets
  - harvest within set
  [Figure: a repository partitioned into sets S1 and S2, with the records of one set selected]

set specifics
  - repositories define hierarchical organization
  - each item in a repository may be organized in one set, several sets, or no sets at all
  - meaning of sets (or of set hierarchy) is not defined in protocol
  - individual communities may formulate common set configurations

HTTP encoding: requests
  - BASE-URL: an.oa.org/OAI-script
  - keyword arguments: verb=ListIdentifiers&set=S1
  - GET:

    http://an.oa.org/OAI-script?verb=ListIdentifiers&set=S1

  - POST:

    POST http://an.oa.org/OAI-script HTTP/1.0
    Content-Length: 78
    Content-Type: application/x-www-form-urlencoded

    verb=ListIdentifiers&set=S1

HTTP encoding: responses

    <?xml version="1.0" encoding="UTF-8"?>
    <GetRecord
        xmlns="http://oai-namespace-uri"
        xmlns:xsi="http://w3-namespace-uri"
        xsi:schemaLocation="http://oai-namespace-uri http://oai-schemaURL">
      <responseDate>2000-01-19T19:30:30-04:00</responseDate>
      <requestURL>http://an.oa.org/OAI-script?verb=GetRecord&amp;identifier=oai%3AarXiv%3A0001&amp;metadataPrefix=oai_dc</requestURL>
      <record>
        ...record contents...
      </record>
      ...additional records...
    </GetRecord>

  Annotations: xml namespace(s); response header; response data

metadata prefix and schema
  - support for harvesting multiple metadata formats
  metadata
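
The request encoding and identifier syntax described in the slides can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not part of the original deck: the helper names `build_request_url` and `split_oai_identifier` are hypothetical, the base URL is the example one from the slides, and no network request is made.

```python
from urllib.parse import urlencode

# Example base URL from the slides; a real repository publishes its own.
BASE_URL = "http://an.oa.org/OAI-script"


def build_request_url(base_url, verb, **arguments):
    """Encode an OAI request as a GET URL: base URL plus keyword arguments.

    Hypothetical helper for illustration; `verb` always comes first,
    followed by the remaining arguments in the order given.
    """
    params = {"verb": verb}
    params.update({k: v for k, v in arguments.items() if v is not None})
    # urlencode percent-escapes reserved characters such as ':' (-> %3A).
    return base_url + "?" + urlencode(params)


def split_oai_identifier(identifier):
    """Split 'oai:<archive-identifier>:<record-identifier>' into two parts.

    Hypothetical helper; only the first two colons are treated as
    separators, since the record identifier's syntax is archive-specific.
    """
    parts = identifier.split(":", 2)
    if len(parts) != 3 or parts[0] != "oai" or not parts[1] or not parts[2]:
        raise ValueError(f"not an oai-identifier: {identifier!r}")
    return parts[1], parts[2]


if __name__ == "__main__":
    print(build_request_url(BASE_URL, "ListIdentifiers", set="S1"))
    print(build_request_url(BASE_URL, "GetRecord",
                            identifier="oai:arXiv:0001",
                            metadataPrefix="oai_dc"))
    print(split_oai_identifier("oai:arXiv:0001"))
```

The first call reproduces the GET example from the requests slide, and the second produces the query string shown (with `&` written as `&amp;`) inside the `requestURL` element of the sample response. The same keyword arguments could equally be sent as a form-encoded POST body.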