The Digital Library of India Project: Process, Policies and Architecture




Vamshi Ambati (1), N. Balakrishnan (2), Raj Reddy (1), Lakshmi Pratha (3) and C. V. Jawahar (3)

(1) Carnegie Mellon University, PA, USA: vamshi@cmu.edu, rr@cmu.edu
(2) Indian Institute of Science, Bangalore, India: balki@serc.iisc.ernet.in
(3) International Institute of Information Technology, Hyderabad, India: lakshmipratha@iiit.net, jawahar@iiit.net

Abstract

In this paper we share the experience gained from establishing a process and a supporting architecture for the Digital Library of India (DLI) project. The DLI project was started with a vision of digitizing books and making them available online in a searchable and browseable form. The digitization of the books takes place at geographically distributed locations. This raises many issues related to policy and collaboration. We discuss these problems in detail and present the process and workflow that is established to solve them. We also share the architecture of the project that supports the smooth implementation of the process. The architecture of the DLI project has been arrived at after considering factors like high performance, scalability, availability and economy.

Keywords: digital library, digital library architecture, Digital library project of India, Universal digital library, DLI process

1 Introduction

Digital Libraries have received wide attention in the recent years, allowing access to digital information from anywhere across the world. They have become widely accepted and even preferred information sources in areas of education, science and others (Bainbridge, Thompson, & H. Witten, 2003). The rapid growth of Internet and the increasing interest in development of digital library related technologies and collections (McCray & Gallagher, 2001; Marchionini & Maurer, 1995) helped accelerate the digitization of printed documents in the past few years. With a vision of digitizing a million books by 2008, the Digital Library of India (DLI) project aims to digitally preserve all the significant literary, artistic,
and scientific works of people and make them freely available to anyone, anytime, from any corner of the world, for education, research and also for appreciation by our future generations. Ever since its inception in November 2002, operating at three centers, the project has been successfully digitizing books, which are a dominant store of knowledge and culture. We now host close to one tenth of a million books online, with about 33 million pages scanned at almost 30 centers across the country. The scanning centers include academic institutions of high repute, religious institutions and government institutions. In such a highly distributed environment, establishing a notion of collaborative effort and distributing discrete chunks of work while maintaining uniform standards becomes a high-priority task.

Traditionally, digital libraries work in a closed environment and contain the process information and the content in a local repository. Although doing so eases server management and administration and simplifies the resolution of process-oriented issues, such an isolated setup does not scale up easily or promote collaboration across geographically distributed points of operation. This adds unacceptable delays to the implementation of the project. In projects like the DLI, with such ambitious missions, a distributed environment becomes a requisite. There are discrete phases and chunks of work which need not be collocated for operation. For example, the scanning of books can take place at one place while the image processing and web enabling of the same books occur at a different place. Also, with some planning, both these tasks could go on concurrently at different places on different consignments of books, in turn yielding a higher throughput.

In the DLI project we have established a flexible yet cohesive process that automates the entire workflow in a distributed environment. In doing so we have confronted a few problems and issues (Sankar et al., 2006), starting from the selection of books
for digitizing, establishing and operating a protocol to avoid duplication of effort, producing digital output of good quality, and preserving the digitized book objects for access in a user-friendly, reliable and highly available manner.

In this paper we describe the experience gained from establishing a process involving an efficient workflow and effective policies, and from deploying a scalable distributed architecture for the Digital Library of India project. We believe that the same can be successfully applied to other similar digital libraries and digitization systems. We also attempt to throw light on some unforeseen problems in the digitization process and on issues of collaboration in a distributed environment.

The rest of the paper is organized as follows. In section 2 we give an overview of the project and its organization. In section 3 we discuss the problems and challenges that we experienced in the course of the digitization and web enablement of books. In section 4 we discuss the process established in the project that has helped us address a few of the aforementioned problems. Section 5 discusses the architecture that addresses the issue of reliable web access to digitized content and supports the DLI process, and we conclude in section 6.

2 Overview of the Project

The Digital Library of India project was initiated in the year 2002 with motivations from the Universal Digital Library project. The project currently digitizes and preserves books, though one of the future avenues is to preserve existing digital media of different formats like video, audio, etc. The scanning operations and the preservation of digital data take place at different centers across India, the Regional Mega Scanning Centers (RMSCs). The RMSCs themselves function as individual organizations with scanning units established at several locations in the region. Responsibilities of an RMSC include regulating the processes of procuring or collecting the books, distributing them across the scanning locations maintained by it,
gathering back the digitized content from the contractors operating at those locations, and hosting the same. Hence the DLI project is a congregation of RMSCs operating in parallel and independently in distributed regions across India. The major responsibilities of the management at the DLI are to monitor the progress of the RMSCs and supply the resources necessary for their operation. We also have a contractor team which complements the setup at an RMSC. The contractor team comprises a set of

