The Data Grid: Towards an Architecture for the Distributed Management and Analysis of Large Scientific Datasets
A. Chervenak, I. Foster, C. Kesselman, C. Salisbury, S. Tuecke
Presented by: Kasturi Chatterjee
Agnostic: Selim Kalayci
01/15/19

Agenda
- Introduction
- Data Grid Design
- Data Grid Services
- Higher-Level Data Grid Components
- Conclusion

Introduction
- Grid: geographically distributed computing resources configured for coordinated use
- Data Grid: database architecture for storing and handling huge amounts of data, supported by a Grid

Introduction
- Scientific disciplines are data intensive as well as computationally demanding
- Terabytes and petabytes of data
- Diverse domains and geographic distribution of users and resources

Data Grid
- Integrate heterogeneous data archives into a distributed data management grid*
- Identify services for high-performance, distributed, data-intensive computing*
- APIs and components required to implement it efficiently
* from Globus project slides available at loci.cs.utk.edu/dsi/netstore99/docs/presentations/foster-d-slides.pdf

Data Grid Design
Design Principles
- Mechanism Neutrality: independent of low-level mechanisms
- Policy Neutrality: design decisions are exposed to users
- Compatibility with Computational Grid: integration of storage and computation
- Uniformity of Information Infrastructure: uniform access to information about resource structure and state

Layered Architecture (from the paper)
[Figure: layered architecture diagram]

Core Services
- Storage Systems
  - DPSS: Distributed Parallel Storage System
  - HPSS: High Performance Storage System
- Metadata Repository
  - LDAP: Lightweight Directory Access Protocol
  - MCAT: Metadata Catalogue

Data Grid Services
- Data Access: mechanisms for accessing, managing and initiating third-party transfers of data
- Metadata Access: mechanisms for accessing and managing information about data

Data Grid Services (from loci.cs.utk.edu/dsi/netstore99/docs/presentations/foster-d-slides.pdf)
[Figure: Data Grid services diagram]

Data Grid Services
Storage Systems and Data Access
Storage Systems:
- provide functions for creating, destroying, writing and manipulating file instances
- associate a set of properties such as name, size and access restrictions with each file instance
- E.g.: a data grid implementation may use SRB to access data

Data Grid Services
Data Access
- APIs are defined that describe the possible operations on storage systems and file instances
- The API provides a standard interface to storage systems: create, delete, open, close, read, write and storage-to-storage transfer
- Self-optimizing capability
- Uniform access to heterogeneous systems

Data Grid Services
Metadata Service
- Application Metadata, Replica Metadata and System Configuration Metadata
- Single interface to access them
  - Pros: uniformity
  - Cons: complex implementation
- Structured as hierarchical and distributed
  - Pros: scalable, no single point of failure, local control

Data Grid Services
- Application Metadata: metadata describing the information content represented by the file, the circumstances under which the data was obtained, and information applications need to process it
- Replica Metadata: data used to manage replication of data objects
- System Configuration Metadata: describes the system, i.e. network connectivity, storage systems, usage policy etc.

Higher-Level Data Grid Components
Replica Management (from I. Foster slides)
- Collections contain related files
- Logical files describe replicated physical files
- Services for managing replicated file instances
  - Create / delete
  - Schedule / manage data transfer
  - Register in the replica catalog
  - Metadata display

Higher-Level Data Grid Components
How does a Replica Manager work?
- Maintains a repository/catalogue
- Entries correspond to logical files or file collections
- Associated with each logical file or collection are one or more physical instances of objects
- Catalogue contains the mapping from logical files to physical instances

Higher-Level Data Grid Components
The Replica Manager does not do the following:
- determine when or where replicas are created
- determine which replicas are to be used by an application
Keeping policy separate from the replica manager design makes it generic.

Higher-Level Data Grid Components
Replica Selection
- Process of choosing the replica that will optimize a desired performance criterion
- Selection process may initiate creation of a new replica
- Intelligent scheduling to determine appropriate replica, site for (re)computation, etc.

Conclusion
- Implementation experience led to the adoption of collections of logical files
- Implements computation and data intensive Grid architecture
- APIs provide standard interface for various utilities
- Replica Management and Metadata services are provided using LDAP

Further Works
Chervenak et al.
1. Secure, Efficient Data Transport and Replica Management for High-Performance Data-Intensive Computing: 2001
2. High-Performance Remote Access to Climate Simulation Data: A Challenge Problem for Data Grid Technologies: 2001
3. A Replica Location Grid Service Implementation: 2004
4. Applying Peer-to-Peer Techniques to Grid Replica Location Services: 2006
Leanne Guy et al.
- Replica Management in Data Grids: 2002, addressed read/write replicas
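
Illustration: replica catalogue and replica selection
The replica manager slides describe a catalogue that maps each logical file to one or more physical instances, with the choice of replica left to a separate selection step. The Python sketch below illustrates that separation under stated assumptions. It is not the Globus API or the paper's implementation: the names ReplicaCatalog, register, lookup and select_replica, the example.org URLs and the cost values are all hypothetical.

```python
from typing import Callable, Dict, List


class ReplicaCatalog:
    """Hypothetical in-memory catalogue: logical file name -> physical instances."""

    def __init__(self) -> None:
        self._entries: Dict[str, List[str]] = {}

    def register(self, logical_name: str, physical_url: str) -> None:
        # Record one more physical instance of a logical file (no duplicates).
        instances = self._entries.setdefault(logical_name, [])
        if physical_url not in instances:
            instances.append(physical_url)

    def unregister(self, logical_name: str, physical_url: str) -> None:
        # Remove a physical instance; drop the entry when none remain.
        instances = self._entries.get(logical_name, [])
        if physical_url in instances:
            instances.remove(physical_url)
        if not instances:
            self._entries.pop(logical_name, None)

    def lookup(self, logical_name: str) -> List[str]:
        # Return a copy so callers cannot mutate the catalogue.
        return list(self._entries.get(logical_name, []))


def select_replica(
    catalog: ReplicaCatalog,
    logical_name: str,
    cost: Callable[[str], float],
) -> str:
    """Pick the replica with the lowest cost.

    The cost function is supplied by the caller, so the selection policy
    stays outside the catalogue itself.
    """
    replicas = catalog.lookup(logical_name)
    if not replicas:
        raise LookupError(f"no replica registered for {logical_name!r}")
    return min(replicas, key=cost)


if __name__ == "__main__":
    catalog = ReplicaCatalog()
    catalog.register("climate/run42.nc", "gsiftp://site-a.example.org/data/run42.nc")
    catalog.register("climate/run42.nc", "gsiftp://site-b.example.org/data/run42.nc")

    # Assumed per-replica transfer-time estimates in seconds (made-up values).
    estimates = {
        "gsiftp://site-a.example.org/data/run42.nc": 120.0,
        "gsiftp://site-b.example.org/data/run42.nc": 45.0,
    }
    print(select_replica(catalog, "climate/run42.nc", estimates.__getitem__))
```

Passing the cost function in from outside mirrors the point that the replica manager does not decide which replica an application uses: a different performance criterion only needs a different function, and the catalogue stays unchanged.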