UCSC ISM 158 - Information Integration

Slide titles:
Slide 1
Enterprise Information
Centralized versus Distributed?
Contrasting Business & Technical Information
The Guiding Principles
Scalable Content Processing
Scale out architecture used under cloud information services
Considerations in Distributed Information Management
Dimensions of Integration
Ecosystem of integration products
Points for Discussion in class
Points to ponder at home
Where to learn more
Upcoming guest lectures in May
Questions?
News

Information Integration
Instructor: Pankaj Mehra
Teaching Assistant: Raghav Gautam
Lec. 9
May 13, 2010
ISM 158

Enterprise Information

Centralized versus Distributed?
• Distributed systems occur naturally
• State of the art does not allow complex queries or deep analysis against distributed information
• Centralization may also be favored due to lower costs of infrastructure, license and labor, as well as due to its ability to better enforce tighter integrity constraints and other information management policies
• Ultimately, the decision needs to take into account issues of ownership and control
  – Technology considerations often are secondary; even so, rational rules for resolving these considerations exist, as described in the Distributed Computing Economics paper

Contrasting Business & Technical Information
[Figure: diagram contrasting the business domain and the technical domain. Axis labels: Metadata scaling; Data bandwidth scaling. Other labels: SQL schema & query; XML or WS schema & query; File schema & query; Centralized metadata; Real-time information; Ad hoc query; Inconsistent information; Pivoting; Data mining; Search federation; Structured sources; Distributed archives; Distributed complex controls; Central control; Central archive; Stable schemata; Schema evolution; Unstructured sources; Heavy data processing; Simple metadata fusion; Complex metadata; Simpler data fusion; ETL; Streaming A/V; Visualization; Dashboards; Steering; Deep linguistics]

The Guiding Principles
• It is a bad idea to address the following as afterthoughts
  – Scale
  – Availability
  – Integrity
  – Privacy and security
  – Compliance / auditability
  – Retention requirements
  – Business value
  – Information quality
• The ability to embed function close to data is fundamental to scalable information processing
• In order to deliver the best performance/$, systems tend to scale out from the technology sweet spot of the day
• Redundancy configured in from the start, as well as mechanisms for early detection and isolation of faults
• Optimize availability by optimizing recovery

Scalable Content Processing
• Enterprise information is complex
• Diversity of information sources and formats
  – Entail complex integration and processing flows
  – Metadata generation and indexing
  – Content indexing
• Protection and security
[Figure labels: storage; data; content; connectors; scalable repository; scalable processing; e.g. JCR API]

Scale out architecture used under cloud information services
Smart Cells
Scalable distributed system of self-contained, all-inclusive data repositories
Principles
• Scale-out
• Federation
• Intelligence close to data
• Pluggable platforms supporting proprietary and 3rd-party storage services
Example
• Platforms for Information Lifecycle Management services
[Figure: an array of SmartCell nodes and a Smart Query Fabric. Labels: Storage: Block, File, Object & Fragment; Content indexing; Attribute indexing; Supported protocols and APIs]

Considerations in Distributed Information Management
• Information is distributed across heterogeneous sources and has varied provenance
  – Integration
• Information management requires information about information
  – Metadata
• Useful information is timely and findable
  – Real-time integration and caching
  – Indexing
  – Semantic analysis
  – Context

Dimensions of Integration

Ecosystem of integration products
• Metadata
  – Determines information richness
• Service Orientation
  – Determines protocol richness
• Future
  – Integration as syndication
  – Integration aaS
[Figure: integration products placed along two axes, Metadata and Service-orientedness]
  SQL-based EII: SAP, Oracle, Composite
  XML-based EII: BEA LiquidData, Mark Logic
  JSR 170 ECI: Day
  WS-based SOA: Microsoft, IBM
  RSS-based: NewsGator
  Pure EAI: Tibco, SAG
  Uniform access: MOSS, Attivio

Points for Discussion in class
• Consider a healthcare patient information scenario.
  – Is it mainly transactional or mainly analytic?
  – Would you lean toward a distributed (EAI) approach or a centralized one (warehouse)?
• Consider a scenario in which a company wants to drill down into the root causes of customer complaints.
  – Again, centralized or distributed?
    • Identifying the root cause
    • Tracking the problem
  – Would real-time integration become a requirement?

Points to ponder at home
• Pros of integration
  – Connecting the dots
  – Single view of …
  – Quality control over
    • Inconsistency
    • Staleness
    • Gaps
• Cons of integration
  – Loss of context
  – Often, read only
  – Cost
  – Duplication
  – Scale
  – Losing battle?
  – Risk

Where to learn more
• Data Integration: The Relational Logic Approach by Michael Genesereth, Morgan & Claypool Publishers, 2010

Upcoming guest lectures in May
• Dr. V. Galotra, Oracle
  – SOA Deep Dive
• Rahul Nim, Efficient Frontier
  – Online marketing

Questions?
• NEWS
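
The centralized-versus-distributed question raised in the slides can be made concrete with a small sketch. This is only an illustration under stated assumptions: the `Source` class, the `etl_load` and `federated_lookup` helpers, and the sample customer data are all hypothetical and do not come from the lecture. It contrasts a warehouse-style snapshot (ETL once, query the copy) with a federated lookup (ask each source at query time), and shows the staleness trade-off listed under the pros and cons of integration.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    """A hypothetical autonomous data source keyed by customer id."""
    name: str
    rows: dict = field(default_factory=dict)

    def lookup(self, customer_id):
        # Return this source's attributes for the customer, or nothing.
        return self.rows.get(customer_id, {})


def etl_load(sources):
    """Centralized approach: copy every source into one warehouse snapshot."""
    warehouse = {}
    for source in sources:
        for customer_id, attrs in source.rows.items():
            warehouse.setdefault(customer_id, {}).update(attrs)
    return warehouse


def federated_lookup(sources, customer_id):
    """Distributed approach: ask each source at query time and merge the answers."""
    merged = {}
    for source in sources:
        merged.update(source.lookup(customer_id))
    return merged


# Hypothetical sample data: two sources describing the same customer.
crm = Source("crm", {42: {"name": "Ada"}})
support = Source("support", {42: {"open_complaints": 1}})
sources = [crm, support]

warehouse = etl_load(sources)               # snapshot taken once
support.rows[42] = {"open_complaints": 2}   # the source changes afterwards

stale_view = warehouse[42]                  # still reflects the old snapshot
live_view = federated_lookup(sources, 42)   # reflects the current sources

print("warehouse:", stale_view)
print("federated:", live_view)
```

In this sketch the warehouse copy keeps reporting one open complaint until the ETL step is run again, while the federated lookup sees the update immediately but has to contact every source on each query. That is one way to frame the in-class question of whether real-time integration becomes a requirement.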

