
Slide outline:
- Science In An Exponential World
- Evolving Science
- Exponential World of Data
- The Challenges
- Publishing Data
- Making Discoveries
- Data Access is Hitting a Wall: FTP and GREP are not adequate
- Next-Generation Data Analysis
- Our E-Science Projects
- Why Is Astronomy Special?
- Features of the SDSS
- The Imaging Survey
- The Spectroscopic Survey
- SkyServer
- The SkyServer Experience
- SkyServer Traffic
- Public Data Release Versions
- Spatial Information For Users
- Spatial Queries In SQL
- Things Can Get Complex
- Simulations
- Trends
- Exploration Of Turbulence
- Wireless Sensor Networks
- Current Sensor Database
- The Big Picture
- Summary
- Slide 28
- Slide 29
- Slide 30

Science In An Exponential World
Alexander Szalay, JHU
Jim Gray, Microsoft Research

Evolving Science
- Thousand years ago: science was empirical, describing natural phenomena
- Last few hundred years: a theoretical branch, using models and generalizations
- Last few decades: a computational branch, simulating complex phenomena
- Today: data exploration (e-science), synthesizing theory, experiment and computation with advanced data management and statistics; new algorithms!

$$\left(\frac{\dot{a}}{a}\right)^2 = \frac{4\pi G\rho}{3} - K\frac{c^2}{a^2}$$

Exponential World of Data
- Astronomers have a few hundred TB now
- 1 pixel (byte) per square arcsecond ~ 4 TB
- Multi-spectral, temporal, … → 1 PB
- They mine it looking for new (kinds of) objects or more of the interesting ones (quasars), density variations in multi-D space, and spatial and parametric correlations
- Data doubles every year
- Same access for everyone
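A quick way to see what "data doubles every year" implies is to project it forward. The sketch below assumes that the *total* archive size (not the yearly intake) doubles each year, and the 300 TB starting point is only an illustrative stand-in for "a few hundred TB"; neither the function names nor that figure come from the slides.

```python
def archive_size(initial_tb: float, years: int) -> float:
    """Total archive size after `years`, assuming the total doubles every year."""
    return initial_tb * 2 ** years


def added_during_year(initial_tb: float, year: int) -> float:
    """Data added during year `year` (year >= 1) under the same doubling assumption."""
    return archive_size(initial_tb, year) - archive_size(initial_tb, year - 1)


if __name__ == "__main__":
    start = 300.0  # hypothetical "few hundred TB" starting point, not a figure from the slides
    for years in (3, 5):
        print(f"after {years} years: {archive_size(start, years):,.0f} TB")
    # Each year adds as much data as the whole archive held at the start of that year.
    print(added_during_year(start, 4) == archive_size(start, 3))
```

Over the 3 to 5 year project lifetime mentioned on the Publishing Data slide, this is a growth factor of 8 to 32, and each new year adds as much as everything collected before it.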
The Challenges
(Diagram: Data Collection; Discovery and Analysis; Publishing)
- Exponential data growth: distributed collections, soon petabytes
- New analysis paradigm: data federations, move analysis to the data
- New publishing paradigm: scientists are publishers and curators

Publishing Data
- Exponential growth
- Projects last at least 3-5 years
- Data sent upwards only at the end of the project
- Data will never be centralized
- More responsibility on projects
- Becoming publishers and curators
- Data will reside with projects
- Analyses must be close to the data

Roles         Authors          Publishers         Curators          Consumers
Traditional   Scientists       Journals           Libraries         Scientists
Emerging      Collaborations   Project www site   Bigger Archives   Scientists

Making Discoveries
- Where are discoveries made? At the edges and boundaries: going deeper, collecting more data, using more dimensions
- Metcalfe's law: the utility of computer networks grows as the number of possible connections, O(N²)
- Federating data: a federation of N archives has utility O(N²); possibilities for new discoveries grow as O(N²)

Data Access is Hitting a Wall: FTP and GREP are not adequate
- You can GREP 1 MB in a second
- You can GREP 1 GB in a minute
- You can GREP 1 TB in 2 days
- You can GREP 1 PB in 3 years
- Oh, and 1 PB ~ 4,000 disks
- At some point you need indices to limit search, and parallel data search and analysis
- This is where databases can help
- If there is too much data to move around, take the analysis to the data!
- Do all data manipulations at the database
- Build custom procedures and functions in the database

- You can FTP 1 MB in 1 sec
- You can FTP 1 GB / min (= $1/GB)
- … 2 days and $1K
- … 3 years and $1M
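The remedy on this slide (an index to limit the search, and data manipulation done inside the database with custom functions) can be sketched with Python's built-in sqlite3 module. SQLite is only a convenient stand-in here, since the slide names no particular database, and the `objects` table, its `ra` and `mag` columns, the `mag_to_flux` function and the synthetic rows are all hypothetical.

```python
import sqlite3


def mag_to_flux(mag: float) -> float:
    # Relative flux from an astronomical magnitude: f = 10^(-0.4 m).
    return 10.0 ** (-0.4 * mag)


def build_catalog(n_rows: int = 3600) -> sqlite3.Connection:
    """Hypothetical toy catalog: n_rows objects spread over ra = 0..359 degrees."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE objects (id INTEGER PRIMARY KEY, ra REAL, mag REAL)")
    conn.executemany(
        "INSERT INTO objects VALUES (?, ?, ?)",
        ((i, float(i % 360), 20.0) for i in range(n_rows)),
    )
    # Index on ra: a range query can seek into the index instead of scanning every row.
    conn.execute("CREATE INDEX idx_objects_ra ON objects (ra)")
    # Register the Python function with the database so it runs inside the query.
    conn.create_function("mag_to_flux", 1, mag_to_flux)
    return conn


def summarize_strip(conn: sqlite3.Connection, ra_min: float, ra_max: float):
    """Count and total flux in an ra strip; only one summary row leaves the database."""
    return conn.execute(
        "SELECT COUNT(*), SUM(mag_to_flux(mag)) FROM objects WHERE ra BETWEEN ? AND ?",
        (ra_min, ra_max),
    ).fetchone()


def uses_index(conn: sqlite3.Connection, ra_min: float, ra_max: float) -> bool:
    """Ask SQLite's query planner whether the strip query is answered via an index."""
    plan = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT COUNT(*), SUM(mag_to_flux(mag)) FROM objects WHERE ra BETWEEN ? AND ?",
        (ra_min, ra_max),
    ).fetchall()
    # The last column of each plan row is the human-readable detail string.
    return any("INDEX" in str(row[-1]) for row in plan)


if __name__ == "__main__":
    conn = build_catalog()
    print(summarize_strip(conn, 10.0, 20.0))
    print(uses_index(conn, 10.0, 20.0))
```

However many objects match, only a single (count, total flux) row crosses the database boundary; on a real archive that is the difference between shipping a summary and shipping the data.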
Next-Generation Data Analysis
- Looking for:
  - Needles in haystacks: the Higgs particle
  - Haystacks: dark matter, dark energy
- Needles are easier than haystacks
- 'Optimal' statistics have poor scaling
  - Correlation functions are N², likelihood techniques N³
  - For large data sets, the main errors are not statistical
  - As data and computers grow with Moore's Law, we can only keep up with N log N
- Take cost of computation
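To make the "correlation functions are N²" point concrete, the sketch below counts point pairs closer than a separation r (the basic ingredient of a two-point correlation estimate) in two ways. The brute-force version tests every pair; the second buckets points into a uniform grid of cell side r and only compares neighbouring cells. The grid is a deliberately simple, swapped-in technique, not the tree-based method a survey pipeline would use, and it only avoids N² behaviour when points are not heavily clustered. The function names and the random test points are hypothetical.

```python
import random
from collections import defaultdict
from itertools import combinations

Point = tuple[float, float]


def naive_pair_count(points: list[Point], r: float) -> int:
    """O(N^2): test every unordered pair of 2-D points."""
    r2 = r * r
    return sum(
        1
        for (x1, y1), (x2, y2) in combinations(points, 2)
        if (x1 - x2) ** 2 + (y1 - y2) ** 2 <= r2
    )


def grid_pair_count(points: list[Point], r: float) -> int:
    """Same count, testing only pairs in the same or adjacent grid cells of side r."""
    cells: dict[tuple[int, int], list[int]] = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cells[(int(x // r), int(y // r))].append(idx)
    r2 = r * r
    count = 0
    for (cx, cy), members in cells.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                # .get() avoids inserting empty cells into the defaultdict while iterating.
                for j in cells.get((cx + dx, cy + dy), ()):
                    for i in members:
                        if i < j:  # count each unordered pair exactly once
                            (x1, y1), (x2, y2) = points[i], points[j]
                            if (x1 - x2) ** 2 + (y1 - y2) ** 2 <= r2:
                                count += 1
    return count


if __name__ == "__main__":
    random.seed(1)
    pts = [(random.random(), random.random()) for _ in range(2000)]
    print(naive_pair_count(pts, 0.01), grid_pair_count(pts, 0.01))
```

Both functions return the same count; the grid version just skips the pairs that cannot possibly be within r of each other.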
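The claim that we "can only keep up with N log N" follows from a short calculation: if both the data size N and the available compute double every year, the wall-clock time of an algorithm with cost C(N) changes by C(2^t N) / (2^t C(N)) after t years. The sketch below evaluates that ratio; the 10^6-object starting size and the 10-year horizon are hypothetical inputs, not figures from the slides.

```python
import math
from typing import Callable


def relative_runtime(cost: Callable[[float], float], n0: float, years: int) -> float:
    """Runtime after `years` relative to today, assuming both the data size N and
    the available compute double every year (the slide's Moore's-law premise)."""
    n = n0 * 2 ** years
    compute_speedup = 2 ** years
    return (cost(n) / cost(n0)) / compute_speedup


COSTS: dict[str, Callable[[float], float]] = {
    "N log N": lambda n: n * math.log2(n),
    "N^2 (correlation functions)": lambda n: n ** 2,
    "N^3 (likelihood)": lambda n: n ** 3,
}

if __name__ == "__main__":
    for name, cost in COSTS.items():
        # Hypothetical starting point: 10^6 objects, projected 10 years ahead.
        print(f"{name:>28}: {relative_runtime(cost, 1e6, 10):,.1f}x")
```

With these inputs the N log N analysis takes only about 1.5 times longer than today, while the N² correlation function takes 1,024 times longer and the N³ likelihood roughly a million times longer.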



U of M CSCI 8715 - Science In An Exponential World
