Creating Trading Networks
Thom Lutkenhouse
Digital Preservation
March 25, 2004

Paper Covered
“Creating Trading Networks of Digital Archives,” Proceedings of JCDL 2001
By Brian Cooper and Hector Garcia-Molina

Focus of Research
• “Preserving the Bits”
• Developing P2P trading networks considering:
  • Varying reliability rates
  • Political affiliations between archives
  • Different levels of archival investment
  • Estimation of global failure rates

Outside the Scope
• Legal issues
• Format and platform changes
• Access issues
• Provenance

Digital Collection
“A set of related digital material that is managed by an archive site. Examples include issues of a digital journal, geographic information service data, or a collection of technical reports. . . we consider the collection to be a single unit for the purposes of replication.”

Archival Storage Calculation
N = bytes of local digital collection
Ptotal = public storage to trade
Total storage space = F × N
Ptotal = F × N - N

Data Reliability
• Global data reliability: the probability that no collection owned by any site is lost
• Local data reliability: the probability that no collection owned by a particular site is lost

Calculation of Global Reliability
[Figure: Site A, Site B, and Site C, each holding numbered copies of collections 1, 2, and 3]
Each site represents an archive, each number a digital collection stored in that archive.
Here we assume we can accurately estimate local data reliability, and for this example assume each site has a reliability of 0.9.
(Each site has a 10% chance of data loss.)

Calculation of Global Reliability
Probability of losing collection 1: 0.1 × 0.9 × 0.1 = 0.009
Probability of losing collection 2: 0.9 × 0.1 × 0.1 = 0.009
Probability of losing collection 3: 0.1 × 0.1 × 0.1 = 0.001
Sum of above = 0.019
Global data reliability = 1 - 0.019 = 0.981

Mean Time to Failure
Given data reliability over a certain interval, we can calculate Mean Time to Failure (MTTF). This is the expected number of years before data loss. MTTF is the principal metric used to judge the effectiveness of the simulated trading strategies.

Clusters
• Sites that have agreed to form partnerships for political, social, or economic reasons
• e.g., all libraries in a state university system

Building the Trading Network
• Determine the number of sites in the network
• Estimate the reliability of each site:
  • Past behavior of site
  • Components of site’s storage mechanism
  • Reputation of site or institution

Deeds
“A deed represents the right of a local site to use space at a remote site. Deeds can be used to store collections, kept for future use, transferred to other sites that need them or split into smaller deeds.”

Conducting Trades
Trades are executed by means of a Deed Trading algorithm that is run at each participating site in accordance with its local Trading Strategy.
The Trading Strategy determines the order in which other sites will be contacted to initiate a trade.
Strategies may include:
• Best Fit
• Worst Fit
• Clustering (trading with sites you’ve traded with before)
• Best Reliability

Weighted Trading
Available space is weighted by reliability in determining a fair trade.
Example: 100 GB × 0.75 reliability = 75 GB × 1.00 reliability

Deed Trading Algorithm
[Flowchart: steps and yes/no decision points]
• Determine size of deed needed and number of copies to make
• Choose a site to trade with based on Trading Strategy
• Have deed?
• Big enough?
• Copy collection
• Enough copies?
• Seek deed for remaining needed space
• Check available space at trade site
• Enough space?
• Check if other sites have deeds for trade site
• Deeds with enough space available?
• Check if adequate local space for trade
• Enough space?
• Trade
• Done

Simulation Results

Trading Policy Results
“. . .it is always best for the high reliability sites to use the closest reliable strategy, and for the low reliability sites to use clustering.”

Same Size vs. Weighted Strategy

Reliability Estimates
“. . .when estimates are inaccurate by 30 percent, archives using closest reliability can only achieve a local MTTF of 200 years, versus 500 in the ideal case.”

Cluster Sizes

Weaknesses of the System
• Class warfare:
  • High reliability sites realize maximum performance when they trade exclusively amongst themselves, but the system as a whole performs best when site reliability is ignored
• Accurate estimation of site reliability:
  • Difficult to account for all factors: hardware, bankruptcy, natural disasters, war, terrorism, interdimensional crossrips

Further Work
• Distributed access services
• Additional compensation means:
  • Money
  • Processing power
• Accommodating more dynamic collections

Thank you
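
Worked check of the global reliability example
The 0.981 figure from the three-site example can be reproduced by enumerating every combination of site failures and summing the probability of the combinations in which each collection still has a surviving copy. The sketch below is illustrative only. It assumes that site failures are independent and that Site A holds collections 1 and 3, Site B holds 2 and 3, and Site C holds 1, 2, and 3; that placement is inferred from the slide's arithmetic, not stated in the deck. The names PLACEMENT, SITE_RELIABILITY, and global_reliability are hypothetical.

```python
from itertools import product

# ASSUMPTION: placement inferred from the slide's arithmetic, not given in the deck.
PLACEMENT = {"A": {1, 3}, "B": {2, 3}, "C": {1, 2, 3}}
# From the slide: every site has a local reliability of 0.9.
SITE_RELIABILITY = {"A": 0.9, "B": 0.9, "C": 0.9}


def global_reliability(placement, reliability):
    """Probability that every collection survives on at least one site.

    Assumes sites fail independently of one another.
    """
    sites = sorted(placement)
    collections = set().union(*placement.values())
    p_no_loss = 0.0
    # Enumerate every combination of site failures (True means the site fails).
    for failed in product([False, True], repeat=len(sites)):
        p_state = 1.0
        surviving = set()
        for site, has_failed in zip(sites, failed):
            if has_failed:
                p_state *= 1.0 - reliability[site]
            else:
                p_state *= reliability[site]
                surviving |= placement[site]
        # Count this state only if no collection has lost all of its copies.
        if surviving == collections:
            p_no_loss += p_state
    return p_no_loss


if __name__ == "__main__":
    print(round(global_reliability(PLACEMENT, SITE_RELIABILITY), 3))  # 0.981
```

Under these assumptions, the only states that lose data are A and C failing together (collection 1), B and C failing together (collection 2), and all three failing, which matches the 0.009 + 0.009 + 0.001 = 0.019 loss probability on the slide.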