UCSC ISM 158 - DATA DEDUPLICATION


DATA DEDUPLICATION
By: Lily Contreras
April 15, 2010

Outline
- What is data deduplication?
- Deduplication Methods
- Benefits of Data Deduplication
- Problems with Data Deduplication
- How to choose a data deduplication solution?
- References

What is data deduplication?

Data deduplication is often called intelligent compression or single-instance storage. In the deduplication process, duplicate data is deleted, leaving only one copy of the data to be stored.

Data deduplication divides incoming data into segments, uniquely identifies each segment, and compares the segments against the data that has already been stored. If an incoming segment is new, it is stored on disk; if it duplicates a segment that is already stored, it is not stored again and a reference to the existing copy is created instead. "Only one unique instance of the data is actually retained on storage media, such as disk or tape. Redundant data is replaced with a pointer to the unique data copy."

Deduplication operates at different levels, such as the file, block, and bit level. If a file is updated, only the changed data is saved: when only a few bytes of a document or presentation change, only the changed blocks or bytes are stored, rather than an entirely new file. This behavior makes block- and bit-level deduplication far more efficient than file-level deduplication.

Deduplication detects duplicates by comparing chunks of data. Each chunk is assigned a unique identifier calculated by the software, typically using a cryptographic hash function. When a new hash is computed, it is compared against an index of existing hashes. If that hash is already in the index, the data is considered a duplicate and does not need to be stored again.
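The chunk-hash-index flow described above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a real storage system: the `DedupStore` class, the fixed 4 KiB chunk size, and the choice of SHA-256 are assumptions made for the example (production systems often use variable-size chunking and persist the index and chunks on disk).

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed-size chunks; real systems often chunk by content

class DedupStore:
    """Toy deduplicating store: a hash index plus one copy of each unique chunk."""

    def __init__(self):
        self.index = {}   # chunk hash -> the single retained copy of that chunk
        self.files = {}   # file name  -> list of chunk hashes (the "pointers")

    def write(self, name, data):
        hashes = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            h = hashlib.sha256(chunk).hexdigest()
            if h not in self.index:       # new data: store the chunk once
                self.index[h] = chunk
            hashes.append(h)              # duplicate: keep only a reference
        self.files[name] = hashes

    def read(self, name):
        # Reassemble the file by following its references into the chunk index.
        return b"".join(self.index[h] for h in self.files[name])

store = DedupStore()
payload = b"A" * 8192 + b"B" * 4096       # three chunks, two of them identical
store.write("backup1", payload)
store.write("backup2", payload)           # second copy adds no new chunks
print(len(store.index))                   # -> 2 unique chunks retained, not 6
print(store.read("backup2") == payload)   # -> True
```

Writing the same payload twice stores only two unique chunks instead of six, which is exactly the "pointer to the unique data copy" behavior quoted above.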
Otherwise, the new hash is added to the index and the new data is stored.

Deduplication Methods

In-line deduplication is the most efficient and economical method. Hash calculations are performed in real time as the data arrives; if the target device identifies a block that has already been stored, it simply creates a reference to the existing block. One advantage in-line deduplication has over post-process deduplication is that it requires less storage, since duplicate data is never written. It significantly reduces the raw disk capacity needed in the system because the full, not-yet-deduplicated data set is never written to disk. "It optimizes time-to-DR (disaster recovery) far beyond all other methods since it does not need to wait to absorb the entire data set and then deduplicate it before it can begin replicating to the remote site." However, "because hash calculations and lookups takes so long, it can mean that the data ingestion can be slower thereby reducing the backup throughput of the device."

Post-process deduplication first stores new data on the storage device and analyzes it for deduplication later. One of its advantages is that it does not need to wait for hash calculations and lookups to complete before storing the data. One of its problems, however, is that it may store duplicate data for a short period of time, which can be a serious issue if storage capacity is near its limit. Perhaps its major drawback is the inability to predict when the deduplication process will complete.

Benefits of Data Deduplication
- Eliminates redundant data.
- Drives down cost.
- Improves backup and recovery service levels.
- Changes the economics of disk versus tape.
- Reduces carbon footprint.

Problems with Data Deduplication
- Hash collisions
- Intensive computational power required
- Effect of compression
- Effect of encryption

How to choose a data deduplication solution?
- Consider the broader implications of deduplication.
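The difference between the two methods can be sketched as follows. This hypothetical example shows a post-process pass: the full data set has already landed on disk (here, the `stored_blocks` list), and a later scan keeps the first copy of each block and turns later copies into references. In an in-line system, the same hash lookup would happen before each write, so only the unique blocks would ever hit disk. The function name and data layout are illustrative assumptions, not from the original.

```python
import hashlib

def post_process_dedupe(stored_blocks):
    """Post-process pass over blocks that were already written in full:
    retain the first copy of each unique block, replace later copies
    with references (indexes) into the retained set."""
    index = {}    # block hash -> position of the retained copy in `unique`
    unique = []   # the blocks that survive deduplication
    refs = []     # one reference per original block
    for block in stored_blocks:
        h = hashlib.sha256(block).hexdigest()
        if h not in index:
            index[h] = len(unique)   # first time seen: keep this block
            unique.append(block)
        refs.append(index[h])        # duplicates become references only
    return unique, refs

blocks = [b"alpha", b"beta", b"alpha", b"alpha"]   # 4 blocks stored in full first
unique, refs = post_process_dedupe(blocks)
print(len(unique), refs)   # -> 2 [0, 1, 0, 0]
```

Note that all four blocks had to be stored before the pass ran, which is the temporary duplicate-storage cost described above; the space for the two redundant copies is only reclaimed after the pass completes, at a time that is hard to predict on a busy system.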
- Think about how deduplication can be used to eliminate tape in your environment.
- Data created by humans tends to dedupe well, but data generated by computers does not.
- Compare multiple products.
- Ensure ease of integration into your existing environment.

References
- http://searchdatabackup.techtarget.com/tip/0,289483,sid187_gci1360643,00.html
- http://www.datadomain.com/resources/faq.html
- http://searchstorage.techtarget.com/sDefinition/0,,sid5_gci1248105,00.html
- http://forms.datadomain.com/go/datadomain/eNL_WP_IDCBR_10

