Posted Jun 3, 2008 | By Jerome M. Wendt

How to implement data de-duplication

 

The best way to select, implement and integrate data deduplication varies depending on how the deduplication is performed. Here are some general principles that you can follow in selecting the right deduplication approach and then integrating it into your environment.

Step 1: Assess your backup environment

The deduplication ratio a company achieves depends heavily on the following factors:

  • Type of data
  • Change rate of the data
  • Amount of redundant data
  • Type of backup performed (full, incremental or differential)
  • Retention length of the archived or backup data

The challenge most companies have is quickly and effectively gathering this data. Agentless data gathering and information classification tools from Aptare, Asigra, Bocada and Kazeon Systems can assist in performing these assessments while requiring minimal or no changes to your servers in the form of agent deployments.
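As a rough illustration of how change rate and retention length interact, the sketch below estimates a deduplication ratio for a series of retained full backups. The function name, its parameters and the model itself are assumptions made for illustration: it supposes every backup is a full backup of constant size, that each backup differs from the previous one by a fixed fraction, and that changed data is never repeated. Real ratios also depend on the type of data and on how a given product deduplicates it.

```python
def estimate_dedup_ratio(full_size_gb, change_rate, retained_fulls):
    """Estimate the logical-to-stored ratio for retained full backups.

    Hypothetical model (not from any vendor): the first full backup is
    stored in its entirety, and every later full backup adds only the
    fraction of data that changed since the previous one.
    """
    if retained_fulls < 1:
        raise ValueError("retained_fulls must be at least 1")
    if not 0.0 <= change_rate <= 1.0:
        raise ValueError("change_rate must be between 0.0 and 1.0")

    # Data the backup software writes before deduplication.
    logical_gb = full_size_gb * retained_fulls
    # Data left on disk after deduplication under the assumed model.
    stored_gb = full_size_gb * (1 + change_rate * (retained_fulls - 1))
    return logical_gb / stored_gb


# Example inputs (hypothetical): 1,000 GB fulls, 2% change between
# backups, 30 backups retained.
ratio = estimate_dedup_ratio(1000, 0.02, 30)
print(f"Estimated ratio: {ratio:.1f}:1")
```

Under this model, a longer retention period or a lower change rate raises the ratio, while data that changes completely between backups yields no benefit at all.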

Step 2: Establish how much you can change your backup environment

Deploying backup software that uses software agents requires installing an agent on each server or virtual machine and rebooting the server after installation. This approach generally results in faster backup times and higher deduplication ratios than using a data deduplication appliance. However, it can take more time and require many changes to a company's backup environment. Using a data deduplication appliance typically requires no changes to servers, though a company will need to tune its backup software according to whether the appliance is configured as a file server or a virtual tape library (VTL).

 

Step 3: Purchase a scalable storage architecture

The amount of data that a company initially plans to back up and the amount it actually ends up backing up are usually two very different numbers. A company often finds deduplication so effective once it starts using it in its backup process that it quickly scales its use and deployment beyond initial intentions, so you should confirm that deduplicating hardware appliances can scale both performance and capacity. You should also verify that the hardware and software deduplication products provide global deduplication and replication features to maximize deduplication's benefits throughout the enterprise, facilitate technology refreshes and/or capacity growth, and efficiently bring in deduplicated data from remote offices.
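To make the capacity side of this concrete, here is a minimal sketch that projects when stored backup data would outgrow an appliance. The function name, the compound monthly growth model and the example figures are all assumptions for illustration. They are not taken from any product's sizing guide, and the sketch says nothing about performance scaling.

```python
def months_until_full(current_stored_tb, monthly_growth_rate,
                      appliance_capacity_tb, max_months=120):
    """Return the first month in which stored data exceeds capacity.

    Hypothetical model: stored (post-deduplication) data grows by a fixed
    percentage each month. Returns None if capacity is not exceeded
    within max_months.
    """
    stored_tb = current_stored_tb
    for month in range(max_months + 1):
        if stored_tb > appliance_capacity_tb:
            return month
        stored_tb *= 1 + monthly_growth_rate
    return None


# Example inputs (hypothetical): 10 TB stored today, 5% growth per month,
# 20 TB of usable appliance capacity.
print(months_until_full(10, 0.05, 20))
```

Running a projection like this with both the planned and a more aggressive growth rate gives a sense of how soon an appliance that cannot scale would need to be replaced.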

Step 4: Check the level of integration between backup software and hardware appliances

The level of integration that a hardware appliance has with backup software (or vice versa) can expedite backups and recoveries. For example, ExaGrid Systems appliances recognize backup streams from CA ARCserve and can deduplicate data from that backup software better than streams from backup software that they don't recognize. Enterprise backup software is also starting to manage disk storage systems better, placing data on different tiers of disk so that it can be backed up and recovered quickly in the short term and then stored more cost-effectively in the long term.
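The tiering idea in the last sentence can be sketched as a simple age-based placement rule. The tier names, the 14-day threshold and the function itself are hypothetical. Actual backup software exposes this kind of policy through its own configuration, which is not shown here.

```python
from datetime import date


def choose_tier(backup_date, today, fast_tier_days=14):
    """Pick a disk tier for a backup based on its age.

    Hypothetical policy: recent backups stay on a fast tier for quick
    recovery, and older backups move to a cheaper capacity tier.
    """
    age_days = (today - backup_date).days
    if age_days < 0:
        raise ValueError("backup_date cannot be in the future")
    return "fast-disk" if age_days <= fast_tier_days else "capacity-disk"


# Example: a two-day-old backup and a five-month-old backup.
print(choose_tier(date(2008, 6, 1), date(2008, 6, 3)))
print(choose_tier(date(2008, 1, 1), date(2008, 6, 3)))
```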

Step 5: Perform the first backup

The first backup using agent-based deduplication software can be a harrowing experience. It can create a significant amount of overhead on the server and take much longer than normal to complete because it needs to deduplicate all of the data. However, once the first backup is complete, the software only needs to back up and deduplicate changed data going forward. With a hardware appliance, the experience tends to be the opposite. The first backup may complete quickly, but backups may slow over time depending on how scalable the hardware appliance is, how much data is changing and how much data growth a company is experiencing.
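A toy example helps show why the first deduplicated backup is the expensive one. The sketch below splits data into fixed-size chunks, identifies each chunk by its SHA-256 hash and stores only chunks it has not seen before. The `DedupStore` class and the fixed-size chunking are assumptions chosen for brevity. The article does not describe any vendor's algorithm, and products may chunk and index data quite differently.

```python
import hashlib


class DedupStore:
    """Hypothetical in-memory chunk store for illustration only."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}  # SHA-256 hex digest -> chunk bytes

    def backup(self, data):
        """Store data and return (recipe, number_of_new_chunks).

        The recipe is the ordered list of chunk digests needed to
        rebuild the data later.
        """
        recipe = []
        new_chunks = 0
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:
                # Unseen chunk: this is the only case that stores data.
                self.chunks[digest] = chunk
                new_chunks += 1
            recipe.append(digest)
        return recipe, new_chunks

    def restore(self, recipe):
        """Rebuild the original bytes from a recipe."""
        return b"".join(self.chunks[digest] for digest in recipe)


store = DedupStore(chunk_size=4)
# First backup: every chunk is new, so all four are stored.
recipe1, new1 = store.backup(b"AAAABBBBCCCCDDDD")
# Second backup: only the one changed chunk is stored.
recipe2, new2 = store.backup(b"AAAABBBBXXXXDDDD")
print(new1, new2)  # prints "4 1"
```

The second backup still hashes every chunk, but it stores only the one that changed, which mirrors the drop in work after the first agent-based backup.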

About the author: Jerome M. Wendt is lead analyst and president of DCIG Inc.


Copyright © 2008 Westwick-Farrow Pty Ltd. All rights reserved.