Posted Jun 5, 2008 | By Martha Young

Five things to do before a data de-duplication project

 

Data de-duplication is as much a business consideration as it is a technical concern. From a business perspective, de-duplicating data delivers three benefits. It improves in-line performance and data integrity, adding value and intelligence to a business's intellectual property. It reduces the time required for backup and recovery, an important consideration for businesses looking at business continuity and disaster recovery solutions. And it reduces the cost of physical storage, including hardware acquisition, management and administration, and energy consumption. With technology budgets coming under intense scrutiny, data de-duplication is an obvious candidate for investment with a near-term return.

There are several considerations to take into account when first investigating data de-duplication options. Here are five questions to explore.

What types of files need to be stored?

In today's business world, users are generating vast amounts of intellectual property across a wide variety of media. Firms need to address the unique file storage requirements for voice, video, data, electronic mail, instant messaging, mobile computing and other types of files. File type is important in the data de-duplication equation because it can indicate differences in file size. For instance, a streaming video file requires substantially more storage, and consequently more bandwidth to transfer to storage, than email documents. If a service provider is supporting a lot of video, a localised solution will make more economic sense.
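To make the bandwidth point concrete, the following is a rough sketch of the arithmetic. The file sizes and the link speed are hypothetical values chosen for illustration, not figures from any particular environment, and the helper name transfer_seconds is likewise an assumption.

```python
def transfer_seconds(size_bytes: int, link_megabits_per_second: float) -> float:
    """Estimate the time to move a file over a link, ignoring protocol overhead."""
    bits = size_bytes * 8
    return bits / (link_megabits_per_second * 1_000_000)


# Hypothetical values, for illustration only.
email_bytes = 75 * 1000            # a 75 KB email
video_bytes = 700 * 1000 * 1000    # a 700 MB video file
link_mbps = 10.0                   # a 10 Mbit/s WAN link

email_time = transfer_seconds(email_bytes, link_mbps)
video_time = transfer_seconds(video_bytes, link_mbps)

print(f"email: {email_time:.2f} s")
print(f"video: {video_time:.0f} s")
```

Under these assumed numbers the video takes thousands of times longer to move than the email, which is why a video-heavy workload tends to favour keeping the data close to where it is stored.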

How long do the files need to be stored?

The answer to this question depends on the regulations you need to comply with. A mountain of regulations governs data backup, recovery, accessibility and security. Each regulation has its own framework and objectives that you must be able to meet. If all of the varieties of communication need to be stored in excess of 50 years, then data de-duplication is mandatory, if only from a manageability and retrieval perspective.

Where will data de-duplication be conducted?

There are only two places where de-duplication can be conducted: at the source or in a storage appliance. Data de-duplication at the source offers the key benefits of reducing the amount of disk space needed to store the backups and reducing the network bandwidth required to back up a given set of data. The drawback to de-duplication at the source is the impact on the server: it takes a significant number of compute cycles on each server.
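As a minimal sketch of why de-duplicating at the source saves both disk space and bandwidth, the example below splits data into fixed-size chunks and stores each unique chunk once, keyed by its SHA-256 digest. The chunk size, the function names and the sample data are assumptions made for illustration. Commercial products use their own chunking and indexing schemes, so this is not a description of any vendor's method.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical chunk size, chosen for illustration


def deduplicate(data: bytes, store: dict) -> list:
    """Split data into fixed-size chunks and keep one copy of each unique chunk.

    Returns the ordered list of chunk digests needed to rebuild the data.
    """
    recipe = []
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).digest()
        # Only a chunk that has not been seen before consumes new space.
        store.setdefault(digest, chunk)
        recipe.append(digest)
    return recipe


def restore(recipe: list, store: dict) -> bytes:
    """Rebuild the original data from its list of chunk digests."""
    return b"".join(store[digest] for digest in recipe)


# Two hypothetical backups that share most of their content.
backup_monday = b"A" * 8192 + b"B" * 4096
backup_tuesday = b"A" * 8192 + b"C" * 4096

store = {}
recipe_monday = deduplicate(backup_monday, store)
recipe_tuesday = deduplicate(backup_tuesday, store)

raw_bytes = len(backup_monday) + len(backup_tuesday)
stored_bytes = sum(len(chunk) for chunk in store.values())
print(raw_bytes, stored_bytes)  # prints: 24576 12288
```

The hashing work in deduplicate is the compute cost that falls on each server when de-duplication runs at the source. The saving is that only the three unique chunks, rather than all six, would need to be sent and stored.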

Some companies have opted to transfer the compute cycle requirements to a storage appliance and conduct their data de-duplication at the appliance. This eliminates the agent footprint on the storage server and CPU cycle impact, but it does add another device or set of devices to the network that will need to be monitored, maintained and managed.

When deciding where de-duplication should take place, it's important to consider the geographical distribution of the company. For a customer with numerous branch offices, it makes economic sense to de-duplicate on a local level and reduce the overall impact on the WAN. For a customer that leverages a data center, de-duplicating within an appliance makes sense since it allows you to continue using existing backup methods and procedures, reducing the server performance impact.

Which de-duplication approach is preferred: software-based or hardware-based?

Data de-duplication can be performed using either a software-based solution or a hardware-based solution. A software-based solution enables companies to eliminate data redundancy directly at the source. As noted, a software-based solution does carry the burden of installing an agent on each server, as well as a substantial CPU cycle impact. Software-based solutions are relatively inexpensive to deploy compared with hardware solutions, but they do require ongoing maintenance to keep the clients and agents up to date. A software-based solution would be ideal in small and medium-sized businesses (SMBs), as well as within large enterprises that are geographically distributed.

De-duplication appliances, on the other hand, are ideal for a data center environment. An appliance solution offloads the transactional processing and subsequent CPU impact from the server. De-duplication appliances have a reputation for high performance and scalability, but companies considering an appliance-based solution need to weigh the bigger-picture impact on bandwidth utilisation as well as increased network complexity. A hardware-based data de-duplication solution is optimised for the data center environment: in addition to offloading server CPU cycles, an appliance in the data center can be integrated with other storage platforms to maximise storage usage.

Will data be encrypted and, if so, when?

When it comes to encryption, compression and data de-duplication, the order of execution is critical. Compression eliminates redundancy within files (thereby reducing file size). De-duplication eliminates redundant files. Encryption converts data into what looks like a random data stream. If a company encrypts its data prior to transmission, it may become impossible to compress or de-duplicate it, which would unnecessarily inflate the amount of storage required, as well as the associated costs. To optimise their storage infrastructure, advise customers to compress, de-duplicate and then encrypt their files. Following this order of operations means that compression and de-duplication must take place at the server, and the data is then encrypted prior to being transmitted.
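The effect of ordering can be demonstrated with a short sketch. The toy_encrypt function below is a hypothetical stand-in for a real cipher (an XOR against a SHA-256 keystream). It is used here only because it needs nothing beyond the standard library, and it is not suitable for protecting real data. The sample data and keys are likewise assumptions.

```python
import hashlib
import zlib


def toy_encrypt(data: bytes, key: bytes) -> bytes:
    """XOR data with a SHA-256 counter-mode keystream.

    Illustration only: a stand-in for a real cipher, not safe for production.
    Applying it twice with the same key returns the original data.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        keystream = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        block = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(block, keystream))
    return bytes(out)


# Highly redundant sample data, such as repeated log lines.
plain = b"backup record: status=OK\n" * 4000
key = b"hypothetical-key"

compress_then_encrypt = toy_encrypt(zlib.compress(plain), key)
encrypt_then_compress = zlib.compress(toy_encrypt(plain, key))

print("original:             ", len(plain))
print("compress then encrypt:", len(compress_then_encrypt))
print("encrypt then compress:", len(encrypt_then_compress))

# The same file encrypted under two different keys no longer looks identical,
# so a digest comparison cannot recognise the copies as duplicates.
copy_a = toy_encrypt(plain, b"key-for-server-a")
copy_b = toy_encrypt(plain, b"key-for-server-b")
duplicates_detected = hashlib.sha256(copy_a).digest() == hashlib.sha256(copy_b).digest()
print("duplicates detected after encryption:", duplicates_detected)
```

Compressing first shrinks the redundant data to a small fraction of its size, while compressing after encryption yields output about as large as the input. The final comparison shows the same problem for de-duplication: once identical files are encrypted under different keys, they no longer match.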

As companies seek to achieve data storage and retrieval regulatory compliance at the lowest possible cost, these five questions should be addressed during the data de-duplication decision process. And once a solution is chosen, evaluate whether the implementation will meet your goals and objectives.

About the author
Martha Young is co-founder and CEO of Nova Amber LLC, a business consulting company specialising in business process virtualisation.

 



Copyright © 2008 Westwick-Farrow Pty Ltd. All rights reserved.