Posted Jun 18, 2006 | By: Jerome M. Wendt

Tech Report: Content-addressed storage preferred for fixed-content storage


Capacity management

There are three primary ways CAS products manage and reduce the amount of data they store: object-based storage, single-instance storage (SIS) and data deduplication.

CAS vendors that support RAIN (redundant array of independent nodes) and networked storage array architectures store files as objects. Each incoming file is scanned, and a hashing algorithm generates a unique identifier from its contents. That identifier is stored in the CAS product's meta data database and is used to reference and access the object in the future. Because the hashing algorithm always produces the same identifier for the same content, even when some file attributes (such as the file name) differ, the CAS product can recognize a file it has already stored and keep only one copy. This technique, called SIS, lets users save storage space because they're not storing multiple instances of the same file.
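The identifier-based approach can be sketched as a toy store in Python. This is not any vendor's implementation: SHA-256 is an assumed stand-in for whatever hashing algorithm a given CAS product uses, and the class and method names (SingleInstanceStore, put, get) are hypothetical. Two files with different names but identical contents receive the same identifier, so the content is kept only once.

```python
import hashlib


class SingleInstanceStore:
    """Toy content-addressed store that keeps one copy per unique content hash.

    Hypothetical sketch: SHA-256 is an assumed stand-in for a product's hashing algorithm.
    """

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}  # identifier -> stored object bytes
        self.catalog: dict[str, str] = {}    # file name -> identifier (the meta data)

    def put(self, name: str, data: bytes) -> str:
        # The identifier is derived from the bytes only, so attributes such as
        # the file name do not change it.
        identifier = hashlib.sha256(data).hexdigest()
        if identifier not in self.objects:   # store the content only if it is new
            self.objects[identifier] = data
        self.catalog[name] = identifier      # every name still gets a reference
        return identifier

    def get(self, name: str) -> bytes:
        return self.objects[self.catalog[name]]

    def stored_bytes(self) -> int:
        return sum(len(data) for data in self.objects.values())


store = SingleInstanceStore()
first = store.put("memo.doc", b"quarterly results")
second = store.put("memo-copy.doc", b"quarterly results")
print(first == second)       # True: same content, same identifier
print(store.stored_bytes())  # 17: the content is stored once
```

Both names remain retrievable through the catalog, but the 17 bytes of content are held only once.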

Before implementing SIS, users need to consider how long it takes the CAS product to generate each unique identifier and then check its meta data database to see whether that identifier already exists. These searches may be quick during initial deployment, but they take longer as the meta data database grows.
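A rough sketch of the two steps that cost time, assuming SHA-256 as the hashing algorithm and a plain in-memory set as a stand-in for the meta data database (both are assumptions, and the function names are hypothetical). Generating the identifier requires reading every byte of the file, and the duplicate check is a lookup against everything stored so far.

```python
import hashlib
import io
from typing import BinaryIO


def content_identifier(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks; the whole file must be read, so cost grows with size."""
    digest = hashlib.sha256()  # assumed algorithm, for illustration only
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def already_stored(identifier: str, index: set[str]) -> bool:
    # An in-memory set gives average constant-time lookups; a product's
    # persistent meta data database may slow down as it grows, as the article notes.
    return identifier in index


index: set[str] = set()
ident = content_identifier(io.BytesIO(b"scanned check #1001"))
print(already_stored(ident, index))  # False: first time this content is seen
index.add(ident)
print(already_stored(ident, index))  # True: a second copy would be skipped
```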

For the fastest file storage and recovery, users should run the latest version of the RAIN OS. EMC, for example, claims that under certain conditions the latest version of Centera's CentraStar OS performs four to five times faster than earlier releases. Another option is to upgrade to hardware nodes with faster CPUs and Gigabit Ethernet (1 Gbps) ports rather than the 100 Mbps ports common to first-generation nodes. Upgrading shouldn't be especially painful, because RAIN nodes can be taken offline and replaced nondisruptively, and different generations of nodes can operate in the same cluster.

Another factor to consider before turning on SIS is the type of file being archived. Some types of files, such as check images, are almost always unique, so nothing will be gained by turning on SIS. E-mail attachments, by contrast, are often stored many times over, so users will see significant savings.
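The effect of file type on SIS savings can be estimated with a small sketch. The sample data is invented for illustration: one attachment delivered to ten mailboxes versus three check images that all differ. The function name sis_savings is hypothetical, and SHA-256 is assumed as the identifier.

```python
import hashlib


def sis_savings(files: list[bytes]) -> float:
    """Fraction of capacity saved by keeping one copy per unique content hash."""
    logical = sum(len(f) for f in files)
    # Map each identifier to its size; duplicates collapse into one entry.
    unique = {hashlib.sha256(f).hexdigest(): len(f) for f in files}
    physical = sum(unique.values())
    return 1 - physical / logical if logical else 0.0


attachment = b"%PDF-1.4 quarterly sales deck"          # invented sample content
mailboxes = [attachment] * 10                           # same attachment sent to 10 recipients
checks = [b"check-0001", b"check-0002", b"check-0003"]  # every image differs

print(round(sis_savings(mailboxes), 2))
print(sis_savings(checks))
```

With ten copies of one attachment, nine of the ten copies are eliminated (90% saved); with unique check images, nothing is saved.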

Data deduplication and classification

Some CAS products use data deduplication, which breaks files apart, analyzes them at the block level and stores identical blocks only once to minimize the amount of data stored. HP's StorageWorks RISS and Permabit's Permeon Compliance Store include this capability in their software, but users need to turn it on.
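Block-level deduplication can be illustrated with fixed-size blocks. This is a simplified sketch under stated assumptions: the 4 KB block size and SHA-256 hashing are illustrative choices, the function names are hypothetical, and commercial products may use variable-size chunking or other methods the article doesn't describe. Each distinct block is stored once, and an ordered list of block hashes (the "recipe") is kept so the file can be rebuilt.

```python
import hashlib


def dedupe_blocks(data: bytes, block_size: int = 4096) -> tuple[dict[str, bytes], list[str]]:
    """Split data into fixed-size blocks and keep each distinct block once.

    Returns (block_store, recipe): the recipe lists block hashes in order so the
    original bytes can be rebuilt.
    """
    block_store: dict[str, bytes] = {}
    recipe: list[str] = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]  # last block may be shorter
        key = hashlib.sha256(block).hexdigest()
        block_store.setdefault(key, block)        # identical blocks stored only once
        recipe.append(key)
    return block_store, recipe


def rebuild(block_store: dict[str, bytes], recipe: list[str]) -> bytes:
    return b"".join(block_store[key] for key in recipe)


data = b"A" * 4096 * 3 + b"B" * 4096  # four blocks, only two distinct
store, recipe = dedupe_blocks(data)
print(len(recipe), len(store))         # 4 2
print(rebuild(store, recipe) == data)  # True
```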

NetApp introduced ASIS last March and EMC has announced a partnership with Avamar Technologies Inc. to provide similar functionality for Centera. HP says users will experience a three- to five-fold reduction in total storage using deduplication, but the technology will introduce some performance overhead. NetApp estimates that its filers will experience a 1% to 3% performance hit when ASIS is turned on.
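To put HP's three- to five-fold figure in capacity terms: an r-fold reduction means the data occupies 1/r of its original footprint. The helper below is a hypothetical illustration using an invented 12 TB data set, not a vendor sizing tool.

```python
def dedup_footprint(logical_tb: float, ratio: float) -> tuple[float, float]:
    """Return (physical TB, fraction saved) for an r-fold reduction, e.g. ratio=3.0 for 3:1."""
    if ratio < 1:
        raise ValueError("a reduction ratio must be at least 1")
    return logical_tb / ratio, 1 - 1 / ratio


for ratio in (3.0, 5.0):
    physical, saved = dedup_footprint(12.0, ratio)  # 12 TB is an invented example
    print(f"{ratio:.0f}:1 -> {physical:.1f} TB stored, {saved:.0%} saved")
```

At 3:1, 12 TB of archived data would occupy 4 TB (about 67% saved); at 5:1 it would occupy 2.4 TB (80% saved).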

CAS products classify data in several ways, mostly using meta data databases. As files are stored in RAIN architectures, meta data is extracted based on policies provided by the vendor and the user. NetApp's filers index files after they're stored, and users can use any data classification engine to index, classify and tag data. NetApp's IS1200 appliance uses Kazeon Systems Inc.'s algorithms to deliver this functionality.
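Policy-driven meta data extraction can be sketched as a list of rules, each pairing a test on a file's attributes with a tag. The attributes, rules and tags here are hypothetical examples, not the policy format of any vendor or of Kazeon's engine.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class StoredFile:
    name: str
    owner: str
    size: int
    tags: set[str] = field(default_factory=set)


# Hypothetical policies: each pairs a test on file attributes with a tag to apply.
Policy = tuple[Callable[[StoredFile], bool], str]

POLICIES: list[Policy] = [
    (lambda f: f.name.lower().endswith(".eml"), "email"),
    (lambda f: f.owner == "finance", "finance-record"),
    (lambda f: f.size > 10 * 1024 * 1024, "large-object"),
]


def classify(f: StoredFile, policies: list[Policy]) -> StoredFile:
    """Apply every matching policy and record the resulting tags as meta data."""
    for matches, tag in policies:
        if matches(f):
            f.tags.add(tag)
    return f


doc = classify(StoredFile("inbox/2006-06-18.eml", "finance", 2048), POLICIES)
print(sorted(doc.tags))  # ['email', 'finance-record']
```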

IBM's DR550 classifies data based on policies set in advance with its Tivoli Storage Manager (TSM) software. TSM then places the data on the correct storage tier, moves it to other tiers when appropriate and deletes the file at the end of its retention period. For this scenario to work, the TSM APIs must be installed on each server.
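The place, move and delete sequence can be sketched as a single decision function. The tier names and thresholds are hypothetical, and this is not TSM's API or policy syntax; it only shows how an object's age, compared with a move threshold and a retention period, determines what happens to it.

```python
from datetime import date, timedelta


def lifecycle_action(stored_on: date, today: date,
                     move_after: timedelta, retention: timedelta) -> str:
    """Decide where one object belongs under a hypothetical tier/retention policy."""
    age = today - stored_on
    if age >= retention:
        return "delete"        # retention period has ended
    if age >= move_after:
        return "archive-tier"  # older data moves to a lower-cost tier
    return "primary-tier"      # recent data stays on the fastest tier


MOVE_AFTER = timedelta(days=90)      # hypothetical threshold
RETENTION = timedelta(days=7 * 365)  # hypothetical seven-year retention (leap days ignored)
stored = date(2006, 1, 1)

for today in (date(2006, 2, 1), date(2006, 6, 1), date(2013, 1, 2)):
    print(today, lifecycle_action(stored, today, MOVE_AFTER, RETENTION))
```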

A problem with all data classification approaches is the need to re-index data if requirements change. Depending on the size of the data store, re-indexing can be a performance-intensive exercise.
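The sketch below shows why re-indexing scales with the size of the data store: rebuilding a tag index under a new rule means revisiting the meta data of every stored object, however small the rule change. The catalog layout and rules are hypothetical, and the sketch assumes the needed attributes were already captured as meta data; a real product might also have to reread file contents.

```python
from collections import defaultdict
from typing import Callable

Meta = dict[str, str]


def reindex(catalog: dict[str, Meta], rule: Callable[[Meta], set[str]]) -> dict[str, set[str]]:
    """Rebuild a tag -> object-ID index from scratch under a new classification rule.

    Every object's meta data is visited, so the work grows with the number of
    stored objects rather than with the size of the rule change.
    """
    index: defaultdict[str, set[str]] = defaultdict(set)
    for object_id, meta in catalog.items():
        for tag in rule(meta):
            index[tag].add(object_id)
    return dict(index)


catalog = {
    "obj-1": {"dept": "legal", "type": "email"},
    "obj-2": {"dept": "hr", "type": "scan"},
    "obj-3": {"dept": "legal", "type": "scan"},
}
by_dept = reindex(catalog, lambda m: {m["dept"]})  # original requirement
by_type = reindex(catalog, lambda m: {m["type"]})  # changed requirement: another full pass
print(by_type)
```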

CAS cost considerations

Both upfront and ongoing costs of each CAS product should be considered; as data grows, the hidden costs and savings of these architectures will become more apparent. Purchasers of RAIN products from Archivas, Bycast, EMC, HP and Permabit should expect to pay anywhere from $7,500 for a 1.5 terabyte (TB) configuration to $350,000 or more for a 50 TB setup.
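Normalizing the quoted prices to cost per terabyte makes them easier to compare. This uses only the two figures quoted in the article; they are the low and high ends of a range rather than quotes for the same product, so the comparison is rough.

```python
def cost_per_tb(price_usd: float, capacity_tb: float) -> float:
    """Normalize a configuration's price to dollars per terabyte."""
    return price_usd / capacity_tb


# Low and high ends of the price range quoted in the article.
entry_level = cost_per_tb(7_500, 1.5)  # 1.5 TB configuration
large = cost_per_tb(350_000, 50)       # 50 TB setup ("$350,000 or more")
print(f"${entry_level:,.0f}/TB vs ${large:,.0f}/TB")
```

The 1.5 TB configuration works out to $5,000 per TB and the 50 TB setup to at least $7,000 per TB, so the larger configuration is not automatically cheaper per terabyte at these price points.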

Permabit is the only vendor that offers two pricing models: per node or by capacity. The per-node option, which Bycast and EMC also offer, will probably be less expensive, because capacity on CAS products could scale into the petabytes. Per-node licensing is also more favorable for products such as Bycast's StorageGrid, which supports portable media such as tape and therefore requires fewer nodes to house data.

There are exceptions, of course. Users who plan to keep data from different departments or customers on separate nodes may find licensing by total capacity to be cheaper. Users also need to examine which storage optimization features, if any, they'll turn on and whether those features will curb data growth.
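The trade-off between per-node and capacity-based licensing, including the separate-nodes exception, can be modeled with a short sketch. All prices and node sizes are hypothetical (4 TB per node, $20,000 per node, $6,000 per TB), chosen only to show the shape of the comparison; none come from the vendors named in the article.

```python
import math


def per_node_cost(dept_tb: list[float], tb_per_node: float, price_per_node: float,
                  separate_nodes: bool) -> float:
    """License cost when paying per node (all inputs hypothetical)."""
    if separate_nodes:
        # Each department or customer gets its own nodes, so spare space is not shared.
        nodes = sum(math.ceil(tb / tb_per_node) for tb in dept_tb)
    else:
        nodes = math.ceil(sum(dept_tb) / tb_per_node)
    return nodes * price_per_node


def per_capacity_cost(dept_tb: list[float], price_per_tb: float) -> float:
    """License cost when paying by total capacity stored."""
    return sum(dept_tb) * price_per_tb


depts = [1.0, 1.0, 1.0, 1.0]  # four departments, 1 TB each (hypothetical)
print(per_node_cost(depts, 4.0, 20_000, separate_nodes=False))  # shared nodes
print(per_node_cost(depts, 4.0, 20_000, separate_nodes=True))   # one node per department
print(per_capacity_cost(depts, 6_000))                          # capacity licensing
```

With shared nodes, one node covers all four departments ($20,000), slightly undercutting capacity licensing ($24,000). Giving each department its own node raises the per-node bill to $80,000, which is the case where licensing by capacity wins.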

In the future, CAS products will likely emerge as the preferred means of managing structured and unstructured retention data. The four CAS options (RAIN, file system, HSM and storage architectures) all allow users to start small, scale economically and satisfy the data-retention requirements of specific applications. But integrating CAS with various applications isn't always as easy as advertised, and reclassifying data after it's stored is a major headache.

About the author: Jerome M. Wendt is a storage analyst specializing in open systems storage and SANs. He has managed storage for both small and large organizations.


Copyright © 2010 Westwick-Farrow Pty Ltd. All rights reserved.