In the storage business, data de-duplication is all the rage. Customers are clamoring to cash in on the savings, because de-duplication offers a number of improvements over traditional storage for backups. But with those benefits comes a confusing set of questions, the key one being: How do we choose the best de-dupe technology? In answering that question, it's important not to jump ahead to specific products -- by first choosing a product type, whether host-based, VTL-based or NAS-based, you can simplify the decision process.
Here's how they break down.
Host-based data de-duplication
Host-based de-duplication requires the backup client to do a lot of the de-dupe work. In many cases, that's not a problem, especially when the client is not CPU-bound. Host-based de-dupe really helps when backup bandwidth is constrained by small wide area network (WAN) pipes or consolidated virtual servers.
Host-based data de-duplication solutions usually require you to replace traditional backup software with de-dupe backup software, so before you recommend such a change, make sure that the benefits are significant enough to justify the switch.
Remote office backups to the corporate site will benefit from host-based de-duplication because it eliminates most or all of the backup hardware located at the remote site and optimises the network bandwidth required to centralise backups to corporate data centers. VMware backups benefit from host-based de-duplication by limiting the network bandwidth required to back up multiple guest machines concurrently.
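To make the bandwidth argument concrete, here is a minimal sketch of the general idea behind host-based de-dupe: the client fingerprints each chunk and sends only the chunks the backup server has not seen. It is an illustration under stated assumptions, not any vendor's implementation. The names BackupServer, backup and restore are hypothetical, the server is an in-memory stand-in for a remote store, and the fixed 4 KB chunk size is an assumption (commercial products often use variable-size chunks).

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size, chosen only for illustration


def split_chunks(data: bytes, size: int = CHUNK_SIZE):
    """Yield fixed-size chunks of data (the last one may be shorter)."""
    for offset in range(0, len(data), size):
        yield data[offset:offset + size]


class BackupServer:
    """Hypothetical in-memory stand-in for the central backup store."""

    def __init__(self):
        self.chunks = {}  # fingerprint -> chunk bytes

    def has_chunk(self, fingerprint: str) -> bool:
        return fingerprint in self.chunks

    def store_chunk(self, fingerprint: str, chunk: bytes) -> None:
        self.chunks[fingerprint] = chunk


def backup(data: bytes, server: BackupServer):
    """Back up data; return (recipe, bytes actually sent to the server)."""
    recipe = []      # ordered fingerprints needed to rebuild the data
    bytes_sent = 0
    for chunk in split_chunks(data):
        # The hashing is the CPU work that lands on the backup client.
        fingerprint = hashlib.sha256(chunk).hexdigest()
        if not server.has_chunk(fingerprint):
            # Only chunks the server has never seen cross the network.
            server.store_chunk(fingerprint, chunk)
            bytes_sent += len(chunk)
        recipe.append(fingerprint)
    return recipe, bytes_sent


def restore(recipe, server: BackupServer) -> bytes:
    """Rebuild the original data from its list of fingerprints."""
    return b"".join(server.chunks[fingerprint] for fingerprint in recipe)
```

Backing up two guest machines that share most of their operating system data against the same BackupServer would send the shared chunks once. The second backup transfers only the chunks unique to that guest, which is the effect that helps on small WAN links and consolidated virtual servers.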
Virtual tape library (VTL) data de-duplication
De-duped virtual tape libraries (VTLs) work well when the backups are localised to the data center and/or bandwidth between the client and backup storage is not an issue. Naturally, many customers will want to take advantage of de-duplication in their existing or planned virtual tape infrastructure.
VTLs are already very common in mid-sized and large enterprises and consume a significant part of many companies' overall storage budget. De-duping at the VTL should be simple for customers because almost all backup software platforms support VTLs. In addition, de-duped VTLs are a good fit for disaster recovery replication and when the customer wants to replace tape for primary backups. Given the increased efficiency and de-duped VTL-to-VTL replication, there may finally be an opportunity to show real ROI for backup to disk instead of tape.
Primary network-attached storage (NAS) data de-duplication
VTLs introduce a lot of the same challenges that physical tape presents, such as tape contention, poor cartridge utilisation and intolerance of high storage area network (SAN) latencies. In some cases, customers want the benefits of target hardware-based de-duplication without the complexity and limitations of tape. For those customers, de-duped NAS file systems may be the perfect remedy. De-duped NAS storage has some impressive cost advantages because it doesn't require SAN connections or VTL licensing in the backup software. The de-duped NAS storage can sometimes be used for more than just backups, for example to hold highly duplicated archive data where throughput is less important than space savings.
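When weighing space savings for a candidate data set, a rough de-dupe ratio can help frame the conversation. The sketch below is an assumption-laden estimate, not a sizing tool: dedupe_ratio is a hypothetical helper, it uses fixed 4 KB chunks, and it ignores compression and metadata overhead, so real appliances will report different numbers.

```python
import hashlib


def dedupe_ratio(blobs, chunk_size=4096):
    """Estimate logical bytes / unique bytes for an iterable of bytes objects.

    chunk_size is an assumed fixed chunk size used only for this estimate.
    """
    logical = 0   # bytes before de-duplication
    unique = {}   # chunk fingerprint -> chunk length
    for blob in blobs:
        for offset in range(0, len(blob), chunk_size):
            chunk = blob[offset:offset + chunk_size]
            logical += len(chunk)
            # Record each distinct chunk once, however often it repeats.
            unique.setdefault(hashlib.sha256(chunk).digest(), len(chunk))
    stored = sum(unique.values())  # bytes after de-duplication
    return logical / stored if stored else 1.0
```

A ratio near 1.0 suggests little duplicate data, so de-dupe would add cost without much saving. Highly duplicated archive data should score well above that.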
