At its most basic, the term "cloud storage" describes data storage that is made available as a service via a network. But that definition is so broad that it can encompass everything from Internet-based services that store a consumer's email and digital photographs to a major corporation's backup data.
This learning guide will supply readers with a cloud storage case study, explain the difference between cloud storage and cloud service, and provide a look at the future of cloud storage.
Table of contents:
>> Cloud storage criteria
>> A look at Amazon S3
>> Cloud storage case study
>> Web service APIs
>> Cloud storage vs. cloud service
>> Build your own cloud infrastructure
>> The future of cloud storage
There could be hundreds of cloud storage offerings or just a handful, depending on how broadly or narrowly one defines the term. Either way, the numbers are expected to grow this year, as cloud storage continues its skyward trajectory.
"The problem with cloud storage is it's so much in its infancy that the term is wide open to interpretation," said George Crump, founder of consulting firm Storage Switzerland.
In the interest of clarification, industry analysts who deal with corporate IT professionals have been refining the definition of cloud storage to incorporate the following characteristics:
--Not tied to a specific geographic location
--Based on commodity components
--Billed on a usage basis, for example 15 cents per gigabyte per month
Software as a Service (SaaS) offerings, such as Salesforce.com and Google Apps, would not qualify as cloud storage, even though they do afford users the ability to store data. "If you're a developer, you can't write to their storage infrastructure, so we don't look at that as a storage utility," said Adam Couture, an analyst at Gartner Inc.
Couture counts only a few players in the cloud storage space: Amazon.com, Nirvanix and Rackspace Hosting's Mosso cloud division. But he said the list will grow this year with the launch of new offerings from several companies, including at least three "big names" planning services similar to Amazon's Simple Storage Service (S3). Australia has several cloud storage players of its own.
Amazon's service, based on the infrastructure the company itself uses, has grown substantially since its launch nearly three years ago. The company claims S3 held 800 million objects at the end of the second quarter of 2006, grew to 5 billion within a year and mushroomed to 40 billion by the end of 2008.
Customers range from consumers saving personal data to Web startups offering online services and larger companies wanting to back up databases or store archival data. The New York Times, for instance, uses S3 to store and deliver articles from its historical archives. Another Amazon customer, NASDAQ, stores historical market data, making it available to traders through its Market Replay tool.
One of the most enticing aspects is cost. S3 pricing starts at 15 cents (US) per gigabyte per month of storage used. The per-gigabyte cost decreases once customers pass the 50 TB threshold.
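To make the pay-as-you-go arithmetic concrete, here is a minimal sketch of a tiered monthly bill. The 15-cent rate and the 50 TB threshold come from the figures above. The 14-cent rate applied beyond the threshold is a placeholder assumption for illustration only, not a quoted Amazon price, and the function name and the 1 TB = 1,024 GB conversion are likewise illustrative choices.

```python
from decimal import Decimal

FIRST_TIER_RATE = Decimal("0.15")  # dollars per GB per month (figure from the article)
TIER_THRESHOLD_GB = 50 * 1024      # 50 TB, assuming 1 TB = 1,024 GB
LOWER_TIER_RATE = Decimal("0.14")  # ASSUMPTION: illustrative rate, not a published price


def monthly_storage_cost(gb_stored):
    """Return the monthly storage charge in dollars for a whole number of gigabytes."""
    if gb_stored < 0:
        raise ValueError("gb_stored must be non-negative")
    # Gigabytes up to the threshold are billed at the starting rate.
    first_tier_gb = min(gb_stored, TIER_THRESHOLD_GB)
    # Only the gigabytes beyond the threshold get the (assumed) lower rate.
    extra_gb = max(gb_stored - TIER_THRESHOLD_GB, 0)
    return first_tier_gb * FIRST_TIER_RATE + extra_gb * LOWER_TIER_RATE
```

Under these assumptions, a startup storing 100 GB would pay $15 a month, with no upfront purchase of capacity it might never use.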
"We think about it as storage for the Internet," said Alyssa Henry, general manager of Amazon S3. "It's storage that's accessible from anywhere via http request and pay as you go, no upfront commitment, use as much or as little as you want."
Cloud storage is not a good choice for high-transaction databases or temporary storage. But it can make sense for a company with unpredictable storage demands, a need for an inexpensive storage tier or a low-cost, long-term archive.
In particular, cloud storage is a no-brainer for Web startups such as Behance, which maintains sites for creative professionals to showcase and share their work and ideas. One of its sites, called Action Method, allows users to upload and share content, including documents, audio files and movies.
"When we started building Action Method, we didn't know what the uptake was going to be, so we needed to make sure we had enough storage to accommodate any number of users," said Behance chief technology officer (CTO) Chris Henry.
Mosso Cloud Files from Rackspace, the vendor that hosts Behance's Web sites, can supply virtually unlimited scale at far lower cost than a SAN or NAS. "With cloud storage," Henry said, "you never run out."
The hardest part was writing an API to access Cloud Files, since Behance jumped the gun before Rackspace made available its Representational State Transfer (REST) API. Henry said it would have been easier if he had waited, since the Rackspace Web services API worked just as well.
Still, "integrating any cloud service into an application requires some degree of work," Henry noted. He advises his peers to build into their applications early on the ability to store files in the cloud, so they can take full advantage of a system with no storage constraints.
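Henry's advice to build cloud storage into an application early can be sketched as a small storage interface that the rest of the application codes against, so a local-disk backend can later be swapped for a cloud one. All class and function names below are hypothetical, and the in-memory class merely stands in for a real cloud client; none of this is Behance's or Rackspace's actual code.

```python
import os
from abc import ABC, abstractmethod


class FileStore(ABC):
    """Hypothetical interface the application uses for all file storage."""

    @abstractmethod
    def save(self, name, data):
        """Store the bytes in data under name."""

    @abstractmethod
    def load(self, name):
        """Return the bytes previously stored under name."""


class LocalDiskStore(FileStore):
    """Backend that keeps files in a directory on local disk."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def save(self, name, data):
        with open(os.path.join(self.root, name), "wb") as handle:
            handle.write(data)

    def load(self, name):
        with open(os.path.join(self.root, name), "rb") as handle:
            return handle.read()


class InMemoryCloudStandIn(FileStore):
    """Stand-in for a cloud backend; a real one would call the vendor's API."""

    def __init__(self):
        self._objects = {}

    def save(self, name, data):
        self._objects[name] = bytes(data)

    def load(self, name):
        return self._objects[name]


def publish_upload(store, name, data):
    """Application code: depends only on FileStore, not on a specific backend."""
    store.save(name, data)
    return len(store.load(name))
```

Because `publish_upload` sees only the interface, moving from disk to a cloud service would mean writing one new backend class rather than touching every upload path.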
Amazon's S3 also requires customers, whether Web startups or enterprises, to do some development work. The company does no contract development but provides lists of partners that have built applications on top of S3, which supports both REST and SOAP interfaces, according to Amazon's Alyssa Henry.
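For readers unfamiliar with what developing against a REST storage interface involves, the sketch below builds (but does not send) an HTTP PUT request that would store one object. The host name, the container/object path layout and the `X-Example-Token` header are hypothetical placeholders, not the actual S3 or Cloud Files API; each real service defines its own URL scheme and authentication.

```python
import urllib.parse
import urllib.request


def build_put_request(base_url, container, object_name, data, auth_token):
    """Build an HTTP PUT request that would upload data as one stored object."""
    # Percent-encode each path segment so spaces and slashes in names are safe.
    path = "/".join(
        urllib.parse.quote(segment, safe="") for segment in (container, object_name)
    )
    url = f"{base_url.rstrip('/')}/{path}"
    headers = {
        "Content-Type": "application/octet-stream",
        "X-Example-Token": auth_token,  # HYPOTHETICAL header; real services differ
    }
    return urllib.request.Request(url, data=data, headers=headers, method="PUT")


request = build_put_request(
    "https://storage.example.com",  # placeholder host, not a real endpoint
    "uploads",
    "my photo.jpg",
    b"...image bytes...",
    "secret-token",
)
```

Passing the resulting object to `urllib.request.urlopen` would transmit it. The point of the sketch is that in REST-style storage each object is addressed by a URL and manipulated with ordinary HTTP verbs.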
One cloud storage vendor that takes a different approach is Nirvanix. Its CloudNAS software, which is free to customers with contracts of 2 TB or more, removes the need to develop to an API. CloudNAS customers can turn any Linux or Windows server into a virtual NAS gateway to the Nirvanix Storage Delivery Network (SDN). The software permits access to the SDN offsite storage through standard storage protocols, such as NFS, CIFS and FTP.
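The practical effect of a gateway approach is that application code needs no storage API at all: once the gateway's export is mounted like any other network share, ordinary file operations suffice. The sketch below assumes such a share has already been mounted. The function name and any mount path passed to it are hypothetical, and nothing here is specific to Nirvanix's software.

```python
import shutil
from pathlib import Path


def archive_to_mounted_share(source_file, mount_point):
    """Copy a local file onto an already-mounted network share (e.g. NFS or CIFS)."""
    mount = Path(mount_point)
    # Fail early if the share is not mounted where the caller expects it.
    if not mount.is_dir():
        raise FileNotFoundError(f"no directory found at {mount}")
    destination = mount / Path(source_file).name
    # A plain file copy; the gateway, not this code, would move data offsite.
    shutil.copy2(source_file, destination)
    return destination
```

Compare this with the REST approach, where the application itself must construct and authenticate each request.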
Another distinguishing feature is the Internet media file system that Nirvanix developed to let customers manage their accounts. Nirvanix's principal focus is the enterprise that needs to store unstructured data such as video or audio files, medical images or engineering designs -- "all those large pieces of information that don't fit well in the traditional SAN or NAS systems," said CEO Jim Zierick.
Its SDN has storage nodes at co-location facilities in California, Texas, New Jersey, Germany and Japan to cut down on latency issues, one of the knocks on Internet-based storage.
Rackspace's Mosso takes the concept a step further, integrating its cloud offering with a content delivery network (CDN) through its partnership with Limelight Networks Inc. Users access content from the closest end point on the CDN.
"We've had plenty of customers that use Cloud Files specifically for the ease with which they can take advantage of a CDN," said E.J. Johnson, the lead architect of Cloud Files. Noting that Amazon offers a CDN service called CloudFront, which can be purchased separately from S3, Johnson added, "I think the CDN aspect is now tending to be inclusive if you think cloud storage."
For Rick Villars, an analyst at IDC, content delivery starts to cross into the territory of cloud service. Also falling into that gray area is a raft of online data backup services such as Carbonite Online Backup, EMC's Mozy (through its recently formed Decho), Iron Mountain Digital, Seagate Technology's EVault (through its i365 company) and Symantec's Online Backup, with files copied to the Symantec Protection Network.
In addition, IBM Corp. this week announced an as-yet-unnamed service that will give users of its Tivoli Continuous Data Protection for Files product the option to back up information to its private cloud-based storage. The offering, due by the end of March, will incorporate technology that IBM acquired through its purchase of Digital Arsenal Solutions a year ago.
"Are things like Mozy cloud storage? Mozy is offering an advanced service -- this backup service -- which is a very good infrastructure service. There is a cloud storage facility that the application itself is leveraging," said Villars. "Mozy is a service built on top of cloud storage."
In his estimation, such a cloud service doesn't constitute cloud storage per se, when the storage is a byproduct of another service.
Mozy, for its part, doesn't classify itself as cloud storage either. "It's not storage as a service; it's backup as a service," said Devin Knighton, Decho's PR director.
EMC's real cloud storage play is infrastructure, or what it calls "cloud-optimized storage." Its new Atmos product lets corporate IT departments and large-scale Internet or telecommunications service providers build their own cloud infrastructure.
Jon Martin, director of product management in EMC's cloud infrastructure group, said cloud-optimized storage is characterized by massive scale -- to petabytes and beyond -- policy-based management and operational efficiency.
"One of the fundamental things we went after is this idea that storage technologies are no longer confined and consumed within a single data center," Martin said, noting that the Atmos-based infrastructure is designed to have footprints around the world.
Other vendors in the cloud infrastructure space include Bycast, Cleversafe, Ibrix and ParaScale.
Such offerings may hold special appeal for corporations that want to take advantage of cloud technology yet have concerns about security. A company has the option to build the cloud storage infrastructure and leave it behind the firewall in a private cloud.
Terri McClure, an analyst at Enterprise Strategy Group, predicts that private clouds will spring up, the same way networks evolved from a small token ring inside a company to virtual private networks (VPNs) and IP networks. "It's going to take a long time for people to get comfortable to just throw everything out there into the ether" in a public cloud, she said.
Yet analysts expect cloud storage to grow during the next couple of years.
"The big thing this year will be the entrance of traditional IT suppliers, the likes of the IBMs, the [Hewlett-Packards] HPs, perhaps the Microsofts. They will become more explicit about their offerings and how they're going to integrate them into their other product portfolios," wrote IDC's Villars in an e-mail.
Published reports also continue to stoke discussion on a possible new entry from Google. But it won't just be major vendors capitalizing on the hot trend.
Vaultscape launched a storage cloud specifically designed to handle archival storage for enterprise companies that have large amounts of data. The company's veteran team includes founder and CTO Chris Williams, who was formerly with Nirvanix, and scientific advisor Dr. Aloke Guha, who founded Copan Systems Inc. and developed its file system and architecture.
Vaultscape's differentiators include a self-healing file system and "multi-master architecture," with two independent vaults that are available at all times. Data is replicated to two different geographic locations, but the replication does not pass through a single core, so there is no single point of failure, according to Michael Witz, the company's co-founder and director.
"What Vaultscape is showing is how the industry is maturing so there are storage clouds focused on specific needs within the large storage world," Witz said. "It's not a one-size-fits-all world."
Storage Switzerland's Crump predicts the next two years will be good for cloud storage vendors. The challenge for potential customers will be sifting through the many choices that will flood the market.
"Cloud storage should be one of those technologies that benefits from a down economy," Crump said. "Buying your storage a gigabyte at a time as opposed to a complete system may make a lot of sense for companies with very tight budgets."