
On July 19, 2010, Rackspace led the announcement of OpenStack, with the goal of creating an open source cloud software solution for use on industry-standard hardware. The initial releases cover both cloud compute and object storage. Although these are the first two releases, they are separate offerings.
Remember, cloud storage is not just the storage target for cloud computing. It is one potential storage target for cloud computing, and it is also, in and of itself, a stand-alone cloud offering of programmable storage.
Now, I have purposely used a term from the clothing industry, "off the rack", to introduce a framework for evaluating the opportunities this may present. With dress shirts, you can buy off the rack, semi custom, or custom, each with a unique value proposition based on fit, choice and cost. Interestingly enough, this may be a good lens through which to consider the possibilities of OpenStack, and in particular, OpenStack Object Storage.
Rackspace has made no secret of its motivations for leading this initiative, or of its desire to make "fanatical" service, rather than the underlying technology, its key differentiator. Fair enough, and so the question becomes: is the rapidly emerging and immature cloud marketplace already "mature" enough to seek homeostasis? (Homeostasis is the property of a system, either open or closed, that regulates its internal environment and tends to maintain a stable, constant condition.) Have enough models and innovations, from startups, academia, open source movements and large tech companies, been tested in the marketplace that we can already race to the common denominator? Perhaps now is a good time to start, as long as you are willing to acknowledge that the desired results are a long way off.
Before we turn to "off the rack" software, a quick look back at open source is helpful. For more reading on the open source software industry, a good introduction is The Cathedral and the Bazaar. Six things are particularly interesting:
- An open source alternative can emerge as a follow-on to a successful commercial technology and can become more pervasive than the commercial offerings it succeeded (Linux versus UNIX is the reference case here).
- The same approach can also produce a big success, although in more of a niche than as a pervasive replacement for the earlier commercial offerings (MySQL versus Oracle, IBM and Microsoft in the relational database space).
- An open source effort can also emerge earlier in a technology cycle and come of age as a pervasive solution (Apache Web Server comes to mind here).
- Open source generally requires very careful cultivation of the community of developers, with active interest from academia (and partnering with NASA is part of the formula here). Commercially sponsored open source efforts are becoming more common, although they have not yet proven to be the typical "breeding ground" for great open source successes. Eucalyptus, with its roots at the University of California, Santa Barbara, seems to have followed a more traditional route.
- Open source does not necessarily translate into rapid commercial success. Eucalyptus is obviously beginning to maneuver toward a repeat of the commercialization model. OpenStack is taking the approach most favored by other open source successes like Apache. A couple of good reads here are this article from BusinessWeek and this one. See also Derrick Harris' post over at GigaOm.
- There are also hundreds of thousands of open source projects that had mixed success or languished altogether. A quick look at SourceForge (an open source project hosting site) shows nearly a quarter million hosted projects. How many of these have languished or had little impact on the market?
So, the first issue is that, for some time to come, there will be a real question as to the adoption potential of OpenStack. I believe that adoption is driven by applicability to need. In a moment we will address a serious issue that OpenStack Object Storage must overcome to be successful; at worst, it will confine the project to a niche market. My views are very much directed at the Object Storage offering rather than the compute offering, which I believe exists in a different space and as a different type of solution. With this backdrop, let's have a look at the cloud storage marketplace today, using the analogy of off the rack, semi custom and custom:
- Off the Rack: implement as is, one size fits all, each with unique approaches for performance, scalability, bit integrity, may or may not provide geo services.
- Semi Custom: Select from storage types (DAS, SAN, NAS, JBOD), shared or distributed file systems and object systems, mix and match storage for different SLA and cost/usage patterns on the same infrastructure, multiple APIs, metadata and catalog abstracted from the storage layer, geo services.
- Custom: Generally a service only offering and not available as deployable infrastructure, specifics will vary widely based on service provider offering strategy.
| Infrastructure | Type | Comments |
|---|---|---|
| Eucalyptus | Off the Rack | Limited S3 APIs |
| OpenStack | Off the Rack | CloudFiles APIs |
| Scality | Off the Rack | S3 APIs |
| Mezeo | Semi Custom | Mezeo Cloud Storage Platform API and Interoperability API |
| NetApp | Off the Rack | Bycast APIs, NetApp storage |
| EMC Atmos | Off the Rack | Atmos ReST APIs, EMC storage |

| Service | Type | Comments |
|---|---|---|
| Amazon S3 | Custom | S3 APIs |
| Microsoft Azure | Custom | Windows centric |
| Rackspace | Off the Rack | Is the basis for OpenStack |
| Nirvanix | Custom | SOAP APIs, multi node |
| Google | Custom | Offers S3 APIs |
| AT&T Synaptic | Off the Rack | Based on EMC Atmos |
| OpSource, SoftLayer, Layered Tech and others | Custom | Based on Mezeo |
As you can see from the summary above, there exist as many views of what constitutes either a cloud storage service or a desirable cloud storage deployable infrastructure as there are service providers and vendors. Note that a semi custom infrastructure results in a "custom" service as implemented. "Off the rack" results in very similar services by those who utilize the same infrastructure unless they make their own major additions. Any offering can be differentiated by service, and the degree and quality of service is critical to customer satisfaction and plays a strong role in value creation.
The OpenStack announcement, as it regards Object Storage, seems to view cloud storage infrastructure as highly akin to an operating system (or at least a "hypervisor"), more like a choice between Linux and Windows than an application or middleware layer. While I agree that cloud compute is very close to this model, cloud storage is a service-oriented architecture, with programmability for new applications that can tolerate Internet latency because they reach storage through Web services (like ReST APIs).
The industry constantly overlooks this key point because it is consumed with the low cost, pay for use and thin provisioning capabilities of this storage tier. Yet solutions for thin provisioning and low cost have been available far longer than cloud storage, and pay for use is more of a business decision than a technology.
In the earliest days of cloud storage, there was confusion that cloud storage was defined only by cost, scalability, pay for use, and thin provisioning, and not by programmable access (usually via ReST APIs). ParaScale paid a huge price for not understanding that cloud storage requires Web services (like ReST API) access. Now, with OpenStack Object Storage, we see a repeat of this same perspective, albeit with basic APIs for Put, Get and List.
Yes, it provides Internet access via ReST APIs, but the focus continues to be primarily on cost rather than on enabling new applications. It could be argued that the open source approach will allow the appropriate "advanced services" to be added over time. However, even NASA's use of the platform is focused more on the cost of storage than on advanced functionality, because NASA stores much more data than almost any institution or enterprise in the world.
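To make the Put, Get and List point concrete, here is a minimal Python sketch of how those three operations map onto plain HTTP requests. The URL layout (`/v1/{account}/{container}/{object}`) and the `X-Auth-Token` header are modeled loosely on the OpenStack Object Storage API and should be treated as assumptions; the endpoint, account and token are hypothetical placeholders. The code only builds the requests; passing one to `urllib.request.urlopen` would send it.

```python
from urllib.parse import quote
from urllib.request import Request

# Hypothetical endpoint and account (assumption); substitute your provider's values.
BASE_URL = "https://storage.example.com/v1/AUTH_demo"


def object_url(container: str, name: str = "") -> str:
    """Build a container URL, or an object URL when a name is given."""
    url = f"{BASE_URL}/{quote(container, safe='')}"
    if name:
        # Keep "/" so pseudo-directory names like "2010/july/a.jpg" stay readable.
        url += "/" + quote(name)
    return url


def put_request(container: str, name: str, data: bytes, token: str) -> Request:
    """Put: upload an object's bytes with an HTTP PUT."""
    return Request(object_url(container, name), data=data, method="PUT",
                   headers={"X-Auth-Token": token,
                            "Content-Type": "application/octet-stream"})


def get_request(container: str, name: str, token: str) -> Request:
    """Get: download a single object with an HTTP GET."""
    return Request(object_url(container, name), method="GET",
                   headers={"X-Auth-Token": token})


def list_request(container: str, token: str) -> Request:
    """List: a GET on the container itself asks for the names it holds."""
    return Request(object_url(container), method="GET",
                   headers={"X-Auth-Token": token})
```

The sketch shows how thin the basic contract is: three verbs over URLs. The advanced services discussed in this post would have to sit on top of it.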
I think Savio Rodrigues states this view very well in his post:
"Select products based on business needs, not license alone: It's also interesting to note that very few enterprises are in NASA's position with regards to size of IT investment and skills in-house. While NASA engineers were ready and willing to contribute new features into the Eucalyptus open source community, few companies have the skills or governance to consider allowing their developers to contribute to open source projects. Summary trend number 7 from the 2010 Eclipse survey results highlighted this issue.
To suggest that NASA's buying or IT decision making patterns represents much more than the top 1 percent of IT buyers would be a stretch.
The overwhelming majority of enterprises would rather pay a vendor to deliver, maintain, support and enhance their private cloud software infrastructure than place that burden on internal IT staff. Whether the enterprise is paying for a closed source commercial product, a commercial product based on an open core product, or a subscription to an open source product, the product selection decision will be made based on business requirements much broader than 'is the product open source or not?'"
Keep in mind that cloud storage is a stand-alone service associated with application delivery over the Internet, as well as with low-cost, pay-for-use, scalable storage resources. Social media and many other Web-based applications exploit these capabilities, for example by publishing a file to a URL or by tagging files extensively.
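As an illustration of the kind of programmability meant here, the following is a hypothetical in-memory sketch, not any vendor's API, of tagging objects with metadata, finding them by tag, and publishing one to a URL. The `TaggingStore` class and the `cdn.example.com` base URL are invented for the example.

```python
from dataclasses import dataclass, field
from urllib.parse import quote


@dataclass
class StoredObject:
    data: bytes
    tags: dict[str, str] = field(default_factory=dict)
    public: bool = False  # set once the object is published to a URL


class TaggingStore:
    """Hypothetical in-memory store: tag objects, query by tag, publish to a URL."""

    def __init__(self, public_base: str) -> None:
        self.public_base = public_base.rstrip("/")
        self._objects: dict[str, StoredObject] = {}

    def put(self, name: str, data: bytes, **tags: str) -> None:
        # Store the bytes together with any initial tags.
        self._objects[name] = StoredObject(data, dict(tags))

    def tag(self, name: str, **tags: str) -> None:
        # Add or overwrite tags on an existing object.
        self._objects[name].tags.update(tags)

    def find(self, **tags: str) -> list[str]:
        # Names of objects whose tags match every requested key/value pair.
        return sorted(
            name for name, obj in self._objects.items()
            if all(obj.tags.get(k) == v for k, v in tags.items())
        )

    def publish(self, name: str) -> str:
        # Mark the object public and return the URL at which it could be shared.
        self._objects[name].public = True
        return f"{self.public_base}/{quote(name)}"
```

None of this is hard on its own; the point is that applications come to depend on such services, not just on cheap capacity.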
Viewing cloud storage as nothing more than a cost- and volume-based resource ignores its extraordinary importance as a service-oriented architecture for new application enablement. I believe both views are equally important and need to be equally served. Will OpenStack, with its pervasive cost focus, be able to steer its community toward contributing the advanced services that cloud storage also needs?
Lydia Leong of Gartner Group provides an interesting view of the open source community issues associated with this in her post:
"At the same time, open sourcing is not necessarily a way to software success. Rackspace has a whole host of new challenges that it will have to meet. First, it must ensure that the roadmap of the new project aligns sufficiently with its own needs, since it has decided that it will use the project's public codebase for its own service. Second, it now has to manage and just as importantly, lead, an open-source community, getting useful commits from outside contributors and managing the commit process. (Rackspace and NASA have formed a board for governance of the project, on which they have multiple seats but are in the minority.) Third, as with all such things, there are potential code-quality issues, the impact of which become significantly magnified when running operations at massive scale."
One last comment on this business of vendor lock-in and cloud storage APIs (another focus of the OpenStack announcement). I would submit that while a specific set of APIs has the potential to create vendor lock-in, this is a much smaller problem than in other technologies. If you are really worried about it, you have probably never actually written a ReST API call. Such calls can be written in many languages, and we have seen cases where applications that run on S3 run unchanged on Mezeo. Others need very minor modifications, and still others are excited to take advantage of some of the unique Mezeo services. It just is not a real problem; the concern owes much more to FUD (fear, uncertainty and doubt) and marketing zealotry than to technological reality. The APIs of choice will shake out, and it is far too early to say whether it will be S3, OpenStack, CDMI or a combination of all of these, and others as yet unforeseen.
At Mezeo, we have never believed there will be one winner, and have instead focused on an architecture that enables easy and effective delivery of whichever APIs stand the test of time. The Mezeo Cloud Storage Platform API enables advanced services and programmatic access to Mezeo-enabled storage clouds. The Mezeo Interoperability API enables seamless interoperability of applications developed for Amazon S3, Google and Eucalyptus based storage clouds.
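One way to picture an architecture that can deliver "whichever APIs stand the test of time" is a single storage core with thin per-API front ends. The sketch below is a hypothetical illustration of that idea, not Mezeo's implementation: two simplified parsers, one for an S3-like path-style layout (`/bucket/key`) and one for a Swift-like layout (`/v1/account/container/object`), translate requests into the same core operations. Real APIs also differ in authentication, headers and response formats, all of which this ignores.

```python
from typing import Optional

# (operation, container, key); key is None for container-level requests
Parsed = tuple[str, str, Optional[str]]


def _operation(method: str, key: Optional[str]) -> str:
    # Map an HTTP verb, plus whether an object key is present, onto a core operation.
    if method == "PUT" and key:
        return "put"
    if method == "GET" and key:
        return "get"
    if method == "GET":
        return "list"
    raise ValueError(f"unsupported request: {method}")


def parse_s3_style(method: str, path: str) -> Parsed:
    # Simplified S3-like path-style layout: /bucket or /bucket/key
    parts = path.lstrip("/").split("/", 1)
    key = parts[1] if len(parts) > 1 and parts[1] else None
    return _operation(method, key), parts[0], key


def parse_swift_style(method: str, path: str) -> Parsed:
    # Simplified Swift-like layout: /v1/account/container[/object]
    parts = path.lstrip("/").split("/", 3)
    if len(parts) < 3 or parts[0] != "v1":
        raise ValueError(f"unrecognized path: {path}")
    key = parts[3] if len(parts) > 3 and parts[3] else None
    return _operation(method, key), parts[2], key


class CoreStore:
    """One storage core shared by every API front end (in memory, for illustration)."""

    def __init__(self) -> None:
        self._data: dict[tuple[str, str], bytes] = {}

    def handle(self, parsed: Parsed, body: bytes = b"") -> bytes:
        op, container, key = parsed
        if op == "put":
            self._data[(container, key)] = body
            return b""
        if op == "get":
            return self._data[(container, key)]
        # op == "list": newline-separated object names in the container
        names = sorted(k for (c, k) in self._data if c == container)
        return "\n".join(names).encode()
```

Because both front ends resolve to the same `(operation, container, key)` triple, an object written through one dialect is readable through the other, which is the practical sense in which API lock-in is a smaller problem than it first appears.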
What seems to be missing from this discussion is that competition among service providers already drives down the price of cloud storage, so a commoditized stack embraced by most is unlikely to yield extraordinary incremental savings. At the same time, while the competitive market conspires to drive cloud storage costs ever lower, the need to differentiate, and to deliver solutions and programmable storage that enable many new and exciting types of applications, will rapidly replace the pure cost and scale focus of current cloud storage offerings.
Sometimes, the "new" application is simply an existing one enabled in the cloud, producing the same result at a lower cost! Making this easy and productive requires significant cloud storage functionality. Amazon continues to prove this with its many additions and capabilities that differentiate its service, and Mezeo sees much the same view on the part of our customers. The focus is on what cloud storage can do: what problems it will solve, what business opportunities it creates and what new applications it can enable, all assuming it will be competitively priced.
Cloud storage represents significant opportunities for institutions, the enterprise (see my recent post on the business case for enterprise cloud storage) and for the IT service provider. Cloud storage is substantially different from cloud compute, and requires that you understand this difference in order to effectively evaluate the impact of this announcement, as well as your next steps.