
May 12, 2008

Smoking the cloud: the technology of cloud computing

Everybody is getting high on cloud computing. Clouds are all over the place. There is only one catch: it leaves everybody's mind hazy. For instance, Gordon Haff from c|net writes:

Software as a Service (SaaS), Hardware as a Service (HaaS), Data as a Service (DaaS), and Web 2.0 are all part of the cloud. Even hosting providers are a sort of specialized, narrow case.

Sure: SaaS, HaaS (go Bears), DaaS... and I'm sure MyaaS is part of the cloud too.

The common thread [with regard to cloud computing] is that the Network functions as a sort of abstraction layer and allows access mostly through “Web-y” protocols, languages, and standards like HTTP, RSS, XML, Javascript, and REST

Why not? And I'm sure Web 2.0 is about having boxes with rounded corners, which are easily drawn through an adequate abstraction layer.

Let's turn to Wikipedia, which barely does a better job:

The term derives from the fact that most technology architecture diagrams depict the Internet or IP availability by using a drawing of a cloud

By the same token, one has to wonder why everybody is not calling database technology Cylinder computing... Wikipedia goes on:

The architecture behind cloud computing is a massive network of "cloud servers" [...]

Excellent. Did you know of Bozo computing? This is a massive network of "bozo servers"!

OK, let's clean it up. For the most part, the confusion stems from four sources:

  • Amazon: they popularized the term, so everybody started to equate their CPU and storage utility computing business models with cloud computing. Good marketing job.
  • Marketing people: they have made words like Utility, Grid, Mesh or even SaaS fashionable, and trivialized them, in order to re-launch over and over again this notion of on-demand computing so dear to Gartner. When an analyst's prediction falls through, you can always wait for the next hype cycle by inventing a new name.
  • Virtualization: VMware discovered that manipulating runtime images (nicely reminiscent of LISP) would allow running applications to be transferred dynamically from one server to another, and as such would be an incredible asset for on-demand provisioning and incremental hyper-scalability. Hence the strong association with on-demand and utility computing.
  • Supercomputing: not only because CPU performance gets compared (silly, since the coupling between supercomputer nodes is most often much finer-grained than the loose coupling between the nodes of a cloud), but also because supercomputing nodes are often linked according to specific network topologies described with the same vocabulary: grid, cube, etc.

So what is cloud computing, really? Let's go back to the defining sources. Originally (short of a few earlier, obscure examples) they were:

  • SETI@home (and its modern avatar BOINC), which defined popular grid computing. Remember the times when SETI was tracking aliens from your computer's screen saver?
  • Amazon S3: Amazon's CEO Jeff Bezos understood UPS's business so well that he decided to transpose it to Amazon's IT infrastructure. It's not about selling books or delivering packages; in both cases it's about leveraging an extraordinary logistics infrastructure.

Defining technologies and concepts:

  • P2P services, which pioneered key technologies such as highly scalable and redundant lookup services and massively distributed hash tables, to name a few. (1)
  • Distributed AI, which pioneered key concepts like Lisp-style map/reduce, actors and agents.

To be fair, Google's and Amazon's contributions in terms of technology are also substantial.

Like any new paradigm, cloud computing represents a shift. In this case, it is best described by the addition of a new layer we could call a Cloud Operating System.


At its core, an operating system is really a task/process manager, a memory manager and an I/O manager. Similarly, a Cloud Operating System (COS) defines how tasks/applications are managed, how memory/storage is organized, and the mechanisms by which massive information flow is handled. A COS is a network operating system running atop a cloud, that is, a hyper network of computers.


Network

  • Massive distribution: often more than 100,000 nodes, maybe a million or more at Google.
  • Hyper reliability: tens of nodes can go up and down at any time without much disturbing the applications. (3)

 Information flow management

  • Semi-autonomy & near-P2P coupling: since the overhead of a completely centralized architecture over so many nodes would prevent any kind of scalability or reliability, cloud computing relies heavily on neighborhood algorithms, where issues like discovery, monitoring, redundancy and hot swapping of tasks are managed and decided locally within a dynamic cluster of topological neighbors.
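To make "decided locally" a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the ring layout, the `neighbors` and `reassign` helpers, and the tie-break rule are hypothetical and do not describe how any real cloud does it. The point is only that a dead node's tasks can be picked up by looking at its immediate neighbors, without consulting a central coordinator.

```python
def neighbors(node: int, total: int, k: int = 2) -> list:
    """Return the k nodes on each side of `node` on a ring of `total` nodes."""
    return [(node + d) % total for d in range(-k, k + 1) if d != 0]


def ring_distance(a: int, b: int, total: int) -> int:
    """Shortest hop count between two nodes on the ring."""
    return min((a - b) % total, (b - a) % total)


def reassign(tasks: dict, alive: set, total: int, k: int = 2) -> dict:
    """Move tasks off dead nodes onto a live neighbor, using local information only."""
    result = {}
    for task, node in tasks.items():
        if node in alive:
            result[task] = node  # nothing to do, the node is still up
            continue
        candidates = [n for n in neighbors(node, total, k) if n in alive]
        if candidates:
            # Hypothetical tie-break: nearest neighbor first, then lowest id.
            result[task] = min(
                candidates, key=lambda n: (ring_distance(n, node, total), n)
            )
        else:
            result[task] = None  # the whole neighborhood is down
    return result


# Ten nodes on a ring; nodes 3 and 4 have just died.
tasks = {"a": 3, "b": 4, "c": 7}
alive = set(range(10)) - {3, 4}
placement = reassign(tasks, alive, total=10)
```

In this toy run, task "a" moves to node 2 and task "b" to node 5, while "c" stays where it was. No node had to know about more than its four nearest neighbors.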

Memory and Storage management

  • Distributed hash tables: in such an environment you cannot directly use a classical database system. Databases, even ones as "simple" as an efficient hash table (e.g. Berkeley DB), would not scale well enough, since the cost of duplication/synchronization would quickly become higher than the cost of storage/retrieval (see Google BigTable, Amazon Dynamo).

Note that, depending on the granularity of the system, a classical DB can be associated with a node or with a small cluster of nodes.
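For readers who have never seen one, here is a toy consistent-hash ring in Python, one common way to spread keys over nodes in a distributed hash table. It is a sketch under assumptions, not the design of BigTable or Dynamo: the `HashRing` class, the node names and the replica count of 3 are all made up for the example.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a point on the ring (a large integer)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring: a key belongs to the next nodes clockwise."""

    def __init__(self, nodes, replicas=3):
        self.replicas = replicas  # number of copies kept of every key
        self._points = sorted((_hash(n), n) for n in nodes)

    def nodes_for(self, key: str) -> list:
        """Return the distinct nodes holding `key`, primary first."""
        hashes = [h for h, _ in self._points]
        start = bisect.bisect(hashes, _hash(key)) % len(self._points)
        count = min(self.replicas, len(self._points))
        return [
            self._points[(start + i) % len(self._points)][1]
            for i in range(count)
        ]

    def remove(self, node: str) -> None:
        """Drop a node from the ring, e.g. after it has failed."""
        self._points = [(h, n) for h, n in self._points if n != node]


ring = HashRing([f"node-{i}" for i in range(8)])
owners = ring.nodes_for("photo-123.jpg")     # three nodes hold the key
ring.remove(owners[0])                       # the primary dies
survivors = ring.nodes_for("photo-123.jpg")  # old replicas move up one rank
```

When the primary disappears, the two remaining replicas become the first two owners and only one new copy has to be made. That is the property that keeps the duplication/synchronization cost bounded as nodes come and go.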

CPU and Load management

  • Hardware transparency: there is no notion of hardware (or guarantee of its permanence) on the application side. Computing units are usually referred to as nodes.
  • CPU distribution: an application-level-only distribution (often achieved through virtualization alone), like that of Amazon EC2 or Sun's network.com, is not equivalent to a Google cloud, where applications themselves can be explicitly parallelized through special primitives like map/reduce.
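As a rough illustration of what "explicitly parallelized through map/reduce" means, here is a word count in Python. It is only a sketch: the function names are invented for the example, and a local thread pool stands in for cloud nodes. On a real cloud, each map call would be shipped to a different machine.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_count(chunk: str) -> Counter:
    """Map step: count words in one chunk, independently of all the others."""
    return Counter(chunk.lower().split())


def reduce_counts(left: Counter, right: Counter) -> Counter:
    """Reduce step: merge two partial results into one."""
    return left + right


def word_count(chunks) -> Counter:
    # Assumption: threads play the role of nodes in this toy version.
    with ThreadPoolExecutor(max_workers=4) as pool:
        partials = list(pool.map(map_count, chunks))
    return reduce(reduce_counts, partials, Counter())


chunks = [
    "the cloud is a network",
    "the grid is a network too",
    "the cloud scales",
]
totals = word_count(chunks)
```

The application is written in terms of the two primitives, so the platform is free to decide where each chunk gets processed. That is the difference from merely moving a whole virtual machine around.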
       

Examples of Cloud Computing infrastructure


Example of Grid/utility computing

Possible Definitions

A computing cloud is a massively distributed network operating system that lets applications be built on an abstract layer implementing computing, storage and information flow management (key technology: Cloud Operating System). In a computing cloud, control is largely semi-autonomous and near-peer coupled.

A computing grid is a highly distributed network of computing resources providing applications with transparent duplication (key technology: Virtualization). In a computing grid, control is largely centralized.

Notes

  • On-demand business models like SaaS can be implemented on a cloud, on a grid or just the classical way.
  • Companies can implement both a grid and a partial cloud (e.g. Amazon).
  • Cloud computing can be further enhanced by using virtualization as well.

     

This article is cited or quoted by:  Virtual Strategies

(1) Note the parallel between P2P, where nodes and super nodes are always going up and down as users power their computers on and off, and an array of hundreds of thousands of computers where (by virtue of their sheer numbers) machines are always breaking down or coming back up.

(2) One could also imagine an Amazon S3 distributed among clusters of end users. It would be cool to store those pictures of the dear little ones redundantly. One could sync pictures directly (from phones or from desktops) into a general or private (your friends/family) cloud. No more fear of disaster: a hard drive that dies is often the equivalent of many drawers of photographs burning. The paying version could even overflow onto Amazon S3 itself. Maybe Fotonauts will bring us that. Would be cool.

(3) Imagine a server whose reliability is 99.98%. Pretty high... But if you want a hyper network of cheaper servers, maybe you'll only require a still-optimistic 99.9% reliability. If you have 600,000 such servers, it means that at any given time you have 600 servers down (best case) or 6,000 servers down (worst case, with only 99.0% reliability). And remember, if you want to replace them, you've got to find them!
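The arithmetic in note (3) is easy to check. The small Python helper below is hypothetical and named only for this example. It treats "reliability" as the fraction of time a server is up, and multiplies the number of servers by the fraction of time each one is down.

```python
def expected_down(servers: int, reliability: float) -> int:
    """Expected number of servers down at any given moment, rounded."""
    return round(servers * (1.0 - reliability))


best_case = expected_down(600_000, 0.999)   # 99.9% reliability
worst_case = expected_down(600_000, 0.990)  # 99.0% reliability
```

This gives 600 and 6,000 servers down respectively, matching the figures in the note.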

 
