Up-Scale Your Apps with Distributed Caching

… was the subject of today’s Forrester-IBM webinar on distributed cache technology, with both Forrester and IBM citing CEP and EDA, among others, as users of this technology. The overriding driver for this tech is eXtreme Transaction Processing, which we might just refactor as eXtreme Event Processing for the purposes of this blog!

One minor quibble: John Rymer of Forrester did the introduction and in doing so classified the cache market into .NET, Java and NoSQL camps, with TIBCO placed in the Java camp. This might seem a fair classification of a complex market area, but consider the two relevant TIBCO distributed cache offerings:

  • TIBCO BusinessEvents, although Java-based, is more accurately described as a CEP product that embeds a distributed cache – it wouldn’t normally appear on a vendor list of distributed cache technologies;
  • TIBCO ActiveSpaces is more accurately described as a data grid, but has .NET, C and Java interfaces. I’m sure other caching / data grid products have similar multiple interfaces – after all, the client is just “an interface” to the cache / grid.

Spare a thought in passing, though, for the OODBMS guys. Amongst this buzz about data grids and caching, I notice the Forrester blog is reporting that the Progress guys (disclosure: a TIBCO competitor in some areas) are now considering their ObjectStore OODBMS a “legacy platform”. Plus ça change, perhaps.

Comments

  1. Merry says:

    Due to its wide range of benefits, distributed caching is becoming more and more popular in the developer community. It not only saves a lot of time but also helps to keep data scalable and available with 100% uptime. It is also a very useful tool because some market leaders in distributed caching, like NCache and MS Velocity, offer a wide range of topologies.

    • Paul Vincent says:

      Hi Merry – fully agree. Indeed this is almost part of the “NoSQL” movement as you don’t need queries to act against a cache (although you can do if you want). In some ways it is an evolution away from the client-server/DB-focused architectures that have been prevalent to now …

      It will be interesting to see whether the de facto inclusion of caching in app development tools (typified by TIBCO BusinessEvents) turns into a trend or remains rare. For sure the other CEP vendors have started offering options for using cache, but I am not sure it is widespread in other (eg SOA) development environments yet.

      Cheers

  2. I think it’s an over-simplification to lump both distributed data caches and distributed data grids under a single ‘distributed caching platform’ just because you can use both as a caching platform, and caching happens to be, at this point in time, the simplest and most common (but certainly not the only) use case for a data grid. But this is changing, and people are starting to use data grids to design extremely scalable distributed systems rather than simply trying to get a ‘performance boost’ out of the old DB-centric designs by using a cache (which IMHO only pushes the bottleneck a bit further without really solving the problem for the long term).

    Beyond the ‘caches can evict data at any time and therefore cannot be used as a System of Record’ argument Vincent mentions, compare the set of features offered by memcached, for example (the most popular distributed cache), with those of a proper data grid like ActiveSpaces and you will find a lot of differences. Besides distributed code execution over the grid, simple caches (just like DBs) have no notion of eventing: you can’t listen (subscribe) to changes in the data stored in a cache, and you certainly can’t create continuously updated queries. You also don’t have concurrent access control or transactions with a distributed cache. And you certainly don’t have the ability to ‘consume’ data (see for example my latest post on my blog on how ActiveSpaces lets you use a Space as a kind of ‘Distributed Queue’).

    In essence, a (good) data grid product is a form of middleware that can be used to easily create distributed applications, while a distributed cache is … well, just that. Categorizing them both as ‘distributed caching platform’ is IMHO too reductive. It’s a bit like putting motorcycles, cars, trucks and buses under a single ‘motorized vehicles’ banner.
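The eventing and ‘consume’ features described in this comment can be sketched in a few lines. This is only an in-process illustration under stated assumptions: `TinySpace`, `subscribe`, `put` and `take` are hypothetical names invented for the example and are not the ActiveSpaces API, and nothing here is distributed.

```python
from collections import OrderedDict
from typing import Any, Callable, List, Optional, Tuple

# A listener receives (event_name, key, value). Hypothetical signature.
Listener = Callable[[str, Any, Any], None]


class TinySpace:
    """Hypothetical single-process stand-in for a data-grid 'space'."""

    def __init__(self) -> None:
        self._entries = OrderedDict()          # preserves insertion order
        self._listeners: List[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        """Eventing: register a callback that fires on every change."""
        self._listeners.append(listener)

    def put(self, key: Any, value: Any) -> None:
        self._entries[key] = value
        self._notify("put", key, value)

    def get(self, key: Any) -> Optional[Any]:
        return self._entries.get(key)

    def take(self) -> Optional[Tuple[Any, Any]]:
        """Consume: read and remove the oldest entry in one step."""
        if not self._entries:
            return None
        key, value = self._entries.popitem(last=False)
        self._notify("take", key, value)
        return key, value

    def _notify(self, event: str, key: Any, value: Any) -> None:
        for listener in self._listeners:
            listener(event, key, value)


# Usage: one subscriber records changes, one consumer drains the space.
events = []
space = TinySpace()
space.subscribe(lambda event, key, value: events.append((event, key)))
space.put("order-1", {"qty": 5})
space.put("order-2", {"qty": 3})
first = space.take()   # removes and returns the oldest entry
```

A plain key-value cache would offer only `put` and `get` here; the subscription and the destructive `take` are the parts that make the space behave like a queue with observers.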

  3. Mike Gualtieri says:

    There is no difference between a data grid and distributed caching platforms. Many of the caching platforms also include distributed code execution. Is code execution in the cluster how you distinguish a data grid? We chose to abandon the data grid term because it is not descriptive of the primary use cases for this technology.

    • Paul Vincent says:

      Hi Mike, thanks for the explanation. For sure distributed caches, distributed tuple stores, and data grids all have overlapping use cases AND it makes perfect sense for analysts to cover them in a single market survey.

      But may I refer you to Jean-Noel Moyne’s technical discussion of why data grids are not distributed caches.
      In a nutshell:
      - caches can evict (lose) data. Data grids don’t.

      Consider also: caching is about efficient temporary storage of data (and event data in the case of CEP) without resorting to expensive persistence mechanisms (backing store in the case of TIBCO BusinessEvents). As soon as I want to explicitly retrieve data I am less in “(distributed) cache mode” and more in “data (grid) mode”. So one could, possibly, add an additional criterion:
      - caches don’t require explicit access operations by the host application. Data grids do.
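The two criteria above can be illustrated with a rough sketch. Everything in it is an assumption made for the example: `ReadThroughCache` and `GridStore` are hypothetical names, the cache uses a simple least-recently-used eviction policy, and a plain dict stands in for the backing store. It is not how TIBCO BusinessEvents or ActiveSpaces is implemented.

```python
from collections import OrderedDict
from typing import Any, Callable


class ReadThroughCache:
    """Bounded LRU cache in front of a backing store.

    The application never populates it explicitly: a miss is filled by
    the loader, and the oldest entry may be evicted at any time.
    """

    def __init__(self, loader: Callable[[Any], Any], capacity: int) -> None:
        self._loader = loader
        self._capacity = capacity
        self._entries = OrderedDict()

    def get(self, key: Any) -> Any:
        if key in self._entries:
            self._entries.move_to_end(key)     # mark as recently used
            return self._entries[key]
        value = self._loader(key)              # miss: go to backing store
        self._entries[key] = value
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict least recently used
        return value

    def __contains__(self, key: Any) -> bool:
        return key in self._entries


class GridStore:
    """Unbounded store: the application puts and gets explicitly,
    and nothing is ever evicted behind its back."""

    def __init__(self) -> None:
        self._entries = {}

    def put(self, key: Any, value: Any) -> None:
        self._entries[key] = value

    def get(self, key: Any) -> Any:
        return self._entries[key]

    def __contains__(self, key: Any) -> bool:
        return key in self._entries


# Usage: the cache silently drops "a"; the store keeps everything.
backing = {"a": 1, "b": 2, "c": 3}
cache = ReadThroughCache(backing.__getitem__, capacity=2)
for k in ("a", "b", "c"):
    cache.get(k)

grid = GridStore()
for k, v in backing.items():
    grid.put(k, v)
```

After the loop, `"a"` is no longer held by the cache, although `cache.get("a")` still returns the right value by reloading it from the backing store. The store, by contrast, is the only copy of its data, which is why it cannot be allowed to evict.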

      In terms of distributed code execution, I’m afraid that seems to be a bit of a red herring IMHO. Distributed code + distributed data implies an “application platform” like TIBCO BusinessEvents is for event processing. For sure, certain grid tools may have evolved to allow the distribution of business logic too, but in so doing they become less “distributed cache and data grid technologies” and more distributed app platforms. Then again, if their primary use case is still fast distributed data for other applications then it would still make sense to cover them under this topic…

      I notice Jean-Noel actually prefers the term “data and messaging grid” rather than data grid: inserting data into the data grid can have the same effect as posting a message on a network, it’s just that the “message” is persisted. This makes sense when you think about it… and we all remember the marketing cry “the network is the computer” (now, who WAS that? :) ).

      Cheers
