Danny van der Rijn


Connecting ActiveSpaces and Hadoop

TIBCO is pleased to announce that we’re making code available that connects TIBCO ActiveSpaces to Apache Hadoop MapReduce, Apache Pig, and Apache Hive.  Hadoop and ActiveSpaces both support data-intensive distributed applications, and so they’re the perfect match!

Hadoop’s HDFS™ is designed to hold files of petabyte size, but the architectural tradeoff is that it is built for streaming reads rather than low-latency random access to a file, so in practice you read the whole thing.  Other storage technologies in the Hadoop ecosystem, such as HBase™, offer more random access, but nothing like the in-memory data grid that ActiveSpaces provides.  With ActiveSpaces, you get random access to all of your data without ever having to hit a disk.  And with peer-to-peer replication, keeping Big Data in memory is quite feasible.

The code that we’re making available comes in three parts.  The first part integrates ActiveSpaces into the core MapReduce functionality, and provides an InputFormat and an OutputFormat for ActiveSpaces.  Take the class that describes your Tuples, and have it implement the simple provided interface, ASWritable.  Now you can use MapReduce to run operations over all the Tuples in your space, and chain them together in all the standard MapReduce ways.

If, on the other hand, you’d rather not write Java™ code and would prefer to script your data flows in Pig, the code supplies a LoadFunc and a StoreFunc that allow full interoperability between Pig and ActiveSpaces.  Now the data flows that you’ve designed in Pig can read from, and write to, ActiveSpaces, taking advantage of all that ActiveSpaces has to offer. [Read more...]

Big Data in Real Time

Big Data was first characterized in 2001 as having three Vs: Volume, Velocity and Variety.  Volume refers to the sheer size of the data that you need to work with, whether it’s Gigabytes, Terabytes or Petabytes.  Velocity is about the speed at which new data is generated, coming from more and faster streams of events.  Variety speaks to the many different ways that data is represented, whether it comes in different structures or has no structure at all.

To these three Vs, I like to add a fourth: Volatility.  When I talk about data Volatility, I’m talking less about the actual data, and more about what it represents.  Events occur every day in your enterprise that are digital representations of threats to, or opportunities for, your organization.  Perhaps the event represents the chance to help a customer in your store find – and buy – an item he’s looking for.  Or maybe the event is telling you that a cyber-thief is making off with your sensitive information.

In either case, the situation isn’t waiting around for you to respond to it; it’s on its own schedule.  If you aren’t ready to respond in the appropriate amount of time, then your customer – or the personal information he entrusted to you – has left your premises.

So what do you do about these four Vs?  It all boils down to just three words:  Understand.  Anticipate.  Act. [Read more...]