TIBCO is pleased to announce that we’re making code available that connects TIBCO ActiveSpaces to Apache Hadoop MapReduce, Apache Pig, and Apache Hive. Hadoop and ActiveSpaces both support data-intensive distributed applications, and so they’re the perfect match!
Hadoop’s HDFS™ is designed to hold files of petabyte size, but the architectural tradeoff is that it is built for large sequential reads rather than fast random access – in practice, you end up scanning the whole file. Hadoop has other storage technologies, such as HBase™, that offer more random access, but nothing like the in-memory data grid that ActiveSpaces provides. With ActiveSpaces, you get random access to all of your data without ever having to hit a disk, and with peer-to-peer replication, keeping Big Data in memory is quite feasible.
The code that we’re making available comes in three parts. The first part integrates ActiveSpaces into the core MapReduce functionality, and provides an InputFormat and an OutputFormat for ActiveSpaces. Take the class that describes your Tuples, and have it implement the simple provided interface ASWritable. Now you can use MapReduce to run operations over all the Tuples in your space, and chain them together in all the standard MapReduce ways.
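To make the ASWritable step a little more concrete, here is a minimal sketch of what a tuple class might look like. The methods of ASWritable are not shown in this post, so the sketch assumes it resembles Hadoop's Writable contract (a `write(DataOutput)` and a `readFields(DataInput)` method) and uses a stand-in interface named `ASWritableLike`. That interface, the `OrderTuple` class, its fields, and the `roundTrip` helper are all hypothetical names chosen for illustration, not part of the released code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for ASWritable. ASSUMPTION: the real interface
// follows the same two-method shape as Hadoop's Writable.
interface ASWritableLike {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical tuple class; the fields are illustrative only.
class OrderTuple implements ASWritableLike {
    long orderId;
    String customer;
    double amount;

    // No-arg constructor so a framework can create an empty
    // instance and then populate it through readFields().
    OrderTuple() {}

    OrderTuple(long orderId, String customer, double amount) {
        this.orderId = orderId;
        this.customer = customer;
        this.amount = amount;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        // Fields are written in a fixed order...
        out.writeLong(orderId);
        out.writeUTF(customer);
        out.writeDouble(amount);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // ...and must be read back in exactly the same order.
        orderId = in.readLong();
        customer = in.readUTF();
        amount = in.readDouble();
    }

    // Serializes a tuple to bytes and reads it back into a new instance,
    // which is a quick way to check that write/readFields agree.
    static OrderTuple roundTrip(OrderTuple original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            original.write(new DataOutputStream(bytes));
            OrderTuple copy = new OrderTuple();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If the real interface does follow this shape, the main thing to get right is that `write` and `readFields` handle the fields in the same order; a round trip like the one in `roundTrip` is an easy way to verify that before running a full MapReduce job.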
If, on the other hand, you’d rather not write Java™ code and would prefer to script your data flows in Pig, the code supplies a LoadFunc and a StoreFunc that allow full interoperability between Pig and ActiveSpaces. Now the dataflows that you’ve designed in Pig can read from and write to ActiveSpaces, taking advantage of all ActiveSpaces has to offer.