CEP and the N+Tier Architecture

Work on the EPTS Reference Architecture is moving forward, with excellent contributions from the event processing industry and academia (EPTS members can observe the work-in-progress on the EPTS working group wiki). One interesting angle of discussion revolves around the compatibility and overlap of “event processing” with conventional “data processing” technologies, especially when it is possible to view “events” as “data” and process them as such. Of course, critics of CEP will counter that all event processing tools simply re-use existing data processing techniques (such as extended versions of SQL engines, or inference engines, or neural nets etc…), and to some extent they are right – what’s new is the idea of re-arranging these techniques to exploit higher-performance CPUs and larger amounts of available memory for in-situ event pattern detection in everyday business problems.

Defining a generalized event processing (and complex event processing) architecture that covers all use cases, even of a single (albeit ambidextrous) tool like TIBCO BusinessEvents, is rather challenging. There are a number of different architecture patterns possible, covering different functional and performance requirements. One generalized effort is included here, based on TIBCO’s work for customers in this area:

TierN+ CEP small

The various tiers can be described as:

Tier 0 (not displayed):  The sources of events could be anything from SCADA control systems, RFID detectors, BPM or ERP systems, or whatever.

Tier 1: Event channel (or transmission medium). In TIBCO’s world this is usually EMS (topic-based or queue-based) or RV message buses, although it could also easily be some other middleware mechanism.

Tier 2: Event PreProcessor. Optional. Handles basic event filtering, and roles such as concept enrichment (when the event is just adding information to an existing event object), concept creation (the event is transformed into an event object, or concept in BusinessEvents), and forwarding to local and/or distributed memory (a.k.a. event store, for processing elsewhere).

Tier 3: Distribution Agent(s) or Event Store. Optional – used for high-performance, high volume, fault-tolerence and large-scale CEP. Provides storage for events (and event objects) required for the event patterns representing complex (aggregate, abstract, derived) events – where these event patterns may occur over periods of time. Needed (in conjunction with Tier 4…n) where the volume of events, or amount of event processing, exceeds the limits of a single process node’s memory or CPU limits. Used also when information needs to be replicated across nodes to support fault tolerance and failovers. For performance reasons this is often some kind of high-performance cache or data grid (as is the case of BusinessEvents’ cache component), but it could also be a common-or-garden file-oriented database… in TIBCO BusinessEvents we support both approaches.

Tier 4…n: Event Processing Agent(s). In the simplest case, a single EP Agent can attach to a channel and process events (2-tier). Or it can involve a Preprocessor either in-process or as a separate agent (2.5-tier). Or be used with the cache (4-tier). Ultimately these can be organised as a hierarchy of EP services, with these services themselves constituting multiple, load-balanced, identical agents. These EP Agents define event patterns (through rule conditions, state transitions, and queries) and communicate either directly to event channels [Tier 1] or, more often, through the event store [Tier 3]. Large workloads can be distributed through queries (i.e. pull model) or through shared event / event object updates (i.e. push model). In BusinessEvents, EP Agents can be one of 2 base types: inference agents (for rules and states), and query agents.

Tier n+1: Services. Optional. Existing SOA services may also contribute data to the event processing agents, either as-required or cached in the (event) store to be used and updated as needed. Typically service invocation (which could also be a BPM workflow invocation)  involves a call out to some mechanism like TIBCO BusinessWorks, and possible some TIBCO Adapter.

Tier n+2: Persistence.  Optional. When the event store [Tier 3] is using high-performance cache there is a need to provide back-up persistence for tasks like archiving, start-up load and shut-down un-load, and standard analytics. You might also have operational data stores you need to query directly (rather than through the services layer [Tier n+1]) to enrich event objects for processing. If you want to visualise all your events over time to identify new patterns you would typically use something like TIBCO Spotfire against this layer, and perhaps S+ Miner to extract some rules to update some Tier 4…n agent with…

CEP and Complexity Theory

Jim Sinur at Gartner has posted his views on CEP on his blog. Jim comments on “event discovery” (a.k.a. detection?) via rules, patterns (maybe meaning streaming or continuous queries), process (maybe meaning process exceptions or manual detections), algorithms, and “context”. For the latter he proposes a connection between CEP and “Complexity Theory” which according to Wikipedia is either Chaos Theory or maybe Complex Adaptive Systems (“complex, self-similar collection of interacting adaptive agents … focuses on complex, emergent and macroscopic properties of the system”). That one is for the lab, methinks.

The Model2Agent Approach to configuring distributed systems

One of the interesting requirements for XTP/XEP (as mentioned in the previous blog posting) is that such performance (and the need for fault tolerance / failover) requires co-operation across multiple systems, ideally generalised in a multi-agent architecture [*1]. The problem here is usually in re-factoring to change performance characteristics as the project develops, to optimize the architecture to the performance needs, available compute nodes, and network (as well as changes to event throughputs).

One solution is to take the agent-based approach, where agents can be re-configured (through model parameters) very simply. Agents can:

  • execute declarative production rules “as required”, removing most of the need to specify / re-engineer (/ debug etc) “entry” and “exit” points;
  • use queries to partition data (/event objects) and process loads to separate agents;
  • rely on shared distributed cache to provide a “co-operative model” of events and data for processing.

The re-factoring / re-architecting process is, to some extent, as simple as specifying which rulesets are assigned to which agents, and defining more or fewer agents as required:
a. Revise named agents (aka agent types and roles)
b. Re-allocate rulesets (or queries) to appropriate agents
c.  Deploy (or re-deploy, with appropriate care) in numbers required.

So the agent modeling aspect is to map the event processing elements first to agent base types (e.g. inference or query [*2]) and subtype (e.g. event preprocessor, event router, event/data processor, batched event/data processor,  etc)  and verify the event flow / data flow is complete and without contention. In this way you can quickly re-architect a TIBCO BusinessEvents application from single-node to multi-node, high performance to extreme performance…

Notes:

[1] Agent representations and modelling is currently the subject of an OMG standard, AMP, currently under development.

[2] CEP agents in BusinessEvents come in 2 forms: inference agents and query agents, roughly equating to CEP agents and Event Stream Processing agents…

eXtreme Event Processing: 5K EPS…

Defining what is “eXtreme” is a difficult topic. For example, if I am simply “processing” simple events as incoming data with no correlation requirements, I have a significantly easier job than if I am doing complex event processing, abstracting higher level events over time from incoming events.

Here is an interesting example that was being discussed last week: during an on-site application test, an enterprise event processing application (using TIBCO BusinessEvents) was comfortably processing events at their arrival rate of ~750 eps (events per second) when a “testing glitch” caused the EMS (JMS) queue to back up; when the offending component (i.e. agent) was corrected and restarted (in situ) the CEP application maxed out its CPUs to clear the queue at a reported ~5K eps. This 5K figure compares quite well “as a number” with other complex applications (as opposed, maybe, to event stream processing of simple events), although probably it will be up to someone like the EPTS Use Case Working Group to make some classifications here.

My guess is that “eXtreme Event Processing” or “XEP” (in which we really mean “extreme CEP” involving correlation across events, XML processing, some number of business rules, large number of fields per base event message, 24+ hour retention, fault tolerance, and so forth) probably starts at ~1000 events per second. But ultimately its not just a matter of “how fast”, but also “work done”…

Is CEP a bird, plane or… OODB?

Last week saw 2 questions come up regarding the relationship between Complex Event Processing tools and object oriented databases or OODBs. First, while at OMG’s Technical Meeting, we were asked about TIBCO’s interest in the OMG ODMG RFP to plan a successor to ODMG3.0. Then came a question from ODBMS.ORG asked the community about common design patterns for ODBMS installations. So it is probably time to explore the how an advanced event processing platform like TIBCO BusinessEvents relates to an ODBMS.

Firstly, what is an ODBMS? The ODBMS.ORG definition says “[per] Malcolm Atkinson and others …  An object-oriented database system must satisfy two criteria: it should be a DBMS, and it should be an object-oriented system, i.e., to the extent possible, it should be consistent with the current crop of object-oriented programming languages.

  • The first criterion translates into five features: persistence, secondary storage management, concurrency, recovery and an ad hoc query facility.
  • The second one translates into eight features: complex objects, object identity, encapsulation, types or classes, inheritance, overriding combined with late binding, extensibility and computational completeness.”

TIBCO BusinessEvents (BE) includes what can be thought of as a “distributed grid-based real-time obect database” for storing events (and event objects) for use across multiple Event Processing Agents during complex event processing.

The RDBMS-style criteria compare well to BE as follows:

  • BE persists the event and data objects.
  • BE has a secondary back-up option to a conventional RDBMS.
  • Provides concurrency through supporting multiple Event Processing Agents (indeed, it is based on cache technology usually used to share information across applications in real-time).
  • Supports fault tolerence / recovery features (through redundancy in its cache nodes).
  • Although the objects are used to feed the Event Processing Agents, you can run ad hoc continuous (or static) queries in consoles against BE.

However, BusinessEvents has a mixed score on the object criteria:

  • Complex objects (and events) and identity (unique names for instances): in BE’s case the “classes” are called Concepts, and can be defined in terms of properties and (reference and inclusion) associations. Different types (and concept/class associations) are supported, per UML Class.
  • For encapsulation, BE encapsulates information, but not behaviors. BE uses UML State, production rules and queries against the objects; state models are defined per Concept / class, but rules and queries are not encapsulated… arguably you cannot encapsulate rules inside the objects they refer to, as rules will generally work against multiple objects. At best, the encapsulation is at the Event Processing Agent level.
  • Both Concepts and Events support single inheritance, like Java. State models are also inherited.
  • Overriding is not really supported (as BE does not support methods, let alone overridable methods).  Nor is late binding, except note that when transforming an “event” into an “event object”, your rules specify what object or objects to create or enrich…
  • For extensibility and completeness, the BE event processing language can be extended as required, and should be considered complete…

Some of the other BE OODB characteristics are:

  • BE has 2 options for controlling object persistence:
    • automatic persistence and update, for default persistence, and
    • explicit persistence via controlling functions, for fine-tuned control.
  • An OQL-based query language, extended for continuous monitoring of data / event updates (sliding time-based windows etc).
  • No direct API for 3rd party applications to do CRUD operations on BE’s OODB contents. Although this could be done…
  • The main processing type carried out on the OODB data is via the inference rules engine, not queries embedded in some process – although the query language can be used to provide supporting set-based facts for the rules’ operation. Indeed, BE’s queries are not used for “insert” or “update” type operations – those are done by rule actions.
  • No batch-type “report generators” – BE’s OODB is used to serve the Event Processing Agents which in turn can provide event-based “reports” (as event payloads), as BAM-style notifications.

At best, TIBCO BusinessEvents, as a general purpose event processing platform with object persistence, can be described as providing an “embedded OODB” rather than being a “general purpose OODBMS”. In other words, it provides not a “general purpose object persistence server”, but instead an “object and event persistence service, supporting event processing”. Subtle, but maybe significant.

CEP in hardware, too?

TIBCO announced today the delivery of the first TIBCO Messaging Appliance – basically an RV agent in a box – and this has been widely reported and blogged about. It’s quite feasible that the same approach could be used for “basic” complex event processing operations, especially those that don’t require history (or much persistence) – examples that spring to mind include simple streaming queries, aggregations, identifying missing events over time, and so forth. Indeed, there are academic studies of such approaches in progress today.

The TIBCO Messaging Appliance works with the TIBCO BusinessEvents CEP product, whose multi-agent architecture may well be required to consume such high messaging rates, as well as any other RV-enabled tool.

Cloud Event Processing vs Event Cloud Processing?

Interesting to see the huge interest in Cloud Computing. David Luckham’s complexevents.com just referenced one of several recent Infoworld articles and blogs that attempt to define the term. Another compares the buzz to past excitements – remember Application Service Provider and Software As A Service? But Clouds are of direct interest to the Complex Event Processing community: CEP is, afterall, about processing clouds of events.

Just to be clear…

1. Complex Event Processing is another term for Event Cloud Processing: processing clouds of events, which may include multiple event streams of course.

2. Cloud-based Event Processing is where some event processing application is put into a closed / corporate, or public / distributed, cloud computing environment. Event latency and event store latency will be major considerations here.

It’s interesting to compare “cloud computing” with more conventional advances in distributed corporate IT. TIBCO BusinessEvents, for example, provides an “event cloud processing” (aka CEP!) platform that includes both the event processing and data cache agents. Will this ever be requested, or offered, as a cloud service by TIBCO or a TIBCO partner? Possibly. Application domains for this might be the interface between the social network and commercial IT worlds – consider things like medical drug programs, healthcare data collection, and so forth. We’ll let you know if and when it happens…

The Eight Fallacies of Distributed Computing

Ashwin used this great slide at our TIBCO BusinessEvents QLGroup best-practices session this week, courtesy of James Gosling’s blog and credited to Peter Deutsch.

Essentially everyone, when they first build a distributed application, makes the following eight assumptions. All prove to be false in the long run and all cause big trouble and painful learning experiences.
1. The network is reliable
2. Latency is zero
3. Bandwidth is infinite
4. The network is secure
5. Topology doesn’t change
6. There is one administrator
7. Transport cost is zero
8. The network is homogeneous

Arnon Rotem-Gal-Oz did a follow-up white paper that goes into more details.

TIBCO BusinessEvents 3.0

… has hit the newswires [*1,2,3] along with an IDC comment that TIBCO BusinessEvents is “the undisputed CEP leader, with a market share of 40.2 percent”. TIBCO BusinessEvents version 3 has already been in the hands of customers for a few weeks, and has lots of interesting cool features, such as:

  • Distributed decision engine – for real-time event-driven decisions
  • Multi-agent blackboard architecture – for complex Complex Event Processing, reliability and failover support, etc
  • Workflow-maintained decision management – for rule-driven control of the lifecycle of decisions
  • Support for multiple extensible event processing paradigms, including inference rule, continuous queries, and state management.

We’ll expand more on these over the coming weeks, but for now we need to start defining TIBCO BusinessEvents as an event processing platform, not just an “engine”. The platform covers:

  • Event processing via rules, queries, custom algorithms, etc
  • Distributed persistence of event history via a high-performance data grid
  • Reliability and scalability via multiple agents
  • IT modelling via concepts, states and rules
  • Business modelling via decision tables
  • Multiple event channels supported for flexibility.

Links:

[1] Infoworld / PCWorld Business gives some more analysis, and comments on the follow-up step being event analytics.

[2]  ZDNet’s Dana Gardner makes some excellent points on the news and why the CEP space is important. And ends with “… TIBCO’s products are pointing up how now to take the insights of CEP into the realm of near real-time responses and ability to identify and repeat effective patterns of business behaviors. Dare I say, “agility”?”

[3] TechTarget adds another view concentrating on the rule management feature.

Blackboards for Complex Event Processing

An interesting post on Tim Bass’ CEP blog [*1] describing Blackboard Systems, which is an established term from the era of AI research for “distributed knowledge systems” that co-operatively solve problems. Tim and I have previously mentioned blackboards and blackboard systems in the context of Complex Event Processing (CEP), but the passage of time has meant that “blackboard” is more significant for implying “distributed shared memory” [*2] in a CEP context, rather than just co-operating threads or agents looking at a shared database or memory structure [*3]. Distributed memory is a requirement we see for scalable, high-throughput event processing beyond what you can fit into a single machine’s (or JVM‘s) memory space.

A general progression for “CEP system complexity” on how the system handles memory is:

  • in-memory only, with persistence for reliability / restore operations
    = small, fast, independent CEP or Event Stream Processing (ESP) applications
  • single-machine, multi-process (for example using multiple cores), sharing the same memory
    = small-medium, pretty fast, with a restricted number of co-operating processes
  • multi-machine network of processes (exploiting control as well as data events across the network):
    • independent memory models
      = where the problem area can be partitioned without side effects: multiple parallel identical processes (for performance)
    • shared-memory models (usually using some cache technology)
      = where the problem area is large and inter-dependent, requiring inter-dependent or co-operating processes (for solution complexity) (as well as allowing for parallelism for performance).

CEP frameworks can generally support all these models (out of the box as for TIBCO BusinessEvents, or with various amounts of custom development work). Of course, the last model (multi-machine network with shared memory) is the interesting one for “Blackboard System” types of architectures (i.e. cooperative CEP agents working against a shared information model and event store, possibly under the control of a Master Control Program / Agent).

Other useful references are:

One suspects the “blackboard systems” domain and terminology is overdue some updates thanks to developments in the Complex Event Processing space.

Notes:

[1] Disclaimer: Tim is an ex-colleague and runs a vendor-independent blog on aspects of CEP.

[2] Blackboard systems historically used a single memory model (i.e. multiple threads or processes using a single machine’s memory model). But the interesting aspect for CEP is not that event processing agents can create new events to be used by other CEP agents (which is pretty much de facto CEP runtime behavior), but that the memory model can exist across multiple machines (i.e. can be distributed).

[3] This old paper even suggested that blackboard systems’ reign in AI research was curtailed by rule systems’ use of independent rulesets operating on a shared working memory – i.e. standard rule engine behavior. Rule-driven CEP engines like TIBCO BusinessEvents can certainly operate this way, with “independent” declarative rulesets cooperating on a problem. This approach is more difficult if you can represent your CEP or ESP solution only as a “flow diagram”, as you are explicitly fixing (non-declaratively) the interoperation of the CEP processing elements.