Tuesday 30 October 2012

Hadoop Analytics Is Not Real Time: Reality or Myth?

Big Data is the talk of the town, and Hadoop is emerging as the platform of choice for analyzing both structured and unstructured Big Data.

One of Hadoop's main strengths is ad-hoc, massive-scale analysis of the data stored in it. In typical usage, enterprises dump the majority of their unstructured data into HDFS and periodically run Map-Reduce analysis to gain insights into the new data, optionally structuring it for storage in external data sources where other applications can reuse it.
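To make the periodic batch model concrete, here is a minimal in-memory sketch of a single Map-Reduce pass in Python. It is not Hadoop code: `run_batch_job`, `word_count_map` and `word_count_reduce` are hypothetical names chosen for illustration, and the list of strings stands in for files that have accumulated in HDFS since the previous run.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def word_count_map(line: str) -> Iterator[Tuple[str, int]]:
    # Hypothetical mapper: emit (word, 1) for every word in a line of text.
    for word in line.lower().split():
        yield word, 1


def word_count_reduce(key: str, values: List[int]) -> int:
    # Hypothetical reducer: sum all counts emitted for one key.
    return sum(values)


def run_batch_job(
    records: Iterable[str],
    mapper: Callable[[str], Iterator[Tuple[str, int]]],
    reducer: Callable[[str, List[int]], int],
) -> Dict[str, int]:
    """Simulate one Map-Reduce pass over a complete, already-stored batch."""
    # Map and shuffle: apply the mapper to every record, grouping values by key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Reduce: one call per key, only after the whole batch has been mapped.
    return {key: reducer(key, values) for key, values in sorted(groups.items())}


# Stand-in for the files that accumulated in HDFS since the previous run.
batch = ["error disk full", "login ok", "error timeout"]
print(run_batch_job(batch, word_count_map, word_count_reduce))
# -> {'disk': 1, 'error': 2, 'full': 1, 'login': 1, 'ok': 1, 'timeout': 1}
```

Nothing is reported until the last record in the batch has been mapped, and data written after the job starts has to wait for the next scheduled run. That latency is the limitation discussed below.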

The following diagram (admittedly oversimplified) illustrates this usage of Map-Reduce.


While this model works well for offline batch analytics, its serious limitation is that it cannot be used for real time decision-making. Business use cases that demand quick action on their data (e.g. securities markets, fraud detection, fault detection, location-based services, Facebook Insights, Twitter trends) cannot use Hadoop Map-Reduce for immediate analytics on new data, and must turn to alternative technologies instead.

There is a popular belief that Hadoop Map-Reduce cannot be real time, and so far that has been true. HFlame (www.hflame.com, a product from Data Advent) removes this limitation without reinventing Hadoop or any of its components. HFlame turns a customer’s existing Hadoop infrastructure (e.g. Apache Hadoop, CDH, HDP) into a real time data analysis infrastructure.

The following diagram shows how Map-Reduce processing changes with HFlame:


HFlame Map-Reduce jobs run continuously (i.e. a job stays active even when HDFS has no new data for it to process). As soon as new data is written to HDFS, it is passed immediately to the appropriate real time Map-Reduce jobs. A real time Map-Reduce job will either

  • Produce immediate insights on the new data, or
  • Collect the new data for a specific amount of time and produce analytics results on the collected data.
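The two modes above can be sketched as a long-running job that is handed each record as it arrives. This is only an illustration under stated assumptions, not HFlame's API, which this post does not describe: `ContinuousJob`, `on_record`, `flush`, `window_seconds` and `first_word` are hypothetical names, timestamps are passed in explicitly, and "a specific amount of time" is read here as a fixed (tumbling) window.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional


class ContinuousJob:
    """Hypothetical long-running job that is fed each record as it is written.

    window_seconds=None: emit an insight for every new record immediately.
    window_seconds=N:    collect records into fixed (tumbling) windows of N
                         seconds and emit one aggregate per closed window.
    Assumes records arrive with non-decreasing timestamps.
    """

    def __init__(self, key_fn: Callable[[str], str],
                 window_seconds: Optional[float] = None) -> None:
        self.key_fn = key_fn
        self.window_seconds = window_seconds
        self.window_start: Optional[float] = None
        self.counts: Counter = Counter()
        self.results: List[Dict[str, int]] = []

    def on_record(self, record: str, timestamp: float) -> None:
        key = self.key_fn(record)
        if self.window_seconds is None:
            # Mode 1: immediate insight, here the running count for this key.
            self.counts[key] += 1
            self.results.append({key: self.counts[key]})
            return
        if self.window_start is None:
            self.window_start = timestamp
        # Mode 2: close every window whose end is at or before this timestamp.
        while timestamp >= self.window_start + self.window_seconds:
            self._emit_window()
            self.window_start += self.window_seconds
        self.counts[key] += 1

    def flush(self) -> None:
        """Emit the still-open window (windowed mode only)."""
        if self.window_seconds is not None:
            self._emit_window()

    def _emit_window(self) -> None:
        if self.counts:  # skip windows in which no data arrived
            self.results.append(dict(self.counts))
        self.counts = Counter()


def first_word(line: str) -> str:
    # Hypothetical key: the event type is the first word of each line.
    return line.split()[0]


events = [(0, "error disk"), (3, "login ok"), (7, "error net"), (12, "error cpu")]

windowed = ContinuousJob(first_word, window_seconds=10)
for ts, line in events:
    windowed.on_record(line, ts)
windowed.flush()
print(windowed.results)  # -> [{'error': 2, 'login': 1}, {'error': 1}]
```

Creating the job with `window_seconds=None` instead switches to the per-record mode, where each event immediately yields the running count for its key. A sliding window would be an equally valid reading of the second mode; the tumbling window is used here only because it is the simplest to follow.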

HFlame's continuous analysis places Hadoop right at the center of real time business solutions. Businesses can analyze the data stream as it arrives and apply patterns such as continuous queries and complex event processing without adding further complexity to their infrastructure.

HFlame is completely transparent to Hadoop users and works with their existing Hadoop distribution and installation. It builds on the core of Hadoop: HDFS and the Map-Reduce data processing framework.

Check out http://www.hflame.com or http://www.dataadvent.com for more details.