Friday, March 23, 2007

MapReduce plug-in for Eclipse. Now go and change the world...



Over the years, Google has developed some truly awesome "force-multiplier" architectural software components. These have helped them roll out hugely scalable, high-performance applications very quickly and at low cost. Among them are -
-- MapReduce parallel computation framework
-- BigTable database
-- Google File System

MapReduce is a framework that -
-- Allows a programmer to define a program that needs to be run in parallel over a huge cluster of computers
-- Executes the software program in parallel by distributing it to a huge cluster, monitoring the execution progress and collecting the results

MapReduce has lowered the bar within Google for writing hugely parallel applications. For the most part, the programmer only has to write the program so that it fits the MapReduce framework's requirements (in the model the paper describes, a map function and a reduce function). The parallelization and its enormous complexity are handled and hidden by the framework, freeing the programmer to concentrate on what the program actually does. Within Google, MapReduce is used for building the search index, among other things.
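To make that division of labour concrete, here is a minimal single-process sketch in Python of the word-count example commonly used to explain MapReduce. It is not Google's or Hadoop's API: the names map_words, reduce_counts and run_mapreduce are made up for illustration, and run_mapreduce is only a toy stand-in for the framework, assuming everything fits in memory on one machine.

```python
from collections import defaultdict


def map_words(doc_name, text):
    """User-written map step: emit (word, 1) for every word in one document."""
    for word in text.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    """User-written reduce step: combine all values emitted for one key."""
    return word, sum(counts)


def run_mapreduce(inputs, map_fn, reduce_fn):
    """Hypothetical stand-in for the framework: map, group by key, reduce.

    A real framework would spread these phases over many machines and take
    care of scheduling, monitoring and failures; this runs in one process.
    """
    grouped = defaultdict(list)
    for key, value in inputs.items():
        for out_key, out_value in map_fn(key, value):
            grouped[out_key].append(out_value)
    return dict(reduce_fn(k, vs) for k, vs in grouped.items())


documents = {
    "a.txt": "the quick brown fox",
    "b.txt": "the lazy dog and the fox",
}
word_counts = run_mapreduce(documents, map_words, reduce_counts)
print(word_counts["the"], word_counts["fox"])  # prints: 3 2
```

The programmer's part is just the two small functions at the top. Everything inside run_mapreduce is the part a real framework would do for you across a cluster.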

Google has explained their MapReduce framework in an excellent paper, "MapReduce: Simplified Data Processing on Large Clusters", by Jeffrey Dean and Sanjay Ghemawat. Hadoop is an open-source implementation of the MapReduce components, based on this paper.


Now, IBM has made available a plug-in for Eclipse that simplifies the development and deployment of MapReduce programs meant to run on Hadoop. So all you brilliant engineers out there, go ahead and code the next parallel program that will change the world.
