Dan Templeton takes a closer look at the new Apache Hadoop integration with Sun Grid Engine 6.2u5:

"The Hadoop integration consists of two halves. The first half is a parallel environment that can start up a jobtracker and tasktrackers on the nodes assigned by the scheduler. (HDFS is assumed to already be running on all the nodes in the cluster.) In the end, it's really no different than an MPI integration (especially MPICH2). The second half is called Herd. It's the part that talks to HDFS about blocks and racks, and it has two parts. The first part is the load sensor that reports on the block and rack data for each execution host (hosts that are running the SGE execution daemon). The second part is a Job Submission Verifier that translates requests for HDSF data paths into requests for racks and blocks." Full Story



More...