MapReduce

Deep Dive into Hadoop Map Reduce: Part 2

Reading Time: 8 minutes Prerequisite: Hadoop basics and an understanding of the Deep Dive into Hadoop Map Reduce: Part 1 blog. MapReduce Tutorial: Introduction In this MapReduce tutorial blog, I am going to introduce you to MapReduce, one of the core processing building blocks of the Hadoop framework. Before moving ahead, I would suggest getting familiar with the HDFS concepts covered in my previous HDFS tutorial blog. Continue Reading

Deep Dive into Map Reduce: Part 1

Reading Time: 5 minutes Prerequisite: Basic concepts of Hadoop and distributed file systems. Map-Reduce is a programming model and a software framework used for processing enormous amounts of data. A Map-Reduce program works in two stages, namely Map and Reduce. Map tasks deal with the splitting and mapping of the data, while Reduce tasks shuffle and reduce the data. Map-Reduce is a programming model that is neither platform- nor Continue Reading
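The two stages described above can be captured as ordinary functions. Below is a hedged, Hadoop-free sketch in Scala of the contract the model imposes; the helper name `runMapReduce` and the toy data are illustrative, not from the post:

```scala
// Illustrative contract of the Map-Reduce model in plain Scala (no Hadoop):
// the mapper turns one record into (key, value) pairs, the framework groups
// the pairs by key (the shuffle), and the reducer folds each key's values.
def runMapReduce[V, K2, V2, V3](
    input: Seq[V],
    mapper: V => Seq[(K2, V2)],
    reducer: (K2, Seq[V2]) => (K2, V3)
): Map[K2, V3] = {
  val mapped = input.flatMap(mapper)                    // Map stage
  mapped
    .groupBy(_._1)                                      // Shuffle: group by key
    .map { case (k, kvs) => reducer(k, kvs.map(_._2)) } // Reduce stage
}

// Toy example: character frequencies across two records.
val freqs = runMapReduce(
  Seq("ab", "ba"),
  (s: String) => s.toSeq.map(c => (c, 1)),
  (c: Char, ones: Seq[Int]) => (c, ones.sum)
)
println(freqs)
```

In real Hadoop the three stages run on different machines, but the data flow is exactly this: flatMap, group by key, fold per key.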

Hadoop Word Count Program in Scala

Reading Time: 2 minutes You must have seen the Hadoop word count program in Java, Python, or C/C++, but probably not in Scala. So, let's learn how to build a Hadoop Word Count Program in Scala. Submitting a job written in Scala to Hadoop is not that easy, because Hadoop runs on Java and so does not understand the functional aspects of Scala. For writing a Word Count Program Continue Reading
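A full Hadoop job needs the Java MapReduce API on the classpath, but the word-count logic itself can be sketched in a few lines of plain Scala. This is a local simulation of what the mapper and reducer do, with made-up sample lines, not the post's actual job code:

```scala
// Local word-count sketch in plain Scala -- simulates what the Hadoop
// mapper and reducer would do, without any Hadoop dependency.
val lines = Seq("big data", "big fast cluster")

// Mapper: emit (word, 1) for every word in every line.
val mapped: Seq[(String, Int)] =
  lines.flatMap(_.split("\\s+").map(word => (word, 1)))

// Shuffle + Reducer: group the pairs by word and sum the ones.
val counts: Map[String, Int] =
  mapped.groupBy(_._1).map { case (word, pairs) => (word, pairs.map(_._2).sum) }

println(counts)
```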

Introduction To Hadoop Map Reduce

Reading Time: 4 minutes In this blog we will be reading about Hadoop MapReduce. As we all know, to achieve faster processing we need to process the data in parallel, and that is what Hadoop MapReduce provides. MapReduce: MapReduce is a programming model for data processing. MapReduce programs are inherently parallel, thus putting very large-scale data analysis into the hands of anyone with enough machines at their disposal. MapReduce works Continue Reading
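The inherent parallelism mentioned above can be illustrated with a small Scala sketch: the input is split into independent partitions, each partition is counted on its own (these are the pieces that would run on separate machines), and the partial results are merged. This is a local simulation under assumed toy data, not Hadoop itself:

```scala
// Simulate MapReduce's data parallelism: split the input into independent
// partitions, process each one separately, then merge the partial results.
val records = Seq("a b", "a c", "b b", "c a")

// Split the input into "splits", as HDFS hands blocks to separate mappers.
val partitions: Seq[Seq[String]] = records.grouped(2).toSeq

// Each partition is processed independently -- this is the part that
// Hadoop would schedule in parallel across the cluster.
def countWords(part: Seq[String]): Map[String, Int] =
  part.flatMap(_.split(" ")).groupBy(identity).map { case (w, ws) => (w, ws.size) }

val partials: Seq[Map[String, Int]] = partitions.map(countWords)

// Merge (reduce) the partial counts into one final result.
val merged: Map[String, Int] =
  partials.flatten.groupBy(_._1).map { case (w, kvs) => (w, kvs.map(_._2).sum) }

println(merged)
```

Because each partition is counted without looking at any other, adding machines scales the map phase almost linearly; only the final merge needs to see data from every partition.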

Apache PIG: Installation and Connecting with a Hadoop Cluster

Reading Time: 4 minutes Apache PIG is a scripting platform for analyzing large datasets. PIG is a high-level scripting language that works with Apache Hadoop. It enables workers to write complex transformations as simple scripts with the help of PIG Latin. Apache PIG interacts directly with the data in the Hadoop cluster. Apache PIG transforms a Pig script into MapReduce jobs so it can execute with the Continue Reading
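To give a feel for the kind of script Pig compiles into MapReduce jobs, here is a hedged Pig Latin word-count sketch; the input path and relation names are examples, not taken from the post:

```pig
-- Hypothetical Pig Latin word count; Pig compiles this into MapReduce jobs.
lines   = LOAD '/data/input.txt' AS (line:chararray);
words   = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
grouped = GROUP words BY word;
counts  = FOREACH grouped GENERATE group AS word, COUNT(words) AS n;
DUMP counts;
```

Each statement builds a relation from the previous one; the GROUP/COUNT pair is what becomes the shuffle and reduce of the generated MapReduce job.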

Let Us Grid Compute

Reading Time: 3 minutes Since early times oxen were used for heavy pulling. Sometimes the logs were huge and a single ox could not pull them. The smart people of earlier times did not build a bigger ox. Instead they used two or three together. Simple, isn’t it? It is the same concept that has gone behind the use of multiple commodity hardware machines linked together to provide super processing Continue Reading