apache hadoop

Deep Dive into Hadoop Map Reduce Part -2

Reading Time: 8 minutes Prerequisite: Hadoop basics and an understanding of the Deep Dive into Hadoop Map Reduce Part -1 blog. MapReduce Tutorial: Introduction. In this MapReduce tutorial blog, I am going to introduce you to MapReduce, one of the core building blocks of processing in the Hadoop framework. Before moving ahead, I suggest you get familiar with the HDFS concepts I covered in my previous HDFS tutorial blog. Continue Reading

Big Data Evolution: Migrating on-premise database to Hadoop

Reading Time: 4 minutes We are now generating massive volumes of data at an accelerated rate. To meet business needs, address changing market dynamics as well as improve decision-making, sophisticated analysis of this data from disparate sources is required. The challenge is how to capture, store and model these massive pools of data effectively in relational databases. Big data is not a fad. We are just at the beginning Continue Reading

Difference between Apache Hadoop and Apache Spark MapReduce

Reading Time: 4 minutes The term Big Data has already created a lot of hype in the business world. Hadoop and Spark are both Big Data frameworks – they provide some of the most popular tools used to carry out common Big Data-related tasks. In this blog, we will cover the difference between Apache Hadoop and Apache Spark MapReduce. Introduction Spark – It is an open source Continue Reading

Working with Hadoop FileSystem API

Reading Time: 2 minutes Reading data from and writing data to the Hadoop Distributed File System (HDFS) can be done in a number of ways. Let us see how this can be done using the FileSystem API: first to create and write to a file in HDFS, then with an application that reads a file from HDFS and writes it back to the local file system. To start Continue Reading

Setting Up Multi-Node Hadoop Cluster, just got easy!

Reading Time: 3 minutes In this blog, we are going to embark on the journey of setting up a Hadoop multi-node cluster in a distributed environment. So let’s not waste any time and get started. Here are the steps you need to perform. Prerequisites: 1. Download and install Hadoop for the local machine (single-node setup) from http://hadoop.apache.org/releases.html (version 2.7.3), using Java jdk1.8.0_111. 2. Download Apache Spark from http://spark.apache.org/downloads.html and choose Spark release Continue Reading

Hadoop Word Count Program in Scala

Reading Time: 2 minutes You must have seen the Hadoop word count program in Java, Python, or C/C++, but probably not in Scala. So, let’s learn how to build a Hadoop word count program in Scala. Submitting a job written in Scala to Hadoop is not that easy: Hadoop runs on Java, so it does not understand the functional aspects of Scala. For writing Word Count Program Continue Reading

Introduction to Apache Hadoop: The Need

Reading Time: 3 minutes In this blog we will read about the Hadoop fundamentals. After reading it, we will be able to understand why we need Apache Hadoop. So let’s start with the problem. What’s the Problem? The problem is simple: while the storage capacities of hard drives have increased massively over the years, access speeds (the rate at which data can be read from drives) have not kept Continue Reading

Introduction To Hadoop Map Reduce

Reading Time: 4 minutes In this blog we will read about Hadoop MapReduce. As we all know, to process data faster we need to process it in parallel. That is what Hadoop MapReduce provides. MapReduce: MapReduce is a programming model for data processing. MapReduce programs are inherently parallel, thus putting very large-scale data analysis into the hands of anyone with enough machines at their disposal. MapReduce works Continue Reading
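
To make the programming model in the excerpt above a little more concrete, here is a minimal sketch in plain Python. It does not use Hadoop at all: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are hypothetical, chosen only for illustration, and everything runs serially in one process. It only shows the shape of the map, group-by-key, and reduce steps, assuming word count as the task.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    # Each line is mapped independently, which is what makes the model
    # parallel in principle; in this sketch the calls simply run in sequence.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["Hadoop stores data", "MapReduce processes data in Hadoop"]
    print(word_count(sample))
```

Because each call to `map_phase` depends only on its own line, and each call to `reduce_phase` only on its own key, both steps could in principle be spread across many machines; that independence is the property the excerpt refers to as "inherently parallel".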