Apache Hadoop

Apache Hadoop vs Apache Spark

The term Big Data has already created a lot of hype in the business world. Hadoop and Spark are both Big Data frameworks: they provide some of the most popular tools used to carry out common Big Data tasks. In this blog, we will cover the difference between Spark and Hadoop MapReduce. Introduction: Spark is an open-source big data…

Working with the Hadoop FileSystem API

Reading data from and writing data to the Hadoop Distributed File System (HDFS) can be done in a number of ways. Let us start by using the FileSystem API to create and write to a file in HDFS, followed by an application that reads a file from HDFS and writes it back to the local file system. To start…

Setting Up a Multi-Node Hadoop Cluster Just Got Easy!

In this blog, we are going to walk through setting up a Hadoop multi-node cluster in a distributed environment. So let's not waste any time and get started. Here are the steps you need to perform. Prerequisites: 1. Download and install Hadoop on your local machine as a single-node setup from http://hadoop.apache.org/releases.html (release 2.7.3, with Java jdk1.8.0_111). 2. Download Apache Spark from http://spark.apache.org/downloads.html and choose a Spark release…

Hadoop Word Count Program in Scala

You have probably seen the Hadoop word count program in Java, Python, or C/C++, but probably not in Scala. So let's learn how to build a word count program in Scala. Submitting a job written in Scala to Hadoop is not that easy: Hadoop runs on Java, so it does not understand the functional aspects of Scala. For writing the word count program in…
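The post's own program is written in Scala against Hadoop's Java API and is not reproduced here. As a rough, Hadoop-free illustration of the logic any word count job implements, the sketch below runs the three steps in a single Python process: a mapper that emits `(word, 1)` pairs, a shuffle that groups pairs by word (the framework does this for you in Hadoop), and a reducer that sums the counts. The function names and the choice to lowercase and split on whitespace are assumptions made for illustration only.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Hypothetical mapper: emit (word, 1) for every whitespace-separated token."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key, as the framework does between the map and reduce phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    """Hypothetical reducer: sum the partial counts emitted for one word."""
    return word, sum(counts)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Chain map -> shuffle -> reduce and return word counts sorted by word."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(word, counts) for word, counts in sorted(grouped.items()))


if __name__ == "__main__":
    text = ["Hadoop runs on Java", "Spark runs on Scala and Java"]
    print(word_count(text))
    # {'and': 1, 'hadoop': 1, 'java': 2, 'on': 2, 'runs': 2, 'scala': 1, 'spark': 1}
```

In a real Hadoop job the mapper and reducer run on different machines and the shuffle moves data over the network; keeping them as separate functions here mirrors that split.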

Introduction to Apache Hadoop: The Need

In this blog, we will read about Hadoop fundamentals. After reading it, we will understand why we need Apache Hadoop, so let's start with the problem. What's the problem? The problem is simple: while the storage capacities of hard drives have increased massively over the years, access speeds (the rate at which data can be read from drives) have not kept…
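A quick back-of-the-envelope calculation shows why slow access speeds push toward reading from many drives at once. The figures below (a 1 TB dataset, 100 MB/s per drive, 100 drives) are illustrative assumptions, not measurements from the post, and the model assumes a perfectly even split with no coordination overhead.

```python
def read_time_seconds(data_bytes: float, throughput_bytes_per_s: float, drives: int = 1) -> float:
    """Time to read data split evenly across `drives` disks that are read in parallel.

    Idealized model (assumption): no seek time, network cost, or coordination overhead.
    """
    return data_bytes / (throughput_bytes_per_s * drives)


TB = 10**12  # decimal terabyte, in bytes
MB = 10**6   # decimal megabyte, in bytes

if __name__ == "__main__":
    single = read_time_seconds(1 * TB, 100 * MB)
    parallel = read_time_seconds(1 * TB, 100 * MB, drives=100)
    print(f"one drive:  {single:,.0f} s (~{single / 3600:.1f} h)")     # 10,000 s (~2.8 h)
    print(f"100 drives: {parallel:,.0f} s (~{parallel / 60:.1f} min)")  # 100 s (~1.7 min)
```

Under these assumptions, spreading the same data over 100 drives cuts the read from hours to under two minutes, which is the motivation for a distributed file system paired with parallel processing.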

Introduction to Hadoop MapReduce

In this blog, we will read about Hadoop MapReduce. As we all know, to process data faster we need to process it in parallel, and that is what Hadoop MapReduce provides. MapReduce: MapReduce is a programming model for data processing. MapReduce programs are inherently parallel, which puts very large-scale data analysis into the hands of anyone with enough machines at their disposal. MapReduce works…
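To make "inherently parallel" concrete, here is a minimal single-machine sketch of the model, assuming input that is already divided into splits. Each split is mapped independently (so splits can run concurrently), the emitted pairs are grouped by key, and each key is reduced on its own. `run_mapreduce`, `parse_reading`, `max_reading`, and the `year,temperature` records are all hypothetical. A thread pool is used only to show the structure; Python threads will not deliver the speedup Hadoop gets from spreading work across machines.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Iterator, List, Tuple, TypeVar

K = TypeVar("K")
V = TypeVar("V")
R = TypeVar("R")


def run_mapreduce(
    splits: List[List[str]],
    mapper: Callable[[str], Iterable[Tuple[K, V]]],
    reducer: Callable[[K, List[V]], R],
    workers: int = 4,
) -> Dict[K, R]:
    """Hypothetical engine: map each split independently, group by key, then reduce."""

    def map_split(split: List[str]) -> List[Tuple[K, V]]:
        return [pair for record in split for pair in mapper(record)]

    # Map phase: splits share no state, so they can be processed concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_split, splits))

    # Shuffle phase: gather every value emitted for the same key.
    groups: Dict[K, List[V]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)

    # Reduce phase: each key is reduced independently of the others.
    return {key: reducer(key, values) for key, values in groups.items()}


def parse_reading(record: str) -> Iterator[Tuple[str, int]]:
    """Hypothetical mapper for 'year,temperature' records."""
    year, temp = record.split(",")
    yield year, int(temp)


def max_reading(year: str, temps: List[int]) -> int:
    """Hypothetical reducer: highest temperature seen for a year."""
    return max(temps)


if __name__ == "__main__":
    splits = [["1950,0", "1950,22", "1949,111"], ["1949,78", "1950,-11"]]
    print(run_mapreduce(splits, parse_reading, max_reading))  # {'1950': 22, '1949': 111}
```

Only the mapper and reducer change from job to job; the splitting, grouping, and scheduling stay in the engine, which is the division of labor the MapReduce model gives you.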
