Tag Archives: Hadoop

Apache Hadoop vs Apache Spark


The term Big Data has already created a lot of hype in the business world. Hadoop and Spark are both Big Data frameworks: they provide some of the most popular tools for carrying out common Big Data tasks. …

Posted in apache spark, big data, Scala

Understanding HDFS Federation


In this blog, we will discuss HDFS Federation, compare the standard Hadoop architecture with the federated architecture, and look at the various issues that HDFS Federation solves. First, let us see why it is gaining so much popularity. To address this …

Posted in Scala | 1 Comment

Resolving the Failure Issue of NameNode


In the previous blog “Smattering of HDFS”, we learnt that “The NameNode is a Single Point of Failure for the HDFS Cluster”. Each cluster had a single NameNode, and if that machine became unavailable, the whole cluster would become unavailable …

Posted in big data, HDFS, Scala | 1 Comment

Working with the Hadoop FileSystem API


Reading data from and writing data to the Hadoop Distributed File System (HDFS) can be done in a number of ways. Let us start by looking at how this can be done with the FileSystem API, to create and write to …

Posted in Java

Tableau: Getting into Tableau Public


Tableau makes Big Data visualization and Business Intelligence easy: millions or even billions of records can be analyzed in one go, whether your data is in Excel, CSV, plain text or a database. So …

Posted in apache spark, big data, Scala, Spark, Tableau

Business Intelligence-Data Visualization: Tableau


Spark, Big Data, NoSQL and Hadoop are some of the most widely used, top-of-the-charts technologies that we work with frequently at Knoldus. Whenever these terms come up, one thing comes to mind: ‘huge data, millions/billions of records’. Knoldus developers use …

Posted in Scala, Tableau | 2 Comments

Setting Up a Multi-Node Hadoop Cluster Just Got Easy!


In this blog, we are going to walk through how to set up a multi-node Hadoop cluster in a distributed environment. So let’s not waste any time and get started. Here are the steps you need to perform. Prerequisite: …

Posted in Architecture, big data, Scala | 7 Comments

BigData Specifications – Part 1: Configuring MySQL Metastore in Apache Hive


Apache Hive is used as a data warehouse on top of Hadoop, giving users a way to load, analyze and query data from various sources. The data is stored in databases or file systems such as HDFS (Hadoop Distributed File System). Hive …

Posted in Scala

Hadoop Word Count Program in Scala


You have probably seen the Hadoop word count program in Java, Python or C/C++, but probably not in Scala. So let’s learn how to build a word count program in Scala. Submitting a job to Hadoop that is written in Scala …

Posted in big data, Scala | 4 Comments
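A word count job like the one in this post goes through three phases: map (emit a (word, 1) pair per token), shuffle (group the pairs by word) and reduce (sum the counts per word). As a rough, Hadoop-free illustration of those phases, here is a minimal in-memory sketch. It is written in plain Java rather than Scala only so that it runs with nothing but the JDK; the class and method names (`WordCountSketch`, `map`, `shuffle`, `reduce`, `wordCount`) are hypothetical and not taken from the post. A real job would instead extend Hadoop's `Mapper` and `Reducer` classes and let the framework distribute the work across the cluster.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-machine simulation of a MapReduce word count (no Hadoop involved).
final class WordCountSketch {

    // Map phase: emit (word, 1) for every token in a line, like a Hadoop Mapper would.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) { // split can yield an empty leading token
                out.add(new SimpleEntry<>(token, 1));
            }
        }
        return out;
    }

    // Shuffle phase: group all emitted values by key, sorted by key as Hadoop does by default.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values for one word, like a Hadoop Reducer would.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs map over every input line, then shuffle, then reduce per word.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(pairs).entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = List.of("Hadoop stores data", "Spark and Hadoop process data");
        System.out.println(wordCount(input));
    }
}
```

Running `main` prints `{and=1, data=2, hadoop=2, process=1, spark=1, stores=1}`. The `TreeMap` keeps keys sorted, mirroring the way Hadoop sorts map output keys before they reach the reducer; in a real cluster the map and reduce calls would run in parallel on different nodes.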

Introduction to Apache Hadoop: The Need


In this blog, we will read about Hadoop fundamentals. After reading it, you will understand why we need Apache Hadoop. So let’s start with the problem. What’s the problem? The problem is simple: the …

Posted in Scala