Author Archives: Anubhavtarar

A Step-by-Step Guide to Setting Up a Multi-Node Mesos Cluster with Spark and HDFS on EC2


Apache Mesos is an open-source project for managing computer clusters, originally developed at the University of California, Berkeley. It sits between the application layer and the operating system so that applications run efficiently in large-scale distributed environments. In this blog, … Continue reading

Posted in Scala | Leave a comment

How Does Spark Use MapReduce?


In this talk we will discuss an interesting question: does Spark use MapReduce or not? The answer is yes, but it uses only the idea, not the exact implementation. Let's talk about an example to read a text … Continue reading

Posted in Scala | 1 Comment

How To Use Hive Without Hadoop


The reason for writing this blog is to answer a very common question: can we use Hive without Hadoop? So let's get started. The answer is yes: starting with release 0.7, Hive also supports a mode to run map-reduce jobs in local mode … Continue reading

Posted in Scala | 2 Comments

How to Query an External Hive Metastore from Spark


In this blog we will learn how to access tables from the Hive metastore in Spark. So let's get started. Start your Hive metastore as a service with the following command: hive --service metastore. By default it will start the metastore … Continue reading

Posted in Scala | 1 Comment

Spark On Mesos (Installation)


In this article we will learn how to run Spark on Mesos. So let's get started. All you need as a prerequisite is Spark on your machine. Here are the steps to configure it: 1. Download the latest Mesos version from here 2. Extract the jar … Continue reading

Posted in Scala | 4 Comments

Why Dataset Over DataFrame?


In this blog we will learn what advantage the Dataset API in Spark 2 really has over the DataFrame API. DataFrame is weakly typed, so developers aren’t getting the benefits of the type system; that's why the Dataset API … Continue reading

Posted in Scala | 3 Comments

Create Your Own MetastoreEvent Listeners in Hive With Scala


Hive MetaStore event listeners are used to detect every event that takes place in Hive. In case you want some action to take place for an event, you can override MetaStorePreEventListener and provide it your own … Continue reading

Posted in Scala | 1 Comment

How To Use Vectorized Reader In Hive


The reason for writing this blog is that I tried to use the vectorized reader in Hive but faced some problems with its documentation; that's why I decided to write this blog. Introduction: Vectorized query execution is a Hive feature that greatly reduces the … Continue reading

Posted in Scala | Leave a comment

Play-Spark2: A Simple Application


In this blog we will create a very simple application with Play Framework and Spark. Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Scala, Java, and Python that make parallel jobs easy to … Continue reading

Posted in Play Framework, Scala, Spark | Leave a comment

Partitioning in Apache Hive


Partitions: Hive is a good tool for performing queries on large datasets, especially datasets that require full table scans. But quite often there are instances where users need to filter the data on specific column values. That's where partitioning comes into … Continue reading

Posted in Scala | Leave a comment