Author Archives: Anubhavtarar

How To Use Hive Without Hadoop


This blog answers a very common question: can we use Hive without Hadoop? The answer is yes. Starting with release 0.7, Hive also supports a mode to run map-reduce jobs in local mode … Continue reading

Posted in Scala | 1 Comment

How To Query an External Hive Metastore From Spark


In this blog we will learn how to access tables from the Hive metastore in Spark. To get started, start your Hive metastore as a service with the following command: hive --service metastore. By default it will start the metastore … Continue reading

Posted in Scala | Leave a comment

Spark On Mesos (Installation)


In this article we will learn how to run Spark on Mesos. All you need as a prerequisite is Spark on your machine. Here are the steps to configure it: 1. Download the latest Mesos version from here. 2. Extract the jar … Continue reading

Posted in Scala | 3 Comments

Why Dataset Over DataFrame?


In this blog we will learn what advantage the Dataset API in Spark 2 really has over the DataFrame API. DataFrame is weakly typed, and developers aren’t getting the benefits of the type system. That's why the Dataset API … Continue reading

Posted in Scala | 2 Comments

Create Your Own MetastoreEvent Listeners in Hive With Scala


Hive metastore event listeners are used to detect every event that takes place in Hive. If you want some action to take place for an event, you can override MetaStorePreEventListener and provide it your own … Continue reading

Posted in Scala | Leave a comment

How To Use Vectorized Reader In Hive


I am writing this blog because I tried to use the vectorized reader in Hive but ran into problems with its documentation. Introduction: Vectorized query execution is a Hive feature that greatly reduces the … Continue reading

Posted in Scala | Leave a comment

Play-Spark2: A Simple Application


In this blog we will create a very simple application with Play Framework and Spark. Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Scala, Java, and Python that make parallel jobs easy to … Continue reading

Posted in Play Framework, Scala, Spark | Leave a comment

Partitioning in Apache Hive


Partitions: Hive is a good tool for performing queries on large datasets, especially datasets that require full table scans. But quite often users need to filter the data on specific column values. That's where partitioning comes into … Continue reading

Posted in Scala | Leave a comment

Understanding Optimized Logical Plan In Spark


A LogicalPlan is a tree that represents both schema and data. These trees are manipulated and optimized by the Catalyst framework. There are three types of logical plan: the parsed logical plan, the analysed logical plan, and the optimized logical plan. Analysed logical plan … Continue reading

Posted in Scala | Leave a comment

Starting Hive-Client Programmatically With Scala


Hive defines a simple SQL-like query language for querying and managing large datasets, called Hive-QL (HQL). It’s easy to use if you’re familiar with SQL. Hive allows programmers who are familiar with the language to write the custom MapReduce … Continue reading

Posted in Scala | Leave a comment