Author Archives: Anubhavtarar

How To Use Vectorized Reader In Hive


The reason for writing this blog is that I tried to use the vectorized reader in Hive but faced some problems with its documentation, so I decided to write this blog. Introduction: Vectorized query execution is a Hive feature that greatly reduces the …

Posted in Scala

Play-Spark2: A Simple Application


In this blog we will create a very simple application with Play Framework and Spark. Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Scala, Java, and Python that make parallel jobs easy to …

Posted in Play Framework, Scala, Spark

Partitioning in Apache Hive


Partitions: Hive is a good tool for performing queries on large datasets, especially datasets that require full table scans. But quite often there are instances where users need to filter the data on specific column values. That's where partitioning comes into …

Posted in Scala

Understanding the Optimized Logical Plan in Spark


LogicalPlan is a tree that represents both schema and data. These trees are manipulated and optimized by the Catalyst framework. There are three types of logical plans:
○ Parsed logical plan
○ Analysed logical plan
○ Optimized logical plan
Analysed Logical Plan …

Posted in Scala

Starting Hive-Client Programmatically With Scala


Hive defines a simple SQL-like query language, called HiveQL (HQL), for querying and managing large datasets. It's easy to use if you're familiar with SQL. Hive allows programmers who are familiar with the language to write the custom MapReduce …

Posted in Scala

Apache Hive On Yarn


YARN is a software rewrite that decouples MapReduce's resource management and scheduling capabilities from the data processing component, enabling Hadoop to support more varied processing approaches and a broader array of applications. For example, Hadoop clusters can now run interactive querying …

Posted in Scala

Apache Hive 2.1 Installation with Troubleshooting


Apache Hive is considered the de facto standard for interactive SQL queries over petabytes of data in Hadoop. Apache Hive 2.1 was released about a year ago; in this blog I will walk you through all the installation steps for Apache Hive 2.1 …

Posted in Scala

Starting HiveServer2 Programmatically


HiveServer2 (HS2) is a server interface that enables remote clients to execute queries against Hive and retrieve the results (a more detailed intro here). The current implementation, based on Thrift RPC, is an improved version of HiveServer and supports multi-client …

Posted in Scala

Understanding External Tables in Hive


Usually, when you create tables in Hive using raw data in HDFS, Hive moves the data to a different location, "/user/hive/warehouse". If you create a simple table, it will be located inside the data warehouse. The following Hive command creates …

Posted in Scala

Introduction to Scala Parser Combinators


Scala parser combinators are a powerful way to build parsers that can be used in everyday programs. But it's hard to understand the plumbing pieces and how to get started. After you get the first couple of samples to compile …

Posted in Scala