Partitioning in Apache Hive


Partitions

Hive is a good tool for performing queries on large datasets, especially datasets that require full table scans. But quite often there are instances where users need to filter the data on specific column values.thats where Partitioning comes into play a partition is nothing but a directory which contains the chunk of data when we do partitioning, we create a partition for each unique value of the column

lets run a simple example to see what it is

syntax to create partition table is

create table tablename(colname type) partitioned by(colname type);

if hive.exec.dynamic.partition.mode is set to strict, then you need to do at least one static partition. In non-strict mode, all partitions are allowed to be dynamic

yyy

partition

here we create a table named emp info with two fields name and addresss we partitioned the table by column ID of type int and then we insert the value in this table

it’s important to consider the cardinality of the column that will be partitioned on. Selecting a column with high cardinality will result in fragmentation of data Do not over-partition the data. With too many small partitions, the task of recursively scanning the directories becomes more expensive than a full table scan of the table.

syntax for inserting values is

insert into partition values();

first we insert record with id=1 now insert another record with id=2

par

now got to /user/hive

/warehouse/default/empinfo directory in your hdfs

ppp.png

as we can see there are two partitions one with name id=1 and other one as id =2 now when a  select query is fired with where clause it will not scan the full table it will only scan the required partition

www.png

if you will tried  it with a non partition table with large dataset it will take more time in comparison because it will have to go through entire table scan

i hope this blog will be helpful happy coding

KNOLDUS-advt-sticker

Advertisements
This entry was posted in Scala. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s