We are back with another Important concept of big data is Dynamic partitioning in Hive. Before moving to the dynamic one we should know about static partitioning which I explained In the blog Static partitioning
Now it’s time to deep dive into a dynamic one.
How Dynamic Differ from Static Partitioing
- In this partition, columns values are only known at EXECUTION TIME
- User is just mentioning the column, on which partition is required. REST is taken care by hive itself
Working of Dynamic Partitioing
Dynamic-partition insert (or multi-partition insert) is designed to solve this problem by dynamically determining which partitions should be created and populated while scanning the input table. This is a newly added feature that is only available from version 0.6.0.
In the dynamic partition insert, the input column values are evaluated to determine which partition this row should be inserted into. If that partition has not been created, it will create that partition automatically. Using this feature you need only one insert statement to create and populate all necessary partitions.
In addition, since there is only one insert statement, there is only one corresponding MapReduce job. This significantly improves performance and reduces the Hadoop cluster workload comparing to the multiple insert case.
Step1: First, select the database in which we want to create a table. use StudentData; Step2: Enable the dynamic partition by using the following commands: - set hive.exec.dynamic.partition=true; set hive.exec.dynamic.partition.mode=nonstrict; Step3: Create a dummy table to store the data. create table stud_demo(id int, name string, age int, institute string,course string) row format delimited fields terminated by ','; Step4: Now, load the data into the table. load data local inpath '/home/codegyani/hive/student_details' into table stud_demo; Step5: Create a partition table by using the following command: - create table student_part (id int, name string, age int, institute string) partitioned by (course string) row format delimited fields terminated by ','; Step6: Now, insert the data of dummy table into the partition table. insert into student_part partition(course) select id, name, age, institute, course from stud_demo;
In the following screenshot, we can see that the table student_part is divided into two categories.
Can we drop/delete the hive partitions?
Yes, we can drop the partitions if required. To drop or delete the partition column using the below command
Let’s see a few variations of drop partition.
Drop a single partition
ALTER TABLE student_part DROP IF EXISTS PARTITION(course = 'java');
Drop a multiple partition
ALTER TABLE student_part drop if exists partition (course = 'java'), partition (course = 'hadoop');
Advantages of Dynamic Partition
- Good for loading huge files in tables.
- Data read row wise.
- Generally, it reduces the query processing time.
- We can use to load data from the table that is not partitioned.
- External and managed tables is used for this type of partition.
Disadvantages of Dynamic Partition
- We cannot perform alter on Dynamic Partition.
- Sometime needs to be extra carefully because one wrong partition column can lead to creating millions of partition in hdfs directory
From the above article, we saw how to used in the hive and how to create it. We also check the advantage of having a dynamic partition over the hive and how to use it. So from this article, we can have a fair idea of how it works in the hive and its advantage.