Hive

Dynamic Partitioning in Apache Hive

Reading Time: 3 minutes Introduction: We are back with another important big data concept: dynamic partitioning in Hive. Before moving to the dynamic one, we should know about static partitioning, which I explained in the blog Static Partitioning. Now it’s time to deep dive into the dynamic one. How Dynamic Partitioning Differs from Static Partitioning: in this kind of partitioning, the partition column values are only known at EXECUTION TIME. The user is … Continue Reading
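
Before clicking through, here is a minimal, self-contained sketch (not code from the post) of what a dynamic-partition insert looks like when Hive is driven through Spark SQL. The table and column names (employees_raw, employees_part, country) are hypothetical; the essential parts are the two hive.exec.dynamic.partition settings and the PARTITION (country) clause with no value supplied, so Hive derives the partitions from the data at execution time.

```scala
import org.apache.spark.sql.SparkSession

object DynamicPartitionSketch extends App {

  // Hive-enabled session; hive-site.xml is assumed to be on the classpath
  val spark = SparkSession.builder()
    .appName("dynamic-partition-sketch")
    .enableHiveSupport()
    .getOrCreate()

  // Let Hive create partitions purely from the data it sees at execution time
  spark.sql("SET hive.exec.dynamic.partition = true")
  spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

  // Hypothetical target table, partitioned by country
  spark.sql(
    """CREATE TABLE IF NOT EXISTS employees_part (id INT, name STRING)
      |PARTITIONED BY (country STRING)
      |STORED AS PARQUET""".stripMargin)

  // No partition value is named here; each row's 'country' decides its partition
  spark.sql(
    """INSERT OVERWRITE TABLE employees_part PARTITION (country)
      |SELECT id, name, country FROM employees_raw""".stripMargin)

  spark.stop()
}
```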

Overview of Static Partitioning in Apache Hive

Reading Time: 4 minutes What is Partitioning? In simple words, we can explain partitioning as the process of dividing something into sections or parts, with the motive of making it easily understandable and manageable. Apache Hive allows us to organize a table into multiple partitions where we can group the same kind of data together. It is used for distributing the load horizontally, which also helps to increase query … Continue Reading
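
As a quick illustration of the idea (again not code from the post), the sketch below creates a partitioned Hive table through Spark SQL and loads one partition statically, i.e. with the partition value spelled out by the user. The sales_raw and sales_part tables and the country partition column are made-up names.

```scala
import org.apache.spark.sql.SparkSession

object StaticPartitionSketch extends App {

  val spark = SparkSession.builder()
    .appName("static-partition-sketch")
    .enableHiveSupport()
    .getOrCreate()

  // Hypothetical table whose rows are grouped on disk by the 'country' column
  spark.sql(
    """CREATE TABLE IF NOT EXISTS sales_part (id INT, amount DOUBLE)
      |PARTITIONED BY (country STRING)
      |STORED AS PARQUET""".stripMargin)

  // Static partitioning: the partition value ('IN') is fixed in the statement
  spark.sql(
    """INSERT INTO TABLE sales_part PARTITION (country = 'IN')
      |SELECT id, amount FROM sales_raw WHERE country = 'IN'""".stripMargin)

  // A query filtering on the partition column only reads that partition's data
  spark.sql("SELECT * FROM sales_part WHERE country = 'IN'").show()

  spark.stop()
}
```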

Big Data Evolution: Migrating on-premise database to Hadoop

Reading Time: 4 minutes We are now generating massive volumes of data at an accelerated rate. To meet business needs, address changing market dynamics, and improve decision-making, sophisticated analysis of this data from disparate sources is required. The challenge is how to capture, store and model these massive pools of data effectively in relational databases. Big data is not a fad. We are just at the beginning … Continue Reading

BigData Specifications – Part 1: Configuring MySQL Metastore in Apache Hive

Reading Time: 2 minutes Apache Hive is used as a data warehouse over Hadoop to provide users with a way to load, analyze and query data from various sources. Data is stored in databases or file systems like HDFS (Hadoop Distributed File System). Hive can use Spark SQL or HiveQL for the implementation of queries. Hive uses its metastore, which contains the following information: IDs of tables, IDs … Continue Reading
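
For a sense of what configuring a MySQL-backed metastore amounts to, here is a minimal hive-site.xml sketch with the four standard JDBC connection properties. The host, database name, and credentials are placeholders, not values from the post.

```xml
<!-- Minimal sketch: point the Hive metastore at MySQL instead of the default Derby.
     Host, database name, user and password below are placeholders. -->
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive_metastore?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hiveuser</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hivepassword</value>
  </property>
</configuration>
```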

Apache Spark 2.0 with Hive

Reading Time: < 1 minute Hello geeks, we have discussed how to start programming with Spark in Scala. In this blog we will discuss how we can use Hive with Spark 2.0. When you start to work with Hive, you first need HiveContext (which inherits SQLContext), core-site.xml, hdfs-site.xml and hive-site.xml for Spark. If you don’t configure hive-site.xml, then the context automatically creates metastore_db in the … Continue Reading
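
Since the excerpt stops mid-sentence, here is a minimal Spark 2.0 sketch of the setup it describes. In Spark 2.0, a SparkSession built with enableHiveSupport() provides the capability the older HiveContext used to; the table queried at the end is a hypothetical name, not one from the post.

```scala
import org.apache.spark.sql.SparkSession

object SparkTwoWithHive extends App {

  // core-site.xml, hdfs-site.xml and hive-site.xml are expected on the classpath;
  // without hive-site.xml, a local metastore_db directory is created automatically.
  val spark = SparkSession.builder()
    .appName("spark-2.0-with-hive")
    .enableHiveSupport()
    .getOrCreate()

  // Plain HiveQL runs through spark.sql once Hive support is enabled
  spark.sql("SHOW DATABASES").show()

  // Hypothetical table, purely for illustration
  spark.sql("SELECT COUNT(*) FROM default.my_hive_table").show()

  spark.stop()
}
```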

Hive-Metastore: A Basic Introduction

Reading Time: 3 minutes As we know, a database is the most important and powerful part of any organisation. It is a collection of schemas, tables, relationships, queries and views. It is an organized collection of data. But have you ever thought about these questions: How does a database manage all the tables? How does it manage all the relationships? How do we perform all operations so easily? Is there … Continue Reading