Apache Hive

Best Way of Optimization: Bucketing in Hive

Reading Time: 4 minutes

Apache Hive is an open-source data warehouse system used to query and analyze large datasets. Data in Apache Hive is organized into three units: tables, partitions, and buckets. What is bucketing in Hive? Bucketing is the concept of breaking data down into a fixed number of parts, known as buckets (Hive assigns each row to a bucket by hashing the bucketing column), to give extra structure to the data so it may be …

Big Data Evolution: Migrating on-premise database to Hadoop

Reading Time: 4 minutes

We are now generating massive volumes of data at an accelerating rate. Meeting business needs, responding to changing market dynamics, and improving decision-making all require sophisticated analysis of this data from disparate sources. The challenge is how to capture, store, and model these massive pools of data effectively in relational databases. Big data is not a fad. We are just at the beginning …

Hive-Metastore : A Basic Introduction

Reading Time: 3 minutes

As we know, the database is one of the most important and powerful parts of any organisation. It is an organized collection of data: schemas, tables, relationships, queries, and views. But have you ever thought about these questions: How does a database manage all its tables? How does it manage all the relationships? How do we perform all these operations so easily? Is there …