Apache Hive is used as a data warehouse over Hadoop to provide users a way to load, analyze and query the data from various resources. Data is stored into databases or file systems like HDFS (Hadoop Distributed File System). Hive can use Spark SQL or HiveQL for the implementation of queries.
Now Hive uses its metastore which contains the following information,
- Ids of tables,
- Ids of databases,
- Time of creation,
- Table names,
- Type of the table,
- And its owner’s names
Hive metastore is constructed with the following,
Metastore DB
It is defined as and Relational Database Management System (RDBMS), which contains the metadata for the schema and the two major types of tables as,
- Managed tables,
- External tables
Metastore Service
Metastore runs a background service as metastore service which is used to perform the database operations and manage the metastore data and storing of the data into Hive Tables.
Warehouse
Hive basically uses the HDFS to store the data retrieved into the tables, usually under the directory user/hive/warehouse.
Steps to setup MySQL metastore
Install the MySQL server (Optional if already installed)
Install MySQL java connector
If you are using the Spark’s internal hive, then copy the connector jar file into Spark’s lib folder as
If you are using the hive apart from Spark, then copy the connector jar file into Hive’s lib folder as
After we create a table in hive, we can see the metadata by executing the following queries in MySQL,
Keep blogging ….