As we know database is the most important and powerful part for any organisation. It is the collection of Schema, Tables, Relationships, Queries and Views. It is an organized collection of data. But can you ever think about these question –
- How does database manage all the tables?
- How does database manage all the relationship?
- How do we perform all operations so easy?
- Is there any way to know about all this?
There is one answer for all the question and that is Metastore. In the Metastore database keeps all the information related to our databases, tables and relations as Metadata. When ever you want to know about database than in the Metastore we can easily find all the information.
Here we will talk about the Hive-Metastore system where it keep all the information about the tables and relations.
All hive implementation need a metastore service, where it stores metadata. It is implemented using tables in relational database. By default, Hive uses built-in Derby SQL server. It provides single process storage, so when we use Derby we can not run instance of Hive CLI. Whenever we want to run Hive on a personal machine or for some developer task than it is good but when we want to use it on cluster then MYSQL or any other similar relational database is required.
Now when you run your hive query and you are using default derby database you will find that your current directory now contains a new sub-directory metastore_db. Also the metastore will be created if it doesn’t already exist. The property of interest here is javax.jdo.option.ConnectionURL. The default value of this property is jdbc:derby:;databaseName=metastore_db;create=true. This value specifies that you will be using embedded derby as your Hive metastore and the location of the metastore is metastore_db.
We can also configure directory for hive store table information. By default, the location of warehouse is file:///user/hive/warehouse and we can also use hive-site.xml file for local or remote metastore.
hive-site.xml : We used hive-site.xml for changing the configuring to specifying to Hive where the database is stored. We used JDBC compliant database for the metastore because default embedded database is not suitable for the production. For providing these configuration we used hive-site.xml file. An example of hive-site.xml for using MYSQL Database for storing metastore :
<configuration> <property> <name>javax.jdo.option.ConnectionURL</name> <value>jdbc:mysql://localhost/metastore</value> <description>metadata is stored in a MySQL server</description> </property> <property> <name>javax.jdo.option.ConnectionURL.metastore_spark_plsql</name> <value>jdbc:mysql://localhost/metastore_sql</value> <description>user metadata is stored in a MySQL server</description> </property> <property> <name>javax.jdo.option.ConnectionDriverName</name> <value>com.mysql.jdbc.Driver</value> <description>MySQL JDBC driver class</description> </property> <property> <name>javax.jdo.option.ConnectionUserName</name> <value>user</value> <description>user name for connecting to mysql server </description> </property> <property> <name>javax.jdo.option.ConnectionPassword</name> <value>password</value> <description>password for connecting to mysql server </description> </property> </configuration>
When we use MYSQL JDBC driver than we download Jconnector(MYSQL JDBC Driver) and place in the $HIVE_HOME/lib and place hive-site.xml in $HIVE_HOME/conf. After this we will be able to store metastore in MYSQL.
To know about the metastore tables, field and relation please look into this diagram :
Here in this diagram, we can find our all the answer regarding the metastore. We can find out that how metastore stores information of database & tables and How these tables are internally connected with each other. In these tables we an find all the information which related to the tables.
I hope it will help you to develop basic understanding with the metastore in hive.