Usually when you create tables in hive using raw data in HDFS, it moves them to a different location – “/user/hive/warehouse”. If you created a simple table, it will be located inside the data warehouse. The following hive command creates a table with data location at “/user/hive/warehouse/empl”
hive> CREATE TABLE EMPL(ID int,NAME string)ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ; OK Time taken: 0.933 seconds
now load csv into this table
hive> load data inpath 'hdfs://localhost:54310/dat.csv' into table empl; Loading data to table default.empl Table default.empl stats: [numFiles=0, numRows=0, totalSize=0, rawDataSize=0] OK
now try to drop this table
hive> drop table empl; OK Time taken: 0.397 seconds
When you drop the table, the raw data is lost as the directory corresponding to the table in warehouse is deleted.
You may also not want to delete the raw data as some one else might use it
so here comes the concept of external table
For External Tables ,Hive does not move the data into its warehouse directory. If the external table is dropped, then the table metadata is deleted but not the data.have a look at the below commands
hive> CREATE EXTERNAL TABLE EMPL(ID int,NAME string)ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION '/opt/carbonStore' ; OK Time taken: 0.13 seconds
To create external table, simply point to the location of data while creating the tables. This will ensure that the data is not moved into a location inside the warehouse directory.
now load data in this table
hive> load data inpath 'hdfs://localhost:54310/opt/carbonStore/dat.csv' into table empl; Loading data to table default.empl Table default.empl stats: [numFiles=0, totalSize=0] OK Time taken: 0.379 seconds
now drop this table
hive> drop table empl; OK Time taken: 0.301 seconds
even though table is deleted but still raw data is not
i hope this blog will help to understand external table concept