DELIVERBI Big Data Hive Command Reference
Today we are sharing some frequently used Hive commands and settings that will come in handy.
To see the status of HiveServer2 and view its logs, you can use the following URL:
http://<IPAddresses>:10002/hiveserver2.jsp
This URL also allows you to view the configuration, among other things.
To list databases, use show databases;
To create a new database in the default location, use create database <db_name>;
To create a new database in a specified location, use create database <db_name> location '/storage/<db_name>';
To drop a database, use drop database <db_name>;
To start using a database, run use <db_name>;
To list the tables available in the current database, use show tables;
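As a rough sketch, the database commands above could be combined into a short session like the one below. The database name deliverbi_demo and its HDFS path are made up for illustration and are not from our setup.

```sql
-- Hypothetical database name and HDFS path, used only for illustration.
create database if not exists deliverbi_demo location '/storage/deliverbi_demo';

-- Switch to the new database and list its tables (empty at this point).
use deliverbi_demo;
show tables;

-- Switch away before dropping. "cascade" also drops any tables in the database;
-- without it, the drop fails if the database is not empty.
use default;
drop database if exists deliverbi_demo cascade;
```

The if not exists and if exists clauses simply make the script safe to re-run.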
To gather table statistics, use analyze table deliverbi_part_t1_f partition (cost_centre) COMPUTE STATISTICS;
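The statistics command has a few useful variations, sketched below. The partition value 'CC100' is a made-up example, and the sketch assumes cost_centre is a string partition column.

```sql
-- Basic statistics (row count, size) for a single partition only.
-- 'CC100' is a hypothetical partition value.
analyze table deliverbi_part_t1_f partition (cost_centre='CC100') compute statistics;

-- Column-level statistics across all partitions.
analyze table deliverbi_part_t1_f partition (cost_centre) compute statistics for columns;

-- Inspect the gathered statistics for that partition.
describe formatted deliverbi_part_t1_f partition (cost_centre='CC100');
```

Limiting the analysis to one partition is usually cheaper than re-analysing the whole table after a small load.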
To show the default warehouse storage directory, use set hive.metastore.warehouse.dir;
To locate the storage directory, along with other information, for a specific table, use describe formatted deliverbi_part_t1_f;
To show column headers in query output, use set hive.cli.print.header=true;
To stop showing column headers, set the same property to false.
You can see the explain plan for a specific query using the following:
explain select count(*) from deliverbi_t1_f;
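If the basic plan is not detailed enough, Hive also supports a couple of explain variants. A minimal sketch using the same query:

```sql
-- Adds extra detail to the plan, such as file paths and operator properties.
explain extended select count(*) from deliverbi_t1_f;

-- Lists the tables and partitions the query would read, without running it.
explain dependency select count(*) from deliverbi_t1_f;
```

The exact output differs between Hive versions and execution engines, so treat it as a guide rather than a fixed format.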
You can adjust the number of reducers using set mapreduce.job.reduces=5;
We will talk about a sizing approach for working out the optimum number of mappers and reducers in future posts.
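As an alternative to fixing the reducer count, you can let Hive estimate it from the input size. The values below are purely illustrative, not recommendations for your cluster.

```sql
-- -1 tells Hive to work out the number of reducers itself.
set mapreduce.job.reduces=-1;

-- Target amount of input data per reducer, in bytes (illustrative: 256 MB).
set hive.exec.reducers.bytes.per.reducer=268435456;

-- Upper limit on the number of reducers Hive may choose (illustrative value).
set hive.exec.reducers.max=50;
```

With these in place, Hive derives the reducer count from the input size divided by the bytes-per-reducer target, capped at the maximum.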
To enable dynamic inserts into a partitioned table, use set hive.exec.dynamic.partition.mode=nonstrict;
To increase the number of dynamic partitions beyond the defaults (100 per node and 1000 in total), use the following set statements:
set hive.exec.max.dynamic.partitions.pernode=5000;
set hive.exec.max.dynamic.partitions=5000;
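To show where the dynamic partition settings come into play, here is a minimal sketch of a dynamic partition insert. The columns order_id and amount are hypothetical, and the sketch assumes deliverbi_t1_f is an unpartitioned source table that also carries a cost_centre column.

```sql
-- Allow every partition column to be resolved dynamically.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- The partition column (cost_centre) must come last in the select list.
-- order_id and amount are hypothetical column names.
insert into table deliverbi_part_t1_f partition (cost_centre)
select order_id, amount, cost_centre
from deliverbi_t1_f;
```

Hive creates one partition per distinct cost_centre value, which is why the partition limits above matter for tables with many distinct values.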
You can enable the Cost Based Optimiser using set hive.cbo.enable=true;
Below are some of the settings that influence performance. You will need to experiment a little to work out the combination that best fits your specific setup:
set hive.compute.query.using.stats=true;
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;
set hive.vectorized.execution.enabled=true;
set hive.vectorized.execution.reduce.enabled=true;
set hive.vectorized.execution.reduce.groupby.enabled=true;
set hive.exec.parallel=true;
set hive.exec.parallel.thread.number=16;
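As one example of how these settings interact with statistics, the sketch below shows hive.compute.query.using.stats in use. It assumes the statistics for deliverbi_t1_f are up to date. If they are stale, the answer may be too.

```sql
-- Let Hive answer simple aggregates from metastore statistics where possible.
set hive.compute.query.using.stats=true;

-- Refresh the table-level statistics first.
analyze table deliverbi_t1_f compute statistics;

-- With current statistics, this can return without launching a full job.
select count(*) from deliverbi_t1_f;

-- Running set with just the property name prints its current value.
set hive.compute.query.using.stats;
```

Checking a property's current value before and after a change is a simple way to confirm that a setting took effect in your session.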
Having now covered the basics of Hive, in future posts we will touch upon a bit of scripting using Python to execute Hive commands.
Bye for now – Krishna and Shahed