Translate

Sunday, 29 October 2017

Shahed

Hive Big Data Commands Reference




DELIVERBI Big Data Hive Command Reference

 
 

Today we are sharing some of the frequently used Hive commands and settings that will come handy

To see the status of the Hive Server2 and also view the logs etc. you can use the following URL

http://<IPAddresses>:10002/hiveserver2.jsp



 
 
 
 
 
 
 
 
 
 
 
 
 
 
As you can see here, this URL allows you to view the configuration among other things.

To List Databases use show databases;

To create a new database in the default location use create database <db_name>;

To create a new database in specified location – create database db_final location '/storate/<db_name>'

To drop a database use drop database <db_name>;

To start using a database –  use <db_name>;

show tables; for listing the tables available in the current database

For gathering table statistics analyze table deliverbi_part_t1_f partition (cost_centre) COMPUTE STATISTICS;

To locate the storage directory – set hive.metastore.warehouse.dir;

To locate the storage directory along with other information for a specific table – describe formatted deliverbi_part_t1_f;

To show the column headers – set hive.cli.print.header=true;

To stop showing column headers, set the above property with value false

You can see the explain plan for a specific query using the following

explain select count(*) from deliverbi_t1_f;

You can adjust the number of reducers using SET mapreduce.job.reduces=5;

We will talk about sizing approach to work out optimum number of mappers & reducers in future posts.

To enable dynamic inserts into a partitioned table set hive.exec.dynamic.partition.mode=nonstrict;

For increasing the number of partitions from the default value of 101 use the following set statements

set hive.exec.max.dynamic.partitions.pernode=5000;

set hive.exec.max.dynamic.partitions=5000;

You can enable the Cost Based Optimiser using set hive.cbo.enable=true;

Below are some of the performance influencing settings. You need to play around a bit to work out the best combination of these settings that fits your specific setup

set hive.compute.query.using.stats=true; 

set hive.stats.fetch.column.stats=true;

set hive.stats.fetch.partition.stats=true;

set hive.vectorized.execution.enabled=true;

set hive.vectorized.execution.reduce.enabled = true;

set hive.vectorized.execution.reduce.groupby.enabled = true;

set hive.exec.parallel=true;

set hive.exec.parallel.thread.number=16;

 

Having covered the basics of Hive now, in future posts we will touch upon a bit of scripting using Python to execute Hive commands.

Bye for now – Krishna and Shahed

Note: only a member of this blog may post a comment.



About Authors

Shahed Munir

Krishna Udathu

Shahed and Krishna are Oracle / Big Data Experts