Thursday, 29 August 2019


HADOOP Balancing the cluster

I feel that balancing a Hadoop cluster is very important, and that each DataNode's disk usage should deviate from the cluster average by no more than 1%. In my experience this helps keep the cluster healthy. Schedule the balancer as a job that runs perhaps once a week, on a quiet day. I say quiet because the settings I use push the balancer to run very fast, roughly 100x faster than simply running the balancer with no extra options.

The following commands speed up the balancer by roughly 100x.

Run them as the hdfs user, for example with sudo -u hdfs.

First, set the balancer bandwidth (the value is in bytes per second):

hdfs dfsadmin -setBalancerBandwidth 100000000
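
As a quick sanity check on that number, the small sketch below converts it into more readable units. The value passed to -setBalancerBandwidth is in bytes per second and applies per DataNode; the variable names are only illustrative.

```shell
# Value taken from the setBalancerBandwidth command above.
bandwidth_bytes=100000000    # bytes per second, per DataNode

# Decimal megabytes (1 MB = 1,000,000 bytes).
bandwidth_mb=$(( bandwidth_bytes / 1000000 ))
# Binary mebibytes (1 MiB = 1,048,576 bytes), rounded down.
bandwidth_mib=$(( bandwidth_bytes / 1048576 ))

echo "Balancer bandwidth: ${bandwidth_mb} MB/s (about ${bandwidth_mib} MiB/s) per DataNode"
```

So each DataNode may spend up to about 100 MB/s on balancing traffic, which is why a quiet day is a sensible time to run it.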

Then run the following command. You can run it with nohup or as a background job if you want. In my case it took care of roughly 350 GB in 5 minutes.

hdfs balancer -Ddfs.balancer.movedWinWidth=5400000 -Ddfs.balancer.moverThreads=1000 -Ddfs.balancer.dispatcherThreads=200 -Ddfs.datanode.balance.max.concurrent.moves=5 -Ddfs.balance.bandwidthPerSec=100000000 -Ddfs.balancer.max-size-to-move=10737418240 -threshold 1
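
For a weekly scheduled run, the two commands above could be wrapped in a small script along these lines. This is only a sketch: the LOG_DIR variable, the log file name, and the dry-run fallback (printing the command when no hdfs client is found) are assumptions added for illustration, not part of the original commands.

```shell
#!/bin/sh
# Sketch of a wrapper for a scheduled balancer run.
# LOG_DIR and the dry-run fallback are assumptions, not part of the original post.

BANDWIDTH=100000000     # bytes per second per DataNode
THRESHOLD=1             # allowed deviation from the cluster average, in percent
LOG_FILE="${LOG_DIR:-/tmp}/balancer-$(date +%Y-%m-%d).log"

# Same options as the command above, kept in one variable for readability.
BALANCER_OPTS="-Ddfs.balancer.movedWinWidth=5400000 \
-Ddfs.balancer.moverThreads=1000 \
-Ddfs.balancer.dispatcherThreads=200 \
-Ddfs.datanode.balance.max.concurrent.moves=5 \
-Ddfs.balance.bandwidthPerSec=${BANDWIDTH} \
-Ddfs.balancer.max-size-to-move=10737418240"

BALANCER_CMD="hdfs balancer ${BALANCER_OPTS} -threshold ${THRESHOLD}"

if command -v hdfs >/dev/null 2>&1; then
  # Raise the DataNode bandwidth limit first, then start the balancer detached.
  hdfs dfsadmin -setBalancerBandwidth "${BANDWIDTH}"
  # BALANCER_CMD is left unquoted on purpose so it splits into separate arguments.
  nohup ${BALANCER_CMD} > "${LOG_FILE}" 2>&1 &
  echo "Balancer started, logging to ${LOG_FILE}"
else
  # No Hadoop client on this machine: print the command instead of running it.
  echo "${BALANCER_CMD}"
fi
```

Saved under a hypothetical path such as /usr/local/bin/run-balancer.sh, it could be scheduled from the hdfs user's crontab with an entry like 0 3 * * 0 /usr/local/bin/run-balancer.sh, which runs it every Sunday at 03:00. Pick whichever day is actually quiet on your cluster.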

About the Authors

Shahed Munir

Krishna Udathu

Shahed and Krishna are Oracle / Big Data Experts