Hadoop: Balancing the Cluster
I feel that balancing a Hadoop cluster is very important, and that no node's disk usage should deviate from the cluster average by more than 1%. In my experience this helps keep the cluster healthy. Schedule a job to run the balancer, maybe once a week, on a quiet day. I say quiet because the settings I use push the balancer to run very fast, roughly 100x faster than just running it with its defaults, so it takes a lot of network and disk bandwidth while it runs.
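The following commands speed the balancer up by roughly 100x.
Run them as the hdfs user (for example via sudo), or as whichever user is the HDFS superuser on your cluster.
First, raise the balancer bandwidth: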
hdfs dfsadmin -setBalancerBandwidth 100000000
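To get a feel for what that number means: the value is in bytes per second and applies to each DataNode. A quick sketch of the unit conversion is below. The variable names are just for illustration, and the comparison with a 1 Gbit/s link is my own assumption about typical hardware, not something the command checks.

```shell
# Value passed to -setBalancerBandwidth: bytes per second, per DataNode.
bandwidth_bytes=100000000

# Decimal megabytes per second.
mb_per_sec=$(( bandwidth_bytes / 1000000 ))

# Megabits per second, for comparison with network link speeds.
mbit_per_sec=$(( bandwidth_bytes * 8 / 1000000 ))

echo "Balancer bandwidth: ${mb_per_sec} MB/s (${mbit_per_sec} Mbit/s) per DataNode"
```

At 800 Mbit/s per node the balancer could use most of a 1 Gbit/s link, which is why a quiet day is a good idea. Note also that a value set with -setBalancerBandwidth does not survive a DataNode restart, so it is worth setting it again before each run.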
Then run the following command. You can run it under nohup or as a background job if you want. On my cluster it moves roughly 350 GB in 5 minutes.
hdfs balancer -Ddfs.balancer.movedWinWidth=5400000 -Ddfs.balancer.moverThreads=1000 -Ddfs.balancer.dispatcherThreads=200 -Ddfs.datanode.balance.max.concurrent.moves=5 -Ddfs.balance.bandwidthPerSec=100000000 -Ddfs.balancer.max-size-to-move=10737418240 -threshold 1
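If you want to schedule this weekly, the two commands can be wrapped in a small script. This is only a sketch under some assumptions: the log location, the BALANCER_LOG variable and the "run" argument are names I made up for illustration, and the script is meant to be run as the hdfs user.

```shell
#!/bin/sh
# Sketch of a balancer wrapper. The log path and the "run" argument are
# illustrative assumptions, not part of Hadoop.

BANDWIDTH=100000000                                  # bytes per second, per DataNode
LOG_FILE="${BALANCER_LOG:-/tmp/hdfs-balancer.log}"   # hypothetical log location

# Same options as the one-line command, kept in one place.
BALANCER_OPTS="-Ddfs.balancer.movedWinWidth=5400000 \
-Ddfs.balancer.moverThreads=1000 \
-Ddfs.balancer.dispatcherThreads=200 \
-Ddfs.datanode.balance.max.concurrent.moves=5 \
-Ddfs.balance.bandwidthPerSec=$BANDWIDTH \
-Ddfs.balancer.max-size-to-move=10737418240 \
-threshold 1"

run_balancer() {
  # Raise the per-DataNode bandwidth first; stop if that fails.
  hdfs dfsadmin -setBalancerBandwidth "$BANDWIDTH" || return 1
  # $BALANCER_OPTS is left unquoted on purpose so it splits into arguments.
  nohup hdfs balancer $BALANCER_OPTS >> "$LOG_FILE" 2>&1 &
  echo "Balancer started in background (pid $!); log: $LOG_FILE"
}

# Only start the balancer when explicitly asked to.
if [ "${1:-}" = "run" ]; then
  run_balancer
fi
```

Saved under a name of your choice and made executable, it could be started from the hdfs user's crontab with an entry such as `0 3 * * 0 /path/to/balance-hdfs.sh run` (Sunday at 03:00; the path is hypothetical). Pick whatever time is actually quiet on your cluster.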