
Friday, 13 March 2020

Shahed

HDFS NameNode, FsImage, Edit Logs Backup and Restore


How to perform HDFS metadata backup:
Backing up HDFS metadata primarily involves creating an up-to-date fsimage, then fetching it and copying it to another disaster recovery (DR) location. This can be done in these basic steps:
Note: These steps involve putting HDFS into safe mode (read-only mode), so Hadoop admins need to plan for that.
1. Become the HDFS superuser:
    # su - hdfs
2. (Optional) If Kerberos authentication is enabled, run kinit as well:
    # kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs@EXAMPLE.COM
3. Put HDFS in safe mode, so that no write operations are allowed:
    # hdfs dfsadmin -safemode enter
4. Create a new fsimage by merging any outstanding edit logs with the latest fsimage, saving the full state to a new fsimage file, and rolling the edits:
    # hdfs dfsadmin -saveNamespace
5. Copy the latest fsimage from the NameNode to a directory on the local file system. This file can be stored for backup purposes:
    # hdfs dfsadmin -fetchImage <local_dir>
6. Take the NameNode out of safe mode to allow writes and normal operations again:
    # hdfs dfsadmin -safemode leave
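The safe mode, save, fetch, and leave steps above can be wrapped in one small shell function. This is only a minimal sketch: the function name backup_fsimage and the destination directory are made up for illustration, and it assumes you are already the HDFS superuser (and have run kinit if Kerberos is enabled). The one design point worth copying is that it always tries to leave safe mode, even when a step in the middle fails.

```shell
#!/usr/bin/env bash
# Sketch only: backup_fsimage is a hypothetical helper, not part of Hadoop.
# Assumes the caller is already the HDFS superuser (kinit done if needed).

backup_fsimage() {
    local dest="$1"   # local directory that will receive the fsimage
    if [ -z "$dest" ]; then
        echo "usage: backup_fsimage <local_dir>" >&2
        return 2
    fi
    mkdir -p "$dest" || return 1

    # Enter safe mode; give up if the NameNode refuses.
    hdfs dfsadmin -safemode enter || return 1

    local rc=0
    # Merge the edit logs into a new fsimage, then download that image.
    if hdfs dfsadmin -saveNamespace; then
        hdfs dfsadmin -fetchImage "$dest" || rc=1
    else
        rc=1
    fi

    # Always try to leave safe mode, even if a step above failed.
    hdfs dfsadmin -safemode leave || rc=1
    return "$rc"
}
```

A call could look like `backup_fsimage /backup/hdfs/fsimage-$(date +%F)`, where the path is just an example. A non-zero return value means at least one step failed and the backup should not be trusted.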
The steps above cover only a very basic HDFS metadata backup.
Beyond this, you can also back up and maintain more detailed HDFS artifacts, such as the fsck output, a directory listing, the dfsadmin report, and all of the fsimage, edit log, and checkpoint files.
Back up the following critical data.
  1. On the node that hosts the NameNode, open the Hadoop Command Line shortcut (or open a command window in the Hadoop directory). As the hadoop user, go to the HDFS home directory:
runas /user:hadoop "cmd /K cd %HDFS_DATA_DIR%"
  2. Run the fsck command to check the file system for errors. As written, this command only reports problems; it does not repair them:
hdfs fsck / -files -blocks -locations > dfs-old-fsck-1.log
The console output is written to the dfs-old-fsck-1.log file.
  3. Capture the complete namespace directory tree of the file system:
hdfs dfs -ls -R / > dfs-old-lsr-1.log
  4. Create a list of DataNodes in the cluster:
hdfs dfsadmin -report > dfs-old-report-1.log
  5. Capture output from the fsck command:
hdfs fsck / -blocks -locations -files > fsck-old-report-1.log
Verify that there are no missing or corrupted files or replicas in the fsck command output.
  6. Save the HDFS namespace:
    1. Place the NameNode in safe mode, to keep HDFS from accepting any new writes:
hdfs dfsadmin -safemode enter
    2. Save the namespace:
hdfs dfsadmin -saveNamespace
Warning
From this point on, HDFS should not accept any new writes. Stay in safe mode!
    3. Finalize the namespace. Finalizing makes the most recent upgrade permanent, so run it only if you no longer need to roll back:
hdfs namenode -finalize
    4. On the machine that hosts the NameNode, copy the following checkpoint directories into a backup directory:
%HDFS_DATA_DIR%\hdfs\nn\edits\current
%HDFS_DATA_DIR%\hdfs\nn\edits\image
%HDFS_DATA_DIR%\hdfs\nn\edits\previous.checkpoint
  7. Take the NameNode out of safe mode to allow writes and normal operations again:
# hdfs dfsadmin -safemode leave
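The fsck verification in the list above is easy to skip when done by eye, so here is a small sketch that checks a saved fsck log automatically. The function name fsck_log_is_healthy is hypothetical. It assumes the log ends with the usual fsck status line ("The filesystem under path '/' is HEALTHY" or "... is CORRUPT"); if your Hadoop version words this differently, adjust the patterns.

```shell
#!/usr/bin/env bash
# Sketch only: fsck_log_is_healthy is a hypothetical helper.
# Assumes fsck ended its report with a status line such as
#   The filesystem under path '/' is HEALTHY

fsck_log_is_healthy() {
    local log="$1"
    if [ ! -r "$log" ]; then
        echo "cannot read fsck log: $log" >&2
        return 2
    fi
    # Any CORRUPT status line means the backup should not proceed.
    if grep -q "is CORRUPT" "$log"; then
        return 1
    fi
    # Succeed only when a HEALTHY status line is present.
    grep -q "is HEALTHY" "$log"
}
```

For example, `fsck_log_is_healthy dfs-old-fsck-1.log || echo "fix HDFS before backing up"` stops you from saving the namespace of a file system that fsck already reported as corrupt.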
Restoring Name Node Metadata
This section describes how to restore Name Node metadata. If both the Name Node and the secondary Name Node were to suddenly go offline, you can restore the Name Node by doing the following:
  1. Add a new host to your Hadoop cluster.
  2. Add the Name Node role to the host. Make sure it has the same host name as the original Name Node.
  3. Create a directory path for the Name Node name.dir (for example, /dfs/nn/current), ensuring that the permissions are set correctly.
  4. Copy the VERSION and latest fsimage file to the /dfs/nn/current directory.
  5. Run the following command to create the md5 file for the fsimage.
$ md5sum fsimage > fsimage.md5
  6. Start the Name Node process.
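The md5 step above can be scripted so that the checksum is always generated for the newest image in the directory. This is a rough sketch with hypothetical function names (latest_fsimage, write_fsimage_md5). It assumes the image files are named fsimage_ followed by a zero-padded transaction ID, which differs from the plain "fsimage" name used in the command above, so check the file names in your own backup first.

```shell
#!/usr/bin/env bash
# Sketch only: both helpers are hypothetical, not part of Hadoop.
# Assumes image files are named fsimage_<zero-padded transaction id>.

# Print the file name of the newest fsimage in a directory.
latest_fsimage() {
    local dir="$1" f latest=""
    for f in "$dir"/fsimage_*; do
        case "$f" in
            *.md5) continue ;;      # skip existing checksum files
        esac
        [ -f "$f" ] || continue     # skip the unexpanded pattern
        latest="${f##*/}"           # globs are sorted, so the last match is newest
    done
    if [ -n "$latest" ]; then
        printf '%s\n' "$latest"
    fi
}

# Write <fsimage>.md5 next to the newest fsimage and verify it.
write_fsimage_md5() {
    local dir="$1" name
    name="$(latest_fsimage "$dir")"
    if [ -z "$name" ]; then
        echo "no fsimage found in $dir" >&2
        return 1
    fi
    # Work inside the directory so the .md5 file records a bare file name.
    ( cd "$dir" && md5sum "$name" > "$name.md5" && md5sum -c "$name.md5" > /dev/null )
}
```

Running `write_fsimage_md5 /dfs/nn/current` before starting the Name Node process would then cover step 5 in one command.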


Here is another way --- The Shahed Way


Put HDFS in safe mode, so that no write operations are allowed
# hdfs dfsadmin -safemode enter

Create a new fsimage by merging any outstanding edit logs with the latest fsimage, saving the full state to a new fsimage file, and rolling the edits
# hdfs dfsadmin -saveNamespace

Copy the latest fsimage from the NameNode to a directory on the local file system. This file can be stored for backup purposes
# hdfs dfsadmin -fetchImage <local_dir>

Take the NameNode out of safe mode to allow writes and normal operations again
# hdfs dfsadmin -safemode leave

NOTES:
  • Safemode will impact any HDFS clients that are trying to write to HDFS.
  • The active NameNode is the source of truth for any HDFS operation.
  • A good practice is to perform a backup once per month, but backing up more often never hurts.
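Because safe mode blocks every client that tries to write, it is worth confirming that the NameNode really did leave safe mode once a backup is finished. Here is a minimal sketch of such a check. The function name safemode_is_off is hypothetical, and it assumes that `hdfs dfsadmin -safemode get` prints lines containing "Safe mode is ON" or "Safe mode is OFF".

```shell
#!/usr/bin/env bash
# Sketch only: safemode_is_off is a hypothetical helper.
# Assumes "hdfs dfsadmin -safemode get" prints "Safe mode is ON" or
# "Safe mode is OFF" (one line per NameNode).

safemode_is_off() {
    local status
    status="$(hdfs dfsadmin -safemode get)" || return 2
    case "$status" in
        *"Safe mode is ON"*)  return 1 ;;   # at least one NameNode is still read-only
        *"Safe mode is OFF"*) return 0 ;;
        *)                    return 2 ;;   # unexpected output, treat as unknown
    esac
}
```

A backup job could finish with `safemode_is_off || echo "WARNING: HDFS is still in safe mode"` so that a stuck safe mode gets noticed right away instead of by the first failing client.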

About Authors

Shahed Munir

Krishna Udathu

Shahed and Krishna are Oracle / Big Data Experts