Prerequisites for HDFS High Availability (HA) on Debian
Before configuring HDFS HA, ensure the following prerequisites are met:
- Debian Nodes: At least 5 nodes (2 NameNodes, 3 JournalNodes, and multiple DataNodes) with identical Debian versions (e.g., Debian 11/12).
- Java Environment: Install OpenJDK 11 or 17 on all nodes (`sudo apt install openjdk-11-jdk`).
- Hadoop Installation: Download and extract the same Hadoop version (e.g., 3.3.6) on all nodes. Configure basic environment variables in `~/.bashrc` (e.g., `export HADOOP_HOME=/usr/local/hadoop`, `export PATH=$PATH:$HADOOP_HOME/bin`).
- Hostname & Hosts File: Set unique hostnames (e.g., `namenode1`, `namenode2`, `journalnode1`) and update `/etc/hosts` with IP-hostname mappings for all nodes (a sample mapping is sketched after this list).
- SSH Configuration: Enable passwordless SSH between all nodes (generate keys with `ssh-keygen -t rsa` and copy them to the other nodes with `ssh-copy-id`).
- ZooKeeper Cluster: Deploy a 3-node ZooKeeper ensemble (critical for HA coordination; follow the standard ZooKeeper setup steps on separate nodes).
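As a minimal sketch of the hosts-file and SSH items (the private IP addresses below are placeholders; substitute your own and add the ZooKeeper and DataNode hosts as well):

```bash
# Append IP-to-hostname mappings to /etc/hosts on every node
# (example private addresses; replace with your own)
sudo tee -a /etc/hosts >/dev/null <<'EOF'
192.168.1.11  namenode1
192.168.1.12  namenode2
192.168.1.21  journalnode1
192.168.1.22  journalnode2
192.168.1.23  journalnode3
EOF

# Generate one SSH key pair, then copy the public key to every node
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
for host in namenode1 namenode2 journalnode1 journalnode2 journalnode3; do
  ssh-copy-id "$USER@$host"
done
```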
Step 1: Configure JournalNode Nodes
JournalNodes store edit logs (transaction records for HDFS metadata) and ensure consistency between Active and Standby NameNodes.
- Create a dedicated directory for JournalNode data (on every JournalNode):

  ```bash
  sudo mkdir -p /usr/local/hadoop/journalnode/data
  sudo chown -R $USER:$USER /usr/local/hadoop/journalnode
  ```

- Add the JournalNode configuration to `$HADOOP_HOME/etc/hadoop/hdfs-site.xml` on all nodes:

  ```xml
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/usr/local/hadoop/journalnode/data</value>
  </property>
  ```

- Start the JournalNode service on every JournalNode (they must be running before the Active NameNode is formatted in Step 2, which also initializes their storage; no separate JournalNode format command is needed):

  ```bash
  hadoop-daemon.sh start journalnode
  ```

  Verify with `jps` (each JournalNode host should show a `JournalNode` process). A sketch for starting the daemon on all three hosts at once follows this list.
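Since the daemon has to be started on every JournalNode host, a small SSH loop from one node can save a few steps; a minimal sketch, assuming the hostnames from the prerequisites and the same `$HADOOP_HOME` path on every node:

```bash
# Start the JournalNode daemon on each JournalNode host and confirm the process is up
for host in journalnode1 journalnode2 journalnode3; do
  ssh "$host" "$HADOOP_HOME/sbin/hadoop-daemon.sh start journalnode"
  ssh "$host" "jps | grep JournalNode"
done
```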
Step 2: Configure NameNode High Availability
This step enables two NameNodes (Active/Standby) to share metadata via JournalNodes.
- Edit `$HADOOP_HOME/etc/hadoop/core-site.xml` to define the HDFS namespace and the ZooKeeper address:

  ```xml
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>zk1:2181,zk2:2181,zk3:2181</value>
  </property>
  ```

- Edit `$HADOOP_HOME/etc/hadoop/hdfs-site.xml` to configure NameNode roles, RPC/HTTP addresses, shared edits, and failover (note that Hadoop does not expand shell variables in XML, so replace `$USER` with the actual user name):

  ```xml
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>namenode1:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>namenode2:8020</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn1</name>
    <value>namenode1:9870</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn2</name>
    <value>namenode2:9870</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://journalnode1:8485;journalnode2:8485;journalnode3:8485/mycluster</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/$USER/.ssh/id_rsa</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  ```

- Format the Active NameNode (run only once, on `nn1`): `hdfs namenode -format`
- Start the NameNode on `nn1`: `hadoop-daemon.sh start namenode`
- Bootstrap the Standby NameNode (run on `nn2`; it copies the metadata from `nn1`): `hdfs namenode -bootstrapStandby`
- Start the NameNode on `nn2`: `hadoop-daemon.sh start namenode`

Verify both NameNodes are running with `hdfs haadmin -getServiceState nn1` (should return "active") and `hdfs haadmin -getServiceState nn2` (should return "standby"); with automatic failover enabled, the active/standby roles are assigned once the failover controllers are up (see the note after this list).
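Because `dfs.ha.automatic-failover.enabled` is set to `true` above, the ZKFC failover controllers use a znode in ZooKeeper for leader election, and that znode normally has to be initialized once before they first start (`start-dfs.sh` in Step 3 should then launch the ZKFC daemons alongside the NameNodes). A minimal sketch, run once on `nn1` with the ZooKeeper ensemble reachable:

```bash
# Initialize the HA state znode in ZooKeeper (one-time step, run on one NameNode host)
hdfs zkfc -formatZK
```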
Step 3: Start HDFS Services
Start all HDFS components in the correct order:
```bash
start-dfs.sh   # Starts JournalNodes, NameNodes, ZKFC failover controllers, and DataNodes
```
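Note that `start-dfs.sh` determines which hosts run DataNodes from the `workers` file in Hadoop 3.x (it was called `slaves` in 2.x). A minimal sketch of `$HADOOP_HOME/etc/hadoop/workers`, listing one assumed DataNode hostname per line:

```
datanode1
datanode2
datanode3
```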
Check cluster status with:
```bash
hdfs dfsadmin -report               # Lists DataNodes and their health
hdfs haadmin -getAllServiceStates   # Shows NameNode states (active/standby)
```
Access NameNode Web UIs (e.g., http://namenode1:9870, http://namenode2:9870) to confirm HA status.
Step 4: Test Automatic Failover
Simulate a failure to verify automatic failover works:
- Kill the Active NameNode process: on `nn1`, find the NameNode PID (`jps | grep NameNode`) and kill it with `kill -9 <NameNode_PID>`.
- Verify the Standby takes over: on `nn2`, check its state with `hdfs haadmin -getServiceState nn2` (should now return "active").
- Restore the original Active NameNode: restart the NameNode on `nn1` with `hadoop-daemon.sh start namenode` and verify it becomes standby with `hdfs haadmin -getServiceState nn1` (should return "standby").
- Check data availability: create a test file in HDFS before the failover and verify it can still be read afterwards: `hdfs dfs -put /local/file.txt /test/`, then `hdfs dfs -get /test/file.txt /local/` (should succeed after failover). A scripted version of this drill is sketched after this list.
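If you want to repeat the drill, it can be wrapped in a small script; a rough sketch, assumed to run on `nn1` while it is the active NameNode, with example paths and a generous wait for the ZKFC election:

```bash
#!/usr/bin/env bash
# Rough failover drill: write a file, kill the local (active) NameNode,
# then confirm the standby took over and the file is still readable.
set -euo pipefail

hdfs dfs -mkdir -p /test
hdfs dfs -put -f /local/file.txt /test/

# Kill the NameNode process running on this host
kill -9 "$(jps | awk '$2 == "NameNode" {print $1}')"

sleep 30   # give the failover controllers time to elect nn2

hdfs haadmin -getServiceState nn2          # expected: active
hdfs dfs -cat /test/file.txt > /dev/null   # expected: read succeeds via nn2
echo "Failover drill finished."
```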
Step 5: Monitor and Maintain
Set up monitoring to detect issues early:
- Metrics: Use Hadoop's built-in metrics (via JMX) or tools like Prometheus + Grafana to track NameNode memory, DataNode disk usage, and replication status (a sample JMX query is sketched at the end of this section).
- Logs: Regularly check the NameNode logs (`$HADOOP_HOME/logs/hadoop-*-namenode-*.log`) for errors.
- Backups: Back up critical data (e.g., NameNode metadata, ZooKeeper data) to an offsite location.
- Updates: Keep Hadoop and ZooKeeper versions up-to-date to patch security vulnerabilities.
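As a minimal sketch of pulling metrics over JMX without extra tooling (the bean and attribute names below are common for the NameNode's `FSNamesystem`, but verify them against your Hadoop version; `jq` is only used for readability), the NameNode web port configured above also serves metrics as JSON under `/jmx`:

```bash
# Query the NameNode's FSNamesystem metrics over the HTTP /jmx endpoint
curl -s 'http://namenode1:9870/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem' \
  | jq '.beans[0] | {CapacityUsed, CapacityRemaining, MissingBlocks, UnderReplicatedBlocks}'
```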