Prerequisites for HDFS High Availability (HA) on Debian
Before configuring HDFS HA, ensure the following prerequisites are met:
- Debian Nodes: At least 5 nodes (2 NameNodes, 3 JournalNodes, and multiple DataNodes) with identical Debian versions (e.g., Debian 11/12).
- Java Environment: Install OpenJDK 11 or 17 on all nodes (`sudo apt install openjdk-11-jdk`).
- Hadoop Installation: Download and extract the same Hadoop version (e.g., 3.3.6) on all nodes. Configure basic environment variables in `~/.bashrc` (e.g., `export HADOOP_HOME=/usr/local/hadoop`, `export PATH=$PATH:$HADOOP_HOME/bin`).
- Hostname & Hosts File: Set unique hostnames (e.g., `namenode1`, `namenode2`, `journalnode1`) and update `/etc/hosts` with IP-hostname mappings for all nodes (a sample mapping is sketched after this list).
- SSH Configuration: Enable passwordless SSH between all nodes (generate keys with `ssh-keygen -t rsa` and copy them to the other nodes with `ssh-copy-id`).
- ZooKeeper Cluster: Deploy a 3-node ZooKeeper ensemble (critical for HA coordination; follow the standard ZooKeeper setup steps on separate nodes).
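As a minimal sketch of the hosts-file and SSH items (the private IP addresses below are placeholders; substitute your own and add the ZooKeeper and DataNode hosts as well):

```bash
# Append IP-to-hostname mappings to /etc/hosts on every node
# (example private addresses; replace with your own)
sudo tee -a /etc/hosts >/dev/null <<'EOF'
192.168.1.11  namenode1
192.168.1.12  namenode2
192.168.1.21  journalnode1
192.168.1.22  journalnode2
192.168.1.23  journalnode3
EOF

# Generate one SSH key pair, then copy the public key to every node
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
for host in namenode1 namenode2 journalnode1 journalnode2 journalnode3; do
  ssh-copy-id "$USER@$host"
done
```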
Step 1: Configure JournalNode Nodes
JournalNodes store edit logs (transaction records for HDFS metadata) and ensure consistency between Active and Standby NameNodes.
- Create a dedicated directory for JournalNode data (on every JournalNode):

  ```bash
  sudo mkdir -p /usr/local/hadoop/journalnode/data
  sudo chown -R $USER:$USER /usr/local/hadoop/journalnode
  ```

- Add the JournalNode configuration to `$HADOOP_HOME/etc/hadoop/hdfs-site.xml` on all nodes:

  ```xml
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/usr/local/hadoop/journalnode/data</value>
  </property>
  ```

- Start the JournalNode service on every JournalNode (they must be running before the Active NameNode is formatted in Step 2, which also initializes their storage; no separate JournalNode format command is needed):

  ```bash
  hadoop-daemon.sh start journalnode
  ```

  Verify with `jps` (each JournalNode host should show a `JournalNode` process). A sketch for starting the daemon on all three hosts at once follows this list.
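Since the daemon has to be started on every JournalNode host, a small SSH loop from one node can save a few steps; a minimal sketch, assuming the hostnames from the prerequisites and the same `$HADOOP_HOME` path on every node:

```bash
# Start the JournalNode daemon on each JournalNode host and confirm the process is up
for host in journalnode1 journalnode2 journalnode3; do
  ssh "$host" "$HADOOP_HOME/sbin/hadoop-daemon.sh start journalnode"
  ssh "$host" "jps | grep JournalNode"
done
```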
Step 2: Configure NameNode High Availability
This step enables two NameNodes (Active/Standby) to share metadata via JournalNodes.
- Edit `$HADOOP_HOME/etc/hadoop/core-site.xml` to define the HDFS namespace and the ZooKeeper address:

  ```xml
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>zk1:2181,zk2:2181,zk3:2181</value>
  </property>
  ```

- Edit `$HADOOP_HOME/etc/hadoop/hdfs-site.xml` to configure NameNode roles, RPC/HTTP addresses, shared edits, and failover (note that Hadoop does not expand shell variables in XML, so replace `$USER` with the actual user name):

  ```xml
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>namenode1:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>namenode2:8020</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn1</name>
    <value>namenode1:9870</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn2</name>
    <value>namenode2:9870</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://journalnode1:8485;journalnode2:8485;journalnode3:8485/mycluster</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/$USER/.ssh/id_rsa</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  ```

- Format the Active NameNode (run only once, on `nn1`): `hdfs namenode -format`
- Start the NameNode on `nn1`: `hadoop-daemon.sh start namenode`
- Bootstrap the Standby NameNode (run on `nn2`; it copies the metadata from `nn1`): `hdfs namenode -bootstrapStandby`
- Start the NameNode on `nn2`: `hadoop-daemon.sh start namenode`

Verify both NameNodes are running with `hdfs haadmin -getServiceState nn1` (should return "active") and `hdfs haadmin -getServiceState nn2` (should return "standby"); with automatic failover enabled, the active/standby roles are assigned once the failover controllers are up (see the note after this list).
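Because `dfs.ha.automatic-failover.enabled` is set to `true` above, the ZKFC failover controllers use a znode in ZooKeeper for leader election, and that znode normally has to be initialized once before they first start (`start-dfs.sh` in Step 3 should then launch the ZKFC daemons alongside the NameNodes). A minimal sketch, run once on `nn1` with the ZooKeeper ensemble reachable:

```bash
# Initialize the HA state znode in ZooKeeper (one-time step, run on one NameNode host)
hdfs zkfc -formatZK
```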
Step 3: Start HDFS Services
Start all HDFS components in the correct order:
```bash
start-dfs.sh   # Starts JournalNodes, NameNodes, ZKFC failover controllers, and DataNodes
```
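Note that `start-dfs.sh` determines which hosts run DataNodes from the `workers` file in Hadoop 3.x (it was called `slaves` in 2.x). A minimal sketch of `$HADOOP_HOME/etc/hadoop/workers`, listing one assumed DataNode hostname per line:

```
datanode1
datanode2
datanode3
```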
Check cluster status with:
```bash
hdfs dfsadmin -report               # Lists DataNodes and their health
hdfs haadmin -getAllServiceStates   # Shows NameNode states (active/standby)
```
Access NameNode Web UIs (e.g., http://namenode1:9870, http://namenode2:9870) to confirm HA status.
Step 4: Test Automatic Failover
Simulate a failure to verify automatic failover works:
- Kill the Active NameNode process: on `nn1`, find the NameNode PID (`jps | grep NameNode`) and kill it with `kill -9 <NameNode_PID>`.
- Verify the Standby takes over: on `nn2`, check its state with `hdfs haadmin -getServiceState nn2` (should now return "active").
- Restore the original Active NameNode: restart the NameNode on `nn1` with `hadoop-daemon.sh start namenode` and verify it becomes standby with `hdfs haadmin -getServiceState nn1` (should return "standby").
- Check data availability: create a test file in HDFS before the failover and verify it can still be read afterwards: `hdfs dfs -put /local/file.txt /test/`, then `hdfs dfs -get /test/file.txt /local/` (should succeed after failover). A scripted version of this drill is sketched after this list.
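If you want to repeat the drill, it can be wrapped in a small script; a rough sketch, assumed to run on `nn1` while it is the active NameNode, with example paths and a generous wait for the ZKFC election:

```bash
#!/usr/bin/env bash
# Rough failover drill: write a file, kill the local (active) NameNode,
# then confirm the standby took over and the file is still readable.
set -euo pipefail

hdfs dfs -mkdir -p /test
hdfs dfs -put -f /local/file.txt /test/

# Kill the NameNode process running on this host
kill -9 "$(jps | awk '$2 == "NameNode" {print $1}')"

sleep 30   # give the failover controllers time to elect nn2

hdfs haadmin -getServiceState nn2          # expected: active
hdfs dfs -cat /test/file.txt > /dev/null   # expected: read succeeds via nn2
echo "Failover drill finished."
```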
Step 5: Monitor and Maintain
Set up monitoring to detect issues early:
- Metrics: Use Hadoop's built-in metrics (via JMX) or tools like Prometheus + Grafana to track NameNode memory, DataNode disk usage, and replication status (a sample JMX query is sketched at the end of this section).
- Logs: Regularly check the NameNode logs (`$HADOOP_HOME/logs/hadoop-*-namenode-*.log`) for errors.
- Backups: Back up critical data (e.g., NameNode metadata, ZooKeeper data) to an offsite location.
- Updates: Keep Hadoop and ZooKeeper versions up-to-date to patch security vulnerabilities.
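As a minimal sketch of pulling metrics over JMX without extra tooling (the bean and attribute names below are common for the NameNode's `FSNamesystem`, but verify them against your Hadoop version; `jq` is only used for readability), the NameNode web port configured above also serves metrics as JSON under `/jmx`:

```bash
# Query the NameNode's FSNamesystem metrics over the HTTP /jmx endpoint
curl -s 'http://namenode1:9870/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem' \
  | jq '.beans[0] | {CapacityUsed, CapacityRemaining, MissingBlocks, UnderReplicatedBlocks}'
```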