Prerequisites for Hadoop HA on Debian
Before configuring Hadoop High Availability (HA), ensure the following prerequisites are met:
- Debian Nodes: At least 5 nodes: 2 NameNodes (active/standby), 3 JournalNodes (which may be co-located with other roles), 2 ResourceManagers (active/standby), and multiple DataNodes/NodeManagers.
- Java Environment: Install OpenJDK 8 or 11 (e.g., `sudo apt install openjdk-11-jdk`).
- Hadoop Installation: Download and extract Hadoop (e.g., version 3.3.6) to a consistent directory (e.g., `/opt/hadoop`) on all nodes.
- Hostname Configuration: Assign unique hostnames (e.g., `namenode1`, `journalnode1`) and update `/etc/hosts` with IP-hostname mappings (e.g., `192.168.1.10 namenode1`).
- SSH Key Setup: Generate SSH keys on all nodes (`ssh-keygen -t rsa`) and distribute public keys (`ssh-copy-id user@node-ip`) to enable passwordless login.
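Since every node needs identical `/etc/hosts` mappings, it can help to render them once from a single inventory. A minimal Python sketch; the node names and IPs below are a hypothetical inventory mirroring the examples in this guide:

```python
# Hypothetical node inventory; hostnames/IPs mirror the examples in this guide.
NODES = {
    "namenode1": "192.168.1.10",
    "namenode2": "192.168.1.11",
    "zookeeper3": "192.168.1.12",
    "journalnode1": "192.168.1.13",
    "journalnode2": "192.168.1.14",
    "journalnode3": "192.168.1.15",
}

def hosts_entries(nodes):
    """Render the IP-hostname lines to append to /etc/hosts on every node."""
    return "\n".join(f"{ip} {host}" for host, ip in nodes.items())

if __name__ == "__main__":
    print(hosts_entries(NODES))
```

Appending the same generated block on every node avoids the classic HA failure mode where two hosts disagree about a NameNode's address.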
1. Configure ZooKeeper Cluster (Critical for Coordination)
ZooKeeper ensures consistent failover by managing locks and leader election for NameNode/ResourceManager.
- Install ZooKeeper: On each ZooKeeper node (3 or more recommended), install via `sudo apt install zookeeper zookeeperd`.
- Configure `zoo.cfg`: Edit `/etc/zookeeper/conf/zoo.cfg` on all nodes to include server entries (replace `1`, `2`, `3` with node IDs and IPs):

  ```
  server.1=192.168.1.10:2888:3888
  server.2=192.168.1.11:2888:3888
  server.3=192.168.1.12:2888:3888
  ```

  On each node, also write that node's numeric ID into `/etc/zookeeper/conf/myid` so it knows which `server.N` entry it is.
- Start ZooKeeper: Run `sudo systemctl start zookeeper` on all nodes and verify with `sudo systemctl status zookeeper`.
- Initialize HA State in ZooKeeper: On the active NameNode, run `hdfs zkfc -formatZK` to create the znode used for HA coordination.
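For reference, a minimal `/etc/zookeeper/conf/zoo.cfg` combining the server entries above with the usual base settings might look like the following; the paths and timeouts are typical Debian defaults and should be adjusted to your environment:

```ini
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=192.168.1.10:2888:3888
server.2=192.168.1.11:2888:3888
server.3=192.168.1.12:2888:3888
```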
2. Configure HDFS High Availability (NameNode HA)
HDFS HA runs an active/standby NameNode pair with the Quorum Journal Manager (QJM) providing a shared edit log.
- Modify `core-site.xml`: Add the default file system and the ZooKeeper quorum:

  ```xml
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>192.168.1.10:2181,192.168.1.11:2181,192.168.1.12:2181</value>
  </property>
  ```

- Modify `hdfs-site.xml`: Define the nameservice, NameNode IDs, RPC addresses, shared edits directory, and failover settings:

  ```xml
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>192.168.1.10:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>192.168.1.11:8020</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://192.168.1.13:8485;192.168.1.14:8485;192.168.1.15:8485/mycluster</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  ```

  Automatic failover also requires a fencing method, e.g., `dfs.ha.fencing.methods` set to `sshfence` together with `dfs.ha.fencing.ssh.private-key-files`.
- Start JournalNodes: On each JournalNode host, run `hdfs --daemon start journalnode` (older releases: `hadoop-daemon.sh start journalnode`) so they can store the shared edit log.
- Format and Start the Active NameNode: Format the active NameNode once (`hdfs namenode -format`) and start it.
- Bootstrap the Standby NameNode: On the standby NameNode (e.g., `namenode2`), run `hdfs namenode -bootstrapStandby` to sync metadata from the active NameNode.
- Start HDFS and Verify: Run `start-dfs.sh`, then check states with `hdfs haadmin -getServiceState nn1` (should return "active") and `hdfs haadmin -getServiceState nn2` (should return "standby").
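Hand-editing these `<property>` blocks across several files and nodes is error-prone. A small helper can render them from a plain dict; this is an illustrative sketch, not part of Hadoop, though the keys in the example are the real `hdfs-site.xml` property names used above:

```python
def to_hadoop_xml(props):
    """Render a dict of Hadoop configuration keys/values as a <configuration> block."""
    blocks = []
    for name, value in props.items():
        blocks.append(
            f"  <property>\n    <name>{name}</name>\n    <value>{value}</value>\n  </property>"
        )
    return "<configuration>\n" + "\n".join(blocks) + "\n</configuration>"

# Subset of the HA properties configured above.
hdfs_ha = {
    "dfs.nameservices": "mycluster",
    "dfs.ha.namenodes.mycluster": "nn1,nn2",
    "dfs.ha.automatic-failover.enabled": "true",
}

if __name__ == "__main__":
    print(to_hadoop_xml(hdfs_ha))
```

Generating the files this way keeps the active and standby nodes guaranteed-identical, which is a common source of HA misconfiguration.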
3. Configure YARN High Availability (ResourceManager HA)
YARN HA enables failover for the ResourceManager, which schedules cluster resources for applications.
- Modify `yarn-site.xml`: Enable ResourceManager HA and define the RM IDs:

  ```xml
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>192.168.1.10</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>192.168.1.11</value>
  </property>
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>192.168.1.10:2181,192.168.1.11:2181,192.168.1.12:2181</value>
  </property>
  ```

- Start ResourceManagers: On both ResourceManager nodes, run `yarn --daemon start resourcemanager` (older releases: `yarn-daemon.sh start resourcemanager`). Verify with `yarn rmadmin -getServiceState rm1` (should return "active" or "standby").
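Alongside the properties above, ResourceManager HA normally also needs a cluster ID shared by both RMs and state-store settings so the standby can recover running applications after failover. These are standard `yarn-site.xml` keys; the `yarn-cluster` value is a hypothetical example:

```xml
<property>
  <name>yarn.resourcemanager.cluster-id</name>
  <value>yarn-cluster</value>
</property>
<property>
  <name>yarn.resourcemanager.recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.store.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
```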
4. Configure Data Redundancy and Backup
Ensure data availability via replication and snapshots:
- Set the Replication Factor: In `hdfs-site.xml`, configure `dfs.replication` (default 3) so each data block is stored on multiple DataNodes.
- Enable Snapshots: Allow snapshots on a directory with `hdfs dfsadmin -allowSnapshot /path`, then create point-in-time snapshots with `hdfs dfs -createSnapshot /path <name>`.
- Regular Backups: Enter safe mode (`hdfs dfsadmin -safemode enter`), run `hdfs dfsadmin -saveNamespace` to checkpoint the namespace, leave safe mode, and copy critical data to external storage.
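The replication factor directly determines usable capacity, which matters when sizing DataNodes. A quick sketch of the arithmetic, purely for illustration:

```python
def usable_tb(raw_tb, replication=3):
    """Effective HDFS capacity after accounting for block replication."""
    if replication < 1:
        raise ValueError("replication must be >= 1")
    return raw_tb / replication

# e.g., 300 TB of raw DataNode disk at the default replication factor of 3
print(usable_tb(300))  # 100.0
```

Raising `dfs.replication` above 3 improves fault tolerance at a proportional cost in capacity; lowering it below 3 is generally only appropriate for scratch data.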
5. Set Up Monitoring and Alerting
Proactively monitor cluster health to detect failures early:
- Built-in Tools: Use the NameNode and ResourceManager web UIs (e.g., `http://namenode1:9870` for HDFS, `http://namenode1:8088` for YARN) to track metrics like node status, disk usage, and job progress.
- Third-Party Tools: Integrate Prometheus (metrics collection) with Grafana (visualization), or use Ambari (cluster management), and set alerts on thresholds (e.g., node down, high CPU).
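Both web UIs also expose machine-readable metrics at their `/jmx` endpoint (e.g., `http://namenode1:9870/jmx`), which is what Prometheus exporters typically scrape. A minimal sketch of pulling the NameNode's HA state out of that JSON; the embedded payload is a trimmed, hypothetical sample of the response shape:

```python
import json

# Trimmed, hypothetical sample of a NameNode /jmx response.
SAMPLE = json.dumps({
    "beans": [
        {
            "name": "Hadoop:service=NameNode,name=FSNamesystem",
            "tag.HAState": "active",
            "CapacityRemaining": 123456789,
        }
    ]
})

def ha_state(jmx_json):
    """Return the NameNode HA state ('active'/'standby') from a /jmx payload."""
    for bean in json.loads(jmx_json).get("beans", []):
        if bean.get("name") == "Hadoop:service=NameNode,name=FSNamesystem":
            return bean.get("tag.HAState")
    return None

if __name__ == "__main__":
    print(ha_state(SAMPLE))  # active
```

In a real deployment you would fetch the payload over HTTP and alert when neither NameNode reports `active`.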
6. Validate High Availability
Test failover to ensure automatic recovery:
- Simulate NameNode Failure: Stop the active NameNode (`hdfs --daemon stop namenode` on `namenode1`) and verify the standby becomes active (`hdfs haadmin -getServiceState nn2` should return "active").
- Simulate ResourceManager Failure: Stop the active ResourceManager (`yarn --daemon stop resourcemanager` on `resourcemanager1`) and check that the standby takes over (`yarn rmadmin -getServiceState rm2` should return "active").
- Check Data Availability: Write a test file (`hdfs dfs -put /local/file /test`) and read it back after failover to confirm data integrity.
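The manual checks above can be scripted. Below is a sketch with an injectable command runner so the failover logic can be exercised without a live cluster; the `hdfs haadmin -getServiceState` invocation is the real CLI, while the runner plumbing and node IDs are illustrative:

```python
import subprocess

def namenode_states(run=None):
    """Query both NameNodes' HA states; `run` maps a NameNode ID to its state."""
    if run is None:
        # Default: shell out to the real Hadoop CLI on a live cluster.
        run = lambda nn: subprocess.run(
            ["hdfs", "haadmin", "-getServiceState", nn],
            capture_output=True, text=True,
        ).stdout.strip()
    return {nn: run(nn) for nn in ("nn1", "nn2")}

def failover_ok(before, after):
    """After stopping the active NameNode, the former standby must be active."""
    (prev_standby,) = [nn for nn, s in before.items() if s == "standby"]
    return after.get(prev_standby) == "active"

if __name__ == "__main__":
    # Faked runners standing in for states before/after killing namenode1.
    before = namenode_states(run={"nn1": "active", "nn2": "standby"}.get)
    after = namenode_states(run={"nn1": "standby", "nn2": "active"}.get)
    print(failover_ok(before, after))  # True
```

Running a check like this periodically (with the real CLI) turns the one-off validation above into a repeatable smoke test.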
This concludes the walkthrough of implementing Hadoop high availability on Debian.