Prerequisites
Before configuring HDFS on Debian, ensure the following prerequisites are met:
- A Debian system (physical/virtual) with root or sudo access.
- Java Development Kit (JDK) 8 or 11 installed (Hadoop 3.x requires JDK 8+). Verify with java -version.
- Network connectivity between all cluster nodes (static IPs recommended).
- Synchronized time across nodes (install ntp or chrony for time synchronization).
- SSH configured for passwordless login between nodes (generate SSH keys and copy public keys to all nodes using ssh-keygen and ssh-copy-id; see the sketch after this list).
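A minimal sketch of the time-sync and passwordless-SSH setup; node1 and node2 are placeholder hostnames for your own cluster nodes:
sudo apt update && sudo apt install -y chrony # or: sudo apt install -y ntp
ssh-keygen -t rsa -b 4096 -N "" -f ~/.ssh/id_rsa # generate a key pair with no passphrase
ssh-copy-id youruser@node1 # repeat for every node in the cluster
ssh-copy-id youruser@node2
ssh node1 hostname # should succeed without a password prompt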
Step 1: Install Hadoop
Download the latest stable Hadoop release from the Apache Hadoop website. Extract it to a dedicated directory (e.g., /usr/local):
wget https://downloads.apache.org/hadoop/core/hadoop-3.3.6/hadoop-3.3.6.tar.gz
sudo tar -xzvf hadoop-3.3.6.tar.gz -C /usr/local/
sudo mv /usr/local/hadoop-3.3.6 /usr/local/hadoop # Rename for simplicity
Set ownership of the Hadoop directory to the current user (replace youruser with your username):
sudo chown -R youruser:youruser /usr/local/hadoop
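Optionally, verify the archive against the checksum Apache publishes alongside each release (this assumes the same mirror path as the download above):
wget https://downloads.apache.org/hadoop/core/hadoop-3.3.6/hadoop-3.3.6.tar.gz.sha512
sha512sum -c hadoop-3.3.6.tar.gz.sha512 # expected: hadoop-3.3.6.tar.gz: OK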
Step 2: Configure Environment Variables
Edit the ~/.bashrc file (or /etc/profile for system-wide configuration) to add Hadoop environment variables:
nano ~/.bashrc
Add the following lines (adjust paths as needed):
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64 # Update to your JDK path
Apply changes by running:
source ~/.bashrc
Verify Hadoop commands are accessible:
hadoop version
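Hadoop's daemon scripts do not reliably inherit JAVA_HOME from your login shell, so set it in hadoop-env.sh as well (the path below assumes the same OpenJDK 11 install as above):
echo 'export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64' >> $HADOOP_HOME/etc/hadoop/hadoop-env.sh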
Step 3: Configure Core HDFS Files
Navigate to the Hadoop configuration directory ($HADOOP_HOME/etc/hadoop) and edit the following files:
a. core-site.xml
This file defines core cluster settings, including the default filesystem URI (the NameNode address). Add:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/hadoop/tmp</value>
  </property>
</configuration>
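Note that namenode here is a hostname that must resolve on every node, via DNS or /etc/hosts; the entries below are illustrative placeholders (on a single-node setup, hdfs://localhost:9000 works as well):
192.168.1.10 namenode
192.168.1.11 datanode1
192.168.1.12 datanode2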
b. hdfs-site.xml
This file configures HDFS-specific parameters (e.g., replication factor, data directories). Add:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/data/hadoop/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/data/hadoop/datanode</value>
  </property>
</configuration>
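The directories referenced above must exist and be writable by the user running HDFS; a minimal sketch (replace youruser with your username, as in Step 1):
sudo mkdir -p /data/hadoop/tmp /data/hadoop/namenode /data/hadoop/datanode
sudo chown -R youruser:youruser /data/hadoop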
c. mapred-site.xml
In Hadoop 3.x this file already exists in the configuration directory; in older 2.x releases, create it by copying the template:
cp $HADOOP_HOME/etc/hadoop/mapred-site.xml.template $HADOOP_HOME/etc/hadoop/mapred-site.xml
Edit to use YARN as the MapReduce framework:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
d. yarn-site.xml
Configure YARN (Yet Another Resource Negotiator) for resource management:
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>resourcemanager</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
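As with namenode above, the resourcemanager hostname must resolve on every node. YARN services are not started by the HDFS script in Step 5; when you need them, start them separately from the ResourceManager node:
$HADOOP_HOME/sbin/start-yarn.sh # starts the ResourceManager and NodeManagers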
Step 4: Format NameNode
The NameNode stores metadata about the HDFS filesystem, and formatting initializes this metadata. This is only required before the first startup; reformatting an existing NameNode erases all HDFS metadata.
hdfs namenode -format
This command creates the directories specified in dfs.namenode.name.dir and writes initial metadata.
Step 5: Start HDFS Services
Start the HDFS services (NameNode and DataNode) using the start-dfs.sh script (run from the NameNode):
$HADOOP_HOME/sbin/start-dfs.sh
Check the status of running services:
jps # Should show NameNode, DataNode, and SecondaryNameNode (if configured)
Verify HDFS is running by accessing the Web UI (default: http://namenode:9870).
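To confirm that DataNodes have registered with the NameNode, you can also query the cluster report:
hdfs dfsadmin -report # lists total capacity and each live DataNode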
Step 6: Verify HDFS Configuration
Run basic HDFS commands to validate functionality:
- Create a test directory:
hdfs dfs -mkdir -p /user/youruser/test
- Upload a local file to HDFS:
hdfs dfs -put /path/to/localfile.txt /user/youruser/test
- List files in the test directory:
hdfs dfs -ls /user/youruser/test
- View file content:
hdfs dfs -cat /user/youruser/test/localfile.txt
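Once verified, the test directory can be removed:
hdfs dfs -rm -r /user/youruser/test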
Optional: Configure High Availability (HA)
For production clusters, set up HDFS HA to avoid single points of failure. Key steps include:
- Configure NameNode HA: Modify hdfs-site.xml to define two NameNodes (e.g., nn1, nn2), a shared edits directory (via JournalNodes), and a failover proxy provider (see the sketch after this list).
- Set Up JournalNodes: Start three or more JournalNode processes (on separate nodes) to store the shared edit log.
- Bootstrap the Standby NameNode: Use hdfs namenode -bootstrapStandby to sync metadata from the active NameNode.
- Start the HA Cluster: Use start-dfs.sh to start all NameNodes and DataNodes.
- Verify Failover: Simulate a NameNode failure (stop the active NameNode) and confirm the standby takes over automatically.
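A minimal sketch of the core HA properties, added inside the existing <configuration> block of hdfs-site.xml; the nameservice name mycluster, the hostnames nn1-host, nn2-host, and journal1 through journal3, and the SSH key path are placeholders for illustration:
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>nn1-host:9000</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>nn2-host:9000</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://journal1:8485;journal2:8485;journal3:8485/mycluster</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/youruser/.ssh/id_rsa</value>
  </property>
With HA enabled, fs.defaultFS in core-site.xml points at the nameservice (hdfs://mycluster) instead of a single host; automatic failover additionally requires ZooKeeper and dfs.ha.automatic-failover.enabled set to true.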