Prerequisites
Before installing HDFS on Debian, ensure your system is up-to-date and install essential tools:
sudo apt update && sudo apt upgrade -y
sudo apt install wget ssh vim -y
These commands update package lists, upgrade installed packages, and install wget (for downloading Hadoop), ssh (for remote access), and vim (for configuration editing).
1. Install Java Environment
Hadoop requires Java 8 or higher. Install OpenJDK 11 (recommended for compatibility):
sudo apt install openjdk-11-jdk -y
Verify the installation:
java -version
You should see output indicating OpenJDK 11 is installed.
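Step 4 will need the Java installation path for JAVA_HOME. One quick way to discover it — a sketch assuming an apt-installed OpenJDK, where the `java` on PATH is a symlink chain into /usr/lib/jvm:

```shell
# Resolve the java binary through its symlinks, then strip the trailing
# /bin/java to get a JAVA_HOME candidate.
JAVA_BIN=$(readlink -f "$(command -v java)")
JAVA_HOME_CANDIDATE="${JAVA_BIN%/bin/java}"
echo "$JAVA_HOME_CANDIDATE"
```

On Debian with OpenJDK 11 this typically prints /usr/lib/jvm/java-11-openjdk-amd64, matching the value used later in ~/.bashrc.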
2. Create a Dedicated Hadoop User
For security and isolation, create a non-root user (e.g., hadoop) and add it to the sudo group:
sudo adduser hadoop
sudo usermod -aG sudo hadoop
Switch to the new user:
su - hadoop
This user will manage all Hadoop operations.
3. Download and Extract Hadoop
Download the latest stable Hadoop release (e.g., 3.3.6) from the Apache website:
wget https://downloads.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz
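Optionally, verify the download against the checksum Apache publishes alongside each release. A sketch — the .sha512 file name mirrors the tarball, but its exact format has varied between releases, so fall back to comparing the output of `sha512sum hadoop-3.3.6.tar.gz` by eye if `-c` rejects the file:

```shell
# Fetch the published SHA-512 checksum and check the tarball against it;
# sha512sum -c exits non-zero on a mismatch.
wget https://downloads.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz.sha512
sha512sum -c hadoop-3.3.6.tar.gz.sha512
```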
Extract the archive to /usr/local/ and rename the directory for simplicity:
sudo tar -xzvf hadoop-3.3.6.tar.gz -C /usr/local/
sudo mv /usr/local/hadoop-3.3.6 /usr/local/hadoop
Change ownership of the Hadoop directory to the hadoop user:
sudo chown -R hadoop:hadoop /usr/local/hadoop
4. Configure Environment Variables
Set up Hadoop-specific environment variables in /etc/profile (system-wide) or ~/.bashrc (user-specific). Open the file with vim:
vim ~/.bashrc
Add the following lines at the end:
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64 # Adjust if using a different Java version
Load the changes into the current session:
source ~/.bashrc
Verify the variables are set:
echo $HADOOP_HOME # Should output /usr/local/hadoop
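A slightly broader sanity check — confirm both variables are set and that the hadoop wrapper now resolves from PATH (just a sketch; `hadoop version` only works once the archive from step 3 is in place):

```shell
# Print each variable, flagging any that are unset.
for v in HADOOP_HOME JAVA_HOME; do
  eval "val=\$$v"
  if [ -n "$val" ]; then echo "$v=$val"; else echo "$v is unset"; fi
done
# The hadoop wrapper should now resolve from PATH.
command -v hadoop && hadoop version | head -n 1
```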
5. Configure SSH Passwordless Login
Hadoop requires passwordless SSH between the NameNode and DataNodes. Generate an SSH key pair:
ssh-keygen -t rsa -b 4096 -C "hadoop@debian"
Press Enter to accept default file locations and skip passphrase entry. Copy the public key to the local machine (for single-node clusters) or other cluster nodes:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
Test passwordless login:
ssh localhost
You should log in without entering a password.
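For scripts, the same check can be made non-interactive: BatchMode makes ssh fail outright instead of stopping at a password prompt, so a hung prompt can never be mistaken for success.

```shell
# Exits 0 only when key-based auth succeeds; accept-new auto-trusts the
# host key on first contact so the test needs no interaction.
if ssh -o BatchMode=yes -o StrictHostKeyChecking=accept-new localhost true; then
  echo "passwordless SSH OK"
else
  echo "passwordless SSH FAILED"
fi
```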
6. Configure Hadoop Core Files
Navigate to the Hadoop configuration directory:
cd $HADOOP_HOME/etc/hadoop
Edit the following files to define HDFS behavior:
- core-site.xml: Sets the default file system (HDFS) and the NameNode address. This guide builds a single-node cluster, so localhost is used here; in a multi-node cluster, use the NameNode's hostname instead.

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

- hdfs-site.xml: Configures the replication factor (for fault tolerance) and the data directories.

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/opt/hadoop/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/opt/hadoop/hdfs/datanode</value>
  </property>
</configuration>

- mapred-site.xml: Specifies the MapReduce framework (YARN).

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

- yarn-site.xml: Configures YARN resource management.

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
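A stray or missing tag in any of these files makes the daemons fail at startup with a parse error, so it is worth checking well-formedness up front. A sketch, assuming xmllint from Debian's libxml2-utils package is installed:

```shell
# Validate each edited config file; --noout prints nothing on success
# and xmllint exits non-zero on malformed XML.
cd "$HADOOP_HOME/etc/hadoop"
for f in core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml; do
  xmllint --noout "$f" && echo "$f: well-formed"
done
```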
7. Create HDFS Data Directories
Create the directories specified in hdfs-site.xml for NameNode and DataNode storage:
sudo mkdir -p /opt/hadoop/hdfs/namenode
sudo mkdir -p /opt/hadoop/hdfs/datanode
sudo chown -R hadoop:hadoop /opt/hadoop # Change ownership to the hadoop user
8. Format the NameNode
The NameNode must be formatted once before starting HDFS. Run this command carefully (it will erase existing HDFS data):
hdfs namenode -format
You should see output indicating successful formatting.
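If the log output scrolled past, a successful format can also be confirmed on disk: formatting creates a current/ subdirectory with a VERSION file under the NameNode metadata directory configured in hdfs-site.xml.

```shell
# A freshly formatted NameNode leaves its metadata here (the path matches
# dfs.namenode.name.dir from hdfs-site.xml).
ls -l /opt/hadoop/hdfs/namenode/current/
cat /opt/hadoop/hdfs/namenode/current/VERSION
```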
9. Start HDFS Services
Start the HDFS daemons (NameNode and DataNode) using the start-dfs.sh script:
$HADOOP_HOME/sbin/start-dfs.sh
Check the status of HDFS processes with jps:
jps
You should see NameNode and DataNode running (along with other Java processes).
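For scripting, the same check can be made explicit. This loop reports each expected daemon by name (on a single-node setup started with start-dfs.sh you will usually also see a SecondaryNameNode):

```shell
# grep -q exits non-zero when the daemon name is absent from jps output.
for d in NameNode DataNode; do
  if jps | grep -q "$d"; then echo "$d: running"; else echo "$d: NOT running"; fi
done
```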
10. Verify HDFS Installation
Use HDFS commands to confirm the cluster is operational:
- List the root directory:
hdfs dfs -ls /
- Create a test directory:
hdfs dfs -mkdir -p /user/hadoop/input
- Upload a local file to HDFS:
echo "Hello, HDFS!" > test.txt
hdfs dfs -put test.txt /user/hadoop/input/
- Read the file from HDFS:
hdfs dfs -cat /user/hadoop/input/test.txt
You should see the output Hello, HDFS!
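Beyond the CLI, Hadoop 3.x also serves a NameNode status page on port 9870 by default, which gives a quick browserless health check:

```shell
# -s silences progress output, -f makes curl exit non-zero on HTTP errors.
if curl -sf http://localhost:9870/ > /dev/null; then
  echo "NameNode web UI reachable on port 9870"
else
  echo "NameNode web UI not reachable"
fi
```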
Troubleshooting Tips
- Port Conflicts: Ensure ports such as 9000 (NameNode RPC) and 9870 (NameNode web UI in Hadoop 3.x; older 2.x releases used 50070) are not blocked by your firewall.
- Java Issues: Verify JAVA_HOME is correctly set in $HADOOP_HOME/etc/hadoop/hadoop-env.sh.
- Permission Errors: Use chown to ensure the hadoop user owns all Hadoop-related directories.
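To check the port question concretely, list listening sockets and filter for the HDFS ports — a sketch using ss from iproute2, which is standard on Debian:

```shell
# Shows processes bound to the NameNode RPC (9000) and web UI (9870) ports;
# prints a hint if neither is listening.
ss -tlnp 2>/dev/null | grep -E ':(9000|9870)\s' || echo "no HDFS ports listening"
```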
That covers how to install HDFS on Debian.