Hadoop Installation (Windows)
Hello everyone, today we are going to install Hadoop 3.3.0 on Windows 11.
Prerequisites
- Java 8 runtime environment (JRE)
- Apache Hadoop 3.3.0
Step 1 - Download Hadoop binary package
The first step is to download the Hadoop binaries from the official website. The binary package is about 478 MB.
https://archive.apache.org/dist/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz
curl -O https://archive.apache.org/dist/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz
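As an aside, the Apache archive uses a predictable URL layout, so the download link for any release can be composed from the version string. A small sketch (the helper name is ours, not part of any Hadoop tooling):

```python
# Build the Apache archive download URL for a given Hadoop release.
# Illustrative helper; the pattern matches the link used above.
def hadoop_archive_url(version: str) -> str:
    base = "https://archive.apache.org/dist/hadoop/common"
    return f"{base}/hadoop-{version}/hadoop-{version}.tar.gz"

print(hadoop_archive_url("3.3.0"))
# https://archive.apache.org/dist/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz
```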
Step 2 - Unpack the package
After the download finishes, unpack the package using 7-Zip or the command line:
D:
cd D:\bigdata
Create the directory:
mkdir D:\bigdata\Hadoop
then run the following command to extract the archive:
tar -xvzf hadoop-3.3.0.tar.gz -C D:\bigdata\Hadoop
The command can take a few minutes, as the archive contains a large number of files.
Once the extraction is complete, we need to install Java.
Step 3 - Java installation
Java is required to run Hadoop. If you have not installed Java yet, please install it now.
You can download Java 8 from the Oracle downloads page. I chose the Java SE Runtime Environment, Windows x64 version: https://javadl.oracle.com/webapps/download/AutoDL?BundleId=247948_0ae14417abb444ebb02b9815e2103550
After the download finishes, open a new command prompt and unpack the package:
cd D:\tool
Create the directory:
mkdir D:\tool\Java
then run the following command to extract the archive:
tar -xvzf jre-8u361-windows-x64.tar.gz -C D:\tool\Java\
Step 4 - Install Hadoop native IO binary
On Linux, Hadoop's native IO support is optional. On Windows, however, native IO is mandatory: without it you will not be able to get your installation working. The Windows native IO libraries are not included in the Apache Hadoop release, so they must be built and installed separately.
Info: the following repository provides pre-built Hadoop native libraries for Windows:
https://github.com/ruslanmv/How-to-install-Hadoop-on-Windows/tree/master/winutils/hadoop-3.3.0-YARN-8246/bin
Warning: these libraries are not signed and there is no guarantee that they are 100% safe. We use them purely for testing and learning purposes.
Download all the files from that location and save them to the bin folder under the Hadoop folder. You can use Git by typing in your terminal:
cd D:\bigdata
git clone https://github.com/ruslanmv/How-to-install-Hadoop-on-Windows.git
and then copy the files:
cd How-to-install-Hadoop-on-Windows\winutils\hadoop-3.3.0-YARN-8246\bin
copy *.* D:\bigdata\Hadoop\hadoop-3.3.0\bin
Step 5 - Configure environment variables
Now that we've downloaded and unpacked all the artefacts, we need to configure two important environment variables.
First, click the Windows button, type environment, and open Edit the system environment variables. Then set:
JAVA_HOME=D:\tool\Java\jre1.8.0_361
HADOOP_HOME=D:\bigdata\Hadoop\hadoop-3.3.0
PATH=%PATH%;%JAVA_HOME%\bin;%HADOOP_HOME%\bin
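If you want to sanity-check these variables from a script, a small cross-platform sketch can help (the `missing_vars` helper is hypothetical, not part of Hadoop; pass `os.environ` to check your real environment):

```python
import os

# Report which of the required variables are absent or empty in an
# environment mapping. Pass os.environ to check the real environment.
def missing_vars(env, required=("JAVA_HOME", "HADOOP_HOME")):
    return [name for name in required if not env.get(name)]

# Example against a sample environment dict:
sample = {"JAVA_HOME": r"D:\tool\Java\jre1.8.0_361"}
print(missing_vars(sample))  # HADOOP_HOME is not set in the sample
```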
Verification of Installation
Once the installation is complete, close your terminal window, open a new one, and run the following command to verify:
java -version
You should see output similar to:
java version "1.8.0_361"
Java(TM) SE Runtime Environment (build 1.8.0_361-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.361-b09, mixed mode)
You should also be able to run the following command (note that this prints the Java version used by Hadoop; run hadoop version, without the dash, to see the Hadoop release):
hadoop -version
java version "1.8.0_361"
Java(TM) SE Runtime Environment (build 1.8.0_361-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.361-b09, mixed mode)
Please verify that the Java version reported by hadoop matches the Java version you installed.
Finally, to verify directly that the steps above completed successfully, run:
winutils.exe
Step 6 - Configure Hadoop
Now we are ready for the most important part: the Hadoop configuration, which involves the Core, YARN, MapReduce, and HDFS configuration files.
Configure core site
Edit the file core-site.xml in the %HADOOP_HOME%\etc\hadoop folder.
For my environment, the actual path is D:\bigdata\Hadoop\hadoop-3.3.0\etc\hadoop.
Replace the configuration element with the following:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://0.0.0.0:19000</value>
</property>
</configuration>
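All four Hadoop configuration files use this same name/value `<property>` structure. As an illustration only, a sketch that renders one such element (the `hadoop_property` helper is ours, not a Hadoop API):

```python
from xml.sax.saxutils import escape

# Render one Hadoop <property> element from a name/value pair.
def hadoop_property(name: str, value: str) -> str:
    return (
        "<property>\n"
        f"  <name>{escape(name)}</name>\n"
        f"  <value>{escape(value)}</value>\n"
        "</property>"
    )

print(hadoop_property("fs.default.name", "hdfs://0.0.0.0:19000"))
```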
Configure HDFS
Edit the file hdfs-site.xml in the %HADOOP_HOME%\etc\hadoop folder.
Before editing, create two folders in your system: one for the namenode directory and another for the data directory. For my system, I created the following two subfolders:
mkdir D:\bigdata\Hadoop\hadoop-3.3.0\data\datanode
mkdir D:\bigdata\Hadoop\hadoop-3.3.0\data\namenode
Replace the configuration element with the following (remember to replace the paths with your own):
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///D:/bigdata/Hadoop/hadoop-3.3.0/data/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///D:/bigdata/Hadoop/hadoop-3.3.0/data/datanode</value>
</property>
</configuration>
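The dfs.namenode.name.dir and dfs.datanode.data.dir values are file: URIs, so a Windows path must be written in the file:/// form (three slashes, forward slashes in the path). Python's pathlib shows the expected shape:

```python
from pathlib import PureWindowsPath

# Convert a Windows directory path into the file:/// URI form expected
# by dfs.namenode.name.dir and dfs.datanode.data.dir.
def dir_uri(win_path: str) -> str:
    return PureWindowsPath(win_path).as_uri()

print(dir_uri(r"D:\bigdata\Hadoop\hadoop-3.3.0\data\namenode"))
# file:///D:/bigdata/Hadoop/hadoop-3.3.0/data/namenode
```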
Configure MapReduce and YARN site
Edit the file mapred-site.xml in the %HADOOP_HOME%\etc\hadoop folder.
Replace the configuration element with the following:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>%HADOOP_HOME%/share/hadoop/mapreduce/*,%HADOOP_HOME%/share/hadoop/mapreduce/lib/*,%HADOOP_HOME%/share/hadoop/common/*,%HADOOP_HOME%/share/hadoop/common/lib/*,%HADOOP_HOME%/share/hadoop/yarn/*,%HADOOP_HOME%/share/hadoop/yarn/lib/*,%HADOOP_HOME%/share/hadoop/hdfs/*,%HADOOP_HOME%/share/hadoop/hdfs/lib/*</value>
</property>
</configuration>
Edit the file yarn-site.xml in the %HADOOP_HOME%\etc\hadoop folder and replace the configuration element with the following:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
</configuration>
Step 7 - Initialise HDFS
Run the following command in Command Prompt:
hdfs namenode -format
If the command succeeds, the log should report that the storage directory has been successfully formatted.
Step 8 - Start HDFS daemons
Run the following command to start HDFS daemons in Command Prompt:
%HADOOP_HOME%\sbin\start-dfs.cmd
Verify the HDFS web portal UI through this link: http://localhost:9870/dfshealth.html#tab-overview.
You can also navigate to the datanode UI, which by default is served at http://localhost:9864.
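Beyond the browser UI, the namenode also serves the WebHDFS REST API on the same port, which is another quick way to confirm HDFS is up. A sketch that builds a directory-listing URL (the helper name is ours):

```python
from urllib.parse import quote

# Build a WebHDFS LISTSTATUS URL for a given HDFS path. WebHDFS is
# served on the namenode web port (9870 by default in Hadoop 3).
def liststatus_url(path: str, host: str = "localhost", port: int = 9870) -> str:
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?op=LISTSTATUS"

print(liststatus_url("/"))
# http://localhost:9870/webhdfs/v1/?op=LISTSTATUS
```

On a running cluster, opening that URL returns a JSON FileStatuses listing of the root directory.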
Step 9 - Start YARN daemons
Warning: you may encounter permission issues if you start the YARN daemons as a normal user. To avoid them, open a Command Prompt window using Run as administrator.
Note: if you have Node.js installed, its yarn package manager may shadow Hadoop's yarn command on the PATH. To avoid this, modify the two start lines in %HADOOP_HOME%\sbin\start-yarn.cmd so they call Hadoop's yarn by its full path:
start "Apache Hadoop Distribution" %HADOOP_BIN_PATH%\..\bin\yarn resourcemanager
start "Apache Hadoop Distribution" %HADOOP_BIN_PATH%\..\bin\yarn nodemanager
Then run the following command in an elevated Command Prompt window (Run as administrator) to start the YARN daemons:
%HADOOP_HOME%\sbin\start-yarn.cmd
Similarly, two Command Prompt windows will open and the YARN daemons will start.
Once all services have started successfully, you can verify the YARN resource manager UI:
http://localhost:8088
Script
curl -O https://archive.apache.org/dist/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz
tar -xvzf hadoop-3.3.0.tar.gz -C D:\bigdata\Hadoop
cd D:\tool
mkdir D:\tool\Java
tar -xvzf jre-8u361-windows-x64.tar.gz -C D:\tool\Java\
cd D:\bigdata
git clone https://github.com/ruslanmv/How-to-install-Hadoop-on-Windows.git
cd How-to-install-Hadoop-on-Windows\winutils\hadoop-3.3.0-YARN-8246\bin
copy *.* D:\bigdata\Hadoop\hadoop-3.3.0\bin
setx JAVA_HOME "D:\tool\Java\jre1.8.0_361"
setx HADOOP_HOME "D:\bigdata\Hadoop\hadoop-3.3.0"
setx PATH "%PATH%;%JAVA_HOME%\bin;%HADOOP_HOME%\bin"
Note: setx truncates values longer than 1024 characters, and it does not update the current session; open a new Command Prompt after running it (or, if your PATH is long, add the entries through the Environment Variables dialog instead).
mkdir D:\bigdata\Hadoop\hadoop-3.3.0\data\datanode
mkdir D:\bigdata\Hadoop\hadoop-3.3.0\data\namenode
cd D:\bigdata\Hadoop\hadoop-3.3.0\etc\hadoop
notepad core-site.xml
---
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://0.0.0.0:19000</value>
</property>
</configuration>
---
notepad hdfs-site.xml
---
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///D:/bigdata/Hadoop/hadoop-3.3.0/data/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///D:/bigdata/Hadoop/hadoop-3.3.0/data/datanode</value>
</property>
</configuration>
---
notepad mapred-site.xml
---
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>%HADOOP_HOME%/share/hadoop/mapreduce/*,%HADOOP_HOME%/share/hadoop/mapreduce/lib/*,%HADOOP_HOME%/share/hadoop/common/*,%HADOOP_HOME%/share/hadoop/common/lib/*,%HADOOP_HOME%/share/hadoop/yarn/*,%HADOOP_HOME%/share/hadoop/yarn/lib/*,%HADOOP_HOME%/share/hadoop/hdfs/*,%HADOOP_HOME%/share/hadoop/hdfs/lib/*</value>
</property>
</configuration>
---
notepad yarn-site.xml
---
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
</configuration>
---
hdfs namenode -format
%HADOOP_HOME%\sbin\start-dfs.cmd
%HADOOP_HOME%\sbin\start-yarn.cmd
The web UIs should be reachable at:
- http://localhost:9870
- http://localhost:8088