How to Set Up a Hadoop Cluster Using Docker Compose

Check the prerequisites: both Docker and Docker Compose must be installed.

docker --version

docker-compose --version
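
If either command fails, the tool is missing or not on your PATH. As a minimal sketch, a small shell helper can make that check explicit before you continue. The function name require_cmd is made up for this example and is not part of Docker.

```shell
# Hypothetical helper: succeed only if the given command is on PATH.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}

# Usage (uncomment to run the check):
# require_cmd docker && require_cmd docker-compose
```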

Now, let’s make sure a Dockerized web server runs correctly.

sudo docker run -d -p 80:80 --name myserver nginx

Tip: -d runs the container in detached mode, that is, in the background.

If this is the first time you run this command, Docker pulls the nginx image from Docker Hub. Once the container is running, you can visit http://localhost to view the homepage of your new server.
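
The homepage can also be checked from the terminal. The following is a rough sketch that assumes curl is installed; http_status is a helper name invented here. The commented commands at the end stop and remove the test container once it has served its purpose, which frees port 80 and the name myserver.

```shell
# Hypothetical helper: print the HTTP status code returned by a URL.
http_status() {
    curl -s -o /dev/null -w '%{http_code}' "$1"
}

# Usage, once the nginx container is running:
# http_status http://localhost     # a healthy nginx should answer 200
# sudo docker stop myserver        # stop the test container
# sudo docker rm myserver          # remove it, freeing the name and port 80
```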

Installing Hadoop requires a Hadoop Docker image. For this, we will use the Big Data Europe repository. Clone it with git, or download it from the repository's GitHub page.

git clone https://github.com/big-data-europe/docker-hadoop.git

Let’s deploy the Hadoop cluster with the following command. Run it from inside the cloned repository, where docker-compose.yml is located.

Tip: cd docker-hadoop

docker-compose up -d

This sets up multiple containers. Once the process completes, use the following command to check the currently running containers.

docker ps

If everything worked, you should be able to visit http://localhost:9870, which shows the current status of your namenode.
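
The containers can take a little while to start, so the page may not respond immediately. As a sketch, assuming curl is installed, the helper below polls a URL until it answers or the attempts run out. The name wait_for_url, the default of 30 attempts, and the 2-second pause are all choices made for this example.

```shell
# Hypothetical helper: poll a URL until it responds or the attempts run out.
# $1 = URL, $2 = maximum number of attempts (default 30), 2 seconds apart.
wait_for_url() {
    url=$1
    attempts=${2:-30}
    i=1
    while [ "$i" -le "$attempts" ]; do
        # -f makes curl return non-zero on HTTP errors, -s keeps it quiet.
        if curl -fs -o /dev/null "$url"; then
            echo "up: $url"
            return 0
        fi
        sleep 2
        i=$((i + 1))
    done
    echo "timed out: $url" >&2
    return 1
}

# Usage:
# wait_for_url http://localhost:9870
```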

Let’s test our Hadoop cluster with the classic WordCount program.

Download the classic WordCount JAR (hadoop-mapreduce-examples-2.7.1-sources.jar).

We need to copy the downloaded JAR file to our namenode. For this, we need the ID of the container in which the namenode is running. Use the following command to list all running containers.

sudo docker container ls

Note: in my case, the container ID of the namenode is 280ccc05491b. Yours will be different, so use your own container ID.
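
Instead of reading the ID off the listing by hand, you can look it up by name. The sketch below assumes the container is named namenode, the name used with docker exec later in this guide; container_id is a hypothetical helper. If your setup needs sudo for Docker, add it to the docker call inside the function.

```shell
# Hypothetical helper: print the ID of the first running container
# whose name matches the given string.
container_id() {
    docker ps --filter "name=$1" --format '{{.ID}}' | head -n 1
}

# Usage:
# NAMENODE_ID=$(container_id namenode)
# echo "$NAMENODE_ID"
```

docker cp and docker exec also accept a container name in place of the ID, so the lookup is mainly useful when you want the ID for scripts or logs.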

Now copy the downloaded JAR file to the namenode with the following command.

sudo docker cp ../hadoop-mapreduce-examples-2.7.1-sources.jar 280ccc05491b:hadoop-mapreduce-examples-2.7.1-sources.jar

Tip: sudo docker cp <path to your downloaded file> <container ID>:<target file name>

Then enter the namenode container.

sudo docker exec -it namenode bash

This opens a new Bash session in the namenode container.

Let’s create a sample input file.

mkdir input

echo "Hello World" > input/f1.txt

Create the input directory on HDFS.

hadoop fs -mkdir -p input

Copy the input file to the Hadoop file system.

hdfs dfs -put ./input/* input

Execute WordCount

hadoop jar hadoop-mapreduce-examples-2.7.1-sources.jar org.apache.hadoop.examples.WordCount input output

Once the job completes, you can view the output with the following command.

hdfs dfs -cat output/part-r-00000
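
For an input file containing only Hello World, the output should list each word with a count of 1. As a local cross-check, the sketch below counts words with standard Unix tools and prints them in the same layout, word then count, separated by a tab. It runs in an ordinary shell rather than on the cluster, and local_wordcount is a name invented for this example.

```shell
# Hypothetical helper: count whitespace-separated words read from stdin
# and print "word<TAB>count", one word per line, sorted in byte order.
local_wordcount() {
    tr -s '[:space:]' '\n' |
        LC_ALL=C sort |
        uniq -c |
        awk 'NF == 2 { print $2 "\t" $1 }'
}

# Usage:
# local_wordcount < input/f1.txt
```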

Voilà, you have successfully set up your Hadoop cluster using Docker!

Leave the namenode container with the exit command.

exit
