First, check the prerequisites:
docker --version
docker-compose --version
Now, let's make sure a Dockerized web server runs correctly:
sudo docker run -d -p 80:80 --name myserver nginx
tip: -d runs the container in detached mode (in the background); learn more here
If this is your first time running this command, Docker will pull the image from Docker Hub. Once that finishes, you can visit localhost to view the homepage of your new server.
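Once you have confirmed the test server works, you can optionally stop and remove it, since it is not needed for the rest of the tutorial (myserver is the container name from the run command above):
sudo docker stop myserver
sudo docker rm myserver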
We need a Hadoop Docker image to install Hadoop. For this, we will use the Big Data Europe repository. You can clone it with git or simply download it from here.
git clone https://github.com/big-data-europe/docker-hadoop.git
Let's deploy the Hadoop cluster using the following command. You need to be inside the cloned repository, where docker-compose.yml is located.
tip: cd docker-hadoop
docker-compose up -d
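If you want to watch the cluster come up, you can follow the logs of an individual service, for example the namenode (the service name namenode comes from the repository's docker-compose.yml and may differ in newer versions):
docker-compose logs -f namenode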
This will set up multiple containers. After the process completes, you can use the following command to check the currently running containers.
docker ps
If everything worked, you should be able to visit http://localhost:9870. This page shows the current status of your namenode.
Let's test our Hadoop cluster with the classic WordCount program.
Download the classic WordCount jar.
We need to copy the downloaded jar file to our namenode. For this, we need the id of the container the namenode is running in. Use the following command to list all running containers.
sudo docker container ls
Note: my namenode container id is 280ccc05491b. Yours will be different, so use your own container id.
Now we will copy the downloaded jar file to the namenode using the following command.
sudo docker cp ../hadoop-mapreduce-examples-2.7.1-sources.jar 280ccc05491b:hadoop-mapreduce-examples-2.7.1-sources.jar
tip: sudo docker cp <path to your downloaded file> <container id>:<target file name>
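Alternatively, docker cp also accepts the container name instead of the id, so the following should work as well (assuming your namenode container is named namenode, as in this compose setup):
sudo docker cp ../hadoop-mapreduce-examples-2.7.1-sources.jar namenode:hadoop-mapreduce-examples-2.7.1-sources.jar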
Then we enter the namenode container:
sudo docker exec -it namenode bash
This will create a new Bash session in the container namenode.
Let’s create sample input files
mkdir input
echo "Hello World" > input/f1.txt
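If you want a slightly more interesting word count, you can add more files to the same directory; this second file is only an optional illustration:
echo "Hello Docker" > input/f2.txt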
Create an input directory on HDFS
hadoop fs -mkdir -p input
Copy the input files to the Hadoop file system
hdfs dfs -put ./input/* input
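To verify that the files actually landed on HDFS, you can list the directory (the exact output format may vary slightly by Hadoop version):
hdfs dfs -ls input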
Execute WordCount
hadoop jar hadoop-mapreduce-examples-2.7.1-sources.jar org.apache.hadoop.examples.WordCount input output
Once the execution completes, you can view the output using
hdfs dfs -cat output/part-r-00000
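With only the single "Hello World" input file, the result should look roughly like this, one word and its count per line, separated by a tab:
Hello	1
World	1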
Voilà, you have successfully set up your Hadoop cluster using Docker!
Exit the namenode using the exit command.
exit
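When you are done experimenting, you can shut the cluster down from the docker-hadoop directory; docker-compose down stops and removes the containers (this is an optional cleanup step):
docker-compose down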