If you’re looking for a simple and painless Hadoop deployment, Docker is the right tool for you. We mostly use Docker Community Edition (CE) (https://docs.docker.com/docker-for-windows/install/) on Microsoft Windows, and its system requirements clearly state that “Hyper-V and Containers Windows features must be enabled” to run Docker on Windows. If you are using Docker Engine – Enterprise (EE), you might not need Hyper-V, but I think most of us developers are happy with Docker CE.
The issue with Hyper-V is that it reserves some ports that Hadoop needs for inter-process communication. Hadoop uses certain ports, such as 50070, to communicate with the data nodes and to expose the HDFS URI, but these ports can end up reserved by Hyper-V.
Hyper-V has a good reason to reserve some ports: it uses them to switch communication between the Linux and Windows systems. To view the reserved ports, run the command below in your command prompt.
netsh interface ipv4 show excludedportrange protocol=tcp

49848       49947
50000       50059
50160       50259
50260       50359
50360       50459
50460       50559
50780       50879
55609       55708
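If you would rather work with these ranges in a script than read them by eye, here is a minimal Python sketch that turns the text printed by the command into a list of ranges. The names `PortRange` and `parse_excluded_ranges` are my own for illustration, and the parser assumes each data row is two numbers optionally followed by a `*`, which is how netsh marks manually added exclusions.

```python
import re
from typing import List, NamedTuple


class PortRange(NamedTuple):
    start: int
    end: int
    administered: bool  # True when the row carries the "*" marker


# A data row is "start end" with an optional trailing "*".
# Header, separator and footnote lines do not match and are skipped.
_ROW = re.compile(r"^\s*(\d+)\s+(\d+)\s*(\*)?\s*$")


def parse_excluded_ranges(netsh_output: str) -> List[PortRange]:
    """Extract the excluded port ranges from the text printed by netsh."""
    ranges = []
    for line in netsh_output.splitlines():
        match = _ROW.match(line)
        if match:
            ranges.append(
                PortRange(
                    start=int(match.group(1)),
                    end=int(match.group(2)),
                    administered=match.group(3) is not None,
                )
            )
    return ranges
```

On Windows, the input text could come from `subprocess.run(["netsh", "interface", "ipv4", "show", "excludedportrange", "protocol=tcp"], capture_output=True, text=True).stdout`.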
The snippet below, taken from the Hadoop documentation, shows the default ports used by Hadoop:
Daemon                     Default Port   Configuration Parameter
-----------------------    ------------   ----------------------------------
Namenode                   50070          dfs.http.address
Datanodes                  50075          dfs.datanode.http.address
Secondarynamenode          50090          dfs.secondary.http.address
Backup/Checkpoint node?    50105          dfs.backup.http.address
Jobtracker                 50030          mapred.job.tracker.http.address
Tasktrackers               50060          mapred.task.tracker.http.address
Comparing the two lists makes it clear that a port conflict can happen when we run a Hadoop container on Docker for Windows. If you are lucky, Hyper-V has not reserved the ports above on your machine and you might not see the issue. I ran into it and resolved it with the workaround below.
We have two options to get rid of this issue: change the default Hadoop ports that conflict with Hyper-V in the configuration files, or modify the Hyper-V exclusion list. I thought the latter workaround would save some time, and it worked for me.
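The comparison between the two lists can also be done in a few lines of Python. This is only a sketch: the port table and the reserved ranges are copied from the listings in this post, and `find_conflicts` is a helper name I made up for illustration.

```python
from typing import Dict, Iterable, Tuple

# Default ports copied from the Hadoop table in this post.
HADOOP_DEFAULT_PORTS: Dict[str, int] = {
    "Namenode": 50070,
    "Datanodes": 50075,
    "Secondarynamenode": 50090,
    "Backup/Checkpoint node": 50105,
    "Jobtracker": 50030,
    "Tasktrackers": 50060,
}

# Reserved ranges copied from the netsh listing shown earlier in this post.
RESERVED_RANGES = [
    (49848, 49947), (50000, 50059), (50160, 50259), (50260, 50359),
    (50360, 50459), (50460, 50559), (50780, 50879), (55609, 55708),
]


def find_conflicts(
    ports: Dict[str, int], ranges: Iterable[Tuple[int, int]]
) -> Dict[str, int]:
    """Return the daemons whose port lies inside any reserved range (inclusive)."""
    ranges = list(ranges)
    return {
        name: port
        for name, port in ports.items()
        if any(start <= port <= end for start, end in ranges)
    }
```

Calling `find_conflicts(HADOOP_DEFAULT_PORTS, RESERVED_RANGES)` returns the daemons affected on that particular machine. The reserved ranges can differ from one system to another, so it is worth running the check with your own netsh output.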
The steps are:
- Disable Hyper-V (which will require a couple of restarts)
dism.exe /Online /Disable-Feature:Microsoft-Hyper-V
- When you finish all the required restarts, reserve the ports you want so Hyper-V doesn’t reserve them back
netsh int ipv4 add excludedportrange protocol=tcp startport=50070 numberofports=1
netsh int ipv4 add excludedportrange protocol=tcp startport=50075 numberofports=1
netsh int ipv4 add excludedportrange protocol=tcp startport=50090 numberofports=1
- Re-enable Hyper-V (which will require a couple of restarts)
dism.exe /Online /Enable-Feature:Microsoft-Hyper-V /All
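If you need to reserve more than a handful of ports in the middle step, typing one netsh line per port gets tedious. The sketch below only builds the command strings in the same form used above. It does not run them, since that needs an elevated prompt on Windows. `exclusion_commands` is a hypothetical helper name.

```python
from typing import Iterable, List


def exclusion_commands(ports: Iterable[int]) -> List[str]:
    """Build one 'netsh ... add excludedportrange' command per port.

    Duplicates are dropped and the ports are sorted so the output is stable.
    """
    return [
        "netsh int ipv4 add excludedportrange "
        f"protocol=tcp startport={port} numberofports=1"
        for port in sorted(set(ports))
    ]


def print_commands(ports: Iterable[int]) -> None:
    """Print the commands so they can be pasted into an elevated prompt."""
    for command in exclusion_commands(ports):
        print(command)
```

For example, `print_commands([50070, 50075, 50090])` prints the three commands from the middle step.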
When your system is back up, you will be able to bind to those ports successfully. The new exclusion list on my system looks like this:
PS C:\Users\mukesh.kumar> netsh interface ipv4 show excludedportrange protocol=tcp

Protocol tcp Port Exclusion Ranges

Start Port    End Port
----------    --------
     49694       49793
     49794       49893
     50010       50010     *
     50020       50020     *
     50030       50030     *
     50060       50060     *
     50070       50070     *
     50075       50075     *
     50090       50090     *
     50100       50100     *
     50105       50105     *
     50106       50205
     50230       50329
     50624       50723
     61059       61158

* - Administered port exclusions.
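A quick way to confirm that a port really is bindable again, before starting any container, is to try binding a plain TCP socket to it. This is a minimal sketch using Python's standard `socket` module. `can_bind` is my own helper name, and I am assuming that a refused bind surfaces as an `OSError`, which is the base class for socket errors in Python 3.

```python
import socket


def can_bind(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can be bound to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Covers "address in use" as well as "access denied" failures.
            return False
    # The socket is closed on leaving the with-block, so the port is free again.
    return True
```

Running `can_bind(50070)`, `can_bind(50075)` and `can_bind(50090)` should each return `True` once the workaround is in place and nothing else is listening on those ports.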
After that, I was able to run a 4-node Hadoop cluster without any issue.
If you want the installation steps for the Hadoop cluster on Docker, visit my GitHub repo:
You can also pull the Hadoop Docker image directly from Docker Hub at mukeshkumarmavenwavecom/hadoop_cluster.
I am an Architect and I am proud to build things consciousness that evolved beautifully, would say technology built by the man.