Install/run Spark in Docker on Ubuntu

Installation

sudo aptitude install docker.io
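
If the installation went well, the docker command should now be available:

docker --version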

You then have to add your user to the docker group. If you don’t, you’ll have to prefix a lot of commands with sudo.

sudo usermod -a -G docker [username]

Log out and back in (or restart the machine) for the group change to take effect, and you are good to go.
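
To check that everything works without sudo, you can run Docker’s test image:

docker run hello-world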

Docker cheat sheet

To list all containers (including stopped ones):

docker ps -a
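
Two related commands that often come in handy, for stopping a running container and removing one you no longer need:

docker stop [container id]

docker rm [container id]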

To restart a container with an exited status:

docker start [container id]

To open a bash shell inside a container:

docker exec -it [container id] bash
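
For example, to get a shell on the master1 container created in the next section:

docker exec -it master1 bash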

Install Spark

First, you need to pull a Spark image:

docker pull sequenceiq/spark:1.6.0
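
You can check that the image is now available locally:

docker images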

Then start the master container:

docker run -d -v $HOME/fac/SID/spark/data:/home -p 8080:8080 --name master1 -h master sequenceiq/spark:1.6.0 '/usr/local/spark/sbin/start-master.sh ; tail -f /usr/local/spark/logs/*'

Some of the parameters deserve explanation:

-d: start the container in the background (“detached”)

-v [host folder]:[container folder]: mount the host machine’s folder inside the container

-p [host port]:[container port]: bind the container’s port to the host’s port

--name: the container’s name

-h: the container’s hostname
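
Since the container’s port 8080 is bound to the host, you can check that the master started correctly by opening its web UI at http://localhost:8080, or by reading the container’s logs:

docker logs master1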

To start slaves on the same Ubuntu machine, use:

docker run -d -v $HOME/fac/SID/spark/data:/home -e SPARK_MASTER_IP=master2 -e SPARK_WORKER_CORES=1 -e SPARK_WORKER_MEMORY=512M -e SPARK_WORKER_INSTANCES=4 --link master1:master2 sequenceiq/spark:1.6.0 '/usr/local/spark/sbin/start-slave.sh spark://master:7077 ; tail -f /usr/local/spark/logs/*'

Here as well, some of the parameters deserve explanation:

-e SPARK_MASTER_IP=master2: the name under which the master’s container is reachable (here the link alias)

-e SPARK_WORKER_CORES=1: number of cores per worker

-e SPARK_WORKER_MEMORY=512M: memory allocated to each worker

-e SPARK_WORKER_INSTANCES=4: number of workers to start

--link master1:master2: makes sure that the slaves know their master
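
Once the workers have registered, they show up on the master’s web UI at http://localhost:8080. To actually use the cluster, you can open a shell on the master container and start a Spark shell pointed at it (assuming the image ships spark-shell under the same /usr/local/spark prefix used above):

docker exec -it master1 bash

/usr/local/spark/bin/spark-shell --master spark://master:7077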
