Docker for Data Scientist’s
Why Docker
As a Data Scientist, at average 3 hours a week I waste my time on fixing the model deployment in production systems ,system configuration issues, memory, library dependencies, not compatible packages, package not found and many more issues due to the diffrences in hardware/software/security policies.
There comes the Docker!!
Docker is a way to isolate a process from the system on which it is running. It allows us to isolate the code written to define an application and the resources required to run that application from the hardware on which it runs using containerization
Docker Is Not a Virtual Machine
A Virtual Machine is made up of user space plus kernel space of an operating system. Under VMs, server hardware is virtualized. Each VM has Operating system (OS) & apps. It shares hardware resource from the host.
Docker is container based technology and containers are just user space of the operating system. At the low level, a container is just a set of processes that are isolated from the rest of the system, running from a distinct image that provides all files necessary to support the processes. It is built for running applications. In Docker, the containers running share the host OS kernel.
In VM environment, each workload needs a complete OS. But with a container environment, multiple workloads can run with 1 OS.
Getting started
To check that everything is set-up, run the following:
foo@bar:~$ docker run hello-world
Pulling an Image
Lets start running our first container. Here we will run an ubuntu container, and run few docker commands.
foo@bar:~$ docker pull ubuntu
pull command fetches the latest ubuntu image from the Dockerhub a repository for hosting built dockers. To see which images are downloaded to your machine, run the following:
foo@bar:~$ docker images
###Running a Container To start a container run the following command:
foo@bar:~$ docker run ubuntu echo "hello!"
What just happened?
- When you call run, the Docker client calls the Docker daemon
- The Docker daemon checks locally to see if the image is available, if it is not, it downloads it from Dockerhub
- If the image is present, the daemon creates the container and runs the command you specified in the containter
- The output of the command is streamed to the client and you observe it
in order to not to exit the docker container we need to run it in interactive mode by using “-it”
foo@bar:~$ docker run -it ubuntu
Inorder to cleanup all your images and containers, you can use:
foo@bar:~$ docker rm $(docker ps -a -q)
To remove images
foo@bar:~$ docker rmi $(docker images -a -q)
Building our own docker image
A dockerfile is a script which contains a collection of dockerfile commands and operating system commands (ex: Linux commands). Before we create our first dockerfile, you should become familiar with the dockerfile command.
Below are some dockerfile commands you must know: FROM The base image for building a new image. This command must be on top of the dockerfile.
MAINTAINER Optional, it contains the name of the maintainer of the image.
RUN Used to execute a command during the build process of the docker image.
ADD Copy a file from the host machine to the new docker image. There is an option to use an URL for the file, docker will then download that file to the destination directory.
ENV Define an environment variable.
CMD Used for executing commands when we build a new container from the docker image.