AI News, Improving your data science workflow with Docker

Improving your data science workflow with Docker

Containerization is a trend that is taking the tech world by storm, but how can you, a data scientist, use it to improve your workflow?

This is not a new idea, but it has recently gained widespread acceptance as tools mature and the barrier to entry for containerizing applications has been lowered.

For our purposes the biggest advantages of containerization are portability, lightweight and self-contained environments, and simplified versioning and dependency management.

It’s important to have a very basic understanding of how containerization works to avoid common mistakes, such as writing log files to your container layer.

I’ll raise concerns about image size repeatedly; it’s something to keep in mind throughout, because Docker is great for deploying applications but, as we’ll see in the examples, it isn’t always the right fit.

Again, as we talk about size, you might think that 275MB is not very large, but this is just a base image: any requirements your application has will be added on top of that, and many data science libraries are quite heavy.

Of course this needs to be weighed against your runtime, taking an extra 30 seconds to copy a 1GB image may not matter if your algorithm takes hours to run.

The next directive we’ll look at is RUN, which lets us execute commands in our container at build time to get things set up the way we’d like.
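As a sketch, here is what a minimal Dockerfile using RUN might look like for the kind of Python API discussed below. The base image tag and package choices are illustrative assumptions, not the article's exact file:

```shell
# Write an example Dockerfile; base image and packages are assumptions.
cat > Dockerfile <<'EOF'
FROM python:3.10-slim

# RUN executes at build time; each RUN creates a new image layer.
RUN apt-get update \
    && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*
RUN pip install --no-cache-dir numpy flask
EOF
```

Each RUN line is baked into the image, so anyone who builds from this file gets the same environment.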

Data stored on Docker volumes will be accessible from any containers running on that host which greatly increases your options when designing parallelization strategies.
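A minimal sketch of that sharing, assuming hypothetical container and volume names; it is saved as a script and syntax-checked so it can be reviewed without a running Docker daemon:

```shell
# Sketch: share one named volume between two containers on the same host.
cat > share_volume.sh <<'EOF'
#!/usr/bin/env bash
set -euo pipefail

docker volume create shared-data

# A writer container drops a file into the volume...
docker run --rm -v shared-data:/data alpine sh -c 'echo hello > /data/msg.txt'

# ...and any other container on the host can mount the same volume and read it.
docker run --rm -v shared-data:/data alpine cat /data/msg.txt
EOF
bash -n share_volume.sh
```

This is the property that makes fan-out patterns easy: many worker containers can read the same mounted dataset.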

Lastly, I’ll briefly introduce a concept of how Docker could be used to parallelize large data processing workloads, to highlight the power of the tools we’ve learned about.

I have chosen not to use the Alpine variant: although it is smaller, Alpine ships musl instead of glibc, which makes libraries with C dependencies, such as numpy, harder to install. It’s not impossible, but let’s keep it simple to start.

Then you can build the Docker image with docker build -t api . — this builds an image from the Dockerfile in the current directory, which is what the trailing . specifies (the build context).
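The build and run commands together look like the following; this is saved as a script and syntax-checked, since actually executing it requires a Docker daemon:

```shell
cat > build_api.sh <<'EOF'
#!/usr/bin/env bash
set -euo pipefail

# Build the image from the Dockerfile in the current directory;
# the trailing "." is the build context sent to the daemon.
docker build -t api .

# Run it detached, mapping container port 5000 to host port 15000.
docker run -d -p 15000:5000 api
EOF
bash -n build_api.sh
```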

Note that the -p 15000:5000 just tells Docker to map the exposed port 5000 on the container to our local machine’s port 15000.

When we want to update it with a new version we can replace our running containers with a new version of the image that we’ve built and be sure that all files and dependencies are handled properly.

Doing so in a container allows for even greater portability than scripting, which can run into issues with, say, different commands on macOS versus Linux flavors, so let’s give it a try.

Let’s start by trying to define what we need for a development environment; I would argue that a decent environment includes quite a few components. That sure does seem like a lot to get set up, right?

The first section at the top says it gives you Jupyter Notebook, conda Python 3, pandas, matplotlib, scipy, seaborn, scikit-learn, scikit-image, sympy, cython, patsy, statsmodels, cloudpickle, dill, numba, and bokeh pre-installed.
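That stack matches the jupyter/scipy-notebook image from the Jupyter Docker Stacks project; assuming that image, the whole environment comes up with one command (wrapped in a script here, since running it needs a Docker daemon):

```shell
cat > start_notebook.sh <<'EOF'
#!/usr/bin/env bash
# Launch a full scientific Python + Jupyter environment in one command.
# Port 8888 is Jupyter's default; the login token is printed in the logs.
docker run --rm -p 8888:8888 jupyter/scipy-notebook
EOF
bash -n start_notebook.sh
```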

You can go to your boss and say you need one c5.18xlarge (72 cores/144GB RAM) and you can do it in a day for $72/day, or you could dockerize your training to read data in from S3 (or save throughput with shared disk storage) and read the parameters from an environment variable.
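A hypothetical sketch of that fan-out, launching one training container per parameter partition; the image name, environment variable names, and S3 URI are all illustrative, not from the article:

```shell
cat > launch_workers.sh <<'EOF'
#!/usr/bin/env bash
set -euo pipefail

# One detached training container per partition; each worker reads its
# slice of the work from the PARTITION environment variable and pulls
# data from a shared location.
for PART in 0 1 2 3; do
  docker run -d \
    -e PARTITION="$PART" \
    -e DATA_URI="s3://my-bucket/training-data" \
    my-training-image
done
EOF
bash -n launch_workers.sh
```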

However, now you can take advantage of your distributed environment and buy only a single node at on-demand pricing and use spot pricing for the rest of your fleet.

c5.xlarge instances go for about 6¢/hr on the spot market ($1.44/day), so you could fill out your fleet for $28.56 per day ([0.17 + 17 × 0.06] × 24). There is a lot to be learned about Docker, and most of it is not necessary for a data scientist to understand.

If I had asked you yesterday to spin up a full-fledged Jupyter notebook environment with all the basic scientific Python packages in a single command, you would have needed some time.

Here we’ve looked at just a couple of ways containers can make your life as a data scientist just a little bit easier and let you focus on getting the right model.

Docker: Easy as build, run, done!

The containerization service makes deploying microservices easy and stable, as each service runs with its own OS environment, isolated from the others.

That means full compatibility… no more worrying about OS versions, dependencies and differences between your development and production machines!

The most significant difference is that a container engine is more lightweight because containers share the host OS kernel, whereas traditional VMs each require their own full guest OS allocation.

This setup requires 1GB for the host OS and perhaps 600MB per container (because 300MB is hypothetically shared with the host OS), for a total of 2.8GB across three containers.

Docker Images are simply blueprints of environments that you want to create while containers are the actual running and functional environments that your app will be executed in.

We want to check for Ubuntu updates with RUN apt-get update -y and upgrades with RUN apt-get upgrade -y… pretty standard stuff for setting up your environment.

Also install curl (RUN apt-get install curl -y) and vim (RUN apt-get install vim -y); both are nice to have for general purposes.
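Since each RUN creates a new image layer, a common refinement is to combine these steps into a single RUN and clean up the apt lists afterward. A sketch, with the base image tag as an assumption:

```shell
cat > Dockerfile.ubuntu <<'EOF'
FROM ubuntu:22.04

# One RUN instead of four keeps the layer count down, and removing the
# apt package lists trims the final image size.
RUN apt-get update -y \
    && apt-get upgrade -y \
    && apt-get install -y curl vim \
    && rm -rf /var/lib/apt/lists/*
EOF
```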

Since we want to be using ES6 features, we will need the latest version of NodeJS attained via the node module n.

Then clean up npm to make way for n using RUN npm cache clean -f.
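Put together, the Node.js setup described here might look like the following Dockerfile fragment; the base image and the decision to pull Node via apt first are assumptions for illustration:

```shell
cat > Dockerfile.node <<'EOF'
FROM ubuntu:22.04

RUN apt-get update -y \
    && apt-get install -y curl nodejs npm \
    && rm -rf /var/lib/apt/lists/*

# Clear npm's cache, install the "n" version manager, then use it to
# fetch a current Node.js release with ES6 support.
RUN npm cache clean -f \
    && npm install -g n \
    && n stable
EOF
```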

Last of all, we will explicitly expose port 8080 of our Docker image with EXPOSE 8080 so that the outside world can access our app.

This might take a while… at the end it should look like this: (Don’t worry about the non-critical errors such as the optional dependency skipped in the below screenshot).

-d -it combines detached mode (-d, running the container in the background) with an interactive terminal (-it, giving us a way to interact with the container).

So now we take incoming connections on our machine’s port 80 (port 80 is the default for http) and redirect them to our container’s port 8080.

Finally, kangzeroo npm run ec2 refers to our image called kangzeroo and npm run ec2 is a command specific to this boilerplate app (for starting up the app).

If you were running a Python backend app, it would look like docker run -d -it --name=kz kangzeroo python app.py.
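Putting the flags from this section together, the complete run commands might look like this (image and app names are the tutorial's; the script is syntax-checked rather than executed, since it needs a Docker daemon):

```shell
cat > run_app.sh <<'EOF'
#!/usr/bin/env bash
# Detached + interactive, host port 80 mapped to container port 8080.
docker run -d -it -p 80:8080 --name=kz kangzeroo npm run ec2

# The equivalent for a Python backend app:
# docker run -d -it -p 80:8080 --name=kz kangzeroo python app.py
EOF
bash -n run_app.sh
```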

If you type docker ps you will not see the container anymore, but you will see it if you type docker ps -a (-a means all, including containers that are not running).

All things considered, Docker is a lot easier than setting up a hypervisor-based VM, and you can see how a microservice architecture becomes a lot easier to manage when you adopt containers.

With our Dockerfile, build.sh and run.sh files created on EC2, we can summarize running Docker from our app root directory in three steps: build, run, done. That’s it!
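The three steps, sketched as the kind of commands build.sh and run.sh would contain; the exact script contents are assumed here, not taken from the tutorial:

```shell
cat > steps.sh <<'EOF'
#!/usr/bin/env bash
docker build -t kangzeroo .                           # 1. build the image (build.sh)
docker run -d -it --name=kz kangzeroo npm run ec2     # 2. run a container (run.sh)
docker ps                                             # 3. verify it is up
EOF
bash -n steps.sh
```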

Since this tutorial took a step-by-step approach to teaching Docker, I think it’s appropriate to leave you off with an overview of all the Docker commands you will need for general purpose use.
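As a quick reference, here are the general-purpose commands used throughout this piece, collected into one annotated script; "myimage" and "mycontainer" are placeholder names:

```shell
cat > docker_cheatsheet.sh <<'EOF'
#!/usr/bin/env bash
docker build -t myimage .     # build an image from the local Dockerfile
docker run -d -it myimage     # start a container in the background
docker ps                     # list running containers
docker ps -a                  # list all containers, stopped included
docker stop mycontainer       # stop a running container
docker rm mycontainer         # remove a stopped container
docker images                 # list local images
docker rmi myimage            # remove an image
EOF
bash -n docker_cheatsheet.sh
```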

Build and run your first Docker Windows Server container

I have followed your guide and everything is installed after a few tweaks; the containers are created and I have used the latest release of all the software. When I try to open the music store in a web browser I get the error below.

No type was specified for the decimal column 'Price'

This will cause values to be silently truncated if they do not fit in the default precision and scale.

Explicitly specify the SQL Server column type that can accommodate all the values using 'ForSqlServerHasColumnType()'.

ConfigureWarnings can be used when overriding the DbContext.OnConfiguring method or using AddDbContext on the application service provider. web_1

No type was specified for the decimal column 'Total'

This will cause values to be silently truncated if they do not fit in the default precision and scale.

Explicitly specify the SQL Server column type that can accommodate all the values using 'ForSqlServerHasColumnType()'.

ConfigureWarnings can be used when overriding the DbContext.OnConfiguring method or using AddDbContext on the application service provider. web_1

No type was specified for the decimal column 'UnitPrice'

This will cause values to be silently truncated if they do not fit in the default precision and scale.

TimeGenerated EntryType Message db_1


at Microsoft.AspNetCore.Server.Kestrel.Internal.Http.SocketOutput..ctor(IPipe pipe, KestrelThread thread, UvStreamHandle socket, Connection connection, String connectionId, IKestrelTrace log) web_1

Creating your first Dockerfile, image and container

This episode shows you the step by step instructions and commands that you'll need to create your very first Dockerfile, build a ..

Docker Tutorial: Building Docker Images Using Dockerfile

Using pre-built Docker images can speed up your work greatly, but you'll often need to install software, add scripts and setup an image that is unique to your ...

Building Python apps with Docker

If you haven't heard of Docker yet, it's a great tool that allows you to wrap up your app and everything it needs to run: code, runtime, and even system libraries and ...

Using Docker images and containers on your machine

dW TV URL: This video shows you how to create Docker images and run them as Docker containers on ..

Docker Tutorial - How does Docker work

Learn more advanced front-end and full-stack development at: Docker is a software platform designed to make it easier to ..

Docker: Port and Volume Mapping

Understanding port and volume mapping is perhaps one of the most important things in learning Docker. Since your app runs inside of a container, we need to ...

Browser Testing with Docker

Integration tests are an integral part of any modern web application, and regardless of which front-end or server side framework you choose, you'll likely be ...

Build containers faster with Jib, a Google image build tool for Java applications

Containers are at the heart of many complex distributed systems. Traditionally, Dockerfiles define container image builds from the system up. You start with a ...

Jenkins World 2017: Simplify your Jenkins Projects with Docker Multi-Stage Builds

When building Docker images we often use multiple build steps and Dockerfiles to keep the image size down. Using multi-stage Docker builds we can eliminate ...

Learn to use Docker in Development - Part 1

Learn how to use docker images & containers to create a multi-part Ruby on Rails development environment. For more information, check out the associated ...