Over the last couple of years, Docker has been a game changer for many developers. Not only has it made execution environments largely agnostic to the programs and applications they run, it has also changed the way dependencies and configuration for those programs are maintained.
Containerization makes applications more portable and enables faster, more confident software delivery cycles, because it isolates and bundles most of the requirements a piece of software needs to run from the ground up.
It’s not rocket science to maintain Dockerfiles if you already have a sysadmin/IT engineering point of view, or even if you are a developer with an eye for releases and maintenance. A Dockerfile resembles the steps a user would have to follow to install and configure a program in order to execute it on their local machine, with the goal of generating an image to be distributed and used by other users.
Although we will discuss the pros and cons of containerization, on a personal level I believe Docker is really good and fits specific needs well.
Dockerfile — The holy grail monolith approach
Once a Dockerfile has been designed to fulfill the requirements a program needs in order to be executed, an image is generated from the Dockerfile, named and tagged with a version, and finally pushed to a container registry. Throughout this article we will use Docker Hub as the reference registry for the sake of simplicity.
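As a minimal sketch of that workflow (the image name, tag, and `app.sh` script are hypothetical), a trivial Dockerfile might look like:

```dockerfile
# A small Debian base with one runtime dependency installed
FROM debian:buster-slim

# Install the single dependency this hypothetical program needs,
# then clean the apt cache to keep the image small
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl && \
    rm -rf /var/lib/apt/lists/*

# Bundle the program itself into the image
COPY app.sh /usr/local/bin/app.sh
ENTRYPOINT ["/usr/local/bin/app.sh"]
```

The image would then be built, tagged, and pushed with something like `docker build -t myorg/myapp:1.0.0 .` followed by `docker push myorg/myapp:1.0.0`, where `myorg` stands in for a Docker Hub account.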
An image name should convey which application requirements the image fulfills in order to run a certain program, which may or may not include the binary to be executed. The majority of images do include it, e.g. the MySQL image on Docker Hub.
Until now, everything seems as trivial as making code changes and pushing them for review to GitHub. The problems begin when you have to work with variants of these images, with complex distributed systems, or with potentially hundreds of developers.
Once you have rolled out an image for developers to use, images start becoming scattered, not by image name but by tag, to fulfill multiple requirements: support for multiple processor architectures (e.g. ARM, x86-64, MIPS) or for different OS distributions (e.g. Debian jessie, stretch, or buster).
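One common way to keep distribution variants in a single Dockerfile is a build argument. As a sketch (the `DISTRO` argument and tag scheme are illustrative), one file can be built once per variant:

```dockerfile
# One Dockerfile parameterized over the Debian release.
# Built per variant and tagged accordingly, e.g.
#   myorg/myapp:1.0-buster and myorg/myapp:1.0-stretch
ARG DISTRO=buster
FROM debian:${DISTRO}-slim

RUN apt-get update && \
    apt-get install -y --no-install-recommends ca-certificates && \
    rm -rf /var/lib/apt/lists/*
```

A stretch variant would be produced with `docker build --build-arg DISTRO=stretch -t myorg/myapp:1.0-stretch .`; architecture variants can be handled similarly with `docker buildx build --platform linux/arm64`, for example.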
But if these images share common code, such as proxy configuration, apt source lists, PKI certificates, or even SSH keys, or dependencies common to all of the variants, Dockerfile developers tend to create one holy grail monolith Dockerfile that isolates those configurations and dependencies, and then make every variant extend from the resulting “base” image, so developers won’t have to perform those configurations repeatedly.
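A sketch of the pattern, with hypothetical image names, proxy address, and certificate:

```dockerfile
# base/Dockerfile -- the "holy grail" base image holding shared setup
FROM debian:buster-slim

# Shared corporate configuration every variant would otherwise repeat
# (the proxy host and CA certificate below are made up for illustration)
ENV http_proxy=http://proxy.internal.example:3128 \
    https_proxy=http://proxy.internal.example:3128
COPY internal-ca.crt /usr/local/share/ca-certificates/internal-ca.crt
RUN apt-get update && \
    apt-get install -y --no-install-recommends ca-certificates && \
    update-ca-certificates && \
    rm -rf /var/lib/apt/lists/*

# variant/Dockerfile -- a separate file extending the base image,
# adding only what this one variant needs on top
# FROM myorg/base:1.0
# RUN apt-get update && apt-get install -y --no-install-recommends python3
```

Each variant then inherits the proxy and certificate setup for free, which is exactly what makes the base image so attractive at first.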
Problems arise with this approach. In large teams especially, the source of the base Dockerfile can become a mystery hidden somewhere in your SCM, which makes it hard to maintain when new needs appear, such as adding more dependencies or changing configuration. You then start treating the symptom instead of the problem: generating more images, each extended from the monolith and fulfilling only one specific need, and losing track of any changes performed in this “base” image.
I have seen examples in which a Dockerfile extends three or four custom images (picture it as a chain of dependent images) and itself fulfills the dependencies of five different distributed systems, for instance running different binaries in one container to achieve a single output.
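The chain described above looks something like the following sketch (every image name, and the supervisor script, is illustrative):

```dockerfile
# Each image extends the previous one; tracing a change back to its
# origin means walking the whole chain:
#   myorg/os-base:1.0 -> myorg/corp-base:2.3 -> myorg/runtime-base:0.9
#   -> myorg/team-base:4.1 -> this application Dockerfile
FROM myorg/team-base:4.1

# Binaries belonging to several different systems end up in one image,
# started together by a single supervisor script
COPY services/ /opt/services/
ENTRYPOINT ["/opt/services/run-all.sh"]
```

A change four levels up in `myorg/os-base` silently reaches every image below it, which is what makes this chain so hard to reason about.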