From Docker in Action, Second Edition by Jeff Nickoloff and Stephen Kuenzli
This article gives you a rundown of what Docker is and what problems it solves.
What problems does Docker solve?
Using software is complex. Before installation you must consider what operating system you’re using, the resources the software requires, what other software is already installed, and what other software it depends on. You need to decide where it should be installed. Then you need to know how to install it. It’s surprising how drastically installation processes vary today. The list of considerations is long and unforgiving. Installing software is, at best, inconsistent and overcomplicated. The problem is only worsened if you want to make sure that several machines use a consistent set of software over time.
Package managers such as apt, brew, yum, and npm attempt to manage this, but few of them provide any degree of isolation. Most computers have more than one application installed and running. And most applications have dependencies on other software. What happens when applications you want to use don’t play well together? Disaster. Things are only made more complicated when applications share dependencies:
- What happens if one application needs an upgraded dependency but the other doesn’t?
- What happens when you remove an application? Is it gone?
- Can you remove old dependencies?
- Can you remember all the changes you had to make to install the software you now want to remove?
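The first and third questions above can be made concrete with a small sketch. The application names, library names, and version numbers are hypothetical, and the model is deliberately simplified: it assumes one shared install location that holds exactly one version of each library. It does not describe how any particular package manager works.

```python
# Simplified, hypothetical model of a shared system: every application
# pins the library versions it needs, and the system can hold only one
# version of each library at a time.
from typing import Dict, List, Set, Tuple

# All names and versions below are made up for illustration.
REQUIREMENTS: Dict[str, Dict[str, str]] = {
    "app_a": {"libssl": "1.0"},
    "app_b": {"libssl": "1.1"},
    "app_c": {"libpng": "1.6"},
}


def find_conflicts(
    requirements: Dict[str, Dict[str, str]],
) -> List[Tuple[str, List[str]]]:
    """Return the libraries for which applications demand different versions."""
    wanted: Dict[str, Set[str]] = {}
    for pins in requirements.values():
        for library, version in pins.items():
            wanted.setdefault(library, set()).add(version)
    return [
        (library, sorted(versions))
        for library, versions in sorted(wanted.items())
        if len(versions) > 1
    ]


def removable_after_uninstall(
    requirements: Dict[str, Dict[str, str]], app: str
) -> List[str]:
    """Return the libraries no remaining application needs once `app` is gone."""
    still_needed = {
        library
        for name, pins in requirements.items()
        if name != app
        for library in pins
    }
    return sorted(set(requirements[app]) - still_needed)


if __name__ == "__main__":
    print(find_conflicts(REQUIREMENTS))
    print(removable_after_uninstall(REQUIREMENTS, "app_a"))
    print(removable_after_uninstall(REQUIREMENTS, "app_c"))
```

In this toy model, app_a and app_b cannot both be satisfied by a single shared copy of libssl, and removing app_a frees nothing because app_b still needs that library. Answering the same questions on a real machine requires exactly this kind of bookkeeping, which is rarely available.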
The truth is that the more software you use, the more difficult it is to manage. Even if you can spend the time and energy required to figure out installing and running applications, how confident can you be about your security? Open and closed source programs release security updates continually, and being aware of all of the issues is often impossible. The more software you run, the greater the risk that it’s vulnerable to attack.
Even enterprise-grade service software must be deployed with dependencies. It’s common for those projects to be shipped with and deployed to machines with hundreds, if not thousands, of files and other programs. Each of those creates a new opportunity for conflict, vulnerability, or licensing liability.
All of these issues can be solved with careful accounting, management of resources, and logistics, but those are mundane and unpleasant things to deal with. Your time is better spent using the software that you’re trying to install, upgrade, or publish. The people who built Docker recognized that, and thanks to their hard work you can breeze through the solutions with minimal effort in almost no time at all.
It’s possible that most of these issues seem acceptable today. Maybe they feel trivial because you’re used to them. After reading how Docker makes these issues approachable, you may notice a shift in your opinion.
Without Docker, a computer can end up looking like a junk drawer. Applications have many types of dependencies. Some applications depend on specific system libraries for common things like sound, networking, or graphics. Others depend on standard libraries for the language they’re written in. Some depend on other applications, such as how a Java program depends on the Java Virtual Machine or a web application might depend on a database. It’s common for a running program to require exclusive access to some scarce resource such as a network connection or a file.
Today, without Docker, applications are spread all over the file system and end up creating a messy web of interactions. Figure 1 illustrates how example applications depend on example libraries without Docker.
Figure 1 Dependency relationships of example programs
Figure 2 Example programs running inside containers with copies of their dependencies
With Docker, in contrast, the required software can be installed automatically, and that same software can be reliably removed with a single command. Docker keeps things organized by isolating everything with containers and images. Figure 2 illustrates these same applications and their dependencies running inside containers. With the links broken and each application neatly contained, understanding the system is an approachable task. At first it seems like this introduces storage overhead by creating redundant copies of common dependencies like gcc.
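As a rough illustration of installing and removing software with single commands, the sketch below only composes Docker command lines; it does not execute them. The image name nginx:latest and the container name web are placeholder values, and the helper functions are hypothetical names introduced for this example.

```python
# Sketch: compose the Docker CLI commands that start a program in a
# container and later remove that container. Nothing is executed here.
import shlex
from typing import List


def run_command(image: str, name: str) -> List[str]:
    """Start `image` in the background as a container called `name`.

    If the image is not present locally, `docker run` downloads it first,
    so the program and its dependencies arrive together.
    """
    return ["docker", "run", "--detach", "--name", name, image]


def remove_command(name: str) -> List[str]:
    """Stop the container called `name` if it is running, then delete it."""
    return ["docker", "rm", "--force", name]


if __name__ == "__main__":
    # Placeholder image and container names, chosen only for illustration.
    for argv in (run_command("nginx:latest", "web"), remove_command("web")):
        print(shlex.join(argv))
        # To run a command for real: subprocess.run(argv, check=True)
```

Note that `docker rm` deletes the container only. The downloaded image stays in the local cache until it is removed separately, for example with `docker rmi`.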
Another software problem is that an application’s dependencies typically include a specific operating system. Portability between operating systems is a major problem for software users. Although it’s possible to have compatibility between Linux software and Mac OS X, using that same software on Windows can be more difficult. Doing this can require building whole ported versions of the software. Even this is only possible if suitable replacement dependencies exist for Windows. This represents a major effort for the maintainers of the application and it’s frequently skipped. Unfortunately for users, a whole wealth of powerful software is too difficult or impossible to use on their system.
At present, Docker runs natively on Linux and comes with a single virtual machine for OS X and Windows environments. This convergence on Linux means that software running in Docker containers need only be written once against a consistent set of dependencies. You might have thought to yourself, “Wait a minute. You told me that Docker is better than virtual machines.” This is correct, but they’re complementary technologies. Using a virtual machine to contain a single program is wasteful. This is especially true when you’re running several virtual machines on the same computer. On OS X and Windows, Docker uses a single, small virtual machine to run all the containers. By taking this approach, the overhead of running a virtual machine is fixed and the number of containers can scale up.
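The fixed-overhead argument can be worked through with simple arithmetic. The figures below are invented purely to show the shape of the comparison; they are not measurements of Docker or of any virtual machine.

```python
# Hypothetical numbers, chosen only to illustrate fixed versus per-program cost.
VM_OVERHEAD_MB = 512        # assumed memory cost of running one virtual machine
CONTAINER_OVERHEAD_MB = 10  # assumed memory cost of running one container


def one_vm_per_program(programs: int) -> int:
    """Total overhead when every program gets its own virtual machine."""
    return programs * VM_OVERHEAD_MB


def one_vm_many_containers(programs: int) -> int:
    """Total overhead when one virtual machine hosts a container per program."""
    return VM_OVERHEAD_MB + programs * CONTAINER_OVERHEAD_MB


if __name__ == "__main__":
    for count in (1, 10, 50):
        print(count, one_vm_per_program(count), one_vm_many_containers(count))
```

Under these assumed numbers, ten programs cost 5120 MB of overhead with one virtual machine each, but 612 MB with a single virtual machine hosting ten containers. The virtual machine is paid for once, and only the small per-container cost grows.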
This new portability helps users in a few ways. First, it unlocks a whole world of software that was previously inaccessible. Second, it’s now feasible to run the same software—exactly the same software—on any system. That means your desktop, your development environment, your company’s server, and your company’s cloud can all run the same programs. Running consistent environments is important. Doing this helps minimize any learning curve associated with adopting new technologies. It helps software developers better understand the systems that run their programs. It means fewer surprises. Third, when software maintainers can focus on writing their programs for a single platform and one set of dependencies, it’s a huge time-saver for them and a great win for their customers.
Without Docker or virtual machines, portability is commonly achieved at an individual program level by basing the software on some common tool. For example, Java lets programmers write a single program that mostly works on several operating systems because the programs rely on a program called a Java Virtual Machine (JVM). Although this is an adequate approach for writing software, other people, at other companies, wrote most of the software we use. For example, if there is a popular web server that I want to use, but it wasn’t written in Java or another similarly portable language, I doubt that the authors would take time to rewrite it for me. In addition to this shortcoming, language interpreters and software libraries create dependency problems. Docker improves the portability of every program regardless of the language it was written in, the operating system it was designed for, or the state of the environment where it’s running.
Protecting your computer
Most of what I’ve mentioned thus far are problems from the perspective of working with software, along with the benefits of doing so from outside a container. But containers also protect us from the software running inside a container. A program might misbehave or present a security risk in all sorts of ways:
- A program might have been written specifically by an attacker.
- Well-meaning developers could write a program with harmful bugs.
- A program could accidentally do the bidding of an attacker through bugs in its input handling.
Any way you cut it, running software puts the security of your computer at risk. Because running software is the whole point of having a computer, it’s prudent to apply practical risk mitigations.
Like physical jail cells, anything inside a container can only access things that are inside that container. Exceptions to this rule exist, but only when explicitly created by the user. Containers limit the scope of impact that a program can have on other running programs, the data it can access, and system resources. Figure 3 illustrates the difference between running software outside and inside a container.
Figure 3 Left: a malicious program with direct access to sensitive resources. Right: a malicious program inside a container.
What this means for you or your business is that the scope of any security threat associated with running a particular application is limited to the scope of the application itself. Creating strong application containers is complicated and a critical component of any defense-in-depth strategy. It’s far too commonly skipped or implemented in a half-hearted manner.
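To make the idea of limited scope, with exceptions granted only explicitly by the user, more concrete, here is a minimal sketch that composes a restrictive `docker run` command line without executing it. The function name, the resource limits, the /data mount point, and the example image and host path are all assumptions made for illustration. This is one possible set of restrictions, not a complete hardening recipe.

```python
# Sketch: build a `docker run` command that narrows what a program can reach.
# The command is only composed and printed, never executed.
import shlex
from typing import List, Optional


def locked_down_run(image: str, read_only_mount: Optional[str] = None) -> List[str]:
    """Return a `docker run` argument list with several restrictions applied."""
    argv = [
        "docker", "run", "--rm",
        "--read-only",         # the container's root file system is not writable
        "--cap-drop", "ALL",   # drop all Linux capabilities
        "--network", "none",   # no network access
        "--memory", "256m",    # cap memory use (assumed limit)
        "--pids-limit", "100", # cap the number of processes (assumed limit)
    ]
    if read_only_mount is not None:
        # An explicit, user-created exception: expose one host directory,
        # read-only, at the hypothetical path /data inside the container.
        argv += ["--volume", f"{read_only_mount}:/data:ro"]
    argv.append(image)
    return argv


if __name__ == "__main__":
    # Placeholder image name and host path.
    print(shlex.join(locked_down_run("alpine:latest")))
    print(shlex.join(locked_down_run("alpine:latest", "/srv/reports")))
```

The second call shows the exception mechanism: the program still cannot write to its file system or reach the network, but it can read the one directory the user chose to share.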