Docker in Action, 2e

From Docker in Action, Second Edition by Jeff Nickoloff and Stephen Kuenzli

This article tells you how Docker can help you simplify software installation.


Installation files and isolation

Understanding how images are identified, discovered, and installed is a minimum proficiency for a Docker user. If you understand what files are installed and how those files are built and isolated at runtime, you’ll be able to answer more difficult questions that come up with experience, such as these:

  • What image properties factor into download and installation speeds?
  • What are all these unnamed images which are listed when I use the docker images command?
  • Why does output from the docker pull command include messages about pulling dependent layers?
  • Where are the files I wrote to my container’s file system?

Learning this material is the third and final step to understanding software installation with Docker, as illustrated in figure 1.

Figure 1 Step 3—Understanding how software is installed

When I write about installing software, I use the term image. That implies that the software you’re going to use is in a single image and that an image is contained within a single file. Although this may occasionally be accurate, most of the time what I’ve been calling an image is a collection of image layers. A layer is an image that is related to at least one other image. It’s easier to understand layers when you see them in action.

Image layers in action

In this example you’re going to install two images. Both depend on Java 6, and both applications are simple Hello World–style programs. Keep an eye on what Docker does when you install each one: notice how long the first takes compared to the second, and read what Docker prints to the terminal. When an image is installed, you can watch Docker determine which dependencies it needs to download and then follow the progress of each image layer download. Java works well for this example because its layers are quite large, which gives you a moment to see Docker in action.

The two images you’re going to install are dockerinaction/ch3_myapp and dockerinaction/ch3_myotherapp. You should use the docker pull command because you only need to see the images install, not start a container from them. Here are the commands you should run:

 docker pull dockerinaction/ch3_myapp
 docker pull dockerinaction/ch3_myotherapp

Did you see it? Unless your network connection is far better than mine, or you had already installed Java 6 as a dependency of some other image, the download for dockerinaction/ch3_myapp should have been much slower than the download for dockerinaction/ch3_myotherapp.

When you installed ch3_myapp, Docker determined that it needed to install the openjdk-6 image because it’s the direct dependency (parent layer) of the requested image. When Docker went to install that dependency, it discovered the dependencies of that layer and downloaded those first. A layer is installed only after all of its dependencies are installed. Finally, openjdk-6 was installed, and then the tiny ch3_myapp layer was installed.

When you issued the command to install ch3_myotherapp, Docker identified that openjdk-6 was already installed and immediately installed the image for ch3_myotherapp. This was simpler, and because less than one megabyte of data was transferred, it was faster. But again, to the user it was an identical process.
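The install order described above can be modeled in a few lines of Python. This is a toy sketch, not how Docker is implemented. The Layer and LocalStore names, the "base-os" layer, and all sizes in megabytes are hypothetical. Installing a layer first installs its parent chain and skips anything already present, so the second install transfers only its own small layer.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Layer:
    """One image layer; parent is None for a root layer. All values here are hypothetical."""
    layer_id: str
    size_mb: float
    parent: Optional["Layer"] = None


@dataclass
class LocalStore:
    """Toy stand-in for the local image store: tracks installed layers and data pulled."""
    installed: set = field(default_factory=set)
    downloaded_mb: float = 0.0

    def install(self, layer: Layer, log: list) -> None:
        # Already present locally; in this model its ancestors are present too.
        if layer.layer_id in self.installed:
            log.append(f"{layer.layer_id}: already installed")
            return
        # Dependencies first: a layer counts as installed only after its parents are.
        if layer.parent is not None:
            self.install(layer.parent, log)
        self.downloaded_mb += layer.size_mb
        self.installed.add(layer.layer_id)
        log.append(f"{layer.layer_id}: downloaded {layer.size_mb} MB")


# Hypothetical sizes in MB; only the shape of the dependency tree matters.
base = Layer("base-os", 100.0)
jdk = Layer("openjdk-6", 250.0, parent=base)
myapp = Layer("ch3_myapp", 0.5, parent=jdk)
myotherapp = Layer("ch3_myotherapp", 0.5, parent=jdk)

store = LocalStore()
log1, log2 = [], []
store.install(myapp, log1)
first = store.downloaded_mb
store.install(myotherapp, log2)
second = store.downloaded_mb - first
print(first, second)  # 350.5 0.5
```

The second call logs "openjdk-6: already installed" and downloads only the 0.5 MB application layer. This mirrors the slow first pull and fast second pull you observed.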

From a user’s perspective this ability is nice to have, but you wouldn’t want to try to optimize for it; take the benefits where they happen to work out. From the perspective of a software or image author, however, this ability should be a major factor in your image design.

If you run docker images now, you’ll see the following repositories listed:

  • dockerinaction/ch3_myapp
  • dockerinaction/ch3_myotherapp
  • java:6

By default, the docker images command only shows you repositories. Similar to other commands, if you specify the -a flag, the list includes every installed intermediate image or layer. Running docker images -a shows a list that includes several repositories listed as <none>. The only way to refer to these is to use the value in the IMAGE ID column.
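The following hypothetical Python model makes the difference between the two listings concrete. The image IDs and the list_images helper are invented for illustration. Real docker images output has more columns and rules. The sketch captures only the behavior described above: untagged layers are hidden by default and are shown as <none> when you ask for everything.

```python
from typing import List, Optional, Tuple

# Hypothetical records: (short image ID, repository name, or None for an untagged layer).
IMAGES: List[Tuple[str, Optional[str]]] = [
    ("3f1a9c2b7d4e", "dockerinaction/ch3_myapp"),
    ("8b0e6d5c4a31", "dockerinaction/ch3_myotherapp"),
    ("c7d2e91f0a68", "java:6"),
    ("51e4b8a0d2f9", None),
    ("0a9d3c6e8f12", None),
]


def list_images(records, show_all=False):
    """Rough model of `docker images` (named only) versus `docker images -a` (everything)."""
    rows = []
    for image_id, repo in records:
        if repo is None and not show_all:
            continue  # untagged intermediate layers are hidden unless -a is given
        rows.append((repo or "<none>", image_id))
    return rows


for repo, image_id in list_images(IMAGES, show_all=True):
    print(f"{repo:<32} {image_id}")
```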

In this example you installed two images directly, but a third parent repository was installed as well. You’ll need to clean up all three. You can do this more easily if you use the condensed docker rmi syntax:

 docker rmi \
     dockerinaction/ch3_myapp \
     dockerinaction/ch3_myotherapp \
     java:6

The docker rmi command allows you to specify a space-separated list of images to be removed. This comes in handy when you need to remove a small set of images after an example. I’ll be using this when appropriate throughout the rest of the examples in this article.

Layer relationships

Images maintain parent/child relationships: each image builds on its parent, and together they form layers. The files available to a container are the union of all the layers in the lineage of the image the container was created from. Images can have relationships with any other image, including images in different repositories with different owners. Both example images use a Java 6 image as their parent. Figure 2 illustrates the full image ancestry of both images.
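The union described above can be sketched as a dictionary merge. This hypothetical Python model uses made-up layer names, paths, and contents. Each layer maps paths to contents. A container’s view is built by applying layers from the root ancestor down to the image itself, so a child’s file shadows a parent’s file at the same path. Real union file systems also handle deletions and file metadata, which this sketch ignores.

```python
from typing import Dict, List, Optional

# Hypothetical lineage: each image name maps to its parent (None marks the root).
PARENTS: Dict[str, Optional[str]] = {
    "ch3_myapp": "java6",
    "ch3_myotherapp": "java6",
    "java6": "base",
    "base": None,
}

# Hypothetical layer contents: path -> file contents.
LAYER_FILES: Dict[str, Dict[str, str]] = {
    "base": {"/bin/sh": "shell", "/usr/lib/libc.so": "libc v1"},
    "java6": {"/usr/bin/java": "java 6 runtime", "/usr/lib/libc.so": "libc v2"},
    "ch3_myapp": {"/app/hello.jar": "myapp"},
    "ch3_myotherapp": {"/app/hello.jar": "myotherapp"},
}


def lineage(image: str) -> List[str]:
    """Return the image followed by each ancestor up to the root."""
    chain = []
    current: Optional[str] = image
    while current is not None:
        chain.append(current)
        current = PARENTS[current]
    return chain


def union_view(image: str) -> Dict[str, str]:
    """Apply layers root-first so a child's file shadows its parent's at the same path."""
    view: Dict[str, str] = {}
    for layer in reversed(lineage(image)):
        view.update(LAYER_FILES[layer])
    return view


print(lineage("ch3_myapp"))  # ['ch3_myapp', 'java6', 'base']
print(union_view("ch3_myapp"))
```

Both application images share the java6 and base layers, yet each sees its own /app/hello.jar. Each also sees the "libc v2" file from java6 rather than the older copy in base.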

Figure 2 The full lineage of the two Docker images

The layers shown in figure 2 are a sample of the java:6 image at the time of this writing. An image is named when its author tags and publishes it, and a user can create aliases with the docker tag command. Until an image is tagged, the only way to refer to it is by the unique identifier (UID) that was generated when the image was built. In figure 2, the parents of the common Java 6 image are labeled using the first twelve digits of their UIDs. These layers contain common libraries and dependencies of the Java 6 software. Docker truncates the UID from sixty-four hexadecimal (base 16) digits to twelve for the benefit of its human users; internally and through API access, Docker uses all sixty-four. Keep this in mind when the docker images command lists unnamed images alongside the ones you installed. These are parent layers, not a sign that something bad happened or that malicious software made it onto your computer.
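Python’s hashlib shows the relationship between the full and short identifiers. This sketch assumes a current Docker release, where image IDs are SHA-256 digests written as sixty-four hexadecimal characters. The input bytes and the resolve helper are hypothetical. They only illustrate how a twelve-character prefix can stand in for the full ID as long as it stays unambiguous.

```python
import hashlib

# Hypothetical input bytes; Docker computes real IDs from the image's own data.
full_id = hashlib.sha256(b"example image configuration").hexdigest()
short_id = full_id[:12]  # the truncated form shown to human users

print(len(full_id), len(short_id))  # 64 12
print(full_id.startswith(short_id))  # True


def resolve(prefix, known_ids):
    """Hypothetical helper: expand a prefix to the one full ID it matches, or fail."""
    matches = [i for i in known_ids if i.startswith(prefix)]
    if len(matches) != 1:
        raise LookupError(f"{prefix!r} matches {len(matches)} IDs")
    return matches[0]


other_id = hashlib.sha256(b"another image configuration").hexdigest()
print(resolve(short_id, [full_id, other_id]) == full_id)
```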

The Java images are sizable. At the time of this writing, the openjdk-6 image is 348 MB, and the openjdk-7 image is 590 MB. You get some space savings when you use the runtime-only images, but even openjre-6 is 200 MB. Again, Java was chosen here because its images are particularly large for a common dependency.

Container file system abstraction and isolation

Programs running inside containers know nothing about image layers. From inside a container, the file system operates as though it’s not running in a container or operating on an image. From the perspective of the container, it has exclusive copies of the files provided by the image. This is made possible with something called a union file system. Docker uses a variety of union file systems and selects the best fit for your system. The details of how the union file system works are beyond what you need to know to use Docker effectively.

A union file system is part of a critical set of tools that combine to create effective file system isolation. The other tools are MNT namespaces and the chroot system call.

The union file system is used to create mount points on your host’s file system that abstract the use of layers. The layers it creates are what get bundled into Docker image layers. Likewise, when a Docker image is installed, its layers are unpacked and appropriately configured for use by the specific file system provider chosen for your system.

The Linux kernel provides a namespace for the MNT system. When Docker creates a container, that new container gets its own MNT namespace, and a new mount point for the image is created within that namespace.

Lastly, chroot is used to make the root of the image file system the root in the container’s context. This prevents anything running inside the container from referencing any other part of the host file system.

Using chroot and MNT namespaces is common among container technologies. Adding a union file system to the recipe gives Docker containers several benefits.

Benefits of this toolset and file system structure

The first and perhaps most important benefit of this approach is that common layers need to be installed only once. If you install any number of images and they all depend on some common layer, that common layer and all of its parent layers need to be downloaded or installed only once. This means you might be able to install several specializations of a program without storing redundant files on your computer or downloading redundant layers. By contrast, most virtual machine technologies store the same files as many times as you have redundant virtual machines on a computer.
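A quick back-of-the-envelope calculation shows why this matters. The sketch below borrows the 348 MB openjdk-6 figure quoted in this article. The 1 MB application layers and the count of ten images are assumptions. The duplicated case is a simplified stand-in for storing a full copy of the base per virtual machine.

```python
def storage_mb(base_mb, app_layer_mb, n_images):
    """Return (shared, duplicated) storage in MB for n images built on one base."""
    shared = base_mb + n_images * app_layer_mb        # base layers stored once
    duplicated = n_images * (base_mb + app_layer_mb)  # each copy carries its own base
    return shared, duplicated


# Assumed sizes: 348 MB base (the openjdk-6 figure) and 1 MB per application layer.
shared, duplicated = storage_mb(base_mb=348, app_layer_mb=1, n_images=10)
print(shared, duplicated)  # 358 3490
```

With a single image the two strategies cost the same. The gap grows by the size of the base for every additional image that shares it.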

Second, layers provide a coarse tool for managing dependencies and separating concerns. This is handy for software authors. From a user perspective, this benefit helps you quickly identify what software you’re running by examining which images and layers you’re using.

Lastly, it’s easy to create software specializations when you can layer minor changes on top of some basic image. Providing specialized images helps users get exactly what they need from software with minimal customization. This is one of the best reasons to use Docker.

Weaknesses of union file systems

Docker selects sensible defaults when it starts, but no implementation is perfect for every workload. In fact, there are some specific use cases in which you should pause and consider using another Docker feature.

Different file systems have different rules about file attributes, sizes, names, and characters. Union file systems are in a position where they often need to translate between the rules of different file systems. In the best cases they’re able to provide acceptable translations. In the worst cases features are omitted. For example, neither btrfs nor OverlayFS provides support for the extended attributes that make SELinux work.

Union file systems use a pattern called copy-on-write, and that makes implementing memory-mapped files (the mmap() system call) difficult. Some union file systems provide implementations that work under the right conditions, but it may be a better idea to avoid memory-mapping files from an image.
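Copy-on-write itself is simple to model. In this hypothetical Python sketch, the class name, paths, and contents are invented, and no real storage driver works at this level. Two containers share one set of image files. The first write to a file copies it into the writing container’s private layer, so neither the other container nor the image sees the change. The copy-up step offers an intuition for why memory mapping is awkward: a program that mapped the image’s file is looking at the original, not at a copy created by a later write. That is an intuition only, not a description of any particular storage driver.

```python
class CopyOnWriteView:
    """Toy model: shared read-only image files under one container's writable layer."""

    def __init__(self, image_files):
        self.image_files = image_files  # shared between containers, never modified here
        self.upper = {}                 # this container's private writable layer
        self.copy_ups = 0               # how many files were copied up from the image

    def read(self, path):
        # Prefer the container's copy; otherwise fall through to the image.
        if path in self.upper:
            return self.upper[path]
        return self.image_files[path]

    def append(self, path, data):
        # Assumes the file exists in the image. The first write copies the whole
        # file up into the container's layer before changing it.
        if path not in self.upper:
            self.upper[path] = self.image_files[path]
            self.copy_ups += 1
        self.upper[path] += data


image_files = {"/etc/app.conf": "mode=prod\n"}
container_a = CopyOnWriteView(image_files)
container_b = CopyOnWriteView(image_files)

container_a.append("/etc/app.conf", "debug=true\n")
container_a.append("/etc/app.conf", "verbose=true\n")

print(repr(container_a.read("/etc/app.conf")))  # 'mode=prod\ndebug=true\nverbose=true\n'
print(repr(container_b.read("/etc/app.conf")))  # 'mode=prod\n'
print(container_a.copy_ups)                     # 1
```

Only the first write pays for the copy. For a large file, that single copy-up can still be expensive.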

The backing file system is another pluggable feature of Docker. You can determine which file system your installation is using with the docker info subcommand. If you want to tell Docker explicitly which file system to use, pass the --storage-driver or -s option when you start the Docker daemon. Most issues that arise with writing to the union file system can be addressed without changing the storage provider; they can be solved with volumes.
