From Learn Docker in a Month of Lunches by Elton Stoneman
This article delves into multi-stage Dockerfiles and how they work.
Who needs a build server when you have a Dockerfile?
Building software on your laptop is something you do for local development, but when you’re working in a team there’s a more rigorous delivery process. A shared source control system like GitHub allows everyone to push their code changes, and typically a separate server (or online service) builds the software when changes get pushed.
That process exists to catch problems early. If a developer forgets to add a file when they push code, the build fails on the build server and the team gets alerted. It keeps the project healthy, but the cost is the maintenance of the build server. Most programming languages need a lot of tools to build projects—figure 1 shows some examples.
Figure 1. Everyone needs the same set of tools to build a software project
A new starter on the team will spend the whole of their first day installing the tools, which is a big maintenance overhead. If a developer updates their local tools to a different version from the build server, the build can fail. You have the same issues even if you’re using a managed build service, and there may be a limited set of tools you can install.
It’s cleaner to package the build toolset once and share it – which is exactly what you can do with Docker. You can write a Dockerfile which scripts the deployment of all your tools, and build that into an image. Then you can use that image in your application Dockerfiles to compile the source code, and the final output is your packaged application.
Let’s start with a simple example, because there are a couple of new things to understand in this process. Code listing 1 shows a Dockerfile with the basic workflow:
Code listing 1. A multi-stage Dockerfile
FROM diamol/base AS build-stage
RUN echo 'Building...' > /build.txt

FROM diamol/base AS test-stage
COPY --from=build-stage /build.txt /build.txt
RUN echo 'Testing...' >> /build.txt

FROM diamol/base
COPY --from=test-stage /build.txt /build.txt
CMD cat /build.txt
This is called a multi-stage Dockerfile, because there are several stages to the build. Each stage starts with a FROM instruction, and you can optionally give a stage a name with the AS parameter. There are three stages here: build-stage, test-stage, and the final unnamed stage. Although there are multiple stages, the output is a single Docker image with the contents of the final stage.
Each stage runs independently, but you can copy files and directories from previous stages. I’m using the COPY instruction with the --from argument, which tells Docker to copy files from an earlier stage in the Dockerfile, rather than from the filesystem of the host computer. In this example I generate a file in the build stage, copy it into the test stage, and then copy the file from the test stage into the final stage.
There’s one new instruction here, which I use to write files: RUN. The RUN instruction executes a command inside a container during the build, and any output from that command gets saved in the image layer. You can execute anything in a RUN instruction, but the commands you want to run need to exist in the Docker image you’re using in the FROM instruction. In this example I use diamol/base as the base image, and that contains the echo command, so I know my RUN instructions will work.
Figure 2 shows what’s going to happen when we build this Dockerfile—Docker runs the stages sequentially:
Figure 2. Executing a multi-stage Dockerfile
It’s important to understand that the individual stages are isolated. You can use different base images with different sets of tools installed and run whatever commands you like. The output in the final stage only contains what you explicitly copy in from earlier stages. If a command fails in any stage, that fails the whole build.
You’ll see that the build executes the steps in the order of the Dockerfile, which gives the sequential build through the stages you see in figure 3:
Figure 3. Building a multi-stage Dockerfile
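You can try the build yourself. This is a sketch, assuming the Dockerfile from code listing 1 is saved in your current directory—the image tag multi-stage is my own choice, not something from the listing:

```shell
# build the image - Docker runs the stages in order
docker image build -t multi-stage .

# run a container - the CMD in the final stage prints the file
docker container run multi-stage
```

If all the stages succeed, the container should print the two lines written during the build and test stages: 'Building...' followed by 'Testing...'.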
This is a simple example, but the pattern is the same to build apps of any complexity with a single Dockerfile. Figure 4 shows what the workflow looks like for a Java application:
Figure 4. A multi-stage build for a Java application
In the build stage you use a base image which has your application’s build tools installed. You copy in the source code from your host machine and run the build command. You can add a test stage to run unit tests, which uses a base image with the test framework installed, copies the compiled binaries from the build stage and runs the tests. The final stage starts from a base image with only the application runtime installed, and it copies the binaries from the build stage which have been successfully tested in the test stage.
This approach makes your application truly portable. You can run the app in a container anywhere, but you can also build the app anywhere—Docker is the only prerequisite. Your build server needs Docker installed, new team members get set up in minutes, and the build tools are all centralized in Docker images, so there’s no chance of getting out of sync.
All the major application frameworks already have public images on Docker Hub with the build tools installed, and separate images with the application runtime. You can use these images directly, or wrap them in your own images. You’ll get the benefit of using all the latest updates with images which are maintained by the project team.
App walkthrough: Java source code
We’re moving on to a real example now, with a simple Java Spring Boot application that we’ll build and run using Docker. You don’t need to be a Java developer or have any Java tools installed on your machine to use this app; everything you need comes in Docker images. If you don’t work with Java, you should still read through this section—it describes a pattern that works for other compiled languages like .NET Core and Erlang.
The source code is in the repository for the book, at the folder path ch04/exercises/image-of-the-day. The application uses a fairly standard set of tools for Java: Maven, which is used to define the build process and fetch dependencies, and OpenJDK, which is a freely distributable Java runtime and developer kit. Maven uses an XML format to describe the build, and the Maven command line is called mvn. This should be enough information to make sense of the application Dockerfile in code listing 2:
Code listing 2. Dockerfile for building a Java app with Maven
FROM diamol/maven AS builder

WORKDIR /usr/src/iotd
COPY pom.xml .
RUN mvn -B dependency:go-offline

COPY . .
RUN mvn package

# app
FROM diamol/openjdk

WORKDIR /app
COPY --from=builder /usr/src/iotd/target/iotd-service-0.1.0.jar .

EXPOSE 80
ENTRYPOINT ["java", "-jar", "/app/iotd-service-0.1.0.jar"]
All the Dockerfile instructions here are ones you’ve seen before, and the patterns are familiar from examples you’ve already built. It’s a multi-stage Dockerfile, which you can tell because there is more than one FROM instruction, and the steps are laid out to get maximum benefit from Docker’s image layer cache. The first stage is called builder. Here’s what happens in the builder stage:
- it uses the diamol/maven image as the base. That image has the OpenJDK Java development kit installed, as well as the Maven build tool
- the builder stage starts by creating a working directory in the image, and then copies in the pom.xml file, which is the Maven definition of the Java build
- the first RUN statement executes a Maven command, fetching all the application dependencies. This is an expensive operation, so it has its own step to make use of Docker layer caching. If there are new dependencies, the XML file changes and the step runs. If the dependencies haven’t changed, the layer cache is used
- now the rest of the source code is copied in: COPY . . means copy all files and directories from the location where the Docker build is running into the working directory in the image
- the last step of the builder is to run mvn package, which compiles and packages the application. The input is a set of Java source code files, and the output is a Java application package called a JAR file
When this stage completes, the compiled application exists in the builder stage filesystem. If there are any problems with the Maven build—if the network is offline and fetching dependencies fails, or if there’s a coding error in the source—then the RUN instruction fails, and the whole build fails.
If the builder stage completes successfully, Docker goes on to execute the final stage, which produces the application image:

- it starts from diamol/openjdk, which is packaged with the Java 11 runtime but none of the Maven build tools
- this stage creates a working directory and copies in the compiled JAR file from the builder stage. Maven packages the application and all its Java dependencies in this single JAR file, so it is all you need from the builder
- the application is a web server which listens on port 80, so that port is exposed in the container image
- the ENTRYPOINT instruction is an alternative to the CMD instruction—it tells Docker what to do when a container is started from the image, in this case running Java with the path to the application JAR
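Running the build is then a single command. This is a sketch, assuming you’re in the folder with the Dockerfile from code listing 2—the image tag image-of-the-day is my own choice:

```shell
# build the Java application image - both stages run in order
docker image build -t image-of-the-day .
```

The period at the end is important: it tells Docker to use the current directory as the build context, which is where the COPY instructions read files from.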
This build creates a lot of output because you’ll see all the logs from Maven, fetching dependencies and running through the Java build. Figure 5 shows an abbreviated section of my build:
Figure 5. Output from running a Maven build in Docker
What have you built? It’s a simple REST API which wraps access to NASA’s Astronomy Picture of the Day service. The Java app fetches the details of today’s picture from NASA and caches it, and you can make repeated calls to this application without repeatedly hitting NASA’s service.
When you run several containers, they need to communicate with each other. Containers access each other across a virtual network, using the virtual IP address that Docker allocates when it creates the container. You can create and manage virtual Docker networks through the command line.
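As a sketch, creating a network with the name nat—the name the next paragraph expects—looks like this:

```shell
# create a virtual Docker network for containers to communicate
docker network create nat
```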
If you see an error from that command, it’s because your setup already has a Docker network called nat, and you can ignore the message. Now when you run containers you can explicitly connect them to that Docker network using the --network flag—and any containers on that network can reach each other using the container names.
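A sketch of running the app, assuming you built the image with the tag image-of-the-day—the container name iotd is my own choice, and publishing host port 800 to the container’s port 80 matches the URL used in the next step:

```shell
# run the app detached, on the nat network, publishing port 800
docker container run --name iotd -d -p 800:80 --network nat image-of-the-day
```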
Now you can browse to http://localhost:800 and you’ll see some JSON details about NASA’s image of the day. On the day I ran the container, the image was from a solar eclipse—figure 6 shows the details from my API:
Figure 6. The cached details from NASA in my application container
The application in this container isn’t important, but what is important is that you can build this on any machine with Docker installed by having a copy of the source code with the Dockerfile. You don’t need any build tools installed, you don’t need a specific version of Java—you clone the code repo and you’re a couple of Docker commands away from running the app.
One other thing to be clear on: the build tools aren’t part of the final application image. You can run an interactive container from your new image-of-the-day Docker image, and you’ll find there’s no mvn command inside. Only the contents of the final stage in the Dockerfile get made into the application image; anything you want from previous stages needs to be explicitly copied in that final stage.
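You can check that yourself. This is a sketch, assuming the image tag image-of-the-day and that the runtime base image includes a shell—the --entrypoint flag overrides the Java entrypoint so you get an interactive session instead:

```shell
docker container run -it --entrypoint /bin/sh image-of-the-day

# inside the container, this search for the Maven tool comes back empty:
which mvn
```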
That’s all for this article.