From Knative in Action by Jacques Chester

This article covers

•  Deploying a new Service with Knative Serving

•  Updating the Service with Revisions

•  Splitting traffic between Revisions

•  The major components of Serving and what they do


Serving is where I’m going to start you off in Knative. I’m going to spend this article getting you warmed up in two ways.

First, we’re going to use Knative. I’m going to put your fingers on a keyboard. We’ll use the kn CLI tool to deploy some software, change its settings, change its software and configure traffic. I won’t be doing any YAMLeering. We’ll try a purely interactive approach to Knative.

In the second part of the article, I’ll take a whistlestop tour of Serving’s key software components. I’m doing this now because I want to introduce them in one easy-to-find place. The article is structured around the concepts that Knative exposes to developers.

In this article I’ll apply basic concepts to explain the high-level architecture of Serving, which is one based on hierarchical control loops.

By the end of the article my goal is that you can start poking around kn with your own example apps, and you’ll have a nodding acquaintance with Knative Serving’s runtime components.

A walkthrough

In this section I’m going to use kn exclusively to demonstrate some Knative Serving capabilities. I assume you’ve installed it following the directions in the appendix.

kn is the “official” CLI for Knative, but it wasn’t the first. Before it came a number of alternatives, such as knctl[1]. These helped to explore different approaches to a CLI experience for Knative.

kn serves two purposes. The first is as a CLI specific to Knative, rather than requiring users to anxiously skitter around kubectl pretending that Kubernetes isn’t right there. The secondary purpose is to drive out Golang APIs for Knative.

Your first deployment

Let’s first use kn service list to ensure you’re in a clean state. You should see No services found as the response.

Now we create a service using kn service create.

Listing 1. Use kn to create our first service

 $ kn service create hello-example \
   --image \
   --env TARGET="First"
 Creating service 'hello-example' in namespace 'default':
   0.084s The Route is still working to reflect the latest desired specification.
   0.260s Configuration "hello-example" is waiting for a Revision to become ready.
   4.356s ...
   4.762s Ingress has not yet been reconciled.
   6.104s Ready to serve.
 Service 'hello-example' created with latest revision 'hello-example-pjyvr-1' and URL:

The first argument for kn service create is the name of the service.

The docker image reference. In this case we’re using a sample app image provided by the Knative project.

Inject an environment variable, TARGET, which is consumed by the sample app.

kn monitors the deployment process and emits logs

kn gives you the URL for the newly-deployed software

The Service you provide is split into a Configuration and Route. The Configuration creates a Revision. The Revision needs to be ready before Route can attach Ingress to it and Ingress needs to be ready before traffic can be served at the URL.

This dance illustrates how hierarchical control breaks your high-level intentions into particular software to be configured and run. At the end of the process, Knative has launched the container you nominated and configured routing to listen at the given URL.

What’s at the URL? Let’s see:

Listing 2. The first hello.

 $ curl
 Hello First!

Very cheerful.

Your second deployment

Mind you, perhaps you don’t like First. Maybe you like Second better. Easily fixed:

Listing 3. Updating hello-example

 $ kn service update hello-example \
   --env TARGET=Second
 Updating Service 'hello-example' in namespace 'default':
   3.418s Traffic is not yet migrated to the latest revision.
   3.466s Ingress has not yet been reconciled.
   4.823s Ready to serve.
 Service 'hello-example' updated with latest revision 'hello-example-bqbbr-2' and URL:
 $ curl
 Hello Second!

I changed the TARGET environment variable that the example application interpolates into a simple template:

Listing 4. How a hello sausage gets made

 func handler(w http.ResponseWriter, r *http.Request) {
   // TARGET is the environment variable we set with --env
   target := os.Getenv("TARGET")
   fmt.Fprintf(w, "Hello %s!\n", target)
 }

You may have noticed that the revision name changed. “First” was hello-example-pjyvr-1 and “Second” was hello-example-bqbbr-2. Yours might look slightly different, because part of the name is randomly generated. hello-example comes from the name of the Service, and the 1 and 2 suffixes indicate the “generation” of the Service (more on that in a second). But the bit in the middle is randomized to prevent accidental name collisions.
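To make the naming scheme concrete, here is a minimal sketch in Go that pulls a Revision name apart into the three pieces described above. The parseRevisionName helper is hypothetical and written only for illustration; it isn’t part of kn or Knative, and it assumes names shaped exactly like the ones in this walkthrough.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseRevisionName splits a name shaped like service-random-generation.
// It is a hypothetical helper for illustration, not part of kn or Knative.
func parseRevisionName(name string) (service, random string, generation int, err error) {
	parts := strings.Split(name, "-")
	if len(parts) < 3 {
		return "", "", 0, fmt.Errorf("%q is not shaped like service-random-generation", name)
	}
	// The last segment is the generation of the Service.
	generation, err = strconv.Atoi(parts[len(parts)-1])
	if err != nil {
		return "", "", 0, fmt.Errorf("generation suffix of %q: %w", name, err)
	}
	// The second-to-last segment is the randomized part.
	random = parts[len(parts)-2]
	// Everything before that is the Service name, which may itself contain hyphens.
	service = strings.Join(parts[:len(parts)-2], "-")
	return service, random, generation, nil
}

func main() {
	for _, name := range []string{"hello-example-pjyvr-1", "hello-example-bqbbr-2"} {
		service, random, generation, err := parseRevisionName(name)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("service=%s random=%s generation=%d\n", service, random, generation)
	}
}
```

Treat this as a reading aid rather than something to rely on: the exact format of generated names is an implementation detail and may differ in your installation.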

Did Second replace First? The answer is: it depends who you ask. If you’re an end user sending HTTP requests to the URL, yes, it appears as though a total replacement took place. But from the point of view of a developer, both Revisions still exist.

Listing 5. Both revisions still exist

 $ kn revision list
 NAME                    SERVICE         GENERATION   AGE     CONDITIONS   READY
 hello-example-bqbbr-2   hello-example   2            2m3s    4 OK / 4     True
 hello-example-pjyvr-1   hello-example   1            3m15s   3 OK / 4     True

I can look more closely at each of these with kn revision describe.


It’s worth taking a slightly closer look at the Conditions table. Software can be in any number of states and it can be useful to know what they are. A smoke test or external monitoring service can detect that you have a problem, but it may not be able to tell you why you have a problem.

What this table gives you is four pieces of information:

  1. OK gives the quick summary about whether the news is good or bad. The ++ signals that everything is fine. The I signals an informational condition—not bad, but not as unambiguous as ++. If things were going badly, you’d see !!. If Knative doesn’t know what’s happening, you’ll see ??.
  2. TYPE is the unique condition being described. In this table we can see four being reported. The Ready condition, for example, surfaces the result of an underlying Kubernetes readiness probe. Of greater interest to us is the Active condition, which tells us whether there is an instance of the Revision running.
  3. AGE reports on when this Condition was last observed to have changed. In the example these are all three hours. But they don’t have to be.
  4. REASON allows a Condition to provide a clue as to deeper causes. For example, our Active condition shows NoTraffic as its reason.

This line:

I Active 3h NoTraffic

can be read as:

“As of three hours ago, the Active Condition has an Informational status due to NoTraffic”.

Suppose we got this line:

!! Ready 1h AliensAttackedTooSoon

We could read it as:

“As of an hour ago, the Ready Condition became not-OK because of AliensAttackedTooSoon”.
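If it helps to see the reading rules as code, here is a small Go sketch that renders rows in the same column order as the Conditions table. The Condition struct, its Info flag and the Symbol and Row functions are assumptions made up for this illustration; they mirror the four columns described above rather than Knative’s or kn’s real types.

```go
package main

import (
	"fmt"
	"strings"
)

// Condition is a hypothetical, simplified stand-in for one row of the
// Conditions table. It is not Knative's real type.
type Condition struct {
	Type   string // for example "Ready" or "Active"
	Status string // "True", "False" or "Unknown"
	Info   bool   // true when a non-True status is informational only
	Age    string // for example "3h"
	Reason string // for example "NoTraffic"; may be empty
}

// Symbol maps a Condition onto the OK column.
func Symbol(c Condition) string {
	switch {
	case c.Status == "True":
		return "++" // everything is fine
	case c.Info:
		return "I" // informational: not bad, but not unambiguously good
	case c.Status == "False":
		return "!!" // things are going badly
	default:
		return "??" // the state is unknown
	}
}

// Row renders OK, TYPE, AGE and REASON in table order.
func Row(c Condition) string {
	return strings.TrimSpace(fmt.Sprintf("%s %s %s %s", Symbol(c), c.Type, c.Age, c.Reason))
}

func main() {
	rows := []Condition{
		{Type: "Ready", Status: "True", Age: "4h"},
		{Type: "Active", Status: "False", Info: true, Age: "3h", Reason: "NoTraffic"},
		{Type: "Ready", Status: "False", Age: "1h", Reason: "AliensAttackedTooSoon"},
	}
	for _, c := range rows {
		fmt.Println(Row(c))
	}
}
```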

What does Active mean?

When the Active condition gives NoTraffic as a reason, that means that there are no active instances of the Revision running. Suppose that it’s in NoTraffic and then we poke it with curl:

 $ kn revision describe hello-example-bqbbr-2
 Name:       hello-example-bqbbr-2
 Namespace:  default
 Age:        7d
 Image: (pinned to 5ea96b)
 Env:        TARGET=Second
 Service:    hello-example
   OK TYPE                  AGE REASON
   ++ Ready                  4h
   ++ ContainerHealthy       4h
   ++ ResourcesAvailable     4h
    I Active                 4h NoTraffic
 $ curl
 # ... a pause while the container launches
 Hello Second!
 $ kn revision describe hello-example-bqbbr-2
 Name:       hello-example-bqbbr-2
 Namespace:  default
 Age:        7d
 Image: (pinned to 5ea96b)
 Env:        TARGET=Second
 Service:    hello-example
   OK TYPE                  AGE REASON
   ++ Ready                  4h
   ++ ContainerHealthy       4h
   ++ ResourcesAvailable     4h
   ++ Active                 2s

Note that we now see ++ Active, without the NoTraffic reason. Knative is saying that a running process was created and it’s active. If you leave it for a minute, it will be shut down again and the Active Condition returns to complaining about a lack of traffic.

Changing the image

The Go programming language, aka “Golang” to its friends, “erhrhfjahaahh” to its enemies, is the Old Hotness. The New Hotness is Rust, which I’ve been able to evade forming an opinion about. All I know is that it’s the New Hotness and that therefore, as a responsible engineer, I know that it is Better.

This means that helloworld-go no longer excites me; I’d like to use helloworld-rust. Easily done:

Listing 7. Updating the container image

 $ kn service update hello-example \
   --image
 Updating Service 'hello-example' in namespace 'default':
  49.523s Traffic is not yet migrated to the latest revision.
  49.648s Ingress has not yet been reconciled.
  49.725s Ready to serve.
 Service 'hello-example' updated with latest revision 'hello-example-nfwgx-3' and URL:

And then I poke it:

Listing 8. The New Hotness says hello

 $ curl
 Hello world: Second

Note that the message is slightly different: “Hello world: Second” instead of “Hello Second!” Not being deeply familiar with Rust I can only suppose that it’s excessively formal when greeting people it has never met. But it proves that I didn’t cheat and change the TARGET environment variable.

An important point to remember is that changing the environment variable causes the second Revision to come into being. Changing the image causes a third Revision to be created. And in fact, almost any update to a Service causes a new Revision to be stamped out.

Almost any? What’s the exception? It’s Routes. Updating these as part of a Service won’t create a new Revision.

Splitting traffic

I’m going to prove it by splitting traffic evenly between the last two Revisions.

Listing 9. Splitting traffic 50/50

 $ kn service update hello-example \
   --traffic hello-example-bqbbr-2=50 \
   --traffic hello-example-nfwgx-3=50
 Updating Service 'hello-example' in namespace 'default':
   0.057s The Route is still working to reflect the latest desired specification.
   0.072s Ingress has not yet been reconciled.
   1.476s Ready to serve.
 Service 'hello-example' updated with latest revision 'hello-example-nfwgx-3' (unchanged) and URL:

The --traffic parameter allows us to assign percentages to each Revision. The key is that the percentages must add up to 100. If I give 50 and 60, I’ll be told that “given traffic percents sum to 110, want 100”. Likewise, if I try to cut some corners by giving 50 and 40, I’ll get “given traffic percents sum to 90, want 100”. It’s my responsibility to ensure that the numbers add up correctly.
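The rule is simple enough to sketch in a few lines of Go. The validateSplit function is a hypothetical helper of my own, not kn’s actual implementation; it only reproduces the sum-to-100 check and the error wording quoted above.

```go
package main

import "fmt"

// validateSplit checks that the percentages assigned to Revisions add up to 100.
// It is an illustrative sketch, not the code kn really runs.
func validateSplit(split map[string]int) error {
	sum := 0
	for _, percent := range split {
		sum += percent
	}
	if sum != 100 {
		return fmt.Errorf("given traffic percents sum to %d, want 100", sum)
	}
	return nil
}

func main() {
	even := map[string]int{"hello-example-bqbbr-2": 50, "hello-example-nfwgx-3": 50}
	fmt.Println("50/50:", validateSplit(even))

	tooMuch := map[string]int{"hello-example-bqbbr-2": 50, "hello-example-nfwgx-3": 60}
	fmt.Println("50/60:", validateSplit(tooMuch))

	tooLittle := map[string]int{"hello-example-bqbbr-2": 50, "hello-example-nfwgx-3": 40}
	fmt.Println("50/40:", validateSplit(tooLittle))
}
```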

Does it work? Let’s see:

Listing 10. Totally not a perfect made-up sequence of events

 $ curl
 Hello Second!
 $ curl
 Hello world: Second

It works. Half your traffic is now allocated to each Revision.

Fifty-fifty is one split, but you may split the traffic any way you please. Suppose you had Revisions called un, deux, trois and quatre. You might split it evenly:

Listing 11. Even four-way split

 $ kn service update french-flashbacks-example \
   --traffic un=25 \
   --traffic deux=25 \
   --traffic trois=25 \
   --traffic quatre=25

Or you can split it unevenly; quatre gets a tiny sliver to prove itself, although the bulk of the work lands on trois:

Listing 12. Production and next versions

 $ kn service update french-flashbacks-example \
   --traffic un=0 \
   --traffic deux=0 \
   --traffic trois=98 \
   --traffic quatre=2

You don’t need to set traffic to 0% explicitly. You can achieve the same result by leaving Revisions out of the list:

Listing 13. Implicit zero traffic level

 $ kn service update french-flashbacks-example \
   --traffic trois=98 \
   --traffic quatre=2

Finally, if I’m satisfied that quatre is ready, I can switch over all the traffic using @latest as my target:

Listing 14. Targeting @latest

 $ kn service update french-flashbacks-example \
   --traffic @latest=100

Serving Components

As promised, I’m going to spend some time looking at some Knative Serving internals. Knative and Kubernetes are built on the concept of control loops. A control loop involves a mechanism for comparing a desired world with an actual world and then taking action to close the gap between them.

But this is the boxes-and-lines explanation. The concept of a control loop needs to be embodied as software processes. Knative Serving has several of these, falling broadly into four groups:

  1. Reconcilers, responsible for acting on both user-facing concepts like Services, Revisions, Configurations and Routes as well as lower-level housekeeping;
  2. The “Webhook”, responsible for validating and enriching the Services, Configurations and Routes that users provide;
  3. Networking controllers that configure TLS certificates and HTTP ingress routing; and
  4. The Autoscaler/Activator/Queue-Proxy triad, which manage the business of comprehending and reacting to changes on traffic.

The Controller and Reconcilers

Let’s talk about names for a second.

Knative has a component named controller, which is a bundle of individual “Reconcilers”. Reconcilers are controllers: a system that reacts to changes in the difference between desired and actual worlds. Reconcilers are controllers, but the controller isn’t a controller. Got it?

No? You’re probably wondering why the names are different. The simplest answer is: to avoid confusion about what’s what. That may sound silly. Bear with me, I promise it’ll make sense.

At the top, in terms of running processes managed directly by Kubernetes, Knative Serving only has one controller. But in terms of logical processes, Knative Serving has several controllers, running in goroutines inside the single physical controller process. Moreover, Reconciler is a Golang interface that implementations of the controller pattern are expected to implement.

To avoid saying “the controller controller” or “the controllers that run on the controller” or other less-than-illuminating naming schemes, there are instead two names: controller and Reconciler.
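The split between one physical controller and many logical Reconcilers can be sketched in Go. Everything here is a simplified assumption for illustration: the Reconciler interface is loosely modeled on the idea of reconciling a record identified by a key, while recorder, runController and runDemo are hypothetical names of my own. In the real system each Reconciler watches its own kind of record; this toy version hands every Reconciler the same key to keep things short.

```go
package main

import (
	"context"
	"fmt"
	"sort"
	"sync"
)

// Reconciler is a simplified interface for this sketch: given the key of a
// record, close the gap between its desired and actual state.
type Reconciler interface {
	Reconcile(ctx context.Context, key string) error
}

// recorder is a toy Reconciler that only notes which keys it was asked to handle.
type recorder struct {
	name string
	mu   sync.Mutex
	seen []string
}

func (r *recorder) Reconcile(ctx context.Context, key string) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.seen = append(r.seen, r.name+" reconciled "+key)
	return nil
}

// runController plays the single physical process: it runs every logical
// Reconciler in its own goroutine and waits for all of them to finish.
func runController(ctx context.Context, key string, reconcilers ...Reconciler) error {
	var wg sync.WaitGroup
	errs := make(chan error, len(reconcilers))
	for _, r := range reconcilers {
		wg.Add(1)
		go func(r Reconciler) {
			defer wg.Done()
			if err := r.Reconcile(ctx, key); err != nil {
				errs <- err
			}
		}(r)
	}
	wg.Wait()
	close(errs)
	return <-errs // nil when no Reconciler reported an error
}

// runDemo wires two toy Reconcilers into one controller and returns what they did.
func runDemo() ([]string, error) {
	service := &recorder{name: "service"}
	route := &recorder{name: "route"}
	err := runController(context.Background(), "default/hello-example", service, route)
	out := append(append([]string{}, service.seen...), route.seen...)
	sort.Strings(out) // goroutine order is not fixed, so sort for stable output
	return out, err
}

func main() {
	out, err := runDemo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, line := range out {
		fmt.Println(line)
	}
}
```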

Each Reconciler is responsible for some aspect of Knative Serving’s work, which falls into two categories. The first category is simple to understand—it’s the reconcilers responsible for managing the developer-facing resources. Hence there are reconcilers called configuration, revision, route and service.

For example, when you use kn service create, the first port of call is for a Service record to be picked up by the service controller. When you use kn service update to create a traffic split, you’re sending the route controller off to do some work for you.

Reconcilers in the second category work behind the scenes to carry out essential lower-level tasks. These are labeler, serverlessservice and gc. The labeler is part of how networking works; it sets and maintains labels on Kubernetes objects that networking systems can use to target them for traffic. I’ll touch on this when we get to routing.

The serverlessservice reconciler is part of how the Activator works. It reacts to and updates ServerlessService records (say that five times fast!). These are also mostly about networking in Kubernetes-land.

Lastly, the gc reconciler performs garbage-collection duties and hopefully, you’ll never need to think about it again.

Figure 1. The serving controller and its Reconcilers

The Webhook

Things go wrong. A great deal of software engineering is centered on ensuring that when things go wrong, they at least choose to go wrong at the least-painful and/or least-Tweetable moment. Type systems, static analysis, unit test harnesses, linters, fuzzers, the list goes on and on. We submit to their nagging because solving the mysteries of fatal errors in production is less fun than Agatha Christie made it out to be.

At runtime, Serving relies on the completeness and validity of information provided about the things you want it to manage (e.g., Services) and how you want it to behave generally (e.g., autoscaler configuration). This brings us to the webhook, which validates and augments your submissions. Like the controller, it’s a group of logical processes which are collected together into a single physical process for ease of deployment.

The name “webhook” is a little deceptive, because it describes the implementation rather than its purpose. If you’re familiar with webhooks, you might have thought that its purpose was to dial out to an endpoint that you provide. This is not the case. Or perhaps it was an endpoint that you could ping yourself. Closer, but still incorrect. Instead, the name comes from its role as a Kubernetes “admission webhook”. When processing API submissions, the Knative Webhook is registered as the delegated authority to inspect and modify Knative Serving resources. A better name might be “Validation and Annotation Clearing House” or perhaps the “Ditch It or Fix It Emporium”. But “webhook” is what we have.

The Webhook’s principal roles include:

  • Setting default configurations. This includes values for timeouts, concurrency limits, container resource limits and garbage collection timing. This means that you only need to set values you want to override. I’ll touch on these as needed.
  • Injecting routing and networking information into Kubernetes. I’ll discuss this when I get to routing.
  • Validating that users didn’t ask for impossible configurations. For example, the webhook rejects negative concurrency limits. I’ll refer to these when needed.
  • Resolving partial docker image references to include the digest. For example, example/example:latest would be resolved to include the digest, and it looks like example/example@sha256:1a4bccf2…. This is one of the best things Knative can do for you, and the webhook deserves the credit for it.
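The first and third roles in the list above, defaulting and validation, follow a pattern that is easy to sketch in Go. Every name and value here is an assumption made for illustration: RevisionSettings, setDefaults, validate and admit are hypothetical, and the default of 300 seconds is a placeholder rather than a statement about your installation. The real webhook works on full Knative resources, not a two-field struct.

```go
package main

import (
	"errors"
	"fmt"
)

// RevisionSettings is a hypothetical, tiny subset of what a user might submit.
type RevisionSettings struct {
	TimeoutSeconds       int // 0 means "not set by the user" in this sketch
	ContainerConcurrency int // 0 means "no limit" in this sketch
}

// defaultTimeoutSeconds is an assumed default used only for this example.
const defaultTimeoutSeconds = 300

// setDefaults fills in values the user left out, so they only set what they override.
func setDefaults(s RevisionSettings) RevisionSettings {
	if s.TimeoutSeconds == 0 {
		s.TimeoutSeconds = defaultTimeoutSeconds
	}
	return s
}

// validate rejects configurations that cannot make sense.
func validate(s RevisionSettings) error {
	if s.ContainerConcurrency < 0 {
		return errors.New("concurrency limit must not be negative")
	}
	if s.TimeoutSeconds < 0 {
		return errors.New("timeout must not be negative")
	}
	return nil
}

// admit applies defaults first and then validates: fix it or ditch it.
func admit(s RevisionSettings) (RevisionSettings, error) {
	s = setDefaults(s)
	if err := validate(s); err != nil {
		return RevisionSettings{}, err
	}
	return s, nil
}

func main() {
	accepted, err := admit(RevisionSettings{ContainerConcurrency: 10})
	fmt.Println(accepted, err)

	_, err = admit(RevisionSettings{ContainerConcurrency: -1})
	fmt.Println("rejected:", err)
}
```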

Networking controllers

Early versions of Knative relied directly on Istio for core networking capabilities. That hasn’t entirely changed. In the default installation provided by the Knative project, Istio is installed as a component and Knative makes use of some of its capabilities.

As it has evolved, more of Knative’s networking logic has been abstracted up from Istio. Doing this allows you to swap out components. Istio might make sense for your case, but it has many features and might be overkill. On the other hand, you might have Istio provided as part of your standard Kubernetes environment. Both Istio and not-Istio are acceptable choices for Knative.

Knative Serving requires that networking controllers answer for two basic record types: Certificate and Ingress.


Certificates

TLS is essential to the safety and performance of the modern internet, but the business of storing and shipping TLS certificates has always been inconvenient. The Knative Certificate abstraction describes the TLS certificate that is desired, without supplying the certificate itself.

For example, TLS certificates are scoped to particular domain names or IP addresses. When creating a Certificate, a list of DNSNames is used to indicate what domains the Certificate should be valid for. A conforming controller can then create or obtain certificates that fulfill that need.

I’ll have more to say about Certificates when we dive into routing.


Ingress

Routing traffic is always one of those turtles-all-the-way-down affairs. Something, somewhere, is meeting traffic at the boundary of your system. In Knative, this is the Ingress.[2]

Ingress controllers act as a single entrance to the entire Knative installation. They convert Knative’s abstract specification into particular configurations for their own routing infrastructure. For example, the default networking-istio controller converts a Knative Ingress into an Istio Gateway.

Knative Ingress implementations include Istio-Gateway, Gloo, Ambassador and Kourier.

Autoscaler, Activator and Queue-Proxy

These three work together quite closely, and I’ve grouped them under the same heading.

Figure 2. The triad of autoscaler, activator and queue-proxy

When I talk about “the Autoscaler”, I generally refer to the Knative Pod Autoscaler (KPA). This is the out-of-the-box autoscaler that ships with Knative Serving. It’s possible to configure Knative to use the Kubernetes Horizontal Pod Autoscaler (HPA) instead. In future the KPA might be retired as the HPA becomes better-suited to serverless patterns of activity, but at time of writing that seemed to be a fairly distant milestone.

The Autoscaler is the easiest to give an elevator pitch for: observe demand for a Service, calculate the number of instances needed to serve that demand, then update the Service’s scale to reflect the calculation. You’ve probably recognized that this is a supervisory control loop. Its desired world is “minimal mismatch between demand and instances”. Its output is a scale number that becomes the desired world of a Service control loop.
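The “calculate the number of instances” step can be sketched in Go as a single division. This is a deliberately naive assumption on my part: desiredInstances is a hypothetical function that divides observed concurrent requests by a per-instance target and rounds up. The real Autoscaler does considerably more than this, so read it as the shape of the calculation, not the algorithm.

```go
package main

import (
	"fmt"
	"math"
)

// desiredInstances returns how many instances would be needed so that each one
// handles at most targetPerInstance concurrent requests.
// It is a naive sketch, not the Knative Pod Autoscaler's actual logic.
func desiredInstances(observedConcurrency, targetPerInstance float64) int {
	if targetPerInstance <= 0 || observedConcurrency <= 0 {
		// No demand (or no usable target) means there is nothing to scale for.
		return 0
	}
	return int(math.Ceil(observedConcurrency / targetPerInstance))
}

func main() {
	for _, demand := range []float64{0, 1, 10, 25} {
		fmt.Printf("demand=%v desired=%d\n", demand, desiredInstances(demand, 10))
	}
}
```

The number this produces is the output of the supervisory loop; something else then has to make the actual instance count match it.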

Figure 3. The Knative Pod Autoscaler is a control loop

It’s worth noting that the Knative Pod Autoscaler operates solely through horizontal scaling. That is: launching more copies of your software. “Vertical scaling” means launching it with additional computing resources. In general, vertical scaling is simpler: you pay more for a beefier machine. But the costs are highly nonlinear and there’s always an upper limit to what can be achieved. Horizontal scaling typically requires deliberate architectural decisions to make it possible, but once achieved it can face higher demands than any one machine could handle. The Knative Pod Autoscaler assumes you’ve done the work to ensure that instances coming and going at a rapid clip won’t be overly disruptive.

When there’s no traffic, the desired number calculated by the Autoscaler is eventually set to zero. This is great, right until a new request shows up without anything ready to serve it. We could plausibly bounce the request with an HTTP 503 Service Unavailable status, perhaps even, in a fit of generosity, providing a Retry-After header. The problem is that (1) humans hate this, and (2) vast amounts of upstream software assume that network requests are magical and perfect and can never fail. They’ll either barf on their users or, more likely, ignore your Retry-After and hammer the endpoint into paste. Not to mention (3), which is that all of this is screencapped and mocked on Reddit.

What do you do when there are no instances running—the dreaded cold start? In this case, the Activator is a traffic target of last resort: the Ingress is configured to send traffic for routes with no active instances to the Activator. Hence:

Figure 4. The Activator’s role in managing cold starts

  1. The Ingress receives a new request. The Ingress sends the request to its configured target, which is the Activator.
  2. The Activator places the new request into a buffer.
  3. The Activator “pokes” the Autoscaler. The poke does two things: firstly, it carries information about requests waiting in the buffer. Secondly, the arrival of a poke signal prompts the Autoscaler to make an immediate scaling decision, instead of waiting until the next scheduled decision time.
  4. After considering the fact that there’s a request waiting to be served, but there are zero instances available to serve it, the Autoscaler decides that there ought to be one instance running. It sets a new scale target for Serving.
  5. As you wait for the Autoscaler and Serving to do their work, the Activator polls Serving to see if any instances are live.
  6. Serving’s hierarchy of controllers ultimately causes Kubernetes to launch an instance of your software.
  7. The Activator learns from its polling that an instance is now available and moves the request from its buffer to a proxy service.
  8. The proxy component sends the request to the instance, which responds normally.
  9. The proxy component sends the response back to the Ingress, which then sends it back to the requester.
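The sequence of steps above can be acted out with a couple of goroutines and channels in Go. This is a toy simulation under loud assumptions: coldStart and instance are hypothetical names, the “Autoscaler” is a goroutine that always decides on one instance, and closing a channel stands in for Serving and Kubernetes launching your software. The sketch waits on that channel rather than polling, and it leaves out the Ingress and all real networking.

```go
package main

import "fmt"

// instance stands in for your software once it has been launched.
func instance(target string) string {
	return "Hello " + target + "!"
}

// coldStart walks one request through a simulated cold start and returns the
// response along with a log of what happened, in order.
func coldStart(target string) (response string, events []string) {
	pokes := make(chan int)      // Activator -> Autoscaler: how many requests are waiting
	ready := make(chan struct{}) // closed once an instance is available

	// Autoscaler: a poke prompts an immediate scaling decision.
	go func() {
		waiting := <-pokes
		events = append(events, fmt.Sprintf("autoscaler: %d waiting, scaling 0 -> 1", waiting))
		close(ready) // stands in for Serving launching the instance
	}()

	// Activator: buffer the request, poke the Autoscaler, wait, then proxy.
	buffer := []string{target}
	events = append(events, "activator: buffered request")
	events = append(events, "activator: poking autoscaler")
	pokes <- len(buffer)
	<-ready
	events = append(events, "activator: instance ready, proxying")
	return instance(buffer[0]), events
}

func main() {
	response, events := coldStart("Second")
	for _, e := range events {
		fmt.Println(e)
	}
	fmt.Println(response)
}
```

The channel operations give the events a fixed order: the request is buffered before the poke, the scaling decision follows the poke, and nothing is proxied until an instance exists.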

Does this mean all traffic flows through the Activator? No. The Activator remains on the data path during the transition from “no instances” to “enough instances”. Once the Autoscaler is satisfied that there’s enough capacity to meet current demand, it updates the Ingress, changing the traffic target from the Activator to the running instances. At this point the Activator no longer has any role in proceedings.

The exact timing of this update depends mostly on how much traffic has piled up and how long it’s taking to launch instances to serve it. Imagine if ten thousand requests arrived and the Activator sprayed them all at the first instance foolish enough to stick its head above the trenches. Instead, the Activator throttles its proxy until capacity catches up with demand. Once requests are flowing smoothly, the Autoscaler’s own logic removes the Activator from the data path.
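Throttling of this kind is often built from a counting semaphore, which in Go can be a buffered channel. The sketch that follows is my own illustration of that general technique, not the Activator’s real code: throttledProxy is a hypothetical function, and capacity is an assumed number standing in for “how many requests the running instances can take right now”.

```go
package main

import (
	"fmt"
	"sync"
)

// throttledProxy forwards the given number of requests while never allowing
// more than capacity of them in flight at once. It returns the highest number
// of simultaneous requests it observed.
func throttledProxy(requests, capacity int) int {
	tokens := make(chan struct{}, capacity) // one token per unit of capacity
	var mu sync.Mutex
	var wg sync.WaitGroup
	inFlight, maxInFlight := 0, 0

	for i := 0; i < requests; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			tokens <- struct{}{}        // wait here until there is spare capacity
			defer func() { <-tokens }() // hand the capacity back when finished

			mu.Lock()
			inFlight++
			if inFlight > maxInFlight {
				maxInFlight = inFlight
			}
			mu.Unlock()

			// A real proxy would forward the request to an instance here.

			mu.Lock()
			inFlight--
			mu.Unlock()
		}()
	}
	wg.Wait()
	return maxInFlight
}

func main() {
	fmt.Println("max in flight:", throttledProxy(10000, 3))
}
```

However many requests pile up, the number being served at once stays within the capacity; the rest wait their turn instead of flattening the first instance to appear.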

The final component of this triad is the Queue-Proxy. This is a small proxy process that sits between your software and arriving traffic. Every instance of your Service has its own Queue-Proxy, running as a sidecar. Knative does this for a few reasons. One is to provide a small buffer for requests, allowing the Activator to have a clear signal that a request has been accepted for processing (this is called “positive handoff”). Another purpose is to add tracing and metrics to requests flowing in and out of the Service.


Summary

  • kn is a CLI tool for interacting with Knative, including Serving.
  • kn service lets you view, create, update and configure Knative Services, including splitting traffic between Revisions.
  • Knative Serving has a controller process, which is a collection of components called “Reconcilers”. Reconcilers act as feedback controllers.
  • Reconcilers are for Serving’s core record types (Service, Route, Configuration and Revision), as well as housekeeping Reconcilers.
  • Knative Serving has a webhook process, which intercepts new and updated records you submit. It can then validate your submissions and inject additional information.
  • The Knative Pod Autoscaler is a feedback control loop. It compares the ratio of traffic to instances, and raises or lowers the desired number of instances that the serving controller controls.
  • The Activator is assigned Ingress routes when no instances are available. This assignment is made by the Autoscaler.
  • The Activator is responsible for “poking” the Autoscaler when new requests arrive, to trigger a scale-up.
  • While instances are becoming available, the Activator remains on the data path as a throttling, buffering proxy for traffic.
  • When the Autoscaler believes there’s enough capacity to serve demand, it removes the Activator from the data path by updating Ingress routes.
  • Knative Serving’s Networking is highly pluggable. Core implementations are provided for two functions: Certificates and Ingress.
  • Certificate controllers accept a definition of desired Certificates and must provision new certificates or map existing certificates into your software.
  • Ingress controllers accept Routes and convert these into lower-level routing or traffic management configurations.
  • Ingress controller implementations include Istio-Gateway, Gloo, Ambassador and Kourier.

[2] This is distinct from a Kubernetes Ingress.