From Knative in Action by Jacques Chester

This article introduces Knative and explores its function(s) and best use.


One of my north stars is the Onsi Haiku Test[haiku]:

Here is my source code.
Run it on the cloud for me.
I do not care how.

This is a radical notion of how software can best be developed, deployed, upgraded, observed, managed and improved. It must be, because it often emerges long after we’ve tried everything else first. It implies:

  • That a fast, reliable path to production is a shared goal for everyone
  • That there’s a crisp contractual boundary between folks who provide platforms and folks whose work consumes the platform
  • That building software that handles other software is, for most developers, not the most urgent, most valuable work they could be doing.

Kubernetes, by itself, doesn’t pass the Onsi Haiku Test. The boundary between development and operation is unclear. Developers can’t walk up to a vanilla Kubernetes cluster, provide it with raw source code, and get all the basic amenities of routing, logging, and service injection. Kubernetes gives you a rich toolbox for solving the Test in your own particular way. But a toolbox isn’t a machine. It’s a toolbox.

This book isn’t about Kubernetes, it’s about Knative. Knative builds on the toolbox Kubernetes provides, but also sets out to achieve a level of consistency, simplicity and ease of use that brings Kubernetes much closer to meeting the Test’s high standard. Knative is a machine.

While it has something to offer many different professional specialties, Knative is primarily focused on the needs and pains of developers, to elevate them to the heights of “I do not care how”. Kubernetes is amazing, but it never strongly demarcated what is meant to be changed or managed by whom. This is a strength: you can do anything! And a weakness: you could, and did, do anything! Knative provides crisp abstractions that, by design, don’t refer to the grungy physical business of nodes and containers and VMs. I’ll also focus on developers in this article, referring to or explaining Kubernetes only when necessary to understand Knative.

What is Knative?

You can answer this question in several ways.

The purpose of Knative is to provide a simple, consistent layer over Kubernetes that solves common problems of deploying software, connecting disparate systems together, upgrading software, observing software, routing traffic and scaling automatically. This layer creates a firmer boundary between the developer and the platform, allowing the developer to concentrate on the software they’re directly responsible for.

The major subprojects of Knative are Serving and Eventing. Serving is responsible for deployment, upgrade, routing and scaling. Eventing is responsible for connecting disparate systems. Both Serving and Eventing have observability as concerns. Dividing responsibilities this way allows each to be developed more independently and rapidly by the Knative community.

The software artifacts of Knative are a collection of software processes packaged into containers, which run on a Kubernetes cluster. Knative also installs customizations into Kubernetes itself to achieve its ends. This is true of both Serving and Eventing, each of which installs its own components and customizations. While this may interest a platform engineer or platform operator, it shouldn’t matter to a developer. You should only care that it’s installed, not where or how.

The API or surface area of Knative is primarily YAML documents that declaratively convey your intention as a developer. The document types are defined by “CRDs”, Custom Resource Definitions. These are plugins or extensions for Kubernetes that look and feel like vanilla Kubernetes.

You can also work in a more imperative style using the kn CLI, which is useful for tinkering and rapid iteration. I’ll show both of these approaches throughout the article.

Let’s take a quick motivational tour of Knative’s capabilities.

Deploying, upgrading and routing

Deployment has evolved: what used to be a process of manually promoting software artifacts through environments (with scheduled downtime, two hundred people on a bridge call all weekend…) became continuous delivery and blue-green deploys[cdbook].

But should deployment be all-or-nothing? Knative enables progressive delivery: instead of requests arriving at a production system which is entirely one version of the software, they arrive at a system where multiple versions can be running together with traffic being split between them. This means that deployments can proceed at the granularity of requests, rather than instances[progressive]. “Send 10% of traffic to v2” is different from “10% of instances are v2”.
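To make the request-versus-instance distinction concrete, here is a toy sketch in Python. It isn’t how Knative routes traffic internally; the function name pick_revision and the revision labels are made up for illustration. Each request is assigned to a revision by weight, regardless of how many instances of each revision happen to be running.

```python
import random
from collections import Counter

def pick_revision(split, rng):
    """Choose a revision for a single request according to percentage weights."""
    revisions = list(split)
    weights = [split[name] for name in revisions]
    return rng.choices(revisions, weights=weights, k=1)[0]

# Hypothetical split: 90% of requests go to v1, 10% go to v2.
split = {"v1": 90, "v2": 10}
rng = random.Random(42)  # seeded so repeated runs behave the same way

# Route 10,000 simulated requests and count where they landed.
tally = Counter(pick_revision(split, rng) for _ in range(10_000))
share_v2 = tally["v2"] / sum(tally.values())
```

The share of requests reaching v2 hovers around ten percent whether v2 is running on one instance or fifty, which is the point of splitting by request.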


Sometimes there’s no traffic. Sometimes there’s too much traffic. One of these is wasteful, the other is stressful. Knative is ready with the Knative Pod Autoscaler, a request-centric autoscaler that’s deeply integrated with Knative’s routing, buffering and metrics components. The autoscaler can’t solve all your problems, but it will solve enough that you can focus on more important problems.


Easy management of HTTP requests takes you a long way, but not everything looks like a POST.

Sometimes we want to react to events instead of responding to requests. Events might come from your software or external services, but they may arrive without anyone requesting something. That’s where Knative Eventing comes into focus. It enables you to compose small pieces of software into flexible processing pipelines, connected through events. You can even prepare to process things that don’t even exist yet (really).

So what?

I know your secret: somewhere in your repo is a grungy Bash script which does some grep-and-sed and calls kubectl a bunch of times and has some sleeps and maybe you got ambitious and there’s a wget floating around in it too. You wrote it in a hurry and you’re going to do a better job but right now we’re busy getting this thing done before Q3 and we need to implement floozlebit support and refactor the twizzleflorp and … it works well enough.

But this is always true for everything: there’s never enough time. Why didn’t you make the change yet? Easy, it’s too hard. Too much work when you already have enough.

Kubernetes itself is great, once you set it up. It absolutely shines at its core purpose in life: reconcile the differences between the desired state of the system and the true state of the system on a continuous basis. If all you ever needed was to deploy your system once and let it run forever without changing it, then you’re good to go and lucky you. The rest of us are on the hedonic treadmill. We have desired worlds that change. We ship bugs that need to be fixed, our users think of new features they want, our competitors make us scramble to answer new services.

And that’s how you wound up with the script. And doing a better job of deployment doesn’t seem urgent. After all: it works, right? Yes … if and only if your goal is to be afraid to upgrade anything or to have umpteen slightly different versions of it floating around company repos or to write your own CD system without intending to. Why bother? Let Knative toil for you instead.

I know two of your secrets. Your code knows a lot about all your other code. The Login Service knows about the User Service and the Are-You-A-Robot? Service. It tells them what it wants and it waits for their answer. This is the imperative style, and with it we as a profession have built incredible monuments to human genius. But we’ve also built some incredible bowls of spaghetti and warm compost.

It would be nice to decouple your services a bit, to make software respond to reports of stuff happening and in turn report stuff that it did. This isn’t a novel concept: the idea of software connected through pipes of events or data has sailed under various flags and in various fleets for decades now. Deep and important and profound differences exist between all of these historical schools of thought. I will, in an act of mercy, spare you any meaningful discussion of them. Because before you learn how to chisel apart the monolith, you need a chisel and a hammer.

Where Knative shines

Knative’s focus on event-driven, progressively-delivered, autoscaling architectures lends itself to some particular sweet spots.

Workloads with unpredictable, latency-insensitive demand

Variability is a fact of life: nothing repeats perfectly. Nothing can be perfectly predicted or optimized. Many workloads face demand variability: it isn’t always clear, from moment to moment, what demand to expect.

The Law of Variability Buffering[buffer] says that you can deal with demand variability by buffering it in one of three ways:

  1. With inventory: something you produced earlier and have at hand. For example, caching.
  2. With capacity: unused reserve capacity that can absorb more demand without meaningful effect. For example, idle instances.
  3. With time: by making the demand wait longer.

These are all costly. Inventory costs money to hold (RAM and disk space aren’t free), capacity costs money to reserve (an idle CPU still uses electricity) and, famously, “time is money” and nobody likes to wait.

Inventory, capacity and time really are the only options for buffering variability. It’s basic calculus. Inventory is an integral, a sum of previous capacity utilization and demand. Capacity is a derivative, a rate of change of inventory. And time is time.

You can rearrange the terms and you can change their values, but you can’t escape the boundaries of mathematics. The only alternative is to reduce variability which reduces buffering.
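A small Python sketch shows why variability, rather than average demand, is what forces you to buffer. The function name backlog_over_time and the numbers are invented for illustration. Capacity is fixed per tick, and any work that can’t be served immediately waits in a backlog, which is the running sum of demand minus capacity.

```python
def backlog_over_time(demand, capacity):
    """Track work that has arrived but not yet been served, tick by tick."""
    backlog = 0
    history = []
    for arrived in demand:
        # Unserved work carries over; the backlog can never go below zero.
        backlog = max(0, backlog + arrived - capacity)
        history.append(backlog)
    return history

# Both demand patterns total 20 units over four ticks.
steady = backlog_over_time([5, 5, 5, 5], capacity=5)
bursty = backlog_over_time([0, 12, 0, 8], capacity=5)
```

With steady demand nothing ever waits. With the bursty pattern, the same total demand and the same capacity leave work queued at the end of three of the four ticks: the variability has been buffered with time.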

Knative’s default strategy for buffering is time. If demand shows up but capacity is low or even zero, Knative’s autoscaler reacts by raising capacity and holding your request until it can be served. That’s well and good, but it takes time to bring capacity online. This is the famous “cold start” problem.

Does this matter? It depends on the nature of the demand. If the demand is latency-sensitive, then maybe scaling to zero isn’t for you. You can tell Knative to keep a minimum number of instances alive (no more pinging your function). On the other hand, if it’s a batch job or background process that can wait a while to kick off, buffering by time is sensible and efficient. Let that thing drop to zero. Spend the savings on ice cream.
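The trade-off can be sketched as a sizing rule in Python. This is a simplification I’m assuming for illustration, not the autoscaler’s actual algorithm: divide the requests in flight by a per-instance target, round up, then respect a floor and a ceiling. The names desired_instances, min_scale and max_scale are hypothetical.

```python
import math

def desired_instances(in_flight, target_per_instance, min_scale=0, max_scale=None):
    """Instances needed so each handles roughly target_per_instance requests."""
    if target_per_instance <= 0:
        raise ValueError("target_per_instance must be positive")
    wanted = math.ceil(in_flight / target_per_instance)
    # A floor above zero trades idle cost for the absence of cold starts.
    wanted = max(wanted, min_scale)
    if max_scale is not None:
        wanted = min(wanted, max_scale)
    return wanted

idle_scale_to_zero = desired_instances(0, 10)              # nothing running
idle_with_floor = desired_instances(0, 10, min_scale=1)    # one warm instance
busy = desired_instances(25, 10)                           # rounds up
capped = desired_instances(1000, 10, max_scale=20)         # ceiling applies
```

With no floor, an idle service needs zero instances and the next request pays for a cold start. With a floor of one, that request is served immediately and you pay for the idle instance instead.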

Regardless of sensitivity to latency, the other consideration is how predictable the demand is. Highly variable demands require larger buffers. Either you hold more inventory, or more reserve capacity, or make folks wait longer; there are no alternatives. If you don’t know how you want to trade these off, the autoscaler can relieve you of dealing with common cases.

Figure 1. Knative’s sweet spots in terms of latency sensitivity and demand predictability

One thing Knative can’t do much about is supply variability. That is, it can’t make variability due to your software vanish, or magic away variability due to upstream systems you rely on. How long your software takes to become live and how responsive it is depends on you. Upstream variability might not be in your court, but you’ll still be affected by it.

Stitching together events from multiple sources

Sometimes you have a square peg, a round hole, and a deadline. Knative won’t shave the peg or hammer it into the hole, but Knative Eventing lets you glue things together to achieve your original purpose. By design, Eventing is meant to be able to receive events from heterogeneous sources and convey them to heterogeneous consumers. Webhook from GitHub? Yes. Pub/Sub message from Google? Yes. File uploaded? Yes.

Some combination of these? Also, yes, which is the interesting part. Relatively small, consistent, standards-based interfaces allow many combinations of elements. To this Knative adds some simple abstractions to enable you to go from dabs of glue to relatively sophisticated event flows. So long as some event or message can be expressed as a CloudEvent, which is pretty much anything ever, Knative Eventing can be used to do something smart with it.
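As a rough illustration of how little it takes to express something as a CloudEvent, here is a Python sketch that builds the envelope as a plain dictionary. The CloudEvents 1.0 specification requires four attributes: specversion, id, source and type. Everything else here, including the helper names and the example types and sources, is made up for illustration.

```python
import json
import uuid

REQUIRED = ("specversion", "id", "source", "type")

def make_cloudevent(event_type, source, data=None):
    """Wrap arbitrary data in a minimal CloudEvents 1.0 envelope."""
    event = {
        "specversion": "1.0",
        "id": str(uuid.uuid4()),
        "source": source,
        "type": event_type,
    }
    if data is not None:
        event["datacontenttype"] = "application/json"
        event["data"] = data
    return event

def missing_attributes(event):
    """List the required attributes that are absent or empty."""
    return [name for name in REQUIRED if not event.get(name)]

# Hypothetical events from two very different origins share one shape.
webhook = make_cloudevent("com.example.repo.push", "/webhooks/git", {"branch": "main"})
upload = make_cloudevent("com.example.file.uploaded", "/buckets/photos", {"name": "cat.gif"})
encoded = json.dumps(webhook)
```

Because both events carry the same small set of attributes, anything downstream can filter and forward them without knowing where they came from.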

Of course, the flipside of generality is that it can’t be everything to everyone. Should you use it for CI/CD? Maybe. For streaming data analysis? Perhaps. Business workflow processing? Reply hazy, try again.

The key is that for all of these, there are existing, more specialized tools that might be a better fit. For example, you can build a MapReduce pattern using Knative. But realistically, you won’t get anywhere near the kind of performance and scale of a dedicated MapReduce system. You can build CI/CD with Knative, but now you have to do a lot of homework to implement all the inflows and outflows.

Figure 2. Knative’s sweet spots in terms of event heterogeneity and implementation specialization

Where Knative can shine is when you want to connect a variety of tools and systems in simple ways, in small increments. We all do this in our work, but typically it gets jammed into whatever system happens to have room for boarders. Our web apps sprout obscure endpoints or our CI/CD accumulates increasingly hairy Bash scripts. Knative lets us pull these out into the open to more easily test, monitor and reuse.

Decomposing monoliths in small increments

Microservices as a term describes a family of powerful architectural patterns. But getting to a microservices architecture isn’t easy, because most existing systems aren’t designed for it. For better or worse, they are monoliths.

Easy, you say: use the strangler pattern[strangler]. Add microservices incrementally, route requests to them and the original codepath goes cold; repeat until you’re done.

Knative makes this easier in two ways. The first is that it’s good at the routing thing. The concept of routing portions of traffic is the key to its design. The strangler pattern tends to falter once you’ve strangled the less-scary bits (look boss, we broke out the cat gif subsystem!) and move onto the parts where the big money lives. Suddenly it’s a bit scarier, because (1) a cutover is a cutover, (2) a big-bang cutover is a bet-your-job event, and (3) Knative makes it easier to stop believing in (1) and (2).

Figure 3. Knative’s sweet spots in terms of resisting temptation to grow a monolith

The second way Knative makes strangulation easier is that you can deploy small units easily. Knative has a deep design assumption that you’ll have a bunch of little functions that come and go. A function is less to recreate than a service. The smaller you can start, the easier it is to start.

It’s a hit

So far, I’ve promised a lot: easier deployments, easier event systems, incremental development, Martian unicorns – the usual stuff that everyone promises to developers. But I haven’t given you any concrete details. To support my pitch that we can start in small increments, I’ll begin with one of the oldest, simplest examples of the dynamic web and show how Knative makes it faster, smarter and easier.

Remember hit counters?

Figure 4. The late 1990s were truly a golden era.

I sure do. The first time I saw one it blew my mind. It changed! By itself! Magic!

Not magic, of course: it was a CGI program, probably some Perl. CGI is one of the spiritual parents of Knative, and in its honor, we’re going to make a hit counter for MY AWESOME HOMEPAGE.

Listing 1. The awesome homepage HTML

  <style>body { font-family: "awesomefont" }</style>
  <b>MY AWESOME HOMEPAGE</b><br />
  <img src="//hits.png" />

First, let’s talk about the basic flow of requests and responses. A visitor to the homepage will GET an HTML document from the web server. The document contains some style and, most importantly, the hit counter.

Figure 5. The flow of requests and responses

Specifically:

  1. The browser issues a GET request for the homepage
  2. The homepage service returns the HTML of the homepage
  3. The browser finds an img tag for hits.png. It issues a GET for hits.png
  4. A file bucket returns hits.png

In the old world, all of the processing needed to generate the hit counter would block the webserver response. You’d submit your request, the web server would bestir the elder gods of Cämelbuk, and then /CGI-BIN/ would render the image. It might take a second or two, but nobody could tell, unless they were using one of those blazing 28.8k modems.

But now everyone is impatient: spending a few seconds to render an image that could otherwise be served from a fast data path isn’t going to be acceptable. Instead we’ll break that responsibility out and do it asynchronously. That way, the web server can immediately respond with HTML and leave the creation of hit counter images to something else.

How does the web server signal that intention? It doesn’t. Instead it signals that a hit occurred. Remember: the web server wants to serve web pages, not orchestrate image rendering. Instead of blocking, it emits a new_hit CloudEvent.
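Here is a hedged Python sketch of what emitting new_hit could look like from the homepage’s side. It uses the CloudEvents HTTP binding in binary mode, where the event attributes travel as ce- prefixed headers and the data travels as the body. The destination URL, the /homepage source and the shape of the data are placeholders I’ve assumed; nothing is actually sent.

```python
import json
import urllib.request
import uuid

# Placeholder destination: the homepage only needs to know one address.
EVENT_SINK_URL = "http://events.example.invalid/"

def build_new_hit_request(page):
    """Prepare (but do not send) an HTTP POST carrying a new_hit CloudEvent."""
    headers = {
        "ce-specversion": "1.0",
        "ce-id": str(uuid.uuid4()),
        "ce-source": "/homepage",       # assumed identifier for the emitter
        "ce-type": "new_hit",
        "content-type": "application/json",
    }
    body = json.dumps({"page": page}).encode("utf-8")
    request = urllib.request.Request(
        EVENT_SINK_URL, data=body, headers=headers, method="POST"
    )
    return headers, body, request

headers, body, request = build_new_hit_request("/index.html")
# Sending would be urllib.request.urlopen(request), skipped here because
# the URL above is only a placeholder.
```

Notice what the event doesn’t contain: any mention of counters or images. It reports that a hit happened and nothing more.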

Emits to where?

To a Broker, a central meeting point for systems in Knative Eventing. The Broker has no particular interest in the new_hit event, it merely receives and forwards events. The exact details of who gets what is defined with Triggers. Each Trigger represents an interest in some set of events and where to forward them. When events arrive at the Broker, it applies each Trigger’s filter and, if it matches, forwards the event to the subscriber:

Figure 6. Broker applying Triggers to CloudEvents
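The Broker and Trigger relationship can be modeled in a few lines of Python. This is an in-memory toy with invented class and method names, not Knative’s implementation. Each Trigger holds a set of attributes that must match exactly plus a subscriber to call, and the Broker tries every Trigger against every event.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Trigger:
    attributes: Dict[str, str]            # every listed attribute must match exactly
    subscriber: Callable[[dict], None]    # where matching events are forwarded

    def matches(self, event: dict) -> bool:
        return all(event.get(key) == value for key, value in self.attributes.items())

@dataclass
class Broker:
    triggers: List[Trigger] = field(default_factory=list)

    def publish(self, event: dict) -> int:
        """Forward the event to each matching subscriber; return the delivery count."""
        delivered = 0
        for trigger in self.triggers:
            if trigger.matches(event):
                trigger.subscriber(event)
                delivered += 1
        return delivered

received = []
broker = Broker()
broker.triggers.append(Trigger({"type": "new_hit"}, received.append))
broker.triggers.append(Trigger({"type": "hits"}, lambda event: None))

count_new_hit = broker.publish({"type": "new_hit", "source": "/homepage"})
count_other = broker.publish({"type": "something_else", "source": "/homepage"})
```

An event that no Trigger cares about is simply not forwarded. The emitter never finds out either way, which is what keeps it decoupled.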

It’s Triggers that enable the incremental composition of event flows. The web server doesn’t know where new_hit winds up and doesn’t care. Given our new_hit, we can start to tally up the count of hits. Already, we’re ahead of the 1999 status quo: we could take our original Perl script and have it react to the new_hit event instead of blocking the main web response.

But we’re here – let’s go a step further. After all, is rendering images the proper concern of a tallying service? When I perform an SQL UPDATE I don’t get back JPEG files. Instead I need the tally service to consume the new_hit and emit a new count, which can then wing its way to other subscribers.

Putting it together:

Figure 7. The flow of events

  1. The homepage service emits a new_hit event
  2. A Trigger matches new_hit, so the Broker forwards it to hit counter
  3. hit counter updates its internal counter, then emits a hits event with the value of that counter
  4. Another Trigger matches for hits, and the Broker forwards it to image renderer
  5. The image renderer renders a new image and replaces hits.png in the file bucket

And now, if the visitor reloads their browser, they can see that the hit counter has incremented.
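The whole event flow fits in a short Python sketch. It is a single-process stand-in with invented names, assuming that a subscriber’s reply is fed back through the same set of Triggers. A dictionary plays the part of the file bucket, and the “image” is just the zero-padded number.

```python
from collections import deque

class HitCounter:
    def __init__(self):
        self.count = 0

    def handle(self, event):
        # Consume new_hit, reply with a hits event carrying the tally.
        self.count += 1
        return {"type": "hits", "data": {"count": self.count}}

class ImageRenderer:
    def __init__(self, bucket):
        self.bucket = bucket

    def handle(self, event):
        # "Render" by storing the zero-padded count; no reply event.
        self.bucket["hits.png"] = f"{event['data']['count']:07d}"
        return None

def run(initial_event, triggers):
    """Deliver an event, then deliver any replies, until nothing is left."""
    queue = deque([initial_event])
    while queue:
        event = queue.popleft()
        for event_type, handler in triggers:
            if event["type"] == event_type:
                reply = handler(event)
                if reply is not None:
                    queue.append(reply)

bucket = {}
counter = HitCounter()
renderer = ImageRenderer(bucket)
triggers = [("new_hit", counter.handle), ("hits", renderer.handle)]

for _ in range(3):
    run({"type": "new_hit"}, triggers)
```

After three visits the bucket holds a rendering of 0000003. Everything here runs in order in one process, which is exactly the assumption that stops holding once the pieces are real services.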

Trouble in paradise

Except, maybe, they don’t. To see why, let’s put the diagrams together:

Figure 8. Combining the flows in one diagram

Note that I’m showing two sets of numbers, one for web request/response and another for the event flow. This illuminates the important point: the web flow is synchronous, but the event flow is asynchronous. You knew that, but I handwaved away the consequences, and now I need to slap my wrist. The distinction matters.

Because the event flow is asynchronous, there’s no guarantee that hits.png is updated before the next visitor arrives. I might see 0001336, reload and then see 0001336 again. And that’s not all: where one visitor might see no change, another visitor might observe that the hit counter jumps forward, because later renderings can overwrite earlier renderings before they are served. And that’s not all! An observer might see the count go backwards, because the rendering that increased the number to 0001338 might have finished before the rendering for 0001337 did. Or it may be that the events arrived out of order. Or some events never even arrived.

Figure 9. Synchronous flows can be inefficient. Asynchronous workloads can be inexplicable.
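The backwards-counting case is easy to reproduce in Python. In this sketch, with made-up function names, renderings complete in the wrong order. A naive writer keeps whatever finished last, while a guarded writer only overwrites when the new count is higher. The guard is an illustration of one possible mitigation, not something the design above includes, and as written its check-then-write isn’t atomic, so a real bucket would need more care.

```python
def render_naively(bucket, count):
    """Overwrite hits.png with whichever rendering finished most recently."""
    bucket["hits.png"] = count

def render_monotonic(bucket, count):
    """Only overwrite when the new rendering shows a higher count."""
    if count > bucket.get("hits.png", -1):
        bucket["hits.png"] = count

# The rendering for 1338 finishes before the rendering for 1337.
completion_order = [1336, 1338, 1337]

naive, guarded = {}, {}
for count in completion_order:
    render_naively(naive, count)
    render_monotonic(guarded, count)
```

The naive bucket ends on 1337 after having shown 1338, so a visitor who reloaded at the right moments saw the counter go backwards. The guarded bucket stays at 1338.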

I’m not done. Remember how I said that hit counter was keeping a tally of hits? I didn’t say where. If it keeps a value in memory, then you have new problems. For example: if Knative’s autoscaler decides that things are too quiet lately, it reduces the number of hit counters to zero and pow, your tally is gone. Next time it spins up your hit count is reset to zero. But on the other hand, if you have more than one hit counter, they’re keeping separate tallies. The exact hit count image at any moment depends on traffic, but not in the way you might have expected.

I’m describing stateless systems, of course. The answer is to keep state in a shared location, separately from the logic that operates on it. For example, each hit counter might be using Redis to increment a common value. Or you might get super fancy and have each instance listen for hits events. If the incoming event is a higher tally, jump to that value and hope you’re not participating in an infinite event loop.
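Here is the difference in Python. The class names are invented, and SharedStore is an in-process stand-in for an external store with an atomic increment (Redis offers one as its INCR command); no real Redis client is involved.

```python
import threading

class InMemoryHitCounter:
    """Keeps its tally inside the instance, so each instance has its own."""
    def __init__(self):
        self.count = 0

    def on_new_hit(self):
        self.count += 1
        return self.count

class SharedStore:
    """Stand-in for an external store that can increment a value atomically."""
    def __init__(self):
        self._lock = threading.Lock()
        self._values = {}

    def incr(self, key):
        with self._lock:
            self._values[key] = self._values.get(key, 0) + 1
            return self._values[key]

class SharedHitCounter:
    """Holds no tally of its own; every instance increments the same key."""
    def __init__(self, store):
        self.store = store

    def on_new_hit(self):
        return self.store.incr("hits")

# Four hits spread across two instances of each kind.
a, b = InMemoryHitCounter(), InMemoryHitCounter()
local_counts = [instance.on_new_hit() for instance in (a, b, a, b)]

store = SharedStore()
c, d = SharedHitCounter(store), SharedHitCounter(store)
shared_counts = [instance.on_new_hit() for instance in (c, d, c, d)]
```

The in-memory pair reports 1, 1, 2, 2 for four hits. The shared pair reports 1, 2, 3, 4, and would keep counting from there even if both instances were scaled away and replaced.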

Changing things

You’ve probably noticed that my focus has been on an already-deployed system. That’s the bad news. The good news is that you can fix a key bug I introduced in the previous section. Can you guess what it is?

Correct. The font sucks.

You quickly learn that Knative prizes immutability. This has a lot of implications. For now, it means that we can’t SSH into homepage, open vi and do it live. But it does raise the question of how changes get moved from your workstation to the cloud.

Knative encapsulates “run the thing” and “change the thing” into Services. When the Service is changed, Knative acts to bring the world into sync with the change.

Figure 10. Updating the homepage

  1. A user who arrives before the update sees the existing HTML, as served by homepage v1
  2. The developer uses kn to update the Service
  3. Knative starts homepage v2.
  4. homepage v2 passes its readiness check
  5. Knative stops homepage v1
  6. A second user arriving after the update sees a more professional font.

This blue/green deployment behavior is Knative’s default. When updating Services, it ensures that no traffic is lost and that load is only switched when it’s safe.
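The ordering in those steps is what makes the update safe, and it can be written down as a tiny Python function. This is my own simplification with invented names, not Knative’s code: the new version is started first, traffic only moves once readiness is confirmed, and the old version is stopped last.

```python
def rollout(old, new, is_ready):
    """Return which revision ends up serving, plus the steps taken to get there."""
    steps = [f"start {new}"]
    if not is_ready(new):
        # The old revision is never touched if the new one fails to become ready.
        steps.append(f"keep routing to {old}")
        return old, steps
    steps.append(f"route traffic to {new}")
    steps.append(f"stop {old}")
    return new, steps

serving, steps = rollout("homepage-v1", "homepage-v2", is_ready=lambda revision: True)
stuck, stuck_steps = rollout("homepage-v1", "homepage-v2", is_ready=lambda revision: False)
```

In the failing case visitors keep seeing homepage v1, awful font and all, because nothing was stopped before its replacement was known to work.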

What’s in the Knative box?

Let’s break this down into subprojects: Serving and Eventing.


Serving is the first and most well-known part of Knative. It encompasses the logic needed to run your software, manage request traffic, keep your software running while you need it and stop it running when you don’t need it.

As a developer, Knative gives you three basic types of document you can use to express your desires: Configuration, Revision and Route.

Configuration is your statement of what your running system should look like. You provide details about the desired container image, environment variables and the like. Knative converts this information into lower-level Kubernetes concepts like Deployments. In fact, those of you with some Kubernetes familiarity might be wondering what Knative is adding. After all, you can create and submit a Deployment yourself, no need to use another component for that.

Which takes us to Revisions. These are snapshots of a Configuration. Each time that you change a Configuration, Knative first creates a Revision and in fact, it’s the Revision which is converted into lower-level primitives.
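The relationship can be sketched in Python as a mutable Configuration that stamps out immutable Revisions. The classes, the image reference and the numbered naming scheme here are illustrative assumptions, not Knative’s data model.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Revision:
    """An immutable snapshot of what should run."""
    name: str
    image: str
    env: Tuple[Tuple[str, str], ...] = ()

class Configuration:
    """The changeable statement of intent; every change yields a new Revision."""
    def __init__(self, service, image, env=()):
        self.service = service
        self.revisions = []
        self.update(image=image, env=env)

    def update(self, image=None, env=None):
        latest = self.revisions[-1] if self.revisions else None
        # Anything not supplied is carried over from the latest snapshot.
        image = image if image is not None else latest.image
        env = tuple(env) if env is not None else latest.env
        name = f"{self.service}-{len(self.revisions) + 1:05d}"
        revision = Revision(name, image, env)
        self.revisions.append(revision)
        return revision

config = Configuration("homepage", "registry.example.invalid/homepage:v1")
second = config.update(env=[("FONT", "serif")])
first = config.revisions[0]
```

The first Revision is untouched by the update, so it remains available as something to route traffic to or roll back to.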

But this might still seem like overhead. Why bother with this versioning scheme in Knative, when you have git? Because blue/green deployment isn’t the only option. In fact, Knative allows you to create nuanced rules about traffic to multiple Revisions.

For example: when I deployed homepage v2, the deployment was all-or-nothing. But suppose I was worried that changing fonts would affect how long people stay on my page (i.e., an A/B test). If I perform an all-or-nothing update, I will get lots of data for the before-and-after, but there may be a number of confounding factors, such as time-of-day effects. Without running both versions side-by-side, I can’t control for those variables.

But Knative is able to divvy up traffic to Revisions by percentage. I might decide to send ten percent of my traffic to v2 and ninety percent of my traffic to v1. If the new font turns out to be worse for users, then I can roll it back easily without much fuss. If instead it was a triumph, I can quickly roll forwards, directing one hundred percent of traffic to v2.
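A percentage split is just a small piece of data. The Python sketch below builds a list of entries shaped like the traffic section of a Knative Service or Route, where each entry names a revision and a percent. The helper function and the revision names are hypothetical, and the check that percentages total one hundred reflects my understanding of the validation Knative applies.

```python
def traffic_block(split):
    """Turn {revision: percent} into a list of traffic entries."""
    if any(percent < 0 for percent in split.values()):
        raise ValueError("traffic percentages cannot be negative")
    if sum(split.values()) != 100:
        raise ValueError("traffic percentages must add up to 100")
    return [
        {"revisionName": name, "percent": percent}
        for name, percent in split.items()
    ]

# Hypothetical revision names for the two versions of the homepage.
canary = traffic_block({"homepage-v1": 90, "homepage-v2": 10})
roll_forward = traffic_block({"homepage-v2": 100})
roll_back = traffic_block({"homepage-v1": 100})
```

Rolling forward and rolling back are the same operation with different numbers, which is why neither has to be a big event.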

It’s this ability to selectively target traffic that makes Revisions a necessity. In vanilla Kubernetes I can roll forward and I can roll back, but I can’t do this with traffic, I can only do it with instances of the service. This has important architectural and operational consequences.

Perhaps you wondered what happened to the Services I was talking about in the walkthrough.

Well, these are a one-stop shop for all things Serving. Each Service combines a Configuration and a Route. This compounding makes common cases easier, because everything you need to know is in one place.

But these concepts aren’t necessarily what’s listed on the marketing flyer. Many of you have come to hear about autoscaling, including scale-to-zero. For many folks, it’s the ability for the platform to scale all the way to zero that captures their imagination: No more wasting money on instances that are mostly idle. And similarly, the ability to scale up: no more getting paged at absurd o’clock in the morning in New York because something huge happened in Sydney (or vice versa). Instead you delegate the business of balancing demand and supply to Knative. Because sometimes you want to understand what the heck it’s doing, I’ll be spending some time delving into the surprisingly difficult world of autoscaling.


Eventing is the second, less-well-known part of Knative. It provides ways to express connections between different pieces of software through events. In practical terms, “this is my software” is simpler to describe than “here is how all my software connects together”. Eventing consequently has a larger surface area, with more document types, than Serving does.

Earlier in the article you learned that Triggers and Brokers live in the middle of the Eventing world. The Trigger exists to map from an event filter to a target. The Broker exists to manage the flow of events based on Triggers.

But that’s the headline description, light on detail. For example: how does a CloudEvent get into the Broker? It turns out, there are multiple possibilities. The most powerful and idiomatic of these is a Source. These represent configuration information about some kind of emitter of events and a Broker to which they should be sent. A Source can be more or less anything: GitHub webhooks, direct HTTP requests, you name it. So long as it emits CloudEvents to a Broker, it can be a Source.

What kinds of events are there? That’s where the Event Registry comes along, providing a catalogue of EventTypes. At a command line you can quickly discover what events you’ll be able to react to.

Great! You’re probably already composing event-processing graphs in your head, and it won’t be long before you get tired of writing Trigger upon Trigger. It’d be handy if you had a simple way to do things in order. This is what Sequences can express for you: that A runs before B. Or maybe you want to do more than one thing at a time. That’s what Parallel does, allowing you to express that A and B can run independently.

Analogous to how Serving provides the convenience of Service, Sequence and Parallel are constructed from the same concepts that you can use directly. They’re a convenience, not a constraint. They’ll enable you to assemble event flows with much less YAML than handwiring with the equivalent Triggers.
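The two shapes can be sketched as ordinary function composition in Python. These helpers are a mental model with invented names, not the Eventing implementation: a sequence feeds each step’s output to the next step, while a parallel hands the same event to every branch and collects the results.

```python
def sequence(*steps):
    """A runs before B: each step receives the previous step's output."""
    def run(event):
        for step in steps:
            event = step(event)
        return event
    return run

def parallel(*branches):
    """A and B run independently: every branch receives the original event."""
    def run(event):
        return [branch(event) for branch in branches]
    return run

def add_one(event):
    return {**event, "count": event["count"] + 1}

def double(event):
    return {**event, "count": event["count"] * 2}

in_order = sequence(add_one, double)({"count": 3})
side_by_side = parallel(add_one, double)({"count": 3})
```

In the sequence, double sees the result of add_one and produces 8. In the parallel, each branch sees the original 3, producing 4 and 6.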

Beneath these smooth surfaces lies a fair amount of plumbing: Channels, Subscribers, Reply, Addressable and Callable. Right now, these aren’t important. We’ll get to them in due time. Meanwhile you can do most of what you need to do with some mix of Source, Trigger, Broker, Sequence and Parallel.

Serving and Eventing

By design, you don’t need Serving to use Eventing and you don’t need Eventing to use Serving. But they do mesh pretty well together. For example, if you have long processing pipelines, it’s nice if idle instances don’t sit around burning money waiting on upstream work to finish. Or, if there’s a bottleneck, it’s helpful if that part of the pipeline is scaled up. That’s Eventing gaining a superpower from Serving.

And it works the other way. Serving’s focus is on request/reply designs, the simple, robust but sometimes slow blocking approach. By itself this favors adding functionality to existing services instead of creating new ones. Blocking is still blocking, but blocking on threads is faster than blocking on HTTP. You can easily drift back from microservices to monoliths in costume.

Eventing relieves some of that design pressure. You can now offload a lot of work that doesn’t need to block, or which should react to events instead of following commands. Encouraging smaller units of logic and behavior allows Serving to really shine: autoscaling the GigantoServ™ is better than nothing. But it’s wasteful to burn 100GB of RAM on a system with three hundred endpoints when only two of them are seeing any kind of traffic surge.

Figure 11. Serving runs the services, Eventing wires them together

In the hit counter system above, I put both Serving and Eventing to work. Serving handles the business of homepage, hit counter and image renderer. Eventing handles the Broker and its Triggers, so Services receive and emit events without direct coordination.
