From API Security in Action by Neil Madden

In this article, you’ll implement capability-based access control techniques that enable secure sharing by taking the principle of least authority (POLA) to its logical conclusion and allowing fine-grained control over access to individual resources. Along the way, you’ll see how capabilities prevent a general category of attacks against APIs known as confused deputy attacks.


Sometimes identity-based access controls come into conflict with other principles of secure API design. For example, if a Natter user (Natter is the toy social network developed throughout the book) wishes to share a message that they wrote with a wider audience, they can copy a link to it, but the link won’t work unless the people they share it with are also members of the Natter social space it was posted to; otherwise they won’t be granted access. The only way to grant those users access to that message is either to make them members of the space, which violates the principle of least authority (because they now have access to all the messages in that space), or to copy and paste the whole message into a different system.

People naturally share resources and delegate access to others to achieve their goals, and an API security solution should make this simple and secure; otherwise, your users will find insecure ways to do it anyway.

DEFINITION  A confused deputy attack occurs when a component of a system with elevated privileges can be tricked by an attacker into carrying out actions that the attacker themselves isn’t allowed to perform.


Capability-based security

A capability is an unforgeable reference to an object or resource together with a set of permissions to access that resource. To illustrate how capability-based security differs from identity-based security, consider the following two ways to copy a file on UNIX[1] systems:

  • cp a.txt b.txt
  • cat <a.txt >b.txt

The first, using the cp command, takes as input the name of the file to copy and the name of the file to copy it to. The second, using the cat command, instead takes as input two file descriptors: one opened for reading and the other opened for writing. It then reads the data from the first file descriptor and writes it to the second.

DEFINITION  A file descriptor is an abstract handle that represents an open file along with a set of permissions on that file. File descriptors are a type of capability.


If you think about the permissions that each of these commands needs, the cp command needs to be able to open any file that you can name, for both reading and writing. To allow this, UNIX runs the cp command with the same permissions as your own user account, allowing it to do anything you can do, including deleting all your files and emailing your private photos to a stranger. This violates POLA, because the command is given far more permissions than it needs. (The principle of least authority, also known as the principle of least privilege, says that all users and processes in a system should be given only those permissions that they need to do their job: no more, and no less.) The cat command, on the other hand, needs only to read from its input and write to its output. It doesn’t need any permissions of its own at all (but UNIX gives it all your permissions anyway). A file descriptor is an example of a capability, because it combines a reference to some resource with a set of permissions to act on that resource.
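The contrast between the two commands can be sketched in Python. This is only an illustration of the two interface styles, not code from the book: the function names copy_by_name, copy_by_descriptor, and demo_copy are hypothetical, and (as with cat on UNIX) the Python process still runs with all of your permissions, so nothing here is enforced by the operating system.

```python
import os
import tempfile


def copy_by_name(src_path, dst_path):
    """cp style: receives file names and opens the files itself, so it
    needs the authority to open any path the caller is able to name."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        dst.write(src.read())


def copy_by_descriptor(src_fd, dst_fd):
    """cat style: receives two already-open descriptors and only ever
    touches the two files that they refer to."""
    while chunk := os.read(src_fd, 4096):
        os.write(dst_fd, chunk)


def demo_copy():
    """Copy a.txt to b.txt in a temporary directory using descriptors."""
    with tempfile.TemporaryDirectory() as tmp:
        a = os.path.join(tmp, "a.txt")
        b = os.path.join(tmp, "b.txt")
        with open(a, "wb") as f:
            f.write(b"hello")
        # The caller, not the copy routine, chooses the files and the modes.
        src_fd = os.open(a, os.O_RDONLY)
        dst_fd = os.open(b, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        try:
            copy_by_descriptor(src_fd, dst_fd)
        finally:
            os.close(src_fd)
            os.close(dst_fd)
        with open(b, "rb") as f:
            return f.read()
```

The point of the second function is that its signature already expresses least authority: it cannot name a file, so it cannot be asked to open one it was never given.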

Compared with the more dominant identity-based access control techniques, capabilities have several differences:

  • Access to resources is via unforgeable references that also grant the authority to access those resources. In an identity-based system anybody can attempt to access a resource, but they might be denied access. In a capability-based system it’s impossible to send a request to a resource if you don’t have a capability to access it. For example, it’s impossible to write to a file descriptor that doesn’t exist.
  • Capabilities provide fine-grained access to individual resources, and often support POLA more naturally than identity-based systems. It’s easier to delegate a small part of your authority to somebody else by giving them some capabilities without giving them access to your whole account.
  • The ability to easily share capabilities can make it harder to determine who has access to which resources via your API. In practice this is often true for identity-based systems too, as people share access in other ways (such as by sharing passwords).
  • Some capability-based systems don’t support revoking capabilities after they’ve been granted. When revocation is supported, revoking a widely shared capability may deny access to more people than was intended.

One reason why capability-based security is less widely used than identity-based security is the widespread belief that capabilities are hard to control, because they are easy to share and appear difficult to revoke. In fact, these problems are solved by real-world capability systems, as discussed in the paper Capability Myths Demolished by Mark S. Miller, Ka-Ping Yee, and Jonathan Shapiro. To take one example, it’s often assumed that capabilities can be used only for discretionary access control, because the creator of an object (such as a file) can share capabilities to access that file with anyone. In a pure capability system, communications between people are also controlled by capabilities (as is the ability to create files in the first place), and if Alice creates a new file, she can share a capability to access this file with Bob only if she has a capability allowing her to communicate with Bob. Nothing prevents Bob from asking Alice in person to perform actions on the file, but this is a problem that no access control system can prevent.

A brief history of capabilities

Capability-based security was first developed in the context of operating systems in the 1970s and has since been applied to programming languages and network protocols. The IBM System/38, which was the predecessor of the successful AS/400 (now IBM i), used capabilities for managing access to objects. In the 1990s, the E programming language combined capability-based security with object-oriented programming to create object-capability-based security (or ocaps), where capabilities are normal object references in a memory-safe OO programming language. Object-capability-based security fits well with conventional wisdom regarding good OO design and design patterns, because both emphasize eliminating global variables and avoiding static methods that perform side effects.

E also included a secure protocol for making method calls across a network using capabilities. This protocol has been adopted and updated by the Cap’n Proto framework, which provides an efficient binary protocol for implementing APIs based on remote procedure calls. Capabilities are also now making an appearance on popular websites and REST APIs, including those from Google and Dropbox.


Capabilities and REST

The examples thus far have been based on operating-system security, but capability-based security can also be applied to REST APIs available over HTTP. For example, suppose you’ve developed a Natter iOS app that allows the user to select a profile picture, and you want to allow users to upload a photo from their Dropbox account. Dropbox supports OAuth2 for third-party apps, but the access allowed by OAuth2 scopes is relatively broad; typically, a user can grant access only to all their files or else create an app-specific folder separate from the rest of their files. This can work well when the application needs regular access to lots of your files, but in this case your app needs only temporary access to download a single file chosen by the user. It violates POLA to have to grant permanent read-only access to your entire Dropbox to upload one photo. Although OAuth scopes are great for restricting permissions granted to third-party apps, they tend to be static and applicable to all users. Even if you had a scope for each individual file, the app would already need to know which file it needs access to at the point of making the authorization request.[2]

To support this use case, Dropbox developed the Chooser and Saver APIs, which allow an app developer to ask the user for one-off access to specific files in their Dropbox. Rather than starting an OAuth flow, the app developer calls an SDK function that displays a Dropbox-provided file selection UI, as shown in figure 1. Because this UI is implemented as a separate browser window served by Dropbox and not as part of the third-party app, it can show all the user’s files. When the user selects a file, Dropbox returns a capability to the application that allows it to access the file that the user selected for a short period of time (currently four hours for the Chooser API).

Figure 1. The Dropbox Chooser UI allows a user to select individual files to share with an application. The app is given time-limited read-only access to the files the user selected.

The Chooser and Saver APIs provide a number of advantages over a normal OAuth2 flow for this simple file sharing use case:

  • The app author doesn’t have to decide ahead of time what resource it needs to access. Instead they tell Dropbox that they need a file to open or to save data and Dropbox lets the user decide which file to use. The app never gets to see a list of the user’s other files at all.
  • Because the app isn’t requesting long-term access to the user’s account, there’s no need for a consent page to ensure the user knows what access they’re granting. Selecting a file in the UI implicitly indicates consent, and because the scope is fine-grained, the risks of abuse are much lower.
  • The UI is implemented by Dropbox and it’s consistent for every app and web page that uses the API. Little details like the “Recent” menu item work consistently across all apps.

For these use cases, capabilities provide an intuitive and natural user experience which is significantly more secure than the alternatives. It’s often assumed that there’s a natural trade-off between security and usability: the more secure a system is, the harder it must be to use. Capabilities seem to defy this conventional wisdom, because moving to a more fine-grained management of permissions allows more convenient patterns of interaction. The user chooses the files they want to work with, and the system grants the app access to only those files, without needing a complicated consent process.

Confused deputies and ambient authority

Many common vulnerabilities in APIs and other software are variations on what is known as a confused deputy attack. CSRF is one example, but many kinds of injection attack and XSS are also caused by the same issue. The problem occurs when a process is authorized to act with your authority (as your “deputy”), but an attacker can trick that process into carrying out malicious actions. The original confused deputy was a compiler running on a shared computer. Users could submit jobs to the compiler and provide the name of an output file in which to store the result. The compiler also kept a record of each job for billing purposes. Somebody realized that they could provide the name of the billing file as the output file, and the compiler would happily overwrite it, losing all records of who had done what. The compiler had permission to write to any file, and this could be abused to overwrite a file that the user themselves could not access.
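The compiler story can be modeled in a few lines of Python. This is a toy sketch under stated assumptions, not a reconstruction of the original system: the Compiler class, the billing.log name, and the in-memory dictionary standing in for the file system are all hypothetical. The first method uses the compiler’s own authority on a name chosen by the user; the second accepts an already-open handle, so the user can only direct output to something they were able to open themselves.

```python
import io

BILLING_FILE = "billing.log"  # hypothetical name for the billing records


class Compiler:
    """A deputy that may write to every entry in its own file table."""

    def __init__(self):
        self.files = {BILLING_FILE: "alice: 3 jobs\n"}

    def _bill(self, user):
        self.files[BILLING_FILE] += f"{user}: 1 job\n"

    def compile_to_name(self, user, source, output_name):
        # Vulnerable: the user supplies only a name, and the write happens
        # with the compiler's authority, which includes the billing file.
        self.files[output_name] = f"compiled:{source}"
        self._bill(user)

    def compile_to_handle(self, user, source, output):
        # Capability style: the user passes a handle they already hold,
        # so the compiler never resolves a user-chosen name.
        output.write(f"compiled:{source}")
        self._bill(user)


# An attacker names the billing file as the output and wipes Alice's record.
confused = Compiler()
confused.compile_to_name("mallory", "x", BILLING_FILE)

# With handles, the attacker has no way to refer to the billing file.
safe = Compiler()
out = io.StringIO()
safe.compile_to_handle("mallory", "x", out)
```

In the second design the reference to the output and the permission to write to it travel together, which is the property that removes the confusion.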

In CSRF (cross-site request forgery), the deputy is your browser, which has been given a session cookie after you logged in. When you make requests to the API from JavaScript, the browser automatically adds the cookie to authenticate the requests. The problem is that if a malicious website makes requests to your API, the browser also attaches the cookie to those requests unless you take additional steps to prevent that. Session cookies are an example of ambient authority: the cookie forms part of the environment in which a web page runs and is transparently added to requests. Capability-based security aims to remove all sources of ambient authority and instead require that each request is specifically authorized according to POLA.

DEFINITION  When the permission to perform an action is automatically granted to all requests that originate from a given environment, this is known as ambient authority. Examples of ambient authority include session cookies and allowing access based on the IP address a request comes from. Ambient authority increases the risks of confused deputy attacks and should be avoided whenever possible.


Capabilities as URIs

File descriptors rely on special regions of memory that can be altered only by privileged code in the operating system kernel, which ensures that processes can’t tamper with or create fake file descriptors. Capability-secure programming languages are also able to prevent tampering by controlling the runtime in which code runs. For a REST API, this isn’t an option because you can’t control the execution of remote clients, and another technique needs to be used to ensure that capabilities can’t be forged or tampered with. You can create unforgeable tokens either by using large, unguessable random strings or by using cryptographic techniques to authenticate the tokens.[3] You can reuse authentication token formats to create capability tokens, but there are several important differences:

  • Token-based authentication conveys the identity of a user, from which their permissions can be looked up. A capability directly conveys some permissions and doesn’t identify a user at all.
  • Authentication tokens are designed to be used to access many resources under one API, and aren’t tied to any one resource. Capabilities are instead directly coupled to a resource and can be used to access only that resource. You use different capabilities to access different resources.
  • A token is typically short-lived as it conveys wide-ranging access to a user’s account. A capability on the other hand can live longer as it has a much narrower scope for abuse.
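These differences can be made concrete with a minimal sketch of the random-string approach. It is an illustration under assumptions rather than the book’s implementation: the CapabilityStore class, its method names, and the in-memory dictionary are hypothetical, and a real service would persist the records in a database. Note that each record names a resource and a set of permissions but no user.

```python
import secrets
import time


class CapabilityStore:
    """Maps unguessable tokens to (resource, permissions, expiry)."""

    def __init__(self):
        self._caps = {}

    def create(self, resource, perms, ttl_seconds):
        # 32 random bytes (256 bits) from the OS CSPRNG, URL-safe encoded.
        token = secrets.token_urlsafe(32)
        expiry = time.time() + ttl_seconds
        self._caps[token] = (resource, frozenset(perms), expiry)
        return token

    def check(self, token, resource, perm, now=None):
        """True only if the token exists, is unexpired, and grants
        `perm` on exactly this `resource`."""
        now = time.time() if now is None else now
        entry = self._caps.get(token)
        if entry is None:
            return False
        cap_resource, perms, expiry = entry
        return cap_resource == resource and perm in perms and now < expiry

    def revoke(self, token):
        self._caps.pop(token, None)
```

A hypothetical use: `store.create("/spaces/1/messages/42", {"read"}, 3600)` returns a token that allows reading that one message for an hour and nothing else.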

REST already has a standard format for identifying resources, the URI, and this is the natural representation of a capability for a REST API. A capability represented as a URI is known as a capability URI. Capability URIs are widespread on the web, in the form of links sent in password reset emails, GitHub Gists, and document sharing as in the Dropbox example.

DEFINITION  A capability URI (or capability URL) is a URI that both identifies a resource and conveys a set of permissions to access that resource. Typically, a capability URI encodes an unguessable token into some part of the URI structure.


To create a capability URI, you can combine a normal URI with a security token. There are several ways to do this, as shown in figure 2.

Figure 2. You can encode a security token into a URI in numerous ways. You can encode it into the resource path, or you can provide it using a query parameter. More sophisticated representations encode the token into the fragment or userinfo elements of the URI, but these require some client-side parsing.
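The four placements described in the figure caption can be sketched with Python’s standard urllib.parse module. The host api.example.com, the resource path, the token value, and the function name capability_uris are all placeholders chosen for illustration; the query parameter name access_token is likewise an assumption for this sketch.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def capability_uris(base, token):
    """Return the same resource URI with the token placed in the path,
    the query, the fragment, and the userinfo component."""
    p = urlsplit(base)
    return {
        # Token as a prefix of the resource path.
        "path": urlunsplit((p.scheme, p.netloc, f"/{token}{p.path}", "", "")),
        # Token as a query parameter.
        "query": urlunsplit(
            (p.scheme, p.netloc, p.path, urlencode({"access_token": token}), "")
        ),
        # Token in the fragment, after '#'.
        "fragment": urlunsplit((p.scheme, p.netloc, p.path, "", token)),
        # Token in the userinfo component, before '@'.
        "userinfo": urlunsplit((p.scheme, f"{token}@{p.netloc}", p.path, "", "")),
    }


uris = capability_uris("https://api.example.com/spaces/1/messages/42", "abc123")
```

The sketch assumes the token contains only URL-safe characters, which holds for tokens produced by a URL-safe encoding such as base64url.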

A commonly used approach is to encode a random token into the path component of the URI, which is what the Dropbox Chooser API does.

In the Dropbox case the random token is encoded into a prefix of the file path. Although this is a natural representation, it means that the same resource may be represented by URIs with completely different paths depending on the token, and a client that receives access to the same resource through different capability URIs may not be aware that they refer to the same resource. An alternative is to pass the token as a query parameter.

A standard form for such URIs when the token is an OAuth2 token is defined by RFC 6750, using the parameter name access_token. This is often the simplest approach to implement because it requires no changes to existing resources, but it shares some security weaknesses with the path-based approach:

  • Both URI paths and query parameters are frequently logged by web servers and proxies, which can make the capability available to anybody who has access to the logs. Using TLS prevents proxies from seeing the URI, but a request may still pass through several servers unencrypted in a typical deployment.
  • The full URI may be visible to third parties through the HTTP Referer header or the document.referrer property exposed to content running in an HTML iframe. You can use the Referrer-Policy header and the rel="noreferrer" attribute on links in your UI to prevent this leakage.
  • URIs used in web browsers may be accessible to other users by looking at your browser history.

To harden capability URIs against these threats, you can encode the token into the fragment component of the URI, or even the userinfo part that was originally designed for storing HTTP Basic credentials in a URI. Neither the fragment nor the userinfo component of a URI is sent to a web server by default, and they’re both stripped from URIs communicated in Referer headers.
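Because neither component reaches the server by default, the client has to extract the token and send it some other way. The following sketch shows one possible shape for that client-side parsing. The function name split_capability is hypothetical, and sending the token in an Authorization: Bearer header is an assumption made for this example; an API could equally expect it elsewhere in the request.

```python
from urllib.parse import urlsplit, urlunsplit


def split_capability(uri):
    """Separate a capability URI into the URI to request and the headers
    that carry the token taken from the fragment or userinfo."""
    p = urlsplit(uri)
    token = p.fragment or p.username
    if not token:
        raise ValueError("no token in fragment or userinfo")
    # Drop any userinfo from the authority and drop the fragment.
    host = p.netloc.rpartition("@")[2]
    request_uri = urlunsplit((p.scheme, host, p.path, p.query, ""))
    # Assumption: the API accepts the token as a bearer credential.
    headers = {"Authorization": f"Bearer {token}"}
    return request_uri, headers
```

Keeping the token out of the request line means it no longer appears in the places where paths and query strings are routinely logged.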

Credentials in URIs: a lesson from history

The desire to share access to private resources by sharing a URI is not new. For a long time, browsers supported encoding a username and password into an HTTP URL, in the userinfo component that precedes the hostname. When such a link was clicked, the browser sent the username and password using HTTP Basic authentication. Though convenient, this is widely considered to be a security disaster. For a start, sharing a username and password provides full access to your account to anybody who sees the URI. Secondly, attackers soon realized that this could be used to create convincing phishing links in which the username is made to look like a trusted domain name. An unsuspecting user sees the familiar domain at the start of the link and assumes it is genuine, when in fact this is only a username and they’ll be sent to a fake login page on the attacker’s site. To prevent these attacks, browser vendors have stopped supporting this URI syntax and most now aggressively remove login information when displaying or following such links. Although capability URIs are significantly more secure than directly sharing a password, you should still be aware of any potential for misuse if you display URIs to users.



  1. Which of the following are good places to encode a token into a capability URI?
    1. The fragment
    2. The hostname
    3. The scheme name
    4. The port number
    5. The path component
    6. The query parameters
    7. The userinfo component
  2. Which of the following are differences between capabilities and token-based authentication?
    1. Capabilities are bulkier than authentication tokens
    2. Capabilities can’t be revoked, but authentication tokens can
    3. Capabilities are tied to a single resource, but authentication tokens are applicable to all resources in an API
    4. Authentication tokens are tied to an individual user identity, but capability tokens can be shared between users
    5. Authentication tokens are short-lived, but capabilities often have a longer lifetime

That’s all for this article.



[1]This example is taken from “Paradigm Regained: Abstraction Mechanisms for Access Control”.

[2]Proposals that make OAuth work better for these kinds of transactional one-off operations still require the app to know what resource it wants to access before it begins the flow.

[3]The E language distributed protocol, and Cap’n Proto, instead use per-connection tables of capabilities so that a capability can be used only over the communication channel on which it was originally transmitted. This violates the principle of statelessness in REST and in practice relies on the cryptographic integrity of the communication channels anyway, so we’ll ignore that solution here in favor of cryptographic capabilities.