From Spring Microservices in Action by John Carnell

In this article, we will talk about the ‘location’ of your service and service discovery in the Cloud.

A Word on Service Discovery

Whenever you have an application calling resources spread across multiple servers, it needs to be able to find the physical location of those resources. In the non-cloud world, this service location resolution is often solved through a combination of DNS (Domain Name System) and some type of network load balancer.

Figure 1 illustrates this model.

Figure 1 A traditional service location resolution model using DNS and a load balancer

An application needs to invoke a service located in some other part of the organization. It attempts to invoke the service by using a generic DNS name, along with a path that uniquely represents the service it’s trying to invoke. The DNS name would resolve to a commercial load balancer, like the popular F5 load balancer, or an open source load balancer like HAProxy.

The load balancer, upon receiving the request from the service consumer, locates the physical address entry in a routing table based on the path the user was trying to access. This routing table entry contains a list of one or more servers hosting the service. The load balancer then picks one of the servers in the list and forwards the request to that server.
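As a rough illustration of the lookup described above, here is a minimal Python sketch of a path-based routing table. The paths, server addresses, and the `LoadBalancer` class are hypothetical, and the round-robin selection is an assumption: the text only says the load balancer picks one of the servers in the list.

```python
import itertools

# Hypothetical routing table: request path -> servers hosting that service.
ROUTING_TABLE = {
    "/orders": ["10.0.0.11:8080", "10.0.0.12:8080"],
    "/customers": ["10.0.0.21:8080"],
}


class LoadBalancer:
    """Toy model of a centralized load balancer with a static routing table."""

    def __init__(self, routing_table):
        self._routes = dict(routing_table)
        # One independent round-robin iterator per route (assumed policy).
        self._cursors = {
            path: itertools.cycle(servers)
            for path, servers in self._routes.items()
        }

    def route(self, request_path):
        """Return the server that should receive this request."""
        matches = [
            path for path in self._routes
            if request_path == path or request_path.startswith(path + "/")
        ]
        if not matches:
            raise LookupError(f"no route for {request_path}")
        best = max(matches, key=len)  # the longest matching prefix wins
        return next(self._cursors[best])
```

Every route in this sketch has to be entered into `ROUTING_TABLE` ahead of time, which is the static, hand-managed quality of the traditional model.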

Each instance of a service is deployed to one or more application servers. The number of these application servers is often static (i.e., the number of application servers hosting a service doesn’t go up and down) and persistent (i.e., if a server running an application server crashes, it’s restored to the same state it was in at the time of the crash, with the same IP address and configuration it had previously).

Note To achieve some form of high availability, there’s usually a secondary load balancer sitting idle and pinging the primary load balancer to see if it’s alive. If it isn’t alive, the secondary load balancer becomes active, takes over the IP address of the primary load balancer, and begins serving requests.

This type of model works well for applications running inside the four walls of a corporate data center, with a relatively small number of services running on a group of static servers, but it doesn’t work well for cloud-based microservice applications. Reasons for this include:

  • Single point of failure. Although the load balancer can be made highly available, it remains a single point of failure for your entire infrastructure. If the load balancer goes down, every application relying on it goes down too. Even when made highly available, load balancers tend to be centralized chokepoints within your application infrastructure.
  • Limited horizontal scalability. By centralizing your services behind a single cluster of load balancers, you have limited ability to scale your load-balancing infrastructure horizontally across multiple servers. Many commercial load balancers are constrained by two things: their redundancy model and their licensing costs. First, most commercial load balancers use a hot-swap model for redundancy, so only a single server handles the load; the secondary load balancer is there only for failover if the primary has an outage. In essence, you’re constrained by your hardware. Second, commercial load balancers tend to have restrictive licensing models geared toward a fixed capacity rather than a more variable model.
  • Statically managed. Most traditional load balancers aren’t designed for rapid registration and de-registration of services. They use a centralized database to store routing rules, and often the only way to add new routes is through the vendor’s proprietary API.
  • Complex. Because a load balancer acts as a proxy to the services, service consumer requests must be mapped to the physical services. This translation layer adds complexity to your service infrastructure because the mapping rules for each service must be defined and deployed by hand. In a traditional load-balancer scenario, new service instances are registered manually rather than at startup time of the instance.

These four reasons aren’t a general indictment of load balancers. They work well in a corporate environment where the size and scale of most applications can be handled through a centralized network infrastructure. In addition, load balancers still have a role to play in terms of centralizing SSL termination and managing service port security. A load balancer can lock down inbound (ingress) and outbound (egress) port access to the servers sitting behind it. This concept of “least network access” is often a critical component when trying to meet industry-standard certification requirements like PCI (Payment Card Industry) compliance.

In the cloud, where you must deal with massive numbers of transactions and redundancy, a centralized piece of network infrastructure ultimately doesn’t work as well because it neither scales effectively nor is cost-efficient. Let’s look at how we can implement a robust service discovery mechanism for cloud-based applications.

On Service Discovery in the Cloud

The solution for a cloud-based microservice environment is to use a service discovery mechanism that is:

  • Highly Available. Service discovery needs to be able to support a “hot” clustering environment where service lookups can be shared across multiple nodes in a service discovery cluster. If a node becomes unavailable, other nodes in the cluster should be able to take over.
  • Peer-to-Peer. Each node in the service discovery cluster shares the state of a service instance.
  • Resilient. Service discovery clients should be able to “cache” service information locally so that applications can keep functioning even if service discovery becomes unavailable.

The key is to have service discovery in the cloud rather than have service consumers run their requests through a centralized piece of network infrastructure that applies routing rules based on an application path. Service consumers should be able to query a cluster of REST-based services to obtain the location of a service based on a logical service ID.

In the following sections, we’re going to:

  • Walk through the conceptual architecture of how a cloud-based service discovery agent works.
  • Show how client-side caching and load-balancing allow a service to continue to function even when the service discovery agent is unavailable.
  • Implement service discovery by using Spring Cloud and Netflix’s Eureka service discovery agent.

The Architecture of Service Discovery

To begin our discussion around service discovery architecture, we need to understand four concepts. These general concepts are shared across all service discovery implementations:

  1. Service Registration – How does a service register with the service discovery agent?
  2. Client Lookup of service address – What is the means by which a service client looks up service information?
  3. Information Sharing – How is service information shared across nodes?
  4. Health Monitoring – How do services communicate their health back to the service discovery agent?

Figure 2 shows the flow of the four concepts above and what typically occurs in a service discovery pattern implementation.

Figure 2 – As service instances are added/removed they will update the service discovery agent

In figure 2, one or more service discovery nodes have been started. These service discovery instances are usually unique and don’t have a load balancer that sits in front of them.

As service instances start up, they register their physical location, along with the path and port through which they can be accessed, with one or more service discovery instances. Although each instance of a service has a unique IP address and port, each service instance that comes up registers under the same service ID. A service ID is nothing more than a key that uniquely identifies a group of the same service instances.
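To make the idea of a service ID concrete, the following Python sketch models a registry as a mapping from a service ID to the set of instances registered under it. `ServiceInstance`, `ServiceRegistry`, and the sample addresses are hypothetical names chosen for illustration; they are not the API of any real discovery agent.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceInstance:
    """Physical location of one running instance (hypothetical shape)."""
    host: str
    port: int
    path: str = "/"


class ServiceRegistry:
    """Groups instances under a shared, logical service ID."""

    def __init__(self):
        self._instances = defaultdict(set)

    def register(self, service_id, instance):
        # Called by an instance at startup.
        self._instances[service_id].add(instance)

    def deregister(self, service_id, instance):
        # Called at shutdown; unknown instances are ignored.
        self._instances[service_id].discard(instance)

    def lookup(self, service_id):
        # Sorted so that callers see a stable order.
        found = self._instances.get(service_id, ())
        return sorted(found, key=lambda i: (i.host, i.port))
```

Two instances with different addresses can both register as, say, "orderservice", and a consumer only ever needs to know that one key.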

A service will usually register with only one service discovery instance, and most service discovery implementations use a peer-to-peer model of data propagation in which the data about each service instance is communicated to the other nodes in the cluster.

Depending on the service discovery implementation, the propagation mechanism might use a hard-coded list of services to propagate to or use a multi-casting protocol like the “gossip”[1] or “infection style”[2] protocol to allow other nodes to “discover” changes in the cluster.
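The simpler of the two propagation options, a hard-coded peer list, might look roughly like the Python sketch below. `DiscoveryNode` and its methods are hypothetical, calls between nodes are plain method calls rather than network requests, and a gossip protocol would replace the fixed `peers` list with randomly chosen neighbors.

```python
class DiscoveryNode:
    """One node of a discovery cluster with a fixed list of peers."""

    def __init__(self, name):
        self.name = name
        self.peers = []     # hard-coded list of the other nodes
        self.registry = {}  # service_id -> set of instance addresses

    def register(self, service_id, address, replicated=False):
        self.registry.setdefault(service_id, set()).add(address)
        # Forward only first-hand registrations; without this guard
        # the peers would echo the update back and forth forever.
        if not replicated:
            for peer in self.peers:
                peer.register(service_id, address, replicated=True)


# Wire up a three-node cluster in which every node knows the other two.
nodes = [DiscoveryNode(name) for name in ("a", "b", "c")]
for node in nodes:
    node.peers = [other for other in nodes if other is not node]

# The service registers with a single node; the others learn of it.
nodes[0].register("orderservice", "10.0.0.11:8080")
```

After the last line, all three nodes hold the same entry, so a client can query any of them.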

Finally, depending on the service discovery implementation, each service instance will either push its status to the service discovery service or have its status pulled by it. Any service failing to return a good health check will be removed from the pool of available service instances.
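The push variant of health monitoring can be sketched in Python as a heartbeat table with a timeout. `HealthTracker`, the 30-second timeout, and the injectable `clock` parameter are assumptions made for this example; real discovery agents define their own intervals and eviction rules.

```python
import time


class HealthTracker:
    """Drops instances that have not sent a heartbeat recently."""

    def __init__(self, timeout_seconds=30.0, clock=time.monotonic):
        self._timeout = timeout_seconds
        self._clock = clock    # injectable so the logic can be tested
        self._last_seen = {}   # (service_id, address) -> last heartbeat

    def heartbeat(self, service_id, address):
        # An instance pushes its status by calling this periodically.
        self._last_seen[(service_id, address)] = self._clock()

    def evict_stale(self):
        """Remove and return every instance whose heartbeat is overdue."""
        now = self._clock()
        stale = [
            key for key, seen in self._last_seen.items()
            if now - seen > self._timeout
        ]
        for key in stale:
            del self._last_seen[key]
        return stale

    def healthy(self, service_id):
        """Addresses of instances that are still considered alive."""
        self.evict_stale()
        return sorted(
            address for (sid, address) in self._last_seen
            if sid == service_id
        )
```

In a pull model, the discovery service would instead call a health endpoint on each instance and evict the ones that fail to answer.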

Once a service has registered with a service discovery service, it’s ready to be used by an application or service wanting to use its capabilities. Numerous models for a client to “discover” a service exist. A client can rely solely on the service discovery engine to resolve service locations each time a service is called. With this approach, the service discovery engine will be invoked every time a call to a registered microservice instance is made. Unfortunately, this approach is brittle because the service client is completely dependent on the service discovery engine running to find and invoke a service.

A more robust approach is to use “client-side” load-balancing.[3] Figure 3 illustrates this approach.

Figure 3 Client-side load balancing caches the location of the services, allowing the service client to avoid having to contact service discovery on every call

In this model, when a consuming actor needs to invoke a service:

  1. It’ll contact the service discovery service for all of the instances of the service the consumer is asking for, and then cache that data locally on the service consumer’s machine.
  2. Each time a client wants to call the service, the service consumer will look up the location information for the service from the cache. Usually, client-side caching will use a load-balancing algorithm like “round-robin” to ensure service calls are spread across multiple service instances.

The client will then periodically contact the service discovery service and refresh its cache of service instances. The client cache is eventually consistent, but there’s always a risk that, between the time the client last refreshed its cache and the time calls are made, calls might be directed to a service instance that isn’t healthy.
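The client-side model described above can be sketched in a few lines of Python. `CachingServiceClient`, the `fetch_instances` callback, and the 30-second refresh interval are hypothetical; the sketch also assumes that an unreachable discovery service surfaces as a `ConnectionError`.

```python
import time


class CachingServiceClient:
    """Caches instance lists locally and round-robins across them."""

    def __init__(self, fetch_instances, refresh_seconds=30.0,
                 clock=time.monotonic):
        self._fetch = fetch_instances  # callable: service_id -> addresses
        self._refresh = refresh_seconds
        self._clock = clock
        self._cache = {}       # service_id -> (fetched_at, [addresses])
        self._next_index = {}  # service_id -> round-robin position

    def _instances(self, service_id):
        entry = self._cache.get(service_id)
        expired = (entry is None
                   or self._clock() - entry[0] >= self._refresh)
        if expired:
            try:
                fresh = list(self._fetch(service_id))
                self._cache[service_id] = (self._clock(), fresh)
                return fresh
            except ConnectionError:
                if entry is None:
                    raise  # nothing cached yet, so we cannot recover
                # Discovery is down: fall through to the stale list.
        return self._cache[service_id][1]

    def choose(self, service_id):
        """Pick the next instance for a call, in round-robin order."""
        instances = self._instances(service_id)
        if not instances:
            raise LookupError(f"no instances for {service_id}")
        index = self._next_index.get(service_id, 0) % len(instances)
        self._next_index[service_id] = index + 1
        return instances[index]
```

The stale-list fallback is what lets the consumer keep working through a discovery outage, at the cost of possibly calling an instance that has since become unhealthy. This simple version retries the lookup on every call while the cache is expired; a production client would typically refresh in the background.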

That’s all for this article.

For more information, check out the whole book on liveBook here and see this Slideshare presentation.