To understand how MongoDB’s sharding works, you need to know about all the components that make up a sharded cluster and the role of each component in the context of the cluster as a whole. In this article, I’ll help you understand the components of a sharded cluster. This article is excerpted from MongoDB in Action. Save 39% on MongoDB in Action with code 15dzamia at manning.com. A sharded cluster consists of shards, mongos routers, and config servers, as shown in figure 1.

(image)

Let’s examine each component in figure 1:

  • Shards (upper left) store the application data. In a sharded cluster, only the mongos routers or system administrators should connect directly to the shards. As in an unsharded deployment, each shard can be a single node for development and testing, but it should be a replica set in production.
  • mongos routers (center) cache the cluster metadata and use it to route operations to the correct shard or shards.
  • Config servers (upper right) persistently store metadata about the cluster, including which shard has what subset of the data.
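To make these roles concrete, here’s a minimal sketch of how the three kinds of processes listed above might be started for a throwaway local test cluster. The ports, data paths, and shard address are illustrative choices only, and a single-node shard with a single config server is acceptable only for experimentation:

# Start one shard (a single mongod for testing; use a replica set in production)
mongod --shardsvr --dbpath /data/shard-a --port 30000

# Start one config server (production requires three, on separate machines)
mongod --configsvr --dbpath /data/config-a --port 27019

# Start a mongos router and point it at the config server
mongos --configdb localhost:27019 --port 27017

# From a mongo shell connected to the mongos, register the shard
mongo --port 27017
> sh.addShard("localhost:30000")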

Now, let’s discuss in more detail the role each of these components plays in the cluster as a whole.

Shards: storage of application data

A shard, shown at the upper left of figure 1, is either a single mongod server or a replica set that stores a partition of the application data. In fact, the shards are the only places where the application data gets saved in a sharded cluster. For testing, a shard can be a single mongod server, but it should be deployed as a replica set in production, because then it has its own replication mechanism and can fail over automatically. You can connect to an individual shard just as you would to a single node or a replica set, but if you try to run operations on that shard directly, you’ll see only a portion of the cluster’s total data.
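As a quick illustration of this partial view (the ports, database, and collection names here are hypothetical), compare a count run through the mongos with the same count run directly against one shard:

# Through the mongos router: the full collection
mongo --port 27017
> use files
> db.docs.count()     // e.g. 200000, the whole collection

# Directly against one shard: only that shard's portion of the data
mongo --port 30000
> use files
> db.docs.count()     // e.g. 98000, roughly half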

Mongos router: router of operations

Because each shard contains only part of the cluster’s data, you need something to route operations to the appropriate shards. That’s where mongos comes in. The mongos process, shown in the center of figure 1, is a router that directs all reads, writes, and commands to the appropriate shard. In this way, mongos provides clients with a single point of contact with the cluster, which is what enables a sharded cluster to present the same interface as an unsharded one.

mongos processes are lightweight and nonpersistent.1 Because of this, they are often deployed on the same machines as the application servers, ensuring that only one network hop is required for requests to any given shard. In other words, the application connects locally to a mongos, and the mongos manages connections to the individual shards.

Config servers: storage of metadata

mongos processes are nonpersistent, which means something must durably store the metadata needed to properly manage the cluster. That’s the job of the config servers, shown in the top right of figure 1. This metadata includes the global cluster configuration; the locations of each database, collection, and the particular ranges of data therein; and a change log preserving a history of the migrations of data across shards.

The metadata held by the config servers is central to the proper functioning and upkeep of the cluster. For instance, every time a mongos process is started, the mongos fetches a copy of the metadata from the config servers. Without this data, no coherent view of the shard cluster is possible. The importance of this data, then, informs the design and deployment strategy for the config servers.

If you examine figure 1, you’ll see there are three config servers, but they’re not deployed as a replica set. They demand something stronger than asynchronous replication; when the mongos process writes to them, it does so using a two-phase commit. This guarantees consistency across config servers. You must run exactly three config servers in any production deployment of sharding, and these servers must reside on separate machines for redundancy.2
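If you’re curious, you can inspect this metadata yourself by connecting to a mongos (or a config server) and looking at the collections of the config database. A sketch of the kinds of queries you might run:

> use config
> db.databases.find()            // which shard is the home of each database
> db.collections.find()          // which collections are sharded, and on what key
> db.chunks.find().limit(5)      // the data ranges and the shard holding each
> db.changelog.find().limit(5)   // the history of chunk splits and migrations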

Now you know what a shard cluster consists of, but you’re probably still wondering about the sharding machinery itself. How is data actually distributed? We’ll explain that in the next section, first introducing the two ways to shard in MongoDB, and then covering the core sharding operations.

Distributing data in a sharded cluster

Before discussing the different ways to shard, let’s first discuss how data is grouped and organized in MongoDB. This topic is relevant to a discussion of sharding, because it illustrates the different boundaries on which we can partition our data.

To illustrate this, we’ll use a Google Docs–like application. Figure 2 shows how the data for such an application would be structured in MongoDB.

(image)

Looking at the figure from the innermost box moving outward, you can see there are four different levels of granularity in MongoDB: document, chunk, collection, and database.

These four levels of granularity represent the units of data in MongoDB:

  • Document—The smallest unit of data in MongoDB. A document represents a single object in the system and can’t be divided further. You can compare this to a row in a relational database. Note that we consider a document and all its fields to be a single atomic unit. In the innermost box in figure 2, you can see a document with a username field with a value of “hawkins”.
  • Chunk—A group of documents clustered by values on a field. A chunk is a concept that exists only in sharded setups. This is a logical grouping of documents based on their values for a field or set of fields, known as a shard key. We’ll cover the shard key when we go into more detail about chunks later in this article. The chunk shown in figure 2 contains all the documents that have the field username with values between “bakkum” and “verch”.
  • Collection—A named grouping of documents within a database. To allow users to separate a database into logical groupings that make sense for the application, MongoDB provides the concept of a collection. This is nothing more than a named grouping of documents, and it must be explicitly specified by the application to run any queries. In figure 2, the collection name is spreadsheets. This collection name essentially identifies a subgroup within the cloud-docs database, which we’ll discuss next.
  • Database—Contains collections of documents. This is the top-level named grouping in the system. Because a database contains collections of documents, a collection must also be specified to perform any operations on the documents themselves. In figure 2, the database name is cloud-docs. To run any queries, the collection must also be specified—spreadsheets in our example. The combination of database name and collection name is unique throughout the system and is commonly referred to as the namespace. It is usually represented by concatenating the database name and the collection name, separated by a period character. For the example shown in figure 2, that would look like cloud-docs.spreadsheets, as the short shell session after this list shows.
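A short shell session makes these groupings concrete. The database and collection names match figure 2; the document itself is made up for illustration:

> use cloud-docs
switched to db cloud-docs
> db.spreadsheets.insert({ username: "hawkins", filename: "spreadsheet-1" })
> db.spreadsheets.findOne({ username: "hawkins" })
{ "_id" : ObjectId("..."), "username" : "hawkins", "filename" : "spreadsheet-1" }
> db.spreadsheets.getFullName()    // the namespace
cloud-docs.spreadsheets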

Ways data can be distributed in a sharded cluster

So now you know the different ways in which data is logically grouped in MongoDB. The next question is, how does this interact with sharding? On which of these groupings can we partition our data? The quick answer to this question is that data can be distributed in a sharded cluster on two of these four groupings:

  • On the level of an entire database, where each database along with all its collections is put on its own shard
  • On the level of partitions or chunks of a collection, where the documents within a collection itself are divided up and spread out over multiple shards, based on values of a field or set of fields called the shard key in the documents

You may wonder why MongoDB partitions on chunks rather than on individual documents. A document seems like the most logical grouping because it’s the smallest possible unit. But consider that we not only have to partition the data, we also have to be able to find it again. If we partitioned at the document level (for example, by allowing each spreadsheet in our Google Docs–like application to be moved around independently), the config servers would have to keep track of the location of every single document. In a system with small documents, half of your data could end up being metadata on the config servers, just keeping track of where the actual data is stored.

Granularity jump from database to partition of collection

You may also wonder why there’s a jump in granularity from an entire database to a partition of a collection. Why isn’t there an intermediate step where we can distribute on the level of whole collections, without partitioning the collections themselves?
The real answer to this question is that it’s theoretically possible; it just hasn’t been implemented yet.3 Fortunately, because of the relationship between databases and collections, there’s an easy workaround. If you have different collections, say files.spreadsheets and files.powerpoints, that you want placed on separate servers, just store them in separate databases. For example, you could store spreadsheets in files_spreadsheets.spreadsheets and PowerPoint files in files_powerpoints.powerpoints. Because files_spreadsheets and files_powerpoints are two separate databases, they’ll be distributed, and thus so will the collections.
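In the shell, the workaround amounts to nothing more than writing to two databases. Because each database gets its own home shard, the two collections can end up on different servers. A sketch, with made-up documents and an illustrative shard name:

> use files_spreadsheets
> db.spreadsheets.insert({ filename: "budget-2011" })
> use files_powerpoints
> db.powerpoints.insert({ filename: "q3-pitch" })

// If both databases happen to be assigned to the same shard, an administrator
// can move one of them manually (the shard name here is illustrative):
> use admin
> db.runCommand({ movePrimary: "files_powerpoints", to: "shard-b" })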

Now we’ll discuss distributing entire databases.

Distributing databases to shards

As you create new databases in a sharded cluster, each database is assigned to a different shard. If you do nothing else, a database and all its collections will live forever on the shard where they were created. The databases themselves don’t even need to have sharding enabled for this distribution to happen.

Because the name of a database is specified by the application, you can think of this as a kind of manual partitioning; MongoDB itself does nothing to ensure your data is evenly distributed. To see why this is manual, consider using this method to shard the spreadsheets collection in our documents example. To shard it two ways using database distribution, you’d have to create two databases, say files1 and files2, and evenly divide the data between the files1.spreadsheets and files2.spreadsheets collections. It’s completely up to you to decide which spreadsheet goes in which collection, and to come up with a scheme for querying the appropriate database to find them later. This is a difficult problem, which is why we don’t recommend this approach for this type of application.
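To see just how manual this is, here’s a sketch of the routing logic the application itself would have to carry. The two-database split, the sheetId field, and the even/odd rule are all assumptions made purely for illustration:

// Application-side routing: the cluster knows nothing about this scheme.
var conn = new Mongo("localhost:27017");   // a mongos

function databaseFor(sheetId) {
  // For example: even-numbered ids go to files1, odd-numbered ids to files2.
  return (sheetId % 2 === 0) ? "files1" : "files2";
}

function saveSpreadsheet(doc) {
  conn.getDB(databaseFor(doc.sheetId)).spreadsheets.insert(doc);
}

function findSpreadsheet(sheetId) {
  // The application must apply the same rule to find the document again.
  return conn.getDB(databaseFor(sheetId)).spreadsheets.findOne({ sheetId: sheetId });
}

All of the splitting, balancing, and lookup work that MongoDB performs automatically for sharded collections falls on the application under this scheme.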

When is the database distribution method really useful? One example of a real application for database distribution is MongoDB as a service. In one implementation of this model, customers can pay for access to a single MongoDB database. On the backend, each database is created in a sharded cluster. This means that if each client uses roughly the same amount of data, the distribution of the data will be optimal simply due to the distribution of the databases throughout the cluster.

Sharding within collections

Now we’ll look at the more powerful form of MongoDB sharding: sharding an individual collection. This is what the phrase automatic sharding refers to, because this is the form of sharding in which MongoDB itself makes all the partitioning decisions, without any direct intervention from the application.

To allow for partitioning of an individual collection, MongoDB defines the idea of a chunk, which as you saw earlier is a logical grouping of documents, based on the values of a predetermined field or set of fields called a shard key. It’s the user’s responsibility to choose the shard key.

For example, consider the following document from a spreadsheet management application:

{
  _id: ObjectId("4d6e9b89b600c2c196442c21"),
  filename: "spreadsheet-1",
  updated_at: ISODate("2011-03-02T19:22:54.845Z"),
  username: "banks",
  data: "raw document data"
}

If all the documents in our collection have this format, we can, for example, choose a shard key of the _id field and the username field. MongoDB will then use that information in each document to determine what chunk the document belongs to.
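Declaring that shard key is done once, from a mongos, using the sharding helpers. A sketch, assuming the database and collection names from our example and leading the key with username (the order of the fields within the key is a design choice, not something prescribed here):

> sh.enableSharding("cloud-docs")
> sh.shardCollection("cloud-docs.spreadsheets", { username: 1, _id: 1 })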

How does MongoDB make this determination? At its core, MongoDB’s sharding is range based; this means that each “chunk” represents a range of shard keys. When MongoDB is looking at a document to determine what chunk it belongs to, it first extracts the values for the shard key, and then finds the chunk whose shard key range contains the given shard key values.
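Conceptually, the lookup works like the following plain-JavaScript sketch. This is an illustration of the idea, not MongoDB’s internal code; for simplicity the shard key is a single username field, the ranges and shard assignments are invented, and null stands in for an unbounded end of a range:

// Each chunk owns a half-open range of shard key values: min inclusive, max exclusive.
var chunks = [
  { minKey: null,     maxKey: "bakkum", shard: "A" },   // null = unbounded
  { minKey: "bakkum", maxKey: "verch",  shard: "B" },
  { minKey: "verch",  maxKey: null,     shard: "A" }
];

function chunkFor(username) {
  // Find the chunk whose range contains the given shard key value.
  return chunks.filter(function (c) {
    return (c.minKey === null || username >= c.minKey) &&
           (c.maxKey === null || username <  c.maxKey);
  })[0];
}

chunkFor("hawkins").shard;   // "B" in this invented distribution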

To give a concrete example of what this looks like, imagine that we chose a shard key of username for this spreadsheets collection, and we have two shards, “A” and “B.” Our chunk distribution may look something like that shown in table 1.

(image)

Looking at the table, it becomes a bit clearer what purpose chunks serve in a sharded cluster. If we gave you a document with a username field of “Babbage”, you would immediately know that it should be on shard A, just by looking at the table above. In fact, if we gave you any document that had a username field, which in this case is our shard key, you’d be able to use table 1 to determine which chunk the document belonged to, and from there determine which shard it should be sent to.
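mongos performs this same lookup for every operation, using the chunk metadata held by the config servers. If you want to see the equivalent of table 1 for a real cluster, here is a sketch of how you might look, using the namespace from our example (field names as they appear in the MongoDB versions contemporary with this article):

> sh.status()      // prints each sharded collection's chunk ranges and shards
> use config
> db.chunks.find({ ns: "cloud-docs.spreadsheets" }, { min: 1, max: 1, shard: 1 })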