From Software Telemetry by Jamie Riedesel
This article is about identifying a cardinality problem in your telemetry storage systems.
The broadest measure that you might have a cardinality problem is slow search performance, but that can be caused by many things. Symptoms of cardinality problems depend on your storage system, but include
- Slow search performance. This is the symptom most people notice, because search performance is a primary feature of Presentation Stage systems. Unfortunately, many things can cause this without there being a cardinality problem. It’s a sign to look for one, though.
- Increased memory usage for normal operations. Many storage systems keep indexes in memory. As those indexes grow, memory usage does as well. Most of these storage systems allow reading indexes from disk if they won’t fit in memory, which greatly slows down search performance. Relational databases like MySQL are famous for this pattern.
- Increased memory usage for routine scheduled operations. Scheduled optimization procedures in some storage systems are impacted by cardinality. InfluxDB (as of version 2.0) performs compaction operations regularly, and high cardinality leads to increased (often much increased) memory usage versus the rest of the time.
- Decreases in ability to insert new data. As indexes get larger, they need to be updated. Indexing efficiency varies by storage engine, and not all are as good at it. The overhead of handling inserting new values into the indexes can, for some systems, scale up as the unique-value count increases, which in turn reduces the ability to insert new data into the system.
- Increased time to allow querying after starting the database. Some storage systems need to load indexes into memory before being able to query. The larger the indexes, the longer this process takes. Because stateful systems like these are restarted infrequently, and this problem is one that can surprise you at a bad time.
- Increases in consumed disk-space that scales higher than your ingestion rate. Some storage systems keep indexes in separate files from table data. Each time you insert new data with a new unique value, the storage system needs to update the table-data with the new value as well as update any indexes and their files. In other systems, such as Elasticsearch, every new piece of data gets all fields, even if those fields have a null value. Therefore, if you’ve ten thousand fields, and a new event is inserted with fifteen fields, that new event will have 9,985 nulled fields on it.
We’ll go over two broad types of storage system, and the ways cardinality impacts each. Cardinality in time-series databases is covered first, followed by cardinality in logging databases. The third Pillar of Observability, traces, is currently dominated by Software-as-a-Service platforms. The dominant self-hosted platform is Jaeger, which sits on top of either Cassandra or Elasticsearch and inherits the cardinality problems of those platforms (which are covered here).
Cardinality in time-series databases
Time-series databases are optimized for serving data organized by time, which is why time-series databases form the foundation of many metrics-style telemetry systems. The common design goal of time-series databases is to enable fast searches of recent data (the most common type of search in a metrics presentation system), and make aggregating data over time easier. The four time-series databases I cover here are:
- OpenTSDB, which was the first time-series database to break out, and was created by StumbledUpon and based on Hadoop.
- KairosDB, an open-source time-series database based on Cassandra.
- Prometheus, part of a larger monitoring platform, and made famous by SoundCloud and it’s a member of the Cloud Native Computing Foundation.
- InfluxDB, another stand-alone metrics datastore which is now part of a suite of utilities put out by InfluxData.
OpenTSDB has a strict cardinality limit per field of sixteen million unique values. Although this is generous, using a field to store highly unique information such as IP address or the container ID in a Kubernetes pod quickly runs your key out of space. You’ll notice this happening by not getting new values in your metrics.
The other area cardinality comes into play with OpenTSDB is in the queries themselves. Queries that require a lot of fields, or involve fields that have a lot of cardinality are slower than queries involving little cardinality. You can speed things up by using multiple region servers (a Hadoop concept) in your OpenTSDB cluster, which allows OpenTSDB to partition or shard data across the many region servers. When split up this way, queries are submitted to each region, processed in parallel, and then reassembled. Parallel processing makes the query much faster. Figure 1 demonstrates this process.
Figure 1 How OpenTSDB splits up queries for high cardinality data. The Time Series Daemon receives the query from the client and then redistributes it to the Region Servers. The Region Servers then query their local storage and deliver any local results to the TSD. The TSD then reassembles all the results to return to the client. A slow performing high cardinality query is made faster through parallel processing.
KairosDB doesn’t have any explicit cardinality limits, but how it operates provides some defacto limits to cardinality. Four main pieces of data for a metric are:
- The metric name
- The timestamp
- The value
- Any key=value pairs associated with the metric
A critical thing to understand with KairosDB is that the key=value pairs aren’t indexed. Using them in a query slows your query down, and the more unique values in a specific key, the slower your query gets. To use an SQL example of what’s happening, let us look at how KairosDB returns results for a metric named pdf_pages and with the tag datacenter=euc1 set. The SQL-like first query KairosDB performs when processing a user-query is:
SELECT metric_value, tags FROM metrics_table WHERE metric_name = 'pdf_pages' AND timestamp > abc AND timestamp < xyz
Which returns a list of all metrics with the name pdf_pages in it. KairosDB then walks through the entire result-set looking for tags fields that contain datacenter=euc1. The more unique values in the tags fields, the more data KairosDB can throw away before returning results, which in turn means queries cause a lot of storage I/O for few results. Figure 2 shows this process.
Figure 2 The two passes KairosDB takes when responding to a query using tags. The first pass uses Cassandra and its indexing, and it‘s pretty fast. The second pass goes over the result set one by one looking for rows with the right tags. For tags with high cardinality (high uniqueness) most of the result-set is discarded and few rows match. For tags with low cardinality a large percentage of the returned rows l match. The storage charge for the query is the same for high and low cardinality queries, but high cardinality queries require more I/O capacity to resolve.
If your KairosDB infrastructure is consuming a lot of storage I/O capacity and feels slow, you likely have a cardinality problem in your tags. In general, only use tags to indicate low cardinality data. Our example uses datacenter as tag, which has fewer than ten values for this example company. Keep in mind that the first phase of the query, done on Cassandra, has its own cardinality issues; too many metrics slows that query down as well. How much is too much? Keep an eye on your query performance, because this is the only way to tell.
Prometheus says in its documentation to not put too many metrics in it, and the limiting factor is memory on the Prometheus server. Right now (2021), a Prometheus server with two million time-series (metric name, times each key-value) needs 8GB of memory to ingest and process queries. If you find your Prometheus server is running out of memory, you likely have a cardinality problem.
InfluxDB (up to version 2.0) has an explicit cardinality limit that must be set, which applies per database. When this cardinality limit is reached, InfluxDB won’t accept metrics that attempt to increase cardinality. You’ll notice this limit has been reached by not seeing new data in the affected database. Setting the limit high works to a point, but when InfluxDB performs shard compaction (a shard is the metrics data for a regular length of time) the amount of memory this takes is directly related to cardinality in the database.
InfluxDB lets you declare multiple databases, which is one way to get around the cardinality problems. Keep in mind that shard-compaction needs to happen for each database, and when those events overlap quite significant memory can be required to complete. You can tell you have exceeded a database-server’s ability to support the InfluxDB databases when the InfluxDB process runs out of memory. On Linux systems, the OOMKiller terminates the InfluxDB process unless you’ve configured the kernel to avoid the process.
Cardinality in logging databases
This section is about the databases used to host centralized logging data. The most famous of these databases is the E in ELK Stack: Elasticsearch. Another document-oriented database that sees a lot of use for centralized logging is MongoDB. We look at both to show the similarities.
Cardinality in Elasticsearch behaves rather different than we saw with the time-series databases earlier. An index in Elasticsearch contains a list of fields, and Elasticsearch further indexes each field to enable searching. Because Elasticsearch was initially built for searching plain language, making for highly unique fields, it’s less bothered by individual field cardinality than the time-series databases. Also, Elasticsearch shards its database by default, and you get parallelization working for you when resolving queries similar to how Cassandra and OpenTSDB work (see figure 1 for that process).
Elasticsearch experiences cardinality pressure in two areas:
- Disk-space consumption by field indexes. The indexes for each field take up space. Depending on the mapping type set for each field, and the text-analysis settings you chose, the amount of storage consumed can be quite variable. Large files slow down search performance due to having to sift through lots of disk to assemble results.
- Average document size. Every field in an index is present on every document, and if an index has fifteen thousand fields and a document only has fifteen fields defined, 14,985 fields are set to null. Such a small record, only fifteen fields, likely has most of its space consumed by all of those nulled fields. Large document sizes slow down search performance, as Elasticsearch has to move large documents.
When it comes to Elasticsearch, there are two metrics to pay attention to for cardinality:
- Average document size. Take the total size of the index and divide it by the number of documents in the index. This gives you the size per document. A slowly increasing value even though you’re not increasing the size of the documents being fed in is a sign that you’ve a creeping index problem.
- Count of fields. This is a direct measure of cardinality, and the best case for your index is all documents have all fields defined on them. If you find that the count of fields steadily increases, check your inputs to see if someone is using something with high cardinality as a field-key rather than a field-value.
When Elasticsearch is used with time-based indexes — such as one index per day or week, which is common with Centralized Logging telemetry systems — you can easily track both of these metrics over time. Search performance generally scales with the size of the shards inside an index, which means you need to track that as well. Time-based indexes also let you see progress you’ve made in fixing problems.
MongoDB is another document-oriented database. Unlike Elasticsearch, which provides an index for every field, MongoDB relies on externally defined indexes. The implication of having to define your own indexes is that the data going into a given collection (similar to an index in Elasticsearch) all looks the same. This design makes MongoDB less flexible in the face of variable inputs, but gives you more direct control over search performance. As with relational databases, searching for things in unindexed fields kills search performance.
All is not lost, though. MongoDB supports creating a single index for every string-type field in the collection. This more closely matches the design of Elasticsearch in that all text searches is done through the index. Using this feature for centralized logging makes the index of the collection significantly larger than the collection itself. This isn’t always a bad thing, it certainly improves search performance, but it changes where you look for problems.
MongoDB supports sharding, splitting a collection up onto different database servers. This sort of splitting is a great way to allow more write capacity, and also has the impact of reducing the absolute size of your index and data files on each given shard-server. Similar to how OpenTSDB (see figure 1) and Elasticsearch handle shards, queries on a sharded collection benefit from parallelization.
That’s all for this article. If you want more, check out the book on Manning’s liveBook platform here.