M3 is the shiny new thing in time series databases. In this post, we’ll compare it to some more established players and highlight the differences.
Let’s start things off with a tip: if you’re new to time series databases, you might want to start with our blog post An introduction to time series databases or An introduction to M3. And if you’re interested in using M3 as part of an observability solution, go ahead and read our case study about how Aiven did it internally.
M3: A time series data platform
M3, born out of Uber’s massive metrics scalability needs, bills itself as a metrics engine. More than a standalone TSDB, it is a time series platform that includes a distributed TSDB. M3’s main components are the M3 Coordinator, the M3 Aggregator, M3DB (its native distributed TSDB), and the M3 query engine (M3Query).
The M3 Coordinator
The M3 Coordinator acts as a bridge between the collection agent and storage in M3DB. For Prometheus users, remote and scalable storage of collected data has long been a hurdle. M3 lowered the barrier to entry for those using Prometheus as a collection agent by designing the M3 Coordinator to run as a Prometheus sidecar. The M3 Coordinator can also act as a collection agent in its own right: it can collect, process, aggregate, and downsample data, then send that data to M3DB for ingestion. Early on, though, it can simply work alongside Prometheus to ease the transition to the M3 platform.
Though the M3 Coordinator can handle aggregation as it processes data on the way to M3DB, M3 also has a dedicated metrics aggregator: M3 Aggregator. The M3 Coordinator sends its data to M3 Aggregator, which handles stream-based downsampling of data, before passing it on to M3DB for storage. M3 Aggregator is designed to be highly available, supporting clustering and replication.
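To make “stream-based downsampling” concrete, here is a minimal Python sketch of the general technique, not M3 Aggregator’s implementation. It assumes points arrive in timestamp order as (Unix seconds, value) pairs and keeps only a running count, sum, min, and max per fixed window, so each window can be emitted as soon as the stream moves past it. The Window and downsample names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Tuple


@dataclass
class Window:
    """Running aggregates for one fixed downsampling window (hypothetical helper)."""
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def add(self, value: float) -> None:
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)


def downsample(points: Iterable[Tuple[int, float]],
               resolution_s: int) -> Iterator[Tuple[int, Window]]:
    """Yield (window_start, Window) for time-ordered (unix_seconds, value) points."""
    current_start: Optional[int] = None
    window = Window()
    for ts, value in points:
        start = ts - ts % resolution_s  # floor the timestamp to its window boundary
        if current_start is not None and start != current_start:
            # The stream has moved past the current window, so it is complete.
            yield current_start, window
            window = Window()
        current_start = start
        window.add(value)
    if current_start is not None:
        yield current_start, window  # flush the last, possibly partial, window


raw = [(0, 1.0), (10, 3.0), (59, 2.0), (60, 5.0), (125, 4.0)]
for start, w in downsample(raw, 60):
    print(start, w.count, w.total / w.count, w.minimum, w.maximum)
```

Keeping running aggregates instead of raw samples is what lets a streaming aggregator discard the raw points as it goes; a production aggregator also has to cope with late or out-of-order data, which this sketch deliberately ignores.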
M3DB is M3’s native distributed time series database, written in Go. The cost of storing time series data at scale (in Uber’s case, roughly 8.5 billion data points per second in 2018) can be a business killer. That’s why Uber’s engineers focused on building a database that compresses time series data, yielding enormous storage (and financial) savings.
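M3DB’s exact encoding is beyond the scope of this post (it uses a Gorilla-inspired scheme called M3TSZ), but one core idea from that family of compression techniques is easy to show: delta-of-delta encoding of timestamps. The sketch below is a simplified, hypothetical illustration that produces integers rather than packed bits, so it shows where the redundancy goes, not how many bytes M3DB actually stores.

```python
from typing import List


def encode_timestamps(timestamps: List[int]) -> List[int]:
    """Delta-of-delta encode: [first timestamp, first delta, then delta changes]."""
    if len(timestamps) < 2:
        return list(timestamps)
    first_delta = timestamps[1] - timestamps[0]
    encoded = [timestamps[0], first_delta]
    prev_delta = first_delta
    for prev, cur in zip(timestamps[1:], timestamps[2:]):
        delta = cur - prev
        encoded.append(delta - prev_delta)  # 0 whenever the interval is unchanged
        prev_delta = delta
    return encoded


def decode_timestamps(encoded: List[int]) -> List[int]:
    """Invert encode_timestamps by re-accumulating the deltas."""
    if len(encoded) < 2:
        return list(encoded)
    delta = encoded[1]
    timestamps = [encoded[0], encoded[0] + delta]
    for delta_of_delta in encoded[2:]:
        delta += delta_of_delta
        timestamps.append(timestamps[-1] + delta)
    return timestamps


scrapes = [1000, 1010, 1020, 1030, 1041]  # 10 s interval with 1 s of jitter at the end
encoded = encode_timestamps(scrapes)       # [1000, 10, 0, 0, 1]
assert decode_timestamps(encoded) == scrapes
```

With a steady scrape interval almost every entry after the second is 0, and a bit-level encoder in the style of Gorilla can store each of those zeros in a single bit, which is where much of the saving on timestamps comes from.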
M3Query is M3’s querying and aggregation engine. Just as storing massive amounts of time series data can explode disk needs, querying across that data can explode memory needs. M3Query performs just-in-time compression in its querying algorithm to reduce memory usage. It also works with a columnar data format in M3DB, which lets query executions run in parallel for large speed gains. For a deep dive into M3Query’s architecture, see Uber Engineering’s excellent presentation on the subject.
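Why a columnar layout helps parallel queries can be sketched in a few lines. This is a hypothetical illustration, not M3Query’s engine: each Block holds one time range of a series as two parallel columns, each worker aggregates its own block independently, and the partial results are merged at the end.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Block:
    """Hypothetical columnar block: one time range of a series as parallel columns."""
    timestamps: Sequence[int]
    values: Sequence[float]


def block_sum(block: Block, start: int, end: int) -> float:
    """Partial aggregate over one block; touches only that block's two columns."""
    return sum(v for t, v in zip(block.timestamps, block.values) if start <= t < end)


def range_sum(blocks: List[Block], start: int, end: int, workers: int = 4) -> float:
    """Sum values in [start, end) by aggregating blocks independently, then merging."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda b: block_sum(b, start, end), blocks))
    return sum(partials)  # merging partial sums is cheap compared with scanning


blocks = [
    Block(timestamps=[0, 10, 20], values=[1.0, 2.0, 3.0]),
    Block(timestamps=[30, 40], values=[4.0, 5.0]),
]
total = range_sum(blocks, start=10, end=40)
```

In CPython, threads will not speed up pure-Python arithmetic because of the global interpreter lock; the point of the sketch is the shape of the computation (independent partial aggregates plus a cheap merge), which is what lets a real engine spread the work across cores.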
As for query languages, M3Query supports PromQL (from Prometheus) and Graphite functions, and the M3 team is working on releasing its own query language, M3QL.
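Because M3 supports PromQL, existing Prometheus tooling can query it through a Prometheus-style HTTP API. The sketch below only builds a query_range URL using the standard Prometheus parameters (query, start, end, step); it assumes a coordinator reachable at http://localhost:7201 and a hypothetical metric name, so adjust both for your deployment.

```python
from urllib.parse import urlencode


def query_range_url(base_url: str, promql: str, start: int, end: int, step: str) -> str:
    """Build a URL for a Prometheus-compatible query_range endpoint."""
    params = {"query": promql, "start": start, "end": end, "step": step}
    return f"{base_url}/api/v1/query_range?{urlencode(params)}"


url = query_range_url(
    "http://localhost:7201",                          # assumed coordinator address
    'sum(rate(http_requests_total{job="api"}[5m]))',  # hypothetical metric
    start=1546300800,
    end=1546304400,
    step="60s",
)
# Fetch with e.g. urllib.request.urlopen(url) against a running deployment.
```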
M3 in the time series ecosystem
We’ve seen M3, as a platform, occupy every layer of the stack from collection agent all the way to query engine, with a native TSDB for storage in the middle. It plays well with other collection agents as well as alert managers or monitoring agents.
M3 arrived in the time series ecosystem in 2018, later than the other players, but that timing let it be designed for the explosion in the volume of available metrics in recent years. Its optimization and compression work, which massively reduces storage and memory usage, could be a game changer for the ecosystem.
Though originally developed at Uber, M3 has been spun off as an open-source project that can be deployed as a cluster on Kubernetes. For those averse to DIY, there is Aiven for M3, a fully managed M3 service deployable to AWS or Google Cloud.
M3 and friends
Let’s take a look at how M3 differs from the other databases in the same space.
M3 versus InfluxDB
The most established player in the time series space is InfluxDB, a TSDB that comes with a query engine, a task engine, and visualization tools. Its task engine schedules tasks for querying, analyzing, and modifying data, and its built-in UI lets you build custom dashboards to visualize the data. For querying, InfluxDB uses its own query language, Flux.
In a typical use case, InfluxDB works alongside Telegraf, a sister product which serves as the collection agent. InfluxDB does, however, support remote reads and writes from Prometheus. This means that, similar to M3, InfluxDB can also serve as the remote data store for a Prometheus collection agent.
Both InfluxDB and Telegraf are open source and can be downloaded, deployed, and managed by organizations themselves. Note, though, that high availability and clustering are only available in InfluxData’s commercial products.
M3 versus TimescaleDB
TimescaleDB is “the only open source time-series database that supports full SQL.” While M3 is purpose-built and designed from the ground up as a TSDB, TimescaleDB is an extension built on top of PostgreSQL, a relational database. Querying is done with plain old SQL, just like one would do with PostgreSQL.
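As an illustration of the “plain old SQL” approach, the query below buckets readings per minute and averages them per host. TimescaleDB would typically express the bucketing with its time_bucket() function; to keep the sketch runnable without a database server, it swaps in Python’s built-in sqlite3 and integer division, so treat it as the shape of the query rather than TimescaleDB syntax. The table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metrics (ts INTEGER, host TEXT, cpu REAL)")  # ts in Unix seconds
conn.executemany(
    "INSERT INTO metrics VALUES (?, ?, ?)",
    [(0, "web01", 10.0), (30, "web01", 20.0), (65, "web01", 40.0), (70, "web02", 5.0)],
)

# (ts / 60) * 60 floors each timestamp to its minute, standing in for time_bucket().
rows = conn.execute(
    """
    SELECT (ts / 60) * 60 AS bucket, host, AVG(cpu) AS avg_cpu
    FROM metrics
    GROUP BY bucket, host
    ORDER BY bucket, host
    """
).fetchall()
conn.close()
```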
In terms of the time series tech stack, TimescaleDB covers the storage layer and the query engine. Storage scales the same way one would scale PostgreSQL, and TimescaleDB is optimized for fast inserts and complex time-aware queries. Like M3, TimescaleDB has no visualization engine of its own; charts and dashboards require tools like Grafana (or others that support PostgreSQL).
TimescaleDB is available as open source, but multi-node and distributed deployments appear to be limited to customers of the Timescale Cloud commercial product.
M3 versus Druid
Apache Druid is a real-time analytics database optimized for data paired with a timestamp. Druid describes itself as an analytics engine rather than a time series database. Still, because it uses columnar storage (like M3DB) and partitions its data by time, it offers fast queries whenever a time filter is involved.
Unlike M3, however, Druid does not fit as cleanly or conventionally into the time series tech stack. Raw data typically reaches Druid either streamed through Apache Kafka or batch-loaded from the Hadoop Distributed File System (HDFS). In this sense, Druid feels much more like a Big Data analytics engine than a time series database, which is just what it advertises.
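The benefit of partitioning by time can be shown with a toy model of segment pruning. The Segment class below is a hypothetical stand-in, not Druid’s API: each segment covers a half-open time interval, and a query skips any segment whose interval does not overlap the time filter, without reading its rows.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Segment:
    """Hypothetical time partition holding rows for the interval [start, end)."""
    start: int
    end: int
    rows: List[Tuple[int, float]] = field(default_factory=list)


def query(segments: List[Segment], t0: int, t1: int) -> Tuple[List[float], int]:
    """Return values with t0 <= ts < t1 and the number of segments actually scanned."""
    values: List[float] = []
    scanned = 0
    for seg in segments:
        if seg.end <= t0 or seg.start >= t1:
            continue  # no overlap with the time filter: pruned without reading rows
        scanned += 1
        values.extend(v for ts, v in seg.rows if t0 <= ts < t1)
    return values, scanned


hourly = [
    Segment(0, 3600, [(10, 1.0)]),
    Segment(3600, 7200, [(3700, 2.0), (7000, 3.0)]),
    Segment(7200, 10800, [(8000, 4.0)]),
]
values, scanned = query(hourly, 3600, 7200)  # only the middle hour is scanned
```

The narrower the time filter relative to the total retention, the larger the fraction of data that is never touched, which is why time-filtered queries are where this layout shines.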
Druid is open source and typically deployed on Linux machines in the cloud. Again, not being a traditional TSDB, Druid is not built to integrate as seamlessly with other popular tools in the time series tech stack.
M3 versus OpenTSDB
OpenTSDB consists of a Time Series Daemon (TSD), which acts as both a collection agent and a query engine, sitting on top of an Apache HBase database for storage. OpenTSDB bundles several command-line utilities for working with the TSD, and data can be queried with these tools or through an HTTP API. OpenTSDB even has its own built-in web GUI for visualizations, but its documentation plainly states: “A much nicer GUI can be found in the form of the open source Grafana.”
The latest version of OpenTSDB, 2.4, was released in December 2018. Although a 3.0 version is reportedly in the works, this open-source project does not appear to have gained much momentum in recent years.
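For the HTTP API route, OpenTSDB accepts data points as JSON posted to its /api/put endpoint. The sketch below builds, but does not send, such a request with the standard library; the host and metric name are hypothetical, and 4242 is OpenTSDB’s default port.

```python
import json
import urllib.request
from typing import Dict, List


def build_put_request(base_url: str, points: List[Dict]) -> urllib.request.Request:
    """Build (but do not send) a POST to OpenTSDB's /api/put endpoint."""
    body = json.dumps(points).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/put",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


point = {
    "metric": "sys.cpu.user",   # hypothetical metric name
    "timestamp": 1546300800,    # Unix seconds
    "value": 42.5,
    "tags": {"host": "web01"},  # OpenTSDB requires at least one tag per data point
}
req = build_put_request("http://localhost:4242", [point])
# urllib.request.urlopen(req) would send it to a running TSD.
```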
M3 versus Graphite
Graphite emphasizes its function as a monitoring agent, but it actually consists of three components: a collection agent (Carbon), a TSDB (Whisper), and a querying and visualization engine (Graphite-Web). When it comes to scalability, the limiting factor in Graphite is the Whisper database. Graphite’s own documentation states: “Whisper is somewhat inefficient in its usage of disk space because of certain design choices.” As for speed, “Whisper is fast enough for most purposes.”
On the plus side, Graphite can be deployed quickly and easily as a single Docker image.
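Part of that disk-space cost is visible in Whisper’s on-disk layout: every point stores a 4-byte timestamp next to its 8-byte value, and each archive is preallocated in full when the file is created, whether or not values are ever written. The sketch computes file size from the struct formats used by the whisper.py reference implementation; the retention schedule in the example is hypothetical.

```python
import struct
from typing import List, Tuple

# Struct layouts from the whisper.py reference implementation (network byte order).
METADATA_FORMAT = "!2LfL"    # aggregation type, max retention, xFilesFactor, archive count
ARCHIVE_INFO_FORMAT = "!3L"  # offset, seconds per point, number of points
POINT_FORMAT = "!Ld"         # 4-byte timestamp + 8-byte double value


def whisper_file_size(archives: List[Tuple[int, int]]) -> int:
    """Bytes on disk for archives given as (seconds_per_point, retention_seconds)."""
    points = sum(retention // step for step, retention in archives)
    return (struct.calcsize(METADATA_FORMAT)
            + len(archives) * struct.calcsize(ARCHIVE_INFO_FORMAT)
            + points * struct.calcsize(POINT_FORMAT))


# Hypothetical schedule: 10 s points for 1 day, then 1 min points for 30 days.
size = whisper_file_size([(10, 86_400), (60, 30 * 86_400)])
```

That schedule comes to 622,120 bytes per metric, and about a third of it is timestamps.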
If an organization’s scaling needs begin to surpass what Graphite can offer, M3 supports ingesting metrics from Graphite via the Carbon plaintext protocol. This integration makes scaling up from Graphite to M3 all the more seamless.
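Carbon’s plaintext protocol is simple enough to sketch directly: one line per data point, in the form “metric path, value, Unix timestamp” separated by spaces, sent over TCP. The helpers below are hypothetical, and the host and port of the Carbon-compatible listener depend on how your M3 deployment is configured (Graphite’s own Carbon conventionally listens on 2003).

```python
import socket
import time
from typing import Iterable, Optional


def carbon_line(path: str, value: float, timestamp: Optional[int] = None) -> str:
    """Format one data point as a Carbon plaintext line: '<path> <value> <timestamp>'."""
    if " " in path:
        raise ValueError("metric paths must not contain spaces")
    if timestamp is None:
        timestamp = int(time.time())  # default to "now" in Unix seconds
    return f"{path} {value} {timestamp}\n"


def send_lines(lines: Iterable[str], host: str, port: int) -> None:
    """Send pre-formatted lines over TCP to a Carbon-compatible listener."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall("".join(lines).encode("utf-8"))


line = carbon_line("servers.web01.cpu.user", 42.5, 1546300800)
# send_lines([line], "m3-coordinator.example", 2003)  # hypothetical host and port
```

Because existing Graphite senders already emit exactly these lines, pointing them at a new listener is often the only change needed on the client side.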
M3 versus Prometheus
M3 recognized early on that Prometheus was (and perhaps still is) the go-to open-source time series tool for organizations that aren’t yet capturing time series data at massive scale. Prometheus is a collection agent, a storage layer (in memory and on local disk), and a query engine (using PromQL), and it integrates well with monitoring and visualization tools like Grafana.
M3’s recommended first use case runs the M3 Coordinator as a Prometheus sidecar, writing the data Prometheus collects to a remote M3DB instance. For many organizations, this setup is enough to meet their scaling needs.
Time series data is not just for the financial sector anymore. As the amount of available time series data has exploded in recent years, so has the need for fast and robust tools which don’t break the bank when it comes to resource usage.
The tech stack in the time series ecosystem — collection agent and aggregator, the actual TSDB for storage, query engine, monitoring and alert agents — is neither simple nor simply covered by a single all-in-one tool.
At the center of the stack, with its focus on compression tailored to time series data, sits M3. Its M3DB and M3Query components provide massively scalable, highly efficient storage and querying. It is built to work with collection agents like Prometheus and integrates with Grafana for visualization.
At what point will the volume of second-by-second data outpace what your TSDB can store cost-effectively? When might the complexity or scope of your queries and aggregations overwhelm your servers’ memory? Is it more cost-effective to have plenty of headroom to scale quickly, or to scale only when you need to? Having looked at where M3 stands in the stack relative to other players in the space, the choice of which tool(s) to use ultimately hinges on these questions of scaling and cost.
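A back-of-the-envelope calculation is a reasonable way to start on the first of those questions. The sketch below estimates bytes written per day; every input is an assumption to replace with your own numbers. The 16-byte raw figure is simply an 8-byte timestamp plus an 8-byte float, and the 2-byte compressed figure is purely illustrative, not a measured M3DB ratio.

```python
def daily_storage_bytes(series: int, interval_s: float, bytes_per_point: float,
                        replication: int = 1) -> float:
    """Estimate bytes written per day; every argument is an input assumption."""
    points_per_day = series * (86_400 / interval_s)  # 86,400 seconds in a day
    return points_per_day * bytes_per_point * replication


# Hypothetical workload: 100,000 series, one sample per second, no replication.
raw = daily_storage_bytes(series=100_000, interval_s=1, bytes_per_point=16)
compressed = daily_storage_bytes(series=100_000, interval_s=1, bytes_per_point=2)
print(f"raw: {raw / 1e9:.1f} GB/day, compressed: {compressed / 1e9:.1f} GB/day")
```

With these assumed inputs that is about 138 GB per day raw versus about 17 GB compressed; multiplying by a retention period and a replication factor gives a first estimate of the disk you would need.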
Originally published at https://aiven.io.