The data continuum, explained

Aiven
Jul 11, 2019
Representation of the data continuum, entitled, “Where on the spectrum does your data lie?”

Imagine all of the possible formats into which you could collect data. And then, imagine what you could do with it. Think of a continuum — on one end is completely structured data, on the other, completely unstructured data.

Coupled with this comes the flexibility of searching that data, but this is where things get interesting. You’d think that the more structured your data is, the more flexible the means of searching it — but then you’d be wrong.

To understand all of this better, let’s examine the data continuum, from fully structured to unstructured. From there, we’ll look at the data stores that handle each kind of data. And we’ll consider questions to ask when choosing a data store, including cases where you may need more than one in your pipeline. But first, let’s start on the structured end of the spectrum.

Relational database management systems

On one end of the continuum, you have rigorously-structured data: think of RDBMSes like PostgreSQL or MySQL. On this end, you’ll find transactional data, where data format is held to the strictest requirements, and records cannot afford to be lost under any circumstances.

If an incoming event doesn’t strictly meet the criteria, the event will not be stored but rejected, and the database or client will throw an error. A common case is a type incompatibility, which occurs when data in a specific field doesn’t match the predefined format for that field. So, for example, if a field in a PostgreSQL schema specifies an integer and an incoming event has text in that space that cannot be read as an integer, the incoming event will be rejected. (A numeric value such as a float is a weaker example, because PostgreSQL will typically round it to fit an integer column rather than reject it.)
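The rejection behavior described above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses SQLite from Python's standard library as a stand-in for PostgreSQL, and because SQLite is loosely typed by default, a CHECK constraint emulates the strict column typing. The table `readings`, its columns, and the helper `try_insert` are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumption: SQLite stands in for PostgreSQL here. The CHECK constraint
# emulates strict typing, which SQLite does not enforce by default.
conn.execute(
    "CREATE TABLE readings ("
    " sensor TEXT NOT NULL,"
    " quantity INTEGER NOT NULL CHECK (typeof(quantity) = 'integer')"
    ")"
)


def try_insert(sensor, quantity):
    """Return True if the row was stored, False if the database rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO readings (sensor, quantity) VALUES (?, ?)",
                (sensor, quantity),
            )
        return True
    except sqlite3.IntegrityError:
        # The event did not match the schema, so nothing was stored.
        return False


accepted = try_insert("a1", 42)           # matches the schema
rejected = try_insert("a1", "forty-two")  # text in an integer field
stored_rows = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
```

Only the well-formed event ends up in the table; the malformed one surfaces as an error that the client has to handle.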

RDBMSes are built around ACID transactions, meaning that all transactions within the database are atomic, consistent, isolated, and durable. This explains the rigidity of the SQL query language: a transaction cannot break other database records or introduce inconsistencies into them, even if the database is distributed.
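As a rough sketch of the "atomic" part of ACID, the example below groups two updates into one transaction so that either both apply or neither does. It again assumes SQLite via Python's standard library rather than PostgreSQL or MySQL, and the `accounts` table and `transfer` function are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0)"
    ")"
)
conn.executemany(
    "INSERT INTO accounts (name, balance) VALUES (?, ?)",
    [("alice", 100), ("bob", 50)],
)
conn.commit()


def transfer(src, dst, amount):
    """Move amount from src to dst; return False if the transaction failed."""
    try:
        with conn:  # one transaction: commit on success, roll back on error
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # If this debit violates the CHECK constraint, the credit
            # above is rolled back as well.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


overdraft_ok = transfer("alice", "bob", 500)  # would leave alice negative
balances_after_failure = dict(conn.execute("SELECT name, balance FROM accounts"))
small_ok = transfer("alice", "bob", 30)
balances_after_success = dict(conn.execute("SELECT name, balance FROM accounts"))
```

The failed transfer leaves both balances untouched, even though the credit to `bob` had already run inside the transaction.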

Also, once the columns of a table are defined, every row added with INSERT INTO contains all of them: any column that is not given a value is automatically populated with its default, which is NULL unless the schema specifies otherwise.
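A small sketch of this fixed-column behavior, assuming SQLite from Python's standard library and a hypothetical `users` table: the insert names only one column, yet the stored row still has every column, with NULL (shown as `None` in Python) where no value was given.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")

# Only "name" is supplied; "email" is not mentioned at all.
conn.execute("INSERT INTO users (name) VALUES (?)", ("Ada",))
conn.commit()

row = conn.execute("SELECT id, name, email FROM users").fetchone()
column_names = [info[1] for info in conn.execute("PRAGMA table_info(users)")]
```

Here `row` still carries a slot for `email`, and that slot holds `None` because the insert did not provide a value.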

NOTE: These days, Postgres and MySQL support JSON quite well, making them competitive in some ways with other data stores for semi-structured data.

Wide-column stores

Moving to the right on the continuum, the rules of engagement relax slightly. A wide-column store like Apache Cassandra or ScyllaDB allows rows (as minimal units of replication, analogous to rows as records in Postgres) to store a great, but most importantly variable, number of columns…
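To make the idea of a variable number of columns per row concrete, here is a toy in-memory model in Python. It is not the Cassandra or ScyllaDB API and says nothing about how those systems store data; the `table` structure, the `put` helper, and the row keys are all hypothetical.

```python
from collections import defaultdict

# Toy model: each row key maps to its own set of columns, so two rows
# in the same table need not share the same columns.
table = defaultdict(dict)


def put(row_key, column, value):
    """Set one column on one row, creating the row if needed."""
    table[row_key][column] = value


put("user:1", "name", "Ada")
put("user:1", "email", "ada@example.com")

put("user:2", "name", "Linus")
put("user:2", "last_login", "2019-07-11")
put("user:2", "theme", "dark")

columns_per_row = {key: sorted(cols) for key, cols in table.items()}
```

In this sketch, `user:1` has two columns and `user:2` has three, and neither row carries a placeholder for the columns it lacks.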

We hope you’ve enjoyed the article so far! To read it in full, visit this link at aiven.io.
