Introduction to Apache Flink

Apache Flink is an open source framework and distributed processing engine for stateful computations over unbounded and bounded data streams. Sound like a mouthful? Read on for a comprehensive overview of this powerful software solution, and a look at how companies use Flink to expand the way they process data.

What is Apache Flink?

The name Flink derives from the German word flink, which means fast or agile (hence the logo, a red squirrel, which is a common sight in Berlin, where Apache Flink was partially created). Flink sprang from Stratosphere, a research project conducted by several European universities between 2010 and 2014.

Along with Apache Spark, Apache Storm, Apache Flume, and Apache Kafka, Flink is part of a new class of systems that enable rapid data streaming. The open source tool is helping many businesses move away from batch processing in use cases where it makes sense to do so. Flink is now widely used in many leading applications, as the rest of this post explains.

With Flink, which is written in Java and Scala, companies get event-at-a-time processing and dataflow programming built on data parallelism and pipelining.
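To make "stateful computation" and "data parallelism" a little more concrete, here is a minimal plain-Python sketch. It does not use the Flink API. The names `partition_for`, `CountingTask`, and `run_keyed_count` are hypothetical, and the CRC32 hash is just an assumption chosen for illustration. The idea it shows is that events are routed by key to one of several parallel tasks, and each task keeps running state only for the keys it owns.

```python
import zlib
from collections import defaultdict


def partition_for(key, parallelism):
    """Pick a task index from a stable hash of the key (CRC32 is an illustrative choice)."""
    return zlib.crc32(key.encode("utf-8")) % parallelism


class CountingTask:
    """One parallel task. It holds running counts only for the keys routed to it."""

    def __init__(self):
        self.state = defaultdict(int)

    def process(self, key):
        # Update local state and emit the new running count for this key.
        self.state[key] += 1
        return key, self.state[key]


def run_keyed_count(events, parallelism=2):
    """Route each event to the task that owns its key and collect the emitted results."""
    tasks = [CountingTask() for _ in range(parallelism)]
    results = []
    for key in events:
        task = tasks[partition_for(key, parallelism)]
        results.append(task.process(key))
    return results, tasks


results, tasks = run_keyed_count(["click", "view", "click", "buy", "click"])
print(results)
```

Because a given key always lands on the same task, its count stays consistent no matter how many tasks run in parallel. In a real cluster those tasks would sit on different machines, which this single-process sketch does not attempt to model.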

Up next, let’s take a deep dive and explore what you can do with this powerful open source program.

What Can Apache Flink Do?

1. Facilitate simultaneous streaming and batch processing

“As the original creators of Flink, we have always believed that it is possible to have a runtime that is state-of-the-art for stream processing and batch processing use cases simultaneously; a runtime that is streaming-first, but can exploit just the right amount of special properties of bounded streams to be as fast for batch use cases as dedicated batch processors,” Hueske and Krettek write.

This is arguably the best feature of Flink. Its network stack supports low-latency, high-throughput streaming data transfers as well as high-throughput batch shuffles, all from a single platform.

This can drastically simplify operations, helping organizations save time and money along the way.

2. Process millions of records per minute

Here’s how it works: Flink consumes an event from the source, processes it, and sends it to a sink. Then it goes on to process the next event immediately; it doesn’t wait while aggregating a batch of events.

With this functionality, Flink can process tons of events with ultra-low latency. As a result, you can increase the throughput of your applications while retaining the ability to scale your systems across multiple machines.
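The source-process-sink flow described above can be sketched in a few lines of plain Python. This is an illustration, not Flink code. The function names `traced_source`, `per_event`, and `micro_batch` are hypothetical, and the micro-batch variant is included only as a contrast. Each function records a trace of when an event is read ("in") and when its result reaches the sink ("out").

```python
def traced_source(events, trace):
    """Yield events one by one, recording the moment each one is read."""
    for event in events:
        trace.append(("in", event))
        yield event


def per_event(events, transform):
    """Event-at-a-time: each event is transformed and emitted before the next is read."""
    trace = []
    for event in traced_source(events, trace):
        trace.append(("out", transform(event)))
    return trace


def micro_batch(events, transform, batch_size):
    """Contrast case: events wait in a batch and are emitted only when it fills up."""
    trace = []
    batch = []
    for event in traced_source(events, trace):
        batch.append(event)
        if len(batch) == batch_size:
            trace.extend(("out", transform(e)) for e in batch)
            batch.clear()
    # Flush whatever is left once the input ends.
    trace.extend(("out", transform(e)) for e in batch)
    return trace


def double(x):
    return x * 2


print(per_event([1, 2, 3], double))
print(micro_batch([1, 2, 3], double, batch_size=2))
```

In the per-event trace every "in" is followed immediately by its "out". In the micro-batch trace the first event's result only appears after the second event has been read, which is the waiting that event-at-a-time processing avoids.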

3. Power applications at scale

One user, WalmartLabs Software Engineer Khartik Khare, says he has run Flink jobs handling more than 10 million records per minute (RPM) on no more than 20 cores.

Flink can also scale effectively by minimizing garbage collection and limiting data transfers across network nodes. In addition, Flink uses buffering and credit-based flow control to handle backpressure.
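As a rough illustration of credit-based flow control, here is a toy plain-Python simulation. It is a sketch of the general idea only, not Flink's network stack. The classes `Receiver` and `Sender`, the `simulate` function, and the round-based loop are all assumptions made for the example. The receiver announces how many free buffer slots it has (its credit), and the sender never transmits more records than that.

```python
from collections import deque


class Receiver:
    """Holds a bounded buffer and announces free slots as credit."""

    def __init__(self, buffer_slots):
        self.capacity = buffer_slots
        self.buffer = deque()

    def available_credit(self):
        return self.capacity - len(self.buffer)

    def accept(self, record):
        if len(self.buffer) >= self.capacity:
            raise RuntimeError("sender exceeded its credit")
        self.buffer.append(record)

    def consume(self, max_records):
        """Process up to max_records buffered records, freeing their slots."""
        done = []
        while self.buffer and len(done) < max_records:
            done.append(self.buffer.popleft())
        return done


class Sender:
    """Sends records only while it holds credit from the receiver."""

    def __init__(self, records):
        self.backlog = deque(records)
        self.credit = 0

    def send_to(self, receiver):
        sent = 0
        while self.backlog and self.credit > 0:
            receiver.accept(self.backlog.popleft())
            self.credit -= 1
            sent += 1
        return sent


def simulate(records, buffer_slots, consume_per_round):
    """Run rounds until everything is delivered. consume_per_round must be at least 1."""
    receiver = Receiver(buffer_slots)
    sender = Sender(records)
    delivered, sent_per_round = [], []
    while sender.backlog or receiver.buffer:
        sender.credit = receiver.available_credit()  # receiver announces credit
        sent_per_round.append(sender.send_to(receiver))
        delivered.extend(receiver.consume(consume_per_round))
    return delivered, sent_per_round


delivered, sent_per_round = simulate(range(6), buffer_slots=3, consume_per_round=1)
print(delivered, sent_per_round)
```

After the first round fills the buffer, the sender is throttled to the pace of the slow receiver. That is backpressure: the slowdown propagates upstream instead of overflowing the buffer or dropping records.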

Add it all up, and Flink helps ensure powerful applications deliver modern user experiences at scale.

4. Utilize in-memory performance

Wrapping up

Not using Aiven services yet? Sign up now for your free trial at https://console.aiven.io/signup!

In the meantime, make sure you follow our changelog and blog RSS feeds or our LinkedIn and Twitter accounts to stay up-to-date with product and feature-related news.

Originally published at https://aiven.io.
