Introduction to Apache Flink

Apache Flink is an open source framework and distributed processing engine for stateful computations over unbounded and bounded data streams. Sound like a mouthful? Read on for a comprehensive overview of this powerful software solution, and a look at how companies use Flink to expand the way they process data.

What is Apache Flink?

Flink is an open source framework and distributed, fault-tolerant stream processing engine developed by the Apache Flink community, a project of the Apache Software Foundation. At the time of writing, Flink was at version 1.11.0 and was run by a team of roughly 25 committers, with more than 340 contributors around the world helping to maintain it.

The name Flink derives from the German word flink, which means fast or agile. That explains the logo: a red squirrel, a common sight in Berlin, where Apache Flink was partly developed. Flink grew out of Stratosphere, a research project conducted by several European universities between 2010 and 2014.

Flink is part of a new class of systems that enable fast data streaming, along with Apache Spark, Apache Storm, Apache Flume, and Apache Kafka. The open source tool is helping many businesses move away from batch processing in the use cases where that makes sense. Flink is now widely used in production, and this post explains some of the most common ways organizations put it to work.

Flink, which is written in Java and Scala, offers event-at-a-time processing and a dataflow programming model, and it executes programs using data parallelism and pipelining.

Up next, let’s take a deep dive and explore what you can do with this powerful open source program.

What Can Apache Flink Do?

Here are some of the ways that organizations use Apache Flink today.

1. Facilitate simultaneous streaming and batch processing

As Flink creators Fabian Hueske and Aljoscha Krettek explain in a DZone post, Flink is built around the idea of “streaming first, with batch as a special case of streaming.” Because a bounded data set is simply a stream that ends, one engine can serve both kinds of workload, which reduces the complexity of data infrastructure.

“As the original creators of Flink, we have always believed that it is possible to have a runtime that is state-of-the-art for stream processing and batch processing use cases simultaneously; a runtime that is streaming-first, but can exploit just the right amount of special properties of bounded streams to be as fast for batch use cases as dedicated batch processors,” Hueske and Krettek write.

This is arguably the best feature of Flink. Its network stack can support low-latency and high-throughput streaming data transfers along with high-throughput batch shuffles — all from a single platform.

This can drastically simplify operations, helping organizations save time and money along the way.
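
To make “batch as a special case of streaming” concrete, here is a minimal plain-Python sketch. It does not use Flink's API; `running_word_count` is a hypothetical operator written for illustration. The same incremental code handles a bounded input, which simply ends, and an unbounded one, of which we can only ever observe a prefix.

```python
from itertools import cycle, islice
from typing import Dict, Iterable, Iterator, Tuple

def running_word_count(events: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Hypothetical streaming operator: emits an updated count after every event."""
    counts: Dict[str, int] = {}
    for word in events:  # one event at a time, whether the input is finite or endless
        counts[word] = counts.get(word, 0) + 1
        yield word, counts[word]

# Bounded stream ("batch"): the input ends, so the last update per key is the final answer.
batch_result = dict(running_word_count(["flink", "kafka", "flink"]))
print(batch_result)  # {'flink': 2, 'kafka': 1}

# Unbounded stream: an endless generator; we can only ever observe a prefix of the updates.
first_updates = list(islice(running_word_count(cycle(["flink", "kafka"])), 4))
print(first_updates)  # [('flink', 1), ('kafka', 1), ('flink', 2), ('kafka', 2)]
```

The quote above also describes Flink exploiting properties of bounded streams, such as knowing that they end, to speed up batch jobs; this sketch does not try to model that.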

2. Process millions of records per minute

Flink uses an event-at-a-time processing model, and a well-provisioned Flink job can process millions of events per minute, or even per second.

Here’s how it works: Flink consumes an event from the source, processes it, and sends it to a sink. Then it goes on to process the next event immediately; it doesn’t wait while aggregating a batch of events.
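
The loop just described can be modeled in a few lines of plain Python. This is a conceptual sketch, not Flink's runtime (the real engine also buffers records when sending them over the network). The functions `event_at_a_time` and `micro_batch` are hypothetical, and the micro-batch variant exists only for contrast.

```python
from typing import Callable, Iterable, List

def event_at_a_time(source: Iterable[int], process: Callable[[int], int], log: List[str]) -> None:
    """Read one event, process it, send it to the sink, then move on."""
    for event in source:
        log.append(f"read {event}")
        log.append(f"sink {process(event)}")  # emitted before the next event is read

def micro_batch(source: Iterable[int], process: Callable[[int], int], log: List[str], size: int) -> None:
    """For contrast: buffer `size` events before processing any of them."""
    buffer: List[int] = []
    for event in source:
        log.append(f"read {event}")
        buffer.append(event)
        if len(buffer) == size:
            log.extend(f"sink {process(e)}" for e in buffer)
            buffer.clear()
    log.extend(f"sink {process(e)}" for e in buffer)  # flush a final partial batch

def double(x: int) -> int:
    return x * 2

streaming_log: List[str] = []
batched_log: List[str] = []
event_at_a_time([1, 2, 3], double, streaming_log)
micro_batch([1, 2, 3], double, batched_log, size=3)
print(streaming_log)  # ['read 1', 'sink 2', 'read 2', 'sink 4', 'read 3', 'sink 6']
print(batched_log)    # ['read 1', 'read 2', 'read 3', 'sink 2', 'sink 4', 'sink 6']
```

In the streaming log the first result reaches the sink after a single read; with micro-batching it waits for the whole batch, which is where the extra latency comes from.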

This lets Flink process large volumes of events with very low latency. And because the work is split across parallel tasks, you can increase the throughput of your applications by scaling them out to multiple machines.

3. Power applications at scale

One of the top reasons developers use Flink is that it can run stateful streaming applications that support just about any workload you feed them. Applications are parallelized into possibly thousands of tasks that are distributed and executed concurrently in a cluster, which allows them to use virtually any amount of memory, CPU, disk, and network I/O.
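
In Flink, keyed streams are parallelized by partitioning events by key, so that each parallel task owns a disjoint slice of the state. The sketch below models that with a hypothetical `subtask_for` partitioner built on a stable CRC32 hash and runs the subtasks concurrently on a thread pool. Flink's actual key-to-task assignment is more involved (it routes keys through key groups), and its tasks run in separate slots and on separate machines rather than as Python threads.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Event = Tuple[str, int]

def subtask_for(key: str, parallelism: int) -> int:
    """Hypothetical partitioner: a stable hash of the key selects one subtask."""
    return zlib.crc32(key.encode("utf-8")) % parallelism

def partition(events: List[Event], parallelism: int) -> List[List[Event]]:
    parts: List[List[Event]] = [[] for _ in range(parallelism)]
    for key, value in events:
        parts[subtask_for(key, parallelism)].append((key, value))
    return parts

def run_subtask(part: List[Event]) -> Dict[str, int]:
    """Each subtask keeps its own local state and never sees another subtask's keys."""
    state: Dict[str, int] = {}
    for key, value in part:
        state[key] = state.get(key, 0) + value
    return state

events = [("user-1", 3), ("user-2", 5), ("user-1", 4), ("user-3", 1)]
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(run_subtask, partition(events, parallelism=4)))

# Keys never overlap between subtasks, so the partial results merge without conflicts.
totals = {key: count for partial in partials for key, count in partial.items()}
print(totals["user-1"])  # 7
```

Changing `parallelism` spreads the same keys over a different number of subtasks without changing any per-key result, which is the property that lets a keyed application scale out.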

One user, WalmartLabs software engineer Kartik Khare, says he has run Flink jobs handling more than 10 million records per minute (RPM) on no more than 20 cores.
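
As a rough, back-of-the-envelope reading of those figures (assuming the load is spread evenly across the cores), they work out to at least the following per-core rate:

```latex
\frac{10{,}000{,}000\ \text{records/min}}{20\ \text{cores}}
  = 500{,}000\ \tfrac{\text{records}}{\text{min}\cdot\text{core}}
  \approx 8{,}333\ \tfrac{\text{records}}{\text{s}\cdot\text{core}}
```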

Flink can also scale effectively by minimizing garbage collection and limiting data transfers across network nodes. In addition, Flink uses buffering and credit-based flow control to handle backpressure.
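
In credit-based flow control, a receiving task tells its sender how many free input buffers it has (its credit), and the sender transmits no more than that. The plain-Python model below is a simplified, hypothetical sketch of that idea (one sender, one receiver, one record per buffer, discrete time steps), not Flink's network stack; `simulate_credit_flow` is a name invented for illustration.

```python
from collections import deque
from typing import Deque, List

def simulate_credit_flow(records: int, buffers: int, drain_per_step: int, steps: int) -> List[int]:
    """Hypothetical model of credit-based flow control between two tasks.

    Returns how many records the sender transmitted in each step.
    """
    pending: Deque[int] = deque(range(records))  # records waiting at the sender
    inbox: Deque[int] = deque()                  # receiver's bounded input buffers
    sent_per_step: List[int] = []
    for _ in range(steps):
        credit = buffers - len(inbox)            # receiver announces its free buffers as credit
        sent = 0
        while pending and credit > 0:            # sender never exceeds the credit it was granted
            inbox.append(pending.popleft())
            credit -= 1
            sent += 1
        assert len(inbox) <= buffers             # so the receiver's buffers can never overflow
        sent_per_step.append(sent)
        for _ in range(min(drain_per_step, len(inbox))):
            inbox.popleft()                      # a slow receiver drains only a few records
    return sent_per_step

print(simulate_credit_flow(records=10, buffers=4, drain_per_step=1, steps=5))  # [4, 1, 1, 1, 1]
```

After the first burst fills the receiver's four buffers, the sender is held to the receiver's pace of one record per step. That slowdown is backpressure: it propagates upstream instead of overflowing buffers or dropping data.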

Add it all up, and Flink helps ensure powerful applications deliver modern user experiences at scale.

4. Utilize in-memory performance

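Flink achieves very low processing latencies because its tasks perform computations against local, often in-memory, state (state that outgrows the available memory is kept in efficient on-disk data structures). This way it can process events as they arrive instead of aggregating them into batches. Flink also guarantees exactly-once state consistency in case of failures by periodically and asynchronously checkpointing local state to durable storage.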
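
The recovery idea behind exactly-once state consistency can be sketched for a single task in plain Python. Everything here is hypothetical and simplified: the `CountingTask` class, the JSON file standing in for durable storage, and the list standing in for a replayable source (for example, a log that can be re-read from an offset). Flink coordinates these snapshots across all parallel tasks and takes them asynchronously, which this model does not attempt.

```python
import json
import os
import tempfile
from typing import Dict, List

class CountingTask:
    """Hypothetical stateful task that keeps per-key counts in local memory."""

    def __init__(self) -> None:
        self.state: Dict[str, int] = {}
        self.offset = 0  # index of the next event to read from the replayable source

    def process(self, events: List[str], until: int) -> None:
        while self.offset < until:
            key = events[self.offset]
            self.state[key] = self.state.get(key, 0) + 1  # local, in-memory state update
            self.offset += 1

    def checkpoint(self, path: str) -> None:
        # Persist the state together with the source offset it corresponds to.
        with open(path, "w") as f:
            json.dump({"offset": self.offset, "state": self.state}, f)

    @classmethod
    def restore(cls, path: str) -> "CountingTask":
        with open(path) as f:
            snapshot = json.load(f)
        task = cls()
        task.offset, task.state = snapshot["offset"], snapshot["state"]
        return task

events = ["a", "b", "a", "c", "a", "b"]
with tempfile.TemporaryDirectory() as tmp:
    ckpt = os.path.join(tmp, "checkpoint.json")
    task = CountingTask()
    task.process(events, until=3)  # counts a, b, a
    task.checkpoint(ckpt)          # durable snapshot: offset 3, {"a": 2, "b": 1}
    task.process(events, until=5)  # counts c, a ... then the task crashes
    del task                       # its in-memory state is gone

    recovered = CountingTask.restore(ckpt)        # roll back to the last checkpoint
    recovered.process(events, until=len(events))  # replay everything after offset 3

print(recovered.state)  # {'a': 3, 'b': 2, 'c': 1}
```

The events processed after the checkpoint (`c` and `a`) are lost along with the crashed task's memory and then replayed from offset 3, so each event is reflected exactly once in the recovered state.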

Wrapping up

Now that you have the initial lowdown on Flink, stay tuned for more content and news coming up on this topic!

Not using Aiven services yet? Sign up now for your free trial at https://console.aiven.io/signup!

In the meantime, make sure you follow our changelog and blog RSS feeds or our LinkedIn and Twitter accounts to stay up-to-date with product and feature-related news.

Originally published at https://aiven.io.
