---
title: Flink
---

## Flink

Apache Flink is an open-source stream-processing framework with powerful stream- and batch-processing capabilities.

The core of Apache Flink is a distributed streaming dataflow engine written in Java and Scala. Flink executes arbitrary dataflow programs in a data-parallel and pipelined manner. Its pipelined runtime can execute both bulk/batch and stream-processing programs, and it natively supports the execution of iterative algorithms.

Flink provides a high-throughput, low-latency streaming engine as well as support for event-time processing and state management. Flink applications are fault-tolerant in the event of machine failure and support exactly-once semantics. Programs can be written in Java, Scala, Python, and SQL, and are automatically compiled and optimized into dataflow programs that are executed in a cluster or cloud environment.

Flink does not provide its own data storage system; instead, it offers source and sink connectors to systems such as Amazon Kinesis, Apache Kafka, HDFS, Apache Cassandra, and Elasticsearch.

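The pipelined, record-at-a-time execution model described above can be sketched in plain Python. This is a conceptual illustration only, not Flink's API: each operator consumes records from the previous one as soon as they are produced, instead of waiting for a complete batch.

```python
# Conceptual sketch of a pipelined dataflow: source -> map -> filter -> sink.
# Generators hand records downstream one at a time, so later operators start
# working before the source is exhausted. (Illustration only, not Flink code.)

def source(records):
    for r in records:
        yield r

def map_op(stream, fn):
    for r in stream:
        yield fn(r)

def filter_op(stream, pred):
    for r in stream:
        if pred(r):
            yield r

def run_pipeline(records):
    stream = source(records)
    stream = map_op(stream, lambda x: x * 2)
    stream = filter_op(stream, lambda x: x > 4)
    return list(stream)

print(run_pipeline([1, 2, 3, 4]))  # -> [6, 8]
```

A real Flink program builds the same kind of operator chain through the DataStream API, and the runtime executes it in parallel across a cluster.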
**What Is New in Apache Flink?**

* Flink implements true stream processing rather than imitating it with micro-batch processing. In Spark, streaming is a special case of batching, while in Flink batching is a special case of streaming (a stream of finite size).
* Flink has better support for cyclic and iterative processing.
* Flink has lower latency and higher throughput.
* Flink has more powerful windowing operators.
* Flink implements lightweight distributed snapshots that have low overhead and provide exactly-once processing guarantees for stream processing, without resorting to micro-batching as Spark does.
* Flink supports mutable state in stream processing.

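The first point above, that batching is a special case of streaming, can be illustrated with a toy Python sketch (again purely conceptual, not Flink code): the same record-at-a-time operator works unchanged on a finite list and on an unbounded generator, because it never needs to know whether the stream ends.

```python
import itertools

def running_sum(stream):
    """Process records one at a time, emitting a running total.
    The operator never needs to know whether the stream is finite."""
    total = 0
    for r in stream:
        total += r
        yield total

# Batch input is just a finite stream.
print(list(running_sum([1, 2, 3])))  # -> [1, 3, 6]

# An unbounded input works with the same operator; we only bound it at the sink.
def counter():
    n = 1
    while True:
        yield n
        n += 1

print(list(itertools.islice(running_sum(counter()), 3)))  # -> [1, 3, 6]
```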
### Features

* A streaming-first runtime that supports both batch processing and data streaming programs
* Elegant and fluent APIs in Java and Scala
* A runtime that supports very high throughput and low event latency at the same time
* Support for *event time* and *out-of-order* processing in the DataStream API, based on the *Dataflow Model*
* Flexible windowing (time, count, sessions, custom triggers) across different time semantics (event time, processing time)
* Fault tolerance with *exactly-once* processing guarantees
* Natural back-pressure in streaming programs
* Libraries for graph processing (batch), machine learning (batch), and complex event processing (streaming)
* Built-in support for iterative programs (BSP) in the DataSet (batch) API
* Custom memory management for efficient and robust switching between in-memory and out-of-core data processing algorithms
* Compatibility layers for Apache Hadoop MapReduce and Apache Storm
* Integration with YARN, HDFS, HBase, and other components of the Apache Hadoop ecosystem

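The event-time windowing idea from the list above can be sketched in a few lines of Python (a rough conceptual model, not Flink's window API): events are assigned to tumbling windows by their own timestamps, so out-of-order arrival does not change the result.

```python
from collections import defaultdict

def tumbling_window_counts(events, size):
    """Assign (timestamp, value) events to tumbling windows of `size`
    time units by event time and count events per window. Arrival order
    is irrelevant, since assignment uses each event's own timestamp."""
    counts = defaultdict(int)
    for ts, _value in events:
        window_start = (ts // size) * size  # window the event belongs to
        counts[window_start] += 1
    return dict(counts)

# Events arrive out of order: timestamps 12, 3, 7, 11; window size 10.
events = [(12, "a"), (3, "b"), (7, "c"), (11, "d")]
print(tumbling_window_counts(events, 10))  # -> {10: 2, 0: 2}
```

Real Flink additionally uses watermarks to decide when an event-time window is complete and can be emitted; this sketch omits that.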
### Flink Usage

Prerequisites for building Flink:

* A Unix-like environment (Linux, Mac OS X, or Cygwin)
* Git
* Maven (version 3.0.4 is recommended)
* Java 7 or 8

```
git clone https://github.com/apache/flink.git
cd flink
mvn clean package -DskipTests # this will take up to 10 minutes
```

## Developing Flink

The Flink committers use IntelliJ IDEA to develop the Flink codebase. We recommend IntelliJ IDEA for developing projects that involve Scala code.

Minimal requirements for an IDE are:

* Support for Java and Scala (including mixed projects)
* Support for Maven with Java and Scala

#### More Information:

* Flink website: <a href='https://flink.apache.org/' target='_blank' rel='nofollow'>Apache Flink</a>
* Flink documentation: <a href='https://ci.apache.org/projects/flink/flink-docs-release-1.3/' target='_blank' rel='nofollow'>Flink docs</a>
* Quick Flink tutorial: <a href='https://www.linkedin.com/pulse/introduction-apache-flink-quickstart-tutorial-malini-shukla/' target='_blank' rel='nofollow'>quick start</a>
* How-to guide: <a href='https://data-artisans.com/blog/kafka-flink-a-practical-how-to' target='_blank' rel='nofollow'>Kafka + Flink how-to</a>
* Flink vs. Spark: <a href='http://www.developintelligence.com/blog/2017/02/comparing-contrasting-apache-flink-vs-spark/' target='_blank' rel='nofollow'>comparison</a>