---
title: Flink
---
## Flink

Apache Flink is an open-source stream processing framework with powerful stream- and batch-processing capabilities.

The core of Apache Flink is a distributed streaming dataflow engine written in Java and Scala. Flink executes arbitrary dataflow programs in a data-parallel and pipelined manner.
Because the runtime is pipelined, the same engine can execute both bulk/batch and stream processing programs. The runtime also supports iterative algorithms natively.
Flink provides a high-throughput, low-latency streaming engine with support for event-time processing and state management. Flink applications tolerate machine failures and support exactly-once semantics.
Programs can be written in Java, Scala, Python, and SQL. They are automatically compiled and optimized into dataflow programs that run in a cluster or cloud environment.

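
The pipelined idea described above can be sketched in plain Python. This is only an illustration and does not use the Flink API: the `parse` and `running_sum` functions and the sample records are hypothetical names invented for this example. Each step handles one record at a time and passes it on immediately, so the same pipeline works on a finite list (a batch) and on an endless generator (a stream).

```python
from itertools import count, islice
from typing import Dict, Iterable, Iterator, Tuple


def parse(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map step: turn "key,value" text into (key, value), one record at a time."""
    for line in lines:
        key, value = line.split(",")
        yield key, int(value)


def running_sum(pairs: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, int]]:
    """Stateful step: emit the updated per-key total after every record."""
    totals: Dict[str, int] = {}
    for key, value in pairs:
        totals[key] = totals.get(key, 0) + value
        yield key, totals[key]


# Bounded input (a "batch"): the pipeline simply runs until the data ends.
bounded = ["a,1", "b,2", "a,3"]
batch_result = list(running_sum(parse(bounded)))

# Unbounded input (a "stream"): the same pipeline, but the source never ends,
# so we only look at the first three results.
unbounded = (f"a,{n}" for n in count(1))
stream_prefix = list(islice(running_sum(parse(unbounded)), 3))

print(batch_result)
print(stream_prefix)
```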
Flink does not provide its own data storage system. Instead, it provides source and sink connectors to systems such as Amazon Kinesis, Apache Kafka, HDFS, Apache Cassandra, and Elasticsearch.

**What Is New in Apache Flink?**
* Flink performs true record-at-a-time stream processing rather than imitating it with micro-batches. In Spark, streaming is a special case of batching; in Flink, batching is a special case of streaming (a stream of finite size)
* Flink has better support for cyclic and iterative processing
* Flink has lower latency and higher throughput
* Flink has more powerful window operators
* Flink implements lightweight distributed snapshots that have low overhead and give exactly-once processing guarantees in stream processing, without using micro-batching as Spark does
* Flink supports mutable state in stream processing

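
To make the snapshot idea above more concrete, here is a deliberately simplified, single-operator sketch in plain Python. It is not Flink's distributed snapshot algorithm and does not use the Flink API: the `run` function and its parameters are hypothetical. It only shows the core principle that the operator state and the input position are saved together, so that after a crash the job restores both and replays the input from the saved position, and every event affects the state exactly once.

```python
import copy
from typing import Dict, List, Optional


def run(events: List[str], checkpoint_every: int,
        fail_at: Optional[int] = None) -> Dict[str, int]:
    """Count events per key, snapshotting (offset, state) every few events.

    If fail_at is given, a crash is simulated once just before the event at
    that offset is processed, and the job recovers from the last snapshot.
    """
    state: Dict[str, int] = {}
    snapshot = (0, {})  # (next offset to read, copy of the state at that point)
    offset = 0
    failed = False
    while offset < len(events):
        if fail_at is not None and offset == fail_at and not failed:
            failed = True
            # Simulated crash: throw away in-flight state, restore the snapshot,
            # and rewind the source to the offset stored with it.
            offset, state = snapshot[0], copy.deepcopy(snapshot[1])
            continue
        key = events[offset]
        state[key] = state.get(key, 0) + 1
        offset += 1
        if offset % checkpoint_every == 0:
            # Offset and state are saved together, so they are always consistent.
            snapshot = (offset, copy.deepcopy(state))
    return state


events = ["a", "b", "a", "c", "a", "b", "c"]
clean = run(events, checkpoint_every=3)
recovered = run(events, checkpoint_every=3, fail_at=5)
print(clean, recovered)
```

In this sketch the run that crashes at offset 5 rolls back to the snapshot taken at offset 3, replays two events, and ends with the same counts as the failure-free run.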
### Features

* A streaming-first runtime that supports both batch processing and data streaming programs
* Elegant and fluent APIs in Java and Scala
* A runtime that supports very high throughput and low event latency at the same time
* Support for *event time* and *out-of-order* processing in the DataStream API, based on the *Dataflow Model*
* Flexible windowing (time, count, sessions, custom triggers) across different time semantics (event time, processing time)
* Fault tolerance with *exactly-once* processing guarantees
* Natural back-pressure in streaming programs
* Libraries for graph processing (batch), machine learning (batch), and complex event processing (streaming)
* Built-in support for iterative programs (BSP) in the DataSet (batch) API
* Custom memory management for efficient and robust switching between in-memory and out-of-core data processing algorithms
* Compatibility layers for Apache Hadoop MapReduce and Apache Storm
* Integration with YARN, HDFS, HBase, and other components of the Apache Hadoop ecosystem

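
The event-time and windowing features listed above can be illustrated with a small plain-Python sketch. It does not use the DataStream API, and every name in it (`tumbling_window_counts`, `window_size`, `max_out_of_orderness`) is an assumption made for this example. Events carry their own timestamps and may arrive out of order. A watermark trails the largest timestamp seen so far, and a tumbling window fires once the watermark passes its end. Events that arrive after their window has fired are simply dropped here, which is one possible policy among several.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def tumbling_window_counts(events: Iterable[Tuple[int, str]], window_size: int,
                           max_out_of_orderness: int) -> List[Tuple[int, str, int]]:
    """Count events per key in event-time tumbling windows.

    events: (event_time, key) pairs in arrival order, possibly out of order.
    Returns (window_start, key, count) tuples in the order windows fire.
    """
    open_windows: Dict[int, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    fired: List[Tuple[int, str, int]] = []
    watermark = float("-inf")
    for event_time, key in events:
        window_start = event_time - event_time % window_size
        if window_start + window_size <= watermark:
            continue  # too late: this window already fired, so drop the event
        open_windows[window_start][key] += 1
        # The watermark lags the newest timestamp to tolerate some disorder.
        watermark = max(watermark, event_time - max_out_of_orderness)
        ready = sorted(w for w in open_windows if w + window_size <= watermark)
        for start in ready:
            for k, c in sorted(open_windows.pop(start).items()):
                fired.append((start, k, c))
    # End of bounded input: flush whatever windows are still open.
    for start in sorted(open_windows):
        for k, c in sorted(open_windows[start].items()):
            fired.append((start, k, c))
    return fired


arrivals = [(1, "a"), (4, "a"), (3, "b"), (11, "a"), (7, "b"), (16, "a"), (2, "a")]
result = tumbling_window_counts(arrivals, window_size=5, max_out_of_orderness=3)
print(result)
```

With these sample arrivals, the event at time 3 arrives after the event at time 4 but is still counted in window 0, while the event at time 2 arrives after window 0 has fired and is dropped.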
### Flink Usage

Prerequisites for building Flink:

* Unix-like environment (we use Linux, Mac OS X, Cygwin)
* git
* Maven (we recommend version 3.0.4)
* Java 7 or 8

```
# Fetch the sources and build Flink without running the test suite
git clone https://github.com/apache/flink.git
cd flink
mvn clean package -DskipTests # this will take up to 10 minutes
```

### Developing Flink

The Flink committers use IntelliJ IDEA to develop the Flink codebase.
We recommend IntelliJ IDEA for developing projects that involve Scala code.

Minimum requirements for an IDE are:
* Support for Java and Scala (also mixed projects)
* Support for Maven with Java and Scala

#### More Information:
* Flink website: <a href='https://flink.apache.org/' target='_blank' rel='nofollow'>Apache Flink</a>
* Flink documentation: <a href='https://ci.apache.org/projects/flink/flink-docs-release-1.3/' target='_blank' rel='nofollow'>Flink docs</a>
* Quick Flink tutorial: <a href='https://www.linkedin.com/pulse/introduction-apache-flink-quickstart-tutorial-malini-shukla/' target='_blank' rel='nofollow'>quick start</a>
* How-to guide: <a href='https://data-artisans.com/blog/kafka-flink-a-practical-how-to' target='_blank' rel='nofollow'>how-to</a>
* Flink vs Spark: <a href='http://www.developintelligence.com/blog/2017/02/comparing-contrasting-apache-flink-vs-spark/' target='_blank' rel='nofollow'>comparison</a>