---
title: Flink
---
## Flink

Apache Flink is an open source stream processing framework with powerful stream- and batch-processing capabilities.

The core of Apache Flink is a distributed streaming dataflow engine written in Java and Scala. Flink executes arbitrary dataflow programs in a data-parallel and pipelined manner.
Flink's pipelined runtime system enables the execution of bulk/batch and stream processing programs. Furthermore, Flink's runtime supports the execution of iterative algorithms natively.
Flink provides a high-throughput, low-latency streaming engine as well as support for event-time processing and state management. Flink applications are fault-tolerant in the event of machine failure and support exactly-once semantics.
Programs can be written in Java, Scala, Python, and SQL and are automatically compiled and optimized into dataflow programs that are executed in a cluster or cloud environment.
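
To make the fault-tolerance and exactly-once claims more concrete, here is a minimal plain-Python sketch of the checkpoint-and-replay idea. It is not Flink's API: the class `CountingJob` and the function `run` are hypothetical names invented for illustration, and the job has a single operator, whereas Flink coordinates snapshots across many parallel operators. The sketch assumes a replayable source addressed by offset. State and offset are snapshotted together, so after a simulated crash the job rolls back both and replays, and every record affects the final state exactly once.

```python
import copy


class CountingJob:
    """Toy single-operator job: counts words from a replayable source."""

    def __init__(self, source):
        self.source = source      # replayable input, addressed by offset
        self.offset = 0           # position of the next record to read
        self.counts = {}          # operator state
        self._snapshot = (0, {})  # last completed checkpoint

    def checkpoint(self):
        # Snapshot the source offset and the operator state together.
        self._snapshot = (self.offset, copy.deepcopy(self.counts))

    def restore(self):
        # Roll back to the last checkpoint; later records will be replayed.
        self.offset, counts = self._snapshot
        self.counts = copy.deepcopy(counts)

    def process_next(self):
        word = self.source[self.offset]
        self.counts[word] = self.counts.get(word, 0) + 1
        self.offset += 1

    def done(self):
        return self.offset >= len(self.source)


def run(source, checkpoint_every=2, fail_at=None):
    """Run the job to completion, optionally crashing once at offset `fail_at`."""
    job = CountingJob(source)
    failed = False
    while not job.done():
        if fail_at is not None and not failed and job.offset == fail_at:
            failed = True
            job.restore()         # simulated crash: un-checkpointed work is lost
            continue
        job.process_next()
        if job.offset % checkpoint_every == 0:
            job.checkpoint()
    return job.counts


words = ["a", "b", "a", "c", "a"]
# The result is the same with and without a failure in the middle of the run.
print(run(words) == run(words, fail_at=3))
```
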

Flink does not provide its own data storage system; instead, it provides data source and sink connectors to systems such as Amazon Kinesis, Apache Kafka, HDFS, Apache Cassandra, and Elasticsearch.

**What Is New in Apache Flink?**
* Flink implements true stream processing rather than imitating it with micro-batch processing. In Spark, streaming is a special case of batching, while in Flink, batching is a special case of streaming (a stream of finite size).
* Flink has better support for cyclical and iterative processing.
* Flink has lower latency and higher throughput.
* Flink has more powerful window operators.
* Flink implements lightweight distributed snapshots that have low overhead and provide exactly-once processing guarantees in stream processing, without using micro-batching as Spark does.
* Flink supports mutable state in stream processing.
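
The first point above, that a batch is just a stream of finite size, can be illustrated with a small plain-Python sketch. This is not Flink code: `running_average` and `sensor_readings` are hypothetical names chosen for the example. The same streaming operator is applied once to an unbounded source and once to a finite input, and only the way the results are consumed differs.

```python
import itertools


def running_average(stream):
    """Streaming operator: emit the running average after every element."""
    total = 0.0
    for count, value in enumerate(stream, start=1):
        total += value
        yield total / count


def sensor_readings():
    """Unbounded source: an endless, repeating sequence of readings."""
    yield from itertools.cycle([10, 20, 30])


# Streaming use: the input never ends, so results are consumed incrementally.
first_four = list(itertools.islice(running_average(sensor_readings()), 4))
print(first_four)

# Batch use: the same operator over a finite stream; the last value is the batch result.
batch_result = list(running_average([10, 20, 30]))[-1]
print(batch_result)
```

Because the operator never assumes that the input ends, the batch case needs no separate code path.
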

### Features

* A streaming-first runtime that supports both batch processing and data streaming programs
* Elegant and fluent APIs in Java and Scala
* A runtime that supports very high throughput and low event latency at the same time
* Support for *event time* and *out-of-order* processing in the DataStream API, based on the *Dataflow Model*
* Flexible windowing (time, count, sessions, custom triggers) across different time semantics (event time, processing time)
* Fault tolerance with *exactly-once* processing guarantees
* Natural back-pressure in streaming programs
* Libraries for graph processing (batch), machine learning (batch), and complex event processing (streaming)
* Built-in support for iterative programs (BSP) in the DataSet (batch) API
* Custom memory management for efficient and robust switching between in-memory and out-of-core data processing algorithms
* Compatibility layers for Apache Hadoop MapReduce and Apache Storm
* Integration with YARN, HDFS, HBase, and other components of the Apache Hadoop ecosystem
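
The event-time and windowing features listed above can be sketched without any Flink dependency. The plain-Python example below is an illustration under stated assumptions, not the DataStream API: the function `tumbling_window_counts` and its parameters `window_size` and `max_out_of_orderness` are hypothetical names. It assigns each event to a tumbling window by its event time rather than its arrival time, and it emits a window once a simple watermark (the highest event time seen minus the allowed out-of-orderness) has passed the end of that window.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_size, max_out_of_orderness):
    """Count events per event-time tumbling window.

    `events` is an iterable of (event_time, payload) pairs in arrival order.
    A window [start, start + window_size) is emitted once the watermark
    (highest event time seen minus `max_out_of_orderness`) reaches its end.
    Events that arrive for an already emitted window are dropped as late.
    """
    open_windows = defaultdict(int)   # window start -> count so far
    emitted = []                      # (window_start, count) in emission order
    closed = set()                    # starts of windows already emitted
    max_event_time = float("-inf")

    for event_time, _payload in events:
        start = (event_time // window_size) * window_size
        if start in closed:
            continue                  # too late: the window was already emitted
        open_windows[start] += 1

        max_event_time = max(max_event_time, event_time)
        watermark = max_event_time - max_out_of_orderness
        for s in sorted(open_windows):
            if s + window_size <= watermark:
                emitted.append((s, open_windows.pop(s)))
                closed.add(s)

    # End of input: flush the windows that are still open.
    for s in sorted(open_windows):
        emitted.append((s, open_windows[s]))
    return emitted


# Arrival order differs from event-time order: time 3 arrives after time 11.
arrivals = [(1, "a"), (4, "a"), (11, "a"), (3, "a"), (14, "a"), (25, "a"), (2, "a")]
print(tumbling_window_counts(arrivals, window_size=10, max_out_of_orderness=5))
```

In this run the event with time 3 is still counted in the first window even though it arrives after time 11, while the event with time 2 arrives after that window has been emitted and is dropped.
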

### Flink Usage

Prerequisites for building Flink:

* Unix-like environment (we use Linux, Mac OS X, and Cygwin)
* Git
* Maven (we recommend version 3.0.4)
* Java 7 or 8

```
git clone https://github.com/apache/flink.git
cd flink
mvn clean package -DskipTests # this will take up to 10 minutes
```

## Developing Flink

The Flink committers use IntelliJ IDEA to develop the Flink codebase.
We recommend IntelliJ IDEA for developing projects that involve Scala code.

Minimal requirements for an IDE are:
* Support for Java and Scala (also mixed projects)
* Support for Maven with Java and Scala

#### More Information:

* Flink website: <a href='https://flink.apache.org/' target='_blank' rel='nofollow'>Apache Flink</a>
* Flink documentation: <a href='https://ci.apache.org/projects/flink/flink-docs-release-1.3/' target='_blank' rel='nofollow'>flinkdocs</a>
* Quick Flink tutorial: <a href='https://www.linkedin.com/pulse/introduction-apache-flink-quickstart-tutorial-malini-shukla/' target='_blank' rel='nofollow'>quick start</a>
* How-to guide: <a href='https://data-artisans.com/blog/kafka-flink-a-practical-how-to' target='_blank' rel='nofollow'>howto</a>
* Flink vs Spark: <a href='http://www.developintelligence.com/blog/2017/02/comparing-contrasting-apache-flink-vs-spark/' target='_blank' rel='nofollow'>comparison</a>