Important Flink basics
2022-04-23 04:41:00 【Z-hhhhh】
One、Flink basics
1.1、Introduction to Flink
Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams.
Flink official website: https://flink.apache.org/
Flink official website in Chinese: https://flink.apache.org/zh/
1.2、Bounded and unbounded data
1.2.1、Bounded data sets
A bounded data set has a defined start and end: the data to be processed falls within a fixed time range, which might be one day or one minute. A data set with a beginning and an end like this is called a bounded data set, and processing bounded data sets is called batch processing.
1.2.2、Unbounded data sets
An unbounded data set has a beginning but no end: from the moment the stream starts, new data keeps arriving continuously. Processing unbounded data sets is called stream processing.
Bounded and unbounded are relative concepts, distinguished mainly by time range.
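To make the distinction concrete, here is a minimal plain-Python sketch. It does not use the Flink API, and the function names are hypothetical. Batch processing reads a bounded list to the end and then returns one result. Stream processing consumes an iterator that may never end and emits an updated result for every record, so it never has to wait for the input to finish.

```python
import itertools
from typing import Iterable, Iterator

def batch_count(records: list[str]) -> dict[str, int]:
    """Batch: the whole bounded data set is available, so count it once and return."""
    counts: dict[str, int] = {}
    for word in records:
        counts[word] = counts.get(word, 0) + 1
    return counts

def stream_count(records: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Stream: the input may never end, so emit an updated count for every record."""
    counts: dict[str, int] = {}
    for word in records:
        counts[word] = counts.get(word, 0) + 1
        yield word, counts[word]

print(batch_count(["a", "b", "a"]))  # {'a': 2, 'b': 1}

# itertools.cycle simulates an endless source; islice only stops *reading* after 5 results.
unbounded = itertools.cycle(["a", "b"])
print(list(itertools.islice(stream_count(unbounded), 5)))
# [('a', 1), ('b', 1), ('a', 2), ('b', 2), ('a', 3)]
```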
Among today's open-source big data processing frameworks, the two main ones that support both batch and stream computation are Spark and Flink.

1.2.3、Stream processing and batch processing
Stream processing is like the escalator in a shopping mall: nobody has to wait, and people can step on straight away.
Batch processing is like the elevator in a residential building: you have to wait for it for a while, but it carries several people at once.
1.3、Flink compared with Spark
- The nature of data processing differs:
Both Spark and Flink aim to unify stream and batch processing with a single technology, but their underlying approaches differ. Spark is built on batch processing and achieves streaming by cutting the data into small batches. Flink is a true streaming engine: each record is processed as soon as it arrives.
- The data models differ:
Spark uses RDDs; a DStream in Spark Streaming is a sequence of small-batch RDDs.
Flink's basic data model is the data stream, made up of events (Event).
- The runtime architectures differ:
Spark computes in batches: it divides the DAG into several stages and starts the next stage only after the previous one has finished.
Flink uses a streaming execution model: once one node has processed an event, the event is sent directly to the next node for processing.
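The latency consequence of these two models can be illustrated with a toy simulation. It is plain Python, not the Spark or Flink API. The arrival times, the 1-second batch interval and the 0.01-second per-record cost are made-up assumptions. In the micro-batch model a record waits until the batch containing it closes. In the per-event model it is handled on arrival.

```python
import math

def micro_batch_latencies(arrivals: list[float], interval: float) -> list[float]:
    """Each record waits until its batch window closes (batch processing time is ignored)."""
    latencies = []
    for t in arrivals:
        batch_end = (math.floor(t / interval) + 1) * interval
        latencies.append(round(batch_end - t, 6))
    return latencies

def per_event_latencies(arrivals: list[float], cost: float) -> list[float]:
    """Each record is processed on arrival, so only its own processing cost counts."""
    return [cost for _ in arrivals]

arrivals = [0.1, 0.4, 0.9, 1.2]
print(micro_batch_latencies(arrivals, interval=1.0))  # [0.9, 0.6, 0.1, 0.8]
print(per_event_latencies(arrivals, cost=0.01))       # [0.01, 0.01, 0.01, 0.01]
```

Shrinking the batch interval lowers the micro-batch waiting time, but in this model the wait can never drop below zero. The wait only disappears when each record is processed individually, which is the per-event model.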
1.4、Advantages of Flink
1.4.1、Advantages
- High throughput, low latency and high performance at the same time
Among these three frameworks, only Flink combines high throughput, low latency and high performance in one distributed data processing framework. Spark cannot guarantee low latency because its streaming is pseudo-streaming built on top of batch processing. Storm guarantees low latency and high performance but cannot achieve high throughput.
- Event time support
Most data processing frameworks use system (processing) time rather than the time at which the event was actually generated. Flink supports window computations based on event time. This avoids wrong results when records arrive out of order after network transmission.
- Stateful computation
State means the intermediate results of a computation, kept in memory or in a file system. Flink also provides exactly-once state consistency guarantees.
- Highly flexible window computation
Flink supports three kinds of windows: time windows, count windows and session windows.
- Lightweight distributed snapshots
Flink uses distributed snapshots called checkpoints to persist state during execution. When a node goes down or a network problem occurs, the job can be recovered automatically from its checkpoints.
- Independent memory management on the JVM
Flink implements its own memory management mechanism to minimize the impact of the JVM and garbage collection (GC) on the system.
- Savepoints
A streaming job usually runs continuously, and stopping it for a while can cause data loss or incorrect results. This happens, for example, during a cluster version upgrade or a maintenance shutdown. A savepoint stores a snapshot of the job's execution state on storage media, so that when the job restarts it can resume from the previously computed state.
- Continuous streaming model with backpressure
When a downstream operator cannot keep up, a signal is passed to the upstream operator, which in turn passes it on until it reaches the source. The pressure thus propagates backward from the sink to the source and throttles the rate at which the source consumes data, which keeps the system running stably.
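The event-time and window items above can be illustrated with a minimal plain-Python sketch of a tumbling event-time window. It is not Flink's API. The 10-second window size, the 2-second out-of-orderness bound and the event data are assumptions, and only time windows are shown. Each event is assigned to a window by its own timestamp. A window is emitted only once the watermark (the largest timestamp seen so far minus the allowed out-of-orderness) has passed the window's end. As a result, an event delayed by the network still lands in the correct window. Events that arrive after their window has been emitted are dropped here, which is also Flink's default behaviour when no allowed lateness is configured.

```python
from collections import defaultdict

WINDOW_SIZE = 10      # assumed tumbling window size, in seconds of event time
MAX_OUT_OF_ORDER = 2  # assumed bound on how far out of order an event may arrive

def tumbling_event_time_sums(events: list[tuple[int, int]]) -> list[tuple[int, int, int]]:
    """events: (event_timestamp, value) pairs in *arrival* order, possibly out of order.
    Returns (window_start, window_end, sum) tuples in the order the windows are emitted."""
    open_windows: dict[int, int] = defaultdict(int)
    emitted: list[tuple[int, int, int]] = []
    max_ts = float("-inf")
    watermark = float("-inf")
    for ts, value in events:
        start = ts - ts % WINDOW_SIZE  # window chosen by event time, not by arrival time
        if start + WINDOW_SIZE <= watermark:
            continue  # too late: its window was already emitted, so the event is dropped
        open_windows[start] += value
        max_ts = max(max_ts, ts)
        watermark = max_ts - MAX_OUT_OF_ORDER  # no event older than this is expected any more
        for s in sorted(open_windows):  # sorted() copies the keys, so popping is safe
            if s + WINDOW_SIZE <= watermark:
                emitted.append((s, s + WINDOW_SIZE, open_windows.pop(s)))
    for s in sorted(open_windows):  # bounded input ended: flush the windows still open
        emitted.append((s, s + WINDOW_SIZE, open_windows.pop(s)))
    return emitted

# (9, 2) arrives after (11, 1) but still counts toward window [0, 10);
# (3, 100) arrives far too late and is dropped.
events = [(1, 5), (4, 3), (11, 1), (9, 2), (15, 4), (23, 7), (3, 100)]
print(tumbling_event_time_sums(events))
# [(0, 10, 10), (10, 20, 5), (20, 30, 7)]
```

A processing-time version would group events by arrival order instead. In that version (9, 2) would be counted alongside (11, 1) once a window boundary fell between them.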
1.4.2、Characteristics
Stateful computation: instead of storing data in a relational database, Flink keeps state in local memory. Data held in memory is easily lost, so Flink uses checkpoints for disaster recovery. A checkpoint is essentially a snapshot of the state, saved to remote storage.
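The state-plus-checkpoint idea can be sketched in plain Python as well. This is a hypothetical model, not Flink's state backend API; the class and variable names are made up. A keyed counter keeps its state in a local dict. A checkpoint copies that state together with the current input offset into a "remote" store, which here is just another dict. After a simulated crash the job restores the snapshot and replays the input from the saved offset. Each record is therefore reflected exactly once in the state, even though one record is processed twice.

```python
import copy

class KeyedCounter:
    """Counts records per key; `state` plays the role of operator state in local memory."""

    def __init__(self) -> None:
        self.state: dict[str, int] = {}
        self.offset = 0  # position of the next record to read from the (replayable) source

    def process(self, key: str) -> None:
        self.state[key] = self.state.get(key, 0) + 1
        self.offset += 1

    def checkpoint(self, store: dict) -> None:
        # Snapshot state and source offset together so the two stay consistent.
        store["snapshot"] = (copy.deepcopy(self.state), self.offset)

    @classmethod
    def restore(cls, store: dict) -> "KeyedCounter":
        state, offset = store["snapshot"]
        op = cls()
        op.state, op.offset = copy.deepcopy(state), offset
        return op

source = ["a", "b", "a", "c", "a"]  # a replayable input, e.g. a log with offsets
remote_store: dict = {}             # stands in for durable remote storage

op = KeyedCounter()
for key in source[:3]:
    op.process(key)
op.checkpoint(remote_store)         # checkpoint after 3 records
op.process(source[3])               # "c" is processed, then the node crashes
del op                              # the local in-memory state is lost

op = KeyedCounter.restore(remote_store)
for key in source[op.offset:]:      # replay from the checkpointed offset
    op.process(key)
print(op.state)  # {'a': 3, 'b': 1, 'c': 1}
```

A savepoint is conceptually the same kind of snapshot. The difference is that it is triggered manually, for example before an upgrade, rather than periodically by the system.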


1.5、Roles in Flink

Flink also follows a master-slave architecture: the master node is the JobManager, and the worker (slave) nodes are TaskManagers.
- Client
Submits the job to the JobManager and keeps communicating with it to obtain the job's execution status.
- JobManager
Responsible for job scheduling and resource management, and coordinates checkpoints.
After receiving a job from the client, the JobManager allocates TaskSlots to it according to the TaskSlot usage on the cluster's TaskManagers. It then instructs the TaskManagers to start the tasks.
While the job runs, the JobManager triggers checkpoints, and each TaskManager completes its checkpoint after receiving the checkpoint instruction. When the job finishes, Flink reports the result back to the client and releases the TaskManager resources.
- TaskManager
Executes tasks and manages the resources on its node.
After receiving tasks from the JobManager, the TaskManager starts them in its slots and begins receiving and processing data.
- ResourceManager
The ResourceManager is responsible for provisioning, reclaiming, allocating and managing task slots in a Flink cluster. Flink provides ResourceManager implementations for different environments and resource providers, such as YARN, Mesos, Kubernetes and standalone deployments. In a standalone setup, the ResourceManager can only distribute the slots of available TaskManagers and cannot start new TaskManagers on its own.
- Dispatcher
The Dispatcher provides a REST interface for submitting Flink applications for execution and starts a new JobMaster for each submitted job. It also runs the Flink WebUI, which provides information about job execution.
- JobMaster
A JobMaster is responsible for managing the execution of a single JobGraph. Multiple jobs can run simultaneously in a Flink cluster, and each job has its own JobMaster.
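The JobManager's slot assignment step can be pictured with a toy allocator. This is a simplified, hypothetical model: real Flink scheduling also involves the ResourceManager, slot requests and slot sharing, none of which is modelled here. The TaskManager names and task names are made up. Each TaskManager offers a fixed number of task slots, and the job's parallel tasks are placed into free slots one by one. If the cluster does not have enough free slots, the job cannot be scheduled.

```python
from dataclasses import dataclass, field

@dataclass
class TaskManager:
    name: str
    slots: int                                        # task slots this TaskManager offers
    assigned: list[str] = field(default_factory=list)

    def free_slots(self) -> int:
        return self.slots - len(self.assigned)

def assign_tasks(task_managers: list[TaskManager], tasks: list[str]) -> dict[str, str]:
    """Place each task into a free slot, filling TaskManagers in order.
    Returns a mapping task -> TaskManager name."""
    if sum(tm.free_slots() for tm in task_managers) < len(tasks):
        raise RuntimeError("not enough free task slots for this job")
    placement: dict[str, str] = {}
    pending = iter(tasks)
    for tm in task_managers:
        while tm.free_slots() > 0:
            task = next(pending, None)
            if task is None:          # every task has a slot
                return placement
            tm.assigned.append(task)
            placement[task] = tm.name
    return placement

tms = [TaskManager("tm-1", slots=2), TaskManager("tm-2", slots=2)]
job = ["source-0", "source-1", "map-0"]
print(assign_tasks(tms, job))
# {'source-0': 'tm-1', 'source-1': 'tm-1', 'map-0': 'tm-2'}
```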
Copyright notice
This article was written by [Z-hhhhh]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/04/202204220559122682.html