Flink's important basics
2022-04-23 04:41:00 【Z-hhhhh】
1、Flink basics
1.1、Introduction to Flink
Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams.
Flink official website: https://flink.apache.org/
Flink Chinese official website: https://flink.apache.org/zh/
1.2、Bounded and unbounded data
1.2.1、Bounded data sets
A bounded data set has both a beginning and an end. The data to be processed falls within a definite time range, which may be a day or a minute. Processing a bounded data set is called batch processing.
1.2.2、Unbounded data sets
An unbounded data set has a beginning but no end: from the start, new data keeps arriving in a steady stream. Processing an unbounded data set is called stream processing.
Bounded and unbounded are relative concepts; the distinction depends mainly on the time range considered.
Among current open-source big data processing frameworks, Spark and Flink both support batch and stream computation.
1.2.3、Stream processing and batch processing
Stream processing is like an escalator in a shopping mall: people do not have to wait and can step on at any time.
Batch processing is like an elevator in a residential building: you wait a while, and it carries several people at once.
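The difference between the two styles can be sketched in a few lines of plain Python. This is only an illustration, not Flink code; the function names `batch_sum` and `streaming_sum` are made up for this example. The batch version needs the whole bounded data set before it can answer, while the streaming version emits an updated result for every record and therefore also works on input that never ends.

```python
from typing import Iterable, Iterator, List


def batch_sum(bounded: List[int]) -> int:
    """Batch style: the whole bounded data set is available before computing."""
    return sum(bounded)


def streaming_sum(records: Iterable[int]) -> Iterator[int]:
    """Stream style: emit an updated running total as each record arrives."""
    total = 0
    for record in records:
        total += record
        yield total


if __name__ == "__main__":
    data = [3, 1, 4]
    print(batch_sum(data))            # a single result after all input is read
    print(list(streaming_sum(data)))  # one intermediate result per record
```

Because `streaming_sum` is a generator, it never has to hold the full input, which is the property that lets stream processing handle unbounded data.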
1.3、Flink compared with Spark
- Different approaches to processing data:
Both Spark and Flink aim to unify stream and batch processing under one technology, but they implement it differently. Spark is built on batch processing and implements streaming by cutting the data into small batches (micro-batches), whereas Flink is streaming-native: a record is processed as soon as it arrives.
- Different data models:
Spark uses RDDs; a DStream in Spark Streaming is a sequence of small-batch RDDs.
Flink's basic data model is the data stream and the event (Event).
- Different runtime architectures:
Spark computes in batches: it cuts the DAG into several Stages, and a Stage must finish before the next one can be computed.
Flink uses a streaming execution model: once a node has processed an event, it sends the event directly to the next node for processing.
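To see why micro-batching adds latency, consider the toy model below. It is a simplified sketch, not a measurement of Spark or Flink: processing cost is assumed to be zero, times are integer milliseconds, and all function names are hypothetical. In the event-at-a-time model a result is available when the event arrives; in the micro-batch model an event has to wait until its batch interval closes.

```python
from typing import List


def per_event_emit_times(arrivals_ms: List[int]) -> List[int]:
    """Event-at-a-time model: each event is handled the moment it arrives."""
    return list(arrivals_ms)


def micro_batch_emit_times(arrivals_ms: List[int], interval_ms: int) -> List[int]:
    """Micro-batch model: batch k covers [k*interval, (k+1)*interval)
    and its events are only processed when the interval ends."""
    return [(t // interval_ms + 1) * interval_ms for t in arrivals_ms]


def latencies(arrivals_ms: List[int], emit_ms: List[int]) -> List[int]:
    """Waiting time of each event between arrival and result."""
    return [emit - arrival for arrival, emit in zip(arrivals_ms, emit_ms)]


if __name__ == "__main__":
    arrivals = [200, 900, 1500]
    print(latencies(arrivals, per_event_emit_times(arrivals)))
    print(latencies(arrivals, micro_batch_emit_times(arrivals, 1000)))
```

With a 1000 ms interval, the waiting time in this model ranges from just above 0 up to the full interval, depending on where in the interval an event happens to arrive.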
1.4、Advantages of Flink
1.4.1、Advantages
- High throughput, low latency, and high performance at the same time
Flink is currently the only open-source distributed stream processing framework that combines high throughput, low latency, and high performance. Spark Streaming emulates streaming on top of batch processing, so it cannot guarantee low latency; Storm delivers low latency and good performance but cannot achieve high throughput.
- Support for Event Time
Most data processing frameworks use system time (the time at which a record is processed) rather than the time at which the event was actually generated. Flink supports window computation based on Event Time, which prevents network delays that reorder records from distorting the results.
- Support for stateful computation
State is the intermediate result data of a computation, kept in memory or in a file system. Flink also provides exactly-once state consistency guarantees.
- Highly flexible windowing
Flink supports three kinds of windows: time, count, and session.
- Lightweight distributed snapshots
Flink persists the state of a running job through distributed snapshots called Checkpoints. When a node goes down or a network transfer fails, the job can be recovered automatically from the latest Checkpoint.
- Independent memory management on top of the JVM
Flink implements its own memory management mechanism to minimize the impact of the JVM and GC on the system.
- Savepoints
A streaming job typically runs continuously, and stopping it for a while (for example, for a cluster version upgrade or a maintenance shutdown) can lead to data loss or incorrect results. A Savepoint stores a snapshot of the job on a storage medium so that, when the job is restarted, it resumes from the original computation state.
- Continuous streaming model with backpressure
When a downstream operator cannot keep up, flow control signals this to the upstream operator, which passes the signal on until it reaches the source. Pressure therefore propagates backwards from sink to source and throttles the rate at which the source consumes data, keeping the system stable.
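The Event Time item in the list above can be made concrete with a small sketch. This is plain Python rather than the Flink API, and the function name `tumbling_event_time_counts` and the sample events are invented for illustration. Each event carries its own timestamp, and the window it belongs to is derived from that timestamp, so the order in which events arrive does not change the result.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def tumbling_event_time_counts(
    events: Iterable[Tuple[int, str]], window_size: int
) -> Dict[int, int]:
    """Count events per tumbling window, keyed by window start time.

    Each event is a (event_time, payload) pair. The window is chosen from
    the event's own timestamp, not from the moment it is processed.
    """
    counts: Dict[int, int] = defaultdict(int)
    for event_time, _payload in events:
        window_start = event_time - (event_time % window_size)
        counts[window_start] += 1
    return dict(counts)


if __name__ == "__main__":
    in_order = [(1, "a"), (4, "c"), (9, "d"), (12, "b"), (15, "e")]
    shuffled = [(1, "a"), (12, "b"), (4, "c"), (9, "d"), (15, "e")]
    print(tumbling_event_time_counts(in_order, 10))
    print(tumbling_event_time_counts(shuffled, 10))
```

Both calls produce the same counts per window. The sketch deliberately leaves out the question of when a window may be closed and its result emitted; Flink handles that with watermarks, which are not modeled here.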
1.4.2、Characteristics
Stateful computation: state held in local memory takes the place of data stored in a relational database. Because data held in memory is easily lost, Checkpoints are used for disaster recovery: a Checkpoint takes a snapshot of the state and saves it to remote storage.
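The idea of snapshotting state and restoring it after a failure can be sketched as follows. This is a toy model under strong assumptions, not Flink's actual checkpointing mechanism: there is a single operator, the input can be replayed from any offset, and the `durable` variable stands in for remote storage. The class and function names are hypothetical.

```python
import copy
from typing import Dict, List, Tuple

Checkpoint = Tuple[int, Dict[str, int]]


class CountingOperator:
    """A toy stateful operator: keeps a running count per key in local memory."""

    def __init__(self) -> None:
        self.state: Dict[str, int] = {}
        self.offset = 0  # number of input records the state reflects

    def process(self, key: str) -> None:
        self.state[key] = self.state.get(key, 0) + 1
        self.offset += 1

    def snapshot(self) -> Checkpoint:
        """Return a copy of (offset, state) to be stored outside the operator."""
        return self.offset, copy.deepcopy(self.state)

    def restore(self, checkpoint: Checkpoint) -> None:
        offset, state = checkpoint
        self.offset = offset
        self.state = copy.deepcopy(state)


def run_with_failure(records: List[str], checkpoint_every: int, fail_at: int) -> Dict[str, int]:
    """Process records, lose local state once at index fail_at, then recover."""
    op = CountingOperator()
    durable = op.snapshot()  # stands in for remote storage
    failed = False
    i = 0
    while i < len(records):
        if i == fail_at and not failed:
            failed = True
            op = CountingOperator()  # local memory is gone
            op.restore(durable)      # reload the last checkpoint
            i = op.offset            # replay input from the checkpointed offset
            continue
        op.process(records[i])
        i += 1
        if i % checkpoint_every == 0:
            durable = op.snapshot()
    return op.state
```

Because the snapshot stores the input offset together with the state, replaying from that offset after a failure counts every record exactly once in the final state, which is the intuition behind the exactly-once guarantee.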
1.5、Roles in Flink
Flink follows a master-worker architecture: the master node is the JobManager and the worker nodes are TaskManagers.
- Client
Submits jobs to the JobManager and keeps interacting with it to obtain the job's execution status.
- JobManager
Responsible for task scheduling and resource management, and for coordinating Checkpoints.
After receiving a job from the client, the JobManager allocates TaskSlot resources to it according to TaskSlot usage on the cluster's TaskManagers and instructs the TaskManagers to start the tasks.
While the job runs, the JobManager triggers Checkpoints; each TaskManager performs its part of the Checkpoint when it receives the instruction. When the job finishes, Flink reports the result to the client and releases the TaskManager resources.
- TaskManager
Responsible for executing tasks, and for requesting and managing the resources the tasks use on its node.
After receiving a task from the JobManager, the TaskManager uses Slot resources to start the Task and begins receiving and processing data.
- ResourceManager
The ResourceManager is responsible for providing, reclaiming, and allocating resources in a Flink cluster; it manages task slots. Flink implements a corresponding ResourceManager for different environments and resource providers (such as YARN, Mesos, Kubernetes, and standalone deployments). In a standalone setup, the ResourceManager can only distribute the slots of available TaskManagers; it cannot start new TaskManagers on its own.
- Dispatcher
The Dispatcher provides a REST interface for submitting Flink applications for execution and starts a new JobMaster for each submitted job. It also runs the Flink WebUI, which provides information about job execution.
- JobMaster
A JobMaster is responsible for managing the execution of a single JobGraph. Multiple jobs can run simultaneously in a Flink cluster, and each job has its own JobMaster.
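The slot allocation step described above can be illustrated with a small sketch. The placement rule used here (always pick the TaskManager with the most free slots) is an assumption chosen for simplicity, not Flink's real scheduling strategy, and `assign_slots` and the TaskManager names are hypothetical.

```python
from typing import Dict, List, Tuple


def assign_slots(free_slots: Dict[str, int], parallelism: int) -> List[Tuple[int, str]]:
    """Assign `parallelism` subtasks to TaskManagers that still have free slots.

    free_slots maps a TaskManager name to its number of free task slots.
    Returns a list of (subtask_index, task_manager_name) pairs.
    """
    if parallelism > sum(free_slots.values()):
        raise RuntimeError("not enough free task slots for the requested parallelism")
    remaining = dict(free_slots)  # do not mutate the caller's view
    assignment: List[Tuple[int, str]] = []
    for subtask in range(parallelism):
        # Pick the TaskManager with the most free slots; ties go to the
        # alphabetically first name so the result is deterministic.
        tm = max(sorted(remaining), key=lambda name: remaining[name])
        remaining[tm] -= 1
        assignment.append((subtask, tm))
    return assignment


if __name__ == "__main__":
    cluster = {"tm-1": 2, "tm-2": 1}
    print(assign_slots(cluster, 3))
```

Requesting more subtasks than there are free slots raises an error in this sketch. That mirrors the standalone case described above, where no new TaskManagers can be started to provide additional slots.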
Copyright notice
This article was written by [Z-hhhhh]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204220559122682.html