当前位置:网站首页>Installation and deployment of Flink and wordcount test
Installation and deployment of Flink and wordcount test
2022-04-23 04:41:00 【Z-hhhhh】
One 、 Local mode
Simulate locally in a multithreaded manner Flink Multiple roles in .( The development environment does not )
Download address :https://flink.apache.org/downloads.html
Here is the download :flink-1.13.0-bin-scala_2.12.tgz
Upload to common location , Then decompress .
start-up :
Switch to flink Of bin Under the table of contents , perform ./start-cluster.sh, Then view the process .
Two 、Standalone Independent cluster mode
( If you do the first step first , Remember to stop the service first ,stop-cluster.sh)
- Upload 、 decompression tar package .
- Modify the configuration file
cd conf/flink-conf.yaml
Appoint Flink colony JobManager RPC mailing address .
( My environment here is three machines , Respectively master,slave1,slave2)
jobmanager.rpc.address: master
Be careful :flink-conf.yaml Middle configuration key/value It's time “:” You need a space after , Otherwise, the configuration will not take effect .
Explanation of other important configurations :
# jobmanager and taskmanager Communication port number
jobmanager.rpc.port: 6123
# JobManager Total process memory size
jobmanager.memory.process.size: 1600m
# At present taskmanager How much memory does the whole process occupy
taskmanager.memory.process.size: 1728m
# Every taskmanager What can be provided slots
taskmanager.numberOfTaskSlots: 1
# The default parallelism
parallelism.default: 1
#jobmanager Failover strategy ,1.9 New properties that appear after , Regional recovery strategy
jobmanager.execution.failover-strategy: region
#flink web The port number of the interface
rest.port: 8081
vi conf/workers
Appoint Flink colony TaskManager.
slave1
slave2
- Send to the other two nodes .
scp -r flink-1.13.0 slave1:$PWD
scp -r flink-1.13.0 slave2:$PWD
- Configure environment variables
vi /etc/profile
export FLINK_HOME=/usr/software/flink-1.13.0
export PATH=$PATH:$FLINK_HOME/bin
source /etc/profile
-
Start cluster
-
see Flink Node process
Can pass master:8081 visit Flink Of web Interface
3、 ... and 、Standalone HA
From the previous architecture, we can find that JobManager There is an obvious single point problem (SPOF,single point of failure).JobManager Responsible for task scheduling and resource allocation , once JobManager There was an accident , The consequences can be imagined .
We have the help of Zookeeper With the help of the , One Standalone Of Flink The cluster will have multiple live JobManager, Only one of them is working , Others are in Standby state . When at work JobManager After losing the connection ( Such as downtime or Crash),Zookeeper From Standby Choose a new one JobManager To take over Flink colony .
-
Cluster planning
master:JobManager+TaskManager
slave1: JobManager+TaskManager
slave2: JobManager -
Modify the configuration file
vi masters
master:8081
slave1:8081
vi worksers
master
slave1
slave2
vi flink-conf.yaml
Add the following
state.backend: filesystem
state.backend.fs.checkpointdir: hdfs://master:9000/flink-checkpoints
high-availability: zookeeper
high-availability.storageDir: hdfs://master:9000/flink/ha/
high-availability.zookeeper.quorum: master:2181,slave1:2181,slave2:2181
- Sync flink Profile directory conf To the other two nodes
scp -r conf slave1:$PWD
scp -r conf slave2:$PWD
-
modify slave1 Upper jobmanager mailing address
vi flink-conf.yaml
jobmanager.rpc.address: slave1 -
start-up
start-up HDFS
start-up Zookeeper
take flink-shaded-hadoop-2-uber-2.7.5-10.0.jar copy to flink Of lib Under the table of contents , All three nodes are copied .
start-up flink
- Check the process
7. test
yum install -y nc
nc -lk 1234
Four 、Flink on yarn
4.1.1、flink And yarn Interaction
Use... In the actual development process Flink on yarn There are many modes
4.1.2、 To configure
close yarn Memory check for ,yarn-site.xml. And distribute it to other nodes .
<property>
<name>yarn.nodemanager.pmem-check-enabled</name>
<value>false</value>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
distribution
scp -r yarn-site.xml slave1:$PWD
scp -r yarn-site.xml slave2:$PWD
Whether to start a thread to check the amount of virtual memory that each task is using , If the task exceeds the assigned value , Kill it directly , The default is true. In this, we need to shut down , Because for flink Use yarn In mode , It's easy to have memory overruns , This is the time yarn Will automatically kill job.
Restart all services
dhfs
yarn
zookeeper
flink
4.2、Session Pattern
- characteristic : You need to apply for resources in advance , start-up JobManager and TaskManger
- advantage : You don't need to apply for resources every time you submit an assignment , Instead, use resources that have been applied for , So as to improve the execution efficiency
- shortcoming : After the job execution is completed , Resources will not be released , Therefore, it will always occupy system resources
- Application scenarios : It is suitable for situations where homework submission is frequent , Scenes with more small assignments
4.2.1、 Case study
yarn-session.sh( Open up resources ) + flink run( Submit tasks )
stay yarn Previous start one Flink conversation , Execute the following command :
yarn-session.sh -n 2 -tm 800 -s 1 -d flink run /usr/software/flink-1.13.0/examples/batch/WordCount.jar
Parameter description :
# -n To apply for 2 A container , This is how many taskmanager
# -tm Represent each TaskManager The memory size of
# -s Represent each TaskManager Of slots Number
# -d Indicates that the later program mode is running
stay yarn Of 8088 The interface can see Flink session cluster Running .
Task to complete
4.3、 Per-job Pattern
- characteristic : Every time you submit an assignment, you need to apply for a resource
- advantage : Job run completed , Resources will be released immediately , It will not occupy system resources all the time
- shortcoming : Every time you submit an assignment, you need to apply for resources , It will affect the execution efficiency , Because it takes time to apply for resources
- Application scenarios : Suitable for scenes with less homework 、 Big homework scene
4.3.1、 Case study
Submit directly job
flink run -m yarn-cluster -yjm 1024 -ytm 1024 /usr/software/flink-1.13.0/examples/batch/WordCount.jar
Parameter description
# -m jobmanager The address of
# -yjm 1024 Appoint jobmanager Memory information
# -ytm 1024 Appoint taskmanager Memory information
Check the specific parameter description :
flink --help
5、 ... and 、Flink test WordCount
-
install netcat
yum install -y nc -
Turn on nc
nc -lk 1314 -
Count the word frequency input by the port
flink run /usr/software/flink-1.13.0/examples/streaming/SocketWindowWordCount.jar --hostname master --port 1314
stay 1314 Random input at terminal
ctrl+c After that , The other end prompts that the task is completed
- Check the statistics
stay Flink Of TaskManager Node log Under the table of contents . View to .out Final document .
版权声明
本文为[Z-hhhhh]所创,转载请带上原文链接,感谢
https://yzsam.com/2022/04/202204220559122651.html
边栏推荐
- Open the past and let's start over.
- 协程与多进程的完美结合
- Programmers complain: I really can't live with a salary of 12000. Netizen: how can I say 3000
- IDE idea automatic compilation and configuration of on update action and on frame deactivation
- Recommended scheme for national production of electronic components of wireless keyboard
- mysql ,binlog 日志查询
- Inverse system of RC low pass filter
- 基于英飞凌MCU GTM模块的无刷电机驱动方案开源啦
- 优麒麟 22.04 LTS 版本正式发布 | UKUI 3.1开启全新体验
- 2020 is coming to an end, special and unforgettable.
猜你喜欢
HMS Core Discovery第14期回顾长文|纵享丝滑剪辑,释放视频创作力
数据孤岛是什么?为什么2022年仍然存在数据孤岛?
AWS eks add cluster user or Iam role
win10, mysql-8.0.26-winx64. Zip installation
Supplément: annotation
指纹Key全国产化电子元件推荐方案
Programmers complain: I really can't live with a salary of 12000. Netizen: how can I say 3000
Installation du compilateur croisé de la plateforme zynq
Detailed explanation of life cycle component of jetpack
229. 求众数 II
随机推荐
兼容NSR20F30NXT5G的小体积肖特基二极管
thymeleaf th:value 为null时报错问题
Fusobacterium -- symbiotic bacteria, opportunistic bacteria, oncobacterium
leetcode005--原地删除数组中的重复元素
Leetcode001 -- returns the subscript of the array element whose sum is target
Luogu p1858 [multi person knapsack] (knapsack seeking the top k optimal solution)
递归调用--排列的穷举
数据孤岛是什么?为什么2022年仍然存在数据孤岛?
QML进阶(四)-绘制自定义控件
HMS Core Discovery第14期回顾长文|纵享丝滑剪辑,释放视频创作力
Leetcode002 -- inverts the numeric portion of a signed integer
Key points of AWS eks deployment and differences between console and eksctl creation
华为机试--高精度整数加法
Leetcode009 -- search the target value in the array with binary search
io.Platform.packageRoot; // ignore: deprecated_member_use
IEEE Transactions on systems, man, and Cybernetics: Notes for systems (TSMC)
Kotlin. The binary version of its metadata is 1.6.0, expected version is 1.1.15.
RC低通滤波器的逆系统
Improving 3D object detection with channel wise transformer
Eight misunderstandings that should be avoided in data visualization