
Analyzing why a Flink task exceeds the YARN container memory limit

2022-08-11 10:55:00 InfoQ

The author of this article is Xie Lei, a software development engineer on the big data team of the China Mobile Cloud Competence Center. The article walks step by step through the analysis of a Flink task on a YARN cluster that kept being restarted after running for a while, reproduces the issue with a simulation, and gives a fix for the memory leak, for readers' reference.

Problem background

A production Flink task on the YARN cluster would automatically restart after running for a while. For this kind of problem we already have a familiar checklist of things to try:
1. Check for out-of-memory errors (on-heap / off-heap)
2. An occasional bug in the program
3. YARN cluster nodes going offline or coming back online
4. Large windows used in the Flink program (for example hourly windows) whose data exceeds memory
5. ...
Meanwhile, our YARN cluster has the physical memory check enabled: when a process uses more physical memory than it requested, YARN proactively kills the task's process to keep the cluster stable.
<property>
  <name>yarn.nodemanager.pmem-check-enabled</name>
  <value>true</value>
</property>

Exception information

Let's start from the exception. Going through the JobManager's runtime log, we find the following exception information.
2020-04-15 01:59:33,000 INFO org.apache.flink.yarn.YarnResourceManager - Closing TaskExecutor connection container_e05_1585737758019_0901_01_000003 because: Container [pid=3156625,containerID=container_e05_1585737758019_0901_01_000003] is running beyond physical memory limits. Current usage: 6.1 GB of 6 GB physical memory used; 14.5 GB of 28 GB virtual memory used. Killing container.
Dump of the process-tree for container_e05_1585737758019_0901_01_000003 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 3156625 3156621 3156625 3156625 (bash) 0 0 15441920 698 /bin/bash -c /usr/java/default/bin/java -Xms4148m -Xmx4148m -XX:MaxDirectMemorySize=1996m -javaagent:lib/aspectjweaver-1.9.1.jar -Dlog.file=/data_sdh/nodemanager/log/application_1585737758019_0901/container_e05_1585737758019_0901_01_000003/taskmanager.log -Dlogback.configurationFile=file:./logback.xml -Dlog4j.configuration=file:./log4j.properties org.apache.flink.yarn.YarnTaskExecutorRunner --configDir . 1> /data_sdh/nodemanager/log/application_1585737758019_0901/container_e05_1585737758019_0901_01_000003/taskmanager.out 2> /data_sdh/nodemanager/log/application_1585737758019_0901/container_e05_1585737758019_0901_01_000003/taskmanager.err
|- 3156696 3156625 3156625 3156625 (java) 12263 1319 15553892352 2119601 /usr/java/default/bin/java -Xms4148m -Xmx4148m -XX:MaxDirectMemorySize=1996m -javaagent:lib/aspectjweaver-1.9.1.jar -Dlog.file=/data_sdh/nodemanager/log/application_1585737758019_0901/container_e05_1585737758019_0901_01_000003/taskmanager.log -Dlogback.configurationFile=file:./logback.xml -Dlog4j.configuration=file:./log4j.properties org.apache.flink.yarn.YarnTaskExecutorRunner --configDir .
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
From the exception message, the key information is "is running beyond physical memory limits. Current usage: 6.1 GB of 6 GB physical memory used; 14.5 GB of 28 GB virtual memory used. Killing container", which says the process exceeded its 6 GB physical memory limit and was killed by YARN's detection mechanism.
Since memory was exceeded, let's first look at roughly what the Flink task does. A quick look at the UI shows it is a very simple job!

A few questions to be resolved

1. Why would such a simple program cause the process to exceed 6 GB (RSS)? Incredible.
2. What is YARN's memory detection mechanism, and how does it obtain a process's memory information?
3. With the JVM parameters -Xms4148m -Xmx4148m -XX:MaxDirectMemorySize=1996m, heap memory + direct memory = 6 GB (4148 MB + 1996 MB = 6144 MB). Why is there no OutOfMemory-related exception in the log?
4. How can RSS exceed 6 GB at all?

Analysis process

YARN's memory detection mechanism
From the Hadoop source code we can see that YARN parses the /proc/<pid>/stat file to obtain the RSS value and compares it with the memory requested when the container was allocated. If the RSS value exceeds the requested value, the process is killed and the information above is printed.
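To make the mechanism concrete, here is a minimal sketch (an illustration, not the Hadoop code itself; the class name ProcStatRss is made up) of reading the RSS field from /proc/<pid>/stat in Java. It assumes a Linux /proc layout and a 4 KiB page size.
import java.nio.file.Files;
import java.nio.file.Paths;

public class ProcStatRss {
    // Read /proc/<pid>/stat and return the resident set size in bytes.
    // RSS is field 24 of the stat line, counted in pages.
    static long rssBytes(long pid) throws Exception {
        String stat = new String(Files.readAllBytes(Paths.get("/proc/" + pid + "/stat")));
        // Field 2 (comm) is wrapped in parentheses and may contain spaces,
        // so split the remaining fields after the closing ')'.
        String[] fields = stat.substring(stat.lastIndexOf(')') + 2).split("\\s+");
        long rssPages = Long.parseLong(fields[21]);   // field 24 of the full line
        return rssPages * 4096L;                      // assumes 4 KiB pages
    }

    public static void main(String[] args) throws Exception {
        long pid = Long.parseLong(args[0]);
        System.out.printf("RSS of pid %d: %.2f GB%n", pid, rssBytes(pid) / 1e9);
    }
}
YARN's NodeManager essentially performs this kind of comparison against the container's requested memory and kills the process once RSS is larger.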
Routine JVM checks
GC
First, look at the GC status via jstat: it is completely normal, nothing wrong here.
[[email protected] ~]$ jstat -gcutil 12984 1000
  S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT
 99.96   0.00  79.06   4.78  94.92  89.38      2    0.164     0    0.000    0.164
 99.96   0.00  86.77   4.78  94.92  89.38      2    0.164     0    0.000    0.164
 99.96   0.00  94.48   4.78  94.92  89.38      2    0.164     0    0.000    0.164
  0.00  99.98   1.95  10.24  94.93  89.38      3    0.255     0    0.000    0.255
  0.00  99.98   9.77  10.24  94.93  89.38      3    0.255     0    0.000    0.255
  0.00  99.98  17.58  10.24  94.93  89.38      3    0.255     0    0.000    0.255
  0.00  99.98  25.40  10.24  94.93  89.38      3    0.255     0    0.000    0.255
  0.00  99.98  35.16  10.24  94.93  89.38      3    0.255     0    0.000    0.255
  0.00  99.98  41.02  10.24  94.93  89.38      3    0.255     0    0.000    0.255
Dump the heap
As a formality, dump the heap to check. Do not use -dump:live: it forces a Full GC first, so the result would not necessarily reflect the actual memory state.
jmap -dump:format=b,file=heap1.bin 12984
Before opening it with Eclipse MAT, just look at the file size first.
[[email protected] ~]$ ll -lh heap1.bin
-rw------- 1 dcadmin datacentergroup 1016M Apr 15 09:15 heap1.bin
At the same time, check in Linux how much physical memory the process is actually using: top -p 12984
Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.6 us, 0.1 sy, 0.0 ni, 99.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 16394040 total, 5676652 free, 8114832 used, 2602556 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 7523032 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
12984 dcadmin 20 0 16.258g 7.053g 15104 S 13.3 45.1 1:27.92 java
Key points!
• The JVM heap is under 1 GB, yet RSS has reached 7.053 GB. A rough reading is that about 6 GB of off-heap memory is in use, which might look plausible, but remember the JVM was started with -XX:MaxDirectMemorySize=1996m, and no OutOfMemoryError was ever thrown.
• RSS keeps growing slowly, with no downward trend!

Focused analysis of the RSS memory problem

• Analyze the JVM's detailed off-heap memory allocations
• Analyze what exactly is in the 7 GB of RSS memory
Analyzing the JVM's off-heap memory allocations
Add -XX:NativeMemoryTracking=summary to the JVM startup parameters to analyze off-heap (native) memory. In fact, nothing useful can be concluded from the basic summary below; since no OutOfMemoryError was thrown, we have to start from the RSS memory instead.
[[email protected] ~]$ jcmd 21567 VM.native_memory summary
21567:
Native Memory Tracking:
Total: reserved=5815901KB, committed=4521641KB
- Java Heap (reserved=4247552KB, committed=4247552KB)
(mmap: reserved=4247552KB, committed=4247552KB) 


- Class (reserved=1076408KB, committed=26936KB)
 (classes #1206)
 (malloc=19640KB #792) 
 (mmap: reserved=1056768KB, committed=7296KB) 

- Thread (reserved=42193KB, committed=42193KB)
 (thread #42)
 (stack: reserved=42016KB, committed=42016KB)
 (malloc=129KB #225) 
 (arena=48KB #82)

- Code (reserved=250040KB, committed=5252KB)
 (malloc=440KB #1026) 
 (mmap: reserved=249600KB, committed=4812KB)

- GC (reserved=177105KB, committed=177105KB)
 (malloc=21913KB #164) 
 (mmap: reserved=155192KB, committed=155192KB) 

- Compiler (reserved=150KB, committed=150KB)
 (malloc=19KB #61) 
 (arena=131KB #3)

- Internal (reserved=19864KB, committed=19864KB)
 (malloc=19832KB #2577) 
 (mmap: reserved=32KB, committed=32KB)

- Symbol (reserved=2285KB, committed=2285KB)
 (malloc=1254KB #255) 
 (arena=1031KB #1)

- Native Memory Tracking (reserved=88KB, committed=88KB)
 (malloc=6KB #64) 
 (tracking overhead=83KB)

- Arena Chunk (reserved=215KB, committed=215KB)
 (malloc=215KB)
Analyzing what exactly is in the 7 GB of RSS memory
The Swiss Army knife for analyzing Linux memory: the gdb tool.
yum install -y gdb
Use the pmap command to view and sort the memory regions; the address ranges, RSS usage and other information are shown below.
[[email protected] ~]$ pmap -x 21567 | sort -n -k3 | more
---------------- ------- ------- ------- 
0000000000400000 0 0 0 r-x-- java
0000000000600000 0 0 0 rw--- java
0000000000643000 0 0 0 rw--- [ anon ]
00000006bcc00000 0 0 0 rw--- [ anon ]
00000007c00e0000 0 0 0 ----- [ anon ]
...
...
00007fb2ec000000 65508 36336 36336 rw--- [ anon ]
00007fb3c4000000 65536 41140 41140 rw--- [ anon ]
00007fb2d8000000 65508 46692 46692 rw--- [ anon ]
00007fb2e4000000 65508 47640 47640 rw--- [ anon ]
00007fb2e0000000 65508 48596 48596 rw--- [ anon ]
00007fb2dc000000 65512 49088 49088 rw--- [ anon ]
00007fb2cc000000 65508 50380 50380 rw--- [ anon ]
00007fb2d4000000 65508 53476 53476 rw--- [ anon ]
00007fb238000000 131056 59668 59668 rw--- [ anon ]
00000006bcc00000 4248448 1866536 1866536 rw--- [ anon ]
The output above only shows the start address of each memory region, not the end address; for that we need to look at the maps file.
[[email protected] ~]$ cat /proc/21567/maps | grep 7fb2dc
7fb2dbff9000-7fb2dc000000 ---p 00000000 00:00 0
7fb2dc000000-7fb2dfffa000 rw-p 00000000 00:00 0
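As an aside, the manual pmap/maps lookup can be scripted. The following sketch (an illustration, not from the original article; the class name LargeAnonRegions is made up) scans /proc/<pid>/maps for large anonymous writable regions and prints ready-made gdb "dump memory" commands for them.
import java.nio.file.Files;
import java.nio.file.Paths;

public class LargeAnonRegions {
    // List anonymous, writable mappings larger than 32 MB and print the
    // start/end addresses in the form gdb's "dump memory" command expects.
    public static void main(String[] args) throws Exception {
        String pid = args[0];   // e.g. "21567"
        for (String line : Files.readAllLines(Paths.get("/proc/" + pid + "/maps"))) {
            String[] parts = line.split("\\s+");
            // Anonymous mappings have no path/name column (5 fields only).
            if (parts.length != 5 || !parts[1].startsWith("rw")) {
                continue;
            }
            String[] range = parts[0].split("-");
            long size = Long.parseUnsignedLong(range[1], 16) - Long.parseUnsignedLong(range[0], 16);
            if (size > 32L * 1024 * 1024) {
                System.out.printf("dump memory mem_%s.bin 0x%s 0x%s%n", range[0], range[0], range[1]);
            }
        }
    }
}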
Attach to the process with gdb and dump the memory region.
gdb attach 21567
dump memory mem.bin 0x7fb2dc000000 0x7fb2dfffa000
View mem.bin with the strings command.
strings mem.bin | more
The whole screen scrolls with content that looks like the configuration file.
...
xcx.userprofile.kafkasource.bootstrap.servers=xxx-01:9096,xxx-02:9096,xxx-03:9096
xcx.userprofile.kafkasource.topic=CENTER_search_trajectory_xcx
xcx.userprofile.kafkasource.group=search_flink_xcx_userprofile_online
app.userprofile.kafkasource.bootstrap.servers=xxx-01:9096,xxx-02:9096,xxx-03:9096
app.userprofile.kafkasource.topic=CENTER_search_trajectory_app
app.userprofile.kafkasource.group=search_flink_app_userprofile_online
xcx.history.kafkasource.bootstrap.servers=xxx-01:9096,xxx-02:9096,xxx-03:9096
xcx.history.kafkasource.topic=CENTER_search_trajectory_xcx
...

Analyzing / simplifying the business code
From the results of the analysis above, there must be some place in the code that keeps loading the configuration file into memory, so let's simplify the business code. That raises a new question: even if the code really should not load the configuration file for every record, that alone should not eat 6 GB of physical memory. Leaving escape analysis aside, this memory should in theory be released automatically, so why isn't it? (A way to avoid the per-record reload is sketched after the code below.)
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

import java.io.IOException;
import java.util.Properties;

public class StreamingJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> text = env.socketTextStream("10.101.52.18", 9909);

        text.map(new MapFunction<String, String>() {
            @Override
            public String map(String value) throws Exception {
                // A new TestFun is created for every record, and each one
                // loads the properties file again.
                TestFun testFun = new TestFun();
                testFun.update();
                return value;
            }
        });

        env.execute();
    }

    static class TestFun {
        Properties properties;

        public TestFun() throws IOException {
            // The stream returned by getResourceAsStream is never closed here.
            properties = new Properties();
            properties.load(TestFun.class.getClassLoader().getResourceAsStream("application.properties"));
        }

        public void update() {}
    }
}
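Separately from the leak itself, the configuration should not be reloaded for every record in the first place. A minimal sketch of that change (an illustration, not the author's code; the class name ConfigOnceMapper is made up) loads the Properties once per task in RichMapFunction.open():
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;

import java.io.InputStream;
import java.util.Properties;

public class ConfigOnceMapper extends RichMapFunction<String, String> {
    private transient Properties properties;

    @Override
    public void open(Configuration parameters) throws Exception {
        // Load the configuration once per task instance and close the stream.
        properties = new Properties();
        try (InputStream in = ConfigOnceMapper.class.getClassLoader()
                .getResourceAsStream("application.properties")) {
            properties.load(in);
        }
    }

    @Override
    public String map(String value) {
        return value;
    }
}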

Simulating a reproduction

In any case, despite the doubts, let's try it out, even though the code looks fine.
Java program simulation
import java.util.Properties;

public class TestJar {

    public static void main(String[] args) throws Exception {
        // Load the properties file from the jar over and over again,
        // without ever closing the underlying stream.
        while (true) {
            Properties properties = new Properties();
            properties.load(TestJar.class.getClassLoader().getResourceAsStream("application.properties"));
        }
    }
}
Start the test program
java -Xmx512m -Xms512m -XX:MaxDirectMemorySize=1996m -cp test.jar com.ly.search.job.TestJob
Watching the program's RSS consumption, we can see that RSS grows steadily and never drops at all. So here is the problem: it leaks.
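One way to watch this growth from inside the test program itself (a sketch that assumes a Linux /proc filesystem and is not part of the original article; the class name RssWatcher is made up) is to periodically print the process's VmRSS from /proc/self/status:
import java.nio.file.Files;
import java.nio.file.Paths;

public class RssWatcher {
    // Print this JVM's resident set size every few seconds by reading
    // the "VmRSS:" line from /proc/self/status.
    public static void start() {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    for (String line : Files.readAllLines(Paths.get("/proc/self/status"))) {
                        if (line.startsWith("VmRSS:")) {
                            System.out.println(line.trim());
                        }
                    }
                    Thread.sleep(5000);
                }
            } catch (Exception ignored) {
            }
        });
        t.setDaemon(true);
        t.start();
    }
}
Calling RssWatcher.start() at the beginning of TestJar's main method would show VmRSS climbing steadily while the heap stays small.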

Memory leak fixes and extensions

Fix
• Close the stream promptly (a try-with-resources variant of this fix is sketched after the second option below)
import java.io.InputStream;
import java.util.Properties;

public class TestJar {
    public static void main(String[] args) throws Exception {
        while (true) {
            Properties properties = new Properties();
            InputStream inStream = TestJar.class.getClassLoader().getResourceAsStream("application.properties");
            properties.load(inStream);
            // Closing the stream releases the underlying resources promptly.
            inStream.close();
        }
    }
}
• Calling System.gc() also works, since it lets the Finalizer objects be cleaned up promptly (this is not advisable and is shown for discussion only); see《JVM 源码分析之 FinalReference 完全解读》.

import java.io.InputStream;
import java.util.Properties;

public class TestJar {
    public static void main(String[] args) throws Exception {
        while (true) {
            Properties properties = new Properties();
            InputStream inStream = TestJar.class.getClassLoader().getResourceAsStream("application.properties");
            properties.load(inStream);
            // Forcing a GC lets the finalizers of the leaked streams run.
            System.gc();
        }
    }
}
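As a side note to the first fix, the same idea reads more idiomatically with try-with-resources, which closes the stream even if load() throws (a sketch, not from the original article):
import java.io.InputStream;
import java.util.Properties;

public class TestJar {
    public static void main(String[] args) throws Exception {
        while (true) {
            Properties properties = new Properties();
            // try-with-resources closes the stream automatically.
            try (InputStream inStream =
                     TestJar.class.getClassLoader().getResourceAsStream("application.properties")) {
                properties.load(inStream);
            }
        }
    }
}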

Further research
• Under the hood, XXX.class.getClassLoader().getResourceAsStream goes through URLClassLoader + JarURLConnection, i.e.:

// Equivalent: memory overflows
ClassLoader classLoader = TestJar.class.getClassLoader();
URL resource = classLoader.getResource("application.properties");
URLConnection urlConnection = resource.openConnection();
urlConnection.getInputStream();

// Equivalent: memory overflows
URL url = new URL("jar:file:/home/dcadmin/test.jar!/com/ly/search/job/StreamingJob.class");
JarURLConnection conn = (JarURLConnection) url.openConnection();
conn.getInputStream();

// Not equivalent: memory does not overflow
URL url = new URL("jar:file:/home/dcadmin/test.jar!/com/ly/search/job/StreamingJob.class");
JarURLConnection conn = (JarURLConnection) url.openConnection();
conn.setDefaultUseCaches(false);
conn.getInputStream();

// Not equivalent: memory does not overflow
URL fileURL = new File("test.jar").toURI().toURL();
FileURLConnection fileUrlConn = (FileURLConnection) fileURL.openConnection();
fileUrlConn.connect();
fileUrlConn.getInputStream();
• The difference between JarURLConnection and FileURLConnection is that JarURLConnection ultimately has to go through JarFile to open the ZipFile's input stream, which involves native (system-level) calls, so physical memory is consumed and RSS grows.
• Off-heap memory checks usually focus on DirectByteBuffer, but in this case the off-heap (direct) memory really did not overflow; it was the native memory used at the OS level that blew up.
• Neither the heap nor the off-heap memory ever ran out, so no GC was triggered; the leak grew slowly until the container's (or the machine's) memory limit was exceeded.

Copyright notice
This article was created by InfoQ; please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/223/202208111041509125.html