Spark FAQ roundup: a must-read before interviews
2022-04-23 04:41:00 【Z-hhhhh】
1. What is the relationship between job, stage, and task?
- One job can contain multiple stages.
- One stage contains multiple tasks.
2. How are jobs, stages, and tasks divided?
- A job is created every time work is submitted, that is, every time an action operator is called. (Any operator whose return value is not an RDD can be classified as an action operator.)
- Stages are divided according to wide and narrow dependencies: each wide dependency starts a new stage.
- The number of tasks in a stage is in fact the number of partitions.
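The stage rule above can be sketched in plain Python. This is a simplified model rather than Spark's scheduler: `Step` and `split_into_stages` are hypothetical names invented for the illustration, and the lineage is assumed to be a simple linear chain in which each step is flagged as wide (needs a shuffle) or narrow.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """One transformation in a hypothetical linear lineage (not a Spark class)."""
    name: str
    wide: bool  # True if the step introduces a wide (shuffle) dependency


def split_into_stages(lineage: List[Step]) -> List[List[str]]:
    """Cut the lineage in front of every wide step; each piece models one stage."""
    stages: List[List[str]] = [[]]
    for step in lineage:
        # A wide dependency is a shuffle boundary, so it opens a new stage
        # (unless the current stage is still empty).
        if step.wide and stages[-1]:
            stages.append([])
        stages[-1].append(step.name)
    return stages


lineage = [
    Step("textFile", wide=False),
    Step("flatMap", wide=False),
    Step("map", wide=False),
    Step("reduceByKey", wide=True),
    Step("map", wide=False),
]
stages = split_into_stages(lineage)
print(stages)
```

With one wide step in the chain, the model yields two stages: the narrow steps before the shuffle are pipelined together, and everything from the shuffle onward forms the second stage.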
3. What are wide and narrow dependencies?
- If a partition of a parent RDD is used by multiple child RDD partitions, it is a wide dependency (mnemonic: a parent with many children).
- If a partition of a parent RDD is used by only one child RDD partition, it is a narrow dependency (mnemonic: an only child).
- abstract class Dependency
- abstract class NarrowDependency extends Dependency: declares the abstract method getParents()
- class OneToOneDependency extends NarrowDependency: implements getParents()
- class RangeDependency extends NarrowDependency: implements getParents()
- class ShuffleDependency extends Dependency
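To make the role of getParents() concrete, here is a minimal Python model of the two narrow dependencies. It mirrors the idea that a child partition ID maps back to the parent partition IDs it reads from; the Python class and method names (`get_parents`, `in_start`, `out_start`, `length`) are illustrative stand-ins, not Spark's Scala API.

```python
from typing import List


class OneToOneDependency:
    """Child partition i reads only parent partition i (e.g. map, filter)."""

    def get_parents(self, partition_id: int) -> List[int]:
        return [partition_id]


class RangeDependency:
    """A contiguous range of parent partitions maps onto a range of child partitions.

    in_start:  first partition of the range in the parent
    out_start: first partition of the range in the child
    length:    number of partitions in the range
    """

    def __init__(self, in_start: int, out_start: int, length: int) -> None:
        self.in_start = in_start
        self.out_start = out_start
        self.length = length

    def get_parents(self, partition_id: int) -> List[int]:
        # Only child partitions inside the range have a parent in this RDD.
        if self.out_start <= partition_id < self.out_start + self.length:
            return [partition_id - self.out_start + self.in_start]
        return []


# Model of a union: parent A has 2 partitions, parent B has 3 partitions.
# Child partitions 0-1 come from A, child partitions 2-4 come from B.
dep_b = RangeDependency(in_start=0, out_start=2, length=3)
print(dep_b.get_parents(3))  # child partition 3 reads B's partition 1
print(dep_b.get_parents(0))  # child partition 0 does not read from B
```

In both cases a child partition has at most one parent partition per dependency, which is what makes the dependency narrow. A shuffle dependency has no such per-partition mapping, because a child partition may read from every parent partition.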
4. What are action operators and transformation operators? List some.
- An action operator creates a job and is executed immediately. Examples: take, first, collect, foreach, foreachPartition.
- A transformation is not executed immediately; it only records the dependencies and the functions to apply. Examples: map, filter, flatMap, reduceByKey, groupByKey, and so on.
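The difference between recording and executing can be illustrated with a toy class in plain Python. `LazyDataset` is a hypothetical stand-in for an RDD, written only for this sketch: `map` and `filter` just record a function, and nothing runs until the action `collect` is called.

```python
from typing import Any, Callable, Iterable, List, Tuple

Op = Tuple[str, Callable[[Any], Any]]


class LazyDataset:
    """Toy stand-in for an RDD (hypothetical, not a Spark class)."""

    def __init__(self, data: Iterable[Any], ops: Tuple[Op, ...] = ()) -> None:
        self._data = list(data)
        self._ops = ops  # recorded (kind, function) pairs, i.e. the lineage

    def map(self, f: Callable[[Any], Any]) -> "LazyDataset":
        # Transformation: record the function, compute nothing.
        return LazyDataset(self._data, self._ops + (("map", f),))

    def filter(self, p: Callable[[Any], bool]) -> "LazyDataset":
        # Transformation: record the predicate, compute nothing.
        return LazyDataset(self._data, self._ops + (("filter", p),))

    def collect(self) -> List[Any]:
        # Action: replay the recorded functions and return a plain list.
        out = list(self._data)
        for kind, f in self._ops:
            if kind == "map":
                out = [f(x) for x in out]
            else:
                out = [x for x in out if f(x)]
        return out


calls: List[int] = []


def double(x: int) -> int:
    calls.append(x)  # track when the function actually runs
    return x * 2


ds = LazyDataset([1, 2, 3, 4]).map(double).filter(lambda x: x > 4)
calls_before_action = len(calls)  # still 0: nothing has executed yet
result = ds.collect()             # the action triggers the computation
print(calls_before_action, result)
```

Note that `collect` returns a list rather than another `LazyDataset`, which matches the rule of thumb that an operator whose return value is not an RDD is an action.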
5. What is the difference between reduceByKey and groupByKey?
reduceByKey: before the results are sent to the reducers, reduceByKey first merges
the values locally on each mapper, much like a combiner in MapReduce. Because a
reduce has already been performed on the map side, the amount of data shrinks
considerably, which cuts network transfer and lets the reduce side compute the result faster.
groupByKey: groupByKey gathers the values for each key in the RDD into a sequence
(an Iterable). This happens on the reduce side, so all of the data has to be transferred
over the network, which is wasteful. If the data volume is very large, it may also cause an OutOfMemoryError.
Conclusion:
When reducing a large amount of data, reduceByKey is recommended. It is not only
faster, it also avoids the memory overflow that groupByKey can cause.
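The effect of map-side merging on shuffle volume can be sketched with a small plain-Python simulation. This is a model under stated assumptions, not Spark code: each inner list stands for one map-side partition, the function names are made up for the illustration, and "records shuffled" simply counts the key-value pairs that would leave the mappers.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[str, int]


def shuffle_group_by_key(partitions: List[List[Pair]]) -> Tuple[Dict[str, List[int]], int]:
    """Model of groupByKey: every record crosses the shuffle, grouping happens afterwards."""
    shuffled = [pair for part in partitions for pair in part]
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in shuffled:
        grouped[key].append(value)
    return dict(grouped), len(shuffled)


def shuffle_reduce_by_key(partitions: List[List[Pair]]) -> Tuple[Dict[str, int], int]:
    """Model of reduceByKey with addition: merge per partition first, then shuffle."""
    shuffled: List[Pair] = []
    for part in partitions:
        local: Dict[str, int] = defaultdict(int)
        for key, value in part:
            local[key] += value          # map-side merge (combiner-like step)
        shuffled.extend(local.items())   # at most one record per key per partition
    merged: Dict[str, int] = defaultdict(int)
    for key, value in shuffled:
        merged[key] += value             # reduce-side merge of the partial sums
    return dict(merged), len(shuffled)


partitions = [
    [("a", 1), ("b", 1), ("a", 1), ("a", 1)],
    [("b", 1), ("b", 1), ("a", 1)],
]
grouped, group_records = shuffle_group_by_key(partitions)
reduced, reduce_records = shuffle_reduce_by_key(partitions)
print(group_records, reduce_records)
print(reduced)
```

In this toy input the grouping path moves all 7 records, while the pre-merged path moves only 4 (one per key per partition), and both give the same per-key totals once the grouped values are summed. The gap widens as the number of repeated keys per partition grows.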
6. The five properties of an RDD
7. Spark's architecture and job submission process
Copyright notice
This article was written by [Z-hhhhh]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/04/202204220559122610.html