Spark FAQ roundup: a must-read before the interview
2022-04-23 04:41:00 【Z-hhhhh】
1. What is the relationship between job, stage, and task?
- One job can contain multiple stages.
- One stage contains multiple tasks.
2. How are jobs, stages, and tasks divided?
- Every call to an action operator submits work and creates a job. (Any operator whose return value is not an RDD can be classified as an action operator.)
- Stages are divided according to wide and narrow dependencies: each wide dependency starts a new stage.
- The number of tasks in a stage equals the number of partitions of the RDD that stage computes.
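The division rules above can be sketched in plain Python. This is a toy model, not Spark's scheduler: the function `plan_stages` and the `(operator, dependency type, partitions)` step format are made up for illustration, and it assumes narrow operators keep the partition count of their parent.

```python
# Toy model of how one job splits into stages and tasks (not Spark's scheduler).
# Each step is (operator name, dependency type, number of output partitions).
def plan_stages(source_partitions, steps):
    """Return a list of stages; each stage records its operators and task count."""
    stages = [{"ops": ["source"], "tasks": source_partitions}]
    for name, dependency, partitions in steps:
        if dependency == "wide":
            # A wide (shuffle) dependency closes the current stage and opens a new one.
            stages.append({"ops": [name], "tasks": partitions})
        else:
            # A narrow dependency is pipelined into the current stage.
            stages[-1]["ops"].append(name)
    return stages


steps = [
    ("map", "narrow", 4),
    ("filter", "narrow", 4),
    ("reduceByKey", "wide", 2),
    ("map", "narrow", 2),
]
stages = plan_stages(4, steps)
for index, stage in enumerate(stages):
    print(f"stage {index}: ops={stage['ops']} tasks={stage['tasks']}")
```

With one wide dependency in the chain, the job ends up with two stages, and each stage runs one task per partition.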
3. What are wide and narrow dependencies?
- If a partition of the parent RDD is used by multiple partitions of the child RDD, it is a wide dependency (mnemonic: "many children").
- If a partition of the parent RDD is used by at most one partition of the child RDD, it is a narrow dependency (mnemonic: "only child").
- abstract class Dependency
- abstract class NarrowDependency extends Dependency: declares the abstract method getParents()
- class OneToOneDependency extends NarrowDependency: implements getParents()
- class RangeDependency extends NarrowDependency: implements getParents()
- class ShuffleDependency extends Dependency
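To make the role of getParents() concrete, here is a small Python mirror of the narrow-dependency classes. It is a sketch for illustration only: Spark's real classes are written in Scala and take the parent RDD as a constructor argument, which is omitted here, and the snake_case names are this sketch's own. The union example (RDD A with 2 partitions, RDD B with 3) is hypothetical.

```python
from abc import ABC, abstractmethod


class NarrowDependency(ABC):
    """Each child partition depends on a small, known set of parent partitions."""

    @abstractmethod
    def get_parents(self, partition_id):
        """Return the parent partition ids that a child partition reads from."""


class OneToOneDependency(NarrowDependency):
    # Child partition i reads exactly parent partition i (e.g. map, filter).
    def get_parents(self, partition_id):
        return [partition_id]


class RangeDependency(NarrowDependency):
    # A contiguous range of child partitions maps onto a range of parent
    # partitions (e.g. one input of a union).
    def __init__(self, in_start, out_start, length):
        self.in_start = in_start    # first partition in the parent range
        self.out_start = out_start  # first partition in the child range
        self.length = length        # number of partitions in the range

    def get_parents(self, partition_id):
        if self.out_start <= partition_id < self.out_start + self.length:
            return [partition_id - self.out_start + self.in_start]
        return []


# Hypothetical union of A (2 partitions) and B (3 partitions): the child has
# partitions 0..4, where 0..1 come from A and 2..4 come from B.
dep_on_a = RangeDependency(in_start=0, out_start=0, length=2)
dep_on_b = RangeDependency(in_start=0, out_start=2, length=3)
print(dep_on_a.get_parents(3), dep_on_b.get_parents(3))
```

In both subclasses a child partition maps to at most one parent partition, which is why no shuffle is needed. A ShuffleDependency has no such per-partition mapping, so it is not a NarrowDependency.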
4. What are action operators and transformation operators? List some.
- An action operator creates a job and is executed immediately. Examples: take, first, collect, foreach, foreachPartition.
- A transformation is not executed immediately; it only records the dependencies and the functions to apply. Examples: map, filter, flatMap, reduceByKey, groupByKey, and so on.
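The difference between the two kinds of operators can be illustrated with a tiny lazy collection in plain Python. `LazyDataset` is a hypothetical class written for this sketch and is not part of Spark: its `map` and `filter` only record the function, and nothing runs until the action `collect` is called.

```python
class LazyDataset:
    """Toy lazy collection: transformations are recorded, actions run them."""

    def __init__(self, data, pending=()):
        self._data = list(data)
        self._pending = tuple(pending)  # recorded (kind, function) pairs

    def map(self, fn):
        # Transformation: return a new dataset that remembers fn, run nothing.
        return LazyDataset(self._data, self._pending + (("map", fn),))

    def filter(self, fn):
        # Transformation: same idea, only the recorded step differs.
        return LazyDataset(self._data, self._pending + (("filter", fn),))

    def collect(self):
        # Action: replay every recorded step over the data and return a list.
        rows = iter(self._data)
        for kind, fn in self._pending:
            rows = map(fn, rows) if kind == "map" else filter(fn, rows)
        return list(rows)


calls = []


def double(x):
    calls.append(x)  # lets us observe when the function really runs
    return x * 2


dataset = LazyDataset([1, 2, 3, 4]).map(double).filter(lambda x: x > 4)
calls_before = len(calls)   # still 0: only transformations so far
result = dataset.collect()  # the action triggers the computation
print(calls_before, result, len(calls))
```

The call counter stays at zero until `collect` runs, which is the same behaviour an interviewer usually means by "transformations are lazy".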
5. What is the difference between reduceByKey and groupByKey?
reduceByKey: before the results are sent to the reducer, reduceByKey merges the values for each key locally on every mapper, much like the combiner in MapReduce. Because a reduce has already been done on the map side, the amount of data shrinks considerably, which reduces network transfer and lets the reduce side compute the result faster.
groupByKey: groupByKey aggregates the values of each key in the RDD into a sequence (an Iterable). This happens on the reduce side, so all of the data has to be transferred over the network, which is unnecessary waste. If the amount of data is very large, it may also cause an OutOfMemoryError.
Conclusion:
When reducing a large amount of data, reduceByKey is recommended. It is faster, and it also avoids the memory overflow that groupByKey can cause.
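The effect of the map-side merge can be shown by counting how many records would cross the network in each case. This is a plain-Python simulation under simple assumptions (two hypothetical map-side partitions, a sum as the reduce function); the helper names are invented here and are not Spark's API.

```python
from collections import defaultdict


def group_by_key(partitions):
    """Simulate groupByKey: every record is shuffled, grouping happens afterwards."""
    shuffled = [pair for part in partitions for pair in part]
    grouped = defaultdict(list)
    for key, value in shuffled:
        grouped[key].append(value)
    return dict(grouped), len(shuffled)


def reduce_by_key(partitions, fn):
    """Simulate reduceByKey: merge per key inside each partition, then shuffle."""
    shuffled = []
    for part in partitions:
        local = {}
        for key, value in part:
            # Map-side combine: at most one record per key leaves this partition.
            local[key] = fn(local[key], value) if key in local else value
        shuffled.extend(local.items())
    merged = {}
    for key, value in shuffled:
        # Reduce side: merge the partial results from all partitions.
        merged[key] = fn(merged[key], value) if key in merged else value
    return merged, len(shuffled)


partitions = [
    [("a", 1), ("b", 1), ("a", 1)],
    [("a", 1), ("b", 1), ("b", 1), ("b", 1)],
]
grouped, grouped_shuffled = group_by_key(partitions)
sums_from_groups = {key: sum(values) for key, values in grouped.items()}
sums, reduced_shuffled = reduce_by_key(partitions, lambda x, y: x + y)
print(grouped_shuffled, reduced_shuffled, sums)
```

Both approaches give the same sums, but the grouped version moves all 7 records while the reduced version moves only 4 (one per key per partition). The gap grows with the number of repeated keys.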
6. The five properties of an RDD

7. Spark's architecture and job submission process

Copyright notice
This article was written by [Z-hhhhh]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204220559122610.html