Spark FAQ roundup: a must-read before interviews
2022-04-23 04:41:00 【Z-hhhhh】
1. What is the relationship between job, stage, and task?
- A job can contain more than one stage.
- A stage contains multiple tasks.
2. How are jobs, stages, and tasks divided?
- Every call to an action operator submits work and creates a job. [An operator whose return value is not an RDD can be classified as an action operator.]
- Stages are divided according to wide and narrow dependencies: each wide dependency starts a new stage.
- The number of tasks in a stage is the number of partitions of the RDD that stage computes.
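The division rules above can be sketched in plain Python. This is a toy model, not Spark code: `Step`, `plan_stages`, and `task_count` are hypothetical names, and the lineage is assumed to be a simple linear chain of operators.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One operator in a hypothetical linear lineage."""
    name: str
    wide: bool        # True if the operator introduces a wide (shuffle) dependency
    partitions: int   # number of partitions of the RDD this operator produces


def plan_stages(lineage):
    """Split a linear lineage into stages: every wide dependency starts a new stage."""
    stages = [[]]
    for step in lineage:
        if step.wide and stages[-1]:
            stages.append([])
        stages[-1].append(step)
    return stages


def task_count(stage):
    """Tasks in a stage = partitions of the last RDD computed in that stage."""
    return stage[-1].partitions


# A word-count style lineage; one action on it would create one job.
lineage = [
    Step("textFile", wide=False, partitions=4),
    Step("flatMap", wide=False, partitions=4),
    Step("map", wide=False, partitions=4),
    Step("reduceByKey", wide=True, partitions=2),
    Step("map", wide=False, partitions=2),
]

stages = plan_stages(lineage)
for index, stage in enumerate(stages):
    print(index, [s.name for s in stage], "tasks:", task_count(stage))
```

Here the single wide dependency (`reduceByKey`) splits the job into two stages, with 4 tasks in the first and 2 in the second.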
3. What are wide and narrow dependencies?
- If a partition of a parent RDD is used by multiple partitions of a child RDD, it is a wide dependency. [more than one child]
- If a partition of a parent RDD is used by only one partition of a child RDD, it is a narrow dependency. [only child]
- abstract class Dependency
- abstract class NarrowDependency extends Dependency: declares one abstract method, getParents()
- class OneToOneDependency extends NarrowDependency: implements the abstract method getParents()
- class RangeDependency extends NarrowDependency: implements the abstract method getParents()
- class ShuffleDependency extends Dependency
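As a rough illustration of what getParents() answers ("which parent partitions does this child partition read?"), here is a small Python model. It is only a sketch of the idea, not Spark's Scala source: the snake_case method names and constructor parameters are assumptions, and the `get_parents` method on the shuffle class is added purely for comparison (Spark's ShuffleDependency does not define getParents).

```python
class OneToOneDependency:
    """Narrow: child partition i depends only on parent partition i (e.g. map, filter)."""

    def get_parents(self, partition_id):
        return [partition_id]


class RangeDependency:
    """Narrow: a contiguous range of child partitions maps onto a range of
    parent partitions (e.g. the inputs of a union)."""

    def __init__(self, in_start, out_start, length):
        self.in_start = in_start    # first partition of the range in the parent
        self.out_start = out_start  # first partition of the range in the child
        self.length = length        # number of partitions in the range

    def get_parents(self, partition_id):
        if self.out_start <= partition_id < self.out_start + self.length:
            return [partition_id - self.out_start + self.in_start]
        return []


class ShuffleDependency:
    """Wide: a child partition may read from every parent partition.
    get_parents here is hypothetical, for illustration only."""

    def __init__(self, num_parent_partitions):
        self.num_parent_partitions = num_parent_partitions

    def get_parents(self, partition_id):
        return list(range(self.num_parent_partitions))


# Union of A (3 partitions) and B (2 partitions): the child has 5 partitions,
# and B's partitions 0..1 become child partitions 3..4.
dep_on_b = RangeDependency(in_start=0, out_start=3, length=2)

print(OneToOneDependency().get_parents(2))
print(dep_on_b.get_parents(4))
print(ShuffleDependency(3).get_parents(0))
```

In the narrow cases each child partition has at most one parent partition per dependency, so it can be computed without moving data between partitions; in the wide case every parent partition may contribute, which is why a shuffle is needed.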
4. What are action operators and transformation operators? List some.
- Action operators create a job and are executed immediately. Examples: take, first, collect, foreach, foreachPartition.
- Transformation operators are not executed immediately; they only record the dependencies and the functions to apply. Examples: map, filter, flatMap, reduceByKey, groupByKey, and so on.
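The lazy behaviour described above can be mimicked with a toy class in plain Python. `LazyDataset` is a hypothetical stand-in for an RDD, written only to show the idea that transformations record work and actions trigger it; it does not use or reproduce the Spark API.

```python
import itertools


class LazyDataset:
    """Toy stand-in for an RDD: transformations are recorded, actions run them."""

    def __init__(self, data, ops=()):
        self._data = data
        self._ops = tuple(ops)  # recorded (kind, function) pairs, i.e. the lineage

    # Transformations: return a new dataset and compute nothing.
    def map(self, func):
        return LazyDataset(self._data, self._ops + (("map", func),))

    def filter(self, func):
        return LazyDataset(self._data, self._ops + (("filter", func),))

    def _run(self):
        # Apply the recorded operations element by element, on demand.
        for item in self._data:
            keep = True
            for kind, func in self._ops:
                if kind == "map":
                    item = func(item)
                elif not func(item):
                    keep = False
                    break
            if keep:
                yield item

    # Actions: return plain values and trigger the computation.
    def collect(self):
        return list(self._run())

    def take(self, n):
        return list(itertools.islice(self._run(), n))


calls = []


def double(x):
    calls.append(x)  # record every time the function is really executed
    return x * 2


ds = LazyDataset([1, 2, 3, 4]).map(double).filter(lambda x: x > 2)
calls_before_action = list(calls)  # still empty: nothing has run yet
result = ds.collect()              # the action triggers the whole chain

print(calls_before_action, result, calls)
```

Note that `map` and `filter` return another `LazyDataset`, while `collect` and `take` return ordinary lists, which matches the rule of thumb that an operator whose return value is not an RDD is an action.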
5. What is the difference between reduceByKey and groupByKey?
reduceByKey: before the results are sent to the reducers, reduceByKey first merges the values for each key locally on every mapper, much like the combiner in MapReduce. The advantage is that after one round of reduce on the map side, the amount of data drops sharply, which reduces network transfer and lets the reduce side compute the result faster.
groupByKey: groupByKey aggregates the values for each key in the RDD into a sequence (Iterable). This happens on the reduce side, so all of the data has to be transferred over the network, which is wasteful. If the amount of data is very large, it may also cause an OutOfMemoryError.
Conclusion:
When reducing a large amount of data, reduceByKey is recommended. It not only improves speed but also avoids the memory overflow that groupByKey can cause.
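To make the difference concrete, the following plain-Python simulation counts how many records would cross the shuffle under each strategy. It is a simplified model with hypothetical function names, where each inner list plays the role of one map-side partition; it does not call Spark.

```python
from collections import defaultdict


def shuffle_group_by_key(partitions):
    """groupByKey-style: every (key, value) record crosses the shuffle unchanged."""
    shuffled = [pair for part in partitions for pair in part]
    grouped = defaultdict(list)
    for key, value in shuffled:
        grouped[key].append(value)
    return dict(grouped), len(shuffled)


def shuffle_reduce_by_key(partitions, func):
    """reduceByKey-style: combine inside each map-side partition, then merge."""
    shuffled = []
    for part in partitions:
        local = {}
        for key, value in part:
            local[key] = func(local[key], value) if key in local else value
        shuffled.extend(local.items())  # one record per key per partition
    merged = {}
    for key, value in shuffled:
        merged[key] = func(merged[key], value) if key in merged else value
    return merged, len(shuffled)


partitions = [
    [("a", 1), ("b", 1), ("a", 1), ("a", 1)],
    [("b", 1), ("b", 1), ("a", 1), ("c", 1)],
]

grouped, grouped_records = shuffle_group_by_key(partitions)
reduced, reduced_records = shuffle_reduce_by_key(partitions, lambda x, y: x + y)
counts_from_groups = {key: sum(values) for key, values in grouped.items()}

print("groupByKey-style shuffled records:", grouped_records)
print("reduceByKey-style shuffled records:", reduced_records)
print(reduced == counts_from_groups)
```

Both paths end with the same per-key sums, but the grouped path moves all 8 records while the combined path moves 5, and the gap widens as keys repeat more often within a partition.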
6. The five properties of an RDD
![The five properties of an RDD](/img/c8/52ca1907a3f4db4f72ae6c1b3bbb45.png)
7. Spark's architecture and job submission process
![Spark's architecture and job submission process](/img/0a/ff8a3eb61f1a85d92d61e33b82dc96.png)
Copyright notice
This article was written by [Z-hhhhh]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204220559122610.html