The partitionBy operator in Spark
2022-04-23 15:45:00 【Uncle flying against the wind】
Preface
In earlier posts we used groupBy to group data by a key computed from a rule we specify. Now imagine a different scenario: the data are tuples, that is, key/value pairs, and you want to control which partition each record ends up in. For this, Spark provides the partitionBy operator.
partitionBy
Function signature
def partitionBy(partitioner: Partitioner): RDD[(K, V)]
Function description
Repartitions the data according to the specified Partitioner. Spark's default partitioner is HashPartitioner.
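To make the hashing rule concrete, here is a minimal sketch in plain Scala (no Spark needed) of how a hash partitioner picks a partition: take the key's hashCode modulo the number of partitions and keep the result non-negative. The object name HashPartitionSketch and the helper partitionFor are made up for illustration; they are not part of the Spark API.

```scala
object HashPartitionSketch {
  // Hypothetical helper: hashCode modulo partition count, shifted to be non-negative.
  // A null key is sent to partition 0.
  def partitionFor(key: Any, numPartitions: Int): Int = {
    if (key == null) 0
    else {
      val rawMod = key.hashCode % numPartitions
      if (rawMod < 0) rawMod + numPartitions else rawMod
    }
  }

  def main(args: Array[String]): Unit = {
    // The same keys as in the example of this post, spread over 2 partitions
    val keys = List(1, 2, 3, 4)
    keys.foreach { k =>
      println(s"key $k -> partition ${partitionFor(k, 2)}")
    }
  }
}
```

An Int's hashCode is the value itself, so with 2 partitions the even keys map to partition 0 and the odd keys to partition 1.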
Example
Pass a set of data through partitionBy, then save it to multiple partition files
import org.apache.spark.rdd.RDD
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

object PartionBy_Test {

  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setMaster("local[*]").setAppName("Operator")
    val sc = new SparkContext(sparkConf)

    // Operator on key-value data
    val rdd = sc.makeRDD(List(1, 2, 3, 4), 2)
    // partitionBy is only available on RDDs of (key, value) pairs
    val mapRDD: RDD[(Int, Int)] = rdd.map((_, 1))

    // partitionBy repartitions the data according to the specified partitioner
    val newRDD: RDD[(Int, Int)] = mapRDD.partitionBy(new HashPartitioner(2))
    // saveAsTextFile returns Unit and fails if the output directory already exists
    newRDD.saveAsTextFile("E:\\output")

    sc.stop()
  }
}
Run the code above. When it finishes, look at the local output directory: the 4 records have been written to different partition files. With HashPartitioner(2), keys 2 and 4 end up in part-00000 and keys 1 and 3 in part-00001.
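If you would rather check the result without writing files, the following sketch tags each record with its partition index using mapPartitionsWithIndex and collects the result to the driver. It assumes the same local setup as the example; the object name PartitionByInspect and the helper keysByPartition are hypothetical names chosen for this illustration.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

object PartitionByInspect {

  // Hypothetical helper: returns, for each partition index, the set of keys stored there
  def keysByPartition(sc: SparkContext): Map[Int, Set[Int]] = {
    val pairs = sc.makeRDD(List(1, 2, 3, 4), 2).map((_, 1))
    pairs
      .partitionBy(new HashPartitioner(2))
      // Attach the partition index to every key in that partition
      .mapPartitionsWithIndex((index, iter) => iter.map { case (key, _) => (index, key) })
      .collect()
      .groupBy(_._1)
      .map { case (index, entries) => index -> entries.map(_._2).toSet }
  }

  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setMaster("local[*]").setAppName("Operator")
    val sc = new SparkContext(sparkConf)

    keysByPartition(sc).toSeq.sortBy(_._1).foreach { case (index, keys) =>
      println(s"partition $index: ${keys.toSeq.sorted.mkString(", ")}")
    }

    sc.stop()
  }
}
```

Collecting to the driver is fine for a four-element demo like this one; for real data sets, sampling or counting per partition is the safer choice.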

Copyright notice
This article was written by [Uncle flying against the wind]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204231544587236.html