The partitionBy operator in Spark
2022-04-23 15:45:00 【Uncle flying against the wind】
Preface
In earlier posts we used groupBy to group data according to a rule applied to a specified key. Now consider a different scenario: how should data of tuple type, that is, key/value data, be partitioned? For this case Spark provides the partitionBy operator.
partitionBy
Function signature
def partitionBy( partitioner: Partitioner ): RDD[(K, V)]
Function description
Repartitions the data according to the specified Partitioner. Spark's default partitioner is HashPartitioner.
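As a rough illustration of what hash partitioning means, the Spark-free sketch below mimics the rule HashPartitioner applies: the key's hashCode modulo the number of partitions, adjusted so the result is never negative. The names HashRuleSketch and partitionFor are hypothetical and exist only for this example; they are not part of the Spark API.

```scala
object HashRuleSketch {
  // Hypothetical helper that mirrors the hash-partitioning rule:
  // hashCode modulo numPartitions, shifted into the range [0, numPartitions).
  def partitionFor(key: Any, numPartitions: Int): Int = {
    if (key == null) {
      0 // null keys are sent to partition 0
    } else {
      val raw = key.hashCode % numPartitions
      if (raw < 0) raw + numPartitions else raw
    }
  }

  def main(args: Array[String]): Unit = {
    // The same keys as in the example of this article, with 2 partitions
    List(1, 2, 3, 4).foreach { k =>
      println(s"key $k -> partition ${partitionFor(k, 2)}")
    }
  }
}
```

Because an Int's hashCode is the number itself, even keys land in partition 0 and odd keys in partition 1 when there are 2 partitions.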
Example
Pass a set of data through partitionBy, then save it as multiple partition files
import org.apache.spark.rdd.RDD
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

object PartionBy_Test {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setMaster("local[*]").setAppName("Operator")
    val sc = new SparkContext(sparkConf)

    // Build a key-value RDD: each number becomes the pair (number, 1)
    val rdd = sc.makeRDD(List(1, 2, 3, 4), 2)
    val mapRDD: RDD[(Int, Int)] = rdd.map((_, 1))

    // partitionBy repartitions the data according to the given partitioner
    val newRDD: RDD[(Int, Int)] = mapRDD.partitionBy(new HashPartitioner(2))
    // Write one file per partition; the output directory must not already exist
    newRDD.saveAsTextFile("E:\\output")

    sc.stop()
  }
}
Run the code above. After it finishes, look in the output directory: the 4 records have been split across two partition files. With HashPartitioner(2), part-00000 holds (2,1) and (4,1), and part-00001 holds (1,1) and (3,1).
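partitionBy accepts any Partitioner, not only HashPartitioner. The following is a minimal sketch, assuming Spark is on the classpath, of a custom partitioner whose layout is checked in code with glom() instead of by opening the output files. ThresholdPartitioner and CustomPartitionerSketch are hypothetical names invented for this illustration, and the threshold rule is an arbitrary choice.

```scala
import org.apache.spark.{Partitioner, SparkConf, SparkContext}

// Hypothetical partitioner for illustration only:
// Int keys up to `threshold` go to partition 0, everything else to partition 1.
class ThresholdPartitioner(threshold: Int) extends Partitioner {
  override def numPartitions: Int = 2

  override def getPartition(key: Any): Int = key match {
    case i: Int if i <= threshold => 0
    case _                        => 1
  }
}

object CustomPartitionerSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("CustomPartitionerSketch")
    val sc = new SparkContext(conf)
    try {
      val pairs = sc.makeRDD(List(1, 2, 3, 4), 2).map((_, 1))
      val partitioned = pairs.partitionBy(new ThresholdPartitioner(2))

      // glom() turns each partition into an array, so the layout can be printed directly
      partitioned.glom().collect().zipWithIndex.foreach { case (part, idx) =>
        println(s"partition $idx: ${part.mkString(", ")}")
      }
    } finally {
      sc.stop()
    }
  }
}
```

With a threshold of 2, keys 1 and 2 are assigned to partition 0 and keys 3 and 4 to partition 1; the order of records inside a partition is not guaranteed after the shuffle.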
Copyright notice
This article was written by [Uncle flying against the wind]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204231544587236.html