Spark Operators: partitionBy
2022-04-23 15:45:00 [逆风飞翔的小叔]
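Preface
Earlier we used groupBy to group data according to a rule applied to each element. Now consider tuple data, that is, key/value pairs: how do we control which partition each pair ends up in? For this case Spark provides the partitionBy operator, which redistributes key/value data across partitions based on the key.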
partitionBy
Function signature
def partitionBy(partitioner: Partitioner): RDD[(K, V)]
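Description
Repartitions the data according to the specified Partitioner. partitionBy is defined in PairRDDFunctions, so it is only available on key/value RDDs (RDD[(K, V)]). Spark's default partitioner is HashPartitioner, which places each record in partition (key.hashCode mod numPartitions), adjusted to be non-negative; null keys go to partition 0.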
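To make the HashPartitioner rule concrete, here is a minimal plain-Scala sketch that reproduces how a key is mapped to a partition index. It does not depend on Spark; the object name HashPartitionSketch and the helpers nonNegativeMod and partitionFor are hypothetical names chosen for illustration, written to mirror the behavior described above rather than to be Spark's own code.

```scala
// Plain-Scala sketch of the rule HashPartitioner uses to choose a partition.
// Hypothetical helper names; this mirrors the described behavior, it is not Spark's source.
object HashPartitionSketch {

  // Non-negative modulo: % can return a negative result for a negative
  // hashCode, so shift it back into the range [0, mod).
  def nonNegativeMod(x: Int, mod: Int): Int = {
    val rawMod = x % mod
    rawMod + (if (rawMod < 0) mod else 0)
  }

  // null keys go to partition 0; every other key goes to hashCode mod numPartitions.
  def partitionFor(key: Any, numPartitions: Int): Int = key match {
    case null => 0
    case _    => nonNegativeMod(key.hashCode, numPartitions)
  }

  def main(args: Array[String]): Unit = {
    // Print the partition chosen for a few sample keys with 2 partitions
    for (k <- Seq[Any](1, 2, 3, 4, -3, "spark", null))
      println(s"key=$k -> partition ${partitionFor(k, 2)}")
  }
}
```

Because Int.hashCode is the integer itself, even keys land in partition 0 and odd keys in partition 1 when there are 2 partitions; the negative key -3 also lands in partition 1 thanks to the non-negative adjustment.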
Example
Repartition a set of key/value data with partitionBy and save the result to multiple partition files.
import org.apache.spark.rdd.RDD
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

object PartionBy_Test {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setMaster("local[*]").setAppName("Operator")
    val sc = new SparkContext(sparkConf)

    // Operator: key-value type
    val rdd = sc.makeRDD(List(1, 2, 3, 4), 2)
    // Turn each element into a (key, value) pair so that partitionBy is available
    val mapRDD: RDD[(Int, Int)] = rdd.map((_, 1))

    // partitionBy repartitions the data according to the given partitioning rule
    val newRDD: RDD[(Int, Int)] = mapRDD.partitionBy(new HashPartitioner(2))
    // Write one file per partition; the output directory must not already exist
    newRDD.saveAsTextFile("E:\\output")

    sc.stop()
  }
}
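Run the code above. When it finishes, the local output directory contains one part file per partition (part-00000 and part-00001), and the 4 records are distributed across these two files.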
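As a rough check of that result, the following plain-Scala sketch predicts which part file each (key, 1) pair from the example should land in, assuming HashPartitioner(2) and Int keys. It runs without Spark; ExpectedLayout and layout are hypothetical names for illustration. It predicts only which file holds each record, not the line order inside a file.

```scala
// Predicts the partition (and thus part file) for each record of the example,
// assuming HashPartitioner(2) semantics: non-negative hashCode mod numPartitions.
object ExpectedLayout {

  // Keep the result in [0, mod) even for negative hash codes
  def nonNegativeMod(x: Int, mod: Int): Int = {
    val rawMod = x % mod
    rawMod + (if (rawMod < 0) mod else 0)
  }

  // Build the (key, 1) pairs as in the example and group them by partition index,
  // returning (partitionIndex, records) sorted by partition index.
  def layout(keys: Seq[Int], numPartitions: Int): Seq[(Int, Seq[(Int, Int)])] =
    keys.map(k => (k, 1))
      .groupBy { case (k, _) => nonNegativeMod(k.hashCode, numPartitions) }
      .toSeq
      .sortBy(_._1)

  def main(args: Array[String]): Unit = {
    // Print one line per part file, e.g. "part-00000: ..."
    for ((p, records) <- layout(List(1, 2, 3, 4), 2))
      println(f"part-$p%05d: ${records.mkString(" ")}")
  }
}
```

Under these assumptions part-00000 should hold (2,1) and (4,1), while part-00001 should hold (1,1) and (3,1).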
Copyright notice
This article was written by [逆风飞翔的小叔]. Please include a link to the original when reposting. Thanks.
https://blog.csdn.net/congge_study/article/details/124362294