Spark Operators: partitionBy
2022-04-23 15:45:00 【逆风飞翔的小叔】
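Preface
In earlier articles we used groupBy to group data according to a rule that derives a key from each element. Now consider a different need: for tuple data, that is, key/value data, how do we control which partition each record ends up in? Spark provides the partitionBy operator for this case.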
partitionBy
Function signature
def partitionBy(partitioner: Partitioner): RDD[(K, V)]
Description
Repartitions the data according to the specified Partitioner. Spark's default partitioner is HashPartitioner.
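To make the rule concrete, here is a minimal sketch in plain Scala (no Spark dependency) of how a hash partitioner chooses a partition: it takes the key's hashCode modulo the number of partitions and keeps the result non-negative, with null keys going to partition 0. The names HashPartitionSketch and partitionFor are hypothetical and used only for illustration; this mirrors the idea behind HashPartitioner rather than reproducing Spark's source.

```scala
object HashPartitionSketch {
  // Hypothetical helper: map a key to a partition index in [0, numPartitions).
  // A plain % can be negative for negative hash codes, so the result is shifted up.
  def partitionFor(key: Any, numPartitions: Int): Int = {
    require(numPartitions > 0, "numPartitions must be positive")
    if (key == null) 0
    else {
      val rawMod = key.hashCode % numPartitions
      if (rawMod < 0) rawMod + numPartitions else rawMod
    }
  }

  def main(args: Array[String]): Unit = {
    // Int keys hash to their own value, so odd and even keys separate when numPartitions = 2.
    List(1, 2, 3, 4).foreach { k =>
      println(s"key $k -> partition ${partitionFor(k, 2)}")
    }
  }
}
```

Because the partition depends only on the key, all records that share a key end up in the same partition.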
Example
Repartition a small data set with partitionBy and save it as multiple partition files.
import org.apache.spark.rdd.RDD
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

object PartionBy_Test {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setMaster("local[*]").setAppName("Operator")
    val sc = new SparkContext(sparkConf)

    // Operator on key/value data: partitionBy is only available on RDDs of pairs
    val rdd = sc.makeRDD(List(1, 2, 3, 4), 2)
    val mapRDD: RDD[(Int, Int)] = rdd.map((_, 1))

    // partitionBy redistributes the records according to the given partitioner
    val newRDD: RDD[(Int, Int)] = mapRDD.partitionBy(new HashPartitioner(2))
    // saveAsTextFile returns Unit, so it is called separately; one file is written per partition
    newRDD.saveAsTextFile("E:\\output")

    sc.stop()
  }
}
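Run the code above. When it finishes, look in the local output directory: the 4 records have been distributed across different partition files.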
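The placement can also be checked without opening the output files. The sketch below assumes the same Spark setup as the example and uses mapPartitionsWithIndex to tag each partition's records with the partition index; the object name PartitionInspect and the helper partitionContents are hypothetical. With Int keys and HashPartitioner(2), keys 2 and 4 should land in partition 0 and keys 1 and 3 in partition 1, although the order of records within a partition is not guaranteed.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

object PartitionInspect {
  // Hypothetical helper: returns partition index -> records for the example's data.
  def partitionContents(sc: SparkContext): Map[Int, List[(Int, Int)]] = {
    val pairs = sc.makeRDD(List(1, 2, 3, 4), 2).map((_, 1))
    pairs
      .partitionBy(new HashPartitioner(2))
      // Emit one (index, records) pair per partition
      .mapPartitionsWithIndex((index, iter) => Iterator((index, iter.toList)))
      .collect()
      .toMap
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("PartitionInspect")
    val sc = new SparkContext(conf)
    try {
      partitionContents(sc).toSeq.sortBy(_._1).foreach { case (index, records) =>
        println(s"partition $index: ${records.mkString(", ")}")
      }
    } finally {
      sc.stop() // always release the context, even if the job fails
    }
  }
}
```

Collecting to the driver like this is only reasonable for tiny data sets such as this one; for real data, saving to files as in the example is the safer choice.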
Copyright notice
This article was written by [逆风飞翔的小叔]. Please include a link to the original when reposting. Thank you.
https://blog.csdn.net/congge_study/article/details/124362294