Spark Operators: partitionBy
2022-04-23 15:45:00 【逆风飞翔的小叔】
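Preface
Earlier we used groupBy to group data by a rule derived from each element. Now consider a different scenario: the data are tuples, that is, key/value pairs, and we want to decide which partition each record goes to based on its key. Spark provides the partitionBy operator for this case.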
partitionBy
Function signature
def partitionBy( partitioner: Partitioner ): RDD[(K, V)]
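Because the signature accepts any Partitioner, a user-defined one can be passed as well. The following is a minimal sketch, not taken from the original post: `ThresholdPartitioner`, its cut-off value of 3, and the object name `CustomPartitionerSketch` are hypothetical names chosen for illustration. It relies only on the two abstract members of `org.apache.spark.Partitioner`, namely `numPartitions` and `getPartition`.

```scala
import org.apache.spark.{Partitioner, SparkConf, SparkContext}

// Hypothetical partitioner: Int keys below 3 go to partition 0, all other keys to partition 1
class ThresholdPartitioner extends Partitioner {
  override def numPartitions: Int = 2

  override def getPartition(key: Any): Int = key match {
    case n: Int if n < 3 => 0
    case _               => 1
  }
}

object CustomPartitionerSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("CustomPartitionerSketch")
    val sc = new SparkContext(conf)

    val pairs = sc.makeRDD(List(1, 2, 3, 4), 2).map((_, 1))
    val partitioned = pairs.partitionBy(new ThresholdPartitioner)

    // glom() turns each partition into an array so its contents can be printed
    partitioned.glom().collect().zipWithIndex.foreach { case (records, index) =>
      println(s"partition $index: ${records.mkString(", ")}")
    }

    sc.stop()
  }
}
```

With this partitioner, keys 1 and 2 should end up in partition 0 and keys 3 and 4 in partition 1.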
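Description
Repartitions the data according to the given Partitioner. partitionBy is defined only for RDDs of key/value pairs. Spark's default partitioner is HashPartitioner.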
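To make the hash-based assignment concrete, here is a small sketch in plain Scala with no Spark dependency. It mirrors the idea behind HashPartitioner: take the key's `hashCode` modulo the number of partitions and shift a negative remainder into the valid range. The names `HashPartitionSketch` and `partitionFor` are hypothetical and are not part of the Spark API.

```scala
object HashPartitionSketch {
  // hashCode modulo the partition count, adjusted so the result is never negative;
  // a null key is sent to partition 0
  def partitionFor(key: Any, numPartitions: Int): Int = {
    if (key == null) 0
    else {
      val rawMod = key.hashCode % numPartitions
      if (rawMod < 0) rawMod + numPartitions else rawMod
    }
  }

  def main(args: Array[String]): Unit = {
    val keys = List(1, 2, 3, 4)
    keys.foreach(k => println(s"key $k -> partition ${partitionFor(k, 2)}"))
    // prints:
    // key 1 -> partition 1
    // key 2 -> partition 0
    // key 3 -> partition 1
    // key 4 -> partition 0
  }
}
```

The hash code of an Int is the value itself, so with two partitions the even keys map to partition 0 and the odd keys to partition 1.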
Example
Repartition a small dataset with partitionBy and save the result as multiple partition files.
import org.apache.spark.rdd.RDD
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

object PartitionBy_Test {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setMaster("local[*]").setAppName("Operator")
    val sc = new SparkContext(sparkConf)

    // Key-value operator: partitionBy requires an RDD of (K, V) pairs
    val rdd = sc.makeRDD(List(1, 2, 3, 4), 2)
    val mapRDD: RDD[(Int, Int)] = rdd.map((_, 1))

    // partitionBy redistributes the records according to the given partitioner
    val newRDD: RDD[(Int, Int)] = mapRDD.partitionBy(new HashPartitioner(2))
    // saveAsTextFile returns Unit and fails if the output directory already exists
    newRDD.saveAsTextFile("E:\\output")

    sc.stop()
  }
}
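Run the code above. When it finishes, look in the local output directory: the four records have been distributed across the partition files. With HashPartitioner(2) and Int keys, the even keys (2 and 4) are written to part-00000 and the odd keys (1 and 3) to part-00001.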
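The same placement can also be checked from the driver without writing any files. The sketch below is an illustration rather than part of the original example. It uses `mapPartitionsWithIndex` to tag each record with the index of the partition that holds it. The object name `PartitionByInspect` is hypothetical.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

object PartitionByInspect {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setMaster("local[*]").setAppName("PartitionByInspect")
    val sc = new SparkContext(sparkConf)

    val pairs = sc.makeRDD(List(1, 2, 3, 4), 2).map((_, 1))
    val repartitioned = pairs.partitionBy(new HashPartitioner(2))

    // Tag every record with the index of the partition that holds it
    val located = repartitioned
      .mapPartitionsWithIndex((index, records) => records.map { case (key, _) => (index, key) })
      .collect()

    // Sort by (partition index, key) so the printed order is stable
    located.sorted.foreach { case (index, key) =>
      println(s"partition $index holds key $key")
    }

    sc.stop()
  }
}
```

For this input the even keys should be reported in partition 0 and the odd keys in partition 1, matching the contents of the saved part files.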

Copyright notice
This article was written by [逆风飞翔的小叔]. Please include a link to the original when reposting. Thank you.
https://blog.csdn.net/congge_study/article/details/124362294