The partitionBy operator in Spark
2022-04-23 15:45:00 【Uncle flying against the wind】
Preface
In earlier posts we used groupBy to group data by a specified key. Now consider a different scenario: the data consists of tuples, that is, key/value pairs, and we want to redistribute the records across partitions according to their keys. For this, Spark provides the partitionBy operator.
partitionBy
Function signature
def partitionBy(partitioner: Partitioner): RDD[(K, V)]
Function description
Repartitions the data according to the specified Partitioner. The operator is available only on key/value RDDs (RDD[(K, V)]). Spark's default partitioner is HashPartitioner.
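To make "repartition according to a Partitioner" concrete, here is a minimal sketch in plain Scala (no Spark dependency) of the contract a partitioner fulfils: report a partition count and map each key to a partition index. `SimplePartitioner` and `SimpleHashPartitioner` are hypothetical stand-ins, not Spark classes. The hashing rule follows what Spark's HashPartitioner does: null keys go to partition 0, and any other key goes to `key.hashCode` modulo the partition count, shifted to be non-negative.

```scala
// Hypothetical local mirror of Spark's Partitioner contract (not org.apache.spark.Partitioner).
abstract class SimplePartitioner extends Serializable {
  def numPartitions: Int
  def getPartition(key: Any): Int
}

// Imitates HashPartitioner's rule: null -> 0, otherwise a non-negative hashCode modulo.
class SimpleHashPartitioner(partitions: Int) extends SimplePartitioner {
  require(partitions >= 0, s"Number of partitions ($partitions) cannot be negative.")

  def numPartitions: Int = partitions

  def getPartition(key: Any): Int = key match {
    case null => 0
    case _ =>
      val rawMod = key.hashCode % numPartitions
      // Java's % can return a negative value for negative hash codes; shift it into range
      rawMod + (if (rawMod < 0) numPartitions else 0)
  }

  // Two hash partitioners with the same partition count compare equal
  override def equals(other: Any): Boolean = other match {
    case h: SimpleHashPartitioner => h.numPartitions == numPartitions
    case _                        => false
  }

  override def hashCode: Int = numPartitions
}

object HashPartitionerDemo {
  def main(args: Array[String]): Unit = {
    val p = new SimpleHashPartitioner(2)
    // Print the partition index chosen for a few sample keys
    for (k <- List[Any](1, 2, 3, 4, -3, "a", null))
      println(s"$k -> partition ${p.getPartition(k)}")
  }
}
```

Equality matters here: Spark's partitionBy returns the RDD unchanged when it is already partitioned by an equal partitioner, and HashPartitioner instances compare equal when their partition counts match, which the `equals` override above imitates.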
Example
The example below runs a small dataset through partitionBy and saves the result, which produces one output file per partition.
import org.apache.spark.rdd.RDD
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

object PartitionBy_Test {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setMaster("local[*]").setAppName("Operator")
    val sc = new SparkContext(sparkConf)

    // Operator on Key-Value RDDs
    val rdd: RDD[Int] = sc.makeRDD(List(1, 2, 3, 4), 2)
    // Turn each element into a (key, value) pair so that partitionBy becomes available
    val mapRDD: RDD[(Int, Int)] = rdd.map((_, 1))

    // partitionBy redistributes the pairs according to the given partitioner
    val newRDD: RDD[(Int, Int)] = mapRDD.partitionBy(new HashPartitioner(2))
    // Writes one part-NNNNN file per partition; fails if the directory already exists
    newRDD.saveAsTextFile("E:\\output")

    sc.stop()
  }
}
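Run the code above and, once it finishes, inspect the local output directory: the 4 records have been split across two partition files. With HashPartitioner(2), an Int key lands in partition key % 2, so part-00000 holds (2,1) and (4,1), while part-00001 holds (1,1) and (3,1).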
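Which file each record ends up in can be reproduced without Spark. The sketch below assumes makeRDD(List(1, 2, 3, 4), 2) slices the input as [1, 2] and [3, 4], applies the same map((_, 1)) step, and routes each pair with HashPartitioner's non-negative hash-modulo rule. `PartitionByOutputSketch`, `hashPartition`, and `simulate` are hypothetical names used only for illustration. Because saveAsTextFile writes every record's toString on its own line, the sketch builds the lines each part file would contain.

```scala
object PartitionByOutputSketch {
  // Same rule HashPartitioner applies: null keys -> 0, else non-negative hashCode modulo n
  def hashPartition(key: Any, numPartitions: Int): Int =
    if (key == null) 0
    else {
      val raw = key.hashCode % numPartitions
      if (raw < 0) raw + numPartitions else raw
    }

  // Simulates: input slices -> map((_, 1)) -> redistribute by key -> one "file" per partition
  def simulate(inputSlices: Seq[Seq[Int]], numPartitions: Int): Map[String, Seq[String]] = {
    val mapped: Seq[Seq[(Int, Int)]] = inputSlices.map(slice => slice.map(n => (n, 1)))
    (0 until numPartitions).map { p =>
      val records = mapped.flatten.filter { case (k, _) => hashPartition(k, numPartitions) == p }
      // Part files are named part-00000, part-00001, ... and hold one record per line
      f"part-$p%05d" -> records.map(_.toString)
    }.toMap
  }

  def main(args: Array[String]): Unit = {
    // Assumption: the two input slices are [1, 2] and [3, 4]
    val files = simulate(Seq(Seq(1, 2), Seq(3, 4)), numPartitions = 2)
    files.toSeq.sortBy(_._1).foreach { case (name, lines) =>
      println(name)
      lines.foreach(line => println("  " + line))
    }
  }
}
```

In a real run, the order of lines inside a part file depends on how shuffle blocks are fetched, so only the assignment of records to files should be relied on.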

Copyright notice
This article was written by [Uncle flying against the wind]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/04/202204231544587236.html