Spark operators: partitionBy
2022-04-23 15:45:00 【Uncle flying against the wind】
Preface
In earlier posts we used groupBy to group data according to a specified key rule. Now consider key/value (tuple) data: how do we redistribute such pairs across partitions according to their keys? For this, Spark provides the partitionBy operator.
partitionBy
Function signature
def partitionBy(partitioner: Partitioner): RDD[(K, V)]
Function description
Repartitions the data according to the specified Partitioner. Spark's default partitioner is HashPartitioner.
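HashPartitioner's rule is simple enough to sketch without Spark. The plain-Scala object below is an illustration, not Spark's source: as far as I understand it, HashPartitioner sends a null key to partition 0 and otherwise takes the key's hashCode modulo the number of partitions, adjusted so the result is never negative. The names HashPartitionSketch, nonNegativeMod and partitionFor are hypothetical and chosen only for this example.

```scala
// Plain-Scala sketch of HashPartitioner's key -> partition rule (no Spark dependency).
// HashPartitionSketch and its methods are hypothetical names used for illustration.
object HashPartitionSketch {
  // Modulo that never returns a negative result, so negative hash codes still land in [0, mod).
  def nonNegativeMod(x: Int, mod: Int): Int = {
    val rawMod = x % mod
    rawMod + (if (rawMod < 0) mod else 0)
  }

  // A null key goes to partition 0; any other key is placed by its hashCode.
  def partitionFor(key: Any, numPartitions: Int): Int = {
    require(numPartitions > 0, "numPartitions must be positive")
    key match {
      case null => 0
      case _    => nonNegativeMod(key.hashCode, numPartitions)
    }
  }

  def main(args: Array[String]): Unit = {
    // Int keys hash to themselves, so with 2 partitions even keys go to 0 and odd keys to 1.
    for (key <- Seq(1, 2, 3, 4, -3, "a", null))
      println(s"$key -> partition ${partitionFor(key, 2)}")
  }
}
```

The non-negative adjustment matters because Java and Scala's `%` keeps the sign of the dividend, so a key such as -3 would otherwise produce partition -1.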
Example
Convert a set of numbers into key/value pairs, repartition them with partitionBy, and write the result out so that each partition is stored in its own file.
import org.apache.spark.rdd.RDD
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

object PartionBy_Test {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setMaster("local[*]").setAppName("Operator")
    val sc = new SparkContext(sparkConf)

    // Operator demo: partitionBy (Key-Value type)
    val rdd = sc.makeRDD(List(1, 2, 3, 4), 2)
    // Turn each element into a (key, value) pair so that partitionBy is available
    val mapRDD: RDD[(Int, Int)] = rdd.map((_, 1))
    // partitionBy repartitions the data according to the given partitioning rule
    val newRDD: RDD[(Int, Int)] = mapRDD.partitionBy(new HashPartitioner(2))
    // Write one file per partition; the output directory must not already exist
    newRDD.saveAsTextFile("E:\\output")

    sc.stop()
  }
}
Run the code above and, once it finishes, inspect the local output directory: the 4 records have been split across two partition files, with (2,1) and (4,1) in part-00000 and (1,1) and (3,1) in part-00001.
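The file contents can also be predicted without running Spark. The sketch below applies the hash-modulo rule to the four (key, 1) pairs from the example and groups them by partition index, printing part-0000N labels to mirror the files saveAsTextFile typically writes. ExpectedOutput and groupByPartition are hypothetical names, and the sketch assumes Int keys, whose hashCode is the value itself.

```scala
// Predicts how HashPartitioner(2) distributes the example's pairs (no Spark dependency).
// ExpectedOutput and groupByPartition are hypothetical names used for illustration.
object ExpectedOutput {
  // Modulo that never returns a negative result.
  def nonNegativeMod(x: Int, mod: Int): Int = {
    val rawMod = x % mod
    rawMod + (if (rawMod < 0) mod else 0)
  }

  // Groups pairs by the partition index their key hashes to; empty partitions map to an empty Seq.
  def groupByPartition(pairs: Seq[(Int, Int)], numPartitions: Int): Map[Int, Seq[(Int, Int)]] = {
    val grouped = pairs.groupBy { case (k, _) => nonNegativeMod(k.hashCode, numPartitions) }
    (0 until numPartitions).map(p => p -> grouped.getOrElse(p, Seq.empty[(Int, Int)])).toMap
  }

  def main(args: Array[String]): Unit = {
    val pairs = List(1, 2, 3, 4).map((_, 1)) // same data as mapRDD in the example
    val byPartition = groupByPartition(pairs, 2)
    for (p <- 0 until 2) {
      // saveAsTextFile writes each element using toString, e.g. "(2,1)"
      println(f"part-$p%05d: ${byPartition(p).mkString(" ")}")
    }
  }
}
```

Because groupBy keeps the original order within each group, the printed lines list (2,1) and (4,1) for partition 0 and (1,1) and (3,1) for partition 1, matching the files described above.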
Copyright notice
This article was written by [Uncle flying against the wind]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/04/202204231544587236.html