Using the Spark sortBy operator
2022-04-23 15:48:00 【Uncle flying against the wind】
Preface
As the name suggests, sortBy sorts data. In Spark, sortBy sorts the elements of an RDD. The elements are not limited to numbers; they can also be tuples or other types.
sortBy
Function signature
def sortBy[K](f: (T) => K, ascending: Boolean = true, numPartitions: Int = this.partitions.length)(implicit ord: Ordering[K], ctag: ClassTag[K]): RDD[T]
Function description
This operation sorts the data. Before sorting, each element is passed through the function f, and the elements are then ordered by the results that f returns. The default order is ascending. By default, the new RDD produced by the sort has the same number of partitions as the original RDD. The operation involves a shuffle.
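To make the parameters in the signature concrete, here is a minimal sketch that sorts in descending order and sets the number of output partitions explicitly. The object name SortByParams, the helper sortDescending, and the sample data are assumptions made for illustration; only sortBy and its ascending and numPartitions parameters come from the signature above.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object SortByParams {
  // Hypothetical helper: sort descending and place the result in one partition.
  def sortDescending(rdd: RDD[Int]): RDD[Int] =
    rdd.sortBy(num => num, ascending = false, numPartitions = 1)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("SortByParams")
    val sc = new SparkContext(conf)
    try {
      // Sample data (assumed), spread over 2 partitions.
      val rdd = sc.makeRDD(List(5, 1, 4, 2, 3), 2)

      // Default arguments: ascending order.
      println(rdd.sortBy(num => num).collect().mkString(","))  // prints 1,2,3,4,5

      // Explicit arguments: descending order, one output partition.
      val desc = sortDescending(rdd)
      println(desc.collect().mkString(","))                     // prints 5,4,3,2,1
      println(desc.getNumPartitions)                            // prints 1
    } finally {
      sc.stop()
    }
  }
}
```

Passing the arguments by name keeps the call readable, since ascending and numPartitions both have default values.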
Examples
First, sort the numbers in a collection and save the result to a local directory
import org.apache.spark.{SparkConf, SparkContext}

object SortBy_Test {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setMaster("local[*]").setAppName("Operator")
    val sc = new SparkContext(sparkConf)
    val rdd = sc.makeRDD(List(1,2,3,4,5,6,7,9), 2)
    // sortBy returns a new RDD; the original rdd is left unchanged
    val sortedRdd = rdd.sortBy(num => num)
    // Writes one part file per partition
    sortedRdd.saveAsTextFile("E:\\output")
    sc.stop()
  }
}
Run the code above; two part files are generated in the local directory, one per partition.
Open the two files and you will find that the sorted data is split across them: each file is sorted, and read in order the two files form the complete sorted sequence.
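The same per-partition layout can be inspected without writing files. The sketch below uses glom to turn each partition into an array; the object name SortByPartitions, the helper partitionsAfterSort, and the sample data are assumptions for illustration. No expected output is shown, because where the boundary between partitions falls depends on how Spark samples the data.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object SortByPartitions {
  // Hypothetical helper: sort, then return the contents of each partition.
  def partitionsAfterSort(rdd: RDD[Int]): Array[Array[Int]] =
    rdd.sortBy(num => num).glom().collect()

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("SortByPartitions")
    val sc = new SparkContext(conf)
    try {
      // Unsorted sample data (assumed), spread over 2 partitions.
      val rdd = sc.makeRDD(List(7, 3, 9, 1, 6, 2, 5, 4), 2)

      // Print each partition on its own line.
      partitionsAfterSort(rdd).zipWithIndex.foreach { case (part, i) =>
        println(s"partition $i: ${part.mkString(",")}")
      }
    } finally {
      sc.stop()
    }
  }
}
```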
Next, sort the tuples in a collection by key and print the result
import org.apache.spark.{SparkConf, SparkContext}

object SortBy_Test {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setMaster("local[*]").setAppName("Operator")
    val sc = new SparkContext(sparkConf)
    val rddStr = sc.makeRDD(List(
      ("a",3),("d",2),("e",7)
    ),2)
    // Sort by the first tuple element (the key); sortBy returns a new RDD
    val sortedRdd = rddStr.sortBy(t => t._1)
    sortedRdd.collect().foreach(println)
    sc.stop()
  }
}
Run the code above and observe the console output.
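Because f can return any type that has an Ordering, tuples can also be sorted by something other than the key. The sketch below sorts by value in descending order and breaks ties by key in ascending order, using a composite sort key. The object name SortByTuple, the helper byValueDescThenKey, and the extra ("b", 3) element are assumptions added for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object SortByTuple {
  // Hypothetical helper: value descending, then key ascending.
  // Negating the Int value reverses its order inside an ascending sort.
  def byValueDescThenKey(rdd: RDD[(String, Int)]): RDD[(String, Int)] =
    rdd.sortBy(t => (-t._2, t._1))

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("SortByTuple")
    val sc = new SparkContext(conf)
    try {
      // Sample data (assumed); ("a", 3) and ("b", 3) tie on the value.
      val rdd = sc.makeRDD(List(("a", 3), ("d", 2), ("e", 7), ("b", 3)), 2)

      byValueDescThenKey(rdd).collect().foreach(println)
      // prints (e,7), (a,3), (b,3), (d,2), one per line
    } finally {
      sc.stop()
    }
  }
}
```

The composite key gives a fully determined order. Sorting with t => t._2 and ascending = false alone would leave the relative order of the two tied elements unspecified.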
Copyright notice
This article was written by [Uncle flying against the wind]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204231544587328.html