Using the sortBy operator in Spark
2022-04-23 15:48:00 【Uncle flying against the wind】
Preface
As its name suggests, sortBy sorts data. In Spark, sortBy sorts the elements of an RDD. The elements are not limited to numbers; they can also be tuples or other types, as long as an Ordering is available for the sort key.
sortBy
Function signature
def sortBy[K](f: (T) => K, ascending: Boolean = true, numPartitions: Int = this.partitions.length)(implicit ord: Ordering[K], ctag: ClassTag[K]): RDD[T]
Function description
This operation sorts the data. Before sorting, each element is passed through the function f, and the elements are then ordered by the results of f. The order is ascending by default. By default, the new RDD produced by the sort has the same number of partitions as the original RDD. The operation involves a shuffle.
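The ascending and numPartitions parameters from the signature can be tried out with a small sketch like the one below. The object name SortByOptionsSketch and the helper sortNumericStrings are hypothetical names chosen for illustration, and the sample strings are made up. The key function converts each string to an Int, so the elements are compared numerically while remaining strings in the result.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical object and helper names, used only for this illustration.
object SortByOptionsSketch {
  // f = s => s.toInt only supplies the sort key; the elements stay Strings.
  def sortNumericStrings(rdd: RDD[String], ascending: Boolean, parts: Int): RDD[String] =
    rdd.sortBy(s => s.toInt, ascending, parts)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("SortByOptionsSketch")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.makeRDD(List("1", "11", "2", "22", "3"), 2)
      // Descending order, asking for 3 partitions instead of the original 2.
      val desc = sortNumericStrings(rdd, ascending = false, parts = 3)
      // The partition count is at most the requested number.
      println(desc.getNumPartitions)
      println(desc.collect().mkString(",")) // prints 22,11,3,2,1
    } finally sc.stop()
  }
}
```

Sorting the same strings without the toInt conversion would compare them lexicographically, so "11" would come before "2".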
Case presentation
Next, sort the data in a collection and save it to a local directory
import org.apache.spark.{SparkConf, SparkContext}

object SortBy_Test {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setMaster("local[*]").setAppName("Operator")
    val sc = new SparkContext(sparkConf)
    val rdd = sc.makeRDD(List(1,2,3,4,5,6,7,9), 2)
    // sortBy returns a new RDD and leaves rdd unchanged, so keep the result
    val sortedRdd = rdd.sortBy(num => num)
    // Writes one part file per partition; the output directory must not already exist
    sortedRdd.saveAsTextFile("E:\\output")
    sc.stop()
  }
}
Run the above code and you can see that two part files are generated in the output directory, one per partition.
Open the two files and you will find that the sorted data is split across them.
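The partition layout can also be inspected without opening the output files. The sketch below uses glom() to gather each partition into an array. The object name SortByPartitionsSketch and the helper sortedPartitions are hypothetical, and the unsorted sample list is made up for illustration. Which values end up in which partition depends on how Spark samples the data, so no exact split is shown here.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical object and helper names, used only for this illustration.
object SortByPartitionsSketch {
  // Returns the contents of every partition after sorting, in partition order.
  def sortedPartitions(rdd: RDD[Int]): Array[Array[Int]] =
    rdd.sortBy(num => num).glom().collect()

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("SortByPartitionsSketch")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.makeRDD(List(6, 2, 9, 4, 1, 7, 3, 5), 2)
      // Each printed line corresponds to one part file that saveAsTextFile would write.
      sortedPartitions(rdd).zipWithIndex.foreach { case (part, i) =>
        println(s"partition $i: ${part.mkString(", ")}")
      }
    } finally sc.stop()
  }
}
```

Reading the partitions in order gives the fully sorted sequence: each partition is sorted internally, and every value in an earlier partition is no larger than the values in later ones.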
Next, sort the tuples in a collection by key and print the result
import org.apache.spark.{SparkConf, SparkContext}

object SortBy_Test {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setMaster("local[*]").setAppName("Operator")
    val sc = new SparkContext(sparkConf)
    val rddStr = sc.makeRDD(List(
      ("a",3),("d",2),("e",7)
    ),2)
    // Sort by the first tuple element (the key); sortBy returns a new RDD
    val sortedRdd = rddStr.sortBy(t => t._1)
    sortedRdd.collect().foreach(println)
    sc.stop()
  }
}
Run the above code and observe the console output: the tuples are printed in ascending order of their keys.
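The key function can pick any field of the tuple, or combine several. The following sketch sorts by value in descending order and then by a composite key. The object name SortByTupleSketch and the helpers byValueDesc and byValueThenKey are hypothetical, and the extra tuple ("b",3) is added only to show tie-breaking.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical object and helper names, used only for this illustration.
object SortByTupleSketch {
  // Sort by the second tuple element (the value), largest first.
  def byValueDesc(rdd: RDD[(String, Int)]): RDD[(String, Int)] =
    rdd.sortBy(t => t._2, ascending = false)

  // Sort by value first, then by key to break ties; tuples have an implicit Ordering.
  def byValueThenKey(rdd: RDD[(String, Int)]): RDD[(String, Int)] =
    rdd.sortBy(t => (t._2, t._1))

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("SortByTupleSketch")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.makeRDD(List(("a", 3), ("d", 2), ("e", 7)), 2)
      println(byValueDesc(rdd).collect().mkString(" "))      // prints (e,7) (a,3) (d,2)
      val withTie = sc.makeRDD(List(("a", 3), ("d", 2), ("e", 7), ("b", 3)), 2)
      println(byValueThenKey(withTie).collect().mkString(" ")) // prints (d,2) (a,3) (b,3) (e,7)
    } finally sc.stop()
  }
}
```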
Copyright notice
This article was written by [Uncle flying against the wind]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204231544587328.html