Using the distinct operator in Spark
2022-04-23 15:48:00 【Uncle flying against the wind】
Preface
Anyone who has used MySQL will be familiar with the DISTINCT keyword in SQL statements, which removes duplicate rows from a query result. The distinct operator in Spark can be understood in much the same way.
Function signature
def distinct(): RDD[T]
def distinct(numPartitions: Int)(implicit ord: Ordering[T] = null): RDD[T]
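The overload that takes numPartitions controls how many partitions the de-duplicated RDD has. The sketch below is only an illustration: the object name DistinctPartitionsSketch, the helper partitionCounts, and the partition counts 4 and 2 are assumptions chosen for the example, not part of the original article.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object DistinctPartitionsSketch {
  // Hypothetical helper: returns the partition count after distinct()
  // and after distinct(n), starting from an RDD with 4 partitions.
  def partitionCounts(sc: SparkContext, n: Int): (Int, Int) = {
    val rdd = sc.makeRDD(List(1, 2, 3, 4, 5, 3, 5, 2, 2), 4)
    val keepParts = rdd.distinct()   // keeps the parent's partition count
    val customParts = rdd.distinct(n) // uses the requested partition count
    (keepParts.getNumPartitions, customParts.getNumPartitions)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("DistinctPartitionsSketch")
    val sc = new SparkContext(conf)
    try {
      println(partitionCounts(sc, 2))
    } finally {
      sc.stop()
    }
  }
}
```

Passing a smaller numPartitions can be useful when de-duplication shrinks the data considerably, since fewer partitions are then needed downstream.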
Function description
Removes duplicate elements from a dataset and returns a new RDD that contains each distinct element once
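To make the de-duplication step more concrete, the sketch below expresses it with map and reduceByKey: each element becomes a key, all pairs with the same key are collapsed into one, and the keys are extracted again. As far as I know, this mirrors how Spark's RDD.distinct behaves when the RDD has no existing partitioner, but treat it as an illustration rather than the definitive implementation. The names DistinctByReduceSketch and distinctByReduce are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object DistinctByReduceSketch {
  // Hypothetical helper: de-duplicate by turning each element into a key.
  def distinctByReduce(rdd: RDD[Int]): RDD[Int] =
    rdd
      .map(x => (x, null))       // element becomes the key, value is unused
      .reduceByKey((x, _) => x)  // keep a single pair per key
      .map(_._1)                 // drop the placeholder value

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("DistinctByReduceSketch")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.makeRDD(List(1, 2, 3, 4, 5, 3, 5, 2, 2))
      // Sort on the driver so the two results can be compared directly
      val viaDistinct = rdd.distinct().collect().sorted
      val viaReduce = distinctByReduce(rdd).collect().sorted
      println(viaDistinct.mkString(","))
      println(viaReduce.mkString(","))
    } finally {
      sc.stop()
    }
  }
}
```

Because reduceByKey involves a shuffle, distinct is a relatively expensive operation on large datasets.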
Example: removing duplicates from a collection of numbers
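import org.apache.spark.{SparkConf, SparkContext}

object Distinct_Test {
  def main(args: Array[String]): Unit = {
    // Run locally, using all available cores
    val sparkConf = new SparkConf().setMaster("local[*]").setAppName("Operator")
    val sc = new SparkContext(sparkConf)
    // Source data containing the duplicates 2, 3 and 5
    val rdd = sc.makeRDD(List(1, 2, 3, 4, 5, 3, 5, 2, 2))
    // Remove duplicates, collect the result to the driver, and print each element
    rdd.distinct().collect().foreach(println)
    sc.stop()
  }
}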
Run the program above and check the console output: each duplicated element is printed only once. The order in which the elements are printed is not guaranteed.
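If a predictable output order is needed, one option is to sort after de-duplicating. The following is a minimal sketch under that assumption. The names DistinctSortedSketch and sortedDistinct are hypothetical and do not come from the original article.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object DistinctSortedSketch {
  // Hypothetical helper: de-duplicate, then sort ascending before collecting.
  def sortedDistinct(sc: SparkContext, data: Seq[Int]): Array[Int] =
    sc.makeRDD(data)
      .distinct()       // remove duplicates (output order is not guaranteed)
      .sortBy(x => x)   // impose an ascending order across partitions
      .collect()

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("DistinctSortedSketch")
    val sc = new SparkContext(conf)
    try {
      // Prints 1, 2, 3, 4, 5, one per line
      sortedDistinct(sc, List(1, 2, 3, 4, 5, 3, 5, 2, 2)).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

For small results, calling sorted on the collected array on the driver would work equally well and avoids the extra shuffle that sortBy introduces.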

Copyright notice
This article was written by [Uncle flying against the wind]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204231544587441.html