Using the distinct operator in Spark
2022-04-23 15:48:00 【Uncle flying against the wind】
Preface
Anyone who has used MySQL will be familiar with the DISTINCT keyword in SQL statements, which removes duplicate rows from a query result. The distinct operator in Spark can be understood in much the same way.
Function signature
def distinct(): RDD[T]
def distinct(numPartitions: Int)(implicit ord: Ordering[T] = null): RDD[T]
Function description
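Returns a new RDD with the duplicate elements of the source RDD removed. The overload that takes numPartitions also sets the number of partitions of the result.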
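The numPartitions overload from the signature above can be tried out with a small sketch like the one below. The object name DistinctPartitionsSketch and the helper distinctInto are made up for this illustration and are not part of Spark; the local master setting is likewise an assumption for running on a single machine.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object DistinctPartitionsSketch {
  // Hypothetical helper: deduplicates `data` into `numPartitions` partitions and
  // returns the sorted distinct values together with the result's partition count.
  def distinctInto(sc: SparkContext, data: Seq[Int], numPartitions: Int): (Seq[Int], Int) = {
    val deduped = sc.makeRDD(data).distinct(numPartitions)
    (deduped.collect().sorted.toSeq, deduped.getNumPartitions)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: run locally with all available cores
    val conf = new SparkConf().setMaster("local[*]").setAppName("DistinctPartitionsSketch")
    val sc = new SparkContext(conf)
    try {
      val (values, parts) = distinctInto(sc, List(1, 2, 3, 4, 5, 3, 5, 2, 2), 2)
      println(s"distinct values: ${values.mkString(", ")}")
      println(s"result partitions: $parts")
    } finally {
      sc.stop()
    }
  }
}
```

The values are sorted on the driver only to make the printout stable; distinct itself makes no promise about ordering.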
Example: removing duplicates from a list of numbers
import org.apache.spark.{SparkConf, SparkContext}

object Distinct_Test {
  def main(args: Array[String]): Unit = {
    // Run locally, using all available cores
    val sparkConf = new SparkConf().setMaster("local[*]").setAppName("Operator")
    val sc = new SparkContext(sparkConf)
    // Source data containing duplicates
    val rdd = sc.makeRDD(List(1, 2, 3, 4, 5, 3, 5, 2, 2))
    // Deduplicate, collect the result to the driver, and print it
    rdd.distinct().collect().foreach(println)
    sc.stop()
  }
}
Run the program and check the console output: each duplicated element is printed only once. The order in which the values are printed is not guaranteed.
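To get a feel for why each value appears only once, the same result can be reproduced by hand with map and reduceByKey. Spark's RDD source has historically built distinct along broadly similar lines, but treat the following as an illustration rather than the exact current implementation. The names DistinctByReduceSketch and distinctViaReduce are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object DistinctByReduceSketch {
  // Hypothetical helper that mimics distinct for an RDD[Int]
  def distinctViaReduce(rdd: RDD[Int]): RDD[Int] =
    rdd
      .map(x => (x, null))               // use each element as a key with a dummy value
      .reduceByKey((first, _) => first)  // collapse all records that share a key into one
      .map(_._1)                         // drop the dummy value, keeping the key

  def main(args: Array[String]): Unit = {
    // Assumption: run locally with all available cores
    val conf = new SparkConf().setMaster("local[*]").setAppName("DistinctByReduceSketch")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.makeRDD(List(1, 2, 3, 4, 5, 3, 5, 2, 2))
      // Sort on the driver so the printed order is stable
      distinctViaReduce(rdd).collect().sorted.foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Because equal elements become equal keys, reduceByKey brings them together and leaves a single record per key, which is exactly the deduplicating behavior observed above.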

Copyright notice
This article was written by [Uncle flying against the wind]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204231544587441.html