Using the distinct operator in Spark
2022-04-23 15:48:00 【Uncle flying against the wind】
Preface
Anyone who has used MySQL will be familiar with the DISTINCT keyword in SQL statements, which removes duplicate rows from a query result. The distinct operator in Spark can be understood in much the same way.
Function signature
def distinct(): RDD[T]
def distinct(numPartitions: Int)(implicit ord: Ordering[T] = null): RDD[T]
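As a rough illustration of the two overloads, the sketch below calls distinct with and without numPartitions and reports the resulting partition counts. The object name DistinctOverloads, the helper partitionCounts, and the choice of 4 input partitions are assumptions made for this example only.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example object; the names are illustrative only.
object DistinctOverloads {
  // Returns the partition counts of the source RDD, of distinct(), and of distinct(2).
  def partitionCounts(sc: SparkContext): (Int, Int, Int) = {
    // Start from 4 partitions so the effect of numPartitions is visible.
    val rdd = sc.makeRDD(List(1, 2, 3, 4, 5, 3, 5, 2, 2), 4)
    val sameCount = rdd.distinct()  // no argument: keeps the parent's partition count
    val twoParts = rdd.distinct(2)  // explicit argument: result has 2 partitions
    (rdd.getNumPartitions, sameCount.getNumPartitions, twoParts.getNumPartitions)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("DistinctOverloads")
    val sc = new SparkContext(conf)
    try println(partitionCounts(sc)) finally sc.stop()
  }
}
```

Passing a smaller numPartitions can be useful when deduplication shrinks the data considerably, since fewer partitions are then needed downstream.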
Function description
Removes duplicate elements from a dataset, returning a new RDD in which each distinct element appears exactly once.
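One way to picture how this deduplication works is to express it with ordinary pair-RDD operations: key each element on itself, keep one record per key, then drop the key wrapper. The sketch below is an approximation of that idea and is not presented as Spark's exact implementation. The names DistinctByHand and distinctByHand are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical example object; the names are illustrative only.
object DistinctByHand {
  // Deduplicate by keying each element on itself and keeping one record per key.
  def distinctByHand(rdd: RDD[Int]): RDD[Int] =
    rdd.map(x => (x, 1))          // pair each element with a throwaway value
      .reduceByKey((a, _) => a)   // shuffle by key and collapse duplicates into one record
      .map(_._1)                  // discard the throwaway value

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("DistinctByHand")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.makeRDD(List(1, 2, 3, 4, 5, 3, 5, 2, 2))
      // Sort both results so they can be compared regardless of partition order.
      println(distinctByHand(rdd).collect().sorted.mkString(", "))
      println(rdd.distinct().collect().sorted.mkString(", "))
    } finally sc.stop()
  }
}
```

Because reduceByKey has to bring equal keys together, this approach involves a shuffle, which is why deduplication costs more than a narrow transformation such as map.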
Case study: deduplicating a list of numbers
import org.apache.spark.{SparkConf, SparkContext}

object Distinct_Test {
  def main(args: Array[String]): Unit = {
    // Run locally, using all available cores
    val sparkConf = new SparkConf().setMaster("local[*]").setAppName("Operator")
    val sc = new SparkContext(sparkConf)

    // The input list contains the duplicates 2, 3 and 5
    val rdd = sc.makeRDD(List(1, 2, 3, 4, 5, 3, 5, 2, 2))

    // Remove the duplicates, then print each remaining element
    rdd.distinct().collect().foreach(println)

    sc.stop()
  }
}
Run the program above and observe the console output: each duplicated element is printed only once.
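Note that the order in which the distinct elements are printed is not guaranteed, because deduplication redistributes the data across partitions. If a stable order is wanted, one option is to sort the collected result on the driver. The sketch below assumes the same input list as the case study; the names DistinctSorted and sortedDistinct are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example object; the names are illustrative only.
object DistinctSorted {
  // Deduplicate the sample list and return the result in ascending order.
  def sortedDistinct(sc: SparkContext): Array[Int] = {
    val rdd = sc.makeRDD(List(1, 2, 3, 4, 5, 3, 5, 2, 2))
    // collect() brings the elements to the driver; sorted orders the local array.
    rdd.distinct().collect().sorted
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("DistinctSorted")
    val sc = new SparkContext(conf)
    try println(sortedDistinct(sc).mkString(", ")) finally sc.stop()
  }
}
```

Sorting after collect() is only reasonable for small results like this one; for large datasets the sort would normally be done on the RDD itself before collecting.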
Copyright notice
This article was written by [Uncle flying against the wind]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204231544587441.html