Using the distinct operator in Spark
2022-04-23 15:48:00 【Uncle flying against the wind】
Preface
Anyone who has used MySQL will be familiar with the DISTINCT keyword in SQL statements, which removes duplicate rows from query results. Spark's distinct operator can be understood in the same way.
Function signature
def distinct(): RDD[T]
def distinct(numPartitions: Int)(implicit ord: Ordering[T] = null): RDD[T]
The no-argument form keeps the parent RDD's number of partitions; the second form sets the number of partitions of the resulting RDD.
Function description
Returns a new RDD containing the distinct elements of the source RDD: every value that occurs more than once is kept only once.
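To make the description more concrete, here is a minimal sketch that simulates on plain Scala collections what RDD.distinct does in the general case: each element is mapped to an (element, null) pair, the pairs are hash-partitioned so that equal elements meet in the same partition, one pair is kept per key (as reduceByKey((x, _) => x, numPartitions) would), and the placeholder value is dropped. It runs without Spark. The names DistinctSimulation, shuffleByKey and distinctLocal are hypothetical, and this is an illustration of the technique, not Spark's source code (recent Spark versions may also skip the shuffle when the RDD is already suitably partitioned).

```scala
// Spark-free simulation of the shuffle-based de-duplication used by
// RDD.distinct in the general case: map(x => (x, null)),
// reduceByKey((x, _) => x, numPartitions), then map(_._1).
object DistinctSimulation {

  // Hypothetical helper: route each (key, value) pair to one of
  // `numPartitions` buckets by the key's hash code, as a hash partitioner does.
  def shuffleByKey[K, V](pairs: Seq[(K, V)], numPartitions: Int): Vector[Seq[(K, V)]] = {
    val byBucket = pairs.groupBy(p => Math.floorMod(p._1.hashCode, numPartitions))
    Vector.tabulate(numPartitions)(i => byBucket.getOrElse(i, Seq.empty))
  }

  // Hypothetical helper mirroring distinct(numPartitions) on a local Seq.
  def distinctLocal[T](data: Seq[T], numPartitions: Int): Seq[T] = {
    // Step 1: the element itself becomes the key; the value is a placeholder.
    val pairs: Seq[(T, Null)] = data.map(x => (x, null))
    // Step 2: equal keys always land in the same bucket, so duplicates can be
    // removed bucket by bucket by keeping the first pair for each key.
    val reduced = shuffleByKey(pairs, numPartitions).map { bucket =>
      bucket.groupBy(_._1).values.map(_.reduce((first, _) => first))
    }
    // Step 3: drop the placeholder and concatenate the buckets.
    reduced.flatMap(_.map(_._1))
  }

  def main(args: Array[String]): Unit = {
    val result = distinctLocal(List(1, 2, 3, 4, 5, 3, 5, 2, 2), numPartitions = 2)
    // Bucket order is not input order, so sort before printing.
    println(result.sorted.mkString(", ")) // 1, 2, 3, 4, 5
  }
}
```

The key design point is that duplicates only need to be compared within a single bucket, because hashing sends equal elements to the same place; this is what lets Spark de-duplicate data spread over many partitions.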
Example: removing duplicates from a list of numbers
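import org.apache.spark.{SparkConf, SparkContext}

object Distinct_Test {
  def main(args: Array[String]): Unit = {
    // Run Spark locally, with as many worker threads as there are cores
    val sparkConf = new SparkConf().setMaster("local[*]").setAppName("Operator")
    val sc = new SparkContext(sparkConf)
    // Sample data: 2, 3 and 5 occur more than once
    val rdd = sc.makeRDD(List(1, 2, 3, 4, 5, 3, 5, 2, 2))
    // distinct() removes the duplicates; collect() brings the result to the driver
    rdd.distinct().collect().foreach(println)
    sc.stop()
  }
}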
Run the program above and check the console output: 1, 2, 3, 4 and 5 are each printed exactly once. The order in which they are printed is not guaranteed, because distinct redistributes the data across partitions.
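Because each element acts as a shuffle key, what distinct treats as a duplicate is decided by the elements' equals and hashCode methods. The sketch below illustrates those rules with Scala's local Seq.distinct, which also relies on equals/hashCode, so it runs without a cluster; Point and RawPoint are hypothetical example types, not part of the original program.

```scala
object DistinctEquality {
  // Case classes get structural equals/hashCode generated by the compiler.
  final case class Point(x: Int, y: Int)

  // A plain class keeps the default reference-based equals/hashCode.
  final class RawPoint(val x: Int, val y: Int)

  def main(args: Array[String]): Unit = {
    val points = List(Point(1, 2), Point(1, 2), Point(3, 4))
    val rawPoints = List(new RawPoint(1, 2), new RawPoint(1, 2), new RawPoint(3, 4))
    println(points.distinct.size)    // 2: the two Point(1, 2) values are equal
    println(rawPoints.distinct.size) // 3: every RawPoint instance is a different object
  }
}
```

If an RDD holds instances of your own classes, using case classes (or overriding equals and hashCode consistently) is what makes distinct recognise logically equal records.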
Copyright notice
This article was written by [Uncle flying against the wind]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/04/202204231544587441.html