Spark operators: intersection, union, difference, and zip
2022-04-23 15:48:00 【Uncle flying against the wind】
Preface
Daily development often involves intersection, union, and difference operations on different data sets. Spark provides corresponding operators for this kind of work: the double-Value (two-data-source) operators.
intersection
Function signature
def intersection(other: RDD[T]): RDD[T]
Function description
Returns a new RDD containing the intersection of the source RDD and the argument RDD.
Example: find the intersection of two data sets.
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object DoubleValueTest {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setMaster("local[*]").setAppName("Operator")
    val sc = new SparkContext(sparkConf)
    // Operators - double-Value type
    // intersection, union and subtract require both data sources to have the same element type
    // The zip operation allows the two data sources to have different element types
    val rdd1 = sc.makeRDD(List(1, 2, 3, 4))
    val rdd2 = sc.makeRDD(List(3, 4, 5, 6))
    val rdd7 = sc.makeRDD(List("3", "4", "5", "6"))
    // intersection: [3, 4]
    val rdd3: RDD[Int] = rdd1.intersection(rdd2)
    // val rdd8 = rdd1.intersection(rdd7) // does not compile: element types differ
    println(rdd3.collect().mkString(","))
    sc.stop()
  }
}
Run the code above and observe the console output; it prints the intersection, 3 and 4 (the order may vary, since intersection involves a shuffle).
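The set semantics can be illustrated without Spark using plain Scala collections. This is only an analogy, not the RDD API itself; note that RDD.intersection also deduplicates its result, which we mimic here with distinct:

```scala
// Plain-Scala analogy for RDD.intersection (no Spark required).
// RDD.intersection deduplicates its result; distinct mimics that here.
val a = List(1, 2, 3, 4)
val b = List(3, 4, 5, 6)
val inter = a.intersect(b).distinct
println(inter.mkString(","))  // 3,4
```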
union
Function signature
def union(other: RDD[T]): RDD[T]
Function description
Returns a new RDD that is the union of the source RDD and the argument RDD. Unlike a mathematical set union, duplicates are kept.
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object DoubleValueTest {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setMaster("local[*]").setAppName("Operator")
    val sc = new SparkContext(sparkConf)
    // Operators - double-Value type
    // intersection, union and subtract require both data sources to have the same element type
    // The zip operation allows the two data sources to have different element types
    val rdd1 = sc.makeRDD(List(1, 2, 3, 4))
    val rdd2 = sc.makeRDD(List(3, 4, 5, 6))
    val rdd7 = sc.makeRDD(List("3", "4", "5", "6"))
    // intersection: [3, 4]
    /*val rdd3: RDD[Int] = rdd1.intersection(rdd2)
    // val rdd8 = rdd1.intersection(rdd7) // does not compile: element types differ
    println(rdd3.collect().mkString(","))*/
    // union: [1, 2, 3, 4, 3, 4, 5, 6]
    val rdd4: RDD[Int] = rdd1.union(rdd2)
    println(rdd4.collect().mkString(","))
    sc.stop()
  }
}
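One pitfall worth remembering: RDD.union keeps duplicates, so call distinct explicitly when you want true set semantics. A plain-Scala analogy (an illustration, not the RDD API):

```scala
// Plain-Scala analogy for RDD.union (no Spark required).
val a = List(1, 2, 3, 4)
val b = List(3, 4, 5, 6)
val uni = a ++ b               // duplicates kept, like RDD.union
val dedup = (a ++ b).distinct  // apply distinct explicitly for set semantics
println(uni.mkString(","))     // 1,2,3,4,3,4,5,6
println(dedup.mkString(","))   // 1,2,3,4,5,6
```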
subtract
Function signature
def subtract(other: RDD[T]): RDD[T]
Function description
Using the source RDD as the base, removes the elements that also appear in the argument RDD and keeps the rest: the difference set.
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object DoubleValueTest {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setMaster("local[*]").setAppName("Operator")
    val sc = new SparkContext(sparkConf)
    // Operators - double-Value type
    // intersection, union and subtract require both data sources to have the same element type
    // The zip operation allows the two data sources to have different element types
    val rdd1 = sc.makeRDD(List(1, 2, 3, 4))
    val rdd2 = sc.makeRDD(List(3, 4, 5, 6))
    val rdd7 = sc.makeRDD(List("3", "4", "5", "6"))
    // intersection: [3, 4]
    /*val rdd3: RDD[Int] = rdd1.intersection(rdd2)
    // val rdd8 = rdd1.intersection(rdd7) // does not compile: element types differ
    println(rdd3.collect().mkString(","))*/
    // union: [1, 2, 3, 4, 3, 4, 5, 6]
    /*val rdd4: RDD[Int] = rdd1.union(rdd2)
    println(rdd4.collect().mkString(","))*/
    // difference: [1, 2]
    val rdd5: RDD[Int] = rdd1.subtract(rdd2)
    println(rdd5.collect().mkString(","))
    sc.stop()
  }
}
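subtract is not symmetric: swapping the operands changes the result. A plain-Scala analogy using diff (illustration only; RDD.subtract additionally shuffles, so its output order is not guaranteed):

```scala
// Plain-Scala analogy for RDD.subtract (no Spark required).
val a = List(1, 2, 3, 4)
val b = List(3, 4, 5, 6)
println(a.diff(b).mkString(","))  // 1,2  -> like rdd1.subtract(rdd2)
println(b.diff(a).mkString(","))  // 5,6  -> like rdd2.subtract(rdd1)
```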
zip
zip is also known as the zipper operator.
Function signature
def zip[U: ClassTag](other: RDD[U]): RDD[(T, U)]
Function description
Combines the elements of two RDDs into key-value pairs, where the key is the element from the first RDD and the value is the element at the same position in the second RDD.
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object DoubleValueTest {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setMaster("local[*]").setAppName("Operator")
    val sc = new SparkContext(sparkConf)
    // Operators - double-Value type
    // intersection, union and subtract require both data sources to have the same element type
    // The zip operation allows the two data sources to have different element types
    val rdd1 = sc.makeRDD(List(1, 2, 3, 4))
    val rdd2 = sc.makeRDD(List(3, 4, 5, 6))
    val rdd7 = sc.makeRDD(List("3", "4", "5", "6"))
    // intersection: [3, 4]
    /*val rdd3: RDD[Int] = rdd1.intersection(rdd2)
    // val rdd8 = rdd1.intersection(rdd7) // does not compile: element types differ
    println(rdd3.collect().mkString(","))*/
    // union: [1, 2, 3, 4, 3, 4, 5, 6]
    /*val rdd4: RDD[Int] = rdd1.union(rdd2)
    println(rdd4.collect().mkString(","))*/
    // difference: [1, 2]
    /*val rdd5: RDD[Int] = rdd1.subtract(rdd2)
    println(rdd5.collect().mkString(","))*/
    // zip: [(1,3), (2,4), (3,5), (4,6)]
    val rdd6: RDD[(Int, Int)] = rdd1.zip(rdd2)
    val rdd8 = rdd1.zip(rdd7) // element types may differ: RDD[(Int, String)]
    println(rdd6.collect().mkString(","))
    sc.stop()
  }
}
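Two caveats about Spark's zip that the example does not show: both RDDs must have the same number of partitions and the same number of elements per partition, otherwise the job fails at runtime. Scala's List.zip behaves differently, silently truncating to the shorter list. The pairing itself can be sketched without Spark:

```scala
// Plain-Scala analogy for RDD.zip (no Spark required).
// Unlike RDD.zip, List.zip truncates to the shorter operand instead of failing.
val a = List(1, 2, 3, 4)
val b = List(3, 4, 5, 6)
val c = List("3", "4", "5", "6")
val pairs = a.zip(b)  // List((1,3), (2,4), (3,5), (4,6))
val mixed = a.zip(c)  // element types may differ: List[(Int, String)]
println(pairs.mkString(","))  // (1,3),(2,4),(3,5),(4,6)
```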
Copyright notice
This article was created by [Uncle flying against the wind]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/04/202204231544587277.html