Usage of the Spark filter operator
2022-04-23 15:48:00 [Uncle flying against the wind]
Preface
filter can be understood as filtering: intuitively, it selects from a set of data only the elements that satisfy a specified rule. Whether in Java or in other languages, the filter operator makes it easy to extract the data we want from a collection.
Function signature
def filter(f: T => Boolean ): RDD[T]
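The parameter f: T => Boolean is an ordinary predicate function, so any function that returns Boolean can be passed, including a named one. A minimal sketch using plain Scala collections, whose filter takes the same kind of predicate as RDD.filter (PredicateSketch and isEven are illustrative names, not part of Spark):

```scala
object PredicateSketch {
  // a named predicate with the T => Boolean shape that filter expects
  def isEven(n: Int): Boolean = n % 2 == 0

  def main(args: Array[String]): Unit = {
    // plain Scala List.filter accepts the same predicate shape as RDD.filter
    println(List(1, 2, 3, 4, 5, 6).filter(isEven)) // List(2, 4, 6)
  }
}
```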
Function description
The data is filtered according to the specified rule: elements that satisfy the predicate are kept, and elements that do not are discarded. Filtering does not change the number of partitions, but the amount of data within each partition may become uneven; in a production environment this can lead to data skew.
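Observing this on a real RDD requires a running Spark context, but the effect can be sketched with plain Scala collections by treating each inner list as a partition (FilterSkewSketch and filterPartitions are hypothetical names for illustration only):

```scala
object FilterSkewSketch {
  // model an RDD's partitions as a List of Lists and apply the same
  // predicate independently inside each "partition", as filter does
  def filterPartitions(parts: List[List[Int]], p: Int => Boolean): List[List[Int]] =
    parts.map(_.filter(p))

  def main(args: Array[String]): Unit = {
    val partitions = List(List(1, 2, 3), List(4, 5, 6), List(7, 8, 9))
    val filtered = filterPartitions(partitions, _ >= 6)

    // the partition count is unchanged, but the sizes are now uneven
    println(filtered.map(_.size)) // List(0, 1, 3)
  }
}
```

The partition count stays at three, yet one partition keeps all its elements while another keeps none: this is the skew the text warns about.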
Case 1: filter the even numbers out of a set of data
import org.apache.spark.{SparkConf, SparkContext}

object Filter_Test {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setMaster("local[*]").setAppName("Operator")
    val sc = new SparkContext(sparkConf)

    val rdd = sc.makeRDD(List(1, 2, 3, 4, 5, 6))

    // keep only the even numbers
    val result = rdd.filter(item => item % 2 == 0)

    result.collect().foreach(println)
    sc.stop()
  }
}
Run this code and observe the console output: it prints 2, 4 and 6, each on its own line.
Case 2: filter out the records from 17 May 2015 in a log file
The input is an Apache access log (its contents are not reproduced here); as the code below assumes, the fourth space-separated field of each line contains the date.
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object Filter_Test {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setMaster("local[*]").setAppName("Operator")
    val sc = new SparkContext(sparkConf)

    val rdd: RDD[String] = sc.textFile("E:\\code-self\\spi\\datas\\apache.log")

    rdd.filter(
      line => {
        // the fourth space-separated field holds the timestamp
        val datas = line.split(" ")
        val time = datas(3)
        time.contains("17/05/2015")
      }
    ).collect().foreach(println)

    sc.stop()
  }
}
Run the above code and observe the console output: only the log lines dated 17/05/2015 are printed.
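The predicate itself can be checked without Spark. The sketch below tests the same split/contains logic on a made-up log line; the sample line, LogPredicateSketch and matchesDate are all illustrative assumptions, since the real file's contents are not shown:

```scala
object LogPredicateSketch {
  // same logic as the predicate passed to rdd.filter above:
  // split on spaces and check the fourth field for the date
  def matchesDate(line: String, date: String): Boolean = {
    val fields = line.split(" ")
    fields.length > 3 && fields(3).contains(date)
  }

  def main(args: Array[String]): Unit = {
    // hypothetical log line with the date in the fourth field
    val sample = "83.149.9.216 - - 17/05/2015:10:05:03 GET /presentations/"
    println(matchesDate(sample, "17/05/2015")) // true
  }
}
```

The extra fields.length > 3 guard is a small hardening over the original predicate: it avoids an ArrayIndexOutOfBoundsException on short or blank lines.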
Copyright notice
This article was written by [Uncle flying against the wind]. When reposting, please include the original link. Thank you.
https://yzsam.com/2022/04/202204231544587482.html