Spark case - wordcount
2022-04-23 04:41:00 【Z-hhhhh】
Local mode
Add the dependencies to pom.xml
<properties>
    <scala.version>2.12.0</scala.version>
</properties>
<dependencies>
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>${scala.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.12</artifactId>
        <version>3.0.0</version>
    </dependency>
</dependencies>
Prepare your own input file, word.txt, and place it at src/main/inputfile/word.txt, the path the code reads.
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // Create the Spark configuration object
    val sparkConf: SparkConf = new SparkConf().setMaster("local[*]").setAppName("WordCount")
    // Create the Spark context object (the connection to Spark)
    val sc: SparkContext = new SparkContext(sparkConf)
    // Read the file
    val file: RDD[String] = sc.textFile("src/main/inputfile/word.txt")
    // If the file is on HDFS:
    // val file: RDD[String] = sc.textFile("hdfs://<ip-address>:9820/study/sparktest/word.txt")
    // Split each line into words, pair each word with 1, then sum the counts per word
    file.flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .collect()
      .foreach(println)
    // Stop the context and release its resources
    sc.stop()
  }
}
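To see what the flatMap, map and reduceByKey steps compute without starting Spark, the same logic can be sketched on an ordinary Scala collection. This is only an illustration: the object name LocalWordCount, the helper count and the sample lines are hypothetical, and groupBy followed by a sum stands in for reduceByKey, which exists only on pair RDDs.

```scala
object LocalWordCount {
  // Mirrors the Spark pipeline on an in-memory Seq instead of an RDD.
  def count(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))   // split every line into words
      .map((_, 1))             // pair each word with the count 1
      .groupBy(_._1)           // collect the pairs that share a word
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) } // sum the 1s per word

  def main(args: Array[String]): Unit = {
    // Hypothetical sample standing in for the contents of word.txt
    val lines = Seq("hello spark", "hello scala")
    count(lines).foreach(println)
  }
}
```

In Spark, reduceByKey performs the grouping and the summation in one step and combines values within each partition before shuffling, which is why the job uses it rather than grouping first.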
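To save the results instead of printing them:
// Same pipeline, but the result is written out rather than printed.
// Run it before sc.stop().
sc.textFile("src/main/inputfile/word.txt")
  .flatMap(_.split(" "))
  .map((_, 1))
  .reduceByKey(_ + _)
  .repartition(1) // one partition, so a single part file is written
  .saveAsTextFile("<output path>") // output directory; it must not already exist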
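saveAsTextFile writes each element using its toString, so a (word, count) tuple ends up in the output as a line like (spark,2). If a plainer format is preferred, each pair can be turned into a string first. The sketch below shows one possible formatter; the names FormatCounts and toLine are hypothetical, and the tab separator is just an assumption about the desired layout.

```scala
object FormatCounts {
  // Render one (word, count) pair as a tab-separated line.
  def toLine(pair: (String, Int)): String = s"${pair._1}\t${pair._2}"

  def main(args: Array[String]): Unit = {
    val pair = ("spark", 2)
    println(pair.toString) // default tuple rendering: (spark,2)
    println(toLine(pair))  // word and count separated by a tab
  }
}
```

In the Spark job, such a function could be applied with .map(FormatCounts.toLine) after reduceByKey and before saveAsTextFile.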
Copyright notice
This article was written by [Z-hhhhh]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204220559122487.html