Using IDEA to Develop a Spark Program
2022-04-23 09:55:00 【Sword Walker】
Windows Environment
Download address
Link: https://pan.baidu.com/s/1YczOo5novINV_MimJ9Xpqg (extraction code: psvm)
Versions

| Name | Version |
|---|---|
| Scala | 2.12.15 |
| Spark | 3.1.3 |
| Hadoop | 2.7.7 |
Scala
Download: https://www.scala-lang.org/download/2.12.15.html
Spark
Downloads page: https://spark.apache.org/downloads.html
Direct download: https://dlcdn.apache.org/spark/spark-3.1.3/spark-3.1.3-bin-hadoop2.7.tgz
Set environment variables
Add the following entries:

| Key | Value |
|---|---|
| Path | D:\Tools\bigdata\spark-3.1.3-bin-hadoop2.7\bin |
| SPARK_LOCAL_DIRS | D:\Tools\bigdata\spark-3.1.3-bin-hadoop2.7\temp |
SPARK_LOCAL_DIRS sets where Spark stores temporary files. For example, when you run a JAR file, Spark first copies files into this temporary directory and deletes them after use.
Run

```
spark-shell
```
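If the shell starts correctly, a quick sanity check confirms that jobs actually run. This snippet is not from the original article, just an illustrative example typed at the spark-shell prompt (`sc` is the SparkContext the shell creates for you):

```scala
// Typed at the spark-shell prompt; `sc` is provided by the shell.
val nums = sc.parallelize(1 to 100)  // distribute a local range as an RDD
println(nums.reduce(_ + _))          // should print 5050
```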
Runtime error

```
java.io.IOException: Failed to delete
```

This error may appear when you submit a packaged Spark program. It occurs in Windows environments and has nothing to do with your program. To suppress it, edit the log4j.properties file under %SPARK_HOME%/conf (if it does not exist, copy log4j.properties.template) and append the following lines at the end:

```
log4j.logger.org.apache.spark.util.ShutdownHookManager=OFF
log4j.logger.org.apache.spark.SparkEnv=ERROR
```

These settings silence the shutdown-hook logger that reports the failed deletion of temporary files.
Hadoop
Download: https://archive.apache.org/dist/hadoop/common/hadoop-2.7.7/
Configure HADOOP_HOME and Path
| Key | Value |
|---|---|
| HADOOP_HOME | D:\Tools\bigdata\hadoop-2.7.7 |
| Path | D:\Tools\bigdata\hadoop-2.7.7\bin |
Configuration files
The configuration files live in D:\Tools\bigdata\hadoop-2.7.7\etc\hadoop. Modify Hadoop's four main configuration files as follows.
Modify core-site.xml

```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/D:/Tools/bigdata/hadoop-2.7.7/workspace/tmp</value>
        <description>Local temporary directory for the NameNode</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
        <description>HDFS URI: filesystem://namenode-host:port</description>
    </property>
</configuration>
```
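With fs.defaultFS set as above, Spark can read from this local HDFS once it is running. A minimal sketch, not from the original article (the path /input/wordcount.txt is a placeholder; create and upload a file there first):

```scala
// Typed in spark-shell once the local HDFS is up.
val hdfsFile = sc.textFile("hdfs://localhost:9000/input/wordcount.txt")
println(hdfsFile.count())  // number of lines in the file
```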
Modify hdfs-site.xml

```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- Set to 1 because this is a single-node Hadoop installation -->
    <property>
        <name>dfs.replication</name>
        <value>1</value>
        <description>Replication factor; defaults to 3 and should not exceed the number of DataNodes</description>
    </property>
    <property>
        <name>dfs.data.dir</name>
        <value>/D:/Tools/bigdata/hadoop-2.7.7/workspace/data</value>
        <description>Physical storage location of data blocks on the DataNode</description>
    </property>
    <property>
        <name>dfs.name.dir</name>
        <value>/D:/Tools/bigdata/hadoop-2.7.7/workspace/name</value>
        <description>Where the NameNode stores HDFS namespace metadata</description>
    </property>
</configuration>
```
Modify mapred-site.xml (if it does not exist, copy mapred-site.xml.template and rename the copy to mapred-site.xml)

```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapred.job.tracker</name>
        <value>hdfs://localhost:9001</value>
    </property>
</configuration>
```
Modify yarn-site.xml

```xml
<?xml version="1.0"?>
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
</configuration>
```
winutils
https://gitee.com/nkuhyx/winutils
Find the bin directory for the matching Hadoop version and copy its files over Hadoop's bin directory, i.e. D:\Tools\bigdata\hadoop-2.7.7\bin.
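The original steps stop at configuration and do not show starting Hadoop. If you do want the local HDFS/YARN running (the Spark example later reads a plain local file, so this is optional), a typical sequence with the stock Hadoop 2.7.7 Windows scripts would be, assuming winutils is now in place:

```
hdfs namenode -format
%HADOOP_HOME%\sbin\start-dfs.cmd
%HADOOP_HOME%\sbin\start-yarn.cmd
```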
Create the project
Create a new project with the name WordCount.
Right-click the project name WordCount and choose Add Framework Support from the pop-up menu to add Scala support.
Right-click the java directory, choose Refactor and then Rename from the pop-up menus, and in the dialog that appears rename the java directory to scala.
Add the class WordCount.
In the IDEA editor, open pom.xml, clear its contents, and enter the following:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>cn.psvmc</groupId>
    <artifactId>WordCount</artifactId>
    <version>1.0</version>
    <properties>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
        <spark.version>3.1.3</spark.version>
        <scala.version>2.12</scala.version>
    </properties>
    <repositories>
        <repository>
            <id>alimaven</id>
            <name>aliyun maven</name>
            <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
        </repository>
    </repositories>
    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <version>3.4.6</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>3.0.0</version>
                <configuration>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>
```
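With this pom.xml in place, packaging can also be run from a terminal in the project root, equivalent to the Maven tool-window steps shown later:

```
mvn clean package
```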
Test
Create a test file wordcount.txt at D:\spark_study\wordcount.txt with the following content:

```
good good study
day day up
```

Then open the WordCount.scala code file, clear its contents, and enter the following:
```scala
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    val inputFile = "file:///D:\\spark_study\\wordcount.txt"
    // Run locally with a single thread ("local")
    val conf = new SparkConf().setAppName("WordCount").setMaster("local")
    val sc = new SparkContext(conf)
    val textFile = sc.textFile(inputFile)
    // Split each line into words, map each word to (word, 1), then sum the counts per word
    val wordCount = textFile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey((a, b) => a + b)
    wordCount.foreach(println)
  }
}
```
When you run it, you should see:

```
(up,1)
(day,2)
(good,2)
(study,1)
```
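The version above hardcodes both the master and the input path. As an optional sketch (not from the original article), the same job can take the input path from the command line, which makes the spark-submit run below more flexible; the System.setProperty line is a common workaround if winutils.exe is not found when running inside IDEA:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // Workaround if winutils.exe cannot be located when running from IDEA.
    System.setProperty("hadoop.home.dir", "D:\\Tools\\bigdata\\hadoop-2.7.7")
    // Use the first command-line argument as the input path, if given.
    val inputFile =
      if (args.nonEmpty) args(0) else "file:///D:\\spark_study\\wordcount.txt"
    val conf = new SparkConf().setAppName("WordCount").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val counts = sc.textFile(inputFile)
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
    counts.foreach(println)
    sc.stop()
  }
}
```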
Packaging and running
On the right side of the IDEA window, click the Maven icon to open the Maven tool window.
In the Maven tool window, click package to build the application into a JAR.
Then, in the project tree on the left side of IDEA, under the target directory, you will see two JAR files, namely WordCount-1.0.jar and WordCount-1.0-jar-with-dependencies.jar.
Then open a command prompt and run the JAR with:

```
spark-submit --class WordCount D:\Project\Spark\WordCount\target\WordCount-1.0-jar-with-dependencies.jar
```
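Because this example sets the master inside the code, no --master flag is needed here. If you remove setMaster from the code, pass the master on the command line instead, for example:

```
spark-submit --class WordCount --master local[*] D:\Project\Spark\WordCount\target\WordCount-1.0-jar-with-dependencies.jar
```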
Copyright notice
This article was written by [Sword Walker]. Please include a link to the original when reposting:
https://yzsam.com/2022/04/202204230952211795.html