Using IDEA to Develop a Spark Program
2022-04-23 09:55:00 【Sword Walker】
Windows Environment
Download address
Link: https://pan.baidu.com/s/1YczOo5novINV_MimJ9Xpqg (extraction code: psvm)
Versions

| Name | Version |
| --- | --- |
| Scala | 2.12.15 |
| Spark | 3.1.3 |
| Hadoop | 2.7.7 |
Scala
Download
https://www.scala-lang.org/download/2.12.15.html
Spark
https://spark.apache.org/downloads.html
Direct download address
https://dlcdn.apache.org/spark/spark-3.1.3/spark-3.1.3-bin-hadoop2.7.tgz
Set environment variables

Add the following entries:
| Key | Value |
| --- | --- |
| Path | D:\Tools\bigdata\spark-3.1.3-bin-hadoop2.7\bin |
| SPARK_LOCAL_DIRS | D:\Tools\bigdata\spark-3.1.3-bin-hadoop2.7\temp |
Among these, SPARK_LOCAL_DIRS sets the storage location for temporary files: when you run a jar file, for example, Spark first places the files in this temporary directory and deletes them after use.
Run
spark-shell
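Once the shell starts, a quick sanity check confirms that Spark is working. This snippet is our own sketch (not from the original article), entered at the scala> prompt:

// Distribute the numbers 1 to 100 and sum them up
val nums = sc.parallelize(1 to 100)
nums.reduce(_ + _)  // should return 5050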
Error when running

java.io.IOException: Failed to delete

This error may be reported when we submit a packaged Spark program. It occurs in Windows environments and has nothing to do with our program.

If you want to suppress the error, edit the log4j.properties file under %SPARK_HOME%/conf (if it does not exist, copy it from log4j.properties.template) and append the following lines:
log4j.logger.org.apache.spark.util.ShutdownHookManager=OFF
log4j.logger.org.apache.spark.SparkEnv=ERROR
Hadoop
https://archive.apache.org/dist/hadoop/common/hadoop-2.7.7/
Configure HADOOP_HOME and Path
| Key | Value |
| --- | --- |
| HADOOP_HOME | D:\Tools\bigdata\hadoop-2.7.7 |
| Path | D:\Tools\bigdata\hadoop-2.7.7\bin |
Configuration files

In D:\Tools\bigdata\hadoop-2.7.7\etc\hadoop, modify Hadoop's 4 main configuration files.
Modify core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/D:/Tools/bigdata/hadoop-2.7.7/workspace/tmp</value>
<description>Local Hadoop temporary folder for the namenode</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
<description>HDFS URI, in the form filesystem://namenode-host:port</description>
</property>
</configuration>
Modify hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- Set to 1 because this is a single-node Hadoop -->
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Number of replicas; the default is 3, and it should not exceed the number of datanode machines</description>
</property>
<property>
<name>dfs.data.dir</name>
<value>/D:/Tools/bigdata/hadoop-2.7.7/workspace/data</value>
<description>Physical storage location of data blocks on the datanode</description>
</property>
<property>
<name>dfs.name.dir</name>
<value>/D:/Tools/bigdata/hadoop-2.7.7/workspace/name</value>
<description>Where the namenode stores HDFS namespace metadata</description>
</property>
</configuration>
Modify mapred-site.xml (if it does not exist, copy mapred-site.xml.template and rename the copy to mapred-site.xml)
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>hdfs://localhost:9001</value>
</property>
</configuration>
Modify yarn-site.xml
<?xml version="1.0"?>
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
winutils
https://gitee.com/nkuhyx/winutils
Find the bin folder for the matching Hadoop version and copy its files over those in Hadoop's bin directory:
D:\Tools\bigdata\hadoop-2.7.7\bin
Create the project

Create a project with the project name WordCount.

Right-click the project name WordCount and click Add Framework Support in the pop-up menu.

Right-click the java directory, choose Refactor from the pop-up menu, then choose Rename, and in the dialog that appears change the java directory name to scala.

Add the class WordCount.
In the IDEA development interface, open pom.xml, clear its contents, and enter the following:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>cn.psvmc</groupId>
<artifactId>WordCount</artifactId>
<version>1.0</version>
<properties>
<maven.compiler.source>8</maven.compiler.source>
<maven.compiler.target>8</maven.compiler.target>
<spark.version>3.1.3</spark.version>
<scala.version>2.12</scala.version>
</properties>
<repositories>
<repository>
<id>alimaven</id>
<name>aliyun maven</name>
<url>http://maven.aliyun.com/nexus/content/groups/public/</url>
</repository>
</repositories>
<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_${scala.version}</artifactId>
<version>${spark.version}</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>net.alchim31.maven</groupId>
<artifactId>scala-maven-plugin</artifactId>
<version>3.4.6</version>
<executions>
<execution>
<goals>
<goal>compile</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-assembly-plugin</artifactId>
<version>3.0.0</version>
<configuration>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
</configuration>
<executions>
<execution>
<id>make-assembly</id>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
Test

Create the test file D:\spark_study\wordcount.txt with the following content:
good good study
day day up
Then open the WordCount.scala code file, clear its contents, and enter the following:
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    val inputFile = "file:///D:\\spark_study\\wordcount.txt"
    // Run locally on a single thread; the app name shows up in the Spark UI
    val conf = new SparkConf().setAppName("WordCount").setMaster("local")
    val sc = new SparkContext(conf)
    val textFile = sc.textFile(inputFile)
    // Split each line into words, map each word to (word, 1), then sum the counts per word
    val wordCount = textFile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey((a, b) => a + b)
    wordCount.foreach(println)
  }
}
When you run it, you can see the result:

(up,1)
(day,2)
(good,2)
(study,1)
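If you also want to persist the counts instead of only printing them, Spark's saveAsTextFile can write them out. This is a minimal sketch of ours; the output path is hypothetical, and the target directory must not already exist:

// Add after computing wordCount; each (word,count) pair becomes one line in the output files
wordCount.saveAsTextFile("file:///D:/spark_study/wordcount_output")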
Package and run

On the right side of the IDEA development interface, click the Maven icon to bring up the Maven tool window.

In the Maven tool window, click package to build the application into JAR packages.

Then, in the project directory tree on the left side of the IDEA development interface, you can see two JAR files under the target directory, namely WordCount-1.0.jar and WordCount-1.0-jar-with-dependencies.jar.

Then open a command prompt and execute the following command to run the JAR package:
spark-submit --class WordCount D:\Project\Spark\WordCount\target\WordCount-1.0-jar-with-dependencies.jar
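The usual spark-submit flags also apply here. For example, to pin the master explicitly (our own sketch, not from the original article; local[*] uses all local cores):

spark-submit --class WordCount --master local[*] D:\Project\Spark\WordCount\target\WordCount-1.0-jar-with-dependencies.jar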
Copyright notice

This article was created by [Sword Walker]. Please include a link to the original when reposting:
https://yzsam.com/2022/04/202204230952211795.html