Using IDEA to Develop a Spark Program
2022-04-23 09:55:00 【Sword Walker】
Windows Environment
Download address
Link: https://pan.baidu.com/s/1YczOo5novINV_MimJ9Xpqg (extraction code: psvm)
Versions

| Name | Version |
| --- | --- |
| Scala | 2.12.15 |
| Spark | 3.1.3 |
| Hadoop | 2.7.7 |
Scala
Download
https://www.scala-lang.org/download/2.12.15.html
Spark
https://spark.apache.org/downloads.html
Direct download address
https://dlcdn.apache.org/spark/spark-3.1.3/spark-3.1.3-bin-hadoop2.7.tgz
Set environment variables

Add the following entries:
| Key | Value |
| --- | --- |
| Path | D:\Tools\bigdata\spark-3.1.3-bin-hadoop2.7\bin |
| SPARK_LOCAL_DIRS | D:\Tools\bigdata\spark-3.1.3-bin-hadoop2.7\temp |
Among these, SPARK_LOCAL_DIRS sets the storage location for temporary files: when you run a jar file, for example, Spark first places the files in this temporary directory and deletes them after use.
Run
spark-shell
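Once the shell starts, a quick sanity check confirms that Spark is working. This snippet is our own sketch (not from the original article), entered at the scala> prompt:

// Distribute the numbers 1 to 100 and sum them up
val nums = sc.parallelize(1 to 100)
nums.reduce(_ + _)  // should return 5050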
Error when running

java.io.IOException: Failed to delete

This error may be reported when we submit a packaged Spark program. It occurs in Windows environments and has nothing to do with our program.

If you want to suppress the error, edit the log4j.properties file under %SPARK_HOME%/conf (if it does not exist, copy it from log4j.properties.template) and append the following lines:
log4j.logger.org.apache.spark.util.ShutdownHookManager=OFF
log4j.logger.org.apache.spark.SparkEnv=ERROR
Hadoop
https://archive.apache.org/dist/hadoop/common/hadoop-2.7.7/
Configure HADOOP_HOME and Path
| Key | Value |
| --- | --- |
| HADOOP_HOME | D:\Tools\bigdata\hadoop-2.7.7 |
| Path | D:\Tools\bigdata\hadoop-2.7.7\bin |
Configuration files

In D:\Tools\bigdata\hadoop-2.7.7\etc\hadoop, modify Hadoop's 4 main configuration files.
Modify core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/D:/Tools/bigdata/hadoop-2.7.7/workspace/tmp</value>
<description>Local Hadoop temporary folder for the namenode</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
<description>HDFS URI, in the form filesystem://namenode-host:port</description>
</property>
</configuration>
Modify hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- Set to 1 because this is a single-node Hadoop -->
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Number of replicas; the default is 3, and it should not exceed the number of datanode machines</description>
</property>
<property>
<name>dfs.data.dir</name>
<value>/D:/Tools/bigdata/hadoop-2.7.7/workspace/data</value>
<description>Physical storage location of data blocks on the datanode</description>
</property>
<property>
<name>dfs.name.dir</name>
<value>/D:/Tools/bigdata/hadoop-2.7.7/workspace/name</value>
<description>Where the namenode stores HDFS namespace metadata</description>
</property>
</configuration>
Modify mapred-site.xml (if it does not exist, copy mapred-site.xml.template and rename the copy to mapred-site.xml)
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>hdfs://localhost:9001</value>
</property>
</configuration>
Modify yarn-site.xml
<?xml version="1.0"?>
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
winutils
https://gitee.com/nkuhyx/winutils
Find the bin folder for the matching Hadoop version and copy its files over those in Hadoop's bin directory:
D:\Tools\bigdata\hadoop-2.7.7\bin
Create the project

Create a project with the project name WordCount.

Right-click the project name WordCount and click Add Framework Support in the pop-up menu.

Right-click the java directory, choose Refactor from the pop-up menu, then choose Rename, and in the dialog that appears change the java directory name to scala.

Add the class WordCount.
In the IDEA development interface, open pom.xml, clear its contents, and enter the following:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>cn.psvmc</groupId>
<artifactId>WordCount</artifactId>
<version>1.0</version>
<properties>
<maven.compiler.source>8</maven.compiler.source>
<maven.compiler.target>8</maven.compiler.target>
<spark.version>3.1.3</spark.version>
<scala.version>2.12</scala.version>
</properties>
<repositories>
<repository>
<id>alimaven</id>
<name>aliyun maven</name>
<url>http://maven.aliyun.com/nexus/content/groups/public/</url>
</repository>
</repositories>
<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_${scala.version}</artifactId>
<version>${spark.version}</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>net.alchim31.maven</groupId>
<artifactId>scala-maven-plugin</artifactId>
<version>3.4.6</version>
<executions>
<execution>
<goals>
<goal>compile</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-assembly-plugin</artifactId>
<version>3.0.0</version>
<configuration>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
</configuration>
<executions>
<execution>
<id>make-assembly</id>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
Test

Create the test file D:\spark_study\wordcount.txt with the following content:
good good study
day day up
Then open the WordCount.scala code file, clear its contents, and enter the following:
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    val inputFile = "file:///D:\\spark_study\\wordcount.txt"
    // Run locally on a single thread; the app name shows up in the Spark UI
    val conf = new SparkConf().setAppName("WordCount").setMaster("local")
    val sc = new SparkContext(conf)
    val textFile = sc.textFile(inputFile)
    // Split each line into words, map each word to (word, 1), then sum the counts per word
    val wordCount = textFile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey((a, b) => a + b)
    wordCount.foreach(println)
  }
}
When you run it, you can see the result:

(up,1)
(day,2)
(good,2)
(study,1)
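If you also want to persist the counts instead of only printing them, Spark's saveAsTextFile can write them out. This is a minimal sketch of ours; the output path is hypothetical, and the target directory must not already exist:

// Add after computing wordCount; each (word,count) pair becomes one line in the output files
wordCount.saveAsTextFile("file:///D:/spark_study/wordcount_output")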
Package and run

On the right side of the IDEA development interface, click the Maven icon to bring up the Maven tool window.

In the Maven tool window, click package to build the application into JAR packages.

Then, in the project directory tree on the left side of the IDEA development interface, you can see two JAR files under the target directory, namely WordCount-1.0.jar and WordCount-1.0-jar-with-dependencies.jar.

Then open a command prompt and execute the following command to run the JAR package:
spark-submit --class WordCount D:\Project\Spark\WordCount\target\WordCount-1.0-jar-with-dependencies.jar
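The usual spark-submit flags also apply here. For example, to pin the master explicitly (our own sketch, not from the original article; local[*] uses all local cores):

spark-submit --class WordCount --master local[*] D:\Project\Spark\WordCount\target\WordCount-1.0-jar-with-dependencies.jar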
Copyright notice

This article was created by [Sword Walker]. Please include a link to the original when reposting:
https://yzsam.com/2022/04/202204230952211795.html