DolphinScheduler Spark task scheduling: a record of pitfalls
2022-04-23 13:42:00 [Ruo Xiaoyu]
1. Worker deployment for Spark scheduling
My test DolphinScheduler runs in cluster mode: two machines are deployed as master and two as worker, while Hadoop and Spark are deployed on other machines. This made it confusing how to set the Spark environment path in the dolphinscheduler_env.sh file, and the first problem in test scheduling was that spark-submit could not be found:
command: line 5: /bin/spark-submit: No such file or directory
The scheduling log makes it clear that DS looks for the spark-submit script via the SPARK_HOME1 configured in dolphinscheduler_env.sh; since Spark lives on a different server, that path cannot be found.
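For reference, the relevant entry in dolphinscheduler_env.sh looks roughly like this. The paths below are illustrative assumptions, not my actual install; DS prefixes ${SPARK_HOME1}/bin onto spark-submit when it builds the submit command, which is why a wrong value produces the error above.

```shell
# dolphinscheduler_env.sh (fragment; /opt/soft/spark is an illustrative path)
export SPARK_HOME1=/opt/soft/spark        # must point at a real Spark install on THIS worker
export PATH=$SPARK_HOME1/bin:$PATH        # so spark-submit resolves for scheduled tasks
```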
So I considered two solutions:
1. Copy the Spark installation package onto the worker, but this may also drag in Hadoop YARN and related configuration.
2. Deploy another DolphinScheduler worker on the machine where the Spark client is deployed, so that only DS's own configuration needs to be considered.
I finally chose option two.
Besides copying the installation files of a worker node onto the Spark client machine, also follow the relevant steps from the installation guide:
- Create the same user as on the other nodes, e.g. dolphinscheduler
- Grant ownership of the DS installation directory to the dolphinscheduler user
- Update the /etc/hosts file on every node
- Set up passwordless SSH between the nodes
- Update the dolphinscheduler/conf/common.properties configuration on every DS node
- Create the directories referenced by the configuration file and grant permissions, e.g. the /tmp/dolphinscheduler directory
- Configure dolphinscheduler_env.sh on the new worker node, adding the SPARK_HOME path
- Restart the cluster.
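The setup steps above for the extra worker on the Spark client machine can be sketched as root-level shell commands. All hostnames, IPs, and paths here are placeholders for illustration, not my actual values; adapt each line to your cluster before running.

```shell
# Run as root on the Spark client machine (all values below are placeholders)
useradd dolphinscheduler                              # same user as on the other nodes
chown -R dolphinscheduler:dolphinscheduler /opt/soft/dolphinscheduler   # DS install dir
echo "192.168.0.10 ds-master1" >> /etc/hosts          # one entry per cluster node
su - dolphinscheduler -c "ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa"     # then distribute the key
mkdir -p /tmp/dolphinscheduler                        # directory named in common.properties
chown -R dolphinscheduler:dolphinscheduler /tmp/dolphinscheduler
```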
2. spark-submit execute-permission problem
During task submission and execution, my Spark test task also operates on HDFS, so it runs under the bigdata tenant, which has the HDFS permissions.
Running the Spark task failed with:
/opt/soft/spark/bin/spark-submit: Permission denied
At first I thought the wrong tenant had been chosen, but bigdata is deployed together with Hadoop, and the bigdata user also has Spark permissions, so it was clearly not a user problem. That points to spark-submit lacking execute permission, so grant the user execute permission:
- chmod 755 spark
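This failure mode is easy to reproduce in a sandbox: a script that is readable but lacks the execute bit fails exactly like the copied spark-submit did, and chmod 755 fixes it. The file name below is illustrative.

```shell
# Reproduce "Permission denied" on a script without the execute bit
tmp=./perm_demo.sh
printf '#!/bin/sh\necho ok\n' > "$tmp"
chmod 644 "$tmp"                       # readable/writable, but NOT executable
"$tmp" 2>/dev/null || echo "Permission denied"
chmod 755 "$tmp"                       # the same fix applied to the Spark bin files
"$tmp"                                 # now runs and prints "ok"
rm -f "$tmp"
```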
3. The Spark task clearly succeeded, but the DS interface still shows failure
While the task ran, I found that my Spark job had written the processed files to the HDFS directory, consistent with my task logic. The log shows the task succeeded, but it contains one error:
[ERROR] 2021-11-15 16:16:26.012 - [taskAppId=TASK-3-43-72]:[418] - yarn applications: application_1636962361310_0001 , query status failed, exception:{}
java.lang.Exception: yarn application url generation failed
at org.apache.dolphinscheduler.common.utils.HadoopUtils.getApplicationUrl(HadoopUtils.java:208)
at org.apache.dolphinscheduler.common.utils.HadoopUtils.getApplicationStatus(HadoopUtils.java:418)
at org.apache.dolphinscheduler.server.worker.task.AbstractCommandExecutor.isSuccessOfYarnState(AbstractCommandExecutor.java:404)
From this stack trace we can see that DS queries the application status from a YARN URL and uses that status to display the execution result. Since it failed to get the status, the next step is to see where DS queries it and whether that address is configurable.
Checking the source code, I found the HadoopUtils.getApplicationUrl method:
appAddress is read from the yarn.application.status.address configuration parameter.
The default configuration in the source code says the default value can be kept in HA mode, but my YARN is not installed on ds1, so it has to be changed to my own YARN address.
Configure this parameter in the /opt/soft/dolphinscheduler/conf/common.properties file on the worker node that schedules Spark:
# if resourcemanager HA is enabled or not use resourcemanager, please keep the default value; If resourcemanager is single, you only need to replace ds1 to actual resourcemanager hostname
yarn.application.status.address=http://ds1:%s/ws/v1/cluster/apps/%s
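In the DS source I looked at, the two %s placeholders in this template are filled with the ResourceManager HTTP port and the application id (the port 8088 below is the common default and an assumption for this sketch). A minimal shell sketch of how the final status URL is built:

```shell
# Sketch: how DS expands yarn.application.status.address (port and app id are examples)
TEMPLATE="http://ds1:%s/ws/v1/cluster/apps/%s"
RM_PORT=8088
APP_ID="application_1636962361310_0001"
URL=$(printf "$TEMPLATE" "$RM_PORT" "$APP_ID")
echo "$URL"
# → http://ds1:8088/ws/v1/cluster/apps/application_1636962361310_0001
```

Replace ds1 with your actual ResourceManager host; a plain curl against the resulting URL is a quick way to verify the address responds before restarting DS.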
Copyright notice
This article was written by [Ruo Xiaoyu]; please include the original link when reposting. Thanks.
https://yzsam.com/2022/04/202204230602186775.html