PySpark resource configuration
2022-08-08 21:26:00 Code_LT
In Python, you may want to specify the resources a Spark job uses, just as you would for a Scala job, e.g.:
spark-submit \
--principal $principal \
--keytab $keytab \
--name Test \
--master yarn --deploy-mode cluster \
--num-executors 10 \
--executor-cores 4 \
--executor-memory 16G \
--driver-memory 16G \
--conf spark.locality.wait=10 \
--conf spark.serializer="org.apache.spark.serializer.KryoSerializer" \
--conf spark.streaming.backpressure.enabled=true \
--conf spark.task.maxFailures=8 \
--conf spark.driver.maxResultSize=8G \
--conf spark.default.parallelism=500 \
--conf spark.sql.shuffle.partitions=300 \
--conf spark.sql.autoBroadcastJoinThreshold=-1 \
--conf spark.sql.broadcastTimeout=3000 \
--conf spark.yarn.submit.waitAppCompletion=true \
--conf spark.yarn.report.interval=6000 \
--conf spark.driver.extraClassPath=$localroot/config \
--conf spark.executor.userClassPathFirst=true \
--conf spark.hbase.obtainToken.enabled=true \
--conf spark.yarn.security.credentials.hbase.enabled=true \
--conf spark.executor.extraJavaOptions="${executorJavaOpts:34}" \
--conf spark.yarn.cluster.driver.extraJavaOptions="${driverJavaOpts:45}" \
--conf spark.driver.userClassPathFirst=true \
--conf spark.yarn.dist.innerfiles=$SPARK_HOME/conf/log4j-executor.properties,$SPARK_HOME/conf/jaas-zk.conf,$SPARK_HOME/conf/carbon.properties,$SPARK_HOME/conf/jets3t.properties,$SPARK_HOME/conf/topology.properties,$SPARK_HOME/conf/mapred-site.xml \
--files $localroot/config/logback.xml,\
$localroot/config/hbase-site.xml,$localroot/config/kdc.conf,${uploadFiles} \
--class com.huawei.rcm.newsfeed.boxrcm.nearby.CityInfoParse \
--jars $localroot/lib/bcprov-ext-jdk15on-1.68.jar,\
$localroot/lib/CryptoUtil-1.1.5.304.jar,\
$localroot/lib/commons-pool2-2.8.1.jar,\
$localroot/lib/jedis-2.9.0.jar,$localroot/lib/com.huawei.dcs.dcsdk.core-1.6.18.101.jar,$localroot/lib/com.huawei.dcs.dcsdk.support.onejar-1.6.18.101.jar,$localroot/lib/gpaas-middleware-common-2.2.5.101.jar,\
$app_jarpath \
$arg1 \
$arg2 \
$arg3
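For scale: with --num-executors 10, --executor-cores 4 and --executor-memory 16G, this submission asks YARN for 10 × 4 = 40 executor cores and 10 × 16 GB = 160 GB of executor memory in total (plus per-executor memory overhead), on top of the 16 GB driver.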
But it is hard to find documentation on how to specify these resources for a PySpark job. In fact it is very similar; the method is as follows:
keytab_path=/home/testuser/wdbkeytab/user.keytab
anaconda_archive=hdfs://teset/anaconda3.tar.gz#anaconda_pack
application_name=task_id_$1_$2
pyspark_python="./anaconda_pack/anaconda3/bin/python"
spark-submit --master yarn --deploy-mode cluster --name $application_name \
--driver-cores 2 \
--driver-memory 64G \
--queue $queue \
--num-executors 50 \
--executor-memory 3g \
--executor-cores 2 \
--principal $principal \
--keytab $keytab_path \
--archives $anaconda_archive \
--conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=$pyspark_python \
--conf spark.executorEnv.PYSPARK_PYTHON=$pyspark_python \
--conf spark.driver.maxResultSize=10G \
--conf spark.default.parallelism=1000 \
--conf spark.speculation=true \
--conf spark.speculation.interval=60000 \
--conf spark.speculation.quantile=0.85 \
--conf spark.hadoop.mapreduce.input.fileinputformat.input.dir.recursive=true \
--conf spark.security.credentials.hbase.enabled=true \
--conf spark.hadoop.validateOutputSpecs=false \
--conf spark.yarn.user.classpath.first=true \
--conf spark.executor.memoryOverhead=40960 \
--conf spark.yarn.am.waitTime=1000s \
--py-files test.py \
main.py  # main application script; placeholder name, replace with your entry-point file
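The PySpark-specific pieces are --archives and the two PYSPARK_PYTHON settings: YARN ships anaconda3.tar.gz to every node and unpacks it under the alias given after the # (here anaconda_pack), and spark.yarn.appMasterEnv.PYSPARK_PYTHON together with spark.executorEnv.PYSPARK_PYTHON point the application master and the executors at the interpreter inside that unpacked archive, so every node runs the same Python environment without a cluster-wide install.

If you launch from Python code rather than through a shell script, most of the same options can also be set programmatically before the session starts. A minimal sketch, assuming YARN client mode and no pre-existing SparkSession (the app name is a placeholder, not from the original script):

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("pyspark_resource_demo")          # placeholder name
    .master("yarn")
    .config("spark.executor.instances", "50")  # equivalent of --num-executors 50
    .config("spark.executor.cores", "2")       # equivalent of --executor-cores 2
    .config("spark.executor.memory", "3g")     # equivalent of --executor-memory 3g
    .config("spark.driver.maxResultSize", "10g")
    .config("spark.default.parallelism", "1000")
    .config("spark.speculation", "true")
    .getOrCreate()
)

# Check which values actually took effect
print(spark.sparkContext.getConf().get("spark.executor.memory"))
spark.stop()

Options tied to driver JVM startup (e.g. driver memory in client mode) still have to be set at spark-submit time, because the driver JVM is already running by the time this code executes.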