Apache SeaTunnel 2.1.0 deployment and pitfalls
2022-04-23 13:42:00 【Ruo Xiaoyu】
Introduction
SeaTunnel was formerly named Waterdrop; it was renamed SeaTunnel on October 12, 2021.
SeaTunnel is an easy-to-use, high-performance distributed data integration platform that supports real-time synchronization of massive data. It can synchronize tens of billions of records per day, stably and efficiently, and is already used in production by nearly 100 companies.
Features
- Easy to use, with flexible configuration and low-code development
- Real-time streaming
- Offline multi-source data analysis
- High performance and massive data processing capability
- Modular and plug-in architecture, easy to extend
- Data processing and aggregation via SQL
- Supports Spark Structured Streaming
- Supports Spark 2.x
- Pitfall: our test environment's Spark had already been upgraded to 3.x, but SeaTunnel currently only supports Spark 2.x, so we had to deploy a separate Spark 2.x (see the version check sketch after this list)
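Before installing anything, it is worth confirming which Spark version the job will actually run against. A minimal sketch, assuming the Spark 2.4.8 / Hadoop 2.7 build from the Apache archive (the version and target directory are assumptions; substitute whatever 2.x build matches your Hadoop):

$SPARK_HOME/bin/spark-submit --version   # should report 2.x, not 3.x
# Download and unpack a Spark 2.x build (URL and paths are assumed examples)
wget https://archive.apache.org/dist/spark/spark-2.4.8/spark-2.4.8-bin-hadoop2.7.tgz
tar -zxvf spark-2.4.8-bin-hadoop2.7.tgz -C /opt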
Workflow
Installation
Installation documentation:
https://seatunnel.incubator.apache.org/docs/2.1.0/spark/installation
- Environment preparation: install JDK and Spark
- Download the installation package:
- https://www.apache.org/dyn/closer.lua/incubator/seatunnel/2.1.0/apache-seatunnel-incubating-2.1.0-bin.tar.gz
- Unpack the archive and edit config/seatunnel-env.sh
- Specify the required environment variables there, e.g. SPARK_HOME (the directory where Spark was downloaded and unpacked); a sketch follows this list
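A minimal seatunnel-env.sh might look like the following; the Spark path is an assumption for illustration, so point it at your own Spark 2.x directory:

# config/seatunnel-env.sh
# Point SeaTunnel at the Spark 2.x installation; the path below is an assumed example
SPARK_HOME=${SPARK_HOME:-/opt/spark-2.4.8-bin-hadoop2.7}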
1. Test jdbc-to-jdbc
- Create a new config/spark.batch.jdbc.to.jdbc.conf file:
env {
  # Spark job settings: application name and executor resources
  spark.app.name = "SeaTunnel"
  spark.executor.instances = 1
  spark.executor.cores = 1
  spark.executor.memory = "1g"
}
source {
  jdbc {
    driver = "com.mysql.jdbc.Driver"
    url = "jdbc:mysql://0.0.0.0:3306/database?useUnicode=true&characterEncoding=utf8&useSSL=false"
    table = "table_name"
    result_table_name = "result_table_name"
    user = "root"
    password = "password"
  }
}
transform {
  # Split data by a specific delimiter.
  # You can also use other filter plugins, such as sql:
  # sql {
  #   sql = "select * from accesslog where request_time > 1000"
  # }
  # For more information about configuring SeaTunnel and the full list of filter plugins, see
  # https://seatunnel.apache.org/docs/spark/configuration/transform-plugins/Sql
}
sink {
  # Choose the stdout output plugin to print data to the console:
  # Console {}
  jdbc {
    # The driver parameter must be configured here, otherwise the data exchange fails (see the pitfall below)
    driver = "com.mysql.jdbc.Driver",
    saveMode = "update",
    url = "jdbc:mysql://ip:3306/database?useUnicode=true&characterEncoding=utf8&useSSL=false",
    user = "userName",
    password = "***********",
    dbTable = "tableName",
    customUpdateStmt = "INSERT INTO table (column1, column2, created, modified, yn) values(?, ?, now(), now(), 1) ON DUPLICATE KEY UPDATE column1 = IFNULL(VALUES (column1), column1), column2 = IFNULL(VALUES (column2), column2)"
  }
}
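One practical note (an assumption beyond the original article, but standard for Spark JDBC jobs): Spark needs the MySQL JDBC driver jar on its classpath for both the source and the sink, and SeaTunnel does not bundle it. One common approach is to copy the connector jar into Spark's jars directory, for example:

# Make the MySQL driver visible to Spark (jar version and path are assumed examples)
cp mysql-connector-java-5.1.49.jar $SPARK_HOME/jars/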
Start command for YARN:
./bin/start-seatunnel-spark.sh --master 'yarn' --deploy-mode client --config ./config/spark.batch.jdbc.to.jdbc.conf
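For a quick smoke test without a YARN cluster, the same script also accepts Spark's local master (this mirrors the SeaTunnel quick-start docs; adjust the core count to your machine):

./bin/start-seatunnel-spark.sh --master 'local[4]' --deploy-mode client --config ./config/spark.batch.jdbc.to.jdbc.conf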
Pitfall: at runtime the job failed with "please specify [driver] as non-empty"; after digging in, we found that the driver parameter must be set in the sink configuration:
ERROR Seatunnel:121 - Plugin[org.apache.seatunnel.spark.sink.Jdbc] contains invalid config, error: please specify [driver] as non-empty
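A related note (an assumption beyond the original article): the com.mysql.jdbc.Driver class name used above belongs to MySQL Connector/J 5.x; with the 8.x connector the class is com.mysql.cj.jdbc.Driver, and the old name only works with a deprecation warning.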
Copyright notice
This article was written by [Ruo Xiaoyu]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/04/202204230602186365.html