Apache SeaTunnel 2.1.0 Deployment and Pitfalls
2022-04-23 13:42:00 【Ruo Xiaoyu】
Brief introduction
SeaTunnel was originally named Waterdrop and was renamed SeaTunnel on October 12, 2021.
SeaTunnel is an easy-to-use, high-performance distributed data integration platform that supports real-time synchronization of massive amounts of data. It can stably and efficiently synchronize tens of billions of records every day, and nearly 100 companies use it in production.
Features
- Easy to use, flexible configuration, low-code development
- Real-time streaming
- Offline multi-source data analysis
- High performance and massive data processing capabilities
- Modular, plug-in based architecture that is easy to extend
- Data processing and aggregation with SQL
- Support for Spark Structured Streaming
- Support for Spark 2.x
- Pitfall: our test Spark environment had already been upgraded to 3.x, but SeaTunnel 2.1.0 only supports Spark 2.x, so we had to deploy a separate Spark 2.x environment.
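Because SeaTunnel 2.1.0 only works with Spark 2.x, it can save time to check the Spark version before submitting a job. Below is a minimal shell sketch, assuming the version string has the usual `major.minor.patch` form; the function name `require_spark2` is hypothetical and is not part of SeaTunnel.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of SeaTunnel): succeed only if the given
# Spark version string belongs to the 2.x line.
require_spark2() {
  local version="$1"
  # Everything before the first dot is treated as the major version.
  local major="${version%%.*}"
  if [ "$major" = "2" ]; then
    return 0
  fi
  echo "Spark '$version' is not supported by SeaTunnel 2.1.0; install a Spark 2.x release" >&2
  return 1
}
```

For example, `require_spark2 "2.4.8"` succeeds and `require_spark2 "3.2.1"` fails with a message. The version itself can be read from the banner printed by `"$SPARK_HOME/bin/spark-submit" --version`.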
Workflow

Installation
Installation documentation:
https://seatunnel.incubator.apache.org/docs/2.1.0/spark/installation
- Prepare the environment: install the JDK and Spark (a 2.x release)
- Download the installation package:
- https://www.apache.org/dyn/closer.lua/incubator/seatunnel/2.1.0/apache-seatunnel-incubating-2.1.0-bin.tar.gz
- Decompress it and edit config/seatunnel-env.sh
- Specify the necessary environment settings, for example SPARK_HOME (the directory where Spark was downloaded and unpacked)
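For reference, the edited `config/seatunnel-env.sh` could look roughly like the sketch below. The path `/opt/spark-2.4.8-bin-hadoop2.7` is a hypothetical example of a Spark 2.x install directory; replace it with wherever Spark was actually unpacked. The `${VAR:-default}` form keeps any `SPARK_HOME` that is already exported in the shell.

```shell
# config/seatunnel-env.sh (sketch)
# Home directory of the Spark 2.x distribution used by SeaTunnel 2.1.0.
# The default path below is a hypothetical example; adjust it to your install.
SPARK_HOME=${SPARK_HOME:-/opt/spark-2.4.8-bin-hadoop2.7}
```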
1. Test jdbc-to-jdbc
- Create a new file config/spark.batch.jdbc.to.jdbc.conf:
env {
# Spark job settings: application name and executor resources
spark.app.name = "SeaTunnel"
spark.executor.instances = 1
spark.executor.cores = 1
spark.executor.memory = "1g"
}
source {
jdbc {
driver = "com.mysql.jdbc.Driver"
url = "jdbc:mysql://0.0.0.0:3306/database?useUnicode=true&characterEncoding=utf8&useSSL=false"
table = "table_name"
result_table_name = "result_table_name"
user = "root"
password = "password"
}
}
transform {
# No transform is applied in this example; rows are passed through unchanged
# You can also use transform plugins such as sql
# sql {
# sql = "select * from accesslog where request_time > 1000"
# }
# If you would like more information about configuring seatunnel and the full list of transform plugins,
# please go to https://seatunnel.apache.org/docs/spark/configuration/transform-plugins/Sql
}
sink {
# choose stdout output plugin to output data to console
# Console {}
jdbc {
# The driver parameter must be set here, otherwise the data exchange will fail
driver = "com.mysql.jdbc.Driver",
saveMode = "update",
url = "jdbc:mysql://ip:3306/database?useUnicode=true&characterEncoding=utf8&useSSL=false",
user = "userName",
password = "***********",
dbTable = "tableName",
customUpdateStmt = "INSERT INTO tableName (column1, column2, created, modified, yn) values(?, ?, now(), now(), 1) ON DUPLICATE KEY UPDATE column1 = IFNULL(VALUES (column1), column1), column2 = IFNULL(VALUES (column2), column2)"
}
}
Start command (YARN)
./bin/start-seatunnel-spark.sh --master 'yarn' --deploy-mode client --config ./config/spark.batch.jdbc.to.jdbc.conf
Pitfall: the job failed with "please specify [driver] as non-empty". It turned out that the driver parameter must also be set in the sink configuration, not only in the source:
ERROR Seatunnel:121 - Plugin[org.apache.seatunnel.spark.sink.Jdbc] contains invalid config, error: please specify [driver] as non-empty
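To catch this error before submitting to YARN, a small pre-flight check can scan the `sink` block of the config file for a `driver` setting. This is a rough shell/awk sketch that assumes the simple layout used above (a top-level `sink {` line and one setting per line); it is not a real HOCON parser, and the function name `sink_has_driver` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of SeaTunnel): succeed if the
# top-level "sink { ... }" block of a config file sets a "driver" key.
# Brace counting is naive: braces inside quoted strings are not handled.
sink_has_driver() {
  local conf="$1"
  awk '
    # Enter the sink block on a line that starts with "sink {".
    !in_sink && /^[ \t]*sink[ \t]*[{]/ { in_sink = 1; depth = 0 }
    in_sink {
      # Ignore comment lines, so "# driver = ..." does not count.
      if ($0 ~ /^[ \t]*#/) next
      if ($0 ~ /^[ \t]*driver[ \t]*=/) found = 1
      # Track brace nesting to know when the sink block closes.
      depth += gsub(/[{]/, "{") - gsub(/[}]/, "}")
      if (depth <= 0) in_sink = 0
    }
    END { exit (found ? 0 : 1) }
  ' "$conf"
}
```

Usage: `sink_has_driver ./config/spark.batch.jdbc.to.jdbc.conf || echo "sink block is missing driver"`. Only the sink block is checked, because the source block already sets `driver` in this example and that is exactly what made the error easy to overlook.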

Copyright notice
This article was written by [Ruo Xiaoyu]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/04/202204230602186365.html