
Pitfall notes on configuring DataX in DolphinScheduler

2022-04-23 13:41:00 Ruo Xiaoyu

1、Failed to create subdirectories under tmp/dolphinscheduler/exec/process

When DolphinScheduler dispatches a DataX task, it needs to create a series of temporary directories and files under tmp/dolphinscheduler/exec/process, but the worker run log /opt/soft/dolphinscheduler/logs/dolphinscheduler-worker.log showed that the creation failed:

[taskAppId=TASK-1-10-13]:[178] - datax task failure
java.io.IOException: Directory '/tmp/dolphinscheduler/exec/process/1/1/10/13' could not be created

This directory turned out to be owned by root, while my DolphinScheduler is installed under the dolphin user, so I changed the ownership of the tmp directory on that machine:

$ sudo chown -R dolphin:dolphin tmp
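
A quick way to confirm the fix took effect (a sketch, assuming the chown above was run from the filesystem root so that tmp refers to /tmp):

$ ls -ld /tmp/dolphinscheduler /tmp/dolphinscheduler/exec/process
# both should now be owned by dolphin:dolphin, the user the worker runs as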

2、DataX environment variable problem

When dispatching a DataX task with DolphinScheduler, the data source and the task could be created successfully, but every run failed and the log could not be viewed directly. After logging in to the worker machine that ran the task and checking the log file /opt/soft/dolphinscheduler/logs/dolphinscheduler-worker.log, I saw the error:

[INFO] 2021-11-09 11:25:35.446 - [taskAppId=TASK-1-11-14]:[138] - -> python2.7: can't open file '/opt/soft/datax/bin/datax.py/bin/datax.py': [Errno 20] Not a directory
This means the DataX path is configured incorrectly and the file cannot be found.
Check the environment configuration: vim /opt/soft/dolphinscheduler/conf/env/
The path below was the earlier official default. It should no longer point down to bin or to the executable file; only the DataX installation directory is needed.
Change
export DATAX_HOME=/opt/soft/datax/bin/datax.py
to
export DATAX_HOME=/opt/soft/datax
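
The doubled path in the error above shows why: the worker appends /bin/datax.py to DATAX_HOME when it assembles the DataX command, so DATAX_HOME must point at the installation root only. A rough sketch of the resulting command (the job file path here is illustrative):

$ export DATAX_HOME=/opt/soft/datax
$ python2.7 ${DATAX_HOME}/bin/datax.py /tmp/dolphinscheduler/exec/process/.../job.json
# with the old value DATAX_HOME=/opt/soft/datax/bin/datax.py this would expand to
# /opt/soft/datax/bin/datax.py/bin/datax.py, which is exactly the "Not a directory" error above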

After saving, rerun the task. This time the rerun succeeds.

3、DolphinScheduler dispatching DataX for a MySQL-to-Hive data exchange. The default data source selection only supports relational databases such as MySQL, so you need to choose the custom template and write the JSON yourself, filling in the connection address and other information.

Configuration template (this is the configuration of my final working version; some parameters need to be adjusted to your own environment):

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "column": [
                            "*"
                        ],
                        "connection": [
                            {
                                "jdbcUrl": [
                                    "jdbc:mysql://xx.xx.xx.xx:3306/datatest?useUnicode=true&characterEncoding=utf8&useSSL=false"
                                ],
                                "table": [
                                    "test_table_info"
                                ]
                            }
                        ],
                        "password": "cloud",
                        "username": "root",
                        "where": ""
                    }
                },
                "writer": {
                    "name": "hdfswriter",
                    "parameter": {
                        "column": [
                            {
                                "name": "order_id",
                                "type": "string"
                            },
                            {
                                "name": "str_cd",
                                "type": "string"
                            },
                            {
                                "name": "gds_cd",
                                "type": "string"
                            },
                            {
                                "name": "pay_amnt",
                                "type": "string"
                            },
                            {
                                "name": "member_id",
                                "type": "string"
                            },
                            {
                                "name": "statis_date",
                                "type": "string"
                            }
                        ],
                        "compress": "",
                        "defaultFS": "hdfs://<your hdfs namenode address>:9000",
                        "fieldDelimiter": ",",
                        "fileName": "hive_test_table_info",
                        "fileType": "text",
                        "path": "/hive/hive.db/hive_test_table_info",
                        "writeMode": "append"
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": "1"
            }
        }
    }
}
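
Before wiring the JSON into DolphinScheduler, it can help to run it directly with DataX on the worker to rule out mistakes in the file itself. A minimal sketch, assuming the job is saved as /tmp/mysql2hive.json (a name chosen only for illustration):

$ cd /opt/soft/datax
$ python bin/datax.py /tmp/mysql2hive.json
# if the configuration is valid, DataX prints the read/write statistics at the end of the run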

3.1、The first problem during task execution: an IO exception when establishing the connection to HDFS

 [job-0] ERROR Engine - DataX intelligent analysis: the most likely cause of this task's failure is:
com.alibaba.datax.common.exception.DataXException: Code:[HdfsWriter-06], Description:[IO exception occurred while establishing a connection with HDFS.].
- java.net.ConnectException: Call From xxxxx/10.xx.xx.xx to 10.xx.1xx.1xx:8020 failed on connection exception: java.net.ConnectException: Connection refused;
- For more details see: http://wiki.apache.org/hadoop/ConnectionRefused

I had configured HDFS port 8020 following an online template, but our cluster actually uses port 9000, so here I changed 8020 to 9000.
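
If you are not sure which port the NameNode listens on, the value in defaultFS should match fs.defaultFS in core-site.xml on the Hadoop cluster; a quick check, assuming a standard Hadoop 2.x/3.x layout under $HADOOP_HOME:

$ grep -A 1 "fs.defaultFS" $HADOOP_HOME/etc/hadoop/core-site.xml
# e.g. <value>hdfs://namenode-host:9000</value> -> use that host and port in "defaultFS"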

3.2、The second problem: again an IO exception when establishing the connection to HDFS, but with different content

 ERROR JobContainer - Exception when job run
com.alibaba.datax.common.exception.DataXException: Code:[HdfsWriter-06], Description:[IO exception occurred while establishing a connection with HDFS.]. - org.apache.hadoop.security.AccessControlException: Permission denied: user=developer01, access=WRITE, inode="/hive/hive.db/hive_test_table_info":bigdata:supergroup:drwxr-xr-x
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:399)

The tenant I used when running the task was developer01, while write permission on the HDFS directory of the Hive table belongs to the bigdata user. So before running the task, configure and select a tenant with the corresponding permissions.
In the Security Center – Tenant Management menu, configure a bigdata tenant.
Then edit this DataX task and select the bigdata tenant when saving it.
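
To see who actually owns the target path, and, as an alternative to switching tenants (only an option, not what I did here), to grant the submitting user write access, the standard HDFS shell commands can be used:

$ hdfs dfs -ls /hive/hive.db | grep hive_test_table_info
# shows the owner bigdata:supergroup and mode drwxr-xr-x from the error above
$ hdfs dfs -chown -R developer01 /hive/hive.db/hive_test_table_info   # optional: hand this table directory to developer01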

3.3、The third problem during execution was a MySQL connection problem. I had run into it before when using DataX for MySQL-to-MySQL jobs; when writing this custom JSON I did not pay attention to it and only remembered when I saw the log.

 ERROR RetryUtil - Exception when calling callable, exception Msg: DataX is unable to connect to the corresponding database, probably because: 1) the configured ip/port/database/jdbc is wrong and the connection cannot be established; 2) the configured username/password is wrong and authentication failed. Please confirm the database connection information with your DBA.

The fix is simply to add useSSL=false to the JDBC connection configuration.
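
For reference, this is the parameter already present in the working jdbcUrl of the template above:

jdbc:mysql://xx.xx.xx.xx:3306/datatest?useUnicode=true&characterEncoding=utf8&useSSL=false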

Copyright notice
This article was written by [Ruo Xiaoyu]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/04/202204230602186898.html