Oracle通过DataX导入StarRocks

背景

测试Oracle数据导入到StarRocks

环境准备

StarRocks:

Fe:172.26.92.139:/home/disk2/jingdan/starrocks-1.18.2-fe

Be:172.26.92.139/172.26.92.154/172.26.92.155:/home/disk2/jingdan/starrocks-1.18.2-be

Oracle:

172.26.92.139:docker exec -it oracle bash

Oracle安装

为了方便测试,本次使用docker部署

docker pull registry.aliyuncs.com/helowin/oracle_11g
docker run -d -p 1521:1521 --name oracle registry.aliyuncs.com/helowin/oracle_11g
docker exec -it oracle bash

这里root密码为helowin

声明环境变量,将以下内容追加到vi /etc/profile

export ORACLE_HOME=/home/oracle/app/oracle/product/11.2.0/dbhome_2
export ORACLE_SID=helowin
export PATH=$ORACLE_HOME/bin:$PATH
source /etc/profile
#退出到oracle账号,记得需要在oracle再source一下
source /etc/profile
#登录
sqlplus /nolog

创建tablespace

SQL>create tablespace demo datafile 'demo.dbf' 1024M;

创建用户并授权

SQL>create user demo identified by demo default tablespace DEMO;#这块DEMO为上一步创建的tablespace,必须为大写
SQL>grant connect,resource,dba to demo;

创建测试表并导入数据

使用上一步创建的用户登陆

sqlplus /nolog
SQL>conn demo@helowin

SQL>CREATE TABLE emp_demo(
empno NUMBER(4,0),
ename VARCHAR2(10),
job VARCHAR2(9),
mgr NUMBER(4,0),
hiredate DATE,
sal NUMBER(7,2),
comm NUMBER(7,2),
deptno NUMBER(2,0)
);

#写入数据
SQL>INSERT INTO emp_demo
(empno,ename,job,mgr,hiredate,sal,comm,deptno)
VALUES(7369,'SMITH','CLERK',7902,TO_DATE('1980-12-17','YYYY-MM-DD'),800,NULL,20);
INSERT INTO emp_demo
VALUES(7499,'ALLEN','SALESMAN',7698,TO_DATE('1981-2-20','YYYY-MM-DD'),1600,300,30);
INSERT INTO emp_demo
VALUES(7521,'WARD','SALESMAN',7698,TO_DATE('1981-2-22','YYYY-MM-DD'),1250,500,30);
INSERT INTO emp_demo
VALUES(7566,'JONES','MANAGER',7839,TO_DATE('1981-4-2','YYYY-MM-DD'),2975,NULL,20);
INSERT INTO emp_demo
VALUES(7564,'MARTIN','SALESMAN',7698,TO_DATE('1981-9-28','YYYY-MM-DD'),1250,1400,30);
INSERT INTO emp_demo
VALUES(7698,'BLAKE','MANAGER',7839,TO_DATE('1981-5-1','YYYY-MM-DD'),2850,NULL,30);
INSERT INTO emp_demo
VALUES(7782,'CLARK','MANAGER',7839,TO_DATE('1981-6-9','YYYY-MM-DD'),2450,NULL,10);
INSERT INTO emp_demo
VALUES(7788,'SCOTT','ANALYST',7566,TO_DATE('1987-4-19','YYYY-MM-DD'),3000,NULL,20);
INSERT INTO emp_demo
VALUES(7839,'KING','PRESIDENT',NULL,TO_DATE('1981-11-17','YYYY-MM-DD'),5000,NULL,10);
INSERT INTO emp_demo
VALUES(7844,'TURNER','SALESMAN',7698,TO_DATE('1981-9-8','YYYY-MM-DD'),1500,0,30);
INSERT INTO emp_demo
VALUES(7876,'ADAMS','CLERK',7788,TO_DATE('1987-5-23','YYYY-MM-DD'),1100,NULL,20);
INSERT INTO emp_demo
VALUES(7900,'JAMES','CLERK',7698,TO_DATE('1981-12-3','YYYY-MM-DD'),950,NULL,30);
INSERT INTO emp_demo
VALUES(7902,'FORD','ANALYST',7566,TO_DATE('1981-12-3','YYYY-MM-DD'),3000,NULL,20);
INSERT INTO emp_demo
VALUES(7934,'MILLER','CLERK',7782,TO_DATE('1982-1-23','YYYY-MM-DD'),1300,NULL,10);

Datax准备

#下载解压datax
wget http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz
tar -xf datax.tar.gz

#下载解压starrockswriter

https://github.com/StarRocks/DataX/releases

tar -xf starrockswriter.tar.gz
#将starrockswriter放至datax插件目录
mv starrockswriter datax/plugin/writer/

模拟测试

  1. datax配置
{
    "job": {
        "setting": {
            "speed": {
                 "channel": 1
            },
            "errorLimit": {
                "record": 0,
                "percentage": 0
            }
        },
        "content": [
            {
                "reader": {
                    "name": "oraclereader",
                    "parameter": {
                        "username": "demo",
                        "password": "demo",
                        "column": [ "empno","ename","job","mgr","hiredate","sal","comm","deptno"],
                        "connection": [
                            {
                                "table": [ "EMP_DEMO"],
                                "jdbcUrl": [
                                     "jdbc:oracle:thin:@127.0.0.1:1521:helowin"  #"jdbc:oracle:thin:@$hostname:$port:$instance"
                                ]
                            },
                        ]
                    }
                },
               "writer": {
                    "name": "starrockswriter",
                    "parameter": {
                        "username": "root",
                        "password": "xxx",
                        "database": "ssb",
                        "table": "emp_xiangyoulu",
                        "column": [ "empno","ename","job","mgr","hiredate","sal","comm","deptno"],
                        "preSql": [],
                        "postSql": [],
                        "jdbcUrl": "jdbc:mysql://127.0.0.1:8630/", #"jdbc:mysql://$fe_hostname:$query_port/"
                        "loadUrl": ["127.0.0.1:8230"] #"$fe_hostname:$http_port",可配置多个
                    }
                }
            }
        ]
    }
}
  1. 测试导入
python bin/datax.py --jvm="-Xms6G -Xmx6G" --loglevel=debug job/job_starrocks.json

mysql> select * from emp_xiangyoulu;
+-------+--------+-----------+------+------------+------+------+--------+
| empno | ename  | job       | mgr  | hiredate   | sal  | comm | deptno |
+-------+--------+-----------+------+------------+------+------+--------+
|  7369 | SMITH  | CLERK     | 7902 | 1980-12-17 |  800 | NULL |     20 |
|  7499 | ALLEN  | SALESMAN  | 7698 | 1981-02-20 | 1600 |  300 |     30 |
|  7564 | MARTIN | SALESMAN  | 7698 | 1981-09-28 | 1250 | 1400 |     30 |
|  7876 | ADAMS  | CLERK     | 7788 | 1987-05-22 | 1100 | NULL |     20 |
|  7900 | JAMES  | CLERK     | 7698 | 1981-12-03 |  950 | NULL |     30 |
|  7521 | WARD   | SALESMAN  | 7698 | 1981-02-22 | 1250 |  500 |     30 |
|  7839 | KING   | PRESIDENT | NULL | 1981-11-17 | 5000 | NULL |     10 |
|  7902 | FORD   | ANALYST   | 7566 | 1981-12-03 | 3000 | NULL |     20 |
|  7934 | MILLER | CLERK     | 7782 | 1982-01-23 | 1300 | NULL |     10 |
|  7566 | JONES  | MANAGER   | 7839 | 1981-04-02 | 2975 | NULL |     20 |
|  7698 | BLAKE  | MANAGER   | 7839 | 1981-05-01 | 2850 | NULL |     30 |
|  7782 | CLARK  | MANAGER   | 7839 | 1981-06-09 | 2450 | NULL |     10 |
|  7788 | SCOTT  | ANALYST   | 7566 | 1987-04-18 | 3000 | NULL |     20 |
|  7844 | TURNER | SALESMAN  | 7698 | 1981-09-08 | 1500 |    0 |     30 |
+-------+--------+-----------+------+------------+------+------+--------+
14 rows in set (0.04 sec)

大表测试

234610条/s

排查路径

fe日志:fe.log

datax日志:datax/log/YYYY-MM-DD/

遇到问题

1. 表或视图不存在

原因:

刚开始没搞清楚oracle的一些概念,建表在默认实例下了,重新在helowin实例下创建就OK了

解决方案:

sqlplus /nolog
SQL>conn demo@helowin

SQL> CREATE TABLE emp_demo(
empno NUMBER(4,0),
ename VARCHAR2(10),
job VARCHAR2(9),
mgr NUMBER(4,0),
hiredate DATE,
sal NUMBER(7,2),
comm NUMBER(7,2),
deptno NUMBER(2,0)
);

2. Connection reset

原因:

loadurl的端口配置错误,配置为query port

解决方案

datax导入是以stream load方式导入,loadurl需要配置的http port

"loadUrl": ["127.0.0.1:8230"]

3. too many filtered rows

{“Status”:“Fail”,“BeginTxnTimeMs”:0,“Message”:“too many filtered rows”,“NumberUnselectedRows”:0,“CommitAndPublishTimeMs”:0,“Label”:“4bdffb12-e84c-4f7a-aea3-b8664b04c8fb”,“LoadBytes”:766,“StreamLoadPutTimeMs”:2,“NumberTotalRows”:1,“WriteDataTimeMs”:131,“TxnId”:373478,“LoadTimeMs”:149,“ErrorURL”:“http://172.26.92.155:8640/api/_load_error_log?file=__shard_0/error_log_insert_stmt_c844d7d6-a0e3-e5fd-e6d0-ca10976b95bd_c844d7d6a0e3e5fd_e6d0ca10976b95bd",“ReadDataTimeMs”:0,“NumberLoadedRows”:0,"NumberFilteredRows”:1}

原因:

写starrocks的时候指定了分隔符,使用默认的即可

解决方案:

删除以下内容从导入配置中

"loadProps": {
            "column_separator": "\\x01",
            "row_delimiter": "\\x02"
}
1赞