背景
测试Oracle数据导入到StarRocks
环境准备
StarRocks:
Fe:172.26.92.139:/home/disk2/jingdan/starrocks-1.18.2-fe
Be:172.26.92.139/172.26.92.154/172.26.92.155:/home/disk2/jingdan/starrocks-1.18.2-be
Oracle:
172.26.92.139:docker exec -it oracle bash
Oracle安装
为了方便测试,本次使用docker部署
docker pull registry.aliyuncs.com/helowin/oracle_11g
docker run -d -p 1521:1521 --name oracle registry.aliyuncs.com/helowin/oracle_11g
docker exec -it oracle bash
这里root密码为helowin
声明环境变量,将以下内容追加到vi /etc/profile
export ORACLE_HOME=/home/oracle/app/oracle/product/11.2.0/dbhome_2
export ORACLE_SID=helowin
export PATH=$ORACLE_HOME/bin:$PATH
source /etc/profile
#退出到oracle账号,记得需要在oracle再source一下
source /etc/profile
#登录
sqlplus /nolog
创建tablespace
SQL>create tablespace demo datafile 'demo.dbf' 1024M;
创建用户并授权
SQL>create user demo identified by demo default tablespace DEMO;#这块DEMO为上一步创建的tablespace,必须为大写
SQL>grant connect,resource,dba to demo;
创建测试表并导入数据
使用上一步创建的用户登陆
sqlplus /nolog
SQL>conn demo@helowin
SQL>CREATE TABLE emp_demo(
empno NUMBER(4,0),
ename VARCHAR2(10),
job VARCHAR2(9),
mgr NUMBER(4,0),
hiredate DATE,
sal NUMBER(7,2),
comm NUMBER(7,2),
deptno NUMBER(2,0)
);
#写入数据
SQL>INSERT INTO emp_demo
(empno,ename,job,mgr,hiredate,sal,comm,deptno)
VALUES(7369,'SMITH','CLERK',7902,TO_DATE('1980-12-17','YYYY-MM-DD'),800,NULL,20);
INSERT INTO emp_demo
VALUES(7499,'ALLEN','SALESMAN',7698,TO_DATE('1981-2-20','YYYY-MM-DD'),1600,300,30);
INSERT INTO emp_demo
VALUES(7521,'WARD','SALESMAN',7698,TO_DATE('1981-2-22','YYYY-MM-DD'),1250,500,30);
INSERT INTO emp_demo
VALUES(7566,'JONES','MANAGER',7839,TO_DATE('1981-4-2','YYYY-MM-DD'),2975,NULL,20);
INSERT INTO emp_demo
VALUES(7564,'MARTIN','SALESMAN',7698,TO_DATE('1981-9-28','YYYY-MM-DD'),1250,1400,30);
INSERT INTO emp_demo
VALUES(7698,'BLAKE','MANAGER',7839,TO_DATE('1981-5-1','YYYY-MM-DD'),2850,NULL,30);
INSERT INTO emp_demo
VALUES(7782,'CLARK','MANAGER',7839,TO_DATE('1981-6-9','YYYY-MM-DD'),2450,NULL,10);
INSERT INTO emp_demo
VALUES(7788,'SCOTT','ANALYST',7566,TO_DATE('1987-4-19','YYYY-MM-DD'),3000,NULL,20);
INSERT INTO emp_demo
VALUES(7839,'KING','PRESIDENT',NULL,TO_DATE('1981-11-17','YYYY-MM-DD'),5000,NULL,10);
INSERT INTO emp_demo
VALUES(7844,'TURNER','SALESMAN',7698,TO_DATE('1981-9-8','YYYY-MM-DD'),1500,0,30);
INSERT INTO emp_demo
VALUES(7876,'ADAMS','CLERK',7788,TO_DATE('1987-5-23','YYYY-MM-DD'),1100,NULL,20);
INSERT INTO emp_demo
VALUES(7900,'JAMES','CLERK',7698,TO_DATE('1981-12-3','YYYY-MM-DD'),950,NULL,30);
INSERT INTO emp_demo
VALUES(7902,'FORD','ANALYST',7566,TO_DATE('1981-12-3','YYYY-MM-DD'),3000,NULL,20);
INSERT INTO emp_demo
VALUES(7934,'MILLER','CLERK',7782,TO_DATE('1982-1-23','YYYY-MM-DD'),1300,NULL,10);
Datax准备
#下载解压datax
wget http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz
tar -xf datax.tar.gz
#下载解压starrockswriter
https://github.com/StarRocks/DataX/releases
tar -xf starrockswriter.tar.gz
#将starrockswriter放至datax插件目录
mv starrockswriter datax/plugin/writer/
模拟测试
- datax配置
{
"job": {
"setting": {
"speed": {
"channel": 1
},
"errorLimit": {
"record": 0,
"percentage": 0
}
},
"content": [
{
"reader": {
"name": "oraclereader",
"parameter": {
"username": "demo",
"password": "demo",
"column": [ "empno","ename","job","mgr","hiredate","sal","comm","deptno"],
"connection": [
{
"table": [ "EMP_DEMO"],
"jdbcUrl": [
"jdbc:oracle:thin:@127.0.0.1:1521:helowin" #"jdbc:oracle:thin:@$hostname:$port:$instance"
]
},
]
}
},
"writer": {
"name": "starrockswriter",
"parameter": {
"username": "root",
"password": "xxx",
"database": "ssb",
"table": "emp_xiangyoulu",
"column": [ "empno","ename","job","mgr","hiredate","sal","comm","deptno"],
"preSql": [],
"postSql": [],
"jdbcUrl": "jdbc:mysql://127.0.0.1:8630/", #"jdbc:mysql://$fe_hostname:$query_port/"
"loadUrl": ["127.0.0.1:8230"] #"$fe_hostname:$http_port",可配置多个
}
}
}
]
}
}
- 测试导入
python bin/datax.py --jvm="-Xms6G -Xmx6G" --loglevel=debug job/job_starrocks.json
mysql> select * from emp_xiangyoulu;
+-------+--------+-----------+------+------------+------+------+--------+
| empno | ename | job | mgr | hiredate | sal | comm | deptno |
+-------+--------+-----------+------+------------+------+------+--------+
| 7369 | SMITH | CLERK | 7902 | 1980-12-17 | 800 | NULL | 20 |
| 7499 | ALLEN | SALESMAN | 7698 | 1981-02-20 | 1600 | 300 | 30 |
| 7564 | MARTIN | SALESMAN | 7698 | 1981-09-28 | 1250 | 1400 | 30 |
| 7876 | ADAMS | CLERK | 7788 | 1987-05-22 | 1100 | NULL | 20 |
| 7900 | JAMES | CLERK | 7698 | 1981-12-03 | 950 | NULL | 30 |
| 7521 | WARD | SALESMAN | 7698 | 1981-02-22 | 1250 | 500 | 30 |
| 7839 | KING | PRESIDENT | NULL | 1981-11-17 | 5000 | NULL | 10 |
| 7902 | FORD | ANALYST | 7566 | 1981-12-03 | 3000 | NULL | 20 |
| 7934 | MILLER | CLERK | 7782 | 1982-01-23 | 1300 | NULL | 10 |
| 7566 | JONES | MANAGER | 7839 | 1981-04-02 | 2975 | NULL | 20 |
| 7698 | BLAKE | MANAGER | 7839 | 1981-05-01 | 2850 | NULL | 30 |
| 7782 | CLARK | MANAGER | 7839 | 1981-06-09 | 2450 | NULL | 10 |
| 7788 | SCOTT | ANALYST | 7566 | 1987-04-18 | 3000 | NULL | 20 |
| 7844 | TURNER | SALESMAN | 7698 | 1981-09-08 | 1500 | 0 | 30 |
+-------+--------+-----------+------+------------+------+------+--------+
14 rows in set (0.04 sec)
大表测试
234610条/s
排查路径
fe日志:fe.log
datax日志:datax/log/YYYY-MM-DD/
遇到问题
1. 表或视图不存在
原因:
刚开始没搞清楚oracle的一些概念,建表在默认实例下了,重新在helowin实例下创建就OK了
解决方案:
sqlplus /nolog
SQL>conn demo@helowin
SQL> CREATE TABLE emp_demo(
empno NUMBER(4,0),
ename VARCHAR2(10),
job VARCHAR2(9),
mgr NUMBER(4,0),
hiredate DATE,
sal NUMBER(7,2),
comm NUMBER(7,2),
deptno NUMBER(2,0)
);
2. Connection reset
原因:
loadurl的端口配置错误,配置为query port
解决方案 :
datax导入是以stream load方式导入,loadurl需要配置的http port
"loadUrl": ["127.0.0.1:8230"]
3. too many filtered rows
{“Status”:“Fail”,“BeginTxnTimeMs”:0,“Message”:“too many filtered rows”,“NumberUnselectedRows”:0,“CommitAndPublishTimeMs”:0,“Label”:“4bdffb12-e84c-4f7a-aea3-b8664b04c8fb”,“LoadBytes”:766,“StreamLoadPutTimeMs”:2,“NumberTotalRows”:1,“WriteDataTimeMs”:131,“TxnId”:373478,“LoadTimeMs”:149,“ErrorURL”:“http://172.26.92.155:8640/api/_load_error_log?file=__shard_0/error_log_insert_stmt_c844d7d6-a0e3-e5fd-e6d0-ca10976b95bd_c844d7d6a0e3e5fd_e6d0ca10976b95bd",“ReadDataTimeMs”:0,“NumberLoadedRows”:0,"NumberFilteredRows”:1}
原因:
写starrocks的时候指定了分隔符,使用默认的即可
解决方案:
删除以下内容从导入配置中
"loadProps": {
"column_separator": "\\x01",
"row_delimiter": "\\x02"
}