Intermittent error when writing Hive external table data into SR

The exception is: java.sql.SQLSyntaxErrorException: hdfsOpenFile failed, file=hdfs://sdpread/apps/hive/warehouse/a003_a.db/ds_detail_discounts/tenant_id=t00002/ab_partition=hot/20220827_160741_40629_y7xsm_28599aa4-bf57-440c-870c-f3ec21704d87

The error occurs intermittently, mostly around 00:20 each night.
The HDFS file path is valid.

The insert SQL:
insert into a003_a.ds_detail_discounts
select shop_id,olet_id,bday_date,chks_id,citm_id,cdit_citm_id,name_level_1,name_level_0,cdis_id,cdis_name_l1,cdis_name1,enterprisecurrency666_cdis_total,propertycurrency666_cdis_total,enterprisecurrency666_cdit_round_total,propertycurrency666_cdit_round_total,enterprisecurrency666_abs_cdit_round_total,propertycurrency666_abs_cdit_round_total,cdit_count,cdis_count,cdis_dtyp_id,chks_check_prefix_num,chks_open_loctime,chks_open_user_id,chks_print_count,propertycurrency666_chks_check_total,enterprisecurrency666_chks_check_total,chks_non_revenue,shop_name_l1,olet_name_l1,check_emplyee,bper_name,pdtp_name_l1,pdtp_seq,cdis_status,chks_status,bday_end_loctime,bday_status,discount_classfiy,dtyp_rate,citm_name,citm_code,order_type,cdis_apply_loctime,cdis_apply_time,cdis_apply_user_id,cdis_apply_user_name,tenant_id,if(ab_partition = 'hot',2,0) as ab_partition
from hive_ext.a003_ds_detail_discounts
where bday_date >= '2022-08-20'

Hello, could you send the FE master's info log from that point in time so we can take a look?

Thanks. fe_1.log (7.5 MB) is the fe.log for that time window; the errors are mainly around 00:25:27.
There are also quite a few Broken pipe errors. How should those be handled?

Is HDFS stable?

Yes, it is stable. It has been running for three or four years without any problems.

Does a standalone SELECT on the external table return results? The log shows access timeouts. Could you also send the fe.warn and fe.out files?

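Running the SELECT on its own works fine, and the external tables are all normal. The import fails only intermittently. I am not sure whether it is related to FE and BE being deployed on the same machines and competing for resources.
fe.out has no logs for this period.
The warn log: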
2022-08-28 00:25:23,927 WARN (Thread-40|85) [ReportHandler.addDropReplicaTask():783] delete tablet[14019697] from backend[10009] because not found in meta
2022-08-28 00:25:23,927 WARN (Thread-40|85) [ReportHandler.addDropReplicaTask():783] delete tablet[14019705] from backend[10009] because not found in meta
2022-08-28 00:25:23,927 WARN (Thread-40|85) [ReportHandler.addDropReplicaTask():783] delete tablet[14023582] from backend[10009] because not found in meta
2022-08-28 00:25:23,927 WARN (Thread-40|85) [ReportHandler.addDropReplicaTask():783] delete tablet[14023590] from backend[10009] because not found in meta
2022-08-28 00:25:23,927 WARN (Thread-40|85) [ReportHandler.addDropReplicaTask():783] delete tablet[14023495] from backend[10009] because not found in meta
2022-08-28 00:25:23,927 WARN (Thread-40|85) [ReportHandler.addDropReplicaTask():783] delete tablet[14023503] from backend[10009] because not found in meta
2022-08-28 00:25:23,927 WARN (Thread-40|85) [ReportHandler.addDropReplicaTask():783] delete tablet[14023478] from backend[10009] because not found in meta
2022-08-28 00:25:23,928 WARN (Thread-40|85) [ReportHandler.addDropReplicaTask():783] delete tablet[14023486] from backend[10009] because not found in meta
2022-08-28 00:25:24,683 WARN (thrift-server-pool-219165|442741) [MasterImpl.finishTask():194] cannot find task. type: PUBLISH_VERSION, backendId: 10053, signature: 733601
2022-08-28 00:25:27,911 WARN (thrift-server-pool-218901|433144) [Coordinator.updateFragmentExecStatus():1648] one instance report fail errorCode INTERNAL_ERROR hdfsOpenFile failed, file=hdfs://sdpread/apps/hive/warehouse/a003_a.db/ds_detail_discounts/tenant_id=t00002/ab_partition=hot/20220827_160741_40629_y7xsm_28599aa4-bf57-440c-870c-f3ec21704d87, query_id=dc7dbf74-2624-11ed-bae0-00163e0ee64e instance_id=dc7dbf74-2624-11ed-bae0-00163e0ee653
2022-08-28 00:25:27,911 WARN (thrift-server-pool-218901|433144) [Coordinator.updateStatus():872] one instance report fail throw updateStatus(), need cancel. job id: -1, query id: dc7dbf74-2624-11ed-bae0-00163e0ee64e, instance id: dc7dbf74-2624-11ed-bae0-00163e0ee653
2022-08-28 00:25:27,911 WARN (thrift-server-pool-7690|9387) [StmtExecutor.handleDMLStmt():1182] insert failed: hdfsOpenFile failed, file=hdfs://sdpread/apps/hive/warehouse/a003_a.db/ds_detail_discounts/tenant_id=t00002/ab_partition=hot/20220827_160741_40629_y7xsm_28599aa4-bf57-440c-870c-f3ec21704d87
2022-08-28 00:25:27,912 WARN (thrift-server-pool-7690|9387) [StmtExecutor.handleDMLStmt():1258] handle insert stmt fail: insert_dc7dbf74-2624-11ed-bae0-00163e0ee64e
com.starrocks.common.DdlException: hdfsOpenFile failed, file=hdfs://sdpread/apps/hive/warehouse/a003_a.db/ds_detail_discounts/tenant_id=t00002/ab_partition=hot/20220827_160741_40629_y7xsm_28599aa4-bf57-440c-870c-f3ec21704d87
at com.starrocks.common.ErrorReport.reportDdlException(ErrorReport.java:80) ~[starrocks-fe.jar:?]
at com.starrocks.qe.StmtExecutor.handleDMLStmt(StmtExecutor.java:1183) [starrocks-fe.jar:?]
at com.starrocks.qe.StmtExecutor.execute(StmtExecutor.java:438) [starrocks-fe.jar:?]
at com.starrocks.qe.ConnectProcessor.proxyExecute(ConnectProcessor.java:586) [starrocks-fe.jar:?]
at com.starrocks.service.FrontendServiceImpl.forward(FrontendServiceImpl.java:737) [starrocks-fe.jar:?]
at com.starrocks.thrift.FrontendService$Processor$forward.getResult(FrontendService.java:2071) [starrocks-fe.jar:?]
at com.starrocks.thrift.FrontendService$Processor$forward.getResult(FrontendService.java:2051) [starrocks-fe.jar:?]
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38) [libthrift-0.13.0.jar:0.13.0]
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38) [libthrift-0.13.0.jar:0.13.0]
at com.starrocks.common.SRTThreadPoolServer$WorkerProcess.run(SRTThreadPoolServer.java:310) [starrocks-fe.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_171]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_171]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]
2022-08-28 00:25:29,724 WARN (thrift-server-pool-219153|442717) [MasterImpl.finishTask():194] cannot find task. type: PUBLISH_VERSION, backendId: 10009, signature: 733621
2022-08-28 00:25:29,801 WARN (thrift-server-pool-207936|420080) [MasterImpl.finishTask():194] cannot find task. type: PUBLISH_VERSION, backendId: 10009, signature: 733607
2022-08-28 00:25:33,633 WARN (Thread-40|85) [ReportHandler.addDropReplicaTask():783] delete tablet[14034203] from backend[10054] because not found in meta
2022-08-28 00:25:33,633 WARN (Thread-40|85) [ReportHandler.addDropReplicaTask():783] delete tablet[14034211] from backend[10054] because not found in meta
2022-08-28 00:25:33,633 WARN (Thread-40|85) [ReportHandler.addDropReplicaTask():783] delete tablet[14018499] from backend[10054] because not found in meta
2022-08-28 00:25:33,633 WARN (Thread-40|85) [ReportHandler.addDropReplicaTask():783] delete tablet[14018491] from backend[10054] because not found in meta
2022-08-28 00:25:33,633 WARN (Thread-40|85) [ReportHandler.addDropReplicaTask():783] delete tablet[14018743] from backend[10054] because not found in meta
2022-08-28 00:25:33,633 WARN (Thread-40|85) [ReportHandler.addDropReplicaTask():783] delete tablet[14018735] from backend[10054] because not found in meta
2022-08-28 00:25:33,633 WARN (Thread-40|85) [ReportHandler.addDropReplicaTask():783] delete tablet[14018675] from backend[10054] because not found in meta
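
To check whether the failures really cluster around one time of day, the Coordinator WARN lines can be tallied per minute. This is only a minimal sketch, assuming the fe.warn.log line layout shown in the excerpt above (timestamp, level, thread, message with `file=` and `query_id=`). The function name `count_hdfs_open_failures` is hypothetical and is not part of StarRocks.

```python
import re
from collections import Counter

# Matches the Coordinator lines in the excerpt above, i.e. the ones that carry
# both "file=" and "query_id=". The follow-up StmtExecutor lines for the same
# query have no query_id, so each failed query is counted once.
FAILURE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<hhmm>\d{2}:\d{2}):\d{2},\d{3} WARN .*"
    r"hdfsOpenFile failed, file=(?P<file>[^,\s]+), query_id=(?P<query_id>\S+)"
)


def count_hdfs_open_failures(lines):
    """Return (failures per HH:MM, list of (date, HH:MM, file, query_id))."""
    per_minute = Counter()
    hits = []
    for line in lines:
        m = FAILURE.match(line)
        if m:
            per_minute[m["hhmm"]] += 1
            hits.append((m["date"], m["hhmm"], m["file"], m["query_id"]))
    return per_minute, hits
```

The function accepts any iterable of lines, so an open log file handle can be passed directly. Running it over several days of logs would show whether the 00:20 to 00:25 window stands out.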

Which version are you currently using? Is there data in the file hdfs://sdpread/apps/hive/warehouse/a003_a.db/ds_detail_discounts/tenant_id=t00002/ab_partition=hot/20220827_160741_40629_y7xsm_28599aa4-bf57-440c-870c-f3ec21704d87?

SR 2.3.0, Hadoop 3.1.2. The files all contain data.

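One question: why are the classes in this stack trace not StarRocks classes? They look like, for example, sdp.bi.datax.StarRocksDao. Are you sending the SQL to the StarRocks cluster from your own application?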

Could you run one of the select/query statements on its own on the SR cluster, and then check the error messages in the BE.INFO log?

A Java application executes the SQL over JDBC.

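The same insert into ... select from an external table does not always fail. Over more than half a month this scenario has run more than ten thousand times, and the problem was reported only twice.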

Could you check BE.INFO? Search for hdfsOpenFileFailed and see whether any related errors help locate the problem. Were these files all added recently?
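
A search like the one requested here can be sketched as follows. The BE log file name and line layout are assumptions, so the sketch uses plain substring matching on `hdfsOpenFile failed`, the string that appears in the error message earlier in the thread. The helper name `find_with_context` is hypothetical.

```python
def find_with_context(lines, needle="hdfsOpenFile failed", before=2, after=2):
    """Return (line_number, context_lines) for every line containing needle.

    The surrounding lines are included because the BE log entries just before
    a failed open are often what explains it.
    """
    lines = [line.rstrip("\n") for line in lines]
    results = []
    for i, line in enumerate(lines):
        if needle in line:
            start = max(0, i - before)
            # line numbers are reported 1-based, like grep -n
            results.append((i + 1, lines[start:i + after + 1]))
    return results
```

The whole file is held in memory, which is acceptable for logs in the size range mentioned in this thread (a few MB); a much larger log would call for a streaming variant.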

This problem has come up again recently, and more often than before. We have recently added quite a few external-table import jobs.

The SR version is 2.3.2.

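The files were all newly added by the component, but so far only these two tables have the problem, and there is nothing special about them.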

BE log: