StarRocks 2.3.2 querying a Hive external table reports HdfsOrcScanner::do_open failed. reason = Failed to parse the postscript

[Details] Using StarRocks 2.3.2 to read a Hive external table (ORC format) and INSERT INTO an internal SR table after computation fails with:
HdfsOrcScanner::do_open failed. reason = Failed to parse the postscript
[Background] What operations were performed?
[Business impact]
[StarRocks version] 2.3.2
[Cluster scale] e.g., 3 FE (1 follower + 2 observers) + 5 BE (FE and BE co-located)
[Machine info] CPU virtual cores / memory / NIC, e.g., 40C/384G/GbE

  • Query log:
    one instance report fail errorCode INTERNAL_ERROR HdfsOrcScanner::do_open failed. reason = Failed to parse the postscript from hdfs:xxxx …
    [Attachments]

  • fe.log / be.INFO
    2022-11-18 00:31:55,224 INFO (tablet scheduler|43) [TabletScheduler.adjustPriorities():385] adjust priority for all tablets. changed: 0, total: 0
    2022-11-18 00:31:55,349 WARN (thrift-server-pool-1657|14523) [Coordinator.updateFragmentExecStatus():1648] one instance report fail errorCode INTERNAL_ERROR HdfsOrcScanner::do_open failed. reason = Failed to parse the postscript from hdfs://ns/apps/hive/warehouse/dwd.db/dwd_service_schema_a/000000_0:file = hdfs://ns/apps/hive/warehouse/dwd.db/dwd_service_schema_a/000000_0, query_id=5939490a-6695-11ed-97dc-f0d4e2eb7718 instance_id=5939490a-6695-11ed-97dc-f0d4e2eb7721
    2022-11-18 00:31:55,349 WARN (thrift-server-pool-1657|14523) [Coordinator.updateStatus():872] one instance report fail throw updateStatus(), need cancel. job id: -1, query id: 5939490a-6695-11ed-97dc-f0d4e2eb7718, instance id: 5939490a-6695-11ed-97dc-f0d4e2eb7721
    2022-11-18 00:31:55,350 WARN (starrocks-mysql-nio-pool-9775|16510) [StmtExecutor.handleDMLStmt():1152] insert failed: HdfsOrcScanner::do_open failed. reason = Failed to parse the postscript from hdfs://ns/apps/hive/warehouse/dwd.db/dwd_service_schema_a/000000_0:file = hdfs://ns/apps/hive/warehouse/dwd.db/dwd_service_schema_a/000000_0
    2022-11-18 00:31:55,350 INFO (thrift-server-pool-1657|14523) [Coordinator.cancelInternal():968] unfinished instance: 5939490a-6695-11ed-97dc-f0d4e2eb7719
    2022-11-18 00:31:55,351 WARN (thrift-server-pool-865|8281) [Coordinator.updateFragmentExecStatus():1648] one instance report fail errorCode CANCELLED Cancelled SenderQueue::get_chunk, query_id=5939490a-6695-11ed-97dc-f0d4e2eb7718 instance_id=5939490a-6695-11ed-97dc-f0d4e2eb7722
    2022-11-18 00:31:55,350 WARN (starrocks-mysql-nio-pool-9775|16510) [StmtExecutor.handleDMLStmt():1228] handle insert stmt fail: insert_5939490a-6695-11ed-97dc-f0d4e2eb7718
    com.starrocks.common.DdlException: HdfsOrcScanner::do_open failed. reason = Failed to parse the postscript from hdfs://ns/apps/hive/warehouse/dwd.db/dwd_service_schema_a/000000_0:file = hdfs://ns/apps/hive/warehouse/dwd.db/dwd_service_schema_a/000000_0
    at com.starrocks.common.ErrorReport.reportDdlException(ErrorReport.java:80) ~[starrocks-fe.jar:?]
    at com.starrocks.qe.StmtExecutor.handleDMLStmt(StmtExecutor.java:1153) ~[starrocks-fe.jar:?]
    at com.starrocks.qe.StmtExecutor.execute(StmtExecutor.java:430) ~[starrocks-fe.jar:?]
    at com.starrocks.qe.ConnectProcessor.handleQuery(ConnectProcessor.java:321) ~[starrocks-fe.jar:?]
    at com.starrocks.qe.ConnectProcessor.dispatch(ConnectProcessor.java:439) ~[starrocks-fe.jar:?]
    at com.starrocks.qe.ConnectProcessor.processOnce(ConnectProcessor.java:675) ~[starrocks-fe.jar:?]
    at com.starrocks.mysql.nio.ReadListener.lambda$handleEvent$0(ReadListener.java:55) ~[starrocks-fe.jar:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_281]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_281]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_281]
    2022-11-18 00:31:55,353 WARN (thrift-server-pool-1919|16285) [Coordinator.updateFragmentExecStatus():1648] one instance report fail errorCode CANCELLED Cancelled SenderQueue::get_chunk, query_id=5939490a-6695-11ed-97dc-f0d4e2eb7718 instance_id=5939490a-6695-11ed-97dc-f0d4e2eb7719
    2022-11-18 00:31:55,353 WARN (thrift-server-pool-344|2986) [Coordinator.updateFragmentExecStatus():1648] one instance report fail errorCode CANCELLED Cancelled SenderQueue::get_chunk, query_id=5939490a-6695-11ed-97dc-f0d4e2eb7718 instance_id=5939490a-6695-11ed-97dc-f0d4e2eb771a
    2022-11-18 00:31:55,354 WARN (thrift-server-pool-1379|11688) [Coordinator.updateFragmentExecStatus():1648] one instance report fail errorCode CANCELLED Cancelled SenderQueue::get_chunk, query_id=5939490a-6695-11ed-97dc-f0d4e2eb7718 instance_id=5939490a-6695-11ed-97dc-f0d4e2eb771c
    2022-11-18 00:31:55,354 WARN (thrift-server-pool-1392|12675) [Coordinator.updateFragmentExecStatus():1648] one instance report fail errorCode CANCELLED Cancelled SenderQueue::get_chunk, query_id=5939490a-6695-11ed-97dc-f0d4e2eb7718 instance_id=5939490a-6695-11ed-97dc-f0d4e2eb771b
    2022-11-18 00:31:55,355 INFO (starrocks-mysql-nio-pool-9775|16510) [QeProcessorImpl.unregisterQuery():79] deregister query id 5939490a-6695-11ed-97dc-f0d4e2eb7718
    2022-11-18 00:31:55,758 INFO (thrift-server-pool-8|368) [ReportHandler.handleReport():225] receive report from be 10006. type: WORKGROUP_REPORT, current queue size: 1
    2022-11-18 00:31:56,562 INFO (thrift-server-pool-13|375) [ReportHandler.handleReport():225] receive report from be 10003. type: TASK_REPORT, current queue size: 1
    2022-11-18 00:31:56,562 INFO (Thread-48|107) [ReportHandler.taskReport():391] begin to handle task report from backend 10003
    2022-11-18 00:31:56,562 INFO (Thread-48|107) [ReportHandler.taskReport():431] finished to handle task report from backend 10003, diff task num: 0. cost: 0 ms
    2022-11-18 00:31:57,028 INFO (thrift-server-pool-12|374) [ReportHandler.handleReport():225] receive report from be 10003. type: WORKGROUP_REPORT, current queue size: 1
    2022-11-18 00:31:57,159 INFO (thrift-server-pool-6|366) [ReportHandler.handleReport():225] receive report from be 10005. type: TASK_REPORT, current queue size: 1
    2022-11-18 00:31:57,159 INFO (Thread-48|107) [ReportHandler.taskReport():391] begin to handle task report from backend 10005
    2022-11-18 00:31:57,159 INFO (Thread-48|107) [ReportHandler.taskReport():431] finished to handle task report from backend 10005, diff task num: 0. cost: 0 ms
    2022-11-18 00:31:57,419 INFO (Routine load scheduler|49) [RoutineLoadScheduler.process():70] there are 0 job need schedule
    2022-11-18 00:31:58,047 INFO (Thread-48|107) [ReportHandler.taskReport():391] begin to handle task report from backend 10002
    2022-11-18 00:31:58,047 INFO (thrift-server-pool-19|381) [ReportHandler.handleReport():225] receive report from be 10002. type: TASK_REPORT, current queue size: 1
    2022-11-18 00:31:58,047 INFO (Thread-48|107) [ReportHandler.taskReport():431] finished to handle task report from backend 10002, diff task num: 0. cost: 0 ms
    2022-11-18 00:31:58,281 INFO (thrift-server-pool-2|362) [ReportHandler.handleReport():225] receive report from be 10004. type: WORKGROUP_REPORT, current queue size: 1
    2022-11-18 00:31:58,292 INFO (thrift-server-pool-16|378) [ReportHandler.handleReport():225] receive report from be 10002. type: WORKGROUP_REPORT, current queue size: 1

  • be crash

    • be.out
      E1118 00:27:36.505091 155431 olap_scan_node.cpp:266] [TUniqueId(hi=-4683462608214027795, lo=-7503858081975273704)] Cancelled: canceled state
      W1118 00:27:36.508373 155583 fragment_mgr.cpp:180] Fail to open fragment bf00ff92-6694-11ed-97dc-f0d4e2eb7722: Cancelled: Cancelled because of runtime state is cancelled
      /root/starrocks/be/src/runtime/plan_fragment_executor.cpp:326 _plan->get_next(_runtime_state, &_chunk, &_done)
      /root/starrocks/be/src/runtime/plan_fragment_executor.cpp:214 _get_next_internal_vectorized(&chunk)
      W1118 00:30:16.193965 155871 engine_storage_migration_task.cpp:100] could not migration because has unfinished txns.
      W1118 00:30:16.194016 155871 task_worker_pool.cpp:1125] storage migrate failed. status:Internal error: could not migration because has unfinished txns., signature:20099
      W1118 00:31:55.347959 155454 hdfs_scanner_orc.cpp:381] HdfsOrcScanner::do_open failed. reason = Failed to parse the postscript from hdfs://ns/apps/hive/warehouse/dwd.db/dwd_service_schema_a/000000_0
      W1118 00:31:55.348121 155618 plan_fragment_executor.cpp:185] Fail to open fragment, instance_id=5939490a-6695-11ed-97dc-f0d4e2eb7721, status=Internal error: HdfsOrcScanner::do_open failed. reason = Failed to parse the postscript from hdfs://ns/apps/hive/warehouse/dwd.db/dwd_service_schema_a/000000_0:file = hdfs://ns/apps/hive/warehouse/dwd.db/dwd_service_schema_a/000000_0
      /root/starrocks/be/src/connector/hive_connector.cpp:66 _init_scanner(state)
      /root/starrocks/be/src/exec/vectorized/connector_scan_node.cpp:63 _data_source->open(state)
      /root/starrocks/be/src/exec/vectorized/aggregate/aggregate_streaming_node.cpp:53 _children[0]->get_next(state, &input_chunk, &_child_eos)
      /root/starrocks/be/src/runtime/plan_fragment_executor.cpp:326 _plan->get_next(_runtime_state, &_chunk, &_done)
      /root/starrocks/be/src/runtime/plan_fragment_executor.cpp:214 _get_next_internal_vectorized(&chunk)
      W1118 00:31:55.350478 155618 fragment_mgr.cpp:180] Fail to open fragment 5939490a-6695-11ed-97dc-f0d4e2eb7721: Internal error: HdfsOrcScanner::do_open failed. reason = Failed to parse the postscript from hdfs://ns/apps/hive/warehouse/dwd.db/dwd_service_schema_a/000000_0:file = hdfs://ns/apps/hive/warehouse/dwd.db/dwd_service_schema_a/000000_0
      /root/starrocks/be/src/connector/hive_connector.cpp:66 _init_scanner(state)
      /root/starrocks/be/src/exec/vectorized/connector_scan_node.cpp:63 _data_source->open(state)
      /root/starrocks/be/src/exec/vectorized/aggregate/aggregate_streaming_node.cpp:53 _children[0]->get_next(state, &input_chunk, &_child_eos)
      /root/starrocks/be/src/runtime/plan_fragment_executor.cpp:326 _plan->get_next(_runtime_state, &_chunk, &_done)
      /root/starrocks/be/src/runtime/plan_fragment_executor.cpp:214 _get_next_internal_vectorized(&chunk)
      W1118 00:31:55.351557 155586 fragment_mgr.cpp:180] Fail to open fragment 5939490a-6695-11ed-97dc-f0d4e2eb7722: Cancelled: Cancelled SenderQueue::get_chunk
      /root/starrocks/be/src/exec/exchange_node.cpp:132 _stream_recvr->get_chunk(&_input_chunk)
      /root/starrocks/be/src/exec/vectorized/aggregate/aggregate_blocking_node.cpp:44 _children[0]->get_next(state, &chunk, &eos)
      /root/starrocks/be/src/runtime/plan_fragment_executor.cpp:200 _plan->open(_runtime_state)
      W1118 00:31:55.355001 155597 fragment_mgr.cpp:180] Fail to open fragment 5939490a-6695-11ed-97dc-f0d4e2eb7719: Cancelled: Cancelled SenderQueue::get_chunk
      /root/starrocks/be/src/exec/exchange_node.cpp:132 _stream_recvr->get_chunk(&_input_chunk)
      /root/starrocks/be/src/exec/vectorized/hash_join_node.cpp:227 child(1)->get_next(state, &chunk, &eos)

In addition, queries against other tables also leave large numbers of HdfsOrcScanner::do_open failed. reason = Failed to read errors in the BE logs:
W1117 01:05:18.972498 155444 hdfs_scanner_orc.cpp:381] HdfsOrcScanner::do_open failed. reason = Failed to read hdfs://ns/apps/hive/warehouse/dwd.db/dwd_f_burying_point_log_last_day/000029_0: IO error: cannot read fully
/root/starrocks/be/src/exec/vectorized/hdfs_scanner.cpp:39 _stream->read_at_fully(offset, data, size)
W1117 01:05:18.972828 155419 hdfs_scanner_orc.cpp:381] HdfsOrcScanner::do_open failed. reason = Failed to read hdfs://ns/apps/hive/warehouse/dwd.db/dwd_f_burying_point_log_last_day/000020_0: IO error: cannot read fully
/root/starrocks/be/src/exec/vectorized/hdfs_scanner.cpp:39 _stream->read_at_fully(offset, data, size)
W1117 01:05:18.972959 155427 hdfs_scanner_orc.cpp:381] HdfsOrcScanner::do_open failed. reason = Failed to read hdfs://ns/apps/hive/warehouse/dwd.db/dwd_f_burying_point_log_last_day/000010_0: IO error: cannot read fully
/root/starrocks/be/src/exec/vectorized/hdfs_scanner.cpp:39 _stream->read_at_fully(offset, data, size)
W1117 01:05:18.973002 155422 hdfs_scanner_orc.cpp:381] HdfsOrcScanner::do_open failed. reason = Failed to parse the postscript from hdfs://ns/apps/hive/warehouse/dwd.db/dwd_f_burying_point_log_last_day/000063_0
W1117 01:05:18.973290 155420 hdfs_scanner_orc.cpp:381] HdfsOrcScanner::do_open failed. reason = Failed to read hdfs://ns/apps/hive/warehouse/dwd.db/dwd_f_burying_point_log_last_day/000003_0: IO error: cannot read fully
/root/starrocks/be/src/exec/vectorized/hdfs_scanner.cpp:39 _stream->read_at_fully(offset, data, size)
W1117 01:05:18.973516 155424 hdfs_scanner_orc.cpp:381] HdfsOrcScanner::do_open failed. reason = Failed to read hdfs://ns/apps/hive/warehouse/dwd.db/dwd_f_burying_point_log_last_day/000026_0: IO error: cannot read fully
/root/starrocks/be/src/exec/vectorized/hdfs_scanner.cpp:39 _stream->read_at_fully(offset, data, size)
W1117 01:05:18.973688 155463 hdfs_scanner_orc.cpp:381] HdfsOrcScanner::do_open failed. reason = Failed to read hdfs://ns/apps/hive/warehouse/dwd.db/dwd_f_burying_point_log_last_day/000023_0: IO error: cannot read fully
/root/starrocks/be/src/exec/vectorized/hdfs_scanner.cpp:39 _stream->read_at_fully(offset, data, size)
W1117 01:05:18.973898 155442 hdfs_scanner_orc.cpp:381] HdfsOrcScanner::do_open failed. reason = Failed to read hdfs://ns/apps/hive/warehouse/dwd.db/dwd_f_burying_point_log_last_day/000007_0: IO error: cannot read fully

Additional note:
The external table was created with CREATE EXTERNAL TABLE rather than through a catalog:
drop table if exists xw_external_datasource.dwd_xxxx;
CREATE EXTERNAL TABLE xw_external_datasource.dwd_xxxx (
  fid bigint comment '',
  fmoneyuseid bigint comment '',
  fserviceid bigint comment '',
  fproductid bigint comment '',
  fproductinst bigint comment '',
  fstatus bigint comment 'sequence number'
) ENGINE=HIVE
PROPERTIES (
  "resource" = "xw_hive",
  "database" = "dwd",
  "table" = "dwd_xxx"
);
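
For completeness: the DDL above assumes a Hive resource named "xw_hive" already exists. A minimal sketch of how such a resource is typically created in StarRocks 2.x follows; the metastore URI is a placeholder, not taken from this thread:

-- Hypothetical: creates the Hive resource referenced by "resource" = "xw_hive".
-- The thrift URI is a placeholder; substitute your own Hive Metastore address.
CREATE EXTERNAL RESOURCE "xw_hive"
PROPERTIES (
    "type" = "hive",
    "hive.metastore.uris" = "thrift://<hms_host>:9083"
);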

Could you check whether it is a permissions issue on the corresponding directory?

Do you mean an OS directory or an HDFS directory?

Does the corresponding HDFS path have the proper permissions?

Yes, the corresponding directories and files do have the proper permissions.

Were you able to query this Hive external table successfully before?
hdfs://ns/apps/hive/warehouse/dwd.db/dwd_service_schema_a/000000_0:file = hdfs://ns/apps/hive/warehouse/dwd.db/dwd_service_schema_a/000000_0
This HDFS path looks a bit odd to me. Have you placed hdfs-site.xml and core-site.xml into the corresponding fe/conf and be/conf directories?

Yes, both the FE and BE conf directories have them.

Files in the fe/conf directory:
[root@sz-sr-204-43 fe]# pwd
/data1/service/bigdata/starrocks/fe

[root@sz-sr-204-43 fe]# cd conf

[root@sz-sr-204-43 conf]# ll -h
total 20K
lrwxrwxrwx 1 deploy deploy 30 Nov 3 19:54 core-site.xml -> /etc/hadoop/conf/core-site.xml
-rw-rw-r-- 1 deploy deploy 3.9K Nov 3 19:53 fe.conf
-rw-rw-r-- 1 deploy deploy 2.9K Nov 3 19:51 fe.conf.bak
-rw-rw-r-- 1 deploy deploy 904 Nov 3 19:51 hadoop_env.sh
lrwxrwxrwx 1 deploy deploy 30 Nov 3 19:54 hdfs-site.xml -> /etc/hadoop/conf/hdfs-site.xml
-rw-r--r-- 1 deploy deploy 7.0K Nov 3 19:53 hive-site.xml

Files in the be/conf directory:
[root@sz-sr-204-43 starrocks]# cd be/conf/

[root@sz-sr-204-43 conf]# ll -h
total 20K
-rw-r--r-- 1 deploy deploy 3.0K Nov 3 20:03 be.conf
-rw-r--r-- 1 deploy deploy 1.8K Nov 3 19:51 be.conf.bak
-rw-r--r-- 1 deploy deploy 1.5K Nov 3 19:51 cn.conf
lrwxrwxrwx 1 deploy deploy 30 Nov 3 19:55 core-site.xml -> /etc/hadoop/conf/core-site.xml
-rw-r--r-- 1 deploy deploy 904 Nov 3 19:51 hadoop_env.sh
lrwxrwxrwx 1 deploy deploy 30 Nov 3 19:55 hdfs-site.xml -> /etc/hadoop/conf/hdfs-site.xml
-rw-r--r-- 1 deploy deploy 424 Nov 3 19:51 log4j.properties

Did you put them into the conf directories of all FE and BE nodes? And did you restart the cluster after adding them?

Yes. The cluster was upgraded along 2.0.x -> 2.1.x -> 2.2.x -> 2.3.x, and the two HDFS config files have been in place since early on.

For the failed queries, check the be.out log.

@U_1645581278484_2861 Has this table ever been written with INSERT OVERWRITE? Try refreshing the table (as in the sketch below) and see whether the problem persists. https://docs.starrocks.io/zh-cn/2.2/data_source/External_table#手动更新元数据缓存
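
A minimal sketch of that manual refresh, assuming the REFRESH EXTERNAL TABLE statement available in StarRocks 2.2+ and the table name from this thread:

-- Refresh StarRocks' cached metadata (file list and file lengths) for the external table.
REFRESH EXTERNAL TABLE xw_external_datasource.dwd_xxxx;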

Hello:

Errors like this failure to parse the footer usually stem from the FE handing the BE an incorrect (stale) HDFS file length, so the BE reads at the wrong offset and cannot parse the footer. The fact that the file can be read at all means you have HDFS read permission, so permissions are unlikely to be the problem.

Try refreshing the table and see whether the problem persists. If the table is partitioned, you also need to refresh the affected partitions.

While refreshing, also watch the FE log to confirm the refresh actually succeeds; there are cases where the refresh fails because access to the Hive Metastore fails. If the refresh does not succeed, check whether you have permission to access HMS and whether HMS is under heavy load.
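
A sketch of the partition-level refresh mentioned above; the partition spec here is purely hypothetical, since the thread never shows this table's partition keys:

-- 'dt=2022-11-17' is a made-up partition spec; replace it with a real partition of your table.
REFRESH EXTERNAL TABLE xw_external_datasource.dwd_xxxx PARTITION ('dt=2022-11-17');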

Hello. Is there any update?

It does not look like the BE crashed; the query simply failed with an error.

How was this eventually resolved? I am seeing the same error when querying a Hive external table on 2.2.9.

The Hive table had a data update a few hours earlier. The first StarRocks query reports HdfsOrcScanner::do_open failed. reason = Failed to parse the postscript from XXX; a second query then succeeds, and refreshing also fixes it. But shouldn't the metadata cache be refreshed every two hours? Why does the error still occur?
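
To verify that two-hour assumption, you can inspect the FE-side cache settings. The config names below are assumptions based on the 2.x docs (hive_meta_cache_refresh_interval_s is said to default to 7200 s, i.e. two hours); verify them against your version:

-- List the FE configs that govern Hive metadata caching.
-- Config names are assumed from 2.x docs; confirm on your own version.
ADMIN SHOW FRONTEND CONFIG LIKE 'hive_meta%';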