SR reading a Hive catalog table: HdfsOrcScanner reports an error

[Details] HdfsOrcScanner::do_open fails intermittently. The affected files are stored on HDFS, not on OSS.

[Background] After upgrading SR from 3.2.8 to 3.3.2, reading the Hive catalog failed with
java.lang.ClassNotFoundException: Class com.aliyun.jindodata.oss.JindoOssFileSystem not found, so I downloaded two Jindo JARs from GitHub: jindo-core-6.5.4.jar and jindo-sdk-6.5.4.jar.
[Business impact]
[Shared-data mode (storage-compute separation)] Yes
[StarRocks version] 3.3.2

SQL Error [1064] [42000]: HdfsOrcScanner::do_open failed. reason = Failed to parse the postscript from hdfs://emr-header-1.cluster-325063:9000/user/hive/warehouse/eva.db/wish_exp_num_day/day=2024-08-22/000000_0: file = hdfs://emr-header-1.cluster-325063:9000/user/hive/warehouse/eva.db/wish_exp_num_day/day=2024-08-22/000000_0

  • fe.log / be.INFO / relevant screenshots
    `2024-08-22 14:00:09.248+08:00 WARN (thrift-server-pool-20764|97849) [DefaultCoordinator.updateFragmentExecStatus():948] exec state report failed status=errorCode INTERNAL_ERROR HdfsOrcScanner::do_open failed. reason = Failed to parse the postscript from hdfs://emr-header-1.cluster-325063:9000/user/hive/warehouse/eva.db/wish_exp_num_day/day=2024-08-22/000000_0: file = hdfs://emr-header-1.cluster-325063:9000/user/hive/warehouse/eva.db/wish_exp_num_day/day=2024-08-22/000000_0, query_id=c86487f9-604b-11ef-9493-02425594f25c, instance_id=c86487f9-604b-11ef-9493-02425594f27b, backend_id=12221

2024-08-22 14:00:09.254+08:00 WARN (starrocks-mysql-nio-pool-1165|99206) [DefaultCoordinator.getNext():813] query failed: HdfsOrcScanner::do_open failed. reason = Failed to parse the postscript from hdfs://emr-header-1.cluster-325063:9000/user/hive/warehouse/eva.db/wish_exp_num_day/day=2024-08-22/000000_0: file = hdfs://emr-header-1.cluster-325063:9000/user/hive/warehouse/eva.db/wish_exp_num_day/day=2024-08-22/000000_0

be.WARNING log

E20240822 14:00:09.206510 140514882959104 scan_operator.cpp:436] scan fragment c86487f9-604b-11ef-9493-02425594f265 driver 0 Scan tasks error: Internal error: HdfsOrcScanner::do_open failed. reason = Failed to parse the postscript from hdfs://emr-header-1.cluster-325063:9000/user/hive/warehouse/eva.db/wish_exp_num_day/day=2024-08-22/000000_0: file = hdfs://emr-header-1.cluster-325063:9000/user/hive/warehouse/eva.db/wish_exp_num_day/day=2024-08-22/000000_0
W20240822 14:00:09.233849 140515436877568 pipeline_driver.cpp:315] pull_chunk returns not ok status Internal error: HdfsOrcScanner::do_open failed. reason = Failed to parse the postscript from hdfs://emr-header-1.cluster-325063:9000/user/hive/warehouse/eva.db/wish_exp_num_day/day=2024-08-22/000000_0: file = hdfs://emr-header-1.cluster-325063:9000/user/hive/warehouse/eva.db/wish_exp_num_day/day=2024-08-22/000000_0
W20240822 14:00:09.233940 140515436877568 pipeline_driver_executor.cpp:168] [Driver] Process error, query_id=c86487f9-604b-11ef-9493-02425594f25c, instance_id=c86487f9-604b-11ef-9493-02425594f265, status=Internal error: HdfsOrcScanner::do_open failed. reason = Failed to parse the postscript from hdfs://emr-header-1.cluster-325063:9000/user/hive/warehouse/eva.db/wish_exp_num_day/day=2024-08-22/000000_0: file = hdfs://emr-header-1.cluster-325063:9000/user/hive/warehouse/eva.db/wish_exp_num_day/day=2024-08-22/000000_0 
  • External table query error
    • be.out and fe.warn.log
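
The "Failed to parse the postscript" message in the logs above means the reader could not decode the tail of the ORC file. One way to check whether a file was incomplete when it was read is to inspect a local copy (for example, one fetched with `hdfs dfs -get`). The following is a minimal sketch, not StarRocks' own validation: the helper name `check_orc_tail` is hypothetical, and it assumes the usual ORC layout in which the file starts with the magic bytes `ORC`, the last byte holds the postscript length, and the postscript itself ends with `ORC`. Very old ORC 0.11 files may not satisfy the last assumption.

```python
import os

ORC_MAGIC = b"ORC"


def check_orc_tail(path):
    """Return (ok, reason) for a local copy of an ORC file.

    Hypothetical helper: it only checks the magic bytes and the postscript
    length byte, it does not decode the postscript itself.
    """
    size = os.path.getsize(path)
    if size < len(ORC_MAGIC) + 1:
        return False, f"file too small ({size} bytes)"
    with open(path, "rb") as f:
        header = f.read(len(ORC_MAGIC))
        # The final byte of an ORC file stores the postscript length.
        f.seek(-1, os.SEEK_END)
        ps_len = f.read(1)[0]
        if ps_len == 0 or ps_len + 1 > size:
            return False, f"implausible postscript length {ps_len}"
        # Read the postscript bytes that sit just before the length byte.
        f.seek(-(1 + ps_len), os.SEEK_END)
        postscript = f.read(ps_len)
    if header != ORC_MAGIC:
        return False, "missing ORC magic at start of file"
    if not postscript.endswith(ORC_MAGIC):
        return False, "postscript does not end with ORC magic"
    return True, "header and postscript look plausible"
```

If the check fails on a copy taken right after the query error but passes a few minutes later, that would be consistent with the file still being written at query time.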

Is the table being updated when you query it?

Possibly. In some cases the table still cannot be queried several minutes after the update. Is there a fix for this?

Just manually run REFRESH EXTERNAL TABLE before querying.
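
A minimal sketch of that refresh-then-query step, assuming a Python DB-API cursor connected to the StarRocks FE over the MySQL protocol. The helper name `refresh_then_query` and the catalog name `hive_catalog` in the usage note are hypothetical; the database, table, and partition names are taken from the file path in the error message.

```python
def refresh_then_query(cursor, table, query, partitions=None):
    """Refresh cached Hive metadata for `table`, then run `query`.

    `cursor` is any DB-API 2.0 cursor; `table` and `partitions` are
    assumed to come from trusted input, since they are inlined as-is.
    """
    stmt = f"REFRESH EXTERNAL TABLE {table}"
    if partitions:
        # Limit the refresh to the named partitions, e.g. 'day=2024-08-22'.
        quoted = ", ".join(f"'{p}'" for p in partitions)
        stmt += f" PARTITION ({quoted})"
    cursor.execute(stmt)
    cursor.execute(query)
    return cursor.fetchall()
```

For the table in this thread, a call might look like `refresh_then_query(cur, "hive_catalog.eva.wish_exp_num_day", "SELECT count(*) FROM hive_catalog.eva.wish_exp_num_day", partitions=["day=2024-08-22"])`. Refreshing only the partition that was just rewritten keeps the refresh cheaper than refreshing the whole table.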