StarRocks deadlock when joining with a Hudi table

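To help us locate your problem faster, please provide the following information. Thank you.
[Details] A StarRocks internal table deadlocks. Data is written to it via Stream Load while queries concurrently join the internal table with a Hudi table. My question: reading the Hudi table's metadata should not need to lock the internal table, should it?
[Background]
[Business impact] Metadata cannot be read; the FE must be restarted.
[Shared-data (storage-compute separation)] Yes
[StarRocks version] 3.3.0
[Cluster size] For example: 1 FE (1 follower + 2 observers) + 3 CN
[Machine information]
[Contact]
[Attachments]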
2024-07-11 03:08:37.269+08:00 WARN (nioEventLoopGroup-8-4|231) [LockManager.logSlowLockTrace():398] LockManager detects slow lock : {"owners":[{"id":323649,
"name":"starrocks-mysql-nio-pool-1759",
"heldFor":6161,
"waitTime":0,
"stack":["java.base@11.0.23/jdk.internal.misc.Unsafe.park(Native Method)",
"java.base@11.0.23/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)",
"java.base@11.0.23/java.util.concurrent.FutureTask.awaitDone(FutureTask.java:447)",
"java.base@11.0.23/java.util.concurrent.FutureTask.get(FutureTask.java:190)",
"app//com.starrocks.connector.RemoteFileOperations.getRemoteFiles(RemoteFileOperations.java:119)",
"app//com.starrocks.connector.RemoteFileOperations.getRemoteFiles(RemoteFileOperations.java:85)",
"app//com.starrocks.connector.RemoteFileOperations.getRemoteFileInfoForStats(RemoteFileOperations.java:154)",
"app//com.starrocks.connector.hive.HiveStatisticsProvider.getEstimatedRowCount(HiveStatisticsProvider.java:152)",
"app//com.starrocks.connector.hive.HiveStatisticsProvider.getTableStatistics(HiveStatisticsProvider.java:102)",
"app//com.starrocks.connector.hudi.HudiMetadata.getTableStatistics(HudiMetadata.java:171)",
"app//com.starrocks.connector.CatalogConnectorMetadata.getTableStatistics(CatalogConnectorMetadata.java:173)",
"app//com.starrocks.server.MetadataMgr.lambda$getTableStatistics$10(MetadataMgr.java:691)",
"app//com.starrocks.server.MetadataMgr$$Lambda$3483/0x0000000841451440.apply(Unknown Source)",
"java.base@11.0.23/java.util.Optional.map(Optional.java:265)",
"app//com.starrocks.server.MetadataMgr.getTableStatistics(MetadataMgr.java:691)",
"app//com.starrocks.server.MetadataMgr.getTableStatistics(MetadataMgr.java:706)",
"app//com.starrocks.sql.optimizer.statistics.StatisticsCalculator.computeHMSTableScanNode(StatisticsCalculator.java:590)",
"app//com.starrocks.sql.optimizer.statistics.StatisticsCalculator.visitLogicalHudiScan(StatisticsCalculator.java:558)",
"app//com.starrocks.sql.optimizer.statistics.StatisticsCalculator.visitLogicalHudiScan(StatisticsCalculator.java:176)",
"app//com.starrocks.sql.optimizer.operator.logical.LogicalHudiScanOperator.accept(LogicalHudiScanOperator.java:79)"]},
{"id":323654,
"name":"starrocks-mysql-nio-pool-1761",
"heldFor":6078,
"waitTime":0,
"stack":["java.base@11.0.23/jdk.internal.misc.Unsafe.park(Native Method)",
"java.base@11.0.23/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)",
"java.base@11.0.23/java.util.concurrent.FutureTask.awaitDone(FutureTask.java:447)",
"java.base@11.0.23/java.util.concurrent.FutureTask.get(FutureTask.java:190)",
"app//com.starrocks.connector.RemoteFileOperations.getRemoteFiles(RemoteFileOperations.java:119)",
"app//com.starrocks.connector.RemoteFileOperations.getRemoteFiles(RemoteFileOperations.java:85)",
"app//com.starrocks.connector.RemoteFileOperations.getRemoteFileInfoForStats(RemoteFileOperations.java:154)",
"app//com.starrocks.connector.hive.HiveStatisticsProvider.getEstimatedRowCount(HiveStatisticsProvider.java:152)",
"app//com.starrocks.connector.hive.HiveStatisticsProvider.getTableStatistics(HiveStatisticsProvider.java:102)",
"app//com.starrocks.connector.hudi.HudiMetadata.getTableStatistics(HudiMetadata.java:171)",
"app//com.starrocks.connector.CatalogConnectorMetadata.getTableStatistics(CatalogConnectorMetadata.java:173)",
"app//com.starrocks.server.MetadataMgr.lambda$getTableStatistics$10(MetadataMgr.java:691)",
"app//com.starrocks.server.MetadataMgr$$Lambda$3483/0x0000000841451440.apply(Unknown Source)",
"java.base@11.0.23/java.util.Optional.map(Optional.java:265)",
"app//com.starrocks.server.MetadataMgr.getTableStatistics(MetadataMgr.java:691)",
"app//com.starrocks.server.MetadataMgr.getTableStatistics(MetadataMgr.java:706)",
"app//com.starrocks.sql.optimizer.statistics.StatisticsCalculator.computeHMSTableScanNode(StatisticsCalculator.java:590)",
"app//com.starrocks.sql.optimizer.statistics.StatisticsCalculator.visitLogicalHudiScan(StatisticsCalculator.java:558)",
"app//com.starrocks.sql.optimizer.statistics.StatisticsCalculator.visitLogicalHudiScan(StatisticsCalculator.java:176)",
"app//com.starrocks.sql.optimizer.operator.logical.LogicalHudiScanOperator.accept(LogicalHudiScanOperator.java:79)"]}],
"waiter":[{"id":231,
"name":"nioEventLoopGroup-8-4",
"heldFor":"",
"waitTime":3000,
"locker":"GlobalTransactionMgr.commitPreparedTransaction():309"},
{"id":52,
"name":"autovacuum",
"heldFor":"",
"waitTime":2739,
"locker":"AutovacuumDaemon.vacuumTable():97"}]}
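For anyone triaging a similar report, a slow-lock entry like the one above can be condensed into "who holds the lock and where" versus "who is waiting". The following is only a minimal sketch, not a StarRocks tool: the helper names (normalize_quotes, parse_slow_lock, first_app_frame, summarize) are hypothetical, the sample is an abbreviated copy of the entry above, and the field names (owners, waiter, heldFor, waitTime, locker, stack) are taken from this log only. It assumes the JSON payload follows the text "LockManager detects slow lock : " and that any typographic quotes were introduced by pasting through a rich-text editor, so they are converted back to plain quotes before parsing.

```python
import json

MARKER = "LockManager detects slow lock : "

# Abbreviated copy of the entry above (one owner, one waiter), for illustration only.
SAMPLE = (
    '2024-07-11 03:08:37.269+08:00 WARN (nioEventLoopGroup-8-4|231) '
    '[LockManager.logSlowLockTrace():398] LockManager detects slow lock : '
    '{“owners”:[{“id”:323649,“name”:“starrocks-mysql-nio-pool-1759”,'
    '“heldFor”:6161,“waitTime”:0,“stack”:['
    '“java.base@11.0.23/java.util.concurrent.FutureTask.get(FutureTask.java:190)”,'
    '“app//com.starrocks.connector.RemoteFileOperations.getRemoteFiles'
    '(RemoteFileOperations.java:119)”]}],'
    '“waiter”:[{“id”:231,“name”:“nioEventLoopGroup-8-4”,“heldFor”:"",'
    '“waitTime”:3000,“locker”:“GlobalTransactionMgr.commitPreparedTransaction():309”}]}'
)


def normalize_quotes(text):
    """Turn typographic double quotes back into plain ASCII quotes."""
    return text.replace("\u201c", '"').replace("\u201d", '"')


def parse_slow_lock(entry):
    """Extract and parse the JSON payload that follows the slow-lock marker."""
    payload = entry.split(MARKER, 1)[1]
    return json.loads(normalize_quotes(payload))


def first_app_frame(stack):
    """Return the first frame from application code (prefix 'app//'), if any."""
    for frame in stack:
        if frame.startswith("app//"):
            return frame
    return None


def summarize(lock_info):
    """Return (owners, waiters) as lists of short tuples."""
    owners = [
        (o["name"], o["heldFor"], first_app_frame(o.get("stack", [])))
        for o in lock_info.get("owners", [])
    ]
    waiters = [
        (w["name"], w["waitTime"], w.get("locker"))
        for w in lock_info.get("waiter", [])
    ]
    return owners, waiters


if __name__ == "__main__":
    owners, waiters = summarize(parse_slow_lock(SAMPLE))
    for name, held_for, frame in owners:
        print(f"owner  {name} heldFor={held_for} at {frame}")
    for name, wait_time, locker in waiters:
        print(f"waiter {name} waitTime={wait_time} from {locker}")
```

Applied to the full entry, this would list both starrocks-mysql-nio-pool owners parked in RemoteFileOperations.getRemoteFiles while the commitPreparedTransaction and autovacuum threads wait, which is the pattern the question is about.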

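What exactly happens with the join query: does the query hang, or do writes fail? Please describe the problem more clearly.
Also, please provide the fe.log for the corresponding time.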

dead_lock.log (173.5 KB)

Above is the WARN log.
Below is the FE log.
fe.log (2.3 MB)

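At 12:08 we had 3 concurrent queries joining the internal table with the Hudi table, and they deadlocked.
This lasted 1-2 hours, during which this table could be neither written nor read; other tables were unaffected.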

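The deadlock is intermittent, occurring roughly once every 5 days. It can only be resolved by restarting the FE; KILL QUERY does not resolve it.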

This internal table is written to StarRocks in real time through the Flink connector.