2.5.8: A single BE node cannot respond to queries

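To help us locate your problem faster, please provide the following information. Thank you.
【Details】All queries on a single BE node are being rejected. They all use the same resource group, which limits SQL concurrency to 300.
The first error was "exceed big query cpu limit: current is 301937700373ns but limit is 300000000000ns" (about 301.9 s of CPU time against a 300 s limit), followed by "Exceed concurrency limit:300". Judging from the statistics, the SQL concurrency limit appears to have been exceeded, but we cannot find the cause.
【Background】A resource group is configured, and queries are run with the account bound to it.
【StarRocks version】2.5.8
【Cluster size】e.g. 6 FE (3 follower + 3 observer) + 40 BE (deployed separately)
【Attachments】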

W0726 09:18:16.325938 41021 pipeline_driver.cpp:242] pull_chunk returns not ok status Unknown code(48): : global dict greater than DICT_DECODE_MAX_SIZE
W0726 09:18:16.325950 41002 pipeline_driver.cpp:242] pull_chunk returns not ok status Unknown code(48): : global dict greater than DICT_DECODE_MAX_SIZE
W0726 09:18:16.326011 41003 pipeline_driver.cpp:242] pull_chunk returns not ok status Unknown code(48): : global dict greater than DICT_DECODE_MAX_SIZE
W0726 09:18:16.325968 40990 pipeline_driver.cpp:242] pull_chunk returns not ok status Unknown code(48): : global dict greater than DICT_DECODE_MAX_SIZE
W0726 09:18:16.326035 41002 pipeline_driver_executor.cpp:149] [Driver] Process error, query_id=ef847fdd-4aec-11ef-b831-246e960501a8, instance_id=ef847fdd-4aec-11ef-b831-246e960501b7, status=Unknown code(48): : global dict greater than DICT_DECODE_MAX_SIZE
W0726 09:18:16.326042 40994 pipeline_driver.cpp:242] pull_chunk returns not ok status Unknown code(48): : global dict greater than DICT_DECODE_MAX_SIZE
W0726 09:18:16.326062 41047 pipeline_driver.cpp:242] pull_chunk returns not ok status Unknown code(48): : global dict greater than DICT_DECODE_MAX_SIZE
W0726 09:18:16.326066 41008 pipeline_driver.cpp:242] pull_chunk returns not ok status Unknown code(48): : global dict greater than DICT_DECODE_MAX_SIZE
W0726 09:18:16.326377 41047 pipeline_driver_executor.cpp:149] [Driver] Process error, query_id=ef847fdd-4aec-11ef-b831-246e960501a8, instance_id=ef847fdd-4aec-11ef-b831-246e960501b7, status=Unknown code(48): : global dict greater than DICT_DECODE_MAX_SIZE
W0726 09:18:16.326360 40994 pipeline_driver_executor.cpp:149] [Driver] Process error, query_id=ef847fdd-4aec-11ef-b831-246e960501a8, instance_id=ef847fdd-4aec-11ef-b831-246e960501b7, status=Unknown code(48): : global dict greater than DICT_DECODE_MAX_SIZE
W0726 09:18:16.326382 41021 pipeline_driver_executor.cpp:149] [Driver] Process error, query_id=ef847fdd-4aec-11ef-b831-246e960501a8, instance_id=ef847fdd-4aec-11ef-b831-246e960501b7, status=Unknown code(48): : global dict greater than DICT_DECODE_MAX_SIZE
W0726 09:18:16.326388 40990 pipeline_driver_executor.cpp:149] [Driver] Process error, query_id=ef847fdd-4aec-11ef-b831-246e960501a8, instance_id=ef847fdd-4aec-11ef-b831-246e960501b7, status=Unknown code(48): : global dict greater than DICT_DECODE_MAX_SIZE
W0726 09:18:16.326506 41003 pipeline_driver_executor.cpp:149] [Driver] Process error, query_id=ef847fdd-4aec-11ef-b831-246e960501a8, instance_id=ef847fdd-4aec-11ef-b831-246e960501b7, status=Unknown code(48): : global dict greater than DICT_DECODE_MAX_SIZE
W0726 09:18:16.326517 41008 pipeline_driver_executor.cpp:149] [Driver] Process error, query_id=ef847fdd-4aec-11ef-b831-246e960501a8, instance_id=ef847fdd-4aec-11ef-b831-246e960501b7, status=Unknown code(48): : global dict greater than DICT_DECODE_MAX_SIZE
W0726 09:18:21.734552 42463 fragment_context.cpp:28] [Driver] Canceled, query_id=ed5ee144-4aec-11ef-b831-246e960501a8, instance_id=ed5ee144-4aec-11ef-b831-246e9605020e, reason=InternalError
W0726 09:18:21.735155 40988 pipeline_driver.cpp:573] fragment_id ed5ee144-4aec-11ef-b831-246e9605020e driver query_id=ed5ee144-4aec-11ef-b831-246e960501a8 fragment_id=ed5ee144-4aec-11ef-b831-246e9605020e driver=0x7f2af2070210, status=INPUT_EMPTY, operator-chain: [aggregate_distinct_blocking_source_9_0x7f2af206e410(X) -> project_10_0x7f2af206ee10(X) -> local_sort_sink_11_0x7f2af206f810(X)] cancels operator local_sort_sink_11_0x7f2af206f810(X) with finished error runtime state is cancelled
W0726 09:18:21.735217 40988 pipeline_driver.cpp:573] fragment_id ed5ee144-4aec-11ef-b831-246e9605020e driver query_id=ed5ee144-4aec-11ef-b831-246e960501a8 fragment_id=ed5ee144-4aec-11ef-b831-246e9605020e driver=0x7f2af2077510, status=INPUT_EMPTY, operator-chain: [aggregate_distinct_blocking_source_9_0x7f2af2070710(X) -> project_10_0x7f2af2071110(X) -> local_sort_sink_11_0x7f2af2071b10(X)] cancels operator local_sort_sink_11_0x7f2af2071b10(X) with finished error runtime state is cancelled
W0726 09:18:21.735229 40988 pipeline_driver.cpp:573] fragment_id ed5ee144-4aec-11ef-b831-246e9605020e driver query_id=ed5ee144-4aec-11ef-b831-246e960501a8 fragment_id=ed5ee144-4aec-11ef-b831-246e9605020e driver=0x7f2af2079810, status=INPUT_EMPTY, operator-chain: [aggregate_distinct_blocking_source_9_0x7f2af2077a10(X) -> project_10_0x7f2af2078410(X) -> local_sort_sink_11_0x7f2af2078e10(X)] cancels operator local_sort_sink_11_0x7f2af2078e10(X) with finished error runtime state is cancelled
W0726 09:18:21.735250 40988 pipeline_driver.cpp:573] fragment_id ed5ee144-4aec-11ef-b831-246e9605020e driver query_id=ed5ee144-4aec-11ef-b831-246e960501a8 fragment_id=ed5ee144-4aec-11ef-b831-246e9605020e driver=0x7f2af207bb10, status=INPUT_EMPTY, operator-chain: [aggregate_distinct_blocking_source_9_0x7f2af2079d10(X) -> project_10_0x7f2af207a710(X) -> local_sort_sink_11_0x7f2af207b110(X)] cancels operator local_sort_sink_11_0x7f2af207b110(X) with finished error runtime state is cancelled
W0726 09:18:21.735260 40988 pipeline_driver.cpp:573] fragment_id ed5ee144-4aec-11ef-b831-246e9605020e driver query_id=ed5ee144-4aec-11ef-b831-246e960501a8 fragment_id=ed5ee144-4aec-11ef-b831-246e9605020e driver=0x7f2af207de10, status=INPUT_EMPTY, operator-chain: [aggregate_distinct_blocking_source_9_0x7f2af207c010(X) -> project_10_0x7f2af207ca10(X) -> local_sort_sink_11_0x7f2af207d410(X)] cancels operator local_sort_sink_11_0x7f2af207d410(X) with finished error runtime state is cancelled
W0726 09:18:21.735272 40988 pipeline_driver.cpp:573] fragment_id ed5ee144-4aec-11ef-b831-246e9605020e driver query_id=ed5ee144-4aec-11ef-b831-246e960501a8 fragment_id=ed5ee144-4aec-11ef-b831-246e9605020e driver=0x7f2af2080110, status=INPUT_EMPTY, operator-chain: [aggregate_distinct_blocking_source_9_0x7f2af207e310(X) -> project_10_0x7f2af207ed10(X) -> local_sort_sink_11_0x7f2af207f710(X)] cancels operator local_sort_sink_11_0x7f2af207f710(X) with finished error runtime state is cancelled
W0726 09:18:21.735283 40988 pipeline_driver.cpp:573] fragment_id ed5ee144-4aec-11ef-b831-246e9605020e driver query_id=ed5ee144-4aec-11ef-b831-246e960501a8 fragment_id=ed5ee144-4aec-11ef-b831-246e9605020e driver=0x7f2af2082410, status=INPUT_EMPTY, operator-chain: [aggregate_distinct_blocking_source_9_0x7f2af2080610(X) -> project_10_0x7f2af2081010(X) -> local_sort_sink_11_0x7f2af2081a10(X)] cancels operator local_sort_sink_11_0x7f2af2081a10(X) with finished error runtime state is cancelled
W0726 09:18:21.735291 40988 pipeline_driver.cpp:573] fragment_id ed5ee144-4aec-11ef-b831-246e9605020e driver query_id=ed5ee144-4aec-11ef-b831-246e960501a8 fragment_id=ed5ee144-4aec-11ef-b831-246e9605020e driver=0x7f2af2084710, status=INPUT_EMPTY, operator-chain: [aggregate_distinct_blocking_source_9_0x7f2af2082910(X) -> project_10_0x7f2af2083310(X) -> local_sort_sink_11_0x7f2af2083d10(X)] cancels operator local_sort_sink_11_0x7f2af2083d10(X) with finished error runtime state is cancelled
W0726 09:18:21.735299 40988 pipeline_driver.cpp:573] fragment_id ed5ee144-4aec-11ef-b831-246e9605020e driver query_id=ed5ee144-4aec-11ef-b831-246e960501a8 fragment_id=ed5ee144-4aec-11ef-b831-246e9605020e driver=0x7f2af2086a10, status=INPUT_EMPTY, operator-chain: [aggregate_distinct_blocking_source_9_0x7f2af2084c10(X) -> project_10_0x7f2af2085610(X) -> local_sort_sink_11_0x7f2af2086010(X)] cancels operator local_sort_sink_11_0x7f2af2086010(X) with finished error runtime state is cancelled
W0726 09:18:21.735307 40988 pipeline_driver.cpp:573] fragment_id ed5ee144-4aec-11ef-b831-246e9605020e driver query_id=ed5ee144-4aec-11ef-b831-246e960501a8 fragment_id=ed5ee144-4aec-11ef-b831-246e9605020e driver=0x7f2af2088d10, status=INPUT_EMPTY, operator-chain: [aggregate_distinct_blocking_source_9_0x7f2af2086f10(X) -> project_10_0x7f2af2087910(X) -> local_sort_sink_11_0x7f2af2088310(X)] cancels operator local_sort_sink_11_0x7f2af2088310(X) with finished error runtime state is cancelled
W0726 09:18:21.735320 40988 pipeline_driver.cpp:573] fragment_id ed5ee144-4aec-11ef-b831-246e9605020e driver query_id=ed5ee144-4aec-11ef-b831-246e960501a8 fragment_id=ed5ee144-4aec-11ef-b831-246e9605020e driver=0x7f2af208b010, status=INPUT_EMPTY, operator-chain: [aggregate_distinct_blocking_source_9_0x7f2af2089210(X) -> project_10_0x7f2af2089c10(X) -> local_sort_sink_11_0x7f2af208a610(X)] cancels operator local_sort_sink_11_0x7f2af208a610(X) with finished error runtime state is cancelled
W0726 09:18:21.735328 40988 pipeline_driver.cpp:573] fragment_id ed5ee144-4aec-11ef-b831-246e9605020e driver query_id=ed5ee144-4aec-11ef-b831-246e960501a8 fragment_id=ed5ee144-4aec-11ef-b831-246e9605020e driver=0x7f2af208d310, status=INPUT_EMPTY, operator-chain: [aggregate_distinct_blocking_source_9_0x7f2af208b510(X) -> project_10_0x7f2af208bf10(X) -> local_sort_sink_11_0x7f2af208c910(X)] cancels operator local_sort_sink_11_0x7f2af208c910(X) with finished error runtime state is cancelled
W0726 09:18:21.735347 40988 pipeline_driver.cpp:573] fragment_id ed5ee144-4aec-11ef-b831-246e9605020e driver query_id=ed5ee144-4aec-11ef-b831-246e960501a8 fragment_id=ed5ee144-4aec-11ef-b831-246e9605020e driver=0x7f2af208f610, status=INPUT_EMPTY, operator-chain: [aggregate_distinct_blocking_source_9_0x7f2af208d810(X) -> project_10_0x7f2af208e210(X) -> local_sort_sink_11_0x7f2af208ec10(X)] cancels operator local_sort_sink_11_0x7f2af208ec10(X) with finished error runtime state is cancelled
W0726 09:18:21.735354 40988 pipeline_driver.cpp:573] fragment_id ed5ee144-4aec-11ef-b831-246e9605020e driver query_id=ed5ee144-4aec-11ef-b831-246e960501a8 fragment_id=ed5ee144-4aec-11ef-b831-246e9605020e driver=0x7f2af2091910, status=INPUT_EMPTY, operator-chain: [aggregate_distinct_blocking_source_9_0x7f2af208fb10(X) -> project_10_0x7f2af2090510(X) -> local_sort_sink_11_0x7f2af2090f10(X)] cancels operator local_sort_sink_11_0x7f2af2090f10(X) with finished error runtime state is cancelled
W0726 09:18:21.735365 40988 pipeline_driver.cpp:573] fragment_id ed5ee144-4aec-11ef-b831-246e9605020e driver query_id=ed5ee144-4aec-11ef-b831-246e960501a8 fragment_id=ed5ee144-4aec-11ef-b831-246e9605020e driver=0x7f2af2093c10, status=INPUT_EMPTY, operator-chain: [aggregate_distinct_blocking_source_9_0x7f2af2091e10(X) -> project_10_0x7f2af2092810(X) -> local_sort_sink_11_0x7f2af2093210(X)] cancels operator local_sort_sink_11_0x7f2af2093210(X) with finished error runtime state is cancelled
W0726 09:18:21.735373 40988 pipeline_driver.cpp:573] fragment_id ed5ee144-4aec-11ef-b831-246e9605020e driver query_id=ed5ee144-4aec-11ef-b831-246e960501a8 fragment_id=ed5ee144-4aec-11ef-b831-246e9605020e driver=0x7f2af2095f10, status=INPUT_EMPTY, operator-chain: [aggregate_distinct_blocking_source_9_0x7f2af2094110(X) -> project_10_0x7f2af2094b10(X) -> local_sort_sink_11_0x7f2af2095510(X)] cancels operator local_sort_sink_11_0x7f2af2095510(X) with finished error runtime state is cancelled
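
A log excerpt like the one above is easier to read once it is grouped by query. The following is only a rough sketch, not an official StarRocks tool: it assumes the BE warning log uses the glog-style prefix seen in the attachment (level and date, time, thread id, source file and line) and counts warning lines per source location and query_id. The function name summarize and both regular expressions are made up for this illustration.

```python
import re
from collections import Counter

# glog-style prefix as seen in the attachment:
#   W0726 09:18:16.325938 41021 pipeline_driver.cpp:242] message...
LINE_RE = re.compile(
    r"^(?P<level>[IWEF])(?P<date>\d{4}) (?P<time>[\d:.]+)\s+(?P<tid>\d+) "
    r"(?P<src>[\w.]+:\d+)\] (?P<msg>.*)$"
)
QUERY_ID_RE = re.compile(r"query_id=([0-9a-f-]+)")


def summarize(lines):
    """Count warning lines per (source location, query_id or None)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None or m.group("level") != "W":
            continue  # skip non-warning or unparseable lines
        q = QUERY_ID_RE.search(m.group("msg"))
        counts[(m.group("src"), q.group(1) if q else None)] += 1
    return counts


# Two lines copied from the attachment above.
sample = [
    "W0726 09:18:16.325938 41021 pipeline_driver.cpp:242] pull_chunk returns not ok "
    "status Unknown code(48): : global dict greater than DICT_DECODE_MAX_SIZE",
    "W0726 09:18:21.734552 42463 fragment_context.cpp:28] [Driver] Canceled, "
    "query_id=ed5ee144-4aec-11ef-b831-246e960501a8, "
    "instance_id=ed5ee144-4aec-11ef-b831-246e9605020e, reason=InternalError",
]

for (src, query_id), n in summarize(sample).items():
    print(src, query_id, n)
```

To run it on a real file, pass an open file object instead of sample, for example summarize(open(path)) with path pointing at your BE warning log.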

@jingdan Hi Jing Dan, could you take a look at this when you have time? We have run into it as well.

Is it a concurrency overflow? If so, search the audit log for the number of query records belonging to this resource group.

Jing Dan, which keyword should we search for in the audit log to check this?
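
One possible way to do the audit-log count suggested above, as a sketch under assumptions rather than a confirmed answer: it assumes each entry in fe.audit.log is a single line that starts with a "YYYY-MM-DD HH:MM:SS" timestamp and contains "|"-separated Key=Value fields, including a ResourceGroup=<name> field. Please check that against your own log before relying on it. The group names rg_a and rg_b and the sample lines are invented for illustration.

```python
import re
from collections import Counter

# ASSUMPTION: one audit entry per line, a leading "YYYY-MM-DD HH:MM:SS" timestamp,
# and a "|ResourceGroup=<name>" field. Adjust both patterns to your fe.audit.log.
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2}")
GROUP_RE = re.compile(r"\|ResourceGroup=([^|]*)")


def queries_per_minute(lines, group):
    """Count audit entries of one resource group, bucketed by minute."""
    counts = Counter()
    for line in lines:
        ts = TS_RE.match(line)
        g = GROUP_RE.search(line)
        if ts and g and g.group(1) == group:
            counts[ts.group(1)] += 1
    return counts


# Hypothetical lines, NOT copied from a real audit log.
sample = [
    "2024-07-26 09:18:16,325 [query] |User=u1|ResourceGroup=rg_a|State=ERR|Stmt=select 1",
    "2024-07-26 09:18:21,734 [query] |User=u1|ResourceGroup=rg_a|State=EOF|Stmt=select 2",
    "2024-07-26 09:18:22,001 [query] |User=u2|ResourceGroup=rg_b|State=EOF|Stmt=select 3",
]

for minute, n in sorted(queries_per_minute(sample, "rg_a").items()):
    print(minute, n)
```

Note that the audit log records queries as they finish, so a per-minute count only approximates the concurrency at any instant; it is mainly useful for spotting a burst around the time the errors started.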

Which version are you on? Is this also happening on 2.5.8?

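On our side, the metric starrocks_be_resource_group_connector_scan_use_ratio stayed at 0 throughout the period when the problem occurred. We are not sure whether this is related to external table queries.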
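
For anyone who wants to check the same metric on their own BE nodes, here is a minimal sketch. It assumes the BE exposes metrics in Prometheus text format on its HTTP port (the text would typically be fetched from a URL like http://<be_host>:<be_http_port>/metrics, which you should verify for your deployment). The label name and values in the sample are invented; only the metric name comes from the post above.

```python
import re

METRIC = "starrocks_be_resource_group_connector_scan_use_ratio"

# One Prometheus text-format sample: name, optional {labels}, then the value.
SAMPLE_RE = re.compile(
    r"^(?P<name>[A-Za-z_:][\w:]*)(?P<labels>\{[^}]*\})?\s+(?P<value>\S+)"
)


def pick_metric(text, name=METRIC):
    """Return (labels, value) pairs for one metric from Prometheus text output."""
    out = []
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # HELP / TYPE comment lines
        m = SAMPLE_RE.match(line)
        if m and m.group("name") == name:
            out.append((m.group("labels") or "", float(m.group("value"))))
    return out


# Hypothetical metrics text; the "name" label and group names are assumptions.
sample_text = """\
# TYPE starrocks_be_resource_group_connector_scan_use_ratio gauge
starrocks_be_resource_group_connector_scan_use_ratio{name="rg_a"} 0
starrocks_be_resource_group_connector_scan_use_ratio{name="rg_b"} 0.25
starrocks_be_some_other_metric 42
"""

for labels, value in pick_metric(sample_text):
    print(labels, value)
```

Polling this on the affected BE and on a healthy one during the same window would show whether the value of 0 is specific to the node that rejects queries.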

@jingdan Jing Dan, could you please take a look when you have time?

Upgrade to 2.5.22. This is most likely an issue that has already been fixed.

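Hello, thanks for the reply. Can you tell us roughly which PR fixed it? We have done custom development on top of 2.5.8, so upgrading directly is quite difficult. Thank you very much.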