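[StarRocks version] e.g. 2.2.2
[Cluster size] e.g. 3 FE (3 followers) + 8 BE
[Machine info] CPU virtual cores/memory/NIC, e.g. 40C/192G/Gigabit
[Attachments]
The business side reported a query error: query cancelled by crash of backends. On checking, a BE had gone down. dmesg -T showed no out-of-memory event, and compaction pressure was not high either. The logs are as follows:
- fe.warn.log/be.warn.log/relevant screenshots
be.out log: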
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
tcmalloc: large alloc 11671617536 bytes == 0x133410c000 @ 0x548bc6f 0x571d4dc 0x1ecb0ee 0x566d755 0x1927134 0x22e7f66 0x248ca06 0x2709280 0x26892b9 0x263779d 0x262d2a1 0x1f6cc39 0x1f687ea 0x7fde69883ea5
tcmalloc: large alloc 9450119168 bytes == 0xccfb2a000 @ 0x548bc6f 0x571d4dc 0x1ecb0ee 0x566d755 0x1927134 0x22e7f66 0x248ca06 0x2709280 0x26892b9 0x263779d 0x262d2a1 0x1f6cc39 0x1f687ea 0x7fde69883ea5
tcmalloc: large alloc 9438371840 bytes == 0xccfb2a000 @ 0x548bc6f 0x571d4dc 0x1ecb0ee 0x566d755 0x1927134 0x22e7f66 0x248ca06 0x2709280 0x26892b9 0x263779d 0x262d2a1 0x1f6cc39 0x1f687ea 0x7fde69883ea5
tcmalloc: large alloc 10044948480 bytes == 0xccfb2a000 @ 0x548bc6f 0x571d4dc 0x1ecb0ee 0x566d755 0x1927134 0x22e7f66 0x248ca06 0x2709280 0x26892b9 0x263779d 0x262d2a1 0x1f6cc39 0x1f687ea 0x7fde69883ea5
tcmalloc: large alloc 9479274496 bytes == 0xccfb2a000 @ 0x548bc6f 0x571d4dc 0x1ecb0ee 0x566d755 0x1927134 0x22e7f66 0x248ca06 0x2709280 0x26892b9 0x263779d 0x262d2a1 0x1f6cc39 0x1f687ea 0x7fde69883ea5
tcmalloc: large alloc 9412648960 bytes == 0xccfb2a000 @ 0x548bc6f 0x571d4dc 0x1ecb0ee 0x566d755 0x1927134 0x22e7f66 0x248ca06 0x2709280 0x26892b9 0x263779d 0x262d2a1 0x1f6cc39 0x1f687ea 0x7fde69883ea5
tcmalloc: large alloc 9378758656 bytes == 0xccfb2a000 @ 0x548bc6f 0x571d4dc 0x1ecb0ee 0x566d755 0x1927134 0x22e7f66 0x248ca06 0x2709280 0x26892b9 0x263779d 0x262d2a1 0x1f6cc39 0x1f687ea 0x7fde69883ea5
tcmalloc: large alloc 13814677504 bytes == 0x160f044000 @ 0x548bc6f 0x571d4dc 0x1ecb0ee 0x566d755 0x1927134 0x22e7f66 0x248ca06 0x2709280 0x26892b9 0x263779d 0x262d2a1 0x1f6cc39 0x1f687ea 0x7fde69883ea5
tcmalloc: large alloc 9409044480 bytes == 0xccfb2a000 @ 0x548bc6f 0x571d4dc 0x1ecb0ee 0x566d755 0x1927134 0x22e7f66 0x248ca06 0x2709280 0x26892b9 0x263779d 0x262d2a1 0x1f6cc39 0x1f687ea 0x7fde69883ea5
tcmalloc: large alloc 11062820864 bytes == 0x1e9faa0000 @ 0x548bc6f 0x571d4dc 0x1ecb0ee 0x566d755 0x1927134 0x248ca06 0x2480dab 0x25ccb4b 0x1e915f4 0x1e922f7 0x1e25f4b 0x1e2a68c 0x1e2aef1 0x1f6cc39 0x1f687ea 0x7fde69883ea5
tcmalloc: large alloc 10045857792 bytes == 0x2331730000 @ 0x548bc6f 0x571d4dc 0x1ecb0ee 0x566d755 0x1927134 0x248ca06 0x2480dab 0x25ccb4b 0x1e915f4 0x1e922f7 0x1e25f4b 0x1e2a68c 0x1e2aef1 0x1f6cc39 0x1f687ea 0x7fde69883ea5
tcmalloc: large alloc 12884672512 bytes == 0x274abc8000 @ 0x548bc6f 0x571d4dc 0x1ecb0ee 0x566d755 0x1927134 0x22e7f66 0x248ca06 0x2709280 0x26892b9 0x263779d 0x262d2a1 0x1f6cc39 0x1f687ea 0x7fde69883ea5
tcmalloc: large alloc 10036322304 bytes == 0x12af374000 @ 0x548bc6f 0x571d4dc 0x1ecb0ee 0x566d755 0x1927134 0x22e7f66 0x248ca06 0x2709280 0x26892b9 0x263779d 0x262d2a1 0x1f6cc39 0x1f687ea 0x7fde69883ea5
tcmalloc: large alloc 12883197952 bytes == 0x2a4b390000 @ 0x548bc6f 0x571d4dc 0x1ecb0ee 0x566d755 0x1927134 0x22e7f66 0x248ca06 0x2709280 0x26892b9 0x263779d 0x262d2a1 0x1f6cc39 0x1f687ea 0x7fde69883ea5
tcmalloc: large alloc 12882952192 bytes == 0x2d4c1f0000 @ 0x548bc6f 0x571d4dc 0x1ecb0ee 0x566d755 0x1927134 0x22e7f66 0x248ca06 0x2709280 0x26892b9 0x263779d 0x262d2a1 0x1f6cc39 0x1f687ea 0x7fde69883ea5
tcmalloc: large alloc 9472483328 bytes == 0x304c814000 @ 0x548bc6f 0x571d4dc 0x1ecb0ee 0x566d755 0x1927134 0x22e7f66 0x248ca06 0x2709280 0x26892b9 0x263779d 0x262d2a1 0x1f6cc39 0x1f687ea 0x7fde69883ea5
tcmalloc: large alloc 10552287232 bytes == 0x32821be000 @ 0x548bc6f 0x571d4dc 0x1ecb0ee 0x566d755 0x1927134 0x22e7f66 0x248ca06 0x2709280 0x26892b9 0x263779d 0x262d2a1 0x1f6cc39 0x1f687ea 0x7fde69883ea5
tcmalloc: large alloc 12883509248 bytes == 0x274abc8000 @ 0x548bc6f 0x571d4dc 0x1ecb0ee 0x566d755 0x1927134 0x22e7f66 0x248ca06 0x2709280 0x26892b9 0x263779d 0x262d2a1 0x1f6cc39 0x1f687ea 0x7fde69883ea5
tcmalloc: large alloc 13274619904 bytes == 0x12af374000 @ 0x548bc6f 0x571d4dc 0x1ecb0ee 0x566d755 0x1927134 0x22e7f66 0x248ca06 0x2709280 0x26892b9 0x263779d 0x262d2a1 0x1f6cc39 0x1f687ea 0x7fde69883ea5
tcmalloc: large alloc 12883738624 bytes == 0x274abc8000 @ 0x548bc6f 0x571d4dc 0x1ecb0ee 0x566d755 0x1927134 0x22e7f66 0x248ca06 0x2709280 0x26892b9 0x263779d 0x262d2a1 0x1f6cc39 0x1f687ea 0x7fde69883ea5
tcmalloc: large alloc 11877982208 bytes == 0x2d4c1f0000 @ 0x548bc6f 0x571d4dc 0x1ecb0ee 0x566d755 0x1927134 0x22e7f66 0x248ca06 0x2709280 0x26892b9 0x263779d 0x262d2a1 0x1f6cc39 0x1f687ea 0x7fde69883ea5
tcmalloc: large alloc 12883116032 bytes == 0x2a4b390000 @ 0x548bc6f 0x571d4dc 0x1ecb0ee 0x566d755 0x1927134 0x22e7f66 0x248ca06 0x2709280 0x26892b9 0x263779d 0x262d2a1 0x1f6cc39 0x1f687ea 0x7fde69883ea5
tcmalloc: large alloc 10450452480 bytes == 0x2d4c1f0000 @ 0x548bc6f 0x571d4dc 0x1ecb0ee 0x566d755 0x1927134 0x22e7f66 0x248ca06 0x2709280 0x26892b9 0x263779d 0x262d2a1 0x1f6cc39 0x1f687ea 0x7fde69883ea5
terminate called after throwing an instance of 'std::system_error'
what(): Resource temporarily unavailable
*** Aborted at 1660808634 (unix time) try "date -d @1660808634" if you are using GNU date ***
PC: @ 0x7fde68dd6387 __GI_raise
*** SIGABRT (@0x3e90000649f) received by PID 25759 (TID 0x7fdd86e2b700) from PID 25759; stack trace: ***
@ 0x3caf752 google::(anonymous namespace)::FailureSignalHandler()
@ 0x7fde6988b630 (unknown)
@ 0x7fde68dd6387 __GI_raise
@ 0x7fde68dd7a78 __GI_abort
@ 0x17b79ad _ZN9__gnu_cxx27__verbose_terminate_handlerEv.cold
@ 0x566d206 __cxxabiv1::__terminate()
@ 0x566d271 std::terminate()
@ 0x566d3c4 __cxa_throw
@ 0x17b9980 std::__throw_system_error()
@ 0x56e7669 std::thread::_M_start_thread()
@ 0x3c9143a apache::thrift::server::TThreadedServer::onClientConnected()
@ 0x3c95db3 apache::thrift::server::TServerFramework::serve()
@ 0x3c91b6f apache::thrift::server::TThreadedServer::serve()
@ 0x1f24d6e starrocks::ThriftServer::ThriftServerEventProcessor::supervise()
@ 0x56e7590 execute_native_thread_routine
@ 0x7fde69883ea5 start_thread
@ 0x7fde68e9e9fd __clone
@ 0x0 (unknown)
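The trace above aborts inside `std::thread::_M_start_thread()` with `std::system_error` and the message "Resource temporarily unavailable" (EAGAIN), which on Linux usually means a new thread could not be created because some thread or process limit was reached, rather than memory running out. That would fit the dmesg observation, but it is only a reading of the trace, not a confirmed cause. Below is a minimal sketch for comparing a running BE's thread count against the usual limits. It assumes a Linux `/proc` filesystem; the helper names are hypothetical, and the BE PID is passed on the command line.

```python
import re
import sys
from pathlib import Path


def parse_thread_count(status_text: str) -> int:
    """Return the 'Threads:' value from the content of /proc/<pid>/status."""
    match = re.search(r"^Threads:\s+(\d+)", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Threads:' line found")
    return int(match.group(1))


def parse_max_processes(limits_text: str):
    """Return the soft 'Max processes' limit from /proc/<pid>/limits.

    Returns None when the limit is 'unlimited'.
    """
    match = re.search(r"^Max processes\s+(\S+)", limits_text, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Max processes' line found")
    value = match.group(1)
    return None if value == "unlimited" else int(value)


def report(pid: int) -> dict:
    """Collect the thread count of one process and the limits that cap it."""
    proc = Path("/proc") / str(pid)
    return {
        "threads": parse_thread_count((proc / "status").read_text()),
        "max_processes_soft": parse_max_processes((proc / "limits").read_text()),
        "threads_max": int(Path("/proc/sys/kernel/threads-max").read_text()),
        "pid_max": int(Path("/proc/sys/kernel/pid_max").read_text()),
    }


if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1].isdigit():
    # Usage (hypothetical file name): python check_threads.py <be_pid>
    for key, value in report(int(sys.argv[1])).items():
        print(f"{key}: {value}")
```

If `threads` is close to `max_processes_soft` (the per-user limit set by `ulimit -u`), or the machine-wide total is close to `threads_max` or `pid_max`, thread creation can fail with EAGAIN. Note that the per-user limit counts all threads owned by that user, not only those of the BE process, so this check is a starting point rather than a full diagnosis.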
be.info log:
I0818 15:43:55.562958 26166 fragment_mgr.cpp:528] FragmentMgr cancel worker going to cancel timeout fragment c9ba48a1-1ec6-11ed-87a6-1866dae69d5c
I0818 15:43:55.562963 26166 plan_fragment_executor.cpp:367] cancel(): fragment_instance_id=c9ba48a1-1ec6-11ed-87a6-1866dae69d78
I0818 15:43:55.562969 26166 fragment_mgr.cpp:528] FragmentMgr cancel worker going to cancel timeout fragment c9ba48a1-1ec6-11ed-87a6-1866dae69d78
I0818 15:43:55.562974 26166 plan_fragment_executor.cpp:367] cancel(): fragment_instance_id=c9ba48a1-1ec6-11ed-87a6-1866dae69d82
I0818 15:43:55.562979 26166 fragment_mgr.cpp:528] FragmentMgr cancel worker going to cancel timeout fragment c9ba48a1-1ec6-11ed-87a6-1866dae69d82
I0818 15:43:55.562984 26166 plan_fragment_executor.cpp:367] cancel(): fragment_instance_id=c9ba48a1-1ec6-11ed-87a6-1866dae69d8c
I0818 15:43:55.562989 26166 fragment_mgr.cpp:528] FragmentMgr cancel worker going to cancel timeout fragment c9ba48a1-1ec6-11ed-87a6-1866dae69d8c
I0818 15:43:55.562994 26166 plan_fragment_executor.cpp:367] cancel(): fragment_instance_id=c9ba48a1-1ec6-11ed-87a6-1866dae69da5
I0818 15:43:55.563000 26166 fragment_mgr.cpp:528] FragmentMgr cancel worker going to cancel timeout fragment c9ba48a1-1ec6-11ed-87a6-1866dae69da5
I0818 15:43:55.563011 26166 plan_fragment_executor.cpp:367] cancel(): fragment_instance_id=c9ba48a1-1ec6-11ed-87a6-1866dae69dd6
I0818 15:43:55.563017 26166 fragment_mgr.cpp:528] FragmentMgr cancel worker going to cancel timeout fragment c9ba48a1-1ec6-11ed-87a6-1866dae69dd6
I0818 15:43:55.563021 26166 plan_fragment_executor.cpp:367] cancel(): fragment_instance_id=c9ba48a1-1ec6-11ed-87a6-1866dae69cb0
I0818 15:43:55.563028 26166 fragment_mgr.cpp:528] FragmentMgr cancel worker going to cancel timeout fragment c9ba48a1-1ec6-11ed-87a6-1866dae69cb0
I0818 15:43:55.687985 26356 tablet_manager.cpp:598] Found the best tablet to compact. compaction_type=cumulative tablet_id=283342 highest_score=3
I0818 15:43:55.735074 26354 tablet_manager.cpp:598] Found the best tablet to compact. compaction_type=cumulative tablet_id=283986 highest_score=3
be.warn log:
W0818 15:43:27.236636 26319 broker_mgr.cpp:76] Create broker client failed. broker=TNetworkAddress(hostname=172.16.1.34, port=8000), status=Couldn’t open transport for 172.16.1.34:8000 (socket open() error: Connection refused)
W0818 15:43:27.237169 26319 broker_mgr.cpp:76] Create broker client failed. broker=TNetworkAddress(hostname=172.16.1.32, port=8000), status=Couldn’t open transport for 172.16.1.32:8000 (socket open() error: Connection refused)
W0818 15:43:27.238998 26319 broker_mgr.cpp:76] Create broker client failed. broker=TNetworkAddress(hostname=172.16.1.33, port=8000), status=Couldn’t open transport for 172.16.1.33:8000 (socket open() error: Connection refused)
W0818 15:43:32.240856 26319 broker_mgr.cpp:76] Create broker client failed. broker=TNetworkAddress(hostname=172.16.1.34, port=8000), status=Couldn’t open transport for 172.16.1.34:8000 (socket open() error: Connection refused)
W0818 15:43:32.241294 26319 broker_mgr.cpp:76] Create broker client failed. broker=TNetworkAddress(hostname=172.16.1.32, port=8000), status=Couldn’t open transport for 172.16.1.32:8000 (socket open() error: Connection refused)
W0818 15:43:32.243046 26319 broker_mgr.cpp:76] Create broker client failed. broker=TNetworkAddress(hostname=172.16.1.33, port=8000), status=Couldn’t open transport for 172.16.1.33:8000 (socket open() error: Connection refused)
W0818 15:43:37.244937 26319 broker_mgr.cpp:76] Create broker client failed. broker=TNetworkAddress(hostname=172.16.1.34, port=8000), status=Couldn’t open transport for 172.16.1.34:8000 (socket open() error: Connection refused)
W0818 15:43:37.245429 26319 broker_mgr.cpp:76] Create broker client failed. broker=TNetworkAddress(hostname=172.16.1.32, port=8000), status=Couldn’t open transport for 172.16.1.32:8000 (socket open() error: Connection refused)
W0818 15:43:37.247303 26319 broker_mgr.cpp:76] Create broker client failed. broker=TNetworkAddress(hostname=172.16.1.33, port=8000), status=Couldn’t open transport for 172.16.1.33:8000 (socket open() error: Connection refused)
W0818 15:43:42.249157 26319 broker_mgr.cpp:76] Create broker client failed. broker=TNetworkAddress(hostname=172.16.1.34, port=8000), status=Couldn’t open transport for 172.16.1.34:8000 (socket open() error: Connection refused)
W0818 15:43:42.249636 26319 broker_mgr.cpp:76] Create broker client failed. broker=TNetworkAddress(hostname=172.16.1.32, port=8000), status=Couldn’t open transport for 172.16.1.32:8000 (socket open() error: Connection refused)
W0818 15:43:42.251498 26319 broker_mgr.cpp:76] Create broker client failed. broker=TNetworkAddress(hostname=172.16.1.33, port=8000), status=Couldn’t open transport for 172.16.1.33:8000 (socket open() error: Connection refused)
W0818 15:43:44.972816 8010 fragment_mgr.cpp:311] Retrying ReportExecStatus: No more data to read.
W0818 15:43:45.996809 8046 fragment_mgr.cpp:311] Retrying ReportExecStatus: No more data to read.
W0818 15:43:47.253315 26319 broker_mgr.cpp:76] Create broker client failed. broker=TNetworkAddress(hostname=172.16.1.34, port=8000), status=Couldn’t open transport for 172.16.1.34:8000 (socket open() error: Connection refused)
W0818 15:43:47.253847 26319 broker_mgr.cpp:76] Create broker client failed. broker=TNetworkAddress(hostname=172.16.1.32, port=8000), status=Couldn’t open transport for 172.16.1.32:8000 (socket open() error: Connection refused)
W0818 15:43:47.255585 26319 broker_mgr.cpp:76] Create broker client failed. broker=TNetworkAddress(hostname=172.16.1.33, port=8000), status=Couldn’t open transport for 172.16.1.33:8000 (socket open() error: Connection refused)
W0818 15:43:49.452811 8049 fragment_mgr.cpp:311] Retrying ReportExecStatus: No more data to read.
W0818 15:43:52.257350 26319 broker_mgr.cpp:76] Create broker client failed. broker=TNetworkAddress(hostname=172.16.1.34, port=8000), status=Couldn’t open transport for 172.16.1.34:8000 (socket open() error: Connection refused)
W0818 15:43:52.257884 26319 broker_mgr.cpp:76] Create broker client failed. broker=TNetworkAddress(hostname=172.16.1.32, port=8000), status=Couldn’t open transport for 172.16.1.32:8000 (socket open() error: Connection refused)
W0818 15:43:52.259779 26319 broker_mgr.cpp:76] Create broker client failed. broker=TNetworkAddress(hostname=172.16.1.33, port=8000), status=Couldn’t open transport for 172.16.1.33:8000 (socket open() error: Connection refused)
W0818 15:43:53.036818 8052 fragment_mgr.cpp:311] Retrying ReportExecStatus: No more data to read.
- Slow query:
- Profile information
- Parallelism: show variables like '%parallel_fragment_exec_instance_num%';
parallel_fragment_exec_instance_num 20
- Whether CBO is enabled: show variables like '%cbo%';
cbo_cte_reuse false
cbo_enable_dp_join_reorder true
cbo_enable_greedy_join_reorder true
cbo_enable_low_cardinality_optimize true
cbo_enable_replicated_join true
cbo_max_reorder_node_use_dp 10
cbo_max_reorder_node_use_exhaustive 4
cbo_use_correlated_join_estimate true
- Screenshot of BE node CPU and memory usage
Usage is not high.
