BE nodes crash frequently, one after another

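To help us locate your problem faster, please provide the following information. Thank you.
【Details】Detailed problem description
While the cluster was on version 3.1.15, the 3 BE nodes crashed frequently, taking turns. A reply in another thread (https://forum.mirrorship.cn/t/topic/19398) said this was a version bug and recommended applying a patch or upgrading to 3.3.20. We then upgraded to 3.5.12, but the frequent crashes persist. Below is the recent be.out log from one of the BEs: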
start time: Thu Mar 5 12:24:01 CST 2026, server uptime: 12:24:01 up 105 days, 14:29, 1 user, load average: 2.67, 7.15, 10.11
Run with JEMALLOC_CONF: 'percpu_arena:percpu,oversize_threshold:0,muzzy_decay_ms:5000,dirty_decay_ms:5000,metadata_thp:auto,background_thread:true,prof:true,prof_active:false'
3.5.12 RELEASE (build a2e4b58 distro centos arch x86_64)
query_id:00000000-0000-0000-0000-000000000000, fragment_instance:00000000-0000-0000-0000-000000000000, plan_node_id:-1
*** Aborted at 1772694575 (unix time) try "date -d @1772694575" if you are using GNU date ***
PC: @ 0x84bf7ac starrocks::RowsetWriter::build()
*** SIGSEGV (@0x100) received by PID 25255 (TID 0x14817affb700) LWP(25498) from PID 256; stack trace: ***
@ 0x1481948e720b __pthread_once_slow
@ 0xbe2b194 google::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*)
@ 0x1481948f0630 (/usr/lib64/libpthread-2.17.so+0xf62f)
@ 0x84bf7ac starrocks::RowsetWriter::build()
@ 0x84c8ea4 starrocks::HorizontalRowsetWriter::build()
@ 0x7b489f6 starrocks::DeltaWriter::commit()
@ 0x7767cc3 starrocks::SegmentFlushTask::run()
@ 0x45d2a07 starrocks::ThreadPool::dispatch_thread()
@ 0x45c8ef0 starrocks::thread::supervise_thread(void*)
@ 0x1481948e8ea5 start_thread
@ 0x1481937a5b0d __clone
[1772694575.459][thread: 22546346915584] je_mallctl execute purge success
[1772694575.459][thread: 22546346915584] je_mallctl execute dontdump success
start time: Thu Mar 5 15:15:01 CST 2026, server uptime: 15:15:01 up 105 days, 17:20, 1 user, load average: 791.07, 707.17, 332.49
Run with JEMALLOC_CONF: 'percpu_arena:percpu,oversize_threshold:0,muzzy_decay_ms:5000,dirty_decay_ms:5000,metadata_thp:auto,background_thread:true,prof:true,prof_active:false'
start time: Thu Mar 5 19:16:02 CST 2026, server uptime: 19:16:02 up 3 min, 2 users, load average: 0.03, 0.05, 0.01
Run with JEMALLOC_CONF: 'percpu_arena:percpu,oversize_threshold:0,muzzy_decay_ms:5000,dirty_decay_ms:5000,metadata_thp:auto,background_thread:true,prof:true,prof_active:false'
3.5.12 RELEASE (build a2e4b58 distro centos arch x86_64)
query_id:00000000-0000-0000-0000-000000000000, fragment_instance:00000000-0000-0000-0000-000000000000, plan_node_id:-1
*** Aborted at 1772757305 (unix time) try "date -d @1772757305" if you are using GNU date ***
PC: @ 0x84bf7ac starrocks::RowsetWriter::build()
*** SIGSEGV (@0x100) received by PID 3433 (TID 0x15325f5ee700) LWP(3680) from PID 256; stack trace: ***
@ 0x15327da3e20b __pthread_once_slow
@ 0xbe2b194 google::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*)
@ 0x15327da47630 (/usr/lib64/libpthread-2.17.so+0xf62f)
@ 0x84bf7ac starrocks::RowsetWriter::build()
@ 0x84c8ea4 starrocks::HorizontalRowsetWriter::build()
@ 0x7b489f6 starrocks::DeltaWriter::commit()
@ 0x7767cc3 starrocks::SegmentFlushTask::run()
@ 0x45d2a07 starrocks::ThreadPool::dispatch_thread()
@ 0x45c8ef0 starrocks::thread::supervise_thread(void*)
@ 0x15327da3fea5 start_thread
@ 0x15327b61cb0d __clone
start time: Fri Mar 6 08:36:02 CST 2026, server uptime: 08:36:02 up 13:19, 0 users, load average: 1.68, 5.45, 6.46
Run with JEMALLOC_CONF: 'percpu_arena:percpu,oversize_threshold:0,muzzy_decay_ms:5000,dirty_decay_ms:5000,metadata_thp:auto,background_thread:true,prof:true,prof_active:false'
3.5.12 RELEASE (build a2e4b58 distro centos arch x86_64)
query_id:00000000-0000-0000-0000-000000000000, fragment_instance:00000000-0000-0000-0000-000000000000, plan_node_id:-1
*** Aborted at 1772838004 (unix time) try "date -d @1772838004" if you are using GNU date ***
PC: @ 0x79d8a67 starrocks::RowsetMeta::RowsetMeta(std::unique_ptr<starrocks::RowsetMetaPB, std::default_delete<starrocks::RowsetMetaPB> >&)
*** SIGSEGV (@0xc0) received by PID 571649 (TID 0x151d72dff700) LWP(571877) from PID 192; stack trace: ***
@ 0x151d8ef1920b __pthread_once_slow
@ 0xbe2b194 google::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*)
@ 0x151d8ef22630 (/usr/lib64/libpthread-2.17.so+0xf62f)
@ 0x79d8a67 starrocks::RowsetMeta::RowsetMeta(std::unique_ptr<starrocks::RowsetMetaPB, std::default_delete<starrocks::RowsetMetaPB> >&)
@ 0x84bf984 starrocks::RowsetWriter::build()
@ 0x84c8ea4 starrocks::HorizontalRowsetWriter::build()
@ 0x7b489f6 starrocks::DeltaWriter::commit()
@ 0x7767cc3 starrocks::SegmentFlushTask::run()
@ 0x45d2a07 starrocks::ThreadPool::dispatch_thread()
@ 0x45c8ef0 starrocks::thread::supervise_thread(void*)
@ 0x151d8ef1aea5 start_thread
@ 0x151d8ddd7b0d __clone
start time: Sat Mar 7 07:01:01 CST 2026, server uptime: 07:01:01 up 1 day, 11:44, 0 users, load average: 2.21, 4.46, 4.59
Run with JEMALLOC_CONF: 'percpu_arena:percpu,oversize_threshold:0,muzzy_decay_ms:5000,dirty_decay_ms:5000,metadata_thp:auto,background_thread:true,prof:true,prof_active:false'
3.5.12 RELEASE (build a2e4b58 distro centos arch x86_64)
query_id:00000000-0000-0000-0000-000000000000, fragment_instance:00000000-0000-0000-0000-000000000000, plan_node_id:-1
*** Aborted at 1772965145 (unix time) try "date -d @1772965145" if you are using GNU date ***
PC: @ 0x79d8a67 starrocks::RowsetMeta::RowsetMeta(std::unique_ptr<starrocks::RowsetMetaPB, std::default_delete<starrocks::RowsetMetaPB> >&)
*** SIGSEGV (@0xc0) received by PID 1645795 (TID 0x154f9bffe700) LWP(1645964) from PID 192; stack trace: ***
@ 0x154fb524020b __pthread_once_slow
@ 0xbe2b194 google::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*)
@ 0x154fb5249630 (/usr/lib64/libpthread-2.17.so+0xf62f)
@ 0x79d8a67 starrocks::RowsetMeta::RowsetMeta(std::unique_ptr<starrocks::RowsetMetaPB, std::default_delete<starrocks::RowsetMetaPB> >&)
@ 0x84bf984 starrocks::RowsetWriter::build()
@ 0x84c8ea4 starrocks::HorizontalRowsetWriter::build()
@ 0x7b489f6 starrocks::DeltaWriter::commit()
@ 0x7767cc3 starrocks::SegmentFlushTask::run()
@ 0x45d2a07 starrocks::ThreadPool::dispatch_thread()
@ 0x45c8ef0 starrocks::thread::supervise_thread(void*)
@ 0x154fb5241ea5 start_thread
@ 0x154fb40feb0d __clone
start time: Sun Mar 8 18:20:01 CST 2026, server uptime: 18:20:01 up 2 days, 23:03, 0 users, load average: 2.37, 5.92, 6.15
Run with JEMALLOC_CONF: 'percpu_arena:percpu,oversize_threshold:0,muzzy_decay_ms:5000,dirty_decay_ms:5000,metadata_thp:auto,background_thread:true,prof:true,prof_active:false'
start time: Mon Mar 9 19:02:18 CST 2026, server uptime: 19:02:18 up 3 days, 23:45, 1 user, load average: 6.74, 10.83, 12.60
Run with JEMALLOC_CONF: 'percpu_arena:percpu,oversize_threshold:0,muzzy_decay_ms:5000,dirty_decay_ms:5000,metadata_thp:auto,background_thread:true,prof:true,prof_active:false'
3.5.12 RELEASE (build a2e4b58 distro centos arch x86_64)
query_id:00000000-0000-0000-0000-000000000000, fragment_instance:00000000-0000-0000-0000-000000000000, plan_node_id:-1
*** Aborted at 1773094564 (unix time) try "date -d @1773094564" if you are using GNU date ***
PC: @ 0x84bf7ac starrocks::RowsetWriter::build()
*** SIGSEGV (@0x100) received by PID 196438 (TID 0x147c67bfe700) LWP(196613) from PID 256; stack trace: ***
@ 0x147c8133920b __pthread_once_slow
@ 0xbe2b194 google::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*)
@ 0x147c81342630 (/usr/lib64/libpthread-2.17.so+0xf62f)
@ 0x84bf7ac starrocks::RowsetWriter::build()
@ 0x84c8ea4 starrocks::HorizontalRowsetWriter::build()
@ 0x7b489f6 starrocks::DeltaWriter::commit()
@ 0x7767cc3 starrocks::SegmentFlushTask::run()
@ 0x45d2a07 starrocks::ThreadPool::dispatch_thread()
@ 0x45c8ef0 starrocks::thread::supervise_thread(void*)
@ 0x147c8133aea5 start_thread
@ 0x147c7ef17b0d __clone
start time: Tue Mar 10 06:17:01 CST 2026, server uptime: 06:17:01 up 4 days, 11:00, 0 users, load average: 1.21, 5.66, 7.53
Run with JEMALLOC_CONF: 'percpu_arena:percpu,oversize_threshold:0,muzzy_decay_ms:5000,dirty_decay_ms:5000,metadata_thp:auto,background_thread:true,prof:true,prof_active:false'
3.5.12 RELEASE (build a2e4b58 distro centos arch x86_64)
query_id:00000000-0000-0000-0000-000000000000, fragment_instance:00000000-0000-0000-0000-000000000000, plan_node_id:-1
*** Aborted at 1773137585 (unix time) try "date -d @1773137585" if you are using GNU date ***
PC: @ 0x84bf7ac starrocks::RowsetWriter::build()
*** SIGSEGV (@0x100) received by PID 660754 (TID 0x14db2cffa700) LWP(660923) from PID 256; stack trace: ***
@ 0x14db4672c20b __pthread_once_slow
@ 0xbe2b194 google::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*)
@ 0x14db46735630 (/usr/lib64/libpthread-2.17.so+0xf62f)
@ 0x84bf7ac starrocks::RowsetWriter::build()
@ 0x84c8ea4 starrocks::HorizontalRowsetWriter::build()
@ 0x7b489f6 starrocks::DeltaWriter::commit()
@ 0x7767cc3 starrocks::SegmentFlushTask::run()
@ 0x45d2a07 starrocks::ThreadPool::dispatch_thread()
@ 0x45c8ef0 starrocks::thread::supervise_thread(void*)
@ 0x14db4672dea5 start_thread
@ 0x14db455eab0d __clone
start time: Tue Mar 10 18:14:01 CST 2026, server uptime: 18:14:01 up 4 days, 22:57, 0 users, load average: 1.24, 5.73, 6.81
Run with JEMALLOC_CONF: 'percpu_arena:percpu,oversize_threshold:0,muzzy_decay_ms:5000,dirty_decay_ms:5000,metadata_thp:auto,background_thread:true,prof:true,prof_active:false'
3.5.12 RELEASE (build a2e4b58 distro centos arch x86_64)
query_id:00000000-0000-0000-0000-000000000000, fragment_instance:00000000-0000-0000-0000-000000000000, plan_node_id:-1
*** Aborted at 1773249394 (unix time) try "date -d @1773249394" if you are using GNU date ***
PC: @ 0x84bf7ac starrocks::RowsetWriter::build()
*** SIGSEGV (@0x100) received by PID 1310200 (TID 0x1500009fc700) LWP(1310431) from PID 256; stack trace: ***
@ 0x15001c47520b __pthread_once_slow
@ 0xbe2b194 google::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*)
@ 0x15001c47e630 (/usr/lib64/libpthread-2.17.so+0xf62f)
@ 0x84bf7ac starrocks::RowsetWriter::build()
@ 0x84c8ea4 starrocks::HorizontalRowsetWriter::build()
@ 0x7b489f6 starrocks::DeltaWriter::commit()
@ 0x7767cc3 starrocks::SegmentFlushTask::run()
@ 0x45d2a07 starrocks::ThreadPool::dispatch_thread()
@ 0x45c8ef0 starrocks::thread::supervise_thread(void*)
@ 0x15001c476ea5 start_thread
@ 0x15001b333b0d __clone
start time: Thu Mar 12 01:17:02 CST 2026, server uptime: 01:17:02 up 6 days, 6:00, 0 users, load average: 3.02, 8.13, 10.86
Run with JEMALLOC_CONF: 'percpu_arena:percpu,oversize_threshold:0,muzzy_decay_ms:5000,dirty_decay_ms:5000,metadata_thp:auto,background_thread:true,prof:true,prof_active:false'
3.5.12 RELEASE (build a2e4b58 distro centos arch x86_64)
query_id:00000000-0000-0000-0000-000000000000, fragment_instance:00000000-0000-0000-0000-000000000000, plan_node_id:-1
*** Aborted at 1773283205 (unix time) try "date -d @1773283205" if you are using GNU date ***
PC: @ 0x79d8a67 starrocks::RowsetMeta::RowsetMeta(std::unique_ptr<starrocks::RowsetMetaPB, std::default_delete<starrocks::RowsetMetaPB> >&)
*** SIGSEGV (@0xc0) received by PID 2751228 (TID 0x14e7dcbfd700) LWP(2751399) from PID 192; stack trace: ***
@ 0x14e7f73b420b __pthread_once_slow
@ 0xbe2b194 google::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*)
@ 0x14e7f73bd630 (/usr/lib64/libpthread-2.17.so+0xf62f)
@ 0x79d8a67 starrocks::RowsetMeta::RowsetMeta(std::unique_ptr<starrocks::RowsetMetaPB, std::default_delete<starrocks::RowsetMetaPB> >&)
@ 0x84bf984 starrocks::RowsetWriter::build()
@ 0x84c8ea4 starrocks::HorizontalRowsetWriter::build()
@ 0x7b489f6 starrocks::DeltaWriter::commit()
@ 0x7767cc3 starrocks::SegmentFlushTask::run()
@ 0x45d2a07 starrocks::ThreadPool::dispatch_thread()
@ 0x45c8ef0 starrocks::thread::supervise_thread(void*)
@ 0x14e7f73b5ea5 start_thread
@ 0x14e7f6272b0d __clone
start time: Thu Mar 12 10:41:01 CST 2026, server uptime: 10:41:01 up 6 days, 15:24, 0 users, load average: 2.81, 4.35, 6.25
Run with JEMALLOC_CONF: 'percpu_arena:percpu,oversize_threshold:0,muzzy_decay_ms:5000,dirty_decay_ms:5000,metadata_thp:auto,background_thread:true,prof:true,prof_active:false'
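【Background】Upgraded from 3.1.15 to 3.5.12
【Business impact】
【Shared-data (storage-compute separation)】No
【StarRocks version】3.5.12
【Cluster size】3 FE + 3 BE, each deployed on its own machine
【Machine info】FE: 40 cores, 256 GB memory, gigabit NIC. BE: 48 cores; one node with 256 GB and two nodes with 512 GB of memory; gigabit NIC
【Contact】Community group 24-Hanson, thanks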
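
The crash records in the be.out excerpt above can be tallied with a short script. What follows is a minimal Python sketch, not part of StarRocks: the helper name `summarize_crashes` and the `sample` list are hypothetical, and the sample lines are copied from three of the crash records above (one argument list is shortened). It assumes the host clock is CST (UTC+8), as the `start time` lines suggest, converts each `Aborted at` unix timestamp to that zone, and groups the crashes by the faulting `PC` address.

```python
import re
from collections import Counter
from datetime import datetime, timedelta, timezone

# Assumption: the host runs in CST (UTC+8), as the "start time" lines suggest.
CST = timezone(timedelta(hours=8))

ABORT_RE = re.compile(r"\*\*\* Aborted at (\d+) \(unix time\)")
PC_RE = re.compile(r"^PC: @\s+(0x[0-9a-f]+)\s+(.+)$")


def summarize_crashes(lines):
    """Pair each 'Aborted at' timestamp with the 'PC:' line that follows it."""
    crashes = []
    pending_time = None
    for raw in lines:
        line = raw.strip()
        abort = ABORT_RE.search(line)
        if abort:
            pending_time = datetime.fromtimestamp(int(abort.group(1)), tz=CST)
            continue
        pc = PC_RE.match(line)
        if pc and pending_time is not None:
            crashes.append((pending_time, pc.group(1), pc.group(2)))
            pending_time = None
    return crashes


# Sample lines taken from the be.out excerpt; the RowsetMeta argument list is shortened.
sample = [
    '*** Aborted at 1772694575 (unix time) try "date -d @1772694575" if you are using GNU date ***',
    "PC: @ 0x84bf7ac starrocks::RowsetWriter::build()",
    '*** Aborted at 1772838004 (unix time) try "date -d @1772838004" if you are using GNU date ***',
    "PC: @ 0x79d8a67 starrocks::RowsetMeta::RowsetMeta(...)",
    '*** Aborted at 1773249394 (unix time) try "date -d @1773249394" if you are using GNU date ***',
    "PC: @ 0x84bf7ac starrocks::RowsetWriter::build()",
]

crashes = summarize_crashes(sample)
by_pc = Counter(address for _, address, _ in crashes)

for when, address, symbol in crashes:
    print(when.strftime("%Y-%m-%d %H:%M:%S %z"), address, symbol)
print(dict(by_pc))
```

Feeding the whole be.out file to `summarize_crashes` instead of `sample` would show how the crashes split between the two faulting addresses seen in this excerpt, `0x84bf7ac` and `0x79d8a67`.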

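Are there any abnormal errors in the system messages log around the time of the crashes? Are there any errors in be.INFO within the five minutes before each crash?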

Today's messages log on the BE node contains the following errors:
Mar 12 01:16:34 xxx-151 kernel: segment_flush[1310431]: segfault at 100 ip 00000000084bf7ac sp 00001500009ec360 error 6 cpu 7 in starrocks_be[400000+142ca000]
Mar 12 01:16:34 lf-starrocks-09-bigdata-x-x-x-151 kernel: Code: ff 90 90 00 00 00 48 83 bd 70 ff ff ff 00 0f 85 9b 07 00 00 4c 89 ef e8 72 07 e3 fb 49 8b 87 b0 02 00 00 4d 8b a7 60 01 00 00 <49> 89 84 24 00 01 00 00 49 8b 87 e0 02 00 00 49 89 84 24 40 01 00

Mar 12 10:40:05 xxx-151 kernel: segment_flush[2751399]: segfault at c0 ip 00000000079d8a67 sp 000014e7dcbed2d0 error 4 cpu 10 in starrocks_be[400000+142ca000]
Mar 12 10:40:05 lf-starrocks-09-bigdata-x-x-x-151 kernel: Code: 00 00 00 c5 fa 7f 47 18 c5 fa 7f 47 38 48 c7 07 00 00 00 00 48 c7 47 30 00 00 00 00 66 89 47 48 48 8b 3e 48 c7 06 00 00 00 00 <4c> 8b a7 c0 00 00 00 48 89 7b 08 4d 85 e4 0f 8e f5 03 00 00 48 b8

Of the other two BE nodes, one shows the same error and the other shows no error, although that node has also restarted.
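
The kernel `segfault` lines above carry the fault address, the instruction pointer, and the x86 page-fault error code. The Python sketch below decodes them; `decode_segfault` is a hypothetical helper, the regex assumes the message layout shown above, and the two sample strings are those messages with their wrapped halves joined onto one line. The kernel prints the error code in hexadecimal. In its low three bits, bit 0 set means a protection violation on a present page (clear means the page was not present), bit 1 set means a write access, and bit 2 set means the fault happened in user mode.

```python
import re

SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<tid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
)


def decode_segfault(line):
    """Extract thread, addresses and access type from a kernel segfault line."""
    match = SEGFAULT_RE.search(line)
    if match is None:
        return None
    err = int(match.group("err"), 16)  # the kernel prints this field in hex
    return {
        "thread": match.group("comm"),
        "tid": int(match.group("tid")),
        "fault_addr": int(match.group("addr"), 16),
        "ip": int(match.group("ip"), 16),
        "page_present": bool(err & 0x1),
        "access": "write" if err & 0x2 else "read",
        "user_mode": bool(err & 0x4),
    }


# The two messages from the thread, each joined onto a single line.
first = decode_segfault(
    "Mar 12 01:16:34 xxx-151 kernel: segment_flush[1310431]: segfault at 100 "
    "ip 00000000084bf7ac sp 00001500009ec360 error 6 cpu 7 in starrocks_be[400000+142ca000]"
)
second = decode_segfault(
    "Mar 12 10:40:05 xxx-151 kernel: segment_flush[2751399]: segfault at c0 "
    "ip 00000000079d8a67 sp 000014e7dcbed2d0 error 4 cpu 10 in starrocks_be[400000+142ca000]"
)

for info in (first, second):
    print(info["thread"], hex(info["fault_addr"]), hex(info["ip"]), info["access"])
```

For these two lines the decoder reports a user-mode write to 0x100 (error 6) and a user-mode read from 0xc0 (error 4), both on pages that were not present, and the instruction pointers match the `PC` values in be.out. Fault addresses this small often come from a member access through a null pointer, but that is an inference from the address alone and not something the log confirms.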

The be.INFO log before and after the 01:16 restart is as follows:
W20260312 01:08:04.918936 23086996846336 tablet_sink_sender.cpp:299] close channel failed. channel_name=NodeChannel[3397791], load_info=load_id=d0b85261-1d6c-11f1-9058-a0369fd7e4a8, txn_id: 105351285, parallel=1, compress_type=2, error_msg=Cancelled by pipeline engine, reason: Aborted: load channel: d0b85261-1d6c-11f1-9058-a0369fd7e4a8 was aborted at 1773248867, reason: Cancelled: Cancelled by pipeline engine, reason: Duplicate RPC invocation: packet_seq 62 in PTabletWriterAddChunkRequest already process: BE:3397791: BE:3709019
W20260312 01:08:04.919096 23087038523136 tablet_sink_sender.cpp:299] close channel failed. channel_name=NodeChannel[3397792], load_info=load_id=d0b85261-1d6c-11f1-9058-a0369fd7e4a8, txn_id: 105351285, parallel=1, compress_type=2, error_msg=Cancelled by pipeline engine, reason: Aborted: load channel: d0b85261-1d6c-11f1-9058-a0369fd7e4a8 was aborted at 1773248867, reason: Cancelled: Cancelled by pipeline engine, reason: Duplicate RPC invocation: packet_seq 62 in PTabletWriterAddChunkRequest already process: BE:3397791: BE:3709019
W20260312 01:08:04.919064 23087032563456 tablet_sink_sender.cpp:299] close channel failed. channel_name=NodeChannel[3397791], load_info=load_id=d0b85261-1d6c-11f1-9058-a0369fd7e4a8, txn_id: 105351285, parallel=1, compress_type=2, error_msg=Cancelled by pipeline engine, reason: Aborted: load channel: d0b85261-1d6c-11f1-9058-a0369fd7e4a8 was aborted at 1773248867, reason: Cancelled: Cancelled by pipeline engine, reason: Duplicate RPC invocation: packet_seq 62 in PTabletWriterAddChunkRequest already process: BE:3397791: BE:3709019
W20260312 01:08:04.919124 23087038523136 tablet_sink_sender.cpp:299] close channel failed. channel_name=NodeChannel[3397791], load_info=load_id=d0b85261-1d6c-11f1-9058-a0369fd7e4a8, txn_id: 105351285, parallel=1, compress_type=2, error_msg=Cancelled by pipeline engine, reason: Aborted: load channel: d0b85261-1d6c-11f1-9058-a0369fd7e4a8 was aborted at 1773248867, reason: Cancelled: Cancelled by pipeline engine, reason: Duplicate RPC invocation: packet_seq 62 in PTabletWriterAddChunkRequest already process: BE:3397791: BE:3709019
W20260312 01:08:04.919039 23087057434368 tablet_sink_sender.cpp:299] close channel failed. channel_name=NodeChannel[3397791], load_info=load_id=d0b85261-1d6c-11f1-9058-a0369fd7e4a8, txn_id: 105351285, parallel=1, compress_type=2, error_msg=Cancelled by pipeline engine, reason: Aborted: load channel: d0b85261-1d6c-11f1-9058-a0369fd7e4a8 was aborted at 1773248867, reason: Cancelled: Cancelled by pipeline engine, reason: Duplicate RPC invocation: packet_seq 62 in PTabletWriterAddChunkRequest already process: BE:3397791: BE:3709019
W20260312 01:08:04.974992 23085603661568 baidu_rpc_protocol.cpp:280] Fail to write into Socket{id=4011499473470 fd=1361 addr=x.x.x.217:64562:8060} (0x14e1d3241b80): Unknown error 1014 [1014]
W20260312 01:08:04.975014 23085406205696 baidu_rpc_protocol.cpp:280] Fail to write into Socket{id=927712971987 fd=1465 addr=x.x.x.217:63412:8060} (0x14ad4af24980): Unknown error 1014 [1014]
W20260312 01:08:04.975019 23085341067008 baidu_rpc_protocol.cpp:280] Fail to write into Socket{id=2894807979689 fd=1727 addr=x.x.x.217:64432:8060} (0x14de4af86d00): Unknown error 1014 [1014]
W20260312 01:08:04.975036 23085364180736 baidu_rpc_protocol.cpp:280] Fail to write into Socket{id=3496103386953 fd=1558 addr=x.x.x.218:27670:8060} (0x14fdd8e36200): Unknown error 1014 [1014]
W20260312 01:08:04.975064 23085387294464 baidu_rpc_protocol.cpp:280] Fail to write into Socket{id=2688649534549 fd=1568 addr=x.x.x.217:64614:8060} (0x14fb41a7e500): Unknown error 1014 [1014]
W20260312 01:08:04.975067 23085593155328 baidu_rpc_protocol.cpp:280] Fail to write into Socket{id=953482775202 fd=2625 addr=x.x.x.217:64084:8060} (0x14d5d6cec2c0): Unknown error 1014 [1014]
W20260312 01:08:04.975084 23085404104448 baidu_rpc_protocol.cpp:280] Fail to write into Socket{id=584115572746 fd=1520 addr=x.x.x.218:27668:8060} (0x14d9eb257980): Unknown error 1014 [1014]
W20260312 01:08:04.975099 23085389395712 baidu_rpc_protocol.cpp:280] Fail to write into Socket{id=4226247846120 fd=2384 addr=x.x.x.151:63634:8060} (0x14e0c8ea7140): Unknown error 1014 [1014]
W20260312 01:08:04.975130 23085366281984 baidu_rpc_protocol.cpp:280] Fail to write into Socket{id=1039382117166 fd=1780 addr=x.x.x.151:63846:8060} (0x14ecd5affb00): Unknown error 1014 [1014]
W20260312 01:08:04.975148 23085588952832 baidu_rpc_protocol.cpp:280] Fail to write into Socket{id=1924145371708 fd=1565 addr=x.x.x.218:27684:8060} (0x14d84ee34e00): Unknown error 1014 [1014]
W20260312 01:08:04.975179 23085595256576 baidu_rpc_protocol.cpp:280] Fail to write into Socket{id=2259152806047 fd=1019 addr=x.x.x.218:27656:8060} (0x14fbbd378680): Unknown error 1014 [1014]
W20260312 01:08:04.975202 23085370484480 baidu_rpc_protocol.cpp:280] Fail to write into Socket{id=4148938428321 fd=1567 addr=x.x.x.217:64612:8060} (0x14e46c883440): Unknown error 1014 [1014]
W20260312 01:08:04.975215 23085338965760 baidu_rpc_protocol.cpp:280] Fail to write into Socket{id=1958505105617 fd=1651 addr=x.x.x.151:63802:8060} (0x14e1fd883ac0): Unknown error 1014 [1014]
W20260312 01:08:04.975237 23085353674496 baidu_rpc_protocol.cpp:280] Fail to write into Socket{id=2276332692721 fd=1628 addr=x.x.x.151:63796:8060} (0x14ecaf360a80): Unknown error 1014 [1014]
W20260312 01:08:04.975265 23085393598208 baidu_rpc_protocol.cpp:280] Fail to write into Socket{id=953482776466 fd=2303 addr=x.x.x.151:63282:8060} (0x14c9275ebec0): Unknown error 1014 [1014]
W20260312 01:08:04.975290 23085351573248 baidu_rpc_protocol.cpp:280] Fail to write into Socket{id=3418793990201 fd=1519 addr=x.x.x.151:63968:8060} (0x14f7e88d8b40): Unknown error 1014 [1014]
W20260312 01:08:05.344913 23085597357824 baidu_rpc_protocol.cpp:280] Fail to write into Socket{id=68719494710 fd=2523 addr=x.x.x.151:63554:8060} (0x14d9e4e619c0): Unknown error 1014 [1014]
I20260312 01:17:19.174515 22986065155840 data_dir.cpp:316] load tablet from meta finished, loaded tablet: 120008, error tablet: 0, path: /data/starrocks/storage duration: 13227ms
I20260312 01:17:19.764998 22986065155840 data_dir.cpp:432] load rowset from meta finished, data dir: /data/starrocks/storage error/total: 28680/30808 duration: 503ms
W20260312 01:17:56.920247 22982380508928 pipeline_driver_executor.cpp:189] [Driver] Process error, query_id=36590ea6-1d6e-11f1-9058-a0369fd7e4a8, instance_id=36590ea6-1d6e-11f1-9058-a0369fd7e4aa, status=Duplicate RPC invocation: packet_seq 41 in PTabletWriterAddChunkRequest already process: BE:3709019
W20260312 01:17:56.933127 22982468761344 pipeline_driver.cpp:601] cancel pipeline driver error [driver=query_id=36590ea6-1d6e-11f1-9058-a0369fd7e4a8 fragment_id=36590ea6-1d6e-11f1-9058-a0369fd7e4aa driver=driver_0_19 addr=0x14e5f2b4d810, status=OUTPUT_FULL, operator-chain: [olap_scan_0_0x14e7da8e4110(X) { full:false iostasks:0 has_active:false num_chunks:0 morsel:fixed_morsel_queue empty:false} -> chunk_accumulate_0_0x14e5f2b4b410(X) -> project_1_0x14e5f2b4c010(O) -> olap_table_sink_-1_0x14e5f2b4cc10(X)]]: Cancelled: Cancelled by pipeline engine, reason: Duplicate RPC invocation: packet_seq 41 in PTabletWriterAddChunkRequest already process: BE:3709019
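
To see which components were logging in the minutes before a restart, the be.INFO lines can be counted by level and source location. A minimal Python sketch follows: `tally_by_source` is a hypothetical helper, the regex assumes the glog-style prefix visible above (level letter, date, time, thread id, then `file:line]`), and some of the sample messages are shortened from the log above. Lines that do not start with that prefix, such as the wrapped tail of a long message, are skipped.

```python
import re
from collections import Counter

# Assumed prefix layout, as seen in the excerpt: W20260312 01:08:04.918936 <tid> file.cpp:299] msg
GLOG_RE = re.compile(
    r"^(?P<level>[IWEF])(?P<date>\d{8}) (?P<time>\d{2}:\d{2}:\d{2})\.\d{6} "
    r"(?P<tid>\d+) (?P<src>[\w.]+:\d+)\] (?P<msg>.*)$"
)


def tally_by_source(lines):
    """Count log records per (level, 'file:line') pair."""
    counts = Counter()
    for line in lines:
        match = GLOG_RE.match(line)
        if match:
            counts[(match.group("level"), match.group("src"))] += 1
    return counts


# Sample lines based on the be.INFO excerpt; the first message is shortened.
sample = [
    "W20260312 01:08:04.918936 23086996846336 tablet_sink_sender.cpp:299] close channel failed. channel_name=NodeChannel[3397791]",
    "W20260312 01:08:04.974992 23085603661568 baidu_rpc_protocol.cpp:280] Fail to write into Socket{id=4011499473470 fd=1361 addr=x.x.x.217:64562:8060} (0x14e1d3241b80): Unknown error 1014 [1014]",
    "W20260312 01:08:04.975014 23085406205696 baidu_rpc_protocol.cpp:280] Fail to write into Socket{id=927712971987 fd=1465 addr=x.x.x.217:63412:8060} (0x14ad4af24980): Unknown error 1014 [1014]",
    "I20260312 01:17:19.174515 22986065155840 data_dir.cpp:316] load tablet from meta finished, loaded tablet: 120008, error tablet: 0, path: /data/starrocks/storage duration: 13227ms",
    "0b85261-1d6c-11f1-9058-a0369fd7e4a8 was aborted at 1773248867",  # wrapped tail, skipped
]

counts = tally_by_source(sample)
for (level, src), n in counts.most_common():
    print(level, src, n)
```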

The log before and after the 10:40 restart is as follows:
E20260312 10:30:05.901843 22986073560832 segment_flush_executor.cpp:111] failed to flush segment, txn_id: 105483647, tablet id: 102638289, status: Internal error: Fail to write segment. tablet_id: 102638289, state: kCommitted
W20260312 10:30:05.915046 22980756965120 async_delta_writer.cpp:156] Failed to submit write segment, err=Internal error: Segment flush token is not ok. The status: Internal error: cancel writer because fail to run flush task, Internal error: Fail to write segment. tablet_id: 102638289, state: kCommitted
W20260312 10:30:42.733198 22981172438784 runtime_filter_worker.cpp:525] brpc failed, error=RPC call is timed out, error_text=[E1008]Reached timeout=400ms @x.x.x.217:8060
W20260312 10:38:05.195356 22980765370112 baidu_rpc_protocol.cpp:280] Fail to write into Socket{id=120259104356 fd=4668 addr=x.x.x.218:16362:8060} (0x14d7bc07fc00): Unknown error 1014 [1014]
W20260312 10:38:05.200753 22980773775104 baidu_rpc_protocol.cpp:280] Fail to write into Socket{id=128849038858 fd=2713 addr=x.x.x.218:16012:8060} (0x14d7bbff3dc0): Unknown error 1014 [1014]
W20260312 10:38:05.209656 22980817901312 baidu_rpc_protocol.cpp:280] Fail to write into Socket{id=137438972187 fd=4503 addr=x.x.x.218:16236:8060} (0x14dec1521200): Unknown error 1014 [1014]
W20260312 10:38:05.243049 22980822103808 baidu_rpc_protocol.cpp:280] Fail to write into Socket{id=154618842755 fd=4408 addr=x.x.x.217:35820:8060} (0x14d7bc188a00): Unknown error 1014 [1014]
W20260312 10:38:05.244501 22980796888832 baidu_rpc_protocol.cpp:280] Fail to write into Socket{id=180388645652 fd=4649 addr=x.x.x.217:35874:8060} (0x14e318684680): Unknown error 1014 [1014]
W20260312 10:38:05.245778 22980820002560 baidu_rpc_protocol.cpp:280] Fail to write into Socket{id=180388646582 fd=4394 addr=x.x.x.217:35816:8060} (0x14d7bc190980): Unknown error 1014 [1014]
W20260312 10:38:05.245945 22980830508800 baidu_rpc_protocol.cpp:280] Fail to write into Socket{id=137438978486 fd=4667 addr=x.x.x.218:16356:8060} (0x14c256f88240): Unknown error 1014 [1014]
W20260312 10:38:05.247178 22980822103808 baidu_rpc_protocol.cpp:280] Fail to write into Socket{id=1219770727471 fd=4430 addr=x.x.x.217:35818:8060} (0x14dba161f880): Unknown error 1014 [1014]
W20260312 10:38:05.249236 22982414128896 pipeline_driver_executor.cpp:189] [Driver] Process error, query_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8, instance_id=71822e87-1dbc-11f1-9058-a0369fd7e4a9, status=Cancelled: x.x.x.151: cancel: BE:3709019
E20260312 10:38:05.249294 22982319572736 scan_operator.cpp:484] scan fragment 71822e87-1dbc-11f1-9058-a0369fd7e4a9 driver 0 Scan tasks error: Cancelled: canceled state
W20260312 10:38:05.260310 22982451951360 pipeline_driver_executor.cpp:189] [Driver] Process error, query_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8, instance_id=71822e87-1dbc-11f1-9058-a0369fd7e4a9, status=Cancelled: x.x.x.218: cancel: BE:3709019
W20260312 10:38:05.272066 22982412027648 tablet_sink_sender.cpp:250] close channel failed. channel_name=NodeChannel[3709019], load_info=load_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8, txn_id: 105487766, parallel=1, compress_type=2, error_msg=x.x.x.151: cancel
W20260312 10:38:05.272526 22982412027648 tablet_sink_sender.cpp:250] close channel failed. channel_name=NodeChannel[3397792], load_info=load_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8, txn_id: 105487766, parallel=1, compress_type=2, error_msg=load channel: 71822e87-1dbc-11f1-9058-a0369fd7e4a8 was aborted at 1773283085, reason: Cancelled: Cancelled by pipeline engine, reason: Duplicate RPC invocation: packet_seq 70 in PTabletWriterAddChunkRequest already process: BE:3397792
W20260312 10:38:05.272560 22982412027648 pipeline_driver.cpp:870] [Driver] failed to finish operator called by cancelling operator [fragment_id=71822e87-1dbc-11f1-9058-a0369fd7e4a9] [driver=query_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8 fragment_id=71822e87-1dbc-11f1-9058-a0369fd7e4a9 driver=driver_0_14 addr=0x14e64daaf810, status=RUNNING, operator-chain: [olap_scan_0_0x14e5981e7d10(X) { full:false iostasks:2 has_active:false num_chunks:0 morsel:fixed_morsel_queue empty:false} -> chunk_accumulate_0_0x14e64c08b110(X) -> project_1_0x14e64c08bd10(O) -> olap_table_sink_-1_0x14e64daae910(X)]] [operator=olap_table_sink_-1_0x14e64daae910(X)] [error=load channel: 71822e87-1dbc-11f1-9058-a0369fd7e4a8 was aborted at 1773283085, reason: Cancelled: Cancelled by pipeline engine, reason: Duplicate RPC invocation: packet_seq 70 in PTabletWriterAddChunkRequest already process: BE:3397792]
W20260312 10:38:05.273069 22982412027648 pipeline_driver.cpp:601] cancel pipeline driver error [driver=query_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8 fragment_id=71822e87-1dbc-11f1-9058-a0369fd7e4a9 driver=driver_0_14 addr=0x14e64daaf810, status=RUNNING, operator-chain: [olap_scan_0_0x14e5981e7d10(X) { full:false iostasks:2 has_active:false num_chunks:0 morsel:fixed_morsel_queue empty:false} -> chunk_accumulate_0_0x14e64c08b110(X) -> project_1_0x14e64c08bd10(O) -> olap_table_sink_-1_0x14e64daae910(X)]]: Cancelled: Cancelled by pipeline engine, reason: Cancelled: x.x.x.151: cancel: BE:3709019
W20260312 10:38:05.273106 22982412027648 tablet_sink_sender.cpp:250] close channel failed. channel_name=NodeChannel[3397791], load_info=load_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8, txn_id: 105487766, parallel=1, compress_type=2, error_msg=Cancelled by pipeline engine, reason: Cancelled: x.x.x.151: cancel: BE:3709019
W20260312 10:38:05.276522 22982451951360 tablet_sink_sender.cpp:250] close channel failed. channel_name=NodeChannel[3397791], load_info=load_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8, txn_id: 105487766, parallel=1, compress_type=2, error_msg=load channel: 71822e87-1dbc-11f1-9058-a0369fd7e4a8 was aborted at 1773283064, reason: Cancelled: Cancelled by pipeline engine, reason: Duplicate RPC invocation: packet_seq 70 in PTabletWriterAddChunkRequest already process: BE:3397792
W20260312 10:38:05.276665 22982370002688 tablet_sink_sender.cpp:250] close channel failed. channel_name=NodeChannel[3709019], load_info=load_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8, txn_id: 105487766, parallel=1, compress_type=2, error_msg=x.x.x.151: cancel
W20260312 10:38:05.276758 22982370002688 tablet_sink_sender.cpp:250] close channel failed. channel_name=NodeChannel[3397792], load_info=load_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8, txn_id: 105487766, parallel=1, compress_type=2, error_msg=load channel: 71822e87-1dbc-11f1-9058-a0369fd7e4a8 was aborted at 1773283085, reason: Cancelled: Cancelled by pipeline engine, reason: Duplicate RPC invocation: packet_seq 70 in PTabletWriterAddChunkRequest already process: BE:3397792
W20260312 10:38:05.276774 22982370002688 pipeline_driver.cpp:870] [Driver] failed to finish operator called by cancelling operator [fragment_id=71822e87-1dbc-11f1-9058-a0369fd7e4a9] [driver=query_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8 fragment_id=71822e87-1dbc-11f1-9058-a0369fd7e4a9 driver=driver_0_16 addr=0x14e62bc7f910, status=RUNNING, operator-chain: [olap_scan_0_0x14e7df03f210(X) { full:false iostasks:4 has_active:false num_chunks:0 morsel:fixed_morsel_queue empty:false} -> chunk_accumulate_0_0x14e62b4aba10(X) -> project_1_0x14e62b837610(O) -> olap_table_sink_-1_0x14e62b839710(X)]] [operator=olap_table_sink_-1_0x14e62b839710(X)] [error=load channel: 71822e87-1dbc-11f1-9058-a0369fd7e4a8 was aborted at 1773283085, reason: Cancelled: Cancelled by pipeline engine, reason: Duplicate RPC invocation: packet_seq 70 in PTabletWriterAddChunkRequest already process: BE:3397792]
W20260312 10:38:05.277144 22982370002688 pipeline_driver.cpp:601] cancel pipeline driver error [driver=query_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8 fragment_id=71822e87-1dbc-11f1-9058-a0369fd7e4a9 driver=driver_0_16 addr=0x14e62bc7f910, status=RUNNING, operator-chain: [olap_scan_0_0x14e7df03f210(X) { full:false iostasks:4 has_active:false num_chunks:0 morsel:fixed_morsel_queue empty:false} -> chunk_accumulate_0_0x14e62b4aba10(X) -> project_1_0x14e62b837610(O) -> olap_table_sink_-1_0x14e62b839710(X)]]: Cancelled: Cancelled by pipeline engine, reason: Cancelled: x.x.x.151: cancel: BE:3709019
W20260312 10:38:05.277180 22982370002688 tablet_sink_sender.cpp:250] close channel failed. channel_name=NodeChannel[3397791], load_info=load_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8, txn_id: 105487766, parallel=1, compress_type=2, error_msg=Cancelled by pipeline engine, reason: Cancelled: x.x.x.151: cancel: BE:3709019
W20260312 10:38:05.277195 22982426736384 tablet_sink_sender.cpp:250] close channel failed. channel_name=NodeChannel[3709019], load_info=load_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8, txn_id: 105487766, parallel=1, compress_type=2, error_msg=x.x.x.151: cancel
W20260312 10:38:05.277310 22982451951360 tablet_sink_sender.cpp:250] close channel failed. channel_name=NodeChannel[3709019], load_info=load_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8, txn_id: 105487766, parallel=1, compress_type=2, error_msg=load channel: 71822e87-1dbc-11f1-9058-a0369fd7e4a8 was aborted at 1773283085, reason: Cancelled: Cancelled by pipeline engine, reason: Duplicate RPC invocation: packet_seq 70 in PTabletWriterAddChunkRequest already process: BE:3397792
W20260312 10:38:05.277420 22982451951360 pipeline_driver.cpp:870] [Driver] failed to finish operator called by cancelling operator [fragment_id=71822e87-1dbc-11f1-9058-a0369fd7e4a9] [driver=query_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8 fragment_id=71822e87-1dbc-11f1-9058-a0369fd7e4a9 driver=driver_0_19 addr=0x14de77c12e10, status=RUNNING, operator-chain: [olap_scan_0_0x14e677d88710(X) { full:false iostasks:4 has_active:false num_chunks:0 morsel:fixed_morsel_queue empty:false} -> chunk_accumulate_0_0x14e47d4df910(X) -> project_1_0x14e47d4e1110(X) -> olap_table_sink_-1_0x14de77c11610(X)]] [operator=olap_table_sink_-1_0x14de77c11610(X)] [error=load channel: 71822e87-1dbc-11f1-9058-a0369fd7e4a8 was aborted at 1773283085, reason: Cancelled: Cancelled by pipeline engine, reason: Duplicate RPC invocation: packet_seq 70 in PTabletWriterAddChunkRequest already process: BE:3397792]
W20260312 10:38:05.277608 22982451951360 pipeline_driver.cpp:601] cancel pipeline driver error [driver=query_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8 fragment_id=71822e87-1dbc-11f1-9058-a0369fd7e4a9 driver=driver_0_19 addr=0x14de77c12e10, status=RUNNING, operator-chain: [olap_scan_0_0x14e677d88710(X) { full:false iostasks:4 has_active:false num_chunks:0 morsel:fixed_morsel_queue empty:false} -> chunk_accumulate_0_0x14e47d4df910(X) -> project_1_0x14e47d4e1110(X) -> olap_table_sink_-1_0x14de77c11610(X)]]: Cancelled: Cancelled by pipeline engine, reason: Cancelled: x.x.x.151: cancel: BE:3709019
W20260312 10:38:05.279924 22982426736384 pipeline_driver.cpp:601] cancel pipeline driver error [driver=query_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8 fragment_id=71822e87-1dbc-11f1-9058-a0369fd7e4a9 driver=driver_0_23 addr=0x14e62c3bf910, status=RUNNING, operator-chain: [olap_scan_0_0x14e677d89b10(X) { full:false iostasks:4 has_active:false num_chunks:0 morsel:fixed_morsel_queue empty:false} -> chunk_accumulate_0_0x14e62c3bd510(X) -> project_1_0x14e62c3be110(X) -> olap_table_sink_-1_0x14e62c3bed10(X)]]: Cancelled: Cancelled by pipeline engine, reason: Cancelled: x.x.x.151: cancel: BE:3709019
W20260312 10:38:05.279984 22982426736384 tablet_sink_sender.cpp:250] close channel failed. channel_name=NodeChannel[3397791], load_info=load_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8, txn_id: 105487766, parallel=1, compress_type=2, error_msg=Cancelled by pipeline engine, reason: Cancelled: x.x.x.151: cancel: BE:3709019
W20260312 10:38:05.279999 22982426736384 tablet_sink_sender.cpp:250] close channel failed. channel_name=NodeChannel[3397792], load_info=load_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8, txn_id: 105487766, parallel=1, compress_type=2, error_msg=Cancelled by pipeline engine, reason: Cancelled: x.x.x.151: cancel: BE:3709019
W20260312 10:38:05.280082 22982414128896 tablet_sink_sender.cpp:250] close channel failed. channel_name=NodeChannel[3397791], load_info=load_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8, txn_id: 105487766, parallel=1, compress_type=2, error_msg=load channel: 71822e87-1dbc-11f1-9058-a0369fd7e4a8 was aborted at 1773283064, reason: Cancelled: Cancelled by pipeline engine, reason: Duplicate RPC invocation: packet_seq 70 in PTabletWriterAddChunkRequest already process: BE:3397792
W20260312 10:38:05.280659 22982399420160 tablet_sink_sender.cpp:250] close channel failed. channel_name=NodeChannel[3397791], load_info=load_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8, txn_id: 105487766, parallel=1, compress_type=2, error_msg=load channel: 71822e87-1dbc-11f1-9058-a0369fd7e4a8 was aborted at 1773283064, reason: Cancelled: Cancelled by pipeline engine, reason: Duplicate RPC invocation: packet_seq 70 in PTabletWriterAddChunkRequest already process: BE:3397792
W20260312 10:38:05.280695 22982456153856 tablet_sink_sender.cpp:250] close channel failed. channel_name=NodeChannel[3397791], load_info=load_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8, txn_id: 105487766, parallel=1, compress_type=2, error_msg=load channel: 71822e87-1dbc-11f1-9058-a0369fd7e4a8 was aborted at 1773283064, reason: Cancelled: Cancelled by pipeline engine, reason: Duplicate RPC invocation: packet_seq 70 in PTabletWriterAddChunkRequest already process: BE:3397792
W20260312 10:38:05.280744 22982414128896 tablet_sink_sender.cpp:250] close channel failed. channel_name=NodeChannel[3397792], load_info=load_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8, txn_id: 105487766, parallel=1, compress_type=2, error_msg=x.x.x.218: cancel
W20260312 10:38:05.280796 22982456153856 tablet_sink_sender.cpp:250] close channel failed. channel_name=NodeChannel[3709019], load_info=load_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8, txn_id: 105487766, parallel=1, compress_type=2, error_msg=load channel: 71822e87-1dbc-11f1-9058-a0369fd7e4a8 was aborted
at 1773283085, reason: Cancelled: Cancelled by pipeline engine, reason: Duplicate RPC invocation: packet_seq 70 in PTabletWriterAddChunkRequest already process: BE:3397792
W20260312 10:38:05.280923 22982414128896 pipeline_driver.cpp:601] cancel pipeline driver error [driver=query_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8 fragment_id=71822e87-1dbc-11f1-9058-a0369fd7e4a9 driver=driver_0_17 addr=0x14e62cb84510, status=RUNNING, operator-chain: [olap_scan_0_0x14e68bf2cd10(X) {
full:false iostasks:4 has_active:false num_chunks:0 morsel:fixed_morsel_queue empty:false} -> chunk_accumulate_0_0x14e62cb7b010(X) -> project_1_0x14e62cb7bc10(X) -> olap_table_sink_-1_0x14e62cb83910(X)]]: Cancelled: Cancelled by pipeline engine, reason: Cancelled: x.x.x.151: cancel: BE:3709019
W20260312 10:38:05.280943 22982399420160 tablet_sink_sender.cpp:250] close channel failed. channel_name=NodeChannel[3709019], load_info=load_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8, txn_id: 105487766, parallel=1, compress_type=2, error_msg=load channel: 71822e87-1dbc-11f1-9058-a0369fd7e4a8 was aborted
at 1773283085, reason: Cancelled: Cancelled by pipeline engine, reason: Duplicate RPC invocation: packet_seq 70 in PTabletWriterAddChunkRequest already process: BE:3397792
W20260312 10:38:05.281069 22982456153856 tablet_sink_sender.cpp:250] close channel failed. channel_name=NodeChannel[3397792], load_info=load_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8, txn_id: 105487766, parallel=1, compress_type=2, error_msg=load channel: 71822e87-1dbc-11f1-9058-a0369fd7e4a8 was aborted
at 1773283085, reason: Cancelled: Cancelled by pipeline engine, reason: Duplicate RPC invocation: packet_seq 70 in PTabletWriterAddChunkRequest already process: BE:3397792
W20260312 10:38:05.281093 22982456153856 pipeline_driver.cpp:870] [Driver] failed to finish operator called by cancelling operator [fragment_id=71822e87-1dbc-11f1-9058-a0369fd7e4a9] [driver=query_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8 fragment_id=71822e87-1dbc-11f1-9058-a0369fd7e4a9 driver=driver_0_1
5 addr=0x14e62b4a9c10, status=RUNNING, operator-chain: [olap_scan_0_0x14e7df03cf10(X) { full:false iostasks:3 has_active:false num_chunks:0 morsel:fixed_morsel_queue empty:false} -> chunk_accumulate_0_0x14e62f7b0210(X) -> project_1_0x14e62f7b1410(X) -> olap_table_sink_-1_0x14e62b4a9010(X)]] [operator=
olap_table_sink_-1_0x14e62b4a9010(X)] [error=load channel: 71822e87-1dbc-11f1-9058-a0369fd7e4a8 was aborted at 1773283085, reason: Cancelled: Cancelled by pipeline engine, reason: Duplicate RPC invocation: packet_seq 70 in PTabletWriterAddChunkRequest already process: BE:3397792]
W20260312 10:38:05.281211 22982399420160 tablet_sink_sender.cpp:250] close channel failed. channel_name=NodeChannel[3397792], load_info=load_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8, txn_id: 105487766, parallel=1, compress_type=2, error_msg=load channel: 71822e87-1dbc-11f1-9058-a0369fd7e4a8 was aborted
at 1773283085, reason: Cancelled: Cancelled by pipeline engine, reason: Duplicate RPC invocation: packet_seq 70 in PTabletWriterAddChunkRequest already process: BE:3397792
W20260312 10:38:05.281261 22982399420160 pipeline_driver.cpp:870] [Driver] failed to finish operator called by cancelling operator [fragment_id=71822e87-1dbc-11f1-9058-a0369fd7e4a9] [driver=query_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8 fragment_id=71822e87-1dbc-11f1-9058-a0369fd7e4a9 driver=driver_0_2
0 addr=0x14dbe930a910, status=RUNNING, operator-chain: [olap_scan_0_0x14e677d88c10(X) { full:false iostasks:4 has_active:false num_chunks:0 morsel:fixed_morsel_queue empty:false} -> chunk_accumulate_0_0x14de800c0910(X) -> project_1_0x14de800c1510(O) -> olap_table_sink_-1_0x14de800c2a10(X)]] [operator=
olap_table_sink_-1_0x14de800c2a10(X)] [error=load channel: 71822e87-1dbc-11f1-9058-a0369fd7e4a8 was aborted at 1773283085, reason: Cancelled: Cancelled by pipeline engine, reason: Duplicate RPC invocation: packet_seq 70 in PTabletWriterAddChunkRequest already process: BE:3397792]
W20260312 10:38:05.281492 22982399420160 pipeline_driver.cpp:601] cancel pipeline driver error [driver=query_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8 fragment_id=71822e87-1dbc-11f1-9058-a0369fd7e4a9 driver=driver_0_20 addr=0x14dbe930a910, status=RUNNING, operator-chain: [olap_scan_0_0x14e677d88c10(X) {
full:false iostasks:4 has_active:false num_chunks:0 morsel:fixed_morsel_queue empty:false} -> chunk_accumulate_0_0x14de800c0910(X) -> project_1_0x14de800c1510(O) -> olap_table_sink_-1_0x14de800c2a10(X)]]: Cancelled: Cancelled by pipeline engine, reason: Cancelled: x.x.x.151: cancel: BE:3709019
W20260312 10:38:05.281637 22982456153856 pipeline_driver.cpp:601] cancel pipeline driver error [driver=query_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8 fragment_id=71822e87-1dbc-11f1-9058-a0369fd7e4a9 driver=driver_0_15 addr=0x14e62b4a9c10, status=RUNNING, operator-chain: [olap_scan_0_0x14e7df03cf10(X) {
full:false iostasks:3 has_active:false num_chunks:0 morsel:fixed_morsel_queue empty:false} -> chunk_accumulate_0_0x14e62f7b0210(X) -> project_1_0x14e62f7b1410(X) -> olap_table_sink_-1_0x14e62b4a9010(X)]]: Cancelled: Cancelled by pipeline engine, reason: Cancelled: x.x.x.151: cancel: BE:3709019
W20260312 10:38:05.282720 22982451951360 tablet_sink_sender.cpp:299] close channel failed. channel_name=NodeChannel[3397791], load_info=load_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8, txn_id: 105487766, parallel=1, compress_type=2, error_msg=Cancelled by pipeline engine, reason: Cancelled: x.x.x.151
: cancel: BE:3709019
W20260312 10:38:05.282756 22982451951360 tablet_sink_sender.cpp:299] close channel failed. channel_name=NodeChannel[3709019], load_info=load_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8, txn_id: 105487766, parallel=1, compress_type=2, error_msg=Cancelled by pipeline engine, reason: Cancelled: x.x.x.151
: cancel: BE:3709019
W20260312 10:38:05.282772 22982451951360 tablet_sink_sender.cpp:299] close channel failed. channel_name=NodeChannel[3397792], load_info=load_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8, txn_id: 105487766, parallel=1, compress_type=2, error_msg=x.x.x.218: cancel
W20260312 10:38:05.283028 22982391015168 tablet_sink_sender.cpp:250] close channel failed. channel_name=NodeChannel[3397791], load_info=load_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8, txn_id: 105487766, parallel=1, compress_type=2, error_msg=load channel: 71822e87-1dbc-11f1-9058-a0369fd7e4a8 was aborted
at 1773283064, reason: Cancelled: Cancelled by pipeline engine, reason: Duplicate RPC invocation: packet_seq 70 in PTabletWriterAddChunkRequest already process: BE:3397792
W20260312 10:38:05.283853 22982391015168 tablet_sink_sender.cpp:250] close channel failed. channel_name=NodeChannel[3709019], load_info=load_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8, txn_id: 105487766, parallel=1, compress_type=2, error_msg=x.x.x.151: cancel
W20260312 10:38:05.284344 22982391015168 tablet_sink_sender.cpp:250] close channel failed. channel_name=NodeChannel[3397792], load_info=load_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8, txn_id: 105487766, parallel=1, compress_type=2, error_msg=load channel: 71822e87-1dbc-11f1-9058-a0369fd7e4a8 was aborted
at 1773283085, reason: Cancelled: Cancelled by pipeline engine, reason: Duplicate RPC invocation: packet_seq 70 in PTabletWriterAddChunkRequest already process: BE:3397792
W20260312 10:38:05.284364 22982391015168 pipeline_driver.cpp:870] [Driver] failed to finish operator called by cancelling operator [fragment_id=71822e87-1dbc-11f1-9058-a0369fd7e4a9] [driver=query_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8 fragment_id=71822e87-1dbc-11f1-9058-a0369fd7e4a9 driver=driver_0_2
2 addr=0x14e62f357a10, status=RUNNING, operator-chain: [olap_scan_0_0x14e677d89610(X) { full:false iostasks:2 has_active:false num_chunks:0 morsel:fixed_morsel_queue empty:false} -> chunk_accumulate_0_0x14e62f355610(X) -> project_1_0x14e62f356210(X) -> olap_table_sink_-1_0x14e62f356e10(X)]] [operator=
olap_table_sink_-1_0x14e62f356e10(X)] [error=load channel: 71822e87-1dbc-11f1-9058-a0369fd7e4a8 was aborted at 1773283085, reason: Cancelled: Cancelled by pipeline engine, reason: Duplicate RPC invocation: packet_seq 70 in PTabletWriterAddChunkRequest already process: BE:3397792]
W20260312 10:38:05.284609 22982391015168 pipeline_driver.cpp:601] cancel pipeline driver error [driver=query_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8 fragment_id=71822e87-1dbc-11f1-9058-a0369fd7e4a9 driver=driver_0_22 addr=0x14e62f357a10, status=RUNNING, operator-chain: [olap_scan_0_0x14e677d89610(X) {
full:false iostasks:2 has_active:false num_chunks:0 morsel:fixed_morsel_queue empty:false} -> chunk_accumulate_0_0x14e62f355610(X) -> project_1_0x14e62f356210(X) -> olap_table_sink_-1_0x14e62f356e10(X)]]: Cancelled: Cancelled by pipeline engine, reason: Cancelled: x.x.x.151: cancel: BE:3709019
W20260312 10:38:05.286042 22982414128896 tablet_sink_sender.cpp:299] close channel failed. channel_name=NodeChannel[3397791], load_info=load_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8, txn_id: 105487766, parallel=1, compress_type=2, error_msg=Cancelled by pipeline engine, reason: Cancelled: x.x.x.151
: cancel: BE:3709019
W20260312 10:38:05.286101 22982414128896 tablet_sink_sender.cpp:299] close channel failed. channel_name=NodeChannel[3709019], load_info=load_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8, txn_id: 105487766, parallel=1, compress_type=2, error_msg=x.x.x.151: cancel
W20260312 10:38:05.286111 22982414128896 tablet_sink_sender.cpp:299] close channel failed. channel_name=NodeChannel[3397792], load_info=load_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8, txn_id: 105487766, parallel=1, compress_type=2, error_msg=Cancelled by pipeline engine, reason: Cancelled: x.x.x.151
: cancel: BE:3709019
W20260312 10:38:05.286615 22982399420160 tablet_sink_sender.cpp:299] close channel failed. channel_name=NodeChannel[3397791], load_info=load_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8, txn_id: 105487766, parallel=1, compress_type=2, error_msg=Cancelled by pipeline engine, reason: Cancelled: x.x.x.151
: cancel: BE:3709019
W20260312 10:38:05.286660 22982399420160 tablet_sink_sender.cpp:299] close channel failed. channel_name=NodeChannel[3709019], load_info=load_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8, txn_id: 105487766, parallel=1, compress_type=2, error_msg=Cancelled by pipeline engine, reason: Cancelled: x.x.x.151
: cancel: BE:3709019
W20260312 10:38:05.286670 22982399420160 tablet_sink_sender.cpp:299] close channel failed. channel_name=NodeChannel[3397792], load_info=load_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8, txn_id: 105487766, parallel=1, compress_type=2, error_msg=Cancelled by pipeline engine, reason: Cancelled: x.x.x.151
: cancel: BE:3709019
W20260312 10:38:05.286752 22982456153856 tablet_sink_sender.cpp:299] close channel failed. channel_name=NodeChannel[3397791], load_info=load_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8, txn_id: 105487766, parallel=1, compress_type=2, error_msg=Cancelled by pipeline engine, reason: Cancelled: x.x.x.151
: cancel: BE:3709019
W20260312 10:38:05.286780 22982456153856 tablet_sink_sender.cpp:299] close channel failed. channel_name=NodeChannel[3709019], load_info=load_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8, txn_id: 105487766, parallel=1, compress_type=2, error_msg=Cancelled by pipeline engine, reason: Cancelled: x.x.x.151
: cancel: BE:3709019
W20260312 10:38:05.286791 22982456153856 tablet_sink_sender.cpp:299] close channel failed. channel_name=NodeChannel[3397792], load_info=load_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8, txn_id: 105487766, parallel=1, compress_type=2, error_msg=Cancelled by pipeline engine, reason: Cancelled: x.x.x.151
: cancel: BE:3709019
W20260312 10:38:05.289715 22982391015168 tablet_sink_sender.cpp:299] close channel failed. channel_name=NodeChannel[3397791], load_info=load_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8, txn_id: 105487766, parallel=1, compress_type=2, error_msg=Cancelled by pipeline engine, reason: Cancelled: x.x.x.151
: cancel: BE:3709019
W20260312 10:38:05.289765 22982391015168 tablet_sink_sender.cpp:299] close channel failed. channel_name=NodeChannel[3709019], load_info=load_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8, txn_id: 105487766, parallel=1, compress_type=2, error_msg=Cancelled by pipeline engine, reason: Cancelled: x.x.x.151
: cancel: BE:3709019
W20260312 10:38:05.289790 22982391015168 tablet_sink_sender.cpp:299] close channel failed. channel_name=NodeChannel[3397792], load_info=load_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8, txn_id: 105487766, parallel=1, compress_type=2, error_msg=Cancelled by pipeline engine, reason: Cancelled: x.x.x.151
: cancel: BE:3709019
W20260312 10:38:05.293368 22982449850112 tablet_sink_sender.cpp:250] close channel failed. channel_name=NodeChannel[3397791], load_info=load_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8, txn_id: 105487766, parallel=1, compress_type=2, error_msg=load channel: 71822e87-1dbc-11f1-9058-a0369fd7e4a8 was aborted
at 1773283064, reason: Cancelled: Cancelled by pipeline engine, reason: Duplicate RPC invocation: packet_seq 70 in PTabletWriterAddChunkRequest already process: BE:3397792
W20260312 10:38:05.293934 22982449850112 tablet_sink_sender.cpp:250] close channel failed. channel_name=NodeChannel[3709019], load_info=load_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8, txn_id: 105487766, parallel=1, compress_type=2, error_msg=x.x.x.151: cancel
W20260312 10:38:05.294565 22982376306432 tablet_sink_sender.cpp:250] close channel failed. channel_name=NodeChannel[3709019], load_info=load_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8, txn_id: 105487766, parallel=1, compress_type=2, error_msg=x.x.x.151: cancel
W20260312 10:38:05.294846 22982376306432 tablet_sink_sender.cpp:250] close channel failed. channel_name=NodeChannel[3397792], load_info=load_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8, txn_id: 105487766, parallel=1, compress_type=2, error_msg=load channel: 71822e87-1dbc-11f1-9058-a0369fd7e4a8 was aborted
at 1773283085, reason: Cancelled: Cancelled by pipeline engine, reason: Duplicate RPC invocation: packet_seq 70 in PTabletWriterAddChunkRequest already process: BE:3397792
W20260312 10:38:05.294872 22982376306432 pipeline_driver.cpp:870] [Driver] failed to finish operator called by cancelling operator [fragment_id=71822e87-1dbc-11f1-9058-a0369fd7e4a9] [driver=query_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8 fragment_id=71822e87-1dbc-11f1-9058-a0369fd7e4a9 driver=driver_0_1
3 addr=0x14e64c089310, status=RUNNING, operator-chain: [olap_scan_0_0x14e5981e7810(X) { full:false iostasks:3 has_active:false num_chunks:0 morsel:fixed_morsel_queue empty:false} -> chunk_accumulate_0_0x14e66bd0b710(X) -> project_1_0x14e66bd8ac10(O) -> olap_table_sink_-1_0x14e66bd8b810(X)]] [operator=
olap_table_sink_-1_0x14e66bd8b810(X)] [error=load channel: 71822e87-1dbc-11f1-9058-a0369fd7e4a8 was aborted at 1773283085, reason: Cancelled: Cancelled by pipeline engine, reason: Duplicate RPC invocation: packet_seq 70 in PTabletWriterAddChunkRequest already process: BE:3397792]
W20260312 10:38:05.295300 22982376306432 pipeline_driver.cpp:601] cancel pipeline driver error [driver=query_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8 fragment_id=71822e87-1dbc-11f1-9058-a0369fd7e4a9 driver=driver_0_13 addr=0x14e64c089310, status=RUNNING, operator-chain: [olap_scan_0_0x14e5981e7810(X) {
full:false iostasks:3 has_active:false num_chunks:0 morsel:fixed_morsel_queue empty:false} -> chunk_accumulate_0_0x14e66bd0b710(X) -> project_1_0x14e66bd8ac10(O) -> olap_table_sink_-1_0x14e66bd8b810(X)]]: Cancelled: Cancelled by pipeline engine, reason: Cancelled: x.x.x.151: cancel: BE:3709019
W20260312 10:38:05.295343 22982376306432 tablet_sink_sender.cpp:250] close channel failed. channel_name=NodeChannel[3397791], load_info=load_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8, txn_id: 105487766, parallel=1, compress_type=2, error_msg=Cancelled by pipeline engine, reason: Cancelled: x.x.x.151
: cancel: BE:3709019
W20260312 10:38:05.296538 22982386812672 tablet_sink_sender.cpp:250] close channel failed. channel_name=NodeChannel[3709019], load_info=load_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8, txn_id: 105487766, parallel=1, compress_type=2, error_msg=x.x.x.151: cancel
W20260312 10:38:05.301067 22982424635136 tablet_sink_sender.cpp:250] close channel failed. channel_name=NodeChannel[3709019], load_info=load_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8, txn_id: 105487766, parallel=1, compress_type=2, error_msg=load channel: 71822e87-1dbc-11f1-9058-a0369fd7e4a8 was aborted
at 1773283085, reason: Cancelled: Cancelled by pipeline engine, reason: Duplicate RPC invocation: packet_seq 70 in PTabletWriterAddChunkRequest already process: BE:3397792
W20260312 10:38:05.301342 22982449850112 pipeline_driver.cpp:601] cancel pipeline driver error [driver=query_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8 fragment_id=71822e87-1dbc-11f1-9058-a0369fd7e4a9 driver=driver_0_21 addr=0x14e62f353b10, status=RUNNING, operator-chain: [olap_scan_0_0x14e677d89110(X) {
full:false iostasks:3 has_active:false num_chunks:0 morsel:fixed_morsel_queue empty:false} -> chunk_accumulate_0_0x14dbe930c710(X) -> project_1_0x14e62f352310(X) -> olap_table_sink_-1_0x14e62f352f10(X)]]: Cancelled: Cancelled by pipeline engine, reason: Cancelled: x.x.x.151: cancel: BE:3709019
W20260312 10:38:05.301390 22982386812672 pipeline_driver.cpp:601] cancel pipeline driver error [driver=query_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8 fragment_id=71822e87-1dbc-11f1-9058-a0369fd7e4a9 driver=driver_0_18 addr=0x14e4b292d810, status=RUNNING, operator-chain: [olap_scan_0_0x14e67f022910(X) {
full:false iostasks:2 has_active:false num_chunks:0 morsel:fixed_morsel_queue empty:false} -> chunk_accumulate_0_0x14e62cbede10(X) -> project_1_0x14e4b292c010(O) -> olap_table_sink_-1_0x14e4b292cc10(X)]]: Cancelled: Cancelled by pipeline engine, reason: Cancelled: x.x.x.151: cancel: BE:3709019
W20260312 10:38:05.301536 22982386812672 tablet_sink_sender.cpp:250] close channel failed. channel_name=NodeChannel[3397791], load_info=load_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8, txn_id: 105487766, parallel=1, compress_type=2, error_msg=Cancelled by pipeline engine, reason: Cancelled: x.x.x.151
: cancel: BE:3709019
W20260312 10:38:05.301565 22982386812672 tablet_sink_sender.cpp:250] close channel failed. channel_name=NodeChannel[3397792], load_info=load_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8, txn_id: 105487766, parallel=1, compress_type=2, error_msg=Cancelled by pipeline engine, reason: Cancelled: x.x.x.151
: cancel: BE:3709019
W20260312 10:38:05.301429 22982449850112 tablet_sink_sender.cpp:250] close channel failed. channel_name=NodeChannel[3397792], load_info=load_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8, txn_id: 105487766, parallel=1, compress_type=2, error_msg=Cancelled by pipeline engine, reason: Cancelled: x.x.x.151
: cancel: BE:3709019
W20260312 10:38:05.305467 22982424635136 pipeline_driver.cpp:870] [Driver] failed to finish operator called by cancelling operator [fragment_id=71822e87-1dbc-11f1-9058-a0369fd7e4a9] [driver=query_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8 fragment_id=71822e87-1dbc-11f1-9058-a0369fd7e4a9 driver=driver_0_1
2 addr=0x14e67748d410, status=RUNNING, operator-chain: [olap_scan_0_0x14e5ab4ead10(X) { full:false iostasks:3 has_active:false num_chunks:0 morsel:fixed_morsel_queue empty:false} -> chunk_accumulate_0_0x14e6773ebd10(X) -> project_1_0x14e67748b910(O) -> olap_table_sink_-1_0x14e67748c810(X)]] [operator=
olap_table_sink_-1_0x14e67748c810(X)] [error=load channel: 71822e87-1dbc-11f1-9058-a0369fd7e4a8 was aborted at 1773283085, reason: Cancelled: Cancelled by pipeline engine, reason: Duplicate RPC invocation: packet_seq 70 in PTabletWriterAddChunkRequest already process: BE:3397792]
W20260312 10:38:05.306128 22982424635136 pipeline_driver.cpp:601] cancel pipeline driver error [driver=query_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8 fragment_id=71822e87-1dbc-11f1-9058-a0369fd7e4a9 driver=driver_0_12 addr=0x14e67748d410, status=RUNNING, operator-chain: [olap_scan_0_0x14e5ab4ead10(X) {
full:false iostasks:3 has_active:false num_chunks:0 morsel:fixed_morsel_queue empty:false} -> chunk_accumulate_0_0x14e6773ebd10(X) -> project_1_0x14e67748b910(O) -> olap_table_sink_-1_0x14e67748c810(X)]]: Cancelled: Cancelled by pipeline engine, reason: Cancelled: x.x.x.151: cancel: BE:3709019
W20260312 10:38:05.306182 22982424635136 tablet_sink_sender.cpp:250] close channel failed. channel_name=NodeChannel[3397791], load_info=load_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8, txn_id: 105487766, parallel=1, compress_type=2, error_msg=Cancelled by pipeline engine, reason: Cancelled: x.x.x.151
: cancel: BE:3709019
W20260312 10:38:05.306192 22982424635136 tablet_sink_sender.cpp:250] close channel failed. channel_name=NodeChannel[3397792], load_info=load_id=71822e87-1dbc-11f1-9058-a0369fd7e4a8, txn_id: 105487766, parallel=1, compress_type=2, error_msg=Cancelled by pipeline engine, reason: Cancelled: x.x.x.151
: cancel: BE:3709019
I20260312 10:41:25.170768 23021062448896 data_dir.cpp:316] load tablet from meta finished, loaded tablet: 118337, error tablet: 0, path: /data/starrocks/storage duration: 22340ms
I20260312 10:41:26.714400 23021062448896 data_dir.cpp:432] load rowset from meta finished, data dir: /data/starrocks/storage error/total: 45600/58174 duration: 1443ms
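The warnings above repeat the same few messages many times, so a tally per source location can make the log easier to read before sharing it. The following is a minimal sketch, assuming the lines follow the glog prefix seen above (level letter, date, time, thread id, `file:line]`). The function name `tally_by_source` is hypothetical, and the log path is whatever file you pass on the command line.

```python
import re
import sys
from collections import Counter
from typing import Iterable

# glog prefix as seen in the log above:
# <level><yyyymmdd> <hh:mm:ss.uuuuuu> <thread id> <file>:<line>] <message>
GLOG_PREFIX = re.compile(
    r"^([IWEF])(\d{8}) (\d{2}:\d{2}:\d{2}\.\d{6}) (\d+) ([\w.]+):(\d+)\] "
)


def tally_by_source(lines: Iterable[str]) -> Counter:
    """Count log entries per (level, source file, source line)."""
    counts: Counter = Counter()
    for line in lines:
        m = GLOG_PREFIX.match(line)
        if m is None:
            # Wrapped continuation lines (e.g. ": cancel: BE:...") and
            # non-glog output are skipped rather than counted.
            continue
        level, _date, _time, _tid, src, lineno = m.groups()
        counts[(level, src, int(lineno))] += 1
    return counts


if __name__ == "__main__":
    # Usage: python tally.py <path to a BE log file>
    with open(sys.argv[1], errors="replace") as f:
        for (level, src, lineno), n in tally_by_source(f).most_common():
            print(f"{n:8d}  {level}  {src}:{lineno}")
```

This only groups by where the message was emitted; it does not interpret the messages, so the crash itself still has to be diagnosed from the stack trace or a core dump.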

I had an AI analyze be.out and /var/log/messages from all three BE nodes, and its conclusion was that version 3.5.12 has a bug.


Could someone experienced take a look and tell me whether this analysis is reliable?

Do you have coredump enabled?

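I enabled coredump once, following the "How to obtain a coredump" guide. The file is 218 GB, and still 17 GB after compression, but I don't know how to read it.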

1. Use the debuginfo package that matches your version


2. Inspect the dump with gdb

./gdb be/lib/starrocks_be core_xxx
# Print the stack trace
>bt
# Pretty-print structs when displaying them
>set print pretty
# Jump to the first error frame
>f 0

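What follows depends on the specific case: the variables have to be analyzed against the BE source code for the matching version.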

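# Print the address held by the current pointer
>p this
# Print the value the pointer refers to
>p *this
# Show stack frame information for the function being debugged
>info frame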
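With a core file this large, it may be more convenient to run the commands above non-interactively and save the output to a text file that can be attached to the thread. Below is a minimal sketch using gdb's `-batch` and `-ex` options. The helper name `build_gdb_command` and the output file `gdb_report.txt` are hypothetical, and the paths are the placeholders from the steps above, so adjust them to your installation.

```python
import shlex
import subprocess
from typing import List, Sequence

# The same commands as in the interactive session above, in the same order.
DEFAULT_COMMANDS = (
    "bt",
    "set print pretty on",
    "f 0",
    "p this",
    "p *this",
    "info frame",
)


def build_gdb_command(gdb: str, binary: str, core: str,
                      commands: Sequence[str] = DEFAULT_COMMANDS) -> List[str]:
    """Build an argv list that runs `commands` against `core` and then exits."""
    argv = [gdb, "-batch"]        # -batch: run the given commands, then exit
    for cmd in commands:
        argv += ["-ex", cmd]      # -ex: execute a single gdb command
    argv += [binary, core]
    return argv


if __name__ == "__main__":
    # Placeholder paths taken from the steps above (assumption: run from the
    # directory that contains ./gdb and be/lib/starrocks_be).
    argv = build_gdb_command("./gdb", "be/lib/starrocks_be", "core_xxx")
    print(shlex.join(argv))
    result = subprocess.run(argv, capture_output=True, text=True)
    with open("gdb_report.txt", "w") as out:
        out.write(result.stdout)
        out.write(result.stderr)
```

Whether `p this` prints anything useful depends on the selected frame being a member function, so the report is a starting point for reading the BE source, not a diagnosis by itself.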