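【Description】BE processes were killed abnormally and left behind a large number of core files
【Background】A nightly ETL job extracts T+1 data; each run truncates the target table and then inserts to overwrite it
【Business impact】2 BE nodes went down, disrupting business
【StarRocks version】2.3.0
【Cluster size】3 FE (3 followers) + 10 BE (FE and BE co-located)
【Contact】15623937986
【Attachments】
BE error log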
I0209 09:30:01.176982 11074 tablet_manager.cpp:616] Found the best tablet to compact. compaction_type=base tablet_id=204993557 highest_score=2
I0209 09:30:01.376366 11438 stream_load.cpp:208] new income streaming load request.id=d045fd34f38ad135-6f418a687ff16ab1, job_id=-1, txn_id: -1, label=0bb2f476-b345-4be3-8e18-48bf69a75431, db=dip_test, db=dip_test, tbl=t_yarn_running_tasks
I0209 09:30:01.379004 11438 stream_load_executor.cpp:57] begin to execute job. label=0bb2f476-b345-4be3-8e18-48bf69a75431, txn_id: 3381712, query_id=d045fd34-f38a-d135-6f41-8a687ff16ab1
I0209 09:30:01.379065 11438 plan_fragment_executor.cpp:69] Prepare(): query_id=d045fd34-f38a-d135-6f41-8a687ff16ab1 fragment_instance_id=d045fd34-f38a-d135-6f41-8a687ff16ab2 backend_num=0
I0209 09:30:01.380043 10693 plan_fragment_executor.cpp:180] Open(): fragment_instance_id=d045fd34-f38a-d135-6f41-8a687ff16ab2
I0209 09:30:01.540858 11149 tablet_manager.cpp:616] Found the best tablet to compact. compaction_type=cumulative tablet_id=208890692 highest_score=3
I0209 09:30:01.541921 11208 tablet_manager.cpp:616] Found the best tablet to compact. compaction_type=cumulative tablet_id=208892579 highest_score=3
I0209 09:30:01.541921 11183 tablet_manager.cpp:616] Found the best tablet to compact. compaction_type=cumulative tablet_id=208891529 highest_score=3
I0209 09:30:01.541924 11197 tablet_manager.cpp:616] Found the best tablet to compact. compaction_type=cumulative tablet_id=208889872 highest_score=3
I0209 09:54:14.369508 38650 daemon.cpp:260]  version 2.3.0 RELEASE (build a9bdb09)
Built on 2022-07-26 20:15:11 by StarRocks@docker
I0209 09:54:14.389011 38650 mem_info.cpp:74] Physical Memory: 251.32 GB
I0209 09:54:14.389045 38650 daemon.cpp:266] Cpu Info:
Model: Intel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz
Cores: 80
Max Possible Cores: 80
L1 Cache: 32.00 KB (Line: 64.00 B)
L2 Cache: 1.00 MB (Line: 64.00 B)
L3 Cache: 27.50 MB (Line: 64.00 B)
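To see which loads were in flight just before the BE restarted, the stream load lines can be pulled out of the BE log. Below is a minimal sketch in Python; the regular expressions are my own guess at the line layout based only on the lines pasted above, and `parse_stream_load` is a hypothetical helper, not part of StarRocks.

```python
import re

# glog prefix as seen in the lines above:
# severity+MMDD, HH:MM:SS.micros, thread id, source_file:line] message
GLOG_RE = re.compile(
    r"^(?P<sev>[IWEF])(?P<mmdd>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6}) "
    r"(?P<tid>\d+) (?P<src>[\w.]+):(?P<line>\d+)\] (?P<msg>.*)$"
)

def parse_stream_load(line):
    """Return the key=value fields of a stream load log line, or None.

    Assumption: stream load lines come from source files whose names
    start with "stream_load" (stream_load.cpp, stream_load_executor.cpp).
    """
    m = GLOG_RE.match(line)
    if m is None or not m.group("src").startswith("stream_load"):
        return None
    # Collect key=value pairs; "txn_id: -1" uses a colon and is skipped.
    fields = dict(re.findall(r"(\w+)=([^,\s]+)", m.group("msg")))
    fields["time"] = m.group("time")
    fields["tid"] = m.group("tid")
    return fields

if __name__ == "__main__":
    sample = (
        "I0209 09:30:01.376366 11438 stream_load.cpp:208] new income streaming "
        "load request.id=d045fd34f38ad135-6f418a687ff16ab1, job_id=-1, txn_id: -1, "
        "label=0bb2f476-b345-4be3-8e18-48bf69a75431, db=dip_test, db=dip_test, "
        "tbl=t_yarn_running_tasks"
    )
    rec = parse_stream_load(sample)
    print(rec["time"], rec["db"], rec["tbl"], rec["label"])
```

Running it over the whole log and keeping only the records from the last few seconds before the gap in timestamps should give the candidate tables and labels to check.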
Analyzing the core file with gdb gives the following:
*** Aborted at 1675906201 (unix time) try "date -d @1675906201" if you are using GNU date ***
PC: @          0x27460a1 starrocks::vectorized::JsonDocumentStreamParser::get_current()
*** SIGSEGV (@0x8) received by PID 36486 (TID 0x7fb735be6700) from PID 8; stack trace: ***
@          0x3fa3ad2 google::(anonymous namespace)::FailureSignalHandler()
@     0x7fba13ae3630 (unknown)
@          0x27460a1 starrocks::vectorized::JsonDocumentStreamParser::get_current()
@          0x27455d7 starrocks::vectorized::JsonReader::_read_rows<>()
@          0x27414d9 starrocks::vectorized::JsonReader::read_chunk()
@          0x27416ec starrocks::vectorized::JsonScanner::get_next()
@          0x272e5e0 starrocks::vectorized::FileScanNode::_scanner_scan()
@          0x272ff4f starrocks::vectorized::FileScanNode::_scanner_worker()
@          0x5a21410 execute_native_thread_routine
@     0x7fba13adbea5 start_thread
@     0x7fba130f696d __clone
@                0x0 (unknown)
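The abort timestamp in the trace can be lined up with the BE log. The sketch below converts it to local time; it assumes the BE host runs in UTC+8, which is my assumption and is not stated anywhere in the post.

```python
from datetime import datetime, timedelta, timezone

ABORT_UNIX_TIME = 1675906201  # from the "*** Aborted at ..." line

# Assumption: the BE host clock is in UTC+8.
HOST_TZ = timezone(timedelta(hours=8))

def abort_time_local(ts, tz=HOST_TZ):
    """Convert the glog abort timestamp to a timezone-aware datetime."""
    return datetime.fromtimestamp(ts, tz=tz)

if __name__ == "__main__":
    t = abort_time_local(ABORT_UNIX_TIME)
    # Same MMDD HH:MM:SS layout as the glog line prefix.
    print(t.strftime("%m%d %H:%M:%S"))  # prints "0209 09:30:01"
```

Under that assumption the abort falls in the same second as the `begin to execute job` line for label `0bb2f476-b345-4be3-8e18-48bf69a75431` (table `dip_test.t_yarn_running_tasks`). Together with the `JsonReader` frames in the stack, this suggests, but does not prove, that this JSON stream load was being read when the SIGSEGV occurred.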