Flink CDC sync causes the BE service to stop on its own.

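【Details】We capture data with Flink CDC and load it into StarRocks.
The Flink sink is configured to flush a batch to SR every 15 seconds / 100,000 rows.
Sync script 1: syncs 60 tables in total, split across two scripts and run in two waves; data volume is about 500 million records.
ETL script 2: 10 tables are populated by jobs that, every 10 minutes, query SR and insert the results back into SR. The logic has about 6 joins. Data volume is in the tens of millions of records.
【Background】Flink CDC sync to StarRocks
【Business impact】All data loads fail.
【StarRocks version】2.3
【Cluster size】3 FE (1 follower + 2 observers) + 3 BE (FE and BE co-located)
【Table model】Primary Key model
【Import or export method】Flink
【Attachments】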
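
For readers who want to reproduce the sink settings described above, here is a minimal sketch that renders a Flink SQL sink definition with a 15-second / 100,000-row flush threshold. The option names `sink.buffer-flush.interval-ms` and `sink.buffer-flush.max-rows` are the ones I understand the StarRocks Flink connector to document; check them against the connector version you run. Every host, database, table, column, and credential below is a hypothetical placeholder, not something taken from the original job.

```python
# Sketch only: all hosts, names, and credentials are hypothetical placeholders.
SINK_OPTIONS = {
    "connector": "starrocks",
    "jdbc-url": "jdbc:mysql://fe_host:9030",
    "load-url": "fe_host:8030",
    "database-name": "demo_db",
    "table-name": "demo_table",
    "username": "demo_user",
    "password": "demo_password",
    # Flush thresholds from the post: 15 seconds / 100,000 rows per batch.
    "sink.buffer-flush.interval-ms": str(15 * 1000),
    "sink.buffer-flush.max-rows": str(100_000),
}


def render_sink_ddl(table, columns, primary_key, options):
    """Render a Flink SQL CREATE TABLE statement for a connector sink table."""
    column_lines = [f"  `{name}` {sql_type}" for name, sql_type in columns]
    column_lines.append(f"  PRIMARY KEY (`{primary_key}`) NOT ENFORCED")
    option_lines = [f"  '{key}' = '{value}'" for key, value in options.items()]
    return (
        f"CREATE TABLE `{table}` (\n"
        + ",\n".join(column_lines)
        + "\n) WITH (\n"
        + ",\n".join(option_lines)
        + "\n);"
    )


if __name__ == "__main__":
    # Hypothetical two-column table, used only to show the rendered statement.
    print(render_sink_ddl("demo_sink", [("id", "BIGINT"), ("name", "STRING")],
                          "id", SINK_OPTIONS))
```

With 60 tables each flushing on these thresholds, the number of load transactions grows quickly, so the two values are worth stating explicitly when reporting an issue like this one.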



Hi, what are the CPU and memory usage? Please send the monitoring metrics.

Could you please send the complete be.out?

Please take a look.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
*** Aborted at 1670233630 (unix time) try "date -d @1670233630" if you are using GNU date ***
PC: @ 0x2860c7b starrocks::ExprContext::close()
*** SIGSEGV (@0x60) received by PID 300249 (TID 0x7f475d772700) from PID 96; stack trace: ***
@ 0x34fe482 google::(anonymous namespace)::FailureSignalHandler()
@ 0x7f47a6a80852 os::Linux::chained_handler()
@ 0x7f47a6a87676 JVM_handle_linux_signal
@ 0x7f47a6a7d653 signalHandler()
@ 0x7f47a5f46630 (unknown)
@ 0x2860c7b starrocks::ExprContext::close()
@ 0x2862b8f starrocks::Expr::close()
@ 0x24b6b9e starrocks::vectorized::TabletScanner::close()
@ 0x24b6fe8 starrocks::vectorized::TabletScanner::~TabletScanner()
@ 0x220bb47 ZZN9starrocks10ObjectPool3addINS_10vectorized13TabletScannerEEEPT_S5_ENUlPvE_4_FUNES6
@ 0x220b3cf starrocks::vectorized::OlapScanNode::~OlapScanNode()
@ 0x220b992 starrocks::vectorized::OlapScanNode::~OlapScanNode()
@ 0x1cb3927 std::_Sp_counted_ptr<>::_M_dispose()
@ 0x1cae1a2 starrocks::RuntimeState::~RuntimeState()
@ 0x1c47202 starrocks::FragmentExecState::~FragmentExecState()
@ 0x1c505ab std::_Sp_counted_ptr<>::_M_dispose()
@ 0x16d495a std::_Sp_counted_base<>::_M_release()
@ 0x1c48715 _ZNSt17_Function_handlerIFvvEZN9starrocks11FragmentMgr18exec_plan_fragmentERKNS1_23TExecPlanFragmentParamsERKSt8functionIFvPNS1_20PlanFragmentExecutorEEESC_EUlvE_E10_M_managerERSt9_Any_dataRKSF_St18_Manager_operation
@ 0x1d74a72 starrocks::FunctionRunnable::~FunctionRunnable()
@ 0x1d74572 starrocks::ThreadPool::dispatch_thread()
@ 0x1d6fd78 starrocks::thread::supervise_thread()
@ 0x7f47a5f3eea5 start_thread
@ 0x7f47a53438dd __clone
@ 0x0 (unknown)
start time: Mon Dec 5 17:53:16 CST 2022
start time: Mon Dec 5 19:36:37 CST 2022
[doris@slave03 log]$ cat be.out | tail -n 100
*** SIGSEGV (@0x0) received by PID 384209 (TID 0x7f933495a700) from PID 0; stack trace: ***
@ 0x34fe482 google::(anonymous namespace)::FailureSignalHandler()
@ 0x7f9389c73852 os::Linux::chained_handler()
@ 0x7f9389c7a676 JVM_handle_linux_signal
@ 0x7f9389c70653 signalHandler()
@ 0x7f9389139630 (unknown)
@ 0x0 (unknown)
start time: Mon Dec 5 16:08:33 CST 2022
*** Aborted at 1670227921 (unix time) try "date -d @1670227921" if you are using GNU date ***
PC: @ 0x0 (unknown)
*** SIGSEGV (@0x0) received by PID 69244 (TID 0x7fa1c2549700) from PID 0; stack trace: ***
@ 0x34fe482 google::(anonymous namespace)::FailureSignalHandler()
@ 0x7fa220060852 os::Linux::chained_handler()
@ 0x7fa220067676 JVM_handle_linux_signal
@ 0x7fa22005d653 signalHandler()
@ 0x7fa21f526630 (unknown)
@ 0x0 (unknown)
start time: Mon Dec 5 16:16:43 CST 2022
log4j:WARN No appenders could be found for logger (org.apache.hadoop.fs.FileSystem).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
*** Aborted at 1670228660 (unix time) try "date -d @1670228660" if you are using GNU date ***
PC: @ 0x0 (unknown)
*** SIGSEGV (@0x0) received by PID 91242 (TID 0x7f6fb7b65700) from PID 0; stack trace: ***
@ 0x34fe482 google::(anonymous namespace)::FailureSignalHandler()
@ 0x7f7007765852 os::Linux::chained_handler()
@ 0x7f700776c676 JVM_handle_linux_signal
@ 0x7f7007762653 signalHandler()
@ 0x7f7006c2b630 (unknown)
@ 0x0 (unknown)
start time: Mon Dec 5 16:24:57 CST 2022
start time: Mon Dec 5 17:00:50 CST 2022
start time: Mon Dec 5 17:31:56 CST 2022
*** Check failure stack trace: ***
@ 0x34f435d google::LogMessage::Fail()
@ 0x34f6639 google::LogMessage::SendToLog()
@ 0x34f3ed9 google::LogMessage::Flush()
@ 0x34f6c39 google::LogMessageFatal::~LogMessageFatal()
@ 0x15e1cd7 main
@ 0x7ff71889e555 __libc_start_main
@ 0x16cdb6e (unknown)
@ (nil) (unknown)
start time: Mon Dec 5 17:34:47 CST 2022
*** Aborted at 1670233130 (unix time) try "date -d @1670233130" if you are using GNU date ***
PC: @ 0x2860c7b starrocks::ExprContext::close()
*** SIGSEGV (@0x60) received by PID 288088 (TID 0x7f75bd36a700) from PID 96; stack trace: ***
@ 0x34fe482 google::(anonymous namespace)::FailureSignalHandler()
@ 0x7f7609be7630 (unknown)
@ 0x2860c7b starrocks::ExprContext::close()
@ 0x2862b8f starrocks::Expr::close()
@ 0x24b6b9e starrocks::vectorized::TabletScanner::close()
@ 0x24b6fe8 starrocks::vectorized::TabletScanner::~TabletScanner()
@ 0x220bb47 ZZN9starrocks10ObjectPool3addINS_10vectorized13TabletScannerEEEPT_S5_ENUlPvE_4_FUNES6
@ 0x220b3cf starrocks::vectorized::OlapScanNode::~OlapScanNode()
@ 0x220b992 starrocks::vectorized::OlapScanNode::~OlapScanNode()
@ 0x1cb3927 std::_Sp_counted_ptr<>::_M_dispose()
@ 0x1cae1a2 starrocks::RuntimeState::~RuntimeState()
@ 0x1c47202 starrocks::FragmentExecState::~FragmentExecState()
@ 0x1c505ab std::_Sp_counted_ptr<>::_M_dispose()
@ 0x16d495a std::_Sp_counted_base<>::_M_release()
@ 0x1c48715 _ZNSt17_Function_handlerIFvvEZN9starrocks11FragmentMgr18exec_plan_fragmentERKNS1_23TExecPlanFragmentParamsERKSt8functionIFvPNS1_20PlanFragmentExecutorEEESC_EUlvE_E10_M_managerERSt9_Any_dataRKSF_St18_Manager_operation
@ 0x1d74a72 starrocks::FunctionRunnable::~FunctionRunnable()
@ 0x1d74572 starrocks::ThreadPool::dispatch_thread()
@ 0x1d6fd78 starrocks::thread::supervise_thread()
@ 0x7f7609bdfea5 start_thread
@ 0x7f7608fe48dd __clone
@ 0x0 (unknown)
start time: Mon Dec 5 17:39:46 CST 2022
log4j:WARN No appenders could be found for logger (org.apache.hadoop.fs.FileSystem).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
*** Aborted at 1670233630 (unix time) try "date -d @1670233630" if you are using GNU date ***
PC: @ 0x2860c7b starrocks::ExprContext::close()
*** SIGSEGV (@0x60) received by PID 300249 (TID 0x7f475d772700) from PID 96; stack trace: ***
@ 0x34fe482 google::(anonymous namespace)::FailureSignalHandler()
@ 0x7f47a6a80852 os::Linux::chained_handler()
@ 0x7f47a6a87676 JVM_handle_linux_signal
@ 0x7f47a6a7d653 signalHandler()
@ 0x7f47a5f46630 (unknown)
@ 0x2860c7b starrocks::ExprContext::close()
@ 0x2862b8f starrocks::Expr::close()
@ 0x24b6b9e starrocks::vectorized::TabletScanner::close()
@ 0x24b6fe8 starrocks::vectorized::TabletScanner::~TabletScanner()
@ 0x220bb47 ZZN9starrocks10ObjectPool3addINS_10vectorized13TabletScannerEEEPT_S5_ENUlPvE_4_FUNES6
@ 0x220b3cf starrocks::vectorized::OlapScanNode::~OlapScanNode()
@ 0x220b992 starrocks::vectorized::OlapScanNode::~OlapScanNode()
@ 0x1cb3927 std::_Sp_counted_ptr<>::_M_dispose()
@ 0x1cae1a2 starrocks::RuntimeState::~RuntimeState()
@ 0x1c47202 starrocks::FragmentExecState::~FragmentExecState()
@ 0x1c505ab std::_Sp_counted_ptr<>::_M_dispose()
@ 0x16d495a std::_Sp_counted_base<>::_M_release()
@ 0x1c48715 _ZNSt17_Function_handlerIFvvEZN9starrocks11FragmentMgr18exec_plan_fragmentERKNS1_23TExecPlanFragmentParamsERKSt8functionIFvPNS1_20PlanFragmentExecutorEEESC_EUlvE_E10_M_managerERSt9_Any_dataRKSF_St18_Manager_operation
@ 0x1d74a72 starrocks::FunctionRunnable::~FunctionRunnable()
@ 0x1d74572 starrocks::ThreadPool::dispatch_thread()
@ 0x1d6fd78 starrocks::thread::supervise_thread()
@ 0x7f47a5f3eea5 start_thread
@ 0x7f47a53438dd __clone
@ 0x0 (unknown)
start time: Mon Dec 5 17:53:16 CST 2022
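
The crash records above are easier to compare once each one is reduced to its time, signal, and faulting frame. The following is a minimal sketch, assuming be.out keeps the line shapes seen in this paste (`*** Aborted at <unix time>`, `PC: @ <addr> <symbol>`, `*** SIG... received by PID ...`, `start time: ...`). The function name `summarize_crashes` and the fixed UTC+8 offset for CST are my own choices for illustration.

```python
import re
from datetime import datetime, timedelta, timezone

# China Standard Time (UTC+8), matching the "start time: ... CST" lines.
CST = timezone(timedelta(hours=8))

ABORT_RE = re.compile(r"\*\*\* Aborted at (\d+) \(unix time\)")
PC_RE = re.compile(r"^PC: @\s+(\S+)\s+(.*)$")
SIGNAL_RE = re.compile(
    r"\*\*\* (SIG[A-Z]+) \(@(0x[0-9a-f]+)\) received by PID (\d+)"
)


def summarize_crashes(text):
    """Return one dict per crash record that starts with an 'Aborted at' line."""
    crashes = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        abort = ABORT_RE.search(line)
        if abort:
            when = datetime.fromtimestamp(int(abort.group(1)), CST)
            current = {
                "time": when.strftime("%Y-%m-%d %H:%M:%S"),
                "pc": None,
                "signal": None,
                "fault_addr": None,
                "pid": None,
            }
            crashes.append(current)
            continue
        if line.startswith("start time:"):
            current = None  # the process restarted, so the record is closed
            continue
        if current is None:
            continue  # skip truncated records that lack an 'Aborted at' line
        pc = PC_RE.match(line)
        if pc:
            current["pc"] = pc.group(2)
        sig = SIGNAL_RE.search(line)
        if sig and current["signal"] is None:
            current["signal"] = sig.group(1)
            current["fault_addr"] = sig.group(2)
            current["pid"] = int(sig.group(3))
    return crashes


if __name__ == "__main__":
    with open("be.out", encoding="utf-8", errors="replace") as fh:
        for crash in summarize_crashes(fh.read()):
            print(crash)
```

Run against the log in this thread, such a summary would separate the crashes whose PC is `starrocks::ExprContext::close()` with fault address `0x60` from the earlier ones whose PC is `(unknown)` at `0x0`.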

Are sync script 1 and ETL script 2 both Flink jobs? By ETL, do you mean reading data out of StarRocks with the Flink source connector and then writing it back with the sink connector?

What is the exception message on the Flink side?

This has been fixed; upgrade to 2.3.5.

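The Flink error is timeout txn manager.
There is also a canceled error, with no specific error message.
The sync script is a scheduled ETL on doris: it reads doris tables, joins them with other doris tables, and writes the result to a doris table. It is run by a scheduled script.
BE crashed again just now. This time it looks like many versions were lost and the 3 replicas are out of sync.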

Which bug caused this?

Haven't you upgraded yet? See: Common Crash / BUG stack trace lookup