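To help us locate your problem faster, please provide the following information. Thank you.
[Details] Detailed description of the problem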
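We migrated a business table from another StarRocks (SR) 2.5.10 cluster using backup & restore. After the migration, data written to the table could not be published normally, so the number of running transactions (txns) exceeded the per-database limit and the import job failed.
The steps were as follows:
Around 2023-08-19 15:30: ran the restore script on the new SR cluster to restore the data.
2023-08-19 16:00: started the sync job for this table on the new SR cluster.
Around 2023-08-19 17:55: found that the Flink sync job for this database was reporting that the txn limit had been exceeded.
Inspecting the database's transactions showed that all of them were taken up by one of the tables that had just been restored.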
running_txns.txt (136.4 KB)
Around 2023-08-19 18:20: restarted the BEs and found that 4 nodes could no longer start. They report the following error:
start time: Sat Aug 19 18:24:32 CST 2023
*** Check failure stack trace: ***
2.5.10 RELEASE (build 9feb716)
query_id:00000000-0000-0000-0000-000000000000, fragment_instance:00000000-0000-0000-0000-000000000000
tracker:process consumption: 7462877688
tracker:query_pool consumption: 459437848
tracker:load consumption: 21512408
tracker:metadata consumption: 790087354
tracker:tablet_metadata consumption: 259752441
tracker:rowset_metadata consumption: 435956585
tracker:segment_metadata consumption: 15375654
tracker:column_metadata consumption: 79002674
tracker:tablet_schema consumption: 40104641
tracker:segment_zonemap consumption: 14356603
tracker:short_key_index consumption: 0
tracker:column_zonemap_index consumption: 27494114
tracker:ordinal_index consumption: 25327632
tracker:bitmap_index consumption: 0
tracker:bloom_filter_index consumption: 0
tracker:compaction consumption: 3259667504
tracker:schema_change consumption: 0
tracker:column_pool consumption: 0
tracker:page_cache consumption: 0
tracker:update consumption: 603101456
tracker:chunk_allocator consumption: 89077400
tracker:clone consumption: 0
tracker:consistency consumption: 0
*** Aborted at 1692440721 (unix time) try "date -d @1692440721" if you are using GNU date ***
PC: @ 0x7f2d69ba5387 __GI_raise
*** SIGABRT (@0x4b200003699) received by PID 13977 (TID 0x7f23757ce700) from PID 13977; stack trace: ***
@ 0x5aed0a2 google::(anonymous namespace)::FailureSignalHandler()
@ 0x7f2d6a65a630 (unknown)
@ 0x7f2d69ba5387 __GI_raise
@ 0x7f2d69ba6a78 __GI_abort
@ 0x2cce6fe starrocks::failure_function()
@ 0x5ae0a7d google::LogMessage::Fail()
@ 0x5ae2eef google::LogMessage::SendToLog()
@ 0x5ae05ce google::LogMessage::Flush()
@ 0x5ae34f9 google::LogMessageFatal::~LogMessageFatal()
@ 0x424efdc starrocks::TabletUpdates::_apply_rowset_commit()
@ 0x424f4c3 starrocks::TabletUpdates::do_apply()
@ 0x4af5cd5 starrocks::ThreadPool::dispatch_thread()
@ 0x4af06ba starrocks:
:supervise_thread()
@ 0x7f2d6a652ea5 start_thread
@ 0x7f2d69c6db0d __clone
@ 0x0 (unknown)
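For anyone reading the dump above, here is a minimal sketch for ranking the memory tracker lines by size and converting the abort timestamp to local time. It is not part of StarRocks. The helper names (`parse_trackers`, `to_gib`, `abort_time_cst`) are made up for illustration, only a few of the tracker lines are copied in, and it assumes that "CST" in the log means China Standard Time (UTC+8).

```python
import re
from datetime import datetime, timedelta, timezone

# A few tracker lines copied from the BE output above (values are bytes).
LOG_EXCERPT = """\
tracker:process consumption: 7462877688
tracker:query_pool consumption: 459437848
tracker:load consumption: 21512408
tracker:metadata consumption: 790087354
tracker:compaction consumption: 3259667504
tracker:update consumption: 603101456
tracker:chunk_allocator consumption: 89077400
"""

# Matches lines of the form "tracker:<name> consumption: <bytes>".
TRACKER_RE = re.compile(r"^tracker:(\S+) consumption: (\d+)$")


def parse_trackers(text):
    """Return {tracker_name: bytes} for every tracker line found in text."""
    result = {}
    for line in text.splitlines():
        match = TRACKER_RE.match(line.strip())
        if match:
            result[match.group(1)] = int(match.group(2))
    return result


def to_gib(num_bytes):
    """Convert a byte count to GiB (2**30 bytes)."""
    return num_bytes / 2**30


def abort_time_cst(unix_seconds):
    """Convert the glog 'Aborted at' unix time to UTC+8 (assumed meaning of CST)."""
    cst = timezone(timedelta(hours=8))
    return datetime.fromtimestamp(unix_seconds, tz=cst)


trackers = parse_trackers(LOG_EXCERPT)
# Largest consumers first; "process" is the overall total, so it sorts to the top.
ranked = sorted(trackers.items(), key=lambda kv: kv[1], reverse=True)
for name, size in ranked:
    print(f"{name:<16} {to_gib(size):6.2f} GiB")

print(abort_time_cst(1692440721).isoformat())
```

In this excerpt, compaction is the largest tracker after the process total. The abort time falls about a minute after the `start time` line, which fits the BE crashing shortly after each restart.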
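A coredump file was generated for one of the BE nodes
Log from one of the BE nodes
Log from the FE Leader node
[Background] What operations were performed?
[Business impact]
[StarRocks version] e.g. 2.5.10
[Cluster size] e.g. 3 FE + 13 BE
[Machine info] CPU virtual cores / memory / NIC: 64C / 256G / 10 GbE
[Contact] Yang Rong (杨荣), StarRocks community group 3
[Attachments]
The coredump file and the FE/BE logs are too large to upload as attachments, so I have sent the links to @trueeyu from the community.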