[Help] 3.0.0-48f4d81 BE restarts frequently

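[Details] Detailed description of the problem
[Background] What operations were performed? Nothing was changed on the ops side.
[Business impact]
[StarRocks version] 3.0.0-48f4d81
[Cluster size] 3 FE + 3 BE
[Machine info] 4c32G
[Contact] So that we can reach you promptly for logs while working on the issue, please add your contact information, e.g. Community Group 4 - Xiao Li, or an email address. Thanks.
[Attachments]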

be.INFO.tar.xz (4.7 MB) INFO log
be.WARNING.tar.xz (3.4 MB) WARNING log

be.out (9.7 KB)

c6fd26a4-162e-11ee-889b-00163e1af23b
3c9d4b31-162b-11ee-98b8-00163e28f831
72db9830-162c-11ee-98b8-00163e28f831
Look up these queries in fe.audit.log and blacklist them for now.
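A minimal sketch of how those query IDs could be located in the audit log, assuming the log is plain text and that each query's record carries its ID somewhere on the line. The helper names `find_queries` and `scan_file` are hypothetical, and the exact audit log layout is not assumed beyond that.

```python
from typing import Dict, Iterable, List

# Query IDs quoted in the post above.
QUERY_IDS = [
    "c6fd26a4-162e-11ee-889b-00163e1af23b",
    "3c9d4b31-162b-11ee-98b8-00163e28f831",
    "72db9830-162c-11ee-98b8-00163e28f831",
]


def find_queries(lines: Iterable[str], query_ids: List[str]) -> Dict[str, List[str]]:
    """Return, for each query ID, the log lines that mention it."""
    hits: Dict[str, List[str]] = {qid: [] for qid in query_ids}
    for line in lines:
        for qid in query_ids:
            if qid in line:
                hits[qid].append(line.rstrip("\n"))
    return hits


def scan_file(path: str, query_ids: List[str]) -> Dict[str, List[str]]:
    """Scan one log file; undecodable bytes are replaced rather than raising."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return find_queries(f, query_ids)
```

Calling `scan_file("fe.audit.log", QUERY_IDS)` with the real path to the FE audit log returns the matching lines per ID, from which the SQL text of each suspect query can be read off before deciding what to blacklist.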

It feels like there is a problem with cache release. For now I am clearing memory every 10 minutes.

Upgrade to 3.0.2 first.

It looks like a specific query is causing this. First check whether that query reliably crashes the BE.

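I'll upgrade first. A large amount of data has been added recently, so I'll then check whether a particular SQL statement is the cause.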

SELECT NULL,form_widget_id,widget_title,enum_name,COUNT(widget_value) AS option_num,day_date,NOW() FROM
(
    SELECT a.form_widget_id,a.enum_name,b.widget_title,
           # Adding this IF expression causes the crash. It runs fine in MySQL, and the StarRocks tables hold only a few hundred rows.
           IF(b.widget_value_a LIKE CONCAT('%',a.enum_name_a,'%'),a.enum_name,NULL) AS widget_value,
           DATE_FORMAT(b.gmt_create,'%Y-%m-%d') AS day_date
    FROM
    (
            SELECT xxxxx
    ) a
    JOIN
    (
            SELECT xxxxx
    ) b
    ON a.xxx=b.xxxx
    GROUP BY a.xxxxx
) c
GROUP BY xxxx
ORDER BY xxxxxx

Are you using any JDBC external tables?

No external tables... Changing the SQL to like(b.widget_value,CONCAT('%',a.enum_name,'%')) as n makes it work. Maybe StarRocks is not fully compatible here.

This is most likely a bug. Upgrade to 3.0.2 first; if the problem persists, we will step in and investigate.

Please post the execution plan of the crashing SQL: explain verbose <the SQL>;

_explain_verbose_SELECT_NULL_form_widget_id_widget_title_enum_na_202306291737.txt (42.1 KB)

Does it still crash after the upgrade?

It no longer crashes, but now it reports this error: SQL Error [1064] [42000]: Internal error: vector::reserve

Search the BE log for the keyword vector::reserve and send me what you find.
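A minimal sketch of pulling the keyword out of a BE log together with its surrounding lines, assuming a plain-text log such as be.WARNING. The helper name `grep_with_context` and the default window sizes are hypothetical. A generous leading window is used because a stack trace may be logged before the line that carries the error text.

```python
from typing import Iterable, List


def grep_with_context(lines: Iterable[str], keyword: str,
                      before: int = 25, after: int = 5) -> List[List[str]]:
    """Return one block per matching line: up to `before` lines of leading
    context, the matching line itself, and up to `after` trailing lines."""
    stripped = [line.rstrip("\n") for line in lines]
    blocks: List[List[str]] = []
    for i, line in enumerate(stripped):
        if keyword in line:
            start = max(0, i - before)
            blocks.append(stripped[start:i + after + 1])
    return blocks


def grep_file(path: str, keyword: str) -> List[List[str]]:
    """Convenience wrapper that reads a log file from disk."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return grep_with_context(f, keyword)
```

For example, `grep_file("be.WARNING", "vector::reserve")` with the real log path returns the blocks to paste into the thread. Overlapping matches produce overlapping blocks, which is acceptable for a one-off diagnostic.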

There is probably a bug here.

I0629 17:52:46.734553 9167 pipeline_driver_executor.cpp:270] [Driver] Succeed to report exec state: fragment_instance_id=b34cba0a-1662-11ee-b147-00163e1af243
I0629 17:52:46.896337 9165 internal_service.cpp:486] cancel fragment, fragment_instance_id=b34cba0a-1662-11ee-b147-00163e1af248, reason: InternalError
W0629 17:52:46.896365 9165 fragment_context.cpp:123] [Driver] Canceled, query_id=b34cba0a-1662-11ee-b147-00163e1af23b, instance_id=b34cba0a-1662-11ee-b147-00163e1af248, reason=InternalError
I0629 17:52:46.897459 9167 pipeline_driver_executor.cpp:270] [Driver] Succeed to report exec state: fragment_instance_id=b34cba0a-1662-11ee-b147-00163e1af248
I0629 17:52:46.907552 9164 internal_service.cpp:486] cancel fragment, fragment_instance_id=b34cba0a-1662-11ee-b147-00163e1af245, reason: InternalError
W0629 17:52:46.907574 9164 fragment_context.cpp:123] [Driver] Canceled, query_id=b34cba0a-1662-11ee-b147-00163e1af23b, instance_id=b34cba0a-1662-11ee-b147-00163e1af245, reason=InternalError
I0629 17:52:46.907677 9174 sink_buffer.cpp:204] fragment_instance_id b34cba0a-1662-11ee-b147-00163e1af245 -> b34cba0a-1662-11ee-b147-00163e1af245, _num_uncancelled_sinkers 1, _is_finishing false, _num_remaining_eos 1
W0629 17:52:46.951573 9175 stack_util.cpp:350] 2023-06-29 17:52:46.941585, query_id=b34cba0a-1662-11ee-b147-00163e1af23b, fragment_instance_id=b34cba0a-1662-11ee-b147-00163e1af245 throws exception: std::length_error, trace:
@ 0x2ea7f77 std::__throw_length_error()
@ 0x30e132f std::vector<>::reserve()
@ 0x30e136f starrocks::BinaryColumnBase<>::_build_slices()
@ 0x5690258 starrocks::ColumnViewer<>::ColumnViewer()
@ 0x5887fa4 starrocks::StringFunctions::concat()
@ 0x454a544 starrocks::VectorizedFunctionCallExpr::evaluate_checked()
@ 0x3e225e3 starrocks::ExprContext::evaluate()
@ 0x454a0d4 starrocks::VectorizedFunctionCallExpr::evaluate_checked()
@ 0x4509574 starrocks::VectorizedIfExpr<>::evaluate_checked()
@ 0x3e225e3 starrocks::ExprContext::evaluate()
@ 0x3e2292f starrocks::ExprContext::evaluate()
@ 0x33e20b4 starrocks::pipeline::ProjectOperator::push_chunk()
@ 0x310fef9 starrocks::pipeline::PipelineDriver::process()
@ 0x578d84b starrocks::pipeline::GlobalDriverExecutor::_worker_thread()
@ 0x50e5222 starrocks::ThreadPool::dispatch_thread()
@ 0x50dfd1a starrocks::thread::supervise_thread()
@ 0x7f24d588dea5 start_thread
@ 0x7f24d4ea8b0d __clone
@ (nil) (unknown)

W0629 17:52:46.956688 9175 pipeline_driver_executor.cpp:165] [Driver] Process error, query_id=b34cba0a-1662-11ee-b147-00163e1af23b, instance_id=b34cba0a-1662-11ee-b147-00163e1af245, status=Internal error: Internal error: vector::reserve
I0629 17:52:46.956722 9175 sink_buffer.cpp:204] fragment_instance_id b34cba0a-1662-11ee-b147-00163e1af245 -> b34cba0a-1662-11ee-b147-00163e1af245, _num_uncancelled_sinkers 0, _is_finishing true, _num_remaining_eos 1
I0629 17:52:46.957053 9167 pipeline_driver_executor.cpp:263] [Driver] Fail to report exec state due to query not found: fragment_instance_id=b34cba0a-1662-11ee-b147-00163e1af245
I0629 17:52:47.470115 9353 rowset_merger.cpp:252] compaction merge finished. tablet=33007 #key=1 algorithm=VERTICAL_COMPACTION column_group_size=4 input(entry=2 rows=749151 del=32293 actual=749151 bytes=47.33 MB) output(rows=749151 chunk=338 bytes=45.54 MB) duration: 875ms
I0629 17:52:47.475700 9353 tablet_updates.cpp:1506] commit compaction tablet:33007 version:642.2 rowset:1271 #seg:1 #row:749151 size:45.54 MB #pending:0 state_memory:11.43 MB
I0629 17:52:47.475868 13622 tablet_updates.cpp:1535] apply_compaction_commit start tablet:33007 version:642.2 rowset:1271
I0629 17:52:47.504890 13622 primary_index.cpp:1132] load primary index finish table:24228 tablet:33007 version:642 #rowset:2 #segment:2 data_size:49631857 rowsets:961,1270 size:749151 capacity:1048560 memory:17825520 duration: 29ms
I0629 17:52:47.519402 13622 tablet_updates.cpp:1711] apply_compaction_commit finish tablet:33007 version:642.2 total del/row:0/749151 0% rowset:1271 #row:749151 #del:0 #delvec:1 duration:44ms(29/15/0)
I0629 17:52:47.520336 9353 tablet_manager.cpp:662] Found the best tablet to compact. compaction_type=update tablet_id=33019 highest_score=228999694
I0629 17:52:47.520382 9353 tablet_updates.cpp:2009] update compaction start tablet:33019 version:642.1 score:228999696 pick:2/valid:2/all:2 961,1270 #rows:781532->749279 bytes:47.37 MB->45.42 MB(estimate)
I0629 17:52:48.251662 9157 fragment_executor.cpp:165] Prepare(): query_id=b44a7afc-1662-11ee-a1c7-00163e28f831 fragment_instance_id=b44a7afc-1662-11ee-a1c7-00163e28f832 is_stream_pipeline=0 backend_num=1
I0629 17:52:48.253232 9167 pipeline_driver_executor.cpp:270] [Driver] Succeed to report exec state: fragment_instance_id=b44a7afc-1662-11ee-a1c7-00163e28f832
I0629 17:52:48.270458 9353 rowset_merger.cpp:252] compaction merge finished. tablet=33019 #key=1 algorithm=VERTICAL_COMPACTION column_group_size=4 input(entry=2 rows=749279 del=32253 actual=749279 bytes=47.37 MB) output(rows=749279 chunk=340 bytes=45.60 MB) duration: 750ms
I0629 17:52:48.275815 9353 tablet_updates.cpp:1506] commit compaction tablet:33019 version:642.2 rowset:1271 #seg:1 #row:749279 size:45.60 MB #pending:0 state_memory:11.43 MB

Please send the reserve error from be.WARNING, and then post another explain verbose.