BE fails to restart after crashing

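[Details] Running INSERT loads into 3 large tables at the same time caused all 3 BE nodes to crash, and they cannot be restarted.
[Background] Running INSERT loads into 3 large tables at the same time caused all 3 BE nodes to crash. Core files were generated, and none of the nodes can be restarted.
[Business impact] The cluster is unusable.
[StarRocks version] e.g.: 2.4.0
[Cluster size] e.g.: 1 FE + 3 BE
[Machine specs] 8C/32G
[Attachments]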


  • Slow query:

    • Profile
    • Degree of parallelism:
    • Pipeline engine enabled: yes
    • Screenshot of BE node CPU and memory usage
  • Query error:

  • BE crash

    • be.out
      start time: Tue Nov 15 10:56:51 CST 2022
      query_id:00000000-0000-0000-0000-000000000000, fragment_instance:00000000-0000-0000-0000-000000000000
      *** Aborted at 1668481013 (unix time) try "date -d @1668481013" if you are using GNU date ***
      PC: @ 0x226b325 starrocks::vectorized::MaskMergeIterator::do_get_next()
      *** SIGSEGV (@0x0) received by PID 5431 (TID 0x7f4a85a9e700) from PID 0; stack trace: ***
      @ 0x481e332 google::(anonymous namespace)::FailureSignalHandler()
      @ 0x7f4b5547d630 (unknown)
      @ 0x226b325 starrocks::vectorized::MaskMergeIterator::do_get_next()
      @ 0x1f9623c starrocks::vectorized::RowsetMergerImpl<>::_do_merge_vertically()
      @ 0x1f97a93 starrocks::vectorized::RowsetMergerImpl<>::do_merge()
      @ 0x1f8b12c starrocks::vectorized::compaction_merge_rowsets()
      @ 0x1e74e9c starrocks::TabletUpdates::_do_compaction()
      @ 0x1e75e8c starrocks::TabletUpdates::compaction()
      @ 0x1dec879 starrocks::StorageEngine::_perform_update_compaction()
      @ 0x1fed5be starrocks::StorageEngine::_update_compaction_thread_callback()
      @ 0x631ac60 execute_native_thread_routine
      @ 0x7f4b55475ea5 start_thread
      @ 0x7f4b54a908dd __clone
      @ 0x0 (unknown)
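The two timestamps in this be.out are worth comparing: the BE started at 10:56:51, and the SIGSEGV was raised at Unix time 1668481013. The sketch below converts both, assuming "CST" here means China Standard Time (UTC+8). Under that assumption the crash comes about two seconds after startup. That would be consistent with the report that the BEs cannot be restarted: the update compaction thread in the stack trace seems to hit the same path almost as soon as the node comes up. The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "CST" in be.out is China Standard Time (UTC+8), not US Central time.
CHINA_STANDARD_TIME = timezone(timedelta(hours=8))


def seconds_between_start_and_crash(start_time: str, aborted_at: int) -> float:
    """Seconds from the be.out 'start time' line to the 'Aborted at' Unix timestamp."""
    # %Z only accepts a few zone names, so "CST" is matched as a literal instead.
    started = datetime.strptime(start_time, "%a %b %d %H:%M:%S CST %Y")
    started = started.replace(tzinfo=CHINA_STANDARD_TIME)
    crashed = datetime.fromtimestamp(aborted_at, tz=CHINA_STANDARD_TIME)
    return (crashed - started).total_seconds()


# Values copied from the be.out above.
delta = seconds_between_start_and_crash("Tue Nov 15 10:56:51 CST 2022", 1668481013)
print(f"crash {delta:.0f} s after start")
```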

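What data model do the tables use? If possible, please upload the core file too. Is be.out the same on all 3 BEs?

There are three tables, one Unique Key and two Primary Key, loaded at the same time. I can send the core file. be.out is the same on all three.

The core file is too large to upload here. How can I get it to you?

Can anyone help? The cluster won't start and is completely down. This is urgent.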

Upgrade to 2.4.1, or as a temporary workaround set vertical_compaction_max_columns_per_group=1000 in be.conf.
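For context on why this setting may work around the crash: the stack trace passes through `RowsetMergerImpl<>::_do_merge_vertically()`, i.e. the vertical compaction path. As far as I understand the BE code, vertical compaction is only chosen when a table has more columns than `vertical_compaction_max_columns_per_group` and there is more than one input rowset; otherwise the merge is done horizontally. Below is a rough model of that decision. The enum and function names are hypothetical, the conditions are a simplification rather than StarRocks' actual source, and 5 is only an example of a small group size (check your version's default). The 103 is the number of columns selected by the INSERT posted later in this thread.

```python
from enum import Enum


class CompactionAlgorithm(Enum):
    HORIZONTAL = "horizontal"
    VERTICAL = "vertical"


def choose_compaction_algorithm(
    num_columns: int, max_columns_per_group: int, num_sources: int
) -> CompactionAlgorithm:
    """Simplified, hypothetical model of how a BE might pick a compaction algorithm."""
    # All columns fit into one group, so a plain row-wise merge is used.
    if num_columns <= max_columns_per_group:
        return CompactionAlgorithm.HORIZONTAL
    # A single input rowset has nothing to interleave, so no vertical merge is needed.
    if num_sources <= 1:
        return CompactionAlgorithm.HORIZONTAL
    # Wide tables with several input rowsets are merged one column group at a time.
    return CompactionAlgorithm.VERTICAL


NUM_COLUMNS = 103  # columns selected by the INSERT later in this thread

print(choose_compaction_algorithm(NUM_COLUMNS, 5, num_sources=10))
print(choose_compaction_algorithm(NUM_COLUMNS, 1000, num_sources=10))
```

Under this model, with the setting at 1000 a table of about 100 columns stays on the horizontal path and never reaches `_do_merge_vertically()`, which would explain why the workaround helps until the fixed version is installed.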

After upgrading, remove this setting and check again.

2.4.1 has been released, so you can upgrade to 2.4.1 directly.

http://cdn-release.starrocks.com/StarRocks-2.4.1.tar.gz?Expires=1669708222&OSSAccessKeyId=LTAI4GFYjbX9e7QmFnAAvkt8&Signature=CK0%2BSR%2FcDcxN8UvLkdRIdxTY7Us%3D

:pray: :pray: :pray:
Thanks! I'll try upgrading.

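Any problems after upgrading?

None found so far, thanks!

One more question: when StarRocks memory usage is already high, subsequent queries or loads fail with an error. Is there a queue or some other way for them to wait?

What error do you get when loading data?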

1064 - Memory of process exceed limit. read chunk from storage Backend: 10.10.210.26, fragment: 0aaf445b-6579-11ed-84f4-fa163eb174f0 Used: 28721411366, Limit: 28699114843. Mem usage has exceed the limit of BE
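Reading the numbers in this error: the process limit is about 26.7 GiB on this 8C/32G machine, and usage went only about 21 MiB past it. As the wording "Memory of process" suggests, the check is on the whole BE process, so other queries, loads, and compaction running at the same time all count toward that total. The helper below (a hypothetical name, not part of StarRocks) pulls the `Used`/`Limit` figures out of such a message and prints them in GiB/MiB.

```python
import re

# Matches the "Used: N, Limit: M" part of a BE memory-limit error message.
USAGE_RE = re.compile(r"Used: (\d+), Limit: (\d+)")
GIB = 1024 ** 3
MIB = 1024 ** 2


def memory_overage(error_message: str) -> tuple[int, int, int]:
    """Return (used, limit, used - limit) in bytes."""
    match = USAGE_RE.search(error_message)
    if match is None:
        raise ValueError("no 'Used: ..., Limit: ...' section in message")
    used, limit = int(match.group(1)), int(match.group(2))
    return used, limit, used - limit


# The error text from above, without the "1064 - " prefix.
MESSAGE = (
    "Memory of process exceed limit. read chunk from storage Backend: 10.10.210.26, "
    "fragment: 0aaf445b-6579-11ed-84f4-fa163eb174f0 Used: 28721411366, "
    "Limit: 28699114843. Mem usage has exceed the limit of BE"
)

used, limit, over = memory_overage(MESSAGE)
print(f"used  {used / GIB:.2f} GiB")
print(f"limit {limit / GIB:.2f} GiB")
print(f"over  {over / MIB:.1f} MiB")
```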

Run curl -s http://BE_IP:BE_HTTP_PORT/metrics | grep -E "^starrocks_be_.*_mem_bytes|^starrocks_be_tcmalloc_bytes_in_use" to see which components are using the most memory.
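To rank the output of that command instead of reading it by eye, a small script can parse the Prometheus-style text that `/metrics` returns and sort the matching series by value. The function names are hypothetical, the filter mirrors the grep pattern above, and the sample input is made up for illustration; in practice you would paste the real curl output in place of `SAMPLE`.

```python
import re

# Same filter as the grep -E pattern above, applied to the metric name only.
MEM_METRIC = re.compile(r"starrocks_be_.*_mem_bytes|starrocks_be_tcmalloc_bytes_in_use")


def parse_sample(line: str) -> tuple[str, float]:
    """Split one Prometheus text-format sample line into (series, value)."""
    if "{" in line:
        # name{labels} value [timestamp]
        series, _, rest = line.partition("}")
        series += "}"
    else:
        # name value [timestamp]
        series, _, rest = line.partition(" ")
    return series, float(rest.split()[0])


def top_memory_metrics(metrics_text: str, limit: int = 10) -> list[tuple[str, float]]:
    """Return the largest memory-related series, biggest first."""
    samples = []
    for line in metrics_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        series, value = parse_sample(line)
        name = series.split("{", 1)[0]
        if MEM_METRIC.match(name):
            samples.append((series, value))
    samples.sort(key=lambda item: item[1], reverse=True)
    return samples[:limit]


# Made-up values, for illustration only.
SAMPLE = """\
# TYPE starrocks_be_process_mem_bytes gauge
starrocks_be_process_mem_bytes 27000000000
starrocks_be_query_mem_bytes 12000000000
starrocks_be_load_mem_bytes 9000000000
starrocks_be_compaction_mem_bytes 3000000000
starrocks_be_tcmalloc_bytes_in_use 26000000000
"""

for series, value in top_memory_metrics(SAMPLE):
    print(f"{value / 1024**3:8.2f} GiB  {series}")
```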

Memory usage estimation and queuing are not supported yet.

Primary Key model compaction crash

query_id:00000000-0000-0000-0000-000000000000, fragment_instance:00000000-0000-0000-0000-000000000000
*** Aborted at 1666941018 (unix time) try "date -d @1666941018" if you are using GNU date ***
PC: @          0x226b325 starrocks::vectorized::MaskMergeIterator::do_get_next()
*** SIGSEGV (@0x0) received by PID 24658 (TID 0x7f5a42fff700) from PID 0; stack trace: ***
    @          0x481e332 google::(anonymous namespace)::FailureSignalHandler()
    @     0x7f5bd751f630 (unknown)
    @          0x226b325 starrocks::vectorized::MaskMergeIterator::do_get_next()
    @          0x1f9623c starrocks::vectorized::RowsetMergerImpl<>::_do_merge_vertically()
    @          0x1f97a93 starrocks::vectorized::RowsetMergerImpl<>::do_merge()
    @          0x1f8b12c starrocks::vectorized::compaction_merge_rowsets()
    @          0x1e74e9c starrocks::TabletUpdates::_do_compaction()
    @          0x1e75e8c starrocks::TabletUpdates::compaction()
    @          0x1dec879 starrocks::StorageEngine::_perform_update_compaction()
    @          0x1fed5be starrocks::StorageEngine::_update_compaction_thread_callback()
    @          0x631ac60 execute_native_thread_routine
    @     0x7f5bd7517ea5 start_thread
    @     0x7f5bd6b3296d __clone
    @                0x0 (unknown)

OK. It's just an INSERT INTO a large table, which uses a lot of memory.

Please post the exact SQL.

insert into dwd.CHARGE_DETAIL_PRI
select
IN_HOS_ID ,
CHARGE_DETAIL_CODE,
ID,
PARTITION_ID ,
ORGANIZATION_CODE ,
ORGANIZATION_NAME ,
DIVISION_CODE ,
DIVISION_NAME,
HIS_PATIENT_ID,

`REC_NUMBER` ,
`PATIENT_TYPE` ,
`PATIENT_TYPE_NAME`,
`CHARGE_ORDER_DATE` ,
`CHARGE_ACTUAL_DATE`,
`ORDER_BRANCH_CODE` ,
`ORDER_BRANCH_NAME`,
`ORDER_DEPT_CODE` ,
`ORDER_DEPT_NAME` ,
`ORDER_DEPT_KIND_CODE` ,
`ORDER_DEPT_KIND_NAME` ,
`ORDER_DEPT_ATTR_CODE` ,
`ORDER_DEPT_ATTR_NAME` ,
`ORDER_DEPT_TYPE_CODE` ,
`ORDER_DEPT_TYPE_NAME` ,
`ORDER_DEPT_FIRST_LEVEL_CODE` ,
`ORDER_DEPT_FIRST_LEVEL_NAME` ,
`ORDER_DEPT_SECOND_LEVEL_CODE`,
`ORDER_DEPT_SECOND_LEVEL_NAME`,
`ORDER_DEPT_THIRD_LEVEL_CODE` ,
`ORDER_DEPT_THIRD_LEVEL_NAME` ,
`ORDER_DEPT_FOURTH_LEVEL_CODE`,
`ORDER_DEPT_FOURTH_LEVEL_NAME`,
`ORDER_DEPT_FIFTH_LEVEL_CODE` ,
`ORDER_DEPT_FIFTH_LEVEL_NAME` ,
`ORDER_DEPT_LEVEL_CODE`,
`ORDER_DEPT_LEVEL_NAME`,
`ORDER_DEPT_BUSINESS_KIND_CODE` ,
`ORDER_DEPT_BUSINESS_KIND_NAME` ,
`ORDER_DEPT_TREAT_KIND_CODE` ,
`ORDER_DEPT_TREAT_KIND_NAME` ,
`ORDER_DEPT_IS_LAST` ,
`ORDER_GROUP_CODE` ,
`ORDER_GROUP_NAME` ,
`ORDER_EMP_CODE` ,
`ORDER_EMP_NAME` ,
`PAT_DEPT_CODE` ,
`PAT_DEPT_NAME` ,
`CHARGE_EXEC_DATE` ,
`EXEC_BRANCH_CODE` ,
`EXEC_BRANCH_NAME`,
`EXEC_DEPT_CODE`,
`EXEC_DEPT_NAME`,
`EXEC_DEPT_KIND_CODE`,
`EXEC_DEPT_KIND_NAME` ,
`EXEC_DEPT_ATTR_CODE`,
`EXEC_DEPT_ATTR_NAME` ,
`EXEC_DEPT_TYPE_CODE` ,
`EXEC_DEPT_TYPE_NAME`,
`EXEC_DEPT_FIRST_LEVEL_CODE` ,
`EXEC_DEPT_FIRST_LEVEL_NAME`,
`EXEC_DEPT_SECOND_LEVEL_CODE`,
`EXEC_DEPT_SECOND_LEVEL_NAME`,
`EXEC_DEPT_THIRD_LEVEL_CODE`,
`EXEC_DEPT_THIRD_LEVEL_NAME`,
`EXEC_DEPT_FOURTH_LEVEL_CODE`,
`EXEC_DEPT_FOURTH_LEVEL_NAME`,
`EXEC_DEPT_FIFTH_LEVEL_CODE`,
`EXEC_DEPT_FIFTH_LEVEL_NAME`,
`EXEC_DEPT_LEVEL_CODE`,
`EXEC_DEPT_LEVEL_NAME`,
`EXEC_DEPT_BUSINESS_KIND_CODE`,
`EXEC_DEPT_BUSINESS_KIND_NAME`,
`EXEC_DEPT_TREAT_KIND_CODE`,
`EXEC_DEPT_TREAT_KIND_NAME` ,
`EXEC_DEPT_IS_LAST`,
`EXEC_WARD_CODE`,
`EXEC_WARD_NAME` ,
`EXEC_GROUP_CODE` ,
`EXEC_GROUP_NAME`,
`EXEC_EMP_CODE`,
`EXEC_EMP_NAME`,
`CHARGE_KIND_CODE`,
`CHARGE_KIND_NAME`,

`CHARGE_DETAIL_NAME`,
`CHARGE_NUM`,
`CHARGE_PRICE` ,
`CHARGE_AMOUNT`,
`CASHIER_EMP_CODE`,
`CASHIER_EMP_NAME`,
`PRI_DOC_ID`,
`PRI_DOC_NAME`,
`DRUG_FLAG`,
`DRUG_FLAG_NAME`,
`OWNER_ORG_CODE`,
`OWNER_ORG_NAME`,
`PROVINCE_CODE`,
`PROVINCE_NAME`,
`CITY_CODE`,
`CITY_NAME`,
`DISTRICT_CODE`,
`DISTRICT_NAME`,
`DATA_UPDATE_TIME`,
`STD_UPDATE_TIME`
	from dwd.CHARGE_DETAIL where `IN_HOS_ID` is not null and `CHARGE_DETAIL_CODE` is not null