Primary Key table with no partitions and enable_persistent_index (on-disk index) not enabled: loading 200 million rows at once crashes the BE

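【Details】Primary Key table, unpartitioned, with enable_persistent_index (on-disk index) not enabled. Loading 200 million rows in a single batch crashed the BE, and the BE can no longer be started: after startup, memory quickly fills up and the BE crashes again.
【Import/export method】LOAD from HDFS
【Background】200 million rows loaded in a single batch
【Business impact】Test environment, no impact for now
【StarRocks version】2.3.0
【Cluster size】1 FE + 3 BE, co-located deployment
【Machine info】CPU virtual cores / memory / NIC: 16C / 64G / 10 GbE
【Attachments】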

  • be.out
    start time: Tue Nov 8 17:52:44 CST 2022
    tcmalloc: large alloc 1568088064 bytes == 0x112882000 @ 0x57b2cff 0x5a4435c 0x2072b38 0x59947f5 0x1c4511a 0x1c454a5 0x7f96390b520b
    tcmalloc: large alloc 2089787392 bytes == 0x16fff4000 @ 0x57b2cff 0x5a4435c 0x2072b38 0x59947f5 0x1c4511a 0x1c454a5 0x7f96390b520b
    tcmalloc: large alloc 2089213952 bytes == 0x1ed0ee000 @ 0x57b2cff 0x5a4435c 0x2072b38 0x59947f5 0x1c4511a 0x1c454a5 0x7f96390b520b
    tcmalloc: large alloc 2000945152 bytes == 0x2a9100000 @ 0x57b2cff 0x5a4435c 0x2072b38 0x59947f5 0x1c4511a 0x1c454a5 0x7f96390b520b
    tcmalloc: large alloc 2079457280 bytes == 0x3d9cee000 @ 0x57b2cff 0x5a4435c 0x2072b38 0x59947f5 0x1c4511a 0x1c454a5 0x7f96390b520b
    tcmalloc: large alloc 2050834432 bytes == 0x455c0e000 @ 0x57b2cff 0x5a4435c 0x2072b38 0x59947f5 0x1c4511a 0x1c454a5 0x7f96390b520b
    terminate called after throwing an instance of 'std::bad_alloc'
    what(): std::bad_alloc
    *** Aborted at 1667901243 (unix time) try "date -d @1667901243" if you are using GNU date ***
    PC: @ 0x7f9638609387 __GI_raise
    *** SIGABRT (@0xa7cb) received by PID 42955 (TID 0x7f962d1eb700) from PID 42955; stack trace: ***
    @ 0x3f90ad2 google::(anonymous namespace)::FailureSignalHandler()
    @ 0x7f96390be630 (unknown)
    @ 0x7f9638609387 __GI_raise
    @ 0x7f963860aa78 __GI_abort
    @ 0x187f1fd _ZN9__gnu_cxx27__verbose_terminate_handlerEv.cold
    @ 0x59942a6 __cxxabiv1::__terminate()
    @ 0x5994311 std::terminate()
    @ 0x5994464 __cxa_throw
    @ 0x1c1d905 phmap::priv::raw_hash_set<>::resize()
    @ 0x1c1f3f3 phmap::priv::raw_hash_set<>::prepare_insert()
    @ 0x1c1ffae starrocks::SliceHashIndex::upsert()
    @ 0x1c183d6 starrocks::ShardByLengthSliceHashIndex::upsert()
    @ 0x1c0af78 starrocks::PrimaryIndex::upsert()
    @ 0x1a92395 starrocks::TabletUpdates::_apply_rowset_commit()
    @ 0x1a96463 starrocks::TabletUpdates::do_apply()
    @ 0x21294fd starrocks::ThreadPool::dispatch_thread()
    @ 0x2124d0a starrocks::thread::supervise_thread()
    @ 0x7f96390b6ea5 start_thread
    @ 0x7f96386d1b0d __clone
    @ 0x0 (unknown)

Memory breakdown captured from the BE metrics:
curl -XGET -s http://be_ip:8040/metrics | grep -E "^starrocks_be_.*_mem_bytes|^starrocks_be_tcmalloc_bytes_in_use"
starrocks_be_chunk_allocator_mem_bytes 0
starrocks_be_clone_mem_bytes 0
starrocks_be_column_pool_mem_bytes 0
starrocks_be_compaction_mem_bytes 0
starrocks_be_consistency_mem_bytes 0
starrocks_be_load_mem_bytes 0
starrocks_be_process_mem_bytes 52667726256
starrocks_be_query_mem_bytes 0
starrocks_be_schema_change_mem_bytes 0
starrocks_be_storage_page_cache_mem_bytes 0
starrocks_be_tablet_meta_mem_bytes 24839391
starrocks_be_tcmalloc_bytes_in_use 59575105704
starrocks_be_update_mem_bytes 46712088958

What is the primary key length? Is there a particular reason for not partitioning the table?

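Length: 32+8+11
We don't partition because queries cannot be restricted to a partition time range: we don't know which day a given row belongs to. We will look into whether some other partition key is feasible.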

The main problem right now is that the BE won't come up: as soon as it starts, memory fills up and it crashes.

How many columns does this table have?

Have you added any settings to be.conf?

How many rows have been loaded so far, and how many rows does the table hold in total now?

Roughly 200 to 300 million rows should have been loaded.

No other settings were added. The table has 11 columns.

How many disks are attached to each BE?

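At 200 million rows per load, I estimate two loads have already succeeded. Taking 300 million rows as the figure, peak memory in your scenario is about 45 GB; 400 million rows would need more.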

Could you post the CREATE TABLE statement?

Drop the table first, then enable persistent index and try again. Persistent index is meant to solve exactly this problem.
How many buckets does your table have?

CREATE TABLE dim.dim_info(
dt VARCHAR(32) COMMENT '',
id VARCHAR(32) COMMENT '',
code VARCHAR(32) COMMENT '',
guid VARCHAR(32) COMMENT '',
num1 int COMMENT '',
num2 int COMMENT '',
status int COMMENT '',
time VARCHAR(32) COMMENT '',
info VARCHAR(32) COMMENT '',
num5 VARCHAR(32) COMMENT '',
num6 int COMMENT '',
INDEX index_status (status) USING BITMAP COMMENT 'status index'
)PRIMARY KEY(dt,id,code)
DISTRIBUTED BY HASH(dt,id) BUCKETS 3
PROPERTIES(
"bloom_filter_columns" = "id,code"
);
This is the table schema, with the column names changed.

That is too few buckets. Create the table with more buckets.

I haven't dropped it, so the data should still be there. The problem now is that the BE cannot start: it crashes within 10 seconds of starting.

If you don't use persistent index, you can increase the bucket count instead, for example to 64.
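
For reference, the two suggestions in this thread (more buckets, persistent index) could be combined roughly as in the sketch below. It reuses the column definitions posted above. It assumes that `enable_persistent_index` can be set as a table property at creation time on the version in use, and the bucket count of 64 is only the example figure from the reply, not a tuned value. Note that dropping the table discards the data already loaded.

```sql
-- Sketch only: recreate the table with more buckets and persistent index on.
-- Dropping the table discards the rows that were already loaded.
DROP TABLE dim.dim_info;

CREATE TABLE dim.dim_info(
dt VARCHAR(32) COMMENT '',
id VARCHAR(32) COMMENT '',
code VARCHAR(32) COMMENT '',
guid VARCHAR(32) COMMENT '',
num1 int COMMENT '',
num2 int COMMENT '',
status int COMMENT '',
time VARCHAR(32) COMMENT '',
info VARCHAR(32) COMMENT '',
num5 VARCHAR(32) COMMENT '',
num6 int COMMENT '',
INDEX index_status (status) USING BITMAP COMMENT 'status index'
)PRIMARY KEY(dt,id,code)
-- 64 is the example bucket count from the reply (the original table used 3)
DISTRIBUTED BY HASH(dt,id) BUCKETS 64
PROPERTIES(
"bloom_filter_columns" = "id,code",
-- Assumption: this property is accepted at CREATE TABLE time on your version
"enable_persistent_index" = "true"
);
```

With more buckets, the same rows are spread across more tablets, so each tablet's primary key index is smaller. Persistent index is the option named above for keeping that index on disk rather than entirely in memory.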

Memory fills up very quickly, and then it crashes.

Add me on WeChat so we can go over this in detail?

3 BEs, 8 disks each.