查询sql导致be宕机

【详述】一个查询,导致be全挂
【背景】无
【业务影响】 导致服务不可用
【StarRocks版本】2.3.4
【集群规模】3fe(3 follower)+7be
【机器信息】CPU虚拟核/内存/网卡,:56C/256G/万兆
【附件】

查询sql
:SELECT level_first,count(1),sum(gmv) FROM furion.consume_member_section_label_wide_test a ,furion.tag_user_bitmap_test_1 b
WHERE enddate=‘2022-12-12’ AND consume_type=‘集团新客首消项目’ AND tag_id=1111
AND bitmap_contains(b.users,mapid)
GROUP BY level_first

建表语句1:
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| consume_member_section_label_wide_test | CREATE TABLE consume_member_section_label_wide_test (
mapid bigint(20) NOT NULL COMMENT “mapid”,
enddate date NOT NULL COMMENT “截止日期”,
consume_type varchar(65533) NOT NULL COMMENT “消费类型”,
product_name varchar(65533) NULL COMMENT “项目名称”,
orders bigint(20) NULL COMMENT “订单数”,
revenue_before decimal64(18, 2) NULL COMMENT “促前营收”,
revenue_after decimal64(18, 2) NULL COMMENT “促后营收”,
gmv decimal64(18, 2) NULL COMMENT “GMV”,
consume_diff int(11) NULL COMMENT “消费间隔”,
coupon_percent decimal64(18, 4) NULL COMMENT “营收占比”,
level_first varchar(65533) NULL COMMENT “一级分类”,
consume_diff_label varchar(65533) NULL COMMENT “消费间隔标签”,
coupon_percent_label varchar(65533) NULL COMMENT “促销占比标签”,
resident_city_name varchar(65533) NULL COMMENT “常驻城市”,
flow_date_diff int(11) NULL COMMENT “访问流失间隔”,
flow_date_diff_label varchar(65533) NULL COMMENT “访问流失标签”,
marketing_counts int(11) NULL COMMENT “营销频次”,
marketing_counts_label varchar(65533) NULL COMMENT “营销频次标签”,
INDEX bi_consume_type (consume_type) USING BITMAP COMMENT ‘’,
INDEX bi_product_name (product_name) USING BITMAP COMMENT ‘’,
INDEX bi_consume_diff_label (consume_diff_label) USING BITMAP COMMENT ‘’,
INDEX bi_level_first (level_first) USING BITMAP COMMENT ‘’,
INDEX bi_coupon_percent_label (coupon_percent_label) USING BITMAP COMMENT ‘’,
INDEX bi_resident_city_name (resident_city_name) USING BITMAP COMMENT ‘’,
INDEX bi_flow_date_diff_label (flow_date_diff_label) USING BITMAP COMMENT ‘’,
INDEX bi_marketing_counts_label (marketing_counts_label) USING BITMAP COMMENT ‘’
) ENGINE=OLAP
PRIMARY KEY(mapid, enddate, consume_type)
COMMENT “OLAP”
PARTITION BY RANGE(enddate)
(PARTITION p20221209 VALUES [(‘2022-12-09’), (‘2022-12-10’)),
PARTITION p20221210 VALUES [(‘2022-12-10’), (‘2022-12-11’)),
PARTITION p20221211 VALUES [(‘2022-12-11’), (‘2022-12-12’)),
PARTITION p20221212 VALUES [(‘2022-12-12’), (‘2022-12-13’)),
PARTITION p20221213 VALUES [(‘2022-12-13’), (‘2022-12-14’)),
PARTITION p20221214 VALUES [(‘2022-12-14’), (‘2022-12-15’)),
PARTITION p20221215 VALUES [(‘2022-12-15’), (‘2022-12-16’)),
PARTITION p20221216 VALUES [(‘2022-12-16’), (‘2022-12-17’)),
PARTITION p20221217 VALUES [(‘2022-12-17’), (‘2022-12-18’)),
PARTITION p20221218 VALUES [(‘2022-12-18’), (‘2022-12-19’)))
DISTRIBUTED BY HASH(mapid) BUCKETS 10
PROPERTIES (
“replication_num” = “3”,
“colocate_with” = “group_furion_comsume_member_section”,
“dynamic_partition.enable” = “true”,
“dynamic_partition.time_unit” = “DAY”,
“dynamic_partition.time_zone” = “Asia/Shanghai”,
“dynamic_partition.start” = “-365”,
“dynamic_partition.end” = “3”,
“dynamic_partition.prefix” = “p”,
“dynamic_partition.buckets” = “10”,
“in_memory” = “false”,
“storage_format” = “DEFAULT”,
“enable_persistent_index” = “false”
);

建表语句2:

| tag_user_bitmap_test_1 | CREATE TABLE tag_user_bitmap_test_1 (
tag_id int(11) NULL COMMENT “”,
input_date date NULL COMMENT “”,
tag_value bigint(20) NULL COMMENT “”,
users bitmap BITMAP_UNION NOT NULL COMMENT “访问用户id”
) ENGINE=OLAP
AGGREGATE KEY(tag_id, input_date, tag_value)
COMMENT “OLAP”
PARTITION BY RANGE(input_date)
(PARTITION p20221209 VALUES [(‘2022-12-09’), (‘2022-12-10’)),
PARTITION p20221210 VALUES [(‘2022-12-10’), (‘2022-12-11’)),
PARTITION p20221211 VALUES [(‘2022-12-11’), (‘2022-12-12’)),
PARTITION p20221212 VALUES [(‘2022-12-12’), (‘2022-12-13’)),
PARTITION p20221213 VALUES [(‘2022-12-13’), (‘2022-12-14’)),
PARTITION p20221214 VALUES [(‘2022-12-14’), (‘2022-12-15’)),
PARTITION p20221215 VALUES [(‘2022-12-15’), (‘2022-12-16’)),
PARTITION p20221216 VALUES [(‘2022-12-16’), (‘2022-12-17’)),
PARTITION p20221217 VALUES [(‘2022-12-17’), (‘2022-12-18’)),
PARTITION p20221218 VALUES [(‘2022-12-18’), (‘2022-12-19’)))
DISTRIBUTED BY HASH(tag_id) BUCKETS 10
PROPERTIES (
“replication_num” = “3”,
“dynamic_partition.enable” = “true”,
“dynamic_partition.time_unit” = “DAY”,
“dynamic_partition.time_zone” = “Asia/Shanghai”,
“dynamic_partition.start” = “-365”,
“dynamic_partition.end” = “3”,
“dynamic_partition.prefix” = “p”,
“dynamic_partition.buckets” = “10”,
“in_memory” = “false”,
“storage_format” = “DEFAULT”,
“enable_persistent_index” = “false”
);

麻烦您提供下be Crash时间段的be.out日志,我们查看下对应的堆栈信息

方便发下BE.out文件嘛 然后您be crash是什么时间您也提供下 ,当前sql导致be宕机的,会在be.out堆栈里面打印出对应的query_id的,您上述图片的信息不是很全,麻烦了

tcmalloc: large alloc 2270535680 bytes == 0x7f979237c000 @ 0x5832bbf 0x5ac421c 0x20c7ac8 0x5a146b5 0x1c94faa 0x1c95335 0x7fb600fe320b
tcmalloc: large alloc 2267029504 bytes == 0x7f91dc8b6000 @ 0x5832bbf 0x5ac421c 0x20c7ac8 0x5a146b5 0x1c94faa 0x1c95335 0x7fb600fe320b
tcmalloc: large alloc 2274074624 bytes == 0x7f9531416000 @ 0x5832bbf 0x5ac421c 0x20c7ac8 0x5a146b5 0x1c94faa 0x1c95335 0x7fb600fe320b
tcmalloc: large alloc 2272960512 bytes == 0x7f9354522000 @ 0x5832bbf 0x5ac421c 0x20c7ac8 0x5a146b5 0x1c94faa 0x1c95335 0x7fb600fe320b
tcmalloc: large alloc 2267037696 bytes == 0x7f979237c000 @ 0x5832bbf 0x5ac421c 0x20c7ac8 0x5a146b5 0x1c94faa 0x1c95335 0x7fb600fe320b
src/central_freelist.cc:333] tcmalloc: allocation failed 16384
src/central_freelist.cc:333] tcmalloc: allocation failed 16384
terminate called recursively
terminate called after throwing an instance of ‘std::runtime_error’
what(): failed memory alloc in constructorquery_id:048c57d1-7c44-11ed-ba5e-024246d62641, fragment_instance:048c57d1-7c44-11ed-ba5e-024246d62646
*** Aborted at 1671086809 (unix time) try “date -d @1671086809” if you are using GNU date ***

PC: @ 0x7fb600537387 __GI_raise
*** SIGABRT (@0x1393ad) received by PID 1282989 (TID 0x7fb55d8bc700) from PID 1282989; stack trace: ***
@ 0x4010972 google::(anonymous namespace)::FailureSignalHandler()
@ 0x7fb600fec630 (unknown)
@ 0x7fb600537387 __GI_raise
@ 0x7fb600538a78 __GI_abort
@ 0x5a156b2 __gnu_cxx::__verbose_terminate_handler()
@ 0x5a14166 __cxxabiv1::__terminate()
@ 0x5a141d1 std::terminate()
@ 0x5a14376 __cxa_rethrow
@ 0x160fefc ZNSt8_Rb_treeIjSt4pairIKj7RoaringESt10_Select1stIS3_ESt4lessIjESaIS3_EE7_M_copyINS9_11_Alloc_nodeEEEPSt13_Rb_tree_nodeIS3_EPKSD_PSt18_Rb_tree_node_baseRT.isra.0.cold
@ 0x20dbb0b starrocks::BitmapValue::BitmapValue()
@ 0x253879b starrocks::vectorized::ObjectColumn<>::append()
@ 0x2538b22 starrocks::vectorized::ObjectColumn<>::append_value_multiple_times()
@ 0x28d0410 starrocks::pipeline::CrossJoinLeftOperator::_copy_joined_rows_with_index_base_build()
@ 0x28d0c82 starrocks::pipeline::CrossJoinLeftOperator::pull_chunk()
@ 0x28f8513 starrocks::pipeline::PipelineDriver::process()
@ 0x28eedfc starrocks::pipeline::GlobalDriverExecutor::_worker_thread()
@ 0x217e939 starrocks::ThreadPool::dispatch_thread()
@ 0x217a4ea starrocks::thread::supervise_thread()
@ 0x7fb600fe4ea5 start_thread
@ 0x7fb6005ff8dd __clone
@ 0x0 (unknown)

be.out.temp (97.1 KB)

crash时间,就是上面监控图中,内存飙升的时间

您好 这个看堆栈是一个已知问题 我在确认下 麻烦能发下这个query查询的profile吗? 生成方法您参考: https://docs.starrocks.io/zh-cn/latest/quick_start/Import_and_query#查看-profile-并分析查询瓶颈
,然后您也可以关闭Pipeline引擎在执行下该sql尝试下看能否复现