CN nodes go through a rolling restart every so often, and tasks fail

Reproduced.

| MultiCastDataSinks
| STREAM DATA SINK
| EXCHANGE ID: 04
| RANDOM
| STREAM DATA SINK
| EXCHANGE ID: 12
| RANDOM
| 3:REPEAT_NODE
| | repeat: repeat 1 lines [[2], [3]]
| |
| 2:Decode
| | <dict id 30> : <string id 2>
| | <dict id 31> : <string id 3>
| | <dict id 32> : <string id 8>
| |
| 1:MERGING-EXCHANGE
| offset: 3
| limit: 1

A project node needs to be added after this repeat_node.

Has this been merged into 3.1.14?

StarRocks-3.1.14-centos-amd64.tar.gz
I upgraded from 3.1.11 to this package and still get the same error.

This PR probably won't land until 3.1.15. I can build a patch on top of 3.1.14 for you.

Or you can apply the patch from https://github.com/StarRocks/starrocks/pull/48787 yourself.

OK, thanks. Could you please build the patch for me? My machine is not powerful enough to compile StarRocks.

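Hi,

Yesterday we upgraded a StarRocks cluster in an on-premises environment (3 FE + 3 BE) from 3.1.7 to 3.1.11. It is a non-FQDN, non-K8s deployment on physical machines.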

We hit this grouping sets problem there as well.

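I retried several times and got the same error every time: rpc failed, host: 192.168.75.11
This causes all BEs to restart. What was reported earlier was a single BE/CN restarting in FQDN mode.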

MySQL [starrocks_audit_db__]> select
one_level , two_level , three_level , four_level , five_level ,
count(distinct device_cmt) device_cmt , count(distinct online_cmt) online_cmt , count(distinct unonline_cmt) unonline_cmt ,
count(distinct safe_cmt) safe_cmt , count(distinct care_cmt) care_cmt , count(distinct treat_cmt) treat_cmt ,
count(distinct health_cmt) health_cmt , count(distinct sport_cmt) sport_cmt , count(distinct execise_cmt) execise_cmt ,
count(distinct home_cmt) home_cmt , count(distinct visit_cmt) visit_cmt ,
count(distinct other_cmt) other_cmt from (
select
one_level , two_level , three_level , four_level , five_level , od.device_mac device_cmt ,
case when tb.online_status = 1 then od.device_mac else null end online_cmt ,
case when tb.online_status = 0 then od.device_mac else null end unonline_cmt ,
case when tp.product_type = 1 then od.device_mac else null end safe_cmt ,
case when tp.product_type = 2 then od.device_mac else null end care_cmt ,
case when tp.product_type = 3 then od.device_mac else null end treat_cmt ,
case when tp.product_type = 4 then od.device_mac else null end health_cmt ,
case when tp.product_type = 5 then od.device_mac else null end sport_cmt ,
case when tp.product_type = 6 then od.device_mac else null end execise_cmt ,
case when tp.product_type = 7 then od.device_mac else null end home_cmt ,
case when tp.product_type = 8 then od.device_mac else null end visit_cmt ,
case when tp.product_type not in (1, 2, 3, 4, 5, 6, 7, 8) then od.device_mac else null end other_cmt
from db_elderly_care_ods.ods_tb_org_device_dap d
join db_elderly_care_ods.ods_tb_device_operation_org_dap od -- attach organization
on d.org_device_id = od.org_device_id
left join db_elderly_care_ods.ods_tb_device_dap tb -- attach device status
on d.device_id = tb.device_id
left join db_elderly_care_ods.ods_tb_product_dap tp -- attach product type
on d.product_id = tp.product_id
join db_elderly_care_dim.dim_org_level_dap e
on od.operation_org_id = e.org_id and e.org_type_id in (3, 4, 5, 6, 7, 8, 9, 10) -- must include group and sub-group
where d.delete_status = 0 and date(d.create_time) < date_trunc('year',add_months('2024-09-13', 12)) -- time
) ac_device
group by grouping sets ( (one_level) , (two_level) , (three_level) , (four_level) , (five_level) ) ;
ERROR 1064 (HY000): rpc failed, host: 192.168.75.11

query_id:d6d6be9a-7198-11ef-bd72-fa163eb4f0f4, fragment_instance:d6d6be9a-7198-11ef-bd72-fa163eb4f10a
tracker:process consumption: 1259350992
tracker:query_pool consumption: 34882364
tracker:load consumption: 0
tracker:metadata consumption: 125786246
tracker:tablet_metadata consumption: 63214581
tracker:rowset_metadata consumption: 59050031
tracker:segment_metadata consumption: 778974
tracker:column_metadata consumption: 2742660
tracker:tablet_schema consumption: 1033269
tracker:segment_zonemap consumption: 470064
tracker:short_key_index consumption: 4674
tracker:column_zonemap_index consumption: 1045668
tracker:ordinal_index consumption: 580704
tracker:bitmap_index consumption: 0
tracker:bloom_filter_index consumption: 0
tracker:compaction consumption: 0
tracker:schema_change consumption: 0
tracker:column_pool consumption: 164074696
tracker:page_cache consumption: 151605088
tracker:update consumption: 25052239
tracker:chunk_allocator consumption: 119097208
tracker:clone consumption: 0
tracker:consistency consumption: 0
tracker:datacache consumption: 0
tracker:replication consumption: 0
*** Aborted at 1726208672 (unix time) try "date -d @1726208672" if you are using GNU date ***
PC: @ 0x2a9d805 starrocks::FixedLengthColumnBase<>::append()
*** SIGSEGV (@0x74f) received by PID 2449295 (TID 0x7f5a3d9e7700) from PID 1871; stack trace: ***
@ 0x651eae2 google::(anonymous namespace)::FailureSignalHandler()
@ 0x7f5add2044c0 (unknown)
@ 0x2a9d805 starrocks::FixedLengthColumnBase<>::append()
@ 0x32af7a2 starrocks::NullableColumn::append()
@ 0x328f2e2 starrocks::Chunk::append()
@ 0x4f70be9 starrocks::ChunkPipelineAccumulator::push()
@ 0x36eddfc starrocks::pipeline::ChunkAccumulateOperator::push_chunk()
@ 0x36756b8 starrocks::pipeline::PipelineDriver::process()
@ 0x3665c7e starrocks::pipeline::GlobalDriverExecutor::_worker_thread()
@ 0x2d01fec starrocks::ThreadPool::dispatch_thread()
@ 0x2cfbc9a starrocks::thread::supervise_thread()
@ 0x7f5add1f9f1b (unknown)
@ 0x7f5adcf831c0 clone
@ 0x0 (unknown)
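
For anyone comparing crash dumps like the one above across BEs, here is a minimal Python sketch that pulls the crash time, the signal, and the first StarRocks frame out of the pasted text. It assumes the log follows the glog failure-handler layout shown in this post; `summarize_crash` and the regexes are hypothetical helpers written for illustration, not part of StarRocks or glog.

```python
import re
from datetime import datetime, timezone

# Excerpt of the crash output quoted in this thread (assumed glog failure-handler format).
CRASH_LOG = """\
*** Aborted at 1726208672 (unix time) try "date -d @1726208672" if you are using GNU date ***
PC: @ 0x2a9d805 starrocks::FixedLengthColumnBase<>::append()
*** SIGSEGV (@0x74f) received by PID 2449295 (TID 0x7f5a3d9e7700) from PID 1871; stack trace: ***
@ 0x651eae2 google::(anonymous namespace)::FailureSignalHandler()
@ 0x7f5add2044c0 (unknown)
@ 0x2a9d805 starrocks::FixedLengthColumnBase<>::append()
@ 0x32af7a2 starrocks::NullableColumn::append()
@ 0x328f2e2 starrocks::Chunk::append()
@ 0x4f70be9 starrocks::ChunkPipelineAccumulator::push()
@ 0x36eddfc starrocks::pipeline::ChunkAccumulateOperator::push_chunk()
"""

# Hypothetical patterns matching the three kinds of lines seen above.
ABORT_RE = re.compile(r"\*\*\* Aborted at (\d+) \(unix time\)")
SIGNAL_RE = re.compile(r"\*\*\* (SIG\w+) \(@(0x[0-9a-f]+)\) received by PID (\d+)")
FRAME_RE = re.compile(r"^@[ \t]+(0x[0-9a-f]+)[ \t]+(.+)$", re.MULTILINE)


def summarize_crash(log: str) -> dict:
    """Extract crash time (UTC), signal, PID and the first starrocks:: frame."""
    abort = ABORT_RE.search(log)
    signal = SIGNAL_RE.search(log)
    if abort is None or signal is None:
        raise ValueError("log does not look like a glog crash dump")

    # Stack frames are the lines that start with '@'; the 'PC:' line is skipped.
    symbols = [symbol for _addr, symbol in FRAME_RE.findall(log)]
    starrocks_frames = [s for s in symbols if s.startswith("starrocks::")]

    crashed_at = datetime.fromtimestamp(int(abort.group(1)), tz=timezone.utc)
    return {
        "crashed_at_utc": crashed_at.isoformat(),
        "signal": signal.group(1),
        "fault_address": signal.group(2),
        "pid": int(signal.group(3)),
        "top_starrocks_frame": starrocks_frames[0] if starrocks_frames else None,
    }


if __name__ == "__main__":
    for key, value in summarize_crash(CRASH_LOG).items():
        print(f"{key}: {value}")
```

On this dump the timestamp decodes to 2024-09-13 06:24:32 UTC (14:24:32 in UTC+8), and the first StarRocks frame is `starrocks::FixedLengthColumnBase<>::append()`, reached through `ChunkAccumulateOperator::push_chunk()`. If the dumps from the other BEs show the same top frame, that would suggest they are all crashing on the same query path.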

Upgrade to 3.1.15.

OK. I'm not sure whether 3.1.15 includes the fix for this bug. Unlike our earlier attempts, we plan to upgrade to 3.1.15 tonight and give it a try.