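Also, this bt seems to show only one thread in starrocks::StarOSWorker::get_shard_filesystem. When the core was taken, did the cluster appear hung and timing out?
When the gcore was taken, the pod's node and IP did not change, but the cn process showed as restarted. The screenshot is from K9s; the counter circled in red increments by 1.
The gcore file is over 20 GB. I'll export a fresh copy on Monday and send it to you via a network drive.
While gcore runs, the process is unresponsive. The pod health check fails, so kubelet kills the pod and restarts it.
I re-exported the core file; it is a bit over 20 GB.
Baidu Netdisk link. File shared via Netdisk: starrocks_be_core.27
Link: https://pan.baidu.com/s/1F6TfxLW-yJiMDBbMvs4l1A?pwd=av8k Extraction code: av8k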
Which CN version is this?
This core was exported from 3.5.14.
Thread 532 (Thread 0x7f978de36640 (LWP 755)):
#0 __futex_abstimed_wait_common (cancel=false, private=<optimized out>, abstime=0x0, clockid=0, expected=3, futex_word=0x7f98ae3e222c) at ./nptl/futex-internal.c:103
#1 __GI___futex_abstimed_wait64 (futex_word=futex_word@entry=0x7f98ae3e222c, expected=expected@entry=3, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=<optimized out>) at ./nptl/futex-internal.c:128
#2 0x00007f98b369224f in __pthread_rwlock_wrlock_full64 (abstime=0x0, clockid=0, rwlock=0x7f98ae3e2220) at ./nptl/pthread_rwlock_common.c:730
#3 ___pthread_rwlock_wrlock (rwlock=0x7f98ae3e2220) at ./nptl/pthread_rwlock_wrlock.c:26
#4 0x000000000bf2ee29 in std::__glibcxx_rwlock_wrlock (__rwlock=0x7f98ae3e2220) at /usr/include/c++/11/shared_mutex:80
#5 std::__shared_mutex_pthread::lock (this=0x7f98ae3e2220) at /usr/include/c++/11/shared_mutex:193
#6 std::shared_mutex::lock (this=0x7f98ae3e2220) at /usr/include/c++/11/shared_mutex:420
#7 std::unique_lock<std::shared_mutex>::lock (this=<synthetic pointer>) at /usr/include/c++/11/bits/unique_lock.h:139
#8 std::unique_lock<std::shared_mutex>::unique_lock (__m=..., this=<synthetic pointer>) at /usr/include/c++/11/bits/unique_lock.h:69
#9 starrocks::StarOSWorker::new_shared_filesystem (this=this@entry=0x7f98ae3e2190, scheme=..., conf=...) at be/src/service/staros_worker.cpp:364
#10 0x000000000bf30d34 in starrocks::StarOSWorker::build_filesystem_from_shard_info (this=this@entry=0x7f98ae3e2190, info=..., conf=...) at /usr/include/c++/11/string_view:137
#11 0x000000000bf323de in starrocks::StarOSWorker::get_shard_filesystem (this=0x7f98ae3e2190, id=83521, conf=...) at be/src/service/staros_worker.cpp:239
#12 0x00000000081ead1c in starrocks::StarletFileSystem::get_shard_filesystem (shard_id=<optimized out>, this=0x7f97b26cf000) at /usr/include/c++/11/bits/shared_ptr_base.h:1295
#13 starrocks::StarletFileSystem::delete_dir (this=0x7f97b26cf000, dirname=...) at be/src/fs/fs_starlet.cpp:467
#14 0x000000000a7deb31 in starrocks::lake::LoadSpillBlockManager::clear_parent_path (this=this@entry=0x7f96e53f68e0) at be/src/storage/lake/load_spill_block_manager.cpp:103
#15 0x000000000a7df033 in starrocks::lake::LoadSpillBlockManager::~LoadSpillBlockManager (this=this@entry=0x7f96e53f68e0, __in_chrg=<optimized out>) at be/src/storage/lake/load_spill_block_manager.cpp:90
mutex lock: this=0x7f98ae3e2220
(gdb) p /x *this
$6 = {_M_rwlock = {__data = {__readers = 0xfffffffb, __writers = 0x0, __wrphase_futex = 0x1, __writers_futex = 0x3, __pad3 = 0x0, __pad4 = 0x0, __cur_writer = 0x2fa, __shared = 0x0, __rwelision = 0x0, __pad1 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, __pad2 = 0x0, __flags = 0x0}, __size = {0xfb, 0xff, 0xff, 0xff, 0x0, 0x0, 0x0, 0x0, 0x1, 0x0, 0x0, 0x0, 0x3, 0x0 <repeats 11 times>, 0xfa, 0x2, 0x0 <repeats 30 times>}, __align = 0xfffffffb}}
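The lock is currently held exclusively as a write lock (not shared as a read lock)
The thread holding the write lock has TID = 762
The write lock has been re-entered 5 times
No other threads are waiting for the write lock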
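As a cross-check of the reading above, here is a minimal sketch that decodes the dumped fields. It assumes the bit layout described in glibc's nptl/pthread_rwlock_common.c (flag bits in the low 3 bits of `__readers`, reader count in the remaining bits, and a "futex used" bit in `__writers_futex`); the constants are copied from memory and have not been checked against the exact glibc build in this core. The function names are made up for illustration.

```python
# Constants assumed from glibc's nptl/pthread_rwlock_common.c (an assumption,
# not verified against the glibc build that produced this core).
PTHREAD_RWLOCK_WRPHASE = 1
PTHREAD_RWLOCK_WRLOCKED = 2
PTHREAD_RWLOCK_RWAITING = 4
PTHREAD_RWLOCK_READER_SHIFT = 3
PTHREAD_RWLOCK_FUTEX_USED = 2


def to_int32(value):
    """Reinterpret an unsigned 32-bit value as a signed one."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def decode_rwlock(readers, writers_futex, wrphase_futex, cur_writer):
    """Split the raw fields printed by gdb into flags and counters."""
    return {
        "write_phase": bool(readers & PTHREAD_RWLOCK_WRPHASE),
        "write_locked": bool(readers & PTHREAD_RWLOCK_WRLOCKED),
        "readers_waiting": bool(readers & PTHREAD_RWLOCK_RWAITING),
        # Arithmetic shift on the signed value keeps a negative count visible.
        "reader_count": to_int32(readers) >> PTHREAD_RWLOCK_READER_SHIFT,
        "writer_futex_has_waiters": bool(writers_futex & PTHREAD_RWLOCK_FUTEX_USED),
        "wrphase_futex": wrphase_futex,
        "cur_writer_tid": cur_writer,
    }


if __name__ == "__main__":
    # Values taken from the `p /x *this` output in this thread.
    for key, value in decode_rwlock(0xFFFFFFFB, 0x3, 0x1, 0x2FA).items():
        print(f"{key}: {value}")
```

Under these assumptions `__readers = 0xfffffffb` carries the write-locked and write-phase flags plus a reader count of -1 rather than a re-entry count, and `__writers_futex = 3` has the waiter bit set, which would match Thread 532 blocking on that futex word with `expected=3`. A negative reader count would point at unbalanced lock/unlock calls, but the sketch alone cannot confirm that.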
(gdb) t 539
[Switching to thread 539 (Thread 0x7f979133d640 (LWP 762))]
#0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
38 ../sysdeps/unix/sysv/linux/x86_64/syscall.S: No such file or directory.
(gdb) bt
#0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1 0x0000000010266dda in bthread::futex_wait_private (timeout=0x0, expected=<optimized out>, addr1=<optimized out>) at ./src/bthread/sys_futex.h:40
#2 bthread::ParkingLot::wait (expected_state=..., this=<optimized out>) at ./src/bthread/parking_lot.h:60
#3 bthread::TaskGroup::wait_task (this=this@entry=0x7f98a8b44080, tid=tid@entry=0x7f979132d6f8) at src/bthread/task_group.cpp:133
#4 0x0000000010269d6b in bthread::TaskGroup::run_main_task (this=this@entry=0x7f98a8b44080) at src/bthread/task_group.cpp:161
#5 0x0000000010263dc2 in bthread::TaskControl::worker_thread (arg=<optimized out>) at src/bthread/task_control.cpp:99
#6 0x00007f98b368bac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#7 0x00007f98b371d8d0 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
TID 762 is a bthread worker thread. Most likely a bthread took the lock and was then switched to a different worker thread, so its unlock() had no effect.
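To illustrate the hypothesis only: the sketch below is an analogy in Python, not the pthread/bthread code path. `threading.RLock` tracks its owning thread, so a release issued from a different thread is rejected and the lock stays held, which is roughly what "unlock() had no effect" would look like. The helper name is made up for this example.

```python
import threading


def release_from_other_thread():
    """Acquire an owner-tracking lock here, then try to release it elsewhere.

    Returns (error_message, still_held): the error raised by the foreign
    release, and whether the lock was still held afterwards.
    """
    lock = threading.RLock()
    lock.acquire()  # the calling thread becomes the owner
    outcome = {"error": None, "still_held": None}

    def foreign_release():
        try:
            lock.release()  # wrong thread: it does not own the lock
        except RuntimeError as exc:
            outcome["error"] = str(exc)

    def probe():
        # A non-blocking acquire from another thread fails while the lock is held.
        got_it = lock.acquire(blocking=False)
        if got_it:
            lock.release()
        outcome["still_held"] = not got_it

    for target in (foreign_release, probe):
        worker = threading.Thread(target=target)
        worker.start()
        worker.join()

    lock.release()  # only the real owner can release it
    return outcome["error"], outcome["still_held"]


if __name__ == "__main__":
    error, still_held = release_from_other_thread()
    print("foreign release error:", error)
    print("lock still held afterwards:", still_held)
```

With a plain pthread rwlock there is no such check by default, so a misplaced unlock can silently leave the lock in a state where later writers block forever.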
Try to reproduce it so that it crashes at the point of first failure:
export PTHREAD_MUTEX_ERRORCHECK=1
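Inject this environment variable into the pod and try to reproduce again. When a mutex is released from a different thread, the process will crash.
That may give us more information to troubleshoot with.
Also, does this reproduce reliably? Could you check whether there is a minimal set of reproduction steps? I'd like to reproduce it as well.
Minimal reproduction steps:
[Cluster version] 3.5.14, deployed directly
[Cluster size] 3 fe (node: 8 cores, 20 GB memory, 50 GB disk)
1 cn (node: 16 cores, 64 GB memory, 100 GB disk)
[Hadoop version] Hadoop 3.1.1, deployed with Apache Ambari 2.7.5.0
ConfigMap for cn:
# cn config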
apiVersion: v1
kind: ConfigMap
metadata:
  name: starrocks-cn-cm
  namespace: bd-starrocks
  labels:
    cluster: starrocks
data:
  cn.conf: |
    JAVA_OPTS="--add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED"
    storage_root_path = /opt/starrocks/cn/storage/root
    spill_local_storage_dir = /opt/starrocks/cn/storage/spill
    datacache_enable = false
    datacache_mem_size = 10%
    datacache_disk_size = 107374182400
    mem_limit = 90%
    report_task_interval_seconds = 10
    starlet_star_cache_disk_size_percent = 20
ConfigMap for fe:
# fe config
apiVersion: v1
kind: ConfigMap
metadata:
  name: starrocks-fe-cm
  namespace: bd-starrocks
  labels:
    cluster: starrocks
data:
  fe.conf: |
    LOG_DIR = /opt/starrocks/fe/log
    DATE = "$(date +%Y%m%d-%H%M%S)"
    JAVA_OPTS="-Dlog4j2.formatMsgNoLookups=true -Xms8192m -Xmx8192m -XX:+UseG1GC -Xlog:gc*:${LOG_DIR}/fe.gc.log.$DATE -XX:ErrorFile=${LOG_DIR}/hs_err_pid%p.log -Djava.security.policy=${STARROCKS_HOME}/conf/udf_security.policy"
    http_port = 8030
    rpc_port = 9020
    query_port = 9030
    edit_log_port = 9010
    sys_log_level = INFO
    mysql_service_nio_enabled = true
    tablet_create_timeout_second = 60
    fast_schema_evolution = true
    # config for shared-data mode
    run_mode = shared_data
    cloud_native_meta_port = 6090
    cloud_native_storage_type = S3
    enable_load_volume_from_conf = false
    enable_udf = true
    max_automatic_partition_number = 87600
    enable_statistic_collect = false
    enable_collect_full_statistic = false
ConfigMap for Hadoop:
apiVersion: v1
kind: ConfigMap
metadata:
  name: starrocks-hdfs-cm
  namespace: bd-starrocks
  labels:
    cluster: starrocks
data:
  hadoop_env.sh: |
    export HADOOP_USER_NAME="starrocks"
    export HADOOP_CLASSPATH=${STARROCKS_HOME}/lib/hadoop/common/*:${STARROCKS_HOME}/lib/hadoop/common/lib/*:${STARROCKS_HOME}/lib/hadoop/hdfs/*:${STARROCKS_HOME}/lib/hadoop/hdfs/lib/*
    if [ -z "${HADOOP_USER_NAME}" ]
    then
      if [ -z "${USER}" ]
      then
        export HADOOP_USER_NAME=$(id -u -n)
      else
        export HADOOP_USER_NAME=${USER}
      fi
    fi
    if [ ${HADOOP_CONF_DIR}"X" != "X" ]; then
      export HADOOP_CLASSPATH=${HADOOP_CONF_DIR}:${HADOOP_CLASSPATH}
    fi
  hdfs-site.xml: |
    <configuration xmlns:xi="http://www.w3.org/2001/XInclude">
      <property>
        <name>dfs.nameservices</name>
        <value>ljx</value>
      </property>
      <property>
        <name>dfs.ha.namenodes.ljx</name>
        <value>nn1,nn2</value>
      </property>
      <property>
        <name>dfs.namenode.rpc-address.ljx.nn1</name>
        <value>ljx-bd-c1-nn01.ljximing.int:8020</value>
      </property>
      <property>
        <name>dfs.namenode.rpc-address.ljx.nn2</name>
        <value>ljx-bd-c1-nn02.ljximing.int:8020</value>
      </property>
      <property>
        <name>dfs.client.failover.proxy.provider.ljx</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
      </property>
      <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
      </property>
    </configuration>
StarRocksCluster spec for StarRocks:
apiVersion: starrocks.com/v1
kind: StarRocksCluster
metadata:
  name: starrocks
  namespace: bd-starrocks
spec:
  starRocksFeSpec:
    image: harbor.yowin.mobi/bd/starrocks-fe-ubuntu:3.5.14
    podLabels:
      app: starrocks-fe
      cluster: starrocks
    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
                - key: app
                  operator: In
                  values:
                    - starrocks-fe
            topologyKey: kubernetes.io/hostname
    feEnvVars:
      - name: "MYSQL_PWD"
        valueFrom:
          secretKeyRef:
            name: sr-credential
            key: password
    replicas: 3
    requests:
      cpu: 7
      memory: 16Gi
    limits:
      cpu: 7
      memory: 18Gi
    storageVolumes:
      - name: starrocks-fe-storage
        storageClassName: oci-bv
        storageSize: 100Gi
        mountPath: /opt/starrocks/fe/meta
    service:
      type: LoadBalancer
    configMapInfo:
      configMapName: starrocks-fe-cm
      resolveKey: fe.conf
    configMaps:
      - name: starrocks-hdfs-cm
        mountPath: /opt/starrocks/fe/conf/hdfs-site.xml
        subPath: "hdfs-site.xml"
      - name: starrocks-hdfs-cm
        mountPath: /opt/starrocks/fe/conf/hadoop_env.sh
        subPath: "hadoop_env.sh"
  starRocksCnSpec:
    image: harbor.yowin.mobi/bd/starrocks-cn-ubuntu:3.5.14
    podLabels:
      app: starrocks-cn
      cluster: starrocks
    cnEnvVars:
      - name: "MYSQL_PWD"
        valueFrom:
          secretKeyRef:
            name: sr-credential
            key: password
    requests:
      cpu: 14
      memory: 57Gi
    replicas: 1
    configMapInfo:
      configMapName: starrocks-cn-cm
      resolveKey: cn.conf
    configMaps:
      - name: starrocks-hdfs-cm
        mountPath: /opt/starrocks/cn/conf/hdfs-site.xml
        subPath: "hdfs-site.xml"
      - name: starrocks-hdfs-cm
        mountPath: /opt/starrocks/cn/conf/hadoop_env.sh
        subPath: "hadoop_env.sh"
    storageVolumes:
      - name: starrocks-cn-storage
        storageClassName: oci-bv
        storageSize: 100Gi
        mountPath: /opt/starrocks/cn/storage
    persistentVolumeClaimRetentionPolicy:
      whenDeleted: Retain
      whenScaled: Delete
1. Initialize the cluster root account
2. Create volume_hdfs
CREATE STORAGE VOLUME volume_hdfs
TYPE = HDFS
LOCATIONS = ("hdfs://ljx/apps/starrocks/volume-test/")
PROPERTIES (
"username" = "starrocks",
"hadoop.security.authentication" = "simple"
);
SET volume_hdfs AS DEFAULT STORAGE VOLUME;
3. Create the database and table
CREATE DATABASE dmp;
CREATE TABLE dmp.adt_ip_country_source (
`__dt` datetime NOT NULL COMMENT "data time, to the hour",
`ip` varchar(50) NOT NULL COMMENT "ip",
`country` varchar(3) NOT NULL COMMENT "country",
`source` varchar(16) NOT NULL COMMENT "IP source"
) ENGINE=OLAP
COMMENT "IP country table, used for IP activity statistics"
PARTITION BY date_trunc('hour', __dt)
DISTRIBUTED BY HASH(`ip`, `country`) BUCKETS 1
PROPERTIES (
"compression" = "LZ4",
"datacache.enable" = "true",
"datacache.partition_duration" = "1 days",
"enable_async_write_back" = "false",
"partition_live_number" = "840",
"replication_num" = "1",
"storage_volume" = "volume_hdfs"
);
4. Running the following statements reproduces the problem quickly. Once it has occurred, every later execution hangs until cn and fe are restarted.
EXPLAIN ANALYZE INSERT INTO adt_ip_country_source (__dt,ip,country,source) SELECT DATE_ADD('2026-02-01 00:00:00', INTERVAL d hour),'','','' FROM table(generate_series(0, 23)) AS g(d);
EXPLAIN ANALYZE INSERT INTO adt_ip_country_source (__dt,ip,country,source) SELECT DATE_ADD('2026-02-02 00:00:00', INTERVAL d hour),'','','' FROM table(generate_series(0, 23)) AS g(d);
#... I won't paste every statement. Change the date, one statement per day, then run them one after another in a mysql client. It triggers quickly, usually hanging after about 10 statements.
#... A plain insert into a new partition also triggers it, but less often.
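For anyone who wants the full series without editing dates by hand, here is a small sketch that prints one statement per day, following the pattern above. The function name and the choice of 10 days are only for illustration; the statement text is copied from this post.

```python
from datetime import date, timedelta

# Statement pattern from the reproduction steps; only the date changes.
TEMPLATE = (
    "EXPLAIN ANALYZE INSERT INTO adt_ip_country_source (__dt,ip,country,source) "
    "SELECT DATE_ADD('{day} 00:00:00', INTERVAL d hour),'','','' "
    "FROM table(generate_series(0, 23)) AS g(d);"
)


def build_statements(start, days):
    """Return one statement per consecutive day, beginning at `start`."""
    return [
        TEMPLATE.format(day=(start + timedelta(days=offset)).isoformat())
        for offset in range(days)
    ]


if __name__ == "__main__":
    # Paste the output into a mysql client and run the statements in order.
    for statement in build_statements(date(2026, 2, 1), 10):
        print(statement)
```

Each statement covers 24 hourly values, so with hourly partitioning every run should create a day's worth of new partitions.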
I'll give it a try.
I was ill for a while.

I ran the test with this setting added.
First execution of the SQL:
EXPLAIN ANALYZE INSERT INTO adt_ip_country_source (__dt,ip,country,source) SELECT DATE_ADD('2026-02-01 00:00:00', INTERVAL d hour),'','','' FROM table(generate_series(0, 23)) AS g(d);
The cn process did crash right away. Log before the crash:
I20260413 07:20:24.693708 140520190547520 tablet_sink_sender.cpp:353] Olap table sink statistics. load_id: 3d3b5d76-3709-11f1-a4e9-1e2de80dab7d, txn_id: 266461, add chunk time(ms)/wait lock time(ms)/num: {296476:(0)(0)(1)}
I20260413 07:20:24.698909 140520190547520 tablet_sink_sender.cpp:353] Olap table sink statistics. load_id: 3d3b5d76-3709-11f1-a4e9-1e2de80dab7d, txn_id: 266461, add chunk time(ms)/wait lock time(ms)/num: {296476:(0)(0)(1)}
I20260413 07:20:24.703985 140520190547520 tablet_sink_sender.cpp:353] Olap table sink statistics. load_id: 3d3b5d76-3709-11f1-a4e9-1e2de80dab7d, txn_id: 266461, add chunk time(ms)/wait lock time(ms)/num: {296476:(0)(0)(1)}
I20260413 07:20:24.709102 140520190547520 tablet_sink_sender.cpp:353] Olap table sink statistics. load_id: 3d3b5d76-3709-11f1-a4e9-1e2de80dab7d, txn_id: 266461, add chunk time(ms)/wait lock time(ms)/num: {296476:(0)(0)(1)}
I20260413 07:20:24.738090 140520190547520 tablet_sink_sender.cpp:353] Olap table sink statistics. load_id: 3d42d7de-3709-11f1-aecf-bedc8990b519, txn_id: 266462, add chunk time(ms)/wait lock time(ms)/num: {296476:(0)(0)(1)}
I20260413 07:20:24.764576 140520190547520 tablet_sink_sender.cpp:353] Olap table sink statistics. load_id: 3d476bc1-3709-11f1-aecf-bedc8990b519, txn_id: 266463, add chunk time(ms)/wait lock time(ms)/num: {296476:(0)(0)(1)}
W20260413 07:20:28.208846 140517568968256 stack_util.cpp:437] 2026-04-13 07:20:28.208818, query_id=00000000-0000-0000-0000-000000000000, fragment_instance_id=00000000-0000-0000-0000-000000000000 throws exception: std::system_error, trace:
@ 0xc106faf __wrap___cxa_throw
@ 0x140619b8 std::__throw_system_error(int)
@ 0xbf2f2da starrocks::StarOSWorker::new_shared_filesystem(std::basic_string_view<char, std::char_traits<char> >, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<charB^B
@ 0xbf30d34 starrocks::StarOSWorker::build_filesystem_from_shard_info(staros::starlet::ShardInfo const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::aB^B
@ 0xbf323de starrocks::StarOSWorker::get_shard_filesystem(unsigned long, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std:B^B
@ 0x81ead1c starrocks::StarletFileSystem::delete_dir(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
@ 0xa7deb31 starrocks::lake::LoadSpillBlockManager::clear_parent_path()
@ 0xa7df033 starrocks::lake::LoadSpillBlockManager::~LoadSpillBlockManager()
@ 0xbcaa96f starrocks::lake::DeltaWriter::~DeltaWriter()
@ 0xbef9cab starrocks::lake::AsyncDeltaWriter::~AsyncDeltaWriter()
@ 0xbeee5b8 starrocks::LakeTabletsChannel::~LakeTabletsChannel()
@ 0x8103d3a std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release()
@ 0xbdea345 starrocks::LoadChannel::_add_chunk(starrocks::Chunk*, starrocks::MonotonicStopWatch const*, starrocks::PTabletWriterAddChunkRequest const&, starrocks::PTabletWriterAddBatchResult*)
@ 0xbdeb545 starrocks::LoadChannel::add_chunks(starrocks::PTabletWriterAddChunksRequest const&, starrocks::PTabletWriterAddBatchResult*)
@ 0xbddf054 starrocks::LoadChannelMgr::add_chunks(starrocks::PTabletWriterAddChunksRequest const&, starrocks::PTabletWriterAddBatchResult*)
@ 0xbfafb41 starrocks::BackendInternalServiceImpl<starrocks::PInternalService>::tablet_writer_add_chunks(google::protobuf::RpcController*, starrocks::PTabletWriterAddChunksRequest const*, starrocks::PTabletWriterAddBatchResult*, google::protobuf::Closure*)
@ 0x1038f253 brpc::policy::ProcessRpcRequest(brpc::InputMessageBase*)
@ 0x102b493b brpc::ProcessInputMessage(void*)
@ 0x102b5d84 brpc::InputMessenger::OnNewMessages(brpc::Socket*)
@ 0x102f26a2 brpc::Socket::ProcessEvent(void*)
@ 0x10269837 bthread::TaskGroup::task_runner(long)
@ 0x10252821 bthread_make_fcontext
I20260413 07:20:37.578325 139633710034112 daemon.cpp:344] version 3.5.14-23a56ec
I then ran the SQL several more times:
EXPLAIN ANALYZE INSERT INTO adt_ip_country_source (__dt,ip,country,source) SELECT DATE_ADD('2026-02-01 00:00:00', INTERVAL d hour),'','','' FROM table(generate_series(0, 23)) AS g(d);
The crash did not happen again. The statement just hangs, and no new partitions can be created.
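Do you have the stack trace of the crash?
I did not capture a stack trace at the time.
I have not been able to reproduce the crash since. My guess is that the crash was triggered by an internal "static" statement (perhaps a statistics job).
I see 3.5.15 has been released. I'll try the new version shortly to see whether the problem reproduces.
I just tested 3.5.15 and it has the same problem.
Inserting a few rows on their own does not trigger it.
An INSERT or EXPLAIN ANALYZE statement that ends up creating multiple partitions triggers it easily.
Thank you. I'll wait for 3.5.16 and test again.
In the current version, you can set the CN parameter enable_load_spill = false to turn off load spill, which avoids the LoadSpillBlockManager code path. That should stop the hang.
I just tested 3.5.16 and the problem is fixed.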
