Also, this backtrace seems to show only one thread in starrocks::StarOSWorker::get_shard_filesystem. When the core was taken, was the cluster's visible symptom that it was stuck and timing out?
When the gcore was taken, the pod's node and IP did not change, but the CN process showed as restarted. The screenshot is from K9s, where the value in the red circle increments by 1.
The gcore file is 20+ GB. I'll re-export a copy on Monday and send it to you via a netdisk link.
While gcore runs, the process is unresponsive, so the pod health check fails and kubelet kills and restarts the pod.
I re-exported the core file (20+ GB).
Baidu Netdisk share: starrocks_be_core.27
Link: https://pan.baidu.com/s/1F6TfxLW-yJiMDBbMvs4l1A?pwd=av8k  Extraction code: av8k
Which version is the CN running?
This core was taken from 3.5.14.
Thread 532 (Thread 0x7f978de36640 (LWP 755)):
#0 __futex_abstimed_wait_common (cancel=false, private=<optimized out>, abstime=0x0, clockid=0, expected=3, futex_word=0x7f98ae3e222c) at ./nptl/futex-internal.c:103
#1 __GI___futex_abstimed_wait64 (futex_word=futex_word@entry=0x7f98ae3e222c, expected=expected@entry=3, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=<optimized out>) at ./nptl/futex-internal.c:128
#2 0x00007f98b369224f in __pthread_rwlock_wrlock_full64 (abstime=0x0, clockid=0, rwlock=0x7f98ae3e2220) at ./nptl/pthread_rwlock_common.c:730
#3 ___pthread_rwlock_wrlock (rwlock=0x7f98ae3e2220) at ./nptl/pthread_rwlock_wrlock.c:26
#4 0x000000000bf2ee29 in std::__glibcxx_rwlock_wrlock (__rwlock=0x7f98ae3e2220) at /usr/include/c++/11/shared_mutex:80
#5 std::__shared_mutex_pthread::lock (this=0x7f98ae3e2220) at /usr/include/c++/11/shared_mutex:193
#6 std::shared_mutex::lock (this=0x7f98ae3e2220) at /usr/include/c++/11/shared_mutex:420
#7 std::unique_lock<std::shared_mutex>::lock (this=<synthetic pointer>) at /usr/include/c++/11/bits/unique_lock.h:139
#8 std::unique_lock<std::shared_mutex>::unique_lock (__m=..., this=<synthetic pointer>) at /usr/include/c++/11/bits/unique_lock.h:69
#9 starrocks::StarOSWorker::new_shared_filesystem (this=this@entry=0x7f98ae3e2190, scheme=..., conf=...) at be/src/service/staros_worker.cpp:364
#10 0x000000000bf30d34 in starrocks::StarOSWorker::build_filesystem_from_shard_info (this=this@entry=0x7f98ae3e2190, info=..., conf=...) at /usr/include/c++/11/string_view:137
#11 0x000000000bf323de in starrocks::StarOSWorker::get_shard_filesystem (this=0x7f98ae3e2190, id=83521, conf=...) at be/src/service/staros_worker.cpp:239
#12 0x00000000081ead1c in starrocks::StarletFileSystem::get_shard_filesystem (shard_id=<optimized out>, this=0x7f97b26cf000) at /usr/include/c++/11/bits/shared_ptr_base.h:1295
#13 starrocks::StarletFileSystem::delete_dir (this=0x7f97b26cf000, dirname=...) at be/src/fs/fs_starlet.cpp:467
#14 0x000000000a7deb31 in starrocks::lake::LoadSpillBlockManager::clear_parent_path (this=this@entry=0x7f96e53f68e0) at be/src/storage/lake/load_spill_block_manager.cpp:103
#15 0x000000000a7df033 in starrocks::lake::LoadSpillBlockManager::~LoadSpillBlockManager (this=this@entry=0x7f96e53f68e0, __in_chrg=<optimized out>) at be/src/storage/lake/load_spill_block_manager.cpp:90
mutex lock: this=0x7f98ae3e2220
(gdb) p /x *this
$6 = {_M_rwlock = {__data = {__readers = 0xfffffffb, __writers = 0x0, __wrphase_futex = 0x1, __writers_futex = 0x3, __pad3 = 0x0, __pad4 = 0x0, __cur_writer = 0x2fa, __shared = 0x0, __rwelision = 0x0, __pad1 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, __pad2 = 0x0, __flags = 0x0}, __size = {0xfb, 0xff, 0xff, 0xff, 0x0, 0x0, 0x0, 0x0, 0x1, 0x0, 0x0, 0x0, 0x3, 0x0 <repeats 11 times>, 0xfa, 0x2, 0x0 <repeats 30 times>}, __align = 0xfffffffb}}
The lock is currently held exclusively by a writer (not read-shared).
Write-lock holder TID = 762 (__cur_writer = 0x2fa = 762).
The write lock has been re-entered 5 times.
No other threads are waiting for the write lock.
(gdb) t 539
[Switching to thread 539 (Thread 0x7f979133d640 (LWP 762))]
#0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
38 ../sysdeps/unix/sysv/linux/x86_64/syscall.S: No such file or directory.
(gdb) bt
#0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1 0x0000000010266dda in bthread::futex_wait_private (timeout=0x0, expected=<optimized out>, addr1=<optimized out>) at ./src/bthread/sys_futex.h:40
#2 bthread::ParkingLot::wait (expected_state=..., this=<optimized out>) at ./src/bthread/parking_lot.h:60
#3 bthread::TaskGroup::wait_task (this=this@entry=0x7f98a8b44080, tid=tid@entry=0x7f979132d6f8) at src/bthread/task_group.cpp:133
#4 0x0000000010269d6b in bthread::TaskGroup::run_main_task (this=this@entry=0x7f98a8b44080) at src/bthread/task_group.cpp:161
#5 0x0000000010263dc2 in bthread::TaskControl::worker_thread (arg=<optimized out>) at src/bthread/task_control.cpp:99
#6 0x00007f98b368bac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#7 0x00007f98b371d8d0 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
TID 762 is a bthread worker thread. Most likely a bthread took the lock and was then rescheduled onto a different worker pthread, so the later unlock() ran on a thread that does not own the lock and had no effect.
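The hang itself is easy to model: once the exclusive lock is taken and never effectively released, every later writer parks exactly as thread 532 does. A minimal sketch with a plain std::shared_mutex (illustration only, not StarRocks code):

```cpp
#include <shared_mutex>
#include <thread>

// Writer 1 holds the exclusive lock; writer 2 (a second thread) cannot
// take it. If writer 1's unlock() is ineffective -- e.g. it runs on the
// wrong pthread after a bthread migration -- writer 2 waits forever.
bool second_writer_can_lock() {
    std::shared_mutex mu;
    mu.lock();  // writer 1: exclusive lock taken and held
    bool acquired = true;
    std::thread t([&] {
        acquired = mu.try_lock();  // writer 2: fails while writer 1 holds it
        if (acquired) mu.unlock();
    });
    t.join();
    mu.unlock();  // released by the same thread that locked it
    return acquired;
}
```

With `lock()` instead of `try_lock()`, writer 2 would block inside `__pthread_rwlock_wrlock_full64`, matching the backtrace of thread 532.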
To reproduce with a crash at the first point of failure:
export PTHREAD_MUTEX_ERRORCHECK=1
Inject this environment variable into the pod and try to reproduce. When a mutex is unlocked from a thread other than its owner, the process will crash.
That should give more information to troubleshoot with.
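For reference, this is the class of error that error checking surfaces: unlocking a default pthread mutex from a thread that does not own it is undefined behavior, while an error-checking mutex (the `PTHREAD_MUTEX_ERRORCHECK` type) rejects the non-owner unlock with `EPERM`. A minimal sketch using a plain pthread mutex (the shared_mutex in the backtrace is an rwlock, where a non-owner write unlock is likewise undefined):

```cpp
#include <cerrno>
#include <pthread.h>
#include <thread>

// Lock an error-checking mutex on one thread, then unlock it from
// another: the non-owner unlock is rejected with EPERM instead of
// silently corrupting the ownership state.
int unlock_from_wrong_thread() {
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_t mu;
    pthread_mutex_init(&mu, &attr);
    pthread_mutexattr_destroy(&attr);

    pthread_mutex_lock(&mu);  // owner: the current thread
    int rc = 0;
    std::thread t([&] { rc = pthread_mutex_unlock(&mu); });  // non-owner
    t.join();

    pthread_mutex_unlock(&mu);  // the real owner releases it
    pthread_mutex_destroy(&mu);
    return rc;  // EPERM per POSIX for an error-checking mutex
}
```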
Also, is this a reliably reproducible problem? If there's a minimal reproduction, I'd like to try reproducing it on my side.
Minimal reproduction steps:
[Cluster version] Fresh deployment of 3.5.14
[Cluster scale] 3 FE nodes (8 cores, 20 GB memory, 50 GB disk each)
1 CN node (16 cores, 64 GB memory, 100 GB disk)
[Hadoop version] Hadoop 3.1.1, deployed via Apache Ambari 2.7.5.0
CN ConfigMap:
# cn config
apiVersion: v1
kind: ConfigMap
metadata:
  name: starrocks-cn-cm
  namespace: bd-starrocks
  labels:
    cluster: starrocks
data:
  cn.conf: |
    JAVA_OPTS="--add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED"
    storage_root_path = /opt/starrocks/cn/storage/root
    spill_local_storage_dir = /opt/starrocks/cn/storage/spill
    datacache_enable = false
    datacache_mem_size = 10%
    datacache_disk_size = 107374182400
    mem_limit = 90%
    report_task_interval_seconds = 10
    starlet_star_cache_disk_size_percent = 20
FE ConfigMap:
# fe config
apiVersion: v1
kind: ConfigMap
metadata:
  name: starrocks-fe-cm
  namespace: bd-starrocks
  labels:
    cluster: starrocks
data:
  fe.conf: |
    LOG_DIR = /opt/starrocks/fe/log
    DATE = "$(date +%Y%m%d-%H%M%S)"
    JAVA_OPTS="-Dlog4j2.formatMsgNoLookups=true -Xms8192m -Xmx8192m -XX:+UseG1GC -Xlog:gc*:${LOG_DIR}/fe.gc.log.$DATE -XX:ErrorFile=${LOG_DIR}/hs_err_pid%p.log -Djava.security.policy=${STARROCKS_HOME}/conf/udf_security.policy"
    http_port = 8030
    rpc_port = 9020
    query_port = 9030
    edit_log_port = 9010
    sys_log_level = INFO
    mysql_service_nio_enabled = true
    tablet_create_timeout_second = 60
    fast_schema_evolution = true
    # config for shared-data mode
    run_mode = shared_data
    cloud_native_meta_port = 6090
    cloud_native_storage_type = S3
    enable_load_volume_from_conf = false
    enable_udf = true
    max_automatic_partition_number = 87600
    enable_statistic_collect = false
    enable_collect_full_statistic = false
Hadoop ConfigMap:
apiVersion: v1
kind: ConfigMap
metadata:
  name: starrocks-hdfs-cm
  namespace: bd-starrocks
  labels:
    cluster: starrocks
data:
  hadoop_env.sh: |
    export HADOOP_USER_NAME="starrocks"
    export HADOOP_CLASSPATH=${STARROCKS_HOME}/lib/hadoop/common/*:${STARROCKS_HOME}/lib/hadoop/common/lib/*:${STARROCKS_HOME}/lib/hadoop/hdfs/*:${STARROCKS_HOME}/lib/hadoop/hdfs/lib/*
    if [ -z "${HADOOP_USER_NAME}" ]
    then
      if [ -z "${USER}" ]
      then
        export HADOOP_USER_NAME=$(id -u -n)
      else
        export HADOOP_USER_NAME=${USER}
      fi
    fi
    if [ ${HADOOP_CONF_DIR}"X" != "X" ]; then
      export HADOOP_CLASSPATH=${HADOOP_CONF_DIR}:${HADOOP_CLASSPATH}
    fi
  hdfs-site.xml: |
    <configuration xmlns:xi="http://www.w3.org/2001/XInclude">
      <property>
        <name>dfs.nameservices</name>
        <value>ljx</value>
      </property>
      <property>
        <name>dfs.ha.namenodes.ljx</name>
        <value>nn1,nn2</value>
      </property>
      <property>
        <name>dfs.namenode.rpc-address.ljx.nn1</name>
        <value>ljx-bd-c1-nn01.ljximing.int:8020</value>
      </property>
      <property>
        <name>dfs.namenode.rpc-address.ljx.nn2</name>
        <value>ljx-bd-c1-nn02.ljximing.int:8020</value>
      </property>
      <property>
        <name>dfs.client.failover.proxy.provider.ljx</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
      </property>
      <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
      </property>
    </configuration>
StarRocksCluster resource:
apiVersion: starrocks.com/v1
kind: StarRocksCluster
metadata:
  name: starrocks
  namespace: bd-starrocks
spec:
  starRocksFeSpec:
    image: harbor.yowin.mobi/bd/starrocks-fe-ubuntu:3.5.14
    podLabels:
      app: starrocks-fe
      cluster: starrocks
    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - starrocks-fe
          topologyKey: kubernetes.io/hostname
    feEnvVars:
    - name: "MYSQL_PWD"
      valueFrom:
        secretKeyRef:
          name: sr-credential
          key: password
    replicas: 3
    requests:
      cpu: 7
      memory: 16Gi
    limits:
      cpu: 7
      memory: 18Gi
    storageVolumes:
    - name: starrocks-fe-storage
      storageClassName: oci-bv
      storageSize: 100Gi
      mountPath: /opt/starrocks/fe/meta
    service:
      type: LoadBalancer
    configMapInfo:
      configMapName: starrocks-fe-cm
      resolveKey: fe.conf
    configMaps:
    - name: starrocks-hdfs-cm
      mountPath: /opt/starrocks/fe/conf/hdfs-site.xml
      subPath: "hdfs-site.xml"
    - name: starrocks-hdfs-cm
      mountPath: /opt/starrocks/fe/conf/hadoop_env.sh
      subPath: "hadoop_env.sh"
  starRocksCnSpec:
    image: harbor.yowin.mobi/bd/starrocks-cn-ubuntu:3.5.14
    podLabels:
      app: starrocks-cn
      cluster: starrocks
    cnEnvVars:
    - name: "MYSQL_PWD"
      valueFrom:
        secretKeyRef:
          name: sr-credential
          key: password
    requests:
      cpu: 14
      memory: 57Gi
    replicas: 1
    configMapInfo:
      configMapName: starrocks-cn-cm
      resolveKey: cn.conf
    configMaps:
    - name: starrocks-hdfs-cm
      mountPath: /opt/starrocks/cn/conf/hdfs-site.xml
      subPath: "hdfs-site.xml"
    - name: starrocks-hdfs-cm
      mountPath: /opt/starrocks/cn/conf/hadoop_env.sh
      subPath: "hadoop_env.sh"
    storageVolumes:
    - name: starrocks-cn-storage
      storageClassName: oci-bv
      storageSize: 100Gi
      mountPath: /opt/starrocks/cn/storage
    persistentVolumeClaimRetentionPolicy:
      whenDeleted: Retain
      whenScaled: Delete
1. Initialize the cluster root account
2. Create volume_hdfs
CREATE STORAGE VOLUME volume_hdfs
TYPE = HDFS
LOCATIONS = ("hdfs://ljx/apps/starrocks/volume-test/")
PROPERTIES (
"username" = "starrocks",
"hadoop.security.authentication" = "simple"
);
SET volume_hdfs AS DEFAULT STORAGE VOLUME;
3. Create the database and table
CREATE DATABASE dmp;
CREATE TABLE dmp.adt_ip_country_source (
`__dt` datetime NOT NULL COMMENT "Data timestamp, hourly granularity",
`ip` varchar(50) NOT NULL COMMENT "IP",
`country` varchar(3) NOT NULL COMMENT "Country",
`source` varchar(16) NOT NULL COMMENT "IP source"
) ENGINE=OLAP
COMMENT "IP-country table, used for IP activity statistics"
PARTITION BY date_trunc('hour', __dt)
DISTRIBUTED BY HASH(`ip`, `country`) BUCKETS 1
PROPERTIES (
"compression" = "LZ4",
"datacache.enable" = "true",
"datacache.partition_duration" = "1 days",
"enable_async_write_back" = "false",
"partition_live_number" = "840",
"replication_num" = "1",
"storage_volume" = "volume_hdfs"
);
4. Executing the statements below reproduces it quickly. Once the problem occurs, without restarting the CN and FE, every subsequent execution hangs.
EXPLAIN ANALYZE INSERT INTO adt_ip_country_source (__dt,ip,country,source) SELECT DATE_ADD('2026-02-01 00:00:00', INTERVAL d hour),'','','' FROM table(generate_series(0, 23)) AS g(d);
EXPLAIN ANALYZE INSERT INTO adt_ip_country_source (__dt,ip,country,source) SELECT DATE_ADD('2026-02-02 00:00:00', INTERVAL d hour),'','','' FROM table(generate_series(0, 23)) AS g(d);
#... I won't paste every statement: bump the date by one day per statement, then run them one by one in a mysql client. It triggers quickly, usually hanging within about 10 statements.
#... A plain INSERT INTO targeting a new partition also triggers it, but less reliably.
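To avoid hand-editing dates, the per-day statements can be generated mechanically; a hypothetical helper (the table name and date range come from the steps above):

```cpp
#include <cstdio>
#include <string>

// Build the repro statement for a given day of February 2026, varying
// only the date (2026-02-01, 2026-02-02, ...), mirroring the manual
// steps described above.
std::string repro_stmt(int day) {
    char buf[512];
    std::snprintf(buf, sizeof(buf),
        "EXPLAIN ANALYZE INSERT INTO adt_ip_country_source (__dt,ip,country,source) "
        "SELECT DATE_ADD('2026-02-%02d 00:00:00', INTERVAL d hour),'','','' "
        "FROM table(generate_series(0, 23)) AS g(d);",
        day);
    return buf;
}
```

Printing `repro_stmt(1)` through `repro_stmt(10)` and feeding the output to a mysql client one statement at a time matches the reproduction loop described above.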
I'll give it a try.
