【详述】BE IO Util 突然飙升到100%
【背景】使用k8s-operator部署集群,存算分离模式
【业务影响】
【是否存算分离】是
【StarRocks版本】3.2.1
【集群规模】3fe(1 follower+2observer)+ 3cn(fe与cn混部)
【机器信息】fe:8C/16G/万兆 cn:28C/96G/万兆
【附件】
mysql> admin execute on 10004 ’
'> System.print(ExecEnv.io_profile_and_get_topn_stats(“all”, 30, 10))
'> ';
10004是be_id, show backends; 可以看到
30是采样时间
10 是取 TOP 10
show backends;没有值,部署的cn节点,存算分离模式
iotop 看下吧,估计是 data cache
看起来IO也不高啊
主要在查PK表吗?
不是,主要是通过stream load方式导入数据到SR中形成base表,基于base表做了物化视图用于查询
最近发起过大表的SchemaChange吗
同步物化视图,还是异步物化视图
这是基础表和物化视图的结构,主要就是这张表
CREATE TABLE trace
(
trace_id
varchar(32) NOT NULL COMMENT “”,
span_id
varchar(16) NULL COMMENT “”,
start_time
datetime NULL COMMENT “”,
start_time_unix_nano
bigint(20) NULL COMMENT “”,
service_name
varchar(255) NULL COMMENT “”,
namespaces
varchar(24) NULL COMMENT “”,
resource
json NULL COMMENT “”,
scope_name
varchar(255) NULL COMMENT “”,
scope_version
varchar(255) NULL COMMENT “”,
end_time_unix_nano
bigint(20) NULL COMMENT “”,
kind
int(11) NULL COMMENT “”,
name
varchar(255) NULL COMMENT “”,
parent_span_id
varchar(255) NULL COMMENT “”,
status_code
int(11) NULL COMMENT “”,
status_message
varchar(255) NULL COMMENT “”,
events
json NULL COMMENT “”,
links
json NULL COMMENT “”,
attributes
json NULL COMMENT “”,
duration
bigint(20) NULL COMMENT “”,
net_peer_name
varchar(255) NULL COMMENT “”,
db_type
varchar(255) NULL COMMENT “”,
op_type
varchar(255) NULL COMMENT “”
) ENGINE=OLAP
DUPLICATE KEY(trace_id
)
COMMENT “OLAP”
PARTITION BY date_trunc(‘hour’, start_time)
DISTRIBUTED BY HASH(trace_id
) BUCKETS 32
PROPERTIES (
“replication_num” = “1”,
“datacache.partition_duration” = “12 hours”,
“datacache.enable” = “true”,
“storage_volume” = “builtin_storage_volume”,
“enable_async_write_back” = “false”,
“enable_persistent_index” = “false”,
“compression” = “LZ4”
);
CREATE MATERIALIZED VIEW trace_app
(start_time
, namespaces
, service_name
, name
, duration
, net_peer_name
, status_code
, trace_id
)
COMMENT “MATERIALIZED_VIEW”
PARTITION BY (date_trunc(‘hour’, start_time
))
DISTRIBUTED BY HASH(namespaces
, service_name
) BUCKETS 12
ORDER BY (start_time,namespaces,service_name)
REFRESH ASYNC EVERY(INTERVAL 5 MINUTE)
PROPERTIES (
“replicated_storage” = “true”,
“replication_num” = “1”,
“partition_ttl” = “7 DAY”,
“datacache.enable” = “true”,
“enable_async_write_back” = “false”,
“storage_volume” = “builtin_storage_volume”
)
AS SELECT trace
.start_time
, trace
.namespaces
, trace
.service_name
, trace
.name
, trace
.duration
, trace
.net_peer_name
, trace
.status_code
, trace
.trace_id
FROM opentelemetry
.trace
;
你加我下,我拉你到一个存算分离的群里,让我们负责这块的同学看下?
加您了,通过一下