1月18日 15点开始创建表耗时时间超长

为了更快的定位您的问题,请提供以下信息,谢谢
【详述】1月18日 15点开始创建表耗时时间超长,语句如下
MySQL [sjzx_db]> CREATE TABLE dc_share_fgsqdjx_pz34 (
-> c_dlsbh varchar(65533) NOT NULL COMMENT “代理编号”,
-> c_hzny date NOT NULL COMMENT “汇总年月”,
-> c_cplx varchar(65533) NOT NULL COMMENT “产品类型 6:邮驿付”,
-> c_dlsmc varchar(65533) NULL COMMENT “代理名称”,
-> c_ywlx varchar(65533) NULL COMMENT “业务类型 1:代理 2:银行”,
-> c_sssf varchar(65533) NULL COMMENT “省份”,
-> c_gw varchar(65533) NULL COMMENT “岗位 1:业务 2:运营 3:调剂 4:分总”,
-> c_xm varchar(65533) NULL COMMENT “姓名”,
-> c_bl decimal(20,10) NULL COMMENT “比例”,
-> c_cjsj datetime NULL DEFAULT CURRENT_TIMESTAMP COMMENT “创建时间”
-> ) ENGINE=OLAP
-> DUPLICATE KEY(c_dlsbh,c_hzny,c_cplx)
-> COMMENT “分公司渠道绩效配置”
-> DISTRIBUTED BY HASH(c_dlsbh) BUCKETS 8
-> PROPERTIES (
-> “replication_num” = “3”,
-> “in_memory” = “false”,
-> “storage_format” = “DEFAULT”,
-> “enable_persistent_index” = “false”,
-> “compression” = “LZ4”
-> );
Query OK, 0 rows affected (1 min 1.89 sec)

集群资源使用情况不高

be节点的状态如下:


【业务影响】ddl操作耗时增加,查询正常
【是否存算分离】否
【StarRocks版本】2.5.16
【集群规模】3fe(1 follower+2observer)+12be(fe与be混部)
【机器信息】40C/128G/万兆
【联系方式】 StarRocks社区群5 -思变 [1412195108@qq.com]
【附件】
其中最近一个2024-01-18 17:28:43,516 WARN (starrocks-mysql-nio-pool-190|4812) [SimpleScheduler.addToBlacklist():235] add black list 2273437

BackendId为2273437的be.warning和pstack采集的日志如下:
be_WARNING.tar.gz (25.1 MB)

pstack.log (880.4 KB)

从fe leader 的fe.log 搜索关键词
cat fe.warn.log|grep “add black list”
2024-01-18 15:16:26,464 WARN (starrocks-mysql-nio-pool-66434|1920818) [SimpleScheduler.addToBlacklist():235] add black list 1392750
2024-01-18 15:17:26,523 WARN (starrocks-mysql-nio-pool-66434|1920818) [SimpleScheduler.addToBlacklist():235] add black list 3482351
2024-01-18 15:25:56,913 WARN (starrocks-mysql-nio-pool-66481|1922378) [SimpleScheduler.addToBlacklist():235] add black list 2273438
2024-01-18 16:11:55,225 WARN (starrocks-mysql-nio-pool-66769|1934651) [SimpleScheduler.addToBlacklist():235] add black list 3482351
2024-01-18 16:12:53,313 WARN (starrocks-mysql-nio-pool-66788|1935182) [SimpleScheduler.addToBlacklist():235] add black list 1392750
2024-01-18 16:12:53,611 WARN (starrocks-mysql-nio-pool-66787|1935181) [SimpleScheduler.addToBlacklist():235] add black list 1392750
2024-01-18 16:12:55,312 WARN (starrocks-mysql-nio-pool-66769|1934651) [SimpleScheduler.addToBlacklist():235] add black list 1392750
2024-01-18 16:13:10,211 WARN (starrocks-mysql-nio-pool-66784|1935171) [SimpleScheduler.addToBlacklist():235] add black list 1392750
2024-01-18 16:18:21,918 WARN (thrift-server-pool-1821482|1935085) [SimpleScheduler.addToBlacklist():235] add black list 2273438
2024-01-18 16:22:31,328 WARN (thrift-server-pool-1822410|1936039) [SimpleScheduler.addToBlacklist():235] add black list 3482351
2024-01-18 16:23:25,422 WARN (starrocks-mysql-nio-pool-66818|1936822) [SimpleScheduler.addToBlacklist():235] add black list 1392750
2024-01-18 16:24:09,620 WARN (thrift-server-pool-1823238|1936893) [SimpleScheduler.addToBlacklist():235] add black list 1392750
2024-01-18 16:29:04,214 WARN (starrocks-mysql-nio-pool-66852|1938790) [SimpleScheduler.addToBlacklist():235] add black list 2273438
2024-01-18 17:28:34,820 WARN (starrocks-mysql-nio-pool-186|4687) [SimpleScheduler.addToBlacklist():235] add black list 2273437
2024-01-18 17:28:34,820 WARN (starrocks-mysql-nio-pool-180|4654) [SimpleScheduler.addToBlacklist():235] add black list 2273437
2024-01-18 17:28:43,516 WARN (starrocks-mysql-nio-pool-190|4812) [SimpleScheduler.addToBlacklist():235] add black list 2273437

重启be节点 2273438问题依旧,接着重启be节点1392750 ,建表语句耗时恢复正常

1392750 这个be节点故障期间的日志方便发下么,be.INFO

be.INFO.tar.gz (9.5 MB)

目前,add black list 搜索出来的be我们都重启了一遍,但是监控fe leader 的fe.log 还会出现

建表语句刚恢复正常后现在又很慢了

be 节点的IO使用率高不

io都不高,idle都在95%以上

[root@gpseg15 ~]# mpstat 1 10
Linux 3.10.0-693.el7.x86_64 (gpseg15) 01/18/2024 x86_64 (40 CPU)

07:45:46 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
07:45:47 PM all 18.51 0.00 0.77 0.00 0.00 0.02 0.00 0.00 0.00 80.69
07:45:48 PM all 11.34 0.00 0.40 0.27 0.00 0.00 0.00 0.00 0.00 87.99
07:45:49 PM all 5.27 0.00 0.25 0.83 0.00 0.00 0.00 0.00 0.00 93.65
07:45:50 PM all 2.65 0.00 0.12 0.00 0.00 0.02 0.00 0.00 0.00 97.20
07:45:51 PM all 3.13 0.00 0.13 0.00 0.00 0.00 0.00 0.00 0.00 96.75
07:45:52 PM all 1.67 0.00 0.12 0.00 0.00 0.00 0.00 0.00 0.00 98.20
07:45:53 PM all 0.20 0.00 0.10 0.00 0.00 0.00 0.00 0.00 0.00 99.70
07:45:54 PM all 1.22 0.00 0.17 0.00 0.00 0.00 0.00 0.00 0.00 98.60
07:45:55 PM all 10.34 0.00 0.83 0.00 0.00 0.00 0.00 0.00 0.00 88.84
07:45:56 PM all 1.82 0.00 0.12 0.00 0.00 0.00 0.00 0.00 0.00 98.05
Average: all 5.62 0.00 0.30 0.11 0.00 0.00 0.00 0.00 0.00 93.96

io看着都不高

您创建一个单副本单个桶的表可以复现建表耗时长这个问题么

因为影响生产时间比较久了,我们刚把所有be都重启了一遍,目前都正常,创建单副本单桶的表也正常

  1. show tablet from dc_share_fgsqdjx_pz34;创建慢但是成功了,这个表如果没有重建, 找一个tablet
  2. 到be 中搜索下这个tablet,应该有三个节点可以搜索到信息,这个信息请您提供下

tablet信息.txt (12.7 KB)

dc_share_fgsqdjx_pz34已经重建,dc_share_fgsqdjx_pz11这个应该是当时有问题期间创建卡住的

  1. grep “dc_share_fgsqdjx_pz11” fe.log 2. grep “dc_share_fgsqdjx_pz11” fe.audit.log 您连接哪个fe执行的,请去哪个fe上执行下,看下对应的日志

fe.audit.log:2024-01-18 16:31:58,586 [query] |Client=10.168.89.25:63714|User=root|AuthorizedUser=‘root’@’%’|ResourceGroup=|Catalog=default_catalog|Db=sjzx_db|State=ERR|ErrorCode=|Time=240037|ScanBytes=0|ScanRows=0|ReturnRows=0|StmtId=5006798|QueryId=7c739960-b5db-11ee-aac5-6805caecd97c|IsQuery=false|feIp=10.168.89.25|Stmt=CREATE TABLE dc_share_fgsqdjx_pz11 ( c_dlsbh varchar(65533) NOT NULL COMMENT “代理编号”, c_hzny date NOT NULL COMMENT “汇总年月”, c_cplx varchar(65533) NOT NULL COMMENT “产品类型 6:邮驿付”, c_dlsmc varchar(65533) NULL COMMENT “代理名称”, c_ywlx varchar(65533) NULL COMMENT “业务类型 1:代理 2:银行”, c_sssf varchar(65533) NULL COMMENT “省份”, c_gw varchar(65533) NULL COMMENT “岗位 1:业务 2:运营 3:调剂 4:分总”, c_xm varchar(65533) NULL COMMENT “姓名”, c_bl decimal(20,10) NULL COMMENT “比例”, c_cjsj datetime NULL DEFAULT CURRENT_TIMESTAMP COMMENT “创建时间” ) ENGINE=OLAP DUPLICATE KEY(c_dlsbh,c_hzny,c_cplx) COMMENT “分公司渠道绩效配置” DISTRIBUTED BY HASH(c_dlsbh) BUCKETS 8 PROPERTIES ( “replication_num” = “3”, “in_memory” = “false”, “storage_format” = “DEFAULT”, “enable_persistent_index” = “false”, “compression” = “LZ4” )|Digest=
fe.audit.log:2024-01-18 16:31:58,586 [slow_query] |Client=10.168.89.25:63714|User=root|AuthorizedUser=‘root’@’%’|ResourceGroup=|Catalog=default_catalog|Db=sjzx_db|State=ERR|ErrorCode=|Time=240037|ScanBytes=0|ScanRows=0|ReturnRows=0|StmtId=5006798|QueryId=7c739960-b5db-11ee-aac5-6805caecd97c|IsQuery=false|feIp=10.168.89.25|Stmt=CREATE TABLE dc_share_fgsqdjx_pz11 ( c_dlsbh varchar(65533) NOT NULL COMMENT “代理编号”, c_hzny date NOT NULL COMMENT “汇总年月”, c_cplx varchar(65533) NOT NULL COMMENT “产品类型 6:邮驿付”, c_dlsmc varchar(65533) NULL COMMENT “代理名称”, c_ywlx varchar(65533) NULL COMMENT “业务类型 1:代理 2:银行”, c_sssf varchar(65533) NULL COMMENT “省份”, c_gw varchar(65533) NULL COMMENT “岗位 1:业务 2:运营 3:调剂 4:分总”, c_xm varchar(65533) NULL COMMENT “姓名”, c_bl decimal(20,10) NULL COMMENT “比例”, c_cjsj datetime NULL DEFAULT CURRENT_TIMESTAMP COMMENT “创建时间” ) ENGINE=OLAP DUPLICATE KEY(c_dlsbh,c_hzny,c_cplx) COMMENT “分公司渠道绩效配置” DISTRIBUTED BY HASH(c_dlsbh) BUCKETS 8 PROPERTIES ( “replication_num” = “3”, “in_memory” = “false”, “storage_format” = “DEFAULT”, “enable_persistent_index” = “false”, “compression” = “LZ4” )|Digest=
fe.audit.log:2024-01-18 16:40:38,810 [query] |Client=10.168.89.25:63714|User=root|AuthorizedUser=‘root’@’%’|ResourceGroup=|Catalog=default_catalog|Db=sjzx_db|State=OK|ErrorCode=|Time=166921|ScanBytes=0|ScanRows=0|ReturnRows=0|StmtId=5011119|QueryId=de1c4108-b5dc-11ee-aac5-6805caecd97c|IsQuery=false|feIp=10.168.89.25|Stmt=CREATE TABLE dc_share_fgsqdjx_pz11 ( c_dlsbh varchar(65533) NOT NULL COMMENT “代理编号”, c_hzny date NOT NULL COMMENT “汇总年月”, c_cplx varchar(65533) NOT NULL COMMENT “产品类型 6:邮驿付”, c_dlsmc varchar(65533) NULL COMMENT “代理名称”, c_ywlx varchar(65533) NULL COMMENT “业务类型 1:代理 2:银行”, c_sssf varchar(65533) NULL COMMENT “省份”, c_gw varchar(65533) NULL COMMENT “岗位 1:业务 2:运营 3:调剂 4:分总”, c_xm varchar(65533) NULL COMMENT “姓名”, c_bl decimal(20,10) NULL COMMENT “比例”, c_cjsj datetime NULL DEFAULT CURRENT_TIMESTAMP COMMENT “创建时间” ) ENGINE=OLAP DUPLICATE KEY(c_dlsbh,c_hzny,c_cplx) COMMENT “分公司渠道绩效配置” DISTRIBUTED BY HASH(c_dlsbh) BUCKETS 8 PROPERTIES ( “replication_num” = “3”, “in_memory” = “false”, “storage_format” = “DEFAULT”, “enable_persistent_index” = “false”, “compression” = “LZ4” )|Digest=
fe.audit.log:2024-01-18 16:40:38,810 [slow_query] |Client=10.168.89.25:63714|User=root|AuthorizedUser=‘root’@’%’|ResourceGroup=|Catalog=default_catalog|Db=sjzx_db|State=OK|ErrorCode=|Time=166921|ScanBytes=0|ScanRows=0|ReturnRows=0|StmtId=5011119|QueryId=de1c4108-b5dc-11ee-aac5-6805caecd97c|IsQuery=false|feIp=10.168.89.25|Stmt=CREATE TABLE dc_share_fgsqdjx_pz11 ( c_dlsbh varchar(65533) NOT NULL COMMENT “代理编号”, c_hzny date NOT NULL COMMENT “汇总年月”, c_cplx varchar(65533) NOT NULL COMMENT “产品类型 6:邮驿付”, c_dlsmc varchar(65533) NULL COMMENT “代理名称”, c_ywlx varchar(65533) NULL COMMENT “业务类型 1:代理 2:银行”, c_sssf varchar(65533) NULL COMMENT “省份”, c_gw varchar(65533) NULL COMMENT “岗位 1:业务 2:运营 3:调剂 4:分总”, c_xm varchar(65533) NULL COMMENT “姓名”, c_bl decimal(20,10) NULL COMMENT “比例”, c_cjsj datetime NULL DEFAULT CURRENT_TIMESTAMP COMMENT “创建时间” ) ENGINE=OLAP DUPLICATE KEY(c_dlsbh,c_hzny,c_cplx) COMMENT “分公司渠道绩效配置” DISTRIBUTED BY HASH(c_dlsbh) BUCKETS 8 PROPERTIES ( “replication_num” = “3”, “in_memory” = “false”, “storage_format” = “DEFAULT”, “enable_persistent_index” = “false”, “compression” = “LZ4” )|Digest=
fe.log:2024-01-18 16:27:58,583 INFO (starrocks-mysql-nio-pool-66853|1938791) [LocalMetastore.buildPartitions():1657] start to build 1 partitions sequentially for table sjzx_db.dc_share_fgsqdjx_pz11 with 24 replicas
fe.log:2024-01-18 16:31:58,584 WARN (starrocks-mysql-nio-pool-66853|1938791) [StmtExecutor.handleDdlStmt():1213] DDL statement (CREATE TABLE dc_share_fgsqdjx_pz11 (
fe.log:2024-01-18 16:37:51,921 INFO (starrocks-mysql-nio-pool-66884|1940965) [LocalMetastore.buildPartitions():1657] start to build 1 partitions sequentially for table sjzx_db.dc_share_fgsqdjx_pz11 with 24 replicas
fe.log:2024-01-18 16:40:38,810 INFO (starrocks-mysql-nio-pool-66884|1940965) [LocalMetastore.createOlapOrLakeTable():2434] Successfully create table[dc_share_fgsqdjx_pz11;27199998]

我们查了dc_share_fgsqdjx_pz11表所有副本在各个be上创建耗时,有发现以下几个tablet创建时间比较久

fe.log:2024-01-18 16:37:51,921 INFO (starrocks-mysql-nio-pool-66884|1940965) [LocalMetastore.buildPartitions():1657] start to build 1 partitions sequentially for table sjzx_db.dc_share_fgsqdjx_pz11 with 24 replicas
fe.log:2024-01-18 16:40:38,810 INFO (starrocks-mysql-nio-pool-66884|1940965) [LocalMetastore.createOlapOrLakeTable():2434] Successfully create table[dc_share_fgsqdjx_pz11;27199998]

[srdb@gpseg8 log]$ grep -i 27200004 .
be.INFO.log.20240117-005226:I0118 16:37:51.921685 409561 agent_server.cpp:382] Submit task success. type=CREATE, signature=27200020
be.INFO.log.20240117-005226:I0118 16:37:51.921753 371066 tablet_manager.cpp:140] Creating tablet 27200020
be.INFO.log.20240117-005226:I0118 16:39:00.601917 371066 tablet_manager.cpp:209] Created tablet 27200020

[srdb@gpseg9 log]$ grep -i 27200020 .
be.INFO:I0118 16:37:51.922044 364060 agent_server.cpp:382] Submit task success. type=CREATE, signature=27200020
be.INFO:I0118 16:37:51.922123 56358 tablet_manager.cpp:140] Creating tablet 27200020
be.INFO:I0118 16:40:38.807544 56358 tablet_manager.cpp:209] Created tablet 27200020

[srdb@gpseg10 log]$ grep -i 27200004 .
be.INFO.log.20240117-134548:I0118 16:37:51.922093 188815 agent_server.cpp:382] Submit task success. type=CREATE, signature=27200004
be.INFO.log.20240117-134548:I0118 16:37:51.922211 337753 tablet_manager.cpp:140] Creating tablet 27200004
be.INFO.log.20240117-134548:I0118 16:38:01.693523 337753 tablet_manager.cpp:209] Created tablet 27200004

[srdb@gpseg14 log]$ grep -i 27200012 .
be.INFO:I0118 16:37:51.921746 410453 agent_server.cpp:382] Submit task success. type=CREATE, signature=27200012
be.INFO:I0118 16:37:51.921885 10696 tablet_manager.cpp:140] Creating tablet 27200012
be.INFO:I0118 16:38:28.272373 10696 tablet_manager.cpp:209] Created tablet 27200012

[srdb@gpseg15 log]$ grep -i 27200012 .
be.INFO1:I0118 16:37:51.922096 133963 agent_server.cpp:382] Submit task success. type=CREATE, signature=27200012
be.INFO1:I0118 16:39:23.373620 60409 tablet_manager.cpp:140] Creating tablet 27200012
be.INFO1:I0118 16:39:23.373769 60409 tablet_manager.cpp:209] Created tablet 27200012

[srdb@gpseg15 log]$ grep -i 27200028 .
be.INFO1:I0118 16:37:51.922152 133963 agent_server.cpp:382] Submit task success. type=CREATE, signature=27200028
be.INFO1:I0118 16:39:23.373878 60409 tablet_manager.cpp:140] Creating tablet 27200028
be.INFO1:I0118 16:39:23.373999 60409 tablet_manager.cpp:209] Created tablet 27200028

您可以尝试升级一下最新的2.5.18版本,最近我们解决过一些会导致创建和删除tablet的耗时过长的bad case。

1赞