【Details】Querying a large Paimon table is very slow. The table has roughly 2.2 billion rows and the cluster has 100 backend nodes, yet the query profile shows only 3 instances running in the scan stage. How can we make full use of the cluster nodes to increase query concurrency? When querying an Iceberg table, the scan stage runs with 100 instances, so I suspect this may be a problem in the Paimon catalog implementation.
The Paimon table has 200 buckets.
Query: select count(1) from paimon.xxx.xxxx;
【StarRocks version】3.2.4
【Cluster size】1 FE + 100 BE
@trueeyu @yuchen1019 @Doni could you please take a look? Thanks.
The execution plan is as follows:
+---------------------------------------------------------------------------------------------------------------------------------+
| Explain String                                                                                                                  |
+---------------------------------------------------------------------------------------------------------------------------------+
| PLAN FRAGMENT 0(F01)                                                                                                            |
|   Output Exprs:20: count                                                                                                        |
|   Input Partition: UNPARTITIONED                                                                                                |
|   RESULT SINK                                                                                                                   |
|                                                                                                                                 |
|   4:AGGREGATE (merge finalize)                                                                                                  |
|   |  aggregate: count[([20: count, BIGINT, false]); args: TINYINT; result: BIGINT; args nullable: true; result nullable: false] |
|   |  cardinality: 1                                                                                                             |
|   |  column statistics:                                                                                                         |
|   |  * count-->[0.0, 3.533228509E9, 0.0, 8.0, 1.0] ESTIMATE                                                                    |
|   |                                                                                                                             |
|   3:EXCHANGE                                                                                                                    |
|      distribution type: GATHER                                                                                                  |
|      cardinality: 1                                                                                                             |
|                                                                                                                                 |
| PLAN FRAGMENT 1(F00)                                                                                                            |
|                                                                                                                                 |
|   Input Partition: RANDOM                                                                                                       |
|   OutPut Partition: UNPARTITIONED                                                                                               |
|   OutPut Exchange Id: 03                                                                                                        |
|                                                                                                                                 |
|   2:AGGREGATE (update serialize)                                                                                                |
|   |  aggregate: count[(1); args: TINYINT; result: BIGINT; args nullable: false; result nullable: false]                         |
|   |  cardinality: 1                                                                                                             |
|   |  column statistics:                                                                                                         |
|   |  * count-->[0.0, 3.533228509E9, 0.0, 8.0, 1.0] ESTIMATE                                                                    |
|   |                                                                                                                             |
|   1:Project                                                                                                                     |
|   |  output columns:                                                                                                            |
|   |  22 <-> 1                                                                                                                   |
|   |  cardinality: 3533228509                                                                                                    |
|   |  column statistics:                                                                                                         |
|   |  * auto_fill_col-->[1.0, 1.0, 0.0, 1.0, 1.0] ESTIMATE                                                                      |
|   |                                                                                                                             |
|   0:PaimonScanNode                                                                                                              |
|      TABLE: xxxxxxxxxxxx                                                                                |
|      partitions=1/1                                                                                                             |
|      avgRowSize=0.0                                                                                                             |
|      cardinality: -1                                                                                                            |
|      column statistics:                                                                                                         |
+---------------------------------------------------------------------------------------------------------------------------------+
40 rows in set (3.38 sec)


I have roughly tracked down the cause: com.starrocks.qe.HDFSBackendSelector.HdfsScanRangeHasher#acceptScanRangeLocations does not account for Paimon.
Inspecting the concrete THdfsScanRange of a Paimon split, none of its fields hit any of the conditions in acceptScanRangeLocations that write into the primitiveSink, so every Paimon THdfsScanRange ends up with the same hash value of 0.
Because all the hash values are 0, the hash ring in com.starrocks.qe.HDFSBackendSelector#computeScanRangeAssignment always returns the same small group of backends, so every Paimon scan range is assigned to that group instead of being spread over the whole cluster.
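To make the effect concrete, here is a minimal, self-contained Java sketch (not the actual StarRocks code) of the failure mode: when the hasher writes nothing into the sink for Paimon scan ranges, every range produces the same hash key, and a consistent-hash ring then assigns them all to the same backend, whereas Hive/Iceberg-style ranges spread out because their identifying fields do get hashed.

```java
import com.google.common.hash.Hasher;
import com.google.common.hash.Hashing;

// Illustrative sketch only; hashScanRange is a stand-in for the real
// HdfsScanRangeHasher#acceptScanRangeLocations logic, not a copy of it.
public class SameHashSameBackend {
    static long hashScanRange(boolean isPaimon, String relativePath, long offset) {
        Hasher hasher = Hashing.murmur3_128().newHasher();
        if (!isPaimon) {
            // Hive/Iceberg-style ranges feed identifying fields into the sink.
            hasher.putUnencodedChars(relativePath).putLong(offset);
        }
        // For Paimon nothing was written, so every split hashes to the same
        // constant (the poster observed 0 in the actual implementation).
        return hasher.hash().asLong();
    }

    public static void main(String[] args) {
        int numBackends = 100;
        for (int i = 0; i < 5; i++) {
            long key = hashScanRange(true, "bucket-" + (i % 200) + "/data-" + i + ".orc", 0L);
            // consistentHash maps equal keys to the same bucket, so all Paimon
            // splits land on one backend instead of being spread across 100.
            System.out.println("split " + i + " -> backend " + Hashing.consistentHash(key, numBackends));
        }
    }
}
```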
One possible fix is to carry the split's bucket information in THdfsScanRange and feed that bucket info into the hash; the Paimon scan ranges are built in com.starrocks.planner.PaimonScanNode#addSplitScanRangeLocations. A rough sketch of the idea follows.
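This sketch only illustrates the shape of the change; ScanRangeInfo, isPaimon, paimonBucket, and paimonFilePath are hypothetical stand-ins for whatever split identity the real THdfsScanRange and HdfsScanRangeHasher would carry.

```java
import com.google.common.hash.PrimitiveSink;

// Sketch only, not actual StarRocks code.
class PaimonAwareScanRangeHasher {
    void acceptScanRangeLocations(ScanRangeInfo range, PrimitiveSink sink) {
        if (range.isPaimon) {
            // Feed the Paimon split identity (bucket + data file path) into the
            // sink so different splits get different hash keys and spread across
            // the hash ring instead of all colliding onto one group of backends.
            sink.putInt(range.paimonBucket);
            sink.putUnencodedChars(range.paimonFilePath);
        } else {
            // Hive/Iceberg-style ranges already hash their own identifying fields.
            sink.putUnencodedChars(range.relativePath);
            sink.putLong(range.offset);
        }
    }

    // Hypothetical plain-Java stand-in for the Thrift THdfsScanRange struct.
    static class ScanRangeInfo {
        boolean isPaimon;
        int paimonBucket;
        String paimonFilePath = "";
        String relativePath = "";
        long offset;
    }
}
```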
I will file an issue.
Excellent analysis. Hoping the community can get a fix in soon.