hash join时当build table超过1024行记录时,prod表需要扫描大量行的数据

【详述】
hash join时当build table超过1024行记录时,prod表需要扫描大量行的数据,如下面所示:
table A :100000000 rows,table B: 2000 rows
when this sql executed:
select A.* from A inner join (select * from B limit 1025) as B on A.a = B.b limit 10; -->execution duration: 5s
RuntimeFilter: 6.545M (6545020) -> 12.331K (12331) (99.81%)

| └──HASH_JOIN (id=5) [BROADCAST, INNER JOIN] |
| │ Estimates: [row: 1501, cpu: 2281506867.55, memory: 32801.31, network: 0.00, cost: 2282114851.94] |
| │ TotalTime: 441.259us (0.06%) [CPUTime: 441.259us] |
| │ OutputRows: 1.004K (1004) |
| │ PeakMemory: 934.616 KB, AllocatedMemory: 3.130 MB |
| │ BuildTime: 251.092us |
| │ ProbeTime: 180.616us |
| │ EqJoinConjuncts: [2: DLR_CUST_NO = 29: DLR_CUST_NO] |
| │ SubordinateOperators: |
| │ CHUNK_ACCUMULATE |
| │ LOCAL_EXCHANGE [Passthrough] |
| ├── OLAP_SCAN (id=0) |
| │ Estimates: [row: 71345222, cpu: 2281377982.53, memory: 0.00, network: 0.00, cost: 1140688991.27] |
| │ TotalTime: 782.928ms (99.37%) [CPUTime: 12.298ms, ScanTime: 770.630ms] |
| │ OutputRows: 12.331K (12331) |
| │ RuntimeFilter: 6.545M (6545020) -> 12.331K (12331) (99.81%) |
| │ Table: : t_usc_mdmn_bu_dlr_cust_addr |
| │ SubordinateOperators: |
| │ CHUNK_ACCUMULATE |
| │ NOOP |
| │ OLAP_SCAN_PREPARE |
| │ Detail Timers: [ScanTime = IOTaskExecTime + IOTaskWaitTime] |
| │ IOTaskExecTime: 693.553ms [min=610.418ms, max=770.501ms] |
| │ SegmentInit: 575.399ms [min=522.850ms, max=666.077ms] |
| │ BitmapIndexFilter: 572.413ms [min=520.393ms, max=663.529ms] |
| │ IOTaskWaitTime: 193.650us [min=60.029us, max=305.847us] |
| └── EXCHANGE (id=4) [BROADCAST] |
| Estimates: [row: 1025, cpu: 98403.94, memory: 98403.94, network: 98403.94, cost: 475619.02] |
| TotalTime: 758.261us (0.10%) [CPUTime: 310.167us, NetworkTime: 448.094us] |
| OutputRows: 1.025K (1025) |
| PeakMemory: 89.227 KB, AllocatedMemory: 402.718 KB
when this SQL executed:

select A.* from A inner join (select * from B limit 1024) as B on A.a = B.b limit 10; -->execution duration: 0.2s
RuntimeFilter: 1.013K (1013) -> 1.013K (1013) (0.00%)

| └──HASH_JOIN (id=5) [BROADCAST, INNER JOIN] |
| │ Estimates: [row: 1495, cpu: 2281506364.58, memory: 32673.31, network: 0.00, cost: 2282111976.35] |
| │ TotalTime: 278.949us (0.92%) [CPUTime: 278.949us] |
| │ OutputRows: 1.013K (1013) |
| │ PeakMemory: 812.352 KB, AllocatedMemory: 2.769 MB |
| │ BuildTime: 177.800us |
| │ ProbeTime: 95.104us |
| │ EqJoinConjuncts: [2: DLR_CUST_NO = 29: DLR_CUST_NO] |
| │ SubordinateOperators: |
| │ CHUNK_ACCUMULATE |
| │ LOCAL_EXCHANGE [Passthrough] |
| ├── OLAP_SCAN (id=0) |
| │ Estimates: [row: 71345222, cpu: 2281377982.53, memory: 0.00, network: 0.00, cost: 1140688991.27] |
| │ TotalTime: 25.573ms (84.59%) [CPUTime: 2.679ms, ScanTime: 22.894ms] |
| │ OutputRows: 1.013K (1013) |
| │ RuntimeFilter: 1.013K (1013) -> 1.013K (1013) (0.00%) |
| │ Table: : t_usc_mdmn_bu_dlr_cust_addr |
| │ SubordinateOperators: |
| │ CHUNK_ACCUMULATE |
| │ NOOP |
| │ OLAP_SCAN_PREPARE |
| │ Detail Timers: [ScanTime = IOTaskExecTime + IOTaskWaitTime] |
| │ IOTaskExecTime: 11.657ms [min=4.691ms, max=22.877ms] |
| │ IOTime: 2.283ms [min=0ns, max=12.010ms] |
| │ SegmentInit: 10.404ms [min=4.355ms, max=22.122ms] |
| │ BitmapIndexFilter: 10.265ms [min=4.253ms, max=21.966ms] |
| │ IOTaskWaitTime: 29.597us [min=13.603us, max=74.601us] |
| └── EXCHANGE (id=4) [BROADCAST] |
| Estimates: [row: 1021, cpu: 98019.92, memory: 98019.92, network: 98019.92, cost: 473762.95] |
| TotalTime: 776.131us (2.57%) [CPUTime: 326.848us, NetworkTime: 449.283us] |
| OutputRows: 1.021K (1021) |
| PeakMemory: 89.890 KB, AllocatedMemory: 370.702 KB

when hash table execeeds 1024 rows ,Runtime Filter does not effect on the Probe table,but runtime_join_filter_push_down_limit :1024000

show global variables like ‘%filter%’;
±-------------------------------------------±--------+
| Variable_name | Value |
±-------------------------------------------±--------+
| enable_filter_unused_columns_in_scan_stage | true |
| enable_global_runtime_filter | true |
| enable_multicolumn_global_runtime_filter | false |
| enable_topn_runtime_filter | true |
| runtime_filter_on_exchange_node | false |
| runtime_join_filter_push_down_limit | 1024000 |
±-------------------------------------------±--------+
6 rows in set (0.00 sec)

【是否存算分离】否
【StarRocks版本】3.1.14
【集群规模】例如:3fe(1 follower+2observer)+5be
【机器信息】物理机,网卡20Gb

这个与starrocks的runtime filter的实现有关。当用于过滤probe表的keySet<=1024时keySet转换为in filter,而当用于过滤probe表的keySet > 1024时keySet转换为bloom filter。in filter没有假阳而bloom filter有假阳,所以keySet>1024行时会probe出更大的数据集。