Question about the optimizer implementation

How is the hash distribution property of PhysicalHashJoin computed?
Is it also similar to ORCA?

// given an optimization context, HJN creates three optimization requests
// to enforce distribution of its children:
// Req(1 to N) (redistribute, redistribute), where we request the first hash join child
//      to be distributed on single hash join keys separately, as well as the set
//      of all hash join keys,
//      the second hash join child is always required to match the distribution returned
//      by first child
// Req(N + 1) (hashed, broadcast)
// Req(N + 2) (non-singleton, broadcast)
// Req(N + 3) (singleton, singleton)
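
Read literally, the quoted comment can be turned into a small enumeration. The following is only a sketch of that reading, not ORCA's actual code: the function name `enumerate_requests` and the tuple representation are hypothetical, and adding the "all keys" request only when there is more than one key is an assumption. A two-key join such as l.c1 = r.c1 and l.c2 = r.c2 is used as input.

```python
def enumerate_requests(join_pairs):
    """Enumerate (first child, second child) distribution requests.

    join_pairs is a list of (outer_col, inner_col) equality predicates.
    Hypothetical helper that follows the quoted comment, not a real API.
    """
    # Req(1..N): hash on each single key pair ...
    key_sets = [[pair] for pair in join_pairs]
    # ... plus, when there are several keys, on the set of all keys
    # (assumption: skipped for a single key to avoid a duplicate).
    if len(join_pairs) > 1:
        key_sets.append(list(join_pairs))

    requests = []
    for pairs in key_sets:
        outer = ("hashed", tuple(o for o, _ in pairs))
        # The second child must match the distribution of the first.
        inner = ("hashed", tuple(i for _, i in pairs))
        requests.append((outer, inner))

    # Req(N+1)..Req(N+3): the comment does not say which columns the
    # "hashed" side uses, so no columns are filled in here.
    requests.append((("hashed", None), ("broadcast", None)))
    requests.append((("non-singleton", None), ("broadcast", None)))
    requests.append((("singleton", None), ("singleton", None)))
    return requests


pairs = [("l.c1", "r.c1"), ("l.c2", "r.c2")]
for outer, inner in enumerate_requests(pairs):
    print(outer, inner)
```

Under these assumptions the (redistribute, redistribute) part yields three key combinations for the two-key join: [l.c1] x [r.c1], [l.c2] x [r.c2], and [l.c1, l.c2] x [r.c1, r.c2].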

For example, given l inner join r on l.c1 = r.c1 and l.c2 = r.c2,
which combinations does the (redistribute, redistribute) option produce?

The hash property of a physical hash join is computed from the join's ON predicates and the distribution of the input data.

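broadcast: broadcast table R
shuffle: shuffle both table L and table R. The column combination used for the hash shuffle on the two sides is [l.c1, l.c2] x [r.c1, r.c2]; in complex queries it may instead be [l.c1] x [r.c1] or [l.c2] x [r.c2], depending on what is derived from the context
bucket-shuffle: shuffle only table R, provided that table L is already distributed on [l.c1, l.c2], [l.c1], or [l.c2]
colocate: specify the colocate group when creating the tables, so the join runs locally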
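
The bucket-shuffle precondition above can be sketched as a small check. This is an illustration under stated assumptions, not the optimizer's actual code: `bucket_shuffle_keys` is a hypothetical helper, and it only tests whether the strategy is applicable; the real choice among the four strategies is made by the optimizer and is not modeled here.

```python
def bucket_shuffle_keys(left_dist_cols, join_pairs):
    """Return the R-side columns to shuffle on, or None if not applicable.

    left_dist_cols: columns table L is already distributed on.
    join_pairs: list of (left_col, right_col) equality predicates.
    Hypothetical helper for illustration only.
    """
    left_to_right = dict(join_pairs)
    if not left_dist_cols:
        return None
    # Applicable only if every distribution column of L is a join key,
    # i.e. L is distributed on a non-empty subset of the join keys.
    if all(col in left_to_right for col in left_dist_cols):
        # Shuffle R on the matching columns, in L's distribution order.
        return [left_to_right[col] for col in left_dist_cols]
    return None


pairs = [("l.c1", "r.c1"), ("l.c2", "r.c2")]
print(bucket_shuffle_keys(["l.c1"], pairs))          # L distributed on [l.c1]
print(bucket_shuffle_keys(["l.c1", "l.c2"], pairs))  # L distributed on both keys
print(bucket_shuffle_keys(["l.c3"], pairs))          # not a join key
```

When the check returns None, the remaining candidates from the list are shuffle and broadcast (or colocate, if the tables were created in the same colocate group).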

What about a chain of consecutive equi-joins? What factors are taken into account there?

For multi-table joins there is a corresponding join reorder algorithm, and the exploration process is similar to ORCA's.