Performance optimization for multi-table join queries on large data volumes?

InUrFuture · February 16, 2022 02:46

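I have a scenario that needs a paginated, filtered query over a three-table INNER JOIN:
Table A: id, xx, …
Relation table: a_id, b_id (named src_vid and dst_vid in the query below)
Table B: id, xx, …
All three are aggregate tables keyed on id: rows with the same id are deduplicated and their attributes updated in place. Each table has 32 buckets and 64 partitions.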

```sql
SELECT *
FROM (SELECT * FROM a_uniq) t1
INNER JOIN relation_uniq t2 ON t1.id = t2.src_vid
INNER JOIN (SELECT * FROM b_uniq) t3 ON t2.dst_vid = t3.id
ORDER BY t2.update_time DESC
LIMIT 10;
```

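In the largest-data scenario, table A has about 600 million rows, the relation table about 3 billion, and table B about 60 million. The query runs out of memory.
In the other scenarios, table A has about 70 million rows, the relation table about 1 million, and table B about 80 million. The query takes 2m17s.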

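I tested Colocate Join, but it requires the hash (bucketing) columns of the joined tables to match in type and number, and those columns must appear in the join condition.
That can only cover the join between two of the tables, not all three.
Are there any other approaches?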
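For reference, a minimal sketch of the colocate setup described above, assuming StarRocks-style DDL. The column types, the UNIQUE KEY choices, and the group name `rel_group` are hypothetical, and the partition clause is omitted for brevity. The relation table can be bucketed on only one column, so it can be colocated with A (on `src_vid`) but not also with B (on `dst_vid`):

```sql
-- Hypothetical sketch: types, keys and the group name "rel_group" are assumptions.
-- Colocated tables need the same bucket count, the same bucketing column
-- types/count and the same replica count.
CREATE TABLE a_uniq (
    id BIGINT NOT NULL,
    xx VARCHAR(64)
)
UNIQUE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 32
PROPERTIES ("colocate_with" = "rel_group");

-- Bucketed on src_vid, so t1.id = t2.src_vid can run as a colocate join.
CREATE TABLE relation_uniq (
    src_vid BIGINT NOT NULL,
    dst_vid BIGINT NOT NULL,
    update_time DATETIME
)
UNIQUE KEY(src_vid, dst_vid)
DISTRIBUTED BY HASH(src_vid) BUCKETS 32
PROPERTIES ("colocate_with" = "rel_group");

-- B is bucketed on id, but the relation rows are spread by src_vid, not dst_vid,
-- so t2.dst_vid = t3.id still needs a shuffle or broadcast join.
CREATE TABLE b_uniq (
    id BIGINT NOT NULL,
    xx VARCHAR(64)
)
UNIQUE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 32;
```

Even if `b_uniq` were added to the same group, only the join whose condition uses the relation table's bucketing column can be executed locally.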

InUrFuture · February 16, 2022 02:51

profile.txt (94.3 KB)
The query just took 2m17s; running it a second time took only 37s.

zhiwen_gou · January 3, 2025 06:25

Did you find a solution?