计算30天留存问题

select
intersect_count(user_id_bit, dt, ‘2021-11-30’, ‘2021-12-01’) as day1,
intersect_count(user_id_bit, dt, ‘2021-11-30’, ‘2021-12-02’) as day2,
intersect_count(user_id_bit, dt, ‘2021-11-30’, ‘2021-12-03’) as day3,
intersect_count(user_id_bit, dt, ‘2021-11-30’, ‘2021-12-04’) as day4,
intersect_count(user_id_bit, dt, ‘2021-11-30’, ‘2021-12-05’) as day5,
intersect_count(user_id_bit, dt, ‘2021-11-30’, ‘2021-12-06’) as day6,
intersect_count(user_id_bit, dt, ‘2021-11-30’, ‘2021-12-07’) as day7,
intersect_count(user_id_bit, dt, ‘2021-11-30’, ‘2021-12-08’) as day8,
intersect_count(user_id_bit, dt, ‘2021-11-30’, ‘2021-12-09’) as day9,
intersect_count(user_id_bit, dt, ‘2021-11-30’, ‘2021-12-10’) as day10,
intersect_count(user_id_bit, dt, ‘2021-11-30’, ‘2021-12-11’) as day11,
intersect_count(user_id_bit, dt, ‘2021-11-30’, ‘2021-12-12’) as day12,
intersect_count(user_id_bit, dt, ‘2021-11-30’, ‘2021-12-13’) as day13,
intersect_count(user_id_bit, dt, ‘2021-11-30’, ‘2021-12-14’) as day14,
intersect_count(user_id_bit, dt, ‘2021-11-30’, ‘2021-12-15’) as day15,
intersect_count(user_id_bit, dt, ‘2021-11-30’, ‘2021-12-16’) as day16,
intersect_count(user_id_bit, dt, ‘2021-11-30’, ‘2021-12-17’) as day17,
intersect_count(user_id_bit, dt, ‘2021-11-30’, ‘2021-12-18’) as day18,
intersect_count(user_id_bit, dt, ‘2021-11-30’, ‘2021-12-19’) as day19,
intersect_count(user_id_bit, dt, ‘2021-11-30’, ‘2021-12-20’) as day20,
intersect_count(user_id_bit, dt, ‘2021-11-30’, ‘2021-12-21’) as day21,
intersect_count(user_id_bit, dt, ‘2021-11-30’, ‘2021-12-22’) as day22,
intersect_count(user_id_bit, dt, ‘2021-11-30’, ‘2021-12-23’) as day23,
intersect_count(user_id_bit, dt, ‘2021-11-30’, ‘2021-12-24’) as day24,
intersect_count(user_id_bit, dt, ‘2021-11-30’, ‘2021-12-25’) as day25,
intersect_count(user_id_bit, dt, ‘2021-11-30’, ‘2021-12-26’) as day26,
intersect_count(user_id_bit, dt, ‘2021-11-30’, ‘2021-12-27’) as day27,
intersect_count(user_id_bit, dt, ‘2021-11-30’, ‘2021-12-28’) as day28

from mds_log_weblog_click_test1
这个sql 还有优化空间吗

30天留存profile.txt (1.4 MB)
建表语句.txt (2.7 KB)

单机48c,512G,8be,3fe混部,ssd硬盘
parallel_exchange_instance_num -1
parallel_fragment_exec_instance_num |32

建议把并行度调低看下,set parallel_fragment_exec_instance_num=24

另外看下select * from statistics.table_statistic_v1 where table_name like '% mds_log_weblog_click_test1%'的输出,如果没有,执行下 ```
ANALYZE TABLE mds_log_weblog_click_test1

之后再执行这个sql

查出来是这些

并行度调低怎么样呢?

调底后现在14s多

set streaming_preaggregation_mode = force_streaming;
set new_planner_agg_stage = 1;
设置这个再跑下看下结果

快了几十毫秒,没有明显的提升,别的公司求留存的话,都在多长时间呢,多大的数据量,还有别的更好求留存的函数吗

发下profile

另外也可以参考下https://www.bilibili.com/video/BV13T4y1o7K8?from=search&seid=314660585554512892&spm_id_from=333.337.0.0,看看bitmap能满足嘛

留存profile.txt (182.3 KB)

现在这个求留存数据结构也是用的bitmap

你好,2.2版本推出了新的留存函数,会有比较大的优化,可参考retention函数

retain 最多同时只能保留31个条件