为了更快的定位您的问题,请提供以下信息,谢谢
【详述】如何查询limit + union all 多个条件 ,实现采样
【背景】
需要实现类似采样的sql ,
目前已经使用
select * from table where rand() < 0.001 limit 1000;
但觉得不够快 ,希望可以实现在每个分区下 每个前缀索引minutes 中 随机limit 1 一条 快速返回1000条数据
例如
select * from table where hours =2025031318 and minutes = 202503131820 limit 1
union all
select * from table where hours =2025031318 and minutes = 202503131821 limit 1
union all
select * from table where hours =2025031318 and minutes = 202503131822 limit 1
…
union all
select * from table where hours =2025031302 and minutes = 202503131200 limit 1
但目前报Getting syntax error at line 2, column 0. Detail message: Unexpected input ‘union’, the most similar input is {, ‘;’}.
union 或union all都不行
WITH time_combinations AS (
SELECT
CASE
WHEN minute_val < 1900 THEN 2025031318
ELSE 2025031319
END AS hours,
CONCAT(‘20250313’, LPAD(minute_val, 4, ‘0’)) AS minutes
FROM (
SELECT 1820 + n AS minute_val
FROM (
SELECT a + b10 + c100 AS n
FROM (
SELECT 0 AS a UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
) AS a,
(
SELECT 0 AS b UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
UNION ALL SELECT 5 UNION ALL SELECT 6
) AS b,
(
SELECT 0 AS c
) AS c
WHERE a + b10 + c100 <= 60
) AS numbers
) AS minutes
),
ranked_data AS (
SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY t.hours, t.minutes ORDER BY t.minutes) AS rn
FROM table t
JOIN time_combinations tc ON t.hours = tc.hours AND t.minutes = tc.minutes
)
SELECT *
FROM ranked_data
WHERE rn <= 17;
with + ROW_NUMBER() OVER +join rank , 试过了 很慢 估计还是要扫所有数据作rank
请问有类似其他的语法可以实现?
【是否存算分离】否
【StarRocks版本】3.2.8
【集群规模】1台
【联系方式】社区群8-tempo