Used: 438105027512, Limit: 438103947386. Mem usage has exceed the limit of single query, You can change the limit by set session variable exec_mem_limit目前我设置比较这个限制的值大,还是报错超出限制,

这个两个我执行了,好像还是不生效,我觉得是不是有参数冲突呢

2.2.1默认已经开启了pipeline性能也很好,2.3.0是修复了bug和一些优化吧,也建议您升级至2.3.1

感谢指点,好的我升级一下

先把版本和参数配置好再测试一下

be 配置参数,您这边有什么建议吗?有需要增加配置来提升性能

fe.conf
enable_pipeline_engine=true
parallel_fragment_exec_instance_num=60
pipeline_dop=0
exec_mem_limit = 515396075520

be.conf
mem_limit=95%

两个大表关联查询可以考虑
Colocate Join @ Colocate_join @ StarRocks Docs

之前让您帮忙看那个sql7和sql8 那两个hash join 那个是不是非常适用colocate_with

你那两大SQL 都是大表join小表 只要大表左边,小表右边就ok。

数据量有点大,几个cte查询加起来处理的数据较多,建议把一个sql拆一下

就是把那几个cte结果插入两个colocate的结果表,然后两个大表关联就很好。
单独跑那几个CTE没报错吧?

还没有跑呢,我把版本升级到了2.3.0,在导入数据,下午测试一下,有结果和您反馈

我觉跟建表关系挺大的,我给你之前测试一个sql

select  o_orderpriority,  count(*) as order_count
from  orders
where
  o_orderdate >= date '1993-07-01'
  and o_orderdate < date '1993-07-01' + interval '3' month
  and exists (
    select  * from  lineitem
    where  l_orderkey = o_orderkey and l_commitdate < l_receiptdate
  )
group by o_orderpriority
order by o_orderpriority;

这个两个表我建表语句是这样的
create table lineitem (
l_orderkey integer,
l_partkey integer,
l_suppkey integer,
l_linenumber integer,
l_quantity decimal (15, 2),
l_extendedprice decimal (15, 2),
l_discount decimal (15, 2),
l_tax decimal (15, 2),
l_returnflag char (1),
l_linestatus char (1),
l_shipdate date,
l_commitdate date,
l_receiptdate date,
l_shipinstruct char (25),
l_shipmode char (10),
l_comment varchar (44),
t char (15)
)ENGINE=OLAP
DUPLICATE KEY(l_orderkey)
COMMENT “OLAP”
DISTRIBUTED BY HASH(l_suppkey,l_partkey,l_suppkey,l_returnflag,l_linestatus) BUCKETS 800
PROPERTIES (
“replication_num” = “1”,
“in_memory” = “false”,
“bloom_filter_columns” = “l_orderkey,l_partkey”,
“storage_format” = “DEFAULT”
);

create table orders (
o_orderkey integer,
o_custkey integer,
o_orderstatus char (1),
o_totalprice decimal (15, 2),
o_orderdate date,
o_orderpriority char (15),
o_clerk char (15),
o_shippriority integer,
o_comment varchar (79),
t char (15)
)ENGINE=OLAP
DUPLICATE KEY(o_orderkey)
COMMENT “OLAP”
DISTRIBUTED BY HASH(o_orderkey,o_custkey) BUCKETS 300
PROPERTIES (
“replication_num” = “1”,
“in_memory” = “false”,
“bloom_filter_columns” = “o_orderkey,o_custkey”,
“storage_format” = “DEFAULT”
);
这个sql 2分多钟就出来了

DISTRIBUTED BY HASH( l_suppkey , l_partkey , l_suppkey , l_returnflag , l_linestatus ) BUCKETS 800
这个值太多了 1-2个就好了

现在都是1-2个了,这个是之前的

bloom_filter_columns 如果没用上bloom索引 可以不加

嗯嗯,如何判断用没有用上呢

Bloomfilter 索引 就是explain cost搜一下 bloom
o_orderdate 如您那个SQL 这个可以考虑加分区键
批量创建分区
数据分布 @ Data_distribution @ StarRocks Docs

CREATE TABLE site_access (
    datekey DATE,
    site_id INT,
    city_code SMALLINT,
    user_name VARCHAR(32),
    pv BIGINT DEFAULT '0'
)
ENGINE=olap
DUPLICATE KEY(datekey, site_id, city_code, user_name)
PARTITION BY RANGE (datekey) (
    START ("2019-01-01") END ("2021-01-01") EVERY (INTERVAL 1 YEAR),
    START ("2021-01-01") END ("2021-05-01") EVERY (INTERVAL 1 MONTH),
    START ("2021-05-01") END ("2021-05-04") EVERY (INTERVAL 1 DAY)
)
DISTRIBUTED BY HASH(site_id) BUCKETS 10
PROPERTIES (
    "replication_num" = "1"
);

START (“2019-01-01”) END (“2021-01-01”) EVERY (INTERVAL 1 YEAR),
START (“2021-01-01”) END (“2021-05-01”) EVERY (INTERVAL 1 MONTH),
START (“2021-05-01”) END (“2021-05-04”) EVERY (INTERVAL 1 DAY)

这个分区时间是跟据导入数据分区来设定吗

使用DAY就好了