这个两个我执行了,好像还是不生效,我觉得是不是有参数冲突呢
2.2.1默认已经开启了pipeline性能也很好,2.3.0是修复了bug和一些优化吧,也建议您升级至2.3.1
感谢指点,好的我升级一下
先把版本和参数配置好再测试一下
be 配置参数,您这边有什么建议吗?有需要增加配置来提升性能
fe.conf
enable_pipeline_engine=true
parallel_fragment_exec_instance_num=60
pipeline_dop=0
exec_mem_limit = 515396075520
be.conf
mem_limit=95%
之前让您帮忙看那个sql7和sql8 那两个hash join 那个是不是非常适用colocate_with
你那两大SQL 都是大表join小表 只要大表左边,小表右边就ok。
数据量有点大,几个cte查询加起来处理的数据较多,建议把一个sql拆一下
就是把那几个cte结果插入两个colocate的结果表,然后两个大表关联就很好。
单独跑那几个CTE没报错吧?
还没有跑呢,我把版本升级到了2.3.0,在导入数据,下午测试一下,有结果和您反馈
我觉跟建表关系挺大的,我给你之前测试一个sql
select o_orderpriority, count(*) as order_count
from orders
where
o_orderdate >= date '1993-07-01'
and o_orderdate < date '1993-07-01' + interval '3' month
and exists (
select * from lineitem
where l_orderkey = o_orderkey and l_commitdate < l_receiptdate
)
group by o_orderpriority
order by o_orderpriority;
这个两个表我建表语句是这样的
create table lineitem (
l_orderkey integer,
l_partkey integer,
l_suppkey integer,
l_linenumber integer,
l_quantity decimal (15, 2),
l_extendedprice decimal (15, 2),
l_discount decimal (15, 2),
l_tax decimal (15, 2),
l_returnflag char (1),
l_linestatus char (1),
l_shipdate date,
l_commitdate date,
l_receiptdate date,
l_shipinstruct char (25),
l_shipmode char (10),
l_comment varchar (44),
t char (15)
)ENGINE=OLAP
DUPLICATE KEY(l_orderkey)
COMMENT “OLAP”
DISTRIBUTED BY HASH(l_suppkey
,l_partkey
,l_suppkey
,l_returnflag
,l_linestatus
) BUCKETS 800
PROPERTIES (
“replication_num” = “1”,
“in_memory” = “false”,
“bloom_filter_columns” = “l_orderkey,l_partkey”,
“storage_format” = “DEFAULT”
);
create table orders (
o_orderkey integer,
o_custkey integer,
o_orderstatus char (1),
o_totalprice decimal (15, 2),
o_orderdate date,
o_orderpriority char (15),
o_clerk char (15),
o_shippriority integer,
o_comment varchar (79),
t char (15)
)ENGINE=OLAP
DUPLICATE KEY(o_orderkey
)
COMMENT “OLAP”
DISTRIBUTED BY HASH(o_orderkey,o_custkey) BUCKETS 300
PROPERTIES (
“replication_num” = “1”,
“in_memory” = “false”,
“bloom_filter_columns” = “o_orderkey,o_custkey”,
“storage_format” = “DEFAULT”
);
这个sql 2分多钟就出来了
DISTRIBUTED BY HASH( l_suppkey
, l_partkey
, l_suppkey
, l_returnflag
, l_linestatus
) BUCKETS 800
这个值太多了 1-2个就好了
现在都是1-2个了,这个是之前的
bloom_filter_columns 如果没用上bloom索引 可以不加
嗯嗯,如何判断用没有用上呢
Bloomfilter 索引 就是explain cost搜一下 bloom
o_orderdate 如您那个SQL 这个可以考虑加分区键
批量创建分区
数据分布 @ Data_distribution @ StarRocks Docs
CREATE TABLE site_access (
datekey DATE,
site_id INT,
city_code SMALLINT,
user_name VARCHAR(32),
pv BIGINT DEFAULT '0'
)
ENGINE=olap
DUPLICATE KEY(datekey, site_id, city_code, user_name)
PARTITION BY RANGE (datekey) (
START ("2019-01-01") END ("2021-01-01") EVERY (INTERVAL 1 YEAR),
START ("2021-01-01") END ("2021-05-01") EVERY (INTERVAL 1 MONTH),
START ("2021-05-01") END ("2021-05-04") EVERY (INTERVAL 1 DAY)
)
DISTRIBUTED BY HASH(site_id) BUCKETS 10
PROPERTIES (
"replication_num" = "1"
);
START (“2019-01-01”) END (“2021-01-01”) EVERY (INTERVAL 1 YEAR),
START (“2021-01-01”) END (“2021-05-01”) EVERY (INTERVAL 1 MONTH),
START (“2021-05-01”) END (“2021-05-04”) EVERY (INTERVAL 1 DAY)
这个分区时间是跟据导入数据分区来设定吗
使用DAY就好了