拿下profile,不是这个
嗯~~这个profile是去哪拿的?
不是,这是执行计划
这块应该用id呀,用dt分桶数据分布极其不均衡,另外整个表数据才几个g?14个桶应该就够了
4.4G,可是需要用dt这个时间字段去用作分区的依据
分区没问题,但是你现在用dt做了分桶( DISTRIBUTED BY HASH( dt )),这个肯定数据倾斜很严重了,分桶一定是类似于user id那种基数比较高的列,这样数据才会被打散
建表改成这样试试
CREATE TABLE ods_lg_sdk_init (
id int(11) NOT NULL COMMENT "",
dt date NOT NULL COMMENT "",
hh int(11) NOT NULL COMMENT "",
record_date datetime NOT NULL COMMENT "",
game_id int(11) NOT NULL COMMENT "",
game_site varchar(50) NOT NULL COMMENT "",
aid varchar(50) NOT NULL COMMENT "",
channel_id int(11) NOT NULL COMMENT "",
throwin_id int(11) NOT NULL COMMENT "",
device_type int(11) NOT NULL COMMENT "",
device_id varchar(50) NOT NULL COMMENT "",
client_version varchar(50) NULL COMMENT "",
server_version varchar(50) NULL COMMENT "",
game_version_code varchar(50) NULL COMMENT "",
game_version_name varchar(50) NULL COMMENT "",
channel_version varchar(50) NULL COMMENT "",
ip varchar(50) NOT NULL COMMENT "",
is_first_device tinyint(4) NOT NULL COMMENT "",
log_time bigint(20) NOT NULL COMMENT "",
is_first_active tinyint(4) NULL COMMENT "",
cid varchar(25) NULL COMMENT "",
material_id varchar(63) NULL COMMENT ""
) ENGINE=OLAP
PRIMARY KEY( id , dt )
COMMENT "OLAP"
PARTITION BY date_trunc('month', dt)
DISTRIBUTED BY HASH( id ) BUCKETS 7
ORDER BY( record_date )
PROPERTIES (
"replication_num" = "1",
"datacache.partition_duration" = "1 years",
"datacache.enable" = "true",
"storage_volume" = "def_volume",
"enable_async_write_back" = "false",
"enable_persistent_index" = "true",
"persistent_index_type" = "LOCAL",
"compression" = "LZ4"
);
好的,我试试
改用id分桶后确实快了
现在多少s
开了datacache没
一秒左右,DataCache是要在profile里面的BlockCacheReadBytes看吗
好的好的,感谢


