使用FE + CN组合后,2000万的单表聚合查询花了25秒

数据存储CN的节点上,开了Cache,元数据存在了对象存储里

每次查都这么慢吗?拿一下profile看下

每次都会,而且其他数据稍微大的表都会

Explain String |
------------------------------------------------------------------------------------------------------------------------------------------------------------+
[0mSummary [0m |
[0mQueryId: a1059f89-cfbe-11ee-b607-00163e02b0aa [0m |
[0mVersion: 3.2.2-269e832 [0m |
[0mState: Finished [0m |
[0mTotalTime: 21s81ms [0m |
[0mExecutionTime: 21s70ms [Scan: 20s813ms (98.78%), Network: 383.968ms (1.82%), ResultDeliverTime: 0ns (0.00%), ScheduleTime: 72.102ms (0.34%)] [0m|
[0mCollectProfileTime: 5ms [0m |
[0mFrontendProfileMergeTime: 1.655ms [0m |
[0mQueryPeakMemoryUsage: 1.035 GB, QueryAllocatedMemoryUsage: 33.914 GB [0m |
[0mTop Most Time-consuming Nodes: [0m |
[1m [31m1. OLAP_SCAN (id=0) : 20s826ms (94.22%) [0m |
[0m2. EXCHANGE (id=3) [SHUFFLE]: 708.872ms (3.21%) [0m |
[0m3. AGGREGATION (id=4) [merge, serialize]: 397.418ms (1.80%) [0m |
[0m4. AGGREGATION (id=2) [serialize, update]: 128.068ms (0.58%) [0m |
[0m5. AGGREGATION (id=5) [finalize, update]: 39.556ms (0.18%) [0m |
[0m6. PROJECT (id=1) : 2.735ms (0.01%) [0m |
[0m7. EXCHANGE (id=6) [GATHER]: 643.603us (0.00%) [0m |
[0m8. RESULT_SINK: 190.507us (0.00%) [0m |
[0mTop Most Memory-consuming Nodes: [0m |
[0m1. OLAP_SCAN (id=0) : 103.692 MB [0m |
[0m2. AGGREGATION (id=4) [merge, serialize]: 76.671 MB [0m |
[0m3. EXCHANGE (id=3) [SHUFFLE]: 23.578 MB [0m |
[0m4. AGGREGATION (id=2) [serialize, update]: 1.284 MB [0m |
[0m5. AGGREGATION (id=5) [finalize, update]: 1.067 MB [0m |
[0m6. EXCHANGE (id=6) [GATHER]: 29.195 KB [0m |
[0mNonDefaultVariables: [0m |
[0mcharacter_set_results: utf8 -> NULL [0m |
[0menable_adaptive_sink_dop: false -> true [0m |
[0menable_async_profile: true -> false [0m |
[0menable_profile: false -> true [0m |
[0msql_mode_v2: 32 -> 2097184 [0m |
[0msql_select_limit: 9223372036854775807 -> 200 [0m |
[0mtime_zone: Etc/UTC -> Asia/Shanghai [0m |
[0mFragment 0 [0m |
│ [0mBackendNum: 1 [0m |
│ [0mInstancePeakMemoryUsage: 357.602 KB, InstanceAllocatedMemoryUsage: 1.702 MB [0m |
│ [0mPrepareTime: 239.018us [0m |
└── [0mRESULT_SINK [0m |
│ [0mTotalTime: 190.507us (0.00%) [CPUTime: 190.507us] [0m |
│ [0mOutputRows: 200 [0m |
│ [0mSinkType: MYSQL_PROTOCAL [0m |
└── [0mEXCHANGE (id=6) [GATHER] [0m |
[0mEstimates: [row: 200, cpu: 0.00, memory: 0.00, network: 0.00, cost: 22944559882.56] [0m |
[0mTotalTime: 643.603us (0.00%) [CPUTime: 223.713us, NetworkTime: 419.890us] [0m |
[0mOutputRows: 200 [0m |
[0mPeakMemory: 29.195 KB, AllocatedMemory: 72.952 KB [0m |
[0m |
[0mFragment 1 [0m |
│ [0mBackendNum: 1 [0m |
│ [0mInstancePeakMemoryUsage: 953.916 MB, InstanceAllocatedMemoryUsage: 2.884 GB [0m |
│ [0mPrepareTime: 440.116us [0m |
└── [0mDATA_STREAM_SINK (id=6) [0m |
│ [0mPartitionType: UNPARTITIONED [0m |
└── [0mAGGREGATION (id=5) [finalize, update] [0m |
│ [0mEstimates: [row: 200, cpu: 3524286999.42, memory: 13637.70, network: 0.00, cost: 22944532607.16] [0m |
│ [0mTotalTime: 39.556ms (0.18%) [CPUTime: 39.556ms] [0m |
│ [0mOutputRows: 200 [0m |
│ [0mPeakMemory: 1.067 MB, AllocatedMemory: 9.557 MB [0m |
│ [0mAggExprs: [count(11: device_id)] [0m |
│ [0mGroupingExprs: [2: dt, 3: hh, 5: game_id, 6: game_site, 8: channel_id, 9: throwin_id, …] [0m |
└── [0mAGGREGATION (id=4) [merge, serialize] [0m |
│ [0mEstimates: [row: 36639835, cpu: 3524286999.42, memory: 3524286999.42, network: 0.00, cost: 21182361832.05] [0m |
│ [0mTotalTime: 397.418ms (1.80%) [CPUTime: 397.418ms] [0m |
│ [0mOutputRows: 5.206M (5205781) [0m |
│ [0mPeakMemory: 76.671 MB, AllocatedMemory: 2.042 GB [0m |
│ [0mGroupingExprs: [2: dt, 3: hh, 5: game_id, 6: game_site, 7: aid, 8: channel_id, 9: throwin_id, …] [0m |
└── [0mEXCHANGE (id=3) [SHUFFLE] [0m |
[0mEstimates: [row: 36639835, cpu: 3524286999.42, memory: 0.00, network: 0.00, cost: 12371644333.49] [0m |
[0mTotalTime: 708.872ms (3.21%) [CPUTime: 325.324ms, NetworkTime: 383.548ms] [0m |
[0mOutputRows: 5.218M (5218053) [0m |
[0mPeakMemory: 23.578 MB, AllocatedMemory: 1.800 GB [0m |
[0m |
[0mFragment 2 [0m |
│ [0mBackendNum: 1 [0m |
│ [0mInstancePeakMemoryUsage: 204.824 MB, InstanceAllocatedMemoryUsage: 31.029 GB [0m |
│ [0mPrepareTime: 756.271us [0m |
└── [0mDATA_STREAM_SINK (id=3) [0m |
│ [0mPartitionType: HASH_PARTITIONED [0m |
│ [0mPartitionExprs: [2: dt, 3: hh, 5: game_id, 6: game_site, 8: channel_id, 9: throwin_id, …] [0m |
└── [0mAGGREGATION (id=2) [serialize, update] [0m |
│ [0mEstimates: [row: 36639835, cpu: 3560926834.92, memory: 3524286999.42, network: 0.00, cost: 10609500833.77] [0m |
│ [0mTotalTime: 128.068ms (0.58%) [CPUTime: 128.068ms] [0m |
│ [0mOutputRows: 5.218M (5218053) [0m |
│ [0mPeakMemory: 1.284 MB, AllocatedMemory: 20.400 MB [0m |
│ [0mGroupingExprs: [2: dt, 3: hh, 5: game_id, 6: game_site, 7: aid, 8: channel_id, 9: throwin_id, …] [0m |
└── [0mPROJECT (id=1) [0m |
│ [0mEstimates: [row: ?, cpu: ?, memory: ?, network: ?, cost: ?] [0m |
│ [0mTotalTime: 2.735ms (0.01%) [CPUTime: 2.735ms] [0m |
│ [0mOutputRows: 5.218M (5218053) [0m |
│ [0mExpression: [2: dt, 3: hh, 5: game_id, 6: game_site, 7: aid, 8: channel_id, 9: throwin_id, …] [0m |
└── [1m [31mOLAP_SCAN (id=0) [0m |
[1m [31mEstimates: [row: 36639835, cpu: 3560926834.92, memory: 0.00, network: 0.00, cost: 1780463417.46] [0m |
[1m [31mTotalTime: 20s826ms (94.22%) [CPUTime: 13.125ms, ScanTime: 20s813ms] [0m |
[1m [31mOutputRows: 5.218M (5218053) [0m |
[1m [31mPeakMemory: 103.692 MB, AllocatedMemory: 30.032 GB [0m |
[1m [31mTable: : ods_lg_sdk_init [0m |
[1m [31mSubordinateOperators: [0m |
[1m [31mCHUNK_ACCUMULATE [0m |
[1m [31mDetail Timers: [ScanTime = IOTaskExecTime + IOTaskWaitTime] [0m |
[1m [31mIOTaskExecTime: 7s57ms [min=75.969ms, max=20s811ms] [0m |
[1m [31mIOStatistics: [0m |
[1m [31mIOTimeRemote: 6s355ms [min=0ns, max=19s155ms] [0m |
[1m [31mIOTimeTotal: 6s770ms [min=0ns, max=20s216ms] [0m |
[1m [31mIOTime: 6s770ms [min=0ns, max=20s216ms] [0m |
[1m [31mSegmentRead: 5s867ms [min=28.846ms, max=18s226ms] [0m |
[1m [31mBlockFetch: 5s753ms [min=3.537us, max=18s1ms] [0m |
[1m [31mIOTaskWaitTime: 936.652us [min=20.528us, max=4.496ms] [0m |
[0m |

拿下profile,不是这个

嗯~~这个profile是去哪拿的?

是这个吗
profile.txt (7.8 KB)

不是,这是执行计划

profile.txt (70.1 KB) 这个应该没错了

看起来没开cache呀,另外建表怎么写的,这个表多大数据量

这块应该用id呀,用dt分桶数据分布极其不均衡,另外整个表数据才几个g?14个桶应该就够了

另外datacache开启参考 https://docs.starrocks.io/zh/docs/2.5/data_source/data_cache/

4.4G,可是需要用dt这个时间字段去用作分区的依据

分区没问题,但是你现在用dt做了分桶( DISTRIBUTED BY HASH( dt )),这个肯定数据倾斜很严重了,分桶一定是类似于user id那种基数比较高的列,这样数据才会被打散

建表改成这样试试

    CREATE TABLE ods_lg_sdk_init (
    id int(11) NOT NULL COMMENT "",
    dt date NOT NULL COMMENT "",
    hh int(11) NOT NULL COMMENT "",
    record_date datetime NOT NULL COMMENT "",
    game_id int(11) NOT NULL COMMENT "",
    game_site varchar(50) NOT NULL COMMENT "",
    aid varchar(50) NOT NULL COMMENT "",
    channel_id int(11) NOT NULL COMMENT "",
    throwin_id int(11) NOT NULL COMMENT "",
    device_type int(11) NOT NULL COMMENT "",
    device_id varchar(50) NOT NULL COMMENT "",
    client_version varchar(50) NULL COMMENT "",
    server_version varchar(50) NULL COMMENT "",
    game_version_code varchar(50) NULL COMMENT "",
    game_version_name varchar(50) NULL COMMENT "",
    channel_version varchar(50) NULL COMMENT "",
    ip varchar(50) NOT NULL COMMENT "",
    is_first_device tinyint(4) NOT NULL COMMENT "",
    log_time bigint(20) NOT NULL COMMENT "",
    is_first_active tinyint(4) NULL COMMENT "",
    cid varchar(25) NULL COMMENT "",
    material_id varchar(63) NULL COMMENT ""
    ) ENGINE=OLAP
    PRIMARY KEY( id , dt )
    COMMENT "OLAP"
    PARTITION BY date_trunc('month', dt)
    DISTRIBUTED BY HASH( id ) BUCKETS 7
    ORDER BY( record_date )
    PROPERTIES (
    "replication_num" = "1",
    "datacache.partition_duration" = "1 years",
    "datacache.enable" = "true",
    "storage_volume" = "def_volume",
    "enable_async_write_back" = "false",
    "enable_persistent_index" = "true",
    "persistent_index_type" = "LOCAL",
    "compression" = "LZ4"
    );

好的,我试试

改用id分桶后确实快了

现在多少s

开了datacache没