使用FE + CN组合后,2000万的单表聚合查询花了25秒

为了更快的定位您的问题,请提供以下信息,谢谢
【详述】部署改成FE + CN后查询速度慢了很多
【背景】从3be +1fe 变成 1fe+1cn(28核)
【业务影响】物化试图调度都跑不完,一直挤压着
【是否存算分离】是
【StarRocks版本】例如:3.1
【集群规模】例如:fe + cn

Explain String |
------------------------------------------------------------------------------------------------------------------------------------------------------------+
[0mSummary [0m |
[0mQueryId: e51b080d-cfb8-11ee-81b9-00163e04df48 [0m |
[0mVersion: 3.2.2-269e832 [0m |
[0mState: Finished [0m |
[0mTotalTime: 25s324ms [0m |
[0mExecutionTime: 25s306ms [Scan: 25s97ms (99.18%), Network: 535.017ms (2.11%), ResultDeliverTime: 0ns (0.00%), ScheduleTime: 65.753ms (0.26%)] [0m|
[0mCollectProfileTime: 5ms [0m |
[0mFrontendProfileMergeTime: 2.326ms [0m |
[0mQueryPeakMemoryUsage: 1.037 GB, QueryAllocatedMemoryUsage: 38.524 GB [0m |
[0mTop Most Time-consuming Nodes: [0m |
[1m [31m1. OLAP_SCAN (id=0) : 25s109ms (95.22%) [0m |
[0m2. EXCHANGE (id=3) [SHUFFLE]: 774.615ms (2.94%) [0m |
[0m3. AGGREGATION (id=4) [merge, serialize]: 341.515ms (1.30%) [0m |
[0m4. AGGREGATION (id=2) [serialize, update]: 108.059ms (0.41%) [0m |
[0m5. AGGREGATION (id=5) [finalize, update]: 30.267ms (0.11%) [0m |
[0m6. PROJECT (id=1) : 4.094ms (0.02%) [0m |
[0m7. EXCHANGE (id=6) [GATHER]: 604.364us (0.00%) [0m |
[0m8. RESULT_SINK: 215.313us (0.00%) [0m |
[0mTop Most Memory-consuming Nodes: [0m |
[0m1. OLAP_SCAN (id=0) : 106.061 MB [0m |
[0m2. AGGREGATION (id=4) [merge, serialize]: 83.741 MB [0m |
[0m3. EXCHANGE (id=3) [SHUFFLE]: 25.331 MB [0m |
[0m4. AGGREGATION (id=2) [serialize, update]: 1.269 MB [0m |
[0m5. AGGREGATION (id=5) [finalize, update]: 500.164 KB [0m |
[0m6. EXCHANGE (id=6) [GATHER]: 29.195 KB [0m |
[0mNonDefaultVariables: [0m |
[0mcharacter_set_results: utf8 -> NULL [0m |
[0menable_adaptive_sink_dop: false -> true [0m |
[0menable_async_profile: true -> false [0m |
[0menable_profile: false -> true [0m |
[0msql_mode_v2: 32 -> 2097184 [0m |
[0msql_select_limit: 9223372036854775807 -> 200 [0m |
[0mtime_zone: Etc/UTC -> Asia/Shanghai [0m |
[0mFragment 0 [0m |
│ [0mBackendNum: 1 [0m |
│ [0mInstancePeakMemoryUsage: 356.102 KB, InstanceAllocatedMemoryUsage: 1.701 MB [0m |
│ [0mPrepareTime: 328.977us [0m |
└── [0mRESULT_SINK [0m |
│ [0mTotalTime: 215.313us (0.00%) [CPUTime: 215.313us] [0m |
│ [0mOutputRows: 200 [0m |
│ [0mSinkType: MYSQL_PROTOCAL [0m |
└── [0mEXCHANGE (id=6) [GATHER] [0m |
[0mEstimates: [row: 200, cpu: 0.00, memory: 0.00, network: 0.00, cost: 22943980631.40] [0m |
[0mTotalTime: 604.364us (0.00%) [CPUTime: 167.714us, NetworkTime: 436.650us] [0m |
[0mOutputRows: 200 [0m |
[0mPeakMemory: 29.195 KB, AllocatedMemory: 65.406 KB [0m |
[0m |
[0mFragment 1 [0m |
│ [0mBackendNum: 1 [0m |
│ [0mInstancePeakMemoryUsage: 955.047 MB, InstanceAllocatedMemoryUsage: 2.882 GB [0m |
│ [0mPrepareTime: 453.641us [0m |
└── [0mDATA_STREAM_SINK (id=6) [0m |
│ [0mPartitionType: UNPARTITIONED [0m |
└── [0mAGGREGATION (id=5) [finalize, update] [0m |
│ [0mEstimates: [row: 200, cpu: 3524198026.17, memory: 13637.70, network: 0.00, cost: 22943953356.00] [0m |
│ [0mTotalTime: 30.267ms (0.11%) [CPUTime: 30.267ms] [0m |
│ [0mOutputRows: 200 [0m |
│ [0mPeakMemory: 500.164 KB, AllocatedMemory: 3.401 MB [0m |
│ [0mAggExprs: [count(11: device_id)] [0m |
│ [0mGroupingExprs: [2: dt, 3: hh, 5: game_id, 6: game_site, 8: channel_id, 9: throwin_id, …] [0m |
└── [0mAGGREGATION (id=4) [merge, serialize] [0m |
│ [0mEstimates: [row: 36638910, cpu: 3524198026.17, memory: 3524198026.17, network: 0.00, cost: 21181827067.51] [0m |
│ [0mTotalTime: 341.515ms (1.30%) [CPUTime: 341.515ms] [0m |
│ [0mOutputRows: 5.206M (5205707) [0m |
│ [0mPeakMemory: 83.741 MB, AllocatedMemory: 1.932 GB [0m |
│ [0mGroupingExprs: [2: dt, 3: hh, 5: game_id, 6: game_site, 7: aid, 8: channel_id, 9: throwin_id, …] [0m |
└── [0mEXCHANGE (id=3) [SHUFFLE] [0m |
[0mEstimates: [row: 36638910, cpu: 3524198026.17, memory: 0.00, network: 0.00, cost: 12371332002.09] [0m |
[0mTotalTime: 774.615ms (2.94%) [CPUTime: 240.034ms, NetworkTime: 534.581ms] [0m |
[0mOutputRows: 5.218M (5217979) [0m |
[0mPeakMemory: 25.331 MB, AllocatedMemory: 1.800 GB [0m |
[0m |
[0mFragment 2 [0m |
│ [0mBackendNum: 1 [0m |
│ [0mInstancePeakMemoryUsage: 200.313 MB, InstanceAllocatedMemoryUsage: 35.641 GB [0m |
│ [0mPrepareTime: 462.247us [0m |
└── [0mDATA_STREAM_SINK (id=3) [0m |
│ [0mPartitionType: HASH_PARTITIONED [0m |
│ [0mPartitionExprs: [2: dt, 3: hh, 5: game_id, 6: game_site, 8: channel_id, 9: throwin_id, …] [0m |
└── [0mAGGREGATION (id=2) [serialize, update] [0m |
│ [0mEstimates: [row: 36638910, cpu: 3560836936.67, memory: 3524198026.17, network: 0.00, cost: 10609232989.00] [0m |
│ [0mTotalTime: 108.059ms (0.41%) [CPUTime: 108.059ms] [0m |
│ [0mOutputRows: 5.218M (5217979) [0m |
│ [0mPeakMemory: 1.269 MB, AllocatedMemory: 20.237 MB [0m |
│ [0mGroupingExprs: [2: dt, 3: hh, 5: game_id, 6: game_site, 7: aid, 8: channel_id, 9: throwin_id, …] [0m |
└── [0mPROJECT (id=1) [0m |
│ [0mEstimates: [row: ?, cpu: ?, memory: ?, network: ?, cost: ?] [0m |
│ [0mTotalTime: 4.094ms (0.02%) [CPUTime: 4.094ms] [0m |
│ [0mOutputRows: 5.218M (5217979) [0m |
│ [0mExpression: [2: dt, 3: hh, 5: game_id, 6: game_site, 7: aid, 8: channel_id, 9: throwin_id, …] [0m |
└── [1m [31mOLAP_SCAN (id=0) [0m |
[1m [31mEstimates: [row: 36638910, cpu: 3560836936.67, memory: 0.00, network: 0.00, cost: 1780418468.33] [0m |
[1m [31mTotalTime: 25s109ms (95.22%) [CPUTime: 12.325ms, ScanTime: 25s97ms] [0m |
[1m [31mOutputRows: 5.218M (5217979) [0m |
[1m [31mPeakMemory: 106.061 MB, AllocatedMemory: 36.490 GB [0m |
[1m [31mTable: : ods_lg_sdk_init [0m |
[1m [31mSubordinateOperators: [0m |
[1m [31mCHUNK_ACCUMULATE [0m |
[1m [31mDetail Timers: [ScanTime = IOTaskExecTime + IOTaskWaitTime] [0m |
[1m [31mIOTaskExecTime: 8s585ms [min=52.543ms, max=25s95ms] [0m |
[1m [31mIOStatistics: [0m |
[1m [31mIOTimeRemote: 7s791ms [min=0ns, max=23s913ms] [0m |
[1m [31mIOTimeTotal: 8s196ms [min=10.840ms, max=24s489ms] [0m |
[1m [31mIOTime: 8s196ms [min=10.840ms, max=24s489ms] [0m |
[1m [31mSegmentRead: 6s709ms [min=7.085ms, max=21s908ms] [0m |
[1m [31mBlockFetch: 6s635ms [min=7.354us, max=21s757ms] [0m |
[1m [31mIOTaskWaitTime: 1.312ms [min=42.420us, max=10.159ms] [0m |
[0m |

看起来是慢在scan上了,数据是存储在s3还是哪里?是否开启了cache?

数据存储CN的节点上,开了Cache,元数据存在了对象存储里

每次查都这么慢吗?拿一下profile看下

每次都会,而且其他数据稍微大的表都会

Explain String |
------------------------------------------------------------------------------------------------------------------------------------------------------------+
[0mSummary [0m |
[0mQueryId: a1059f89-cfbe-11ee-b607-00163e02b0aa [0m |
[0mVersion: 3.2.2-269e832 [0m |
[0mState: Finished [0m |
[0mTotalTime: 21s81ms [0m |
[0mExecutionTime: 21s70ms [Scan: 20s813ms (98.78%), Network: 383.968ms (1.82%), ResultDeliverTime: 0ns (0.00%), ScheduleTime: 72.102ms (0.34%)] [0m|
[0mCollectProfileTime: 5ms [0m |
[0mFrontendProfileMergeTime: 1.655ms [0m |
[0mQueryPeakMemoryUsage: 1.035 GB, QueryAllocatedMemoryUsage: 33.914 GB [0m |
[0mTop Most Time-consuming Nodes: [0m |
[1m [31m1. OLAP_SCAN (id=0) : 20s826ms (94.22%) [0m |
[0m2. EXCHANGE (id=3) [SHUFFLE]: 708.872ms (3.21%) [0m |
[0m3. AGGREGATION (id=4) [merge, serialize]: 397.418ms (1.80%) [0m |
[0m4. AGGREGATION (id=2) [serialize, update]: 128.068ms (0.58%) [0m |
[0m5. AGGREGATION (id=5) [finalize, update]: 39.556ms (0.18%) [0m |
[0m6. PROJECT (id=1) : 2.735ms (0.01%) [0m |
[0m7. EXCHANGE (id=6) [GATHER]: 643.603us (0.00%) [0m |
[0m8. RESULT_SINK: 190.507us (0.00%) [0m |
[0mTop Most Memory-consuming Nodes: [0m |
[0m1. OLAP_SCAN (id=0) : 103.692 MB [0m |
[0m2. AGGREGATION (id=4) [merge, serialize]: 76.671 MB [0m |
[0m3. EXCHANGE (id=3) [SHUFFLE]: 23.578 MB [0m |
[0m4. AGGREGATION (id=2) [serialize, update]: 1.284 MB [0m |
[0m5. AGGREGATION (id=5) [finalize, update]: 1.067 MB [0m |
[0m6. EXCHANGE (id=6) [GATHER]: 29.195 KB [0m |
[0mNonDefaultVariables: [0m |
[0mcharacter_set_results: utf8 -> NULL [0m |
[0menable_adaptive_sink_dop: false -> true [0m |
[0menable_async_profile: true -> false [0m |
[0menable_profile: false -> true [0m |
[0msql_mode_v2: 32 -> 2097184 [0m |
[0msql_select_limit: 9223372036854775807 -> 200 [0m |
[0mtime_zone: Etc/UTC -> Asia/Shanghai [0m |
[0mFragment 0 [0m |
│ [0mBackendNum: 1 [0m |
│ [0mInstancePeakMemoryUsage: 357.602 KB, InstanceAllocatedMemoryUsage: 1.702 MB [0m |
│ [0mPrepareTime: 239.018us [0m |
└── [0mRESULT_SINK [0m |
│ [0mTotalTime: 190.507us (0.00%) [CPUTime: 190.507us] [0m |
│ [0mOutputRows: 200 [0m |
│ [0mSinkType: MYSQL_PROTOCAL [0m |
└── [0mEXCHANGE (id=6) [GATHER] [0m |
[0mEstimates: [row: 200, cpu: 0.00, memory: 0.00, network: 0.00, cost: 22944559882.56] [0m |
[0mTotalTime: 643.603us (0.00%) [CPUTime: 223.713us, NetworkTime: 419.890us] [0m |
[0mOutputRows: 200 [0m |
[0mPeakMemory: 29.195 KB, AllocatedMemory: 72.952 KB [0m |
[0m |
[0mFragment 1 [0m |
│ [0mBackendNum: 1 [0m |
│ [0mInstancePeakMemoryUsage: 953.916 MB, InstanceAllocatedMemoryUsage: 2.884 GB [0m |
│ [0mPrepareTime: 440.116us [0m |
└── [0mDATA_STREAM_SINK (id=6) [0m |
│ [0mPartitionType: UNPARTITIONED [0m |
└── [0mAGGREGATION (id=5) [finalize, update] [0m |
│ [0mEstimates: [row: 200, cpu: 3524286999.42, memory: 13637.70, network: 0.00, cost: 22944532607.16] [0m |
│ [0mTotalTime: 39.556ms (0.18%) [CPUTime: 39.556ms] [0m |
│ [0mOutputRows: 200 [0m |
│ [0mPeakMemory: 1.067 MB, AllocatedMemory: 9.557 MB [0m |
│ [0mAggExprs: [count(11: device_id)] [0m |
│ [0mGroupingExprs: [2: dt, 3: hh, 5: game_id, 6: game_site, 8: channel_id, 9: throwin_id, …] [0m |
└── [0mAGGREGATION (id=4) [merge, serialize] [0m |
│ [0mEstimates: [row: 36639835, cpu: 3524286999.42, memory: 3524286999.42, network: 0.00, cost: 21182361832.05] [0m |
│ [0mTotalTime: 397.418ms (1.80%) [CPUTime: 397.418ms] [0m |
│ [0mOutputRows: 5.206M (5205781) [0m |
│ [0mPeakMemory: 76.671 MB, AllocatedMemory: 2.042 GB [0m |
│ [0mGroupingExprs: [2: dt, 3: hh, 5: game_id, 6: game_site, 7: aid, 8: channel_id, 9: throwin_id, …] [0m |
└── [0mEXCHANGE (id=3) [SHUFFLE] [0m |
[0mEstimates: [row: 36639835, cpu: 3524286999.42, memory: 0.00, network: 0.00, cost: 12371644333.49] [0m |
[0mTotalTime: 708.872ms (3.21%) [CPUTime: 325.324ms, NetworkTime: 383.548ms] [0m |
[0mOutputRows: 5.218M (5218053) [0m |
[0mPeakMemory: 23.578 MB, AllocatedMemory: 1.800 GB [0m |
[0m |
[0mFragment 2 [0m |
│ [0mBackendNum: 1 [0m |
│ [0mInstancePeakMemoryUsage: 204.824 MB, InstanceAllocatedMemoryUsage: 31.029 GB [0m |
│ [0mPrepareTime: 756.271us [0m |
└── [0mDATA_STREAM_SINK (id=3) [0m |
│ [0mPartitionType: HASH_PARTITIONED [0m |
│ [0mPartitionExprs: [2: dt, 3: hh, 5: game_id, 6: game_site, 8: channel_id, 9: throwin_id, …] [0m |
└── [0mAGGREGATION (id=2) [serialize, update] [0m |
│ [0mEstimates: [row: 36639835, cpu: 3560926834.92, memory: 3524286999.42, network: 0.00, cost: 10609500833.77] [0m |
│ [0mTotalTime: 128.068ms (0.58%) [CPUTime: 128.068ms] [0m |
│ [0mOutputRows: 5.218M (5218053) [0m |
│ [0mPeakMemory: 1.284 MB, AllocatedMemory: 20.400 MB [0m |
│ [0mGroupingExprs: [2: dt, 3: hh, 5: game_id, 6: game_site, 7: aid, 8: channel_id, 9: throwin_id, …] [0m |
└── [0mPROJECT (id=1) [0m |
│ [0mEstimates: [row: ?, cpu: ?, memory: ?, network: ?, cost: ?] [0m |
│ [0mTotalTime: 2.735ms (0.01%) [CPUTime: 2.735ms] [0m |
│ [0mOutputRows: 5.218M (5218053) [0m |
│ [0mExpression: [2: dt, 3: hh, 5: game_id, 6: game_site, 7: aid, 8: channel_id, 9: throwin_id, …] [0m |
└── [1m [31mOLAP_SCAN (id=0) [0m |
[1m [31mEstimates: [row: 36639835, cpu: 3560926834.92, memory: 0.00, network: 0.00, cost: 1780463417.46] [0m |
[1m [31mTotalTime: 20s826ms (94.22%) [CPUTime: 13.125ms, ScanTime: 20s813ms] [0m |
[1m [31mOutputRows: 5.218M (5218053) [0m |
[1m [31mPeakMemory: 103.692 MB, AllocatedMemory: 30.032 GB [0m |
[1m [31mTable: : ods_lg_sdk_init [0m |
[1m [31mSubordinateOperators: [0m |
[1m [31mCHUNK_ACCUMULATE [0m |
[1m [31mDetail Timers: [ScanTime = IOTaskExecTime + IOTaskWaitTime] [0m |
[1m [31mIOTaskExecTime: 7s57ms [min=75.969ms, max=20s811ms] [0m |
[1m [31mIOStatistics: [0m |
[1m [31mIOTimeRemote: 6s355ms [min=0ns, max=19s155ms] [0m |
[1m [31mIOTimeTotal: 6s770ms [min=0ns, max=20s216ms] [0m |
[1m [31mIOTime: 6s770ms [min=0ns, max=20s216ms] [0m |
[1m [31mSegmentRead: 5s867ms [min=28.846ms, max=18s226ms] [0m |
[1m [31mBlockFetch: 5s753ms [min=3.537us, max=18s1ms] [0m |
[1m [31mIOTaskWaitTime: 936.652us [min=20.528us, max=4.496ms] [0m |
[0m |

拿下profile,不是这个

嗯~~这个profile是去哪拿的?

是这个吗
profile.txt (7.8 KB)

不是,这是执行计划

profile.txt (70.1 KB) 这个应该没错了

看起来没开cache呀,另外建表怎么写的,这个表多大数据量

这块应该用id呀,用dt分桶数据分布极其不均衡,另外整个表数据才几个g?14个桶应该就够了

另外datacache开启参考 https://docs.starrocks.io/zh/docs/2.5/data_source/data_cache/

4.4G,可是需要用dt这个时间字段去用作分区的依据

分区没问题,但是你现在用dt做了分桶( DISTRIBUTED BY HASH( dt )),这个肯定数据倾斜很严重了,分桶一定是类似于user id那种基数比较高的列,这样数据才会被打散

建表改成这样试试

    CREATE TABLE ods_lg_sdk_init (
    id int(11) NOT NULL COMMENT "",
    dt date NOT NULL COMMENT "",
    hh int(11) NOT NULL COMMENT "",
    record_date datetime NOT NULL COMMENT "",
    game_id int(11) NOT NULL COMMENT "",
    game_site varchar(50) NOT NULL COMMENT "",
    aid varchar(50) NOT NULL COMMENT "",
    channel_id int(11) NOT NULL COMMENT "",
    throwin_id int(11) NOT NULL COMMENT "",
    device_type int(11) NOT NULL COMMENT "",
    device_id varchar(50) NOT NULL COMMENT "",
    client_version varchar(50) NULL COMMENT "",
    server_version varchar(50) NULL COMMENT "",
    game_version_code varchar(50) NULL COMMENT "",
    game_version_name varchar(50) NULL COMMENT "",
    channel_version varchar(50) NULL COMMENT "",
    ip varchar(50) NOT NULL COMMENT "",
    is_first_device tinyint(4) NOT NULL COMMENT "",
    log_time bigint(20) NOT NULL COMMENT "",
    is_first_active tinyint(4) NULL COMMENT "",
    cid varchar(25) NULL COMMENT "",
    material_id varchar(63) NULL COMMENT ""
    ) ENGINE=OLAP
    PRIMARY KEY( id , dt )
    COMMENT "OLAP"
    PARTITION BY date_trunc('month', dt)
    DISTRIBUTED BY HASH( id ) BUCKETS 7
    ORDER BY( record_date )
    PROPERTIES (
    "replication_num" = "1",
    "datacache.partition_duration" = "1 years",
    "datacache.enable" = "true",
    "storage_volume" = "def_volume",
    "enable_async_write_back" = "false",
    "enable_persistent_index" = "true",
    "persistent_index_type" = "LOCAL",
    "compression" = "LZ4"
    );

好的,我试试

改用id分桶后确实快了