【背景】
- HIVE的数据表 ads.ads_ec_sams_app_result_monitor_di 2023-06-26 分区有313条数据
0: jdbc:hive2://IP.2:2181,IP.8:21> select count(*) from ads.ads_ec_sams_app_result_monitor_di where ts=‘2023-06-26’;
± ShuffleQueryStage 0
± Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=588]
± *(1) HashAggregate(keys=[], functions=[partial_count(1)], output=[count#624L])
± *(1) Project
± *(1) ColumnarToRow
± FileScan orc ads.ads_ec_sams_app_result_monitor_di[ts#620] Batched: true, DataFilters: [], Format: ORC, Location: InMemoryFileIndex(1 paths)[cosn://wmhdprd-1313306507/user/hive/warehouse/ads.db/ads_ec_sams_app_r…, PartitionFilters: [isnotnull(ts#620), (ts#620 = 2023-06-26)], PushedFilters: [], ReadSchema: struct<>
(org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec(org.apache.spark.internal.Logging.logInfo:57))
2023-06-28 10:49:55,753 [INFO] [SparkSQLSessionManager-exec-pool: Thread-676] Processing c0l0f9l’s query[897c7772-4d1c-4b53-98f4-ae230ae1eeea]: RUNNING_STATE -> FINISHED_STATE, time taken: 7.585 seconds (org.apache.kyuubi.engine.spark.operation.ExecuteStatement(org.apache.kyuubi.Logging.info:59))
2023-06-28 10:49:55.753 INFO org.apache.kyuubi.operation.ExecuteStatement: Query[5d92f18c-6574-4dfc-ad27-b2f80b367c9b] in FINISHED_STATE
2023-06-28 10:49:55.753 INFO org.apache.kyuubi.operation.ExecuteStatement: Processing c0l0f9l’s query[5d92f18c-6574-4dfc-ad27-b2f80b367c9b]: RUNNING_STATE -> FINISHED_STATE, time taken: 7.586 seconds
±----------+
| count(1) |
±----------+
| 313 |
±----------+
1 row selected (7.591 seconds)
0: jdbc:hive2://IP.2:2181,IP.8:21>
- HIVE建表语句如下所示
±---------------------------------------------------+
| createtab_stmt |
±---------------------------------------------------+
| CREATE TABLEads.ads_ec_sams_app_result_monitor_di(
client_typeSTRING COMMENT ‘客户端’,
group_idBIGINT COMMENT ‘监控分组ID:1-整体;2-事件;3-页面’,
group_nameSTRING COMMENT ‘监控分组名称:整体;事件;页面’,
sub_group_nameSTRING COMMENT ‘子监控分组名称’,
kpi_nameSTRING COMMENT ‘KPI指标PV;UV’,
kpi_valueDECIMAL(38,4) COMMENT ‘KPI指标值’,
day_tb_rateDECIMAL(38,4) COMMENT ‘日同比’,
week_hb_rateDECIMAL(38,4) COMMENT ‘周环比’,
day_warning_flagINT COMMENT ‘告警标识:0-正常;1-警告’,
week_warning_flagINT COMMENT ‘告警标识:0-正常;1-警告’,
new_flagINT COMMENT ‘新增标识:0-正常;1-新增’,
etl_load_timeSTRING COMMENT ‘etl时间’,
tsSTRING COMMENT ‘日期分区’)
USING orc
PARTITIONED BY (ts)
COMMENT ‘埋点监控结果表’
LOCATION ‘cosn://wmhdprd-1313306507/user/hive/warehouse/ads.db/ads_ec_sams_app_result_monitor_di’
TBLPROPERTIES (
‘transient_lastDdlTime’ = ‘1681988466’,
‘external.table.purge’ = ‘true’)
|
±---------------------------------------------------+
1 row selected (0.125 seconds)
- 这个时候将ads.ads_ec_sams_app_result_monitor_di 2023-06-26 分区数据写到新表ads.ads_ec_sams_app_result_monitor_di_test
insert overwrite table ads.ads_ec_sams_app_result_monitor_di_test partition(ts) select * from ads.ads_ec_sams_app_result_monitor_di where ts=‘2023-06-26’;
- 如下所示: 写入成功后查询数据量313。
0: jdbc:hive2://IP.2:2181,IP.8:21> select count() from ads.ads_ec_sams_app_result_monitor_di_test where ts = ‘2023-06-26’;
2023-06-28 10:53:14.870 INFO org.apache.kyuubi.operation.ExecuteStatement: Processing c0l0f9l’s query[4306394d-1a3a-420e-b003-a4fbd5fa11d2]: PENDING_STATE -> RUNNING_STATE, statement:
select count() from ads.ads_ec_sams_app_result_monitor_di_test where ts = ‘2023-06-26’
2023-06-28 10:53:14,871 [INFO] [SparkSQLSessionManager-exec-pool: Thread-743] Processing c0l0f9l’s query[d7bd8a64-9bbd-478c-bed7-aac12f8ba8d9]: PENDING_STATE -> RUNNING_STATE, statement:
select count(*) from ads.ads_ec_sams_app_result_monitor_di_test where ts = ‘2023-06-26’ (org.apache.kyuubi.engine.spark.operation.ExecuteStatement(org.apache.kyuubi.Logging.info:59))
2023-06-28 10:53:14,871 [INFO] [SparkSQLSessionManager-exec-pool: Thread-743]
Spark application name: kyuubi_CONNECTION_SPARK_SQL_c0l0f9l_6b41120c-35f6-4049-a1ab-99851e9affa1
application ID: application_1685524150303_38512
application web UI: http://IP.17:5004/proxy/application_1685524150303_38512,http://IP.5:5004/proxy/application_1685524150303_38512
master: yarn
deploy mode: cluster
version: 3.2.2
Start time: 2023-06-28T10:07:57.785
User: c0l0f9l (org.apache.kyuubi.engine.spark.operation.ExecuteStatement(org.apache.kyuubi.Logging.info:59))
2023-06-28 10:53:14,893 [INFO] [SparkSQLSessionManager-exec-pool: Thread-743] Execute in full collect mode (org.apache.kyuubi.engine.spark.operation.ExecuteStatement(org.apache.kyuubi.Logging.info:59))
2023-06-28 10:53:14,950 [INFO] [SparkSQLSessionManager-exec-pool: Thread-743] It took 22 ms to list leaf files for 1 paths. (org.apache.spark.sql.execution.datasources.InMemoryFileIndex(org.apache.spark.internal.Logging.logInfo:57))
2023-06-28 10:53:14,953 [INFO] [SparkSQLSessionManager-exec-pool: Thread-743] Pruning directories with: isnotnull(ts#685),(ts#685 = 2023-06-26) (org.apache.spark.sql.execution.datasources.DataSourceStrategy(org.apache.spark.internal.Logging.logInfo:57))
2023-06-28 10:53:14,953 [INFO] [SparkSQLSessionManager-exec-pool: Thread-743] Pushed Filters: (org.apache.spark.sql.execution.datasources.FileSourceStrategy(org.apache.spark.internal.Logging.logInfo:57))
2023-06-28 10:53:14,953 [INFO] [SparkSQLSessionManager-exec-pool: Thread-743] Post-Scan Filters: (org.apache.spark.sql.execution.datasources.FileSourceStrategy(org.apache.spark.internal.Logging.logInfo:57))
2023-06-28 10:53:14,953 [INFO] [SparkSQLSessionManager-exec-pool: Thread-743] Output Data Schema: struct<> (org.apache.spark.sql.execution.datasources.FileSourceStrategy(org.apache.spark.internal.Logging.logInfo:57))
2023-06-28 10:53:14,962 [INFO] [SparkSQLSessionManager-exec-pool: Thread-743] Block broadcast_41 stored as values in memory (estimated size 398.0 KiB, free 2004.2 MiB) (org.apache.spark.storage.memory.MemoryStore(org.apache.spark.internal.Logging.logInfo:57))
2023-06-28 10:53:14,974 [INFO] [SparkSQLSessionManager-exec-pool: Thread-743] Block broadcast_41_piece0 stored as bytes in memory (estimated size 45.5 KiB, free 2004.2 MiB) (org.apache.spark.storage.memory.MemoryStore(org.apache.spark.internal.Logging.logInfo:57))
2023-06-28 10:53:14,976 [INFO] [SparkSQLSessionManager-exec-pool: Thread-743] Created broadcast 41 from collect at ExecuteStatement.scala:104 (org.apache.spark.SparkContext(org.apache.spark.internal.Logging.logInfo:57))
2023-06-28 10:53:14,976 [INFO] [SparkSQLSessionManager-exec-pool: Thread-743] Selected 1 partitions out of 1, pruned 0.0% partitions. (org.apache.spark.sql.execution.datasources.InMemoryFileIndex(org.apache.spark.internal.Logging.logInfo:57))
2023-06-28 10:53:14,977 [INFO] [SparkSQLSessionManager-exec-pool: Thread-743] Planning scan with bin packing, max size: 4194304 bytes, open cost is considered as scanning 4194304 bytes. (org.apache.spark.sql.execution.FileSourceScanExec(org.apache.spark.internal.Logging.logInfo:57))
2023-06-28 10:53:14,980 [INFO] [spark-listener-group-shared] Query [d7bd8a64-9bbd-478c-bed7-aac12f8ba8d9]: Job 25 started with 1 stages, 1 active jobs running (org.apache.spark.kyuubi.SQLOperationListener(org.apache.kyuubi.Logging.info:59))
2023-06-28 10:53:14,982 [INFO] [spark-listener-group-shared] Query [d7bd8a64-9bbd-478c-bed7-aac12f8ba8d9]: Stage 37.0 started with 1 tasks, 1 active stages running (org.apache.spark.kyuubi.SQLOperationListener(org.apache.kyuubi.Logging.info:59))
2023-06-28 10:53:15,272 [INFO] [spark-listener-group-shared] Finished stage: Stage(37, 0); Name: ‘collect at ExecuteStatement.scala:104’; Status: succeeded; numTasks: 1; Took: 290 msec (org.apache.spark.kyuubi.SQLOperationListener(org.apache.spark.internal.Logging.logInfo:57))
2023-06-28 10:53:15,272 [INFO] [spark-listener-group-shared] Query [d7bd8a64-9bbd-478c-bed7-aac12f8ba8d9]: Job 25 succeeded, 0 active jobs running (org.apache.spark.kyuubi.SQLOperationListener(org.apache.kyuubi.Logging.info:59))
2023-06-28 10:53:15,282 [INFO] [SparkSQLSessionManager-exec-pool: Thread-743] Starting job: collect at ExecuteStatement.scala:104 (org.apache.spark.SparkContext(org.apache.spark.internal.Logging.logInfo:57))
2023-06-28 10:53:15,284 [INFO] [spark-listener-group-shared] Query [d7bd8a64-9bbd-478c-bed7-aac12f8ba8d9]: Job 26 started with 2 stages, 1 active jobs running (org.apache.spark.kyuubi.SQLOperationListener(org.apache.kyuubi.Logging.info:59))
2023-06-28 10:53:15,284 [INFO] [spark-listener-group-shared] Query [d7bd8a64-9bbd-478c-bed7-aac12f8ba8d9]: Stage 39.0 started with 1 tasks, 1 active stages running (org.apache.spark.kyuubi.SQLOperationListener(org.apache.kyuubi.Logging.info:59))
2023-06-28 10:53:15,469 [INFO] [spark-listener-group-shared] Finished stage: Stage(39, 0); Name: ‘collect at ExecuteStatement.scala:104’; Status: succeeded; numTasks: 1; Took: 185 msec (org.apache.spark.kyuubi.SQLOperationListener(org.apache.spark.internal.Logging.logInfo:57))
2023-06-28 10:53:15,470 [INFO] [spark-listener-group-shared] Query [d7bd8a64-9bbd-478c-bed7-aac12f8ba8d9]: Job 26 succeeded, 0 active jobs running (org.apache.spark.kyuubi.SQLOperationListener(org.apache.kyuubi.Logging.info:59))
2023-06-28 10:53:15,470 [INFO] [SparkSQLSessionManager-exec-pool: Thread-743] Job 26 finished: collect at ExecuteStatement.scala:104, took 0.187502 s (org.apache.spark.scheduler.DAGScheduler(org.apache.spark.internal.Logging.logInfo:57))
2023-06-28 10:53:15,471 [INFO] [SparkSQLSessionManager-exec-pool: Thread-743] Final plan: *(2) HashAggregate(keys=[], functions=[count(1)], output=[count(1)#686L])
± ShuffleQueryStage 0
± Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=656]
± *(1) HashAggregate(keys=[], functions=[partial_count(1)], output=[count#702L])
± *(1) Project
± *(1) ColumnarToRow
± FileScan orc ads.ads_ec_sams_app_result_monitor_di_test[ts#685] Batched: true, DataFilters: [], Format: ORC, Location: InMemoryFileIndex(1 paths)[cosn://wmhdprd-1313306507/user/hive/warehouse/ads.db/ads_ec_sams_app_r…, PartitionFilters: [isnotnull(ts#685), (ts#685 = 2023-06-26)], PushedFilters: [], ReadSchema: struct<>
(org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec(org.apache.spark.internal.Logging.logInfo:57))
2023-06-28 10:53:15,472 [INFO] [SparkSQLSessionManager-exec-pool: Thread-743] Processing c0l0f9l’s query[d7bd8a64-9bbd-478c-bed7-aac12f8ba8d9]: RUNNING_STATE -> FINISHED_STATE, time taken: 0.601 seconds (org.apache.kyuubi.engine.spark.operation.ExecuteStatement(org.apache.kyuubi.Logging.info:59))
2023-06-28 10:53:15.472 INFO org.apache.kyuubi.operation.ExecuteStatement: Query[4306394d-1a3a-420e-b003-a4fbd5fa11d2] in FINISHED_STATE
2023-06-28 10:53:15.472 INFO org.apache.kyuubi.operation.ExecuteStatement: Processing c0l0f9l’s query[4306394d-1a3a-420e-b003-a4fbd5fa11d2]: RUNNING_STATE -> FINISHED_STATE, time taken: 0.602 seconds
±----------+
| count(1) |
±----------+
| 313 |
±----------+
1 row selected (0.608 seconds)
0: jdbc:hive2://IP.2:2181,IP.8:21>
- 但从StarRocks Catalog查询这个数据表,显示没有数据
[Wed Jun 28 02:42:31 2023]:[‘root’]>[IP.89]:[demo] [EMR云上企业版集群] > select count() from hive.ads.ads_ec_sams_app_result_monitor_di_test;
±---------+
| count() |
±---------+
| 0 |
±---------+
1 row in set (0.014 sec)[Wed Jun 28 02:55:18 2023]:[‘root’]>[IP.89]:[demo] [EMR云上企业版集群] >
- catalog
[Wed Jun 28 02:56:52 2023]:[‘root’]>[IP.89]:[demo] [EMR云上企业版集群] >
[Wed Jun 28 02:56:57 2023]:[‘root’]>[IP.89]:[demo] [EMR云上企业版集群] > show create catalog hive \G;
*************************** 1. row ***************************
Catalog: hive
Create Catalog: CREATE EXTERNAL CATALOGhive
PROPERTIES (“aws.s3.access_key” = “AKyV",
“aws.s3.secret_key” = "HH5m”,
“aws.s3.endpoint” = “cos.ap-guangzhou.myqcloud.com”,
“aws.s3.use_instance_profile” = “false”,
“aws.s3.region” = “ap-guangzhou”,
“hive.metastore.uris” = “thrift://IP.*:7004”,
“type” = “hive”
)
1 row in set (0.007 sec)ERROR: No query specified
[Wed Jun 28 02:57:00 2023]:[‘root’]>[IP.89]:[demo] [EMR云上企业版集群] >
[Wed Jun 28 02:57:01 2023]:[‘root’]>[IP.89]:[demo] [EMR云上企业版集群] > show catalogs;
±----------------±---------±-----------------------------------------------------------------+
| Catalog | Type | Comment |
±----------------±---------±-----------------------------------------------------------------+
| default_catalog | Internal | An internal catalog contains this cluster’s self-managed tables. |
| hive | hive | NULL |
| hive_poc | hive | NULL |
±----------------±---------±-----------------------------------------------------------------+
3 rows in set (0.007 sec)[Wed Jun 28 02:57:03 2023]:[‘root’]>[IP.89]:[demo] [EMR云上企业版集群] >
[Wed Jun 28 02:57:03 2023]:[‘root’]>[IP.89]:[demo] [EMR云上企业版集群] >