Spark bulk load errors

CDH cluster, Spark version 2.4.0
StarRocks version 2.5.5

Using the Spark 2.4.0 client fails with java.lang.NoClassDefFoundError: org/slf4j/Logger.
The jars under the StarRocks lib directory are from Spark 2.4.6.

So I switched to the Spark 2.4.6 client.
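
For reference, the FE locates the Spark client through fe.conf, so switching clients means updating these entries and restarting the FE. A minimal sketch, where all paths are assumptions for this cluster:

# fe.conf: Spark client used to submit the ETL job
spark_home_default_dir = /opt/spark-2.4.6-bin-hadoop2.7
# zip of the client's jars, uploaded to the working_dir on HDFS on first use
spark_resource_path = /opt/spark-2.4.6-bin-hadoop2.7/spark-2x.zip
# YARN client used to poll and kill the ETL application
yarn_client_path = /opt/hadoop/bin/yarn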

The YARN cluster then reports the following error:

23/06/08 08:02:29 INFO dpp.SparkDpp: Start to process rollup tree:indexid: 3528136
23/06/08 08:02:29 INFO dpp.SparkDpp: bucket key map:{3528134_0=0}
23/06/08 08:02:29 INFO dpp.SparkDpp: no data for file file group:EtlFileGroup{sourceType=FILE, filePaths=[hdfs://cdh1:8020/user/hive/warehouse/mydb.db/mytable], fileFieldNames=[col1, col2], columnsFromPath=null, columnSeparator='001', lineDelimiter='\n', isNegative=false, fileFormat='orc', columnMappings={name=EtlColumnMapping{functionName='null', args=null, expr=col1}, id=EtlColumnMapping{functionName='null', args=null, expr=col2}}, where='', partitions=[3528134], hiveDbTableName='null', hiveTableProperties=null}
23/06/08 08:02:29 INFO dpp.SparkDpp: start to process index:3528136
23/06/08 08:02:29 WARN dpp.SparkDpp: spark dpp failed for exception:java.lang.NullPointerException
23/06/08 08:02:29 INFO server.AbstractConnector: Stopped Spark@67e48ace{HTTP/1.1,[http/1.1]}{0.0.0.0:0}
23/06/08 08:02:29 INFO ui.SparkUI: Stopped Spark web UI at http://cdh11:2342
23/06/08 08:02:29 INFO yarn.YarnAllocator: Driver requested a total number of 0 executor(s).
23/06/08 08:02:29 INFO cluster.YarnClusterSchedulerBackend: Shutting down all executors
23/06/08 08:02:29 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
23/06/08 08:02:29 INFO cluster.SchedulerExtensionServices: Stopping SchedulerExtensionServices
(serviceOption=None,
 services=List(),
 started=false)
23/06/08 08:02:29 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
23/06/08 08:02:29 INFO memory.MemoryStore: MemoryStore cleared
23/06/08 08:02:29 INFO storage.BlockManager: BlockManager stopped
23/06/08 08:02:29 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
23/06/08 08:02:29 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
23/06/08 08:02:29 INFO spark.SparkContext: Successfully stopped SparkContext
spark etl job run failed
23/06/08 08:02:30 WARN etl.SparkEtlJob: java.lang.NullPointerException
23/06/08 08:02:30 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 16, (reason: Shutdown hook called before final status was reported.)
23/06/08 08:02:30 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag message: Shutdown hook called before final status was reported.)
23/06/08 08:02:30 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
23/06/08 08:02:30 INFO yarn.ApplicationMaster: Deleting staging directory hdfs://cdh1:8020/user/root/.sparkStaging/application_1685545664808_9413
23/06/08 08:02:30 INFO util.ShutdownHookManager: Shutdown hook called
23/06/08 08:02:30 INFO util.ShutdownHookManager: Deleting directory /data1/yarn/nm/usercache/root/appcache/application_1685545664808_9413/spark-0c0e6b45-6354-4485-88df-a6aad3b515eb
23/06/08 08:02:30 INFO util.ShutdownHookManager: Deleting directory /data2/yarn/nm/usercache/root/appcache/application_1685545664808_9413/spark-a329ba93-80ab-4a20-bc21-422330f20f10
23/06/08 08:02:30 INFO util.ShutdownHookManager: Deleting directory /data3/yarn/nm/usercache/root/appcache/application_1685545664808_9413/spark-b112d1ea-e414-4688-850f-b1ba45dad4d9

The error shown in the Spark UI:
org.apache.http.conn.HttpHostConnectException: Connect to cdh15:7345 [cdh15/***] failed: Connection refused (Connection refused)
at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:159)
at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:359)
at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:381)
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:237)
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:111)
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
at org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet.proxyLink(WebAppProxyServlet.java:302)
at org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet.methodAction(WebAppProxyServlet.java:515)
at org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet.doGet(WebAppProxyServlet.java:354)

This looks like a node in the CDH cluster can't be reached? Is port 7345 on cdh15 not alive?

It failed while requesting containers and then exited. I've retried more than 10 times; it's the same every time.

I'm now loading through a Hive external table instead, and that succeeded.
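
Roughly, the statements involved look like the following; the resource name spark0, the broker name, the addresses, and the table names are placeholders rather than my real ones:

CREATE EXTERNAL RESOURCE "spark0"
PROPERTIES (
    "type" = "spark",
    "spark.master" = "yarn",
    "spark.submit.deployMode" = "cluster",
    "spark.hadoop.yarn.resourcemanager.address" = "cdh1:8032",
    "spark.hadoop.fs.defaultFS" = "hdfs://cdh1:8020",
    "working_dir" = "hdfs://cdh1:8020/tmp/starrocks",
    "broker" = "broker0"
);

-- hive_ext_table is a StarRocks Hive external table (ENGINE=hive) over the ORC data;
-- the load reads from it instead of pointing at the HDFS files directly
LOAD LABEL mydb.label04
(
    DATA FROM TABLE hive_ext_table
    INTO TABLE mytable
)
WITH RESOURCE 'spark0'
PROPERTIES ("timeout" = "3600");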

But I've hit another problem: the second time the Spark resource is used, it fails:
show load where label="label04"\G
ErrorMsg: type:ETL_SUBMIT_FAIL; msg:Invalid library type: spark

After I manually delete the uploaded jar package from HDFS, the job runs normally again.
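
For context, the FE uploads those files into a repository directory under the resource's working_dir on first use, and that directory is what I removed. A sketch of the cleanup; the working_dir path and resource name are assumptions, and the __spark_repository__ naming is my reading of what SparkRepository (seen in fe.log below) creates:

# list the uploaded libraries for the resource (names like __lib_<md5>_spark-2x.zip)
hdfs dfs -ls -R /tmp/starrocks/*/__spark_repository__spark0
# removing the directory forces the next load to re-upload spark-2x.zip and the DPP jar
hdfs dfs -rm -r /tmp/starrocks/*/__spark_repository__spark0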

Could you check the Spark Load logs to see what the actual error is?

The second time the resource is used, the label fails almost immediately with:
ErrorMsg: type:ETL_SUBMIT_FAIL; msg:Invalid library type: spark

And nothing gets printed under spark_launcher_log at all.

After I delete the files on HDFS, rerunning works fine. Surely it can't be intended that I have to upload the jar package to HDFS and then delete it again before every run?

Correction to the above: it's not that no error is printed; no log is printed at all.

Check whether there are errors in fe.out.

Also in fe.log.

2023-06-08 12:11:46,408 INFO (pending_load_task_scheduler_pool-1|56338) [SparkLoadPendingTask.executeTask():117] begin to execute spark pending task. load job id: 3639985
2023-06-08 12:11:46,411 INFO (pending_load_task_scheduler_pool-1|56338) [SparkRepository.initRepository():105] start to init remote repository. local dpp: /data/starrocks-2.3.0/fe/spark-dpp/spark-dpp-1.0.0-jar-with-dependencies.jar
com.starrocks.common.LoadException: Invalid library type: spark
2023-06-08 12:11:46,414 INFO (pending_load_task_scheduler_pool-1|56338) [SparkLoadPendingTask.executeTask():117] begin to execute spark pending task. load job id: 3639985
2023-06-08 12:11:46,414 INFO (pending_load_task_scheduler_pool-1|56338) [SparkRepository.initRepository():105] start to init remote repository. local dpp: /data/starrocks-2.3.0/fe/spark-dpp/spark-dpp-1.0.0-jar-with-dependencies.jar
com.starrocks.common.LoadException: Invalid library type: spark
2023-06-08 12:11:46,416 INFO (pending_load_task_scheduler_pool-1|56338) [SparkLoadPendingTask.executeTask():117] begin to execute spark pending task. load job id: 3639985
2023-06-08 12:11:46,417 INFO (pending_load_task_scheduler_pool-1|56338) [SparkRepository.initRepository():105] start to init remote repository. local dpp: /data/starrocks-2.3.0/fe/spark-dpp/spark-dpp-1.0.0-jar-with-dependencies.jar
com.starrocks.common.LoadException: Invalid library type: spark
2023-06-08 12:11:46,419 INFO (pending_load_task_scheduler_pool-1|56338) [SparkLoadPendingTask.executeTask():117] begin to execute spark pending task. load job id: 3639985
2023-06-08 12:11:46,419 INFO (pending_load_task_scheduler_pool-1|56338) [SparkRepository.initRepository():105] start to init remote repository. local dpp: /data/starrocks-2.3.0/fe/spark-dpp/spark-dpp-1.0.0-jar-with-dependencies.jar
com.starrocks.common.LoadException: Invalid library type: spark
2023-06-08 12:11:46,421 WARN (pending_load_task_scheduler_pool-1|56338) [LoadJob.unprotectedExecuteCancel():589] LOAD_JOB=3639985, transaction_id={62586242}, error_msg={Failed to execute load with error: Invalid library type: spark}
2023-06-08 12:11:46,422 INFO (pending_load_task_scheduler_pool-1|56338) [DatabaseTransactionMgr.abortTransaction():1263] transaction:[TransactionState. txn_id: 62586242, label: label09, db id: 3290466, table id list: 3635988, callback id: 3639985, coordinator: FE: 172.16.10.31, transaction status: ABORTED, error replicas num: 0, replica ids: , prepare time: 1686226306402, commit time: -1, finish time: 1686226306421, total cost: 19ms, reason: Invalid library type: spark] successfully rollback

The Spark client jars need to be packaged as spark-2x.zip.

https://docs.starrocks.io/zh-cn/latest/loading/SparkLoad#配置-spark-客户端
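
Per that doc, the archive is built from the Spark client's jars directory and the file name must be exactly spark-2x.zip. A rough sketch, assuming the client is unpacked at /opt/spark-2.4.6-bin-hadoop2.7:

# package every jar of the Spark client; do not rename the archive
cd /opt/spark-2.4.6-bin-hadoop2.7/jars
zip -q spark-2x.zip *.jar
# fe.conf then points at this exact file:
# spark_resource_path = /opt/spark-2.4.6-bin-hadoop2.7/jars/spark-2x.zip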

I tried several Spark versions (2.4.0 and 2.4.5); only the Spark 2.4.6 that ships with StarRocks 2.5.5 ran without errors. Each switch meant editing the FE config file and restarting the FE, so I simply renamed the file to spark.zip to avoid touching the FE config.

What's more, my first Spark Load run also succeeded and the file was uploaded; only a repeated, second run reports this error, and after deleting the files on HDFS, rerunning succeeds again. That's the strange part.

So as I understand it, a name like spark-24.zip would also be wrong?

Let me rename it to spark-2x.zip and try.

The x doesn't need to be replaced :sweat_smile:

Any problems left after the change?

The problem is fixed after the change, thanks a lot!

I've marked your answer as the solution for this thread.

And the earlier issue where the Hive catalog couldn't be accessed is resolved as well.
