We deployed StarRocks 3.0 on Kubernetes using the operator. The Spark connector dependency is configured via spark_conf:
spark_conf["spark.jars.packages"] = "com.starrocks:starrocks-spark-connector-3.2_2.12:1.1.0"
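For context, this is roughly how the dependency is declared before the session starts (a minimal sketch; the surrounding session-creation code is an assumption and not part of the original setup shown here):

```python
# spark_conf is a plain dict; each entry is later applied to the
# SparkSession builder via .config(key, value). Only the connector
# coordinate below comes from the actual setup; the rest is illustrative.
spark_conf = {}
spark_conf["spark.jars.packages"] = (
    "com.starrocks:starrocks-spark-connector-3.2_2.12:1.1.0"
)

# The value is a standard Maven coordinate: groupId:artifactId:version.
group_id, artifact_id, version = spark_conf["spark.jars.packages"].split(":")
print(group_id, artifact_id, version)
```

Note that `spark.jars.packages` must be set before the SparkSession is created; setting it on an already-running session has no effect on executor classpaths.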
The Spark driver log shows the package was downloaded successfully:
com.starrocks#starrocks-spark-connector-3.2_2.12 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-611e8ed2-f775-4292-a98d-6883783a9c54;1.0
confs: [default]
found com.starrocks#starrocks-spark-connector-3.2_2.12;1.1.0 in central
downloading https://repo1.maven.org/maven2/com/starrocks/starrocks-spark-connector-3.2_2.12/1.1.0/starrocks-spark-connector-3.2_2.12-1.1.0.jar ...
[SUCCESSFUL ] com.starrocks#starrocks-spark-connector-3.2_2.12;1.1.0!starrocks-spark-connector-3.2_2.12.jar (868ms)
:: resolution report :: resolve 671ms :: artifacts dl 870ms
:: modules in use:
com.starrocks#starrocks-spark-connector-3.2_2.12;1.1.0 from central in [default]
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 1 | 1 | 1 | 0 || 1 | 1 |
---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-611e8ed2-f775-4292-a98d-6883783a9c54
confs: [default]
1 artifacts copied, 0 already retrieved (23399kB/20ms)
However, writing data fails. The write uses a Spark DataFrame, targeting the StarRocks cluster deployed on Kubernetes:
(df.write.format("starrocks")
.option("starrocks.fe.http.url", "starrockscluster-sample-fe-service.starrocks.svc:8030")
.option("starrocks.fe.jdbc.url", "jdbc:mysql://starrockscluster-sample-fe-service.starrocks.svc:9030")
.option("starrocks.table.identifier", f"feature_aoc_34018_starrocks.{entity_name}")
.option("starrocks.user", "root")
.option("starrocks.password", "")
.mode("append")
.save())
The error message is:
23/07/17 06:10:45 WARN TaskSetManager: Lost task 4.0 in stage 9.0 (TID 122) (10.42.106.251 executor 3): java.lang.ClassNotFoundException: com.starrocks.connector.spark.sql.write.StarRocksWriterFactory
at java.base/java.net.URLClassLoader.findClass(Unknown Source)
at java.base/java.lang.ClassLoader.loadClass(Unknown Source)
at java.base/java.lang.ClassLoader.loadClass(Unknown Source)
at java.base/java.lang.Class.forName0(Native Method)
at java.base/java.lang.Class.forName(Unknown Source)
at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:68)
at java.base/java.io.ObjectInputStream.readNonProxyDesc(Unknown Source)
at java.base/java.io.ObjectInputStream.readClassDesc(Unknown Source)
at java.base/java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
at java.base/java.io.ObjectInputStream.readObject0(Unknown Source)
at java.base/java.io.ObjectInputStream.readArray(Unknown Source)
at java.base/java.io.ObjectInputStream.readObject0(Unknown Source)
at java.base/java.io.ObjectInputStream.defaultReadFields(Unknown Source)
at java.base/java.io.ObjectInputStream.readSerialData(Unknown Source)
at java.base/java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
at java.base/java.io.ObjectInputStream.readObject0(Unknown Source)
at java.base/java.io.ObjectInputStream.defaultReadFields(Unknown Source)
at java.base/java.io.ObjectInputStream.readSerialData(Unknown Source)
at java.base/java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
at java.base/java.io.ObjectInputStream.readObject0(Unknown Source)
at java.base/java.io.ObjectInputStream.readObject(Unknown Source)
at java.base/java.io.ObjectInputStream.readObject(Unknown Source)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:83)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1491)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
23/07/17 06:10:45 INFO TaskSetManager: Lost task 1.0 in stage 9.0 (TID 119) on 10.42.106.251, executor 3: java.lang.ClassNotFoundException (com.starrocks.connector.spark.sql.write.StarRocksWriterFactory) [duplicate 1]
23/07/17 06:10:45 INFO TaskSetManager: Lost task 2.0 in stage 9.0 (TID 120) on 10.42.54.57, executor 2: java.lang.ClassNotFoundException (com.starrocks.connector.spark.sql.write.StarRocksWriterFactory) [duplicate 2]
23/07/17 06:10:45 INFO TaskSetManager: Starting task 2.1 in stage 9.0 (TID 123) (10.42.54.57, executor 2, partition 2, PROCESS_LOCAL, 5070 bytes) taskResourceAssignments Map()