Celebrating the community's first anniversary! Part three! A Spark Connector with read and write support

Love it, great title! :clap:

Could this project be open-sourced? Could you give us a chance to learn from it? :grinning:

I've asked a colleague to put it on GitHub:
https://github.com/gaoy121/starrocks-connector-for-apache-spark


Thank you very much, we'll study it carefully.

Which version of thrift did you use when compiling?

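When I use the utility class to write to a StarRocks table, the program hangs indefinitely: no error is reported, but no data is written either. What could be the cause?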

But it doesn't support Spark 3. Writing via JDBC to port 9030 keeps timing out. Is there a solution?

Spark 3 support has a lot of bugs. You have to call cache and show on the DataFrame before the write works.


Could you upload the Spark 3 jar? Installing thrift on my machine is a bit of a hassle.

Did you also get the following error when running sh build.sh 3 to compile?
`[INFO] --- maven-thrift-plugin:0.1.11:compile (thrift-sources) @ starrocks-spark2_2.11 ---
[ERROR] thrift failed output:
[ERROR] thrift failed error: /bin/sh: thrift: command not found

[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 8.023 s
[INFO] Finished at: 2023-02-14T15:51:33+08:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.thrift.tools:maven-thrift-plugin:0.1.11:compile (thrift-sources) on project starrocks-spark2_2.11: thrift did not exit cleanly. Review output for more information. -> [Help 1]`
Did you eventually solve it? I followed https://zhuanlan.zhihu.com/p/119404869 but could not get it to work.
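
The `/bin/sh: thrift: command not found` line in the log above suggests that maven-thrift-plugin shells out to a `thrift` executable and could not find one on `PATH`. A minimal pre-flight sketch in Python that checks this before starting the build; it makes no assumption about which thrift version the project needs, and the helper names `find_tool` and `describe_tool` are made up for illustration:

```python
import shutil
import subprocess


def find_tool(name):
    """Return the absolute path of `name` if it is on PATH, else None."""
    return shutil.which(name)


def describe_tool(name, version_flag="--version"):
    """Return a one-line status string for a build tool."""
    path = find_tool(name)
    if path is None:
        return f"{name}: not found on PATH"
    try:
        result = subprocess.run(
            [path, version_flag], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"{name}: found at {path}, but could not run it ({exc})"
    # Some tools print their version to stderr instead of stdout.
    lines = (result.stdout or result.stderr).strip().splitlines()
    version = lines[0] if lines else "unknown version"
    return f"{name}: {path} ({version})"


if __name__ == "__main__":
    for tool in ("thrift", "mvn"):
        print(describe_tool(tool))
```

If `thrift` is reported as missing, the Maven build will most likely fail at the same `thrift-sources` step shown in the log.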

I'm getting an error when compiling (see the comment below). Did you get Spark 3 working?

Solved. See the issue on GitHub: https://github.com/gaoy121/starrocks-connector-for-apache-spark/issues/1

After writing data for a while, it reports:
Caused by: com.starrocks.connector.spark.exception.StreamLoadException: stream load error: Too many versions. tablet_id: 339236, version_count: 1018, limit: 1000:
How is this version count determined?
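
StarRocks typically raises `Too many versions` when many small loads hit the same tablet faster than compaction can merge them, since each committed load adds a version. One common mitigation is to write fewer, larger batches. The sketch below is plain Python and not part of the connector: `flush` stands in for whatever actually performs the load, and the thresholds are arbitrary assumptions.

```python
import time


class BatchBuffer:
    """Collect rows and hand them to `flush` in large batches."""

    def __init__(self, flush, max_rows=50_000, max_seconds=10.0, clock=time.monotonic):
        self._flush = flush              # callable that receives a list of rows
        self._max_rows = max_rows        # flush once this many rows are buffered
        self._max_seconds = max_seconds  # or once the oldest row is this old
        self._clock = clock
        self._rows = []
        self._started = None

    def add(self, row):
        if not self._rows:
            self._started = self._clock()
        self._rows.append(row)
        too_many = len(self._rows) >= self._max_rows
        if too_many or self._clock() - self._started >= self._max_seconds:
            self.flush_pending()

    def flush_pending(self):
        """Send whatever is buffered; call this once at the end of the job."""
        if self._rows:
            batch, self._rows = self._rows, []
            self._flush(batch)


if __name__ == "__main__":
    sent = []
    buf = BatchBuffer(sent.append, max_rows=3, max_seconds=60.0)
    for i in range(7):
        buf.add(i)
    buf.flush_pending()
    print([len(batch) for batch in sent])  # [3, 3, 1]: three loads instead of seven
```

Larger batches mean fewer versions per tablet, at the cost of higher latency before data becomes visible.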

Has this been solved? I've run into the same problem. How do I fix it?

After submitting to the cluster I get java.lang.ClassNotFoundException: Failed to find data source: starrocks. Please find packages at http://spark.apache.org/third-party-projects.html, even though the Spark connector is bundled into the jar. Any help would be appreciated.
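
Spark resolves a short name such as `starrocks` through Java's `ServiceLoader`, reading `META-INF/services/org.apache.spark.sql.sources.DataSourceRegister` entries from the jars on the classpath. When several jars are merged into one fat jar, that file can be overwritten by another dependency's copy, which is one possible cause of `Failed to find data source` even when the connector classes are bundled. A small Python sketch to inspect a jar, assuming the connector registers itself this way (the helper name `registered_data_sources` is hypothetical):

```python
import zipfile

SERVICE_FILE = "META-INF/services/org.apache.spark.sql.sources.DataSourceRegister"


def registered_data_sources(jar_path):
    """Return the provider class names listed in the jar's DataSourceRegister file."""
    with zipfile.ZipFile(jar_path) as jar:
        if SERVICE_FILE not in jar.namelist():
            return []
        text = jar.read(SERVICE_FILE).decode("utf-8")
    providers = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # service files allow '#' comments
        if line:
            providers.append(line)
    return providers
```

Calling `registered_data_sources` on the application jar and finding no StarRocks class in the result would point to the service file having been dropped or overwritten during packaging. With maven-shade-plugin, the `ServicesResourceTransformer` is the usual way to merge such files.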

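The earlier code is no longer maintained. I rebuilt a connector jar from the code in the official Git repository and put together a small demo. Please give it a try: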
starrocks-spark-connector-3.2_2.12-1.1.1-SNAPSHOT.jar (1.9 MB)
starrocks-spark-demo.zip (4.8 KB)

23/07/13 15:11:28 WARN BackendClient: Get next from StarRocks BE{host='xxxx', port=9060} failed.
com.starrocks.shaded.org.apache.thrift.transport.TTransportException: MaxMessageSize reached

com.starrocks.shaded.org.apache.thrift.protocol.TProtocolException: Bad version in readMessageBegin

I'm using [starrocks-spark-connector-3.2_2.12-1.1.1-SNAPSHOT.jar] to read data from SR. Sometimes it succeeds and sometimes it fails. The error messages are shown above.

Upload the packaged starrocks-spark-connector-3.2_2.12-1.1.1-SNAPSHOT.jar to the jars directory of the Spark cluster; otherwise you get this error:

Exception in thread "main" java.lang.ClassNotFoundException:
Failed to find data source: starrocks. Please find packages at
http://spark.apache.org/third-party-projects.html
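
Besides copying the jar into the cluster's jars directory, `spark-submit` accepts a `--jars` option that ships extra jars with the job. A hedged Python sketch that assembles such a command; the application jar, main class, and paths are placeholders, not values from this thread:

```python
import shlex


def build_submit_command(app_jar, main_class, extra_jars, master="yarn"):
    """Build a spark-submit argument list that ships extra jars with the job."""
    command = ["spark-submit", "--master", master, "--class", main_class]
    if extra_jars:
        # --jars takes a comma-separated list of jar paths
        command += ["--jars", ",".join(extra_jars)]
    command.append(app_jar)
    return command


if __name__ == "__main__":
    cmd = build_submit_command(
        app_jar="my-etl-job.jar",         # hypothetical application jar
        main_class="com.example.EtlJob",  # hypothetical main class
        extra_jars=["/path/to/starrocks-spark-connector-3.2_2.12-1.1.1-SNAPSHOT.jar"],
    )
    print(shlex.join(cmd))
```

Passing the connector with `--jars` keeps it out of the shared jars directory, which can be convenient when different jobs need different connector versions.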

Hi, have you solved this on your side? We've run into the same problem.

Failed to find data source: starrocks. Please find packages at
Has this problem been solved? I tested repeatedly today and it still fails with the same error. My Maven configuration is as follows:

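<dependency>
    <groupId>com.starrocks</groupId>
    <artifactId>starrocks-spark-connector-3.2_2.12</artifactId>
    <version>1.1.1-SNAPSHOT</version>
    <scope>system</scope>
    <systemPath>/Users/guanpeng/Documents/upWorkspace/pe.data-spark-etl/src/main/resources/lib/starrocks-spark-connector-3.2_2.12-1.1.1-SNAPSHOT.jar</systemPath>
</dependency>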