BE node crashes when writing to a primary key table with the Spark connector

To help us locate your issue faster, please provide the following information. Thanks.
【Details】Writing data to StarRocks with the Spark connector
【Background】What operations were performed?
【Business impact】
【Shared-data (storage-compute separated)?】Yes
【StarRocks version】3.2.1
【Cluster size】1 FE + 1 BE (FE and BE co-located)
【Machine spec】8C/64G
Client log
WARN [task-result-getter-3] TaskSetManager: Lost task 3.0 in stage 2.0 (TID 63) (emr-worker-1.cluster-325063 executor 2): java.io.IOException: Failed to commit, partitionId: 3, taskId: 63, epochId: -1
at com.starrocks.connector.spark.sql.write.StarRocksDataWriter.commit(StarRocksDataWriter.java:96)
at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.$anonfun$run$1(WriteToDataSourceV2Exec.scala:430)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1496)
at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:457)
at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.$anonfun$writeWithV2$2(WriteToDataSourceV2Exec.scala:358)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
Caused by: java.lang.RuntimeException: com.starrocks.data.load.stream.exception.StreamLoadFailException: Stream load failed because of unknown exception, db: rta, table: ads_user_center_with_qyid_new, label: spark-792fad0e-336b-4eb1-b4ad-8e59b7285de2

BE warning log

W0115 10:55:12.507268 29931 stream_load_executor.cpp:187] begin transaction failed, errmsg=Label [spark-792fad0e-336b-4eb1-b4ad-8e59b7285de2] has already been used.id=8d4dc61029805d0e-0e1251182a0ff393, job_id=-1, txn_id: -1, label=spark-792fad0e-336b-4eb1-b4ad-8e59b7285de2, db=rta
W0115 10:55:12.508262 29930 stream_load_executor.cpp:187] begin transaction failed, errmsg=Label [spark-a546a53d-cc33-476e-b54b-f23dc9690478] has already been used.id=d64b8305992c5ce6-0206f720e9d3e292, job_id=-1, txn_id: -1, label=spark-a546a53d-cc33-476e-b54b-f23dc9690478, db=rta
W0115 10:55:12.537443 29894 stream_load_executor.cpp:187] begin transaction failed, errmsg=Label [spark-f39a7512-ac3b-4788-b253-573c4877f8f3] has already been used.id=6644333f98055fbc-8611cec35a96e68b, job_id=-1, txn_id: -1, label=spark-f39a7512-ac3b-4788-b253-573c4877f8f3, db=rta
W0115 10:55:12.544826 29907 stream_load_executor.cpp:187] begin transaction failed, errmsg=Label [spark-f88ac2ed-0cc6-4816-960b-4b48c5a94b87] has already been used.id=34471df15572c1c6-4bf2dae31a67f1a5, job_id=-1, txn_id: -1, label=spark-f88ac2ed-0cc6-4816-960b-4b48c5a94b87, db=rta
W0115 10:55:12.563730 29929 stream_load_executor.cpp:187] begin transaction failed, errmsg=Label [spark-ee2b2415-25ce-49ba-9287-309d37d911fd] has already been used.id=d0493f85869a5c1a-1176148b965e3bb1, job_id=-1, txn_id: -1, label=spark-ee2b2415-25ce-49ba-9287-309d37d911fd, db=rta
W0115 10:55:22.526269 29924 stream_load_executor.cpp:187] begin transaction failed, errmsg=Label [spark-792fad0e-336b-4eb1-b4ad-8e59b7285de2] has already been used.id=a54a3a8dd4c91e97-101165b057a35095, job_id=-1, txn_id: -1, label=spark-792fad0e-336b-4eb1-b4ad-8e59b7285de2, db=rta

be.out log

WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/StarRocks/be/lib/jni-packages/starrocks-jdbc-bridge-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/StarRocks/be/lib/hadoop/common/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
start time: Mon Jan 15 11:17:01 CST 2024
start time: Mon Jan 15 11:26:01 CST 2024

Run dmesg -T and check whether there was an OOM at the corresponding time.

[Mon Jan 15 11:16:15 2024] publish_version invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
[Mon Jan 15 11:16:15 2024] [] ? virtballoon_oom_notify+0x2a/0x80 [virtio_balloon]
[Mon Jan 15 11:16:15 2024] [] oom_kill_process+0x2d5/0x4a0
[Mon Jan 15 11:16:15 2024] [] ? oom_unkillable_task+0xcd/0x120
[Mon Jan 15 11:16:15 2024] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
[Mon Jan 15 11:16:15 2024] /usr/local/clou invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
[Mon Jan 15 11:16:15 2024] [] ? virtballoon_oom_notify+0x2a/0x80 [virtio_balloon]
[Mon Jan 15 11:16:15 2024] [] oom_kill_process+0x2d5/0x4a0
[Mon Jan 15 11:16:15 2024] [] ? oom_unkillable_task+0xcd/0x120
[Mon Jan 15 11:16:15 2024] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
[Mon Jan 15 11:16:15 2024] pull-hive-remot invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
[Mon Jan 15 11:16:15 2024] [] ? virtballoon_oom_notify+0x2a/0x80 [virtio_balloon]
[Mon Jan 15 11:16:15 2024] [] oom_kill_process+0x2d5/0x4a0
[Mon Jan 15 11:16:15 2024] [] ? oom_unkillable_task+0xcd/0x120
[Mon Jan 15 11:16:15 2024] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
[Mon Jan 15 11:16:16 2024] Source Data Fet invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
[Mon Jan 15 11:16:16 2024] [] ? virtballoon_oom_notify+0x2a/0x80 [virtio_balloon]
[Mon Jan 15 11:16:16 2024] [] oom_kill_process+0x2d5/0x4a0
[Mon Jan 15 11:16:16 2024] [] ? oom_unkillable_task+0xcd/0x120
[Mon Jan 15 11:16:16 2024] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
[Mon Jan 15 11:16:17 2024] VM Periodic Tas invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
[Mon Jan 15 11:16:17 2024] [] ? virtballoon_oom_notify+0x2a/0x80 [virtio_balloon]
[Mon Jan 15 11:16:17 2024] [] oom_kill_process+0x2d5/0x4a0
[Mon Jan 15 11:16:17 2024] [] ? oom_unkillable_task+0xcd/0x120
[Mon Jan 15 11:16:17 2024] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
[Mon Jan 15 11:25:10 2024] query_ctx_clr invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
[Mon Jan 15 11:25:10 2024] [] oom_kill_process+0x2d5/0x4a0
[Mon Jan 15 11:25:10 2024] [] ? oom_unkillable_task+0xcd/0x120
[Mon Jan 15 11:25:10 2024] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
[Mon Jan 15 11:25:11 2024] /usr/local/clou invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
[Mon Jan 15 11:25:11 2024] [] oom_kill_process+0x2d5/0x4a0
[Mon Jan 15 11:25:11 2024] [] ? oom_unkillable_task+0xcd/0x120
[Mon Jan 15 11:25:11 2024] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name

[Mon Jan 15 11:16:15 2024] publish_version cpuset=/ mems_allowed=0
[Mon Jan 15 11:16:15 2024] CPU: 7 PID: 29801 Comm: publish_version Kdump: loaded Tainted: G OE ------------ T 3.10.0-1160.90.1.el7.x86_64 #1
[Mon Jan 15 11:16:15 2024] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 449e491 04/01/2014
[Mon Jan 15 11:16:15 2024] Call Trace:
[Mon Jan 15 11:16:15 2024] [] dump_stack+0x19/0x1f
[Mon Jan 15 11:16:15 2024] [] dump_header+0x90/0x22d
[Mon Jan 15 11:16:15 2024] [] ? ktime_get_ts64+0x52/0xf0
[Mon Jan 15 11:16:15 2024] [] ? delayacct_end+0x8f/0xc0
[Mon Jan 15 11:16:15 2024] [] ? virtballoon_oom_notify+0x2a/0x80 [virtio_balloon]
[Mon Jan 15 11:16:15 2024] [] oom_kill_process+0x2d5/0x4a0
[Mon Jan 15 11:16:15 2024] [] ? oom_unkillable_task+0xcd/0x120
[Mon Jan 15 11:16:15 2024] [] out_of_memory+0x31a/0x500
[Mon Jan 15 11:16:15 2024] [] __alloc_pages_nodemask+0xae4/0xbf0
[Mon Jan 15 11:16:15 2024] [] alloc_pages_current+0x98/0x110
[Mon Jan 15 11:16:15 2024] [] __page_cache_alloc+0x97/0xb0
[Mon Jan 15 11:16:15 2024] [] filemap_fault+0x270/0x420
[Mon Jan 15 11:16:15 2024] [] ext4_filemap_fault+0x36/0x50 [ext4]
[Mon Jan 15 11:16:15 2024] [] __do_fault.isra.61+0x8a/0x100
[Mon Jan 15 11:16:15 2024] [] do_read_fault.isra.63+0x4c/0x1b0
[Mon Jan 15 11:16:15 2024] [] handle_mm_fault+0xa33/0x1190
[Mon Jan 15 11:16:15 2024] [] __do_page_fault+0x213/0x510
[Mon Jan 15 11:16:15 2024] [] trace_do_page_fault+0x56/0x150
[Mon Jan 15 11:16:15 2024] [] do_async_page_fault+0x22/0x100
[Mon Jan 15 11:16:15 2024] [] async_page_fault+0x28/0x30
[Mon Jan 15 11:16:15 2024] Mem-Info:
[Mon Jan 15 11:16:15 2024] active_anon:15707811 inactive_anon:242 isolated_anon:0
active_file:1955 inactive_file:2508 isolated_file:170
unevictable:0 dirty:57 writeback:0 unstable:0
slab_reclaimable:120422 slab_unreclaimable:11568
mapped:552 shmem:324 pagetables:43154 bounce:0
free:81421 free_pcp:292 free_cma:0
[Mon Jan 15 11:16:15 2024] Node 0 DMA free:15908kB min:16kB low:20kB high:24kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[Mon Jan 15 11:16:15 2024] lowmem_reserve[]: 0 2810 63221 63221
[Mon Jan 15 11:16:15 2024] Node 0 DMA32 free:244628kB min:3000kB low:3748kB high:4500kB active_anon:2475172kB inactive_anon:24kB active_file:476kB inactive_file:872kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3111576kB managed:2878072kB mlocked:0kB dirty:8kB writeback:0kB mapped:272kB shmem:36kB slab_reclaimable:133836kB slab_unreclaimable:3924kB kernel_stack:912kB pagetables:5848kB unstable:0kB bounce:0kB free_pcp:120kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:195 all_unreclaimable? no
[Mon Jan 15 11:16:15 2024] lowmem_reserve[]: 0 0 60410 60410
[Mon Jan 15 11:16:15 2024] Node 0 Normal free:64504kB min:64560kB low:80700kB high:96840kB active_anon:60356072kB inactive_anon:944kB active_file:8640kB inactive_file:9728kB unevictable:0kB isolated(anon):0kB isolated(file):680kB present:62914560kB managed:61863688kB mlocked:0kB dirty:220kB writeback:0kB mapped:1936kB shmem:1260kB slab_reclaimable:347852kB slab_unreclaimable:42348kB kernel_stack:26656kB pagetables:166768kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:4059 all_unreclaimable? no
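The dmesg -T check suggested above can be scripted as below. The two sample lines are copied from this thread and written to a temp file so the filter is reproducible offline; on the BE host you would pipe the live dmesg output instead.

```shell
# Filter kernel OOM events; -T gives human-readable timestamps.
# Two sample lines from this thread stand in for live `dmesg -T` output.
cat <<'EOF' > /tmp/dmesg_sample.txt
[Mon Jan 15 11:16:15 2024] publish_version invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
[Tue Jan 16 15:26:20 2024] Out of memory: Kill process 16215 (starrocks_be) score 629 or sacrifice child
EOF

# On the BE host, run the same filter on the real kernel log:
#   dmesg -T | grep -Ei 'oom-killer|Out of memory'
grep -Ei 'oom-killer|Out of memory' /tmp/dmesg_sample.txt
```

A "Kill process ... (starrocks_be)" line confirms it was the BE process the kernel sacrificed, not just a BE thread triggering the killer.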

Do you have monitoring of the machine's memory usage? publish_version invoked oom-killer

No, only Alibaba Cloud's graphical monitoring. What is certain is that the OOM did happen.
[Tue Jan 16 15:26:20 2024] Out of memory: Kill process 16215 (starrocks_be) score 629 or sacrifice child

[Tue Jan 16 15:26:20 2024] StarRocks-Sink- invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0

[Tue Jan 16 15:26:20 2024] StarRocks-Sink- cpuset=/ mems_allowed=0
[Tue Jan 16 15:26:20 2024] CPU: 6 PID: 13881 Comm: StarRocks-Sink- Kdump: loaded Tainted: G OE ------------ T 3.10.0-1160.90.1.el7.x86_64 #1
[Tue Jan 16 15:26:20 2024] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 449e491 04/01/2014
[Tue Jan 16 15:26:20 2024] Call Trace:
[Tue Jan 16 15:26:20 2024] [] dump_stack+0x19/0x1f
[Tue Jan 16 15:26:20 2024] [] dump_header+0x90/0x22d
[Tue Jan 16 15:26:20 2024] [] ? ktime_get_ts64+0x52/0xf0
[Tue Jan 16 15:26:20 2024] [] ? delayacct_end+0x8f/0xc0
[Tue Jan 16 15:26:20 2024] [] ? virtballoon_oom_notify+0x2a/0x80 [virtio_balloon]
[Tue Jan 16 15:26:20 2024] [] oom_kill_process+0x2d5/0x4a0
[Tue Jan 16 15:26:20 2024] [] ? oom_unkillable_task+0xcd/0x120
[Tue Jan 16 15:26:20 2024] [] out_of_memory+0x31a/0x500
[Tue Jan 16 15:26:20 2024] [] __alloc_pages_nodemask+0xae4/0xbf0
[Tue Jan 16 15:26:20 2024] [] alloc_pages_current+0x98/0x110
[Tue Jan 16 15:26:20 2024] [] __page_cache_alloc+0x97/0xb0
[Tue Jan 16 15:26:20 2024] [] filemap_fault+0x270/0x420
[Tue Jan 16 15:26:20 2024] [] ext4_filemap_fault+0x36/0x50 [ext4]
[Tue Jan 16 15:26:20 2024] [] __do_fault.isra.61+0x8a/0x100
[Tue Jan 16 15:26:20 2024] [] do_read_fault.isra.63+0x4c/0x1b0
[Tue Jan 16 15:26:20 2024] [] handle_mm_fault+0xa33/0x1190
[Tue Jan 16 15:26:20 2024] [] __do_page_fault+0x213/0x510
[Tue Jan 16 15:26:20 2024] [] trace_do_page_fault+0x56/0x150
[Tue Jan 16 15:26:20 2024] [] do_async_page_fault+0x22/0x100
[Tue Jan 16 15:26:20 2024] [] async_page_fault+0x28/0x30
[Tue Jan 16 15:26:20 2024] Mem-Info:
[Tue Jan 16 15:26:20 2024] active_anon:15626930 inactive_anon:248 isolated_anon:0

This BE crash was caused by INSERT INTO; it has nothing to do with Spark.

Right now, once the volume of data written to the BE gets large, it is far too easy to crash the BE.

https://forum.mirrorship.cn/t/topic/10600 — they say this is being optimized.

For now you can limit the BE's memory usage to avoid the OOM.
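The BE process limit is set via mem_limit in be.conf (restart the BE to apply). On a co-located 8C/64G FE+BE host the cap should leave headroom for the FE and the OS; the 70% below is an illustrative value, not a tested recommendation:

```
# be.conf — cap BE process memory as a share of machine memory.
# 70% is an illustrative choice for a mixed FE+BE 64 GB host;
# tune it so the FE and OS keep enough headroom. Restart the BE to apply.
mem_limit = 70%
```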

Judging from many recent issues, BE memory usage keeps trending upward. Shouldn't this area be optimized?

Check the BE memory-related monitoring, or search the BE log for "Current memory statistics" to see which module was using the most memory before the OOM.
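The log search above can be done with grep. The sample line written below is a hypothetical placeholder for a real "Current memory statistics" entry — the actual field list and format vary by BE version — so the command is reproducible offline:

```shell
# Hypothetical stand-in for a "Current memory statistics" entry in be.INFO;
# the real field names and layout may differ by BE version.
cat <<'EOF' > /tmp/be_info_sample.txt
I0115 11:15:59.000000 12345 daemon.cpp:100] Current memory statistics: process(60G), update(45G), load(2G), metadata(3G)
EOF

# Against a live BE, search the real log instead, e.g.:
#   grep 'Current memory statistics' /opt/StarRocks/be/log/be.INFO | tail -n 5
grep -o 'Current memory statistics.*' /tmp/be_info_sample.txt
```

Comparing the per-module figures in the entries just before the crash shows which consumer (update, load, metadata, ...) was growing.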

update takes up the most. I have already limited the BE's memory, but it still seems to OOM.

Part of it was that we forgot to enable persistent index for the primary keys of the primary key table.
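For reference, persistent primary key index is enabled per table so the PK index is kept on disk rather than held fully in memory. The statement below targets the table from this thread; the MySQL connection parameters are placeholders:

```shell
# enable_persistent_index keeps the primary key index on disk instead of
# fully in memory, reducing BE memory pressure on primary key tables.
SQL="ALTER TABLE rta.ads_user_center_with_qyid_new SET ('enable_persistent_index' = 'true');"

# On a live cluster, run it through the MySQL client (placeholders):
#   mysql -h <fe_host> -P 9030 -u root -e "$SQL"
echo "$SQL"
```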