StarRocks 3.0 Stream Load fails with "Status": "Fail", "Message": "cancel"

To help us locate your problem faster, please provide the following information. Thank you.
[Details] Detailed problem description
curl --location-trusted -u root -T /data1/software/tpcds1000x/store_sales_ext.dat -H "column_separator:|" -H "columns:ss_sold_date_sk,ss_sold_time_sk,ss_item_sk,ss_customer_sk,ss_cdemo_sk,ss_hdemo_sk,ss_addr_sk,ss_store_sk,ss_promo_sk,ss_ticket_number,ss_quantity,ss_wholesale_cost,ss_list_price,ss_sales_price,ss_ext_discount_amt,ss_ext_sales_price,ss_ext_wholesale_cost,ss_ext_list_price,ss_ext_tax,ss_coupon_amt,ss_net_paid,ss_net_paid_inc_tax,ss_net_profit" http://127.0.0.1:8030/api/tpcds1000x/store_sales_ext/_stream_load
Enter host password for user 'root':
{
"TxnId": 1136,
"Label": "bf0ee091-152c-4a01-872b-d54ba81ce3b0",
"Status": "Fail",
"Message": "cancel",
"NumberTotalRows": 0,
"NumberLoadedRows": 0,
"NumberFilteredRows": 0,
"NumberUnselectedRows": 0,
"LoadBytes": 36456763392,
"LoadTimeMs": 1574081,
"BeginTxnTimeMs": 3,
"StreamLoadPlanTimeMs": 11,
"ReadDataTimeMs": 566592,
"WriteDataTimeMs": 601421,
"CommitAndPublishTimeMs": 0
}
real 26m15.167s
user 0m45.578s
sys 9m13.597s
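[Background] What operations were performed?
Increased streaming_load_max_mb to 450000 in be.conf
[Business impact] Data cannot be loaded, which blocks subsequent work
[StarRocks version] 3.0.0
[Cluster size] e.g. 3 FE (1 leader + 1 follower + 1 observer) + 3 BE (FE and BE co-located)
[Machine info] CPU vCores / memory / NIC: 16C / 64G / 10 GbE
[Table model] Duplicate Key model
[Import or export method] stream_load
[Contact] rfgnet@163.com
[Attachments]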

  • fe.log / be.INFO / relevant screenshots
    fe.log
    2023-09-05 10:21:58,300 INFO (replayer|79) [GlobalStateMgr.replayJournalInner():2044] replayed journal from 443172 - 443173
    2023-09-05 10:22:00,699 WARN (tablet stat mgr|34) [TabletStatMgr.updateLocalTabletStat():149] task exec error. backend[10021]
    org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused (Connection refused)
    at org.apache.thrift.transport.TSocket.open(TSocket.java:226) ~[libthrift-0.13.0.jar:0.13.0]
    at com.starrocks.common.GenericPool$ThriftClientFactory.create(GenericPool.java:144) ~[starrocks-fe.jar:?]
    at com.starrocks.common.GenericPool$ThriftClientFactory.create(GenericPool.java:129) ~[starrocks-fe.jar:?]
    at org.apache.commons.pool2.BaseKeyedPooledObjectFactory.makeObject(BaseKeyedPooledObjectFactory.java:62) ~[commons-pool2-2.3.jar:2.3]
    at org.apache.commons.pool2.impl.GenericKeyedObjectPool.create(GenericKeyedObjectPool.java:1036) ~[commons-pool2-2.3.jar:2.3]
    at org.apache.commons.pool2.impl.GenericKeyedObjectPool.borrowObject(GenericKeyedObjectPool.java:356) ~[commons-pool2-2.3.jar:2.3]
    at org.apache.commons.pool2.impl.GenericKeyedObjectPool.borrowObject(GenericKeyedObjectPool.java:278) ~[commons-pool2-2.3.jar:2.3]
    at com.starrocks.common.GenericPool.borrowObject(GenericPool.java:101) ~[starrocks-fe.jar:?]
    at com.starrocks.catalog.TabletStatMgr.updateLocalTabletStat(TabletStatMgr.java:141) [starrocks-fe.jar:?]
    at com.starrocks.catalog.TabletStatMgr.runAfterCatalogReady(TabletStatMgr.java:90) [starrocks-fe.jar:?]
    at com.starrocks.common.util.LeaderDaemon.runOneCycle(LeaderDaemon.java:73) [starrocks-fe.jar:?]
    at com.starrocks.common.util.Daemon.run(Daemon.java:115) [starrocks-fe.jar:?]
    Caused by: java.net.ConnectException: Connection refused (Connection refused)
    at java.net.PlainSocketImpl.socketConnect(Native Method) ~[?:1.8.0_191]
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) ~[?:1.8.0_191]
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) ~[?:1.8.0_191]
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) ~[?:1.8.0_191]
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) ~[?:1.8.0_191]
    at java.net.Socket.connect(Socket.java:589) ~[?:1.8.0_191]
    at org.apache.thrift.transport.TSocket.open(TSocket.java:221) ~[libthrift-0.13.0.jar:0.13.0]
    … 11 more
    2023-09-05 10:22:00,703 INFO (tablet stat mgr|34) [TabletStatMgr.updateLocalTabletStat():158] finished to get local tablet stat of all backends. cost: 10 ms
    2023-09-05 10:22:00,703 INFO (tablet stat mgr|34) [TabletStatMgr.runAfterCatalogReady():126] finished to update index row num of all databases. cost: 0 ms
    2023-09-05 10:22:03,302 INFO (nioEventLoopGroup-4-1|92) [RestBaseAction.handleRequest():70] receive http request. url=/api/bootstrap?cluster_id=608598752&token=5c91b4f0-6b3c-4935-b030-f235525a810e
    2023-09-05 10:22:03,308 INFO (replayer|79) [GlobalStateMgr.replayJournalInner():2044] replayed journal from 443173 - 443174
    2023-09-05 10:22:07,667 INFO (replayer|79) [GlobalStateMgr.replayJournalInner():2044] replayed journal from 443174 - 443175

be.INFO
I0905 10:32:54.427083 6590 heartbeat_server.cpp:76] get heartbeat from FE.host:192.168.239.241, port:9020, cluster id:608598752, counter:121
I0905 10:32:59.531713 6512 starlet.cc:83] Empty starmanager address, skip reporting!
I0905 10:33:03.012465 5806 daemon.cpp:201] Current memory statistics: process(155427480), query_pool(17503032), load(131342960), metadata(119785), compaction(0), schema_change(0), column_pool(0), page_cache(0), update(0), chunk_allocator(0), clone(0), consistency(0)
I0905 10:33:09.541242 6512 starlet.cc:83] Empty starmanager address, skip reporting!
I0905 10:33:18.015282 5806 daemon.cpp:201] Current memory statistics: process(59941096), query_pool(18205576), load(34023536), metadata(119785), compaction(0), schema_change(0), column_pool(0), page_cache(0), update(0), chunk_allocator(0), clone(0), consistency(0)
I0905 10:33:19.552177 6512 starlet.cc:83] Empty starmanager address, skip reporting!
I0905 10:33:29.569106 6512 starlet.cc:83] Empty starmanager address, skip reporting!
I0905 10:33:33.018689 5806 daemon.cpp:201] Current memory statistics: process(264877512), query_pool(18223832), load(239480688), metadata(119785), compaction(0), schema_change(0), column_pool(0), page_cache(0), update(0), chunk_allocator(0), clone(0), consistency(0)
I0905 10:33:39.583196 6512 starlet.cc:83] Empty starmanager address, skip reporting!
I0905 10:33:42.013247 5942 plan_fragment_executor.cpp:364] cancel(): fragment_instance_id=7f401e0a-be27-8876-923c-1cb1908dce9f
I0905 10:33:42.013649 5942 fragment_mgr.cpp:543] FragmentMgr cancel worker going to cancel timeout fragment 7f401e0a-be27-8876-923c-1cb1908dce9f
W0905 10:33:42.013741 6007 tablet_sink.cpp:1539] NodeChannel[10020], tablet add chunk failed, load_id=7f401e0a-be27-8876-923c-1cb1908dce9e, txn_id: 1136, parallel=1, compress_type=2, node=192.168.239.124:8060, errmsg=cancel
I0905 10:33:42.017438 6408 local_tablets_channel.cpp:557] cancel LocalTabletsChannel txn_id: 1136 load_id: 7f401e0abe278876-923c1cb1908dce9e index_id: 11181 #tablet:2 tablet_ids:11192,11186
W0905 10:33:42.779381 6007 fragment_mgr.cpp:199] Fail to open fragment 7f401e0a-be27-8876-923c-1cb1908dce9f: Cancelled: cancel
/build/starrocks/be/src/exec/tablet_sink.cpp:1469 _send_chunk_by_node(chunk, _channels[i].get(), _validate_select_idx)
/build/starrocks/be/src/runtime/plan_fragment_executor.cpp:249 _sink->send_chunk(runtime_state(), chunk.get())
I0905 10:33:42.779985 6007 plan_fragment_executor.cpp:492] Fragment 7f401e0a-be27-8876-923c-1cb1908dce9f:(Active: 10m, non-child: 0.44%)

  • InstanceAllocatedMemoryUsage: 117.78 GB
  • InstanceDeallocatedMemoryUsage: 75.89 GB
  • InstancePeakMemoryUsage: 21.49 MB
  • MemoryLimit: -1.00 B
  • RowsProduced: 254.33M
    OlapTableSink:(Active: 5m10s, non-child: 51.73%)
    • TxnID: 1136
    • IndexNum: 1
    • ReplicatedStorage: true
    • AutomaticPartition: false
    • AllocAutoIncrementTime: 25.360ms
    • CloseWaitTime: 0.000ns
    • OpenTime: 4.188ms
    • PrepareDataTime: 39s755ms
      • ConvertChunkTime: 190.440ms
      • ValidateDataTime: 31s906ms
    • RowsFiltered: 0
    • RowsRead: 254.33M
    • RowsReturned: 254.33M
    • RpcClientSideTime: 5m35s
    • RpcServerSideTime: 0.000ns
    • SendDataTime: 4m30s
      • PackChunkTime: 1m44s
      • SendRpcTime: 0.000ns
        • CompressTime: 0.000ns
        • SerializeChunkTime: 0.000ns
      • WaitResponseTime: 2m7s
        FILE_SCAN_NODE (id=0):(Active: 4m47s, non-child: 47.83%)
    • BytesRead: 0
    • NumDiskAccess: 0
    • PeakMemoryUsage: 0
    • RowsRead: 0
    • RowsReturned: 254.33M
    • RowsReturnedRate: 885.53 K/sec
    • ScanTime: 8m20s
    • ScannerQueueCounter: 767
    • ScannerQueueTime: 10m
    • ScannerThreadsInvoluntaryContextSwitches: 0
    • ScannerThreadsTotalWallClockTime: 0.000ns
      • MaterializeTupleTime(*): 0.000ns
      • ScannerThreadsSysTime: 0.000ns
      • ScannerThreadsUserTime: 0.000ns
    • ScannerThreadsVoluntaryContextSwitches: 0
    • ScannerTotalTime: 0.000ns
    • TotalRawReadTime(*): 0.000ns
    • TotalReadThroughput: 0.00 /sec
      FileScanner:
      • CastChunkTime: 0.000ns
      • CreateChunkTime: 0.000ns
      • FillTime: 0.000ns
      • MaterializeTime: 0.000ns
      • ReadTime: 0.000ns
        FilePRead:
        • FileReadTime: 0.000ns
W0905 10:33:42.780006 6007 stream_load_executor.cpp:100] fragment execute failed, query_id=7f401e0abe278876-923c1cb1908dce9e, err_msg=cancel, id=7f401e0abe278876-923c1cb1908dce9e, job_id=-1, txn_id: 1136, label=bf0ee091-152c-4a01-872b-d54ba81ce3b0, db=tpcds1000x
W0905 10:33:42.780122 6582 stream_load.cpp:353] append body content failed. errmsg=Cancelled: cancel
/build/starrocks/be/src/exec/tablet_sink.cpp:1469 _send_chunk_by_node(chunk, _channels[i].get(), _validate_select_idx)
/build/starrocks/be/src/runtime/plan_fragment_executor.cpp:249 _sink->send_chunk(runtime_state(), chunk.get()) context=id=7f401e0abe278876-923c1cb1908dce9e, job_id=-1, txn_id: 1136, label=bf0ee091-152c-4a01-872b-d54ba81ce3b0, db=tpcds1000x
I0905 10:33:44.508596 6590 heartbeat_server.cpp:93] Updating master info: TMasterInfo(network_address=TNetworkAddress(hostname=192.168.239.241, port=9020), cluster_id=608598752, epoch=2, token=5c91b4f0-6b3c-4935-b030-f235525a810e, backend_ip=192.168.239.138, http_port=8030, heartbeat_flags=0, backend_id=10021, min_active_txn_id=1137)
I0905 10:33:48.022119 5806 daemon.cpp:201] Current memory statistics: process(15787928), query_pool(15096696), load(0), metadata(119785), compaction(0), schema_change(0), column_pool(0), page_cache(0), update(0), chunk_allocator(0), clone(0), consistency(0)
I0905 10:33:48.415796 6412 load_channel_mgr.cpp:239] Memory consumption(bytes) limit=16374613708 current=0 peak=442638224
I0905 10:33:49.607372 6512 starlet.cc:83] Empty starmanager address, skip reporting!
I0905 10:33:53.438124 6479 tablet_manager.cpp:834] Report all 60 tablets info
  • Full error stack trace

Sorry, we missed your post. In the future, please bump the thread once or twice so that we see it.


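See https://docs.starrocks.io/zh-cn/main/loading/StreamLoad and increase the timeout. The be.INFO excerpt shows "FragmentMgr cancel worker going to cancel timeout fragment" after the fragment had been active for 10m, so the load was cancelled because it hit its timeout.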
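
As a rough sketch of what raising the timeout could look like: Stream Load accepts a per-request "timeout" header in seconds, and the 10m cancellation in the log is consistent with a 600-second default, though the default for your version should be checked in the docs. The script below estimates a value from the LoadBytes figure in the failed response. The 20 MB/s throughput, the 2x safety margin, the estimate_timeout helper, and the RUN_LOAD switch are all assumptions made for illustration, not values from this thread.

```shell
#!/bin/sh
# Hypothetical helper: estimate a Stream Load timeout in seconds.
# $1 = file size in bytes, $2 = assumed sustained throughput in bytes/second.
# The result is doubled as a safety margin (an arbitrary choice for this sketch).
estimate_timeout() {
  echo $(( $1 / $2 * 2 ))
}

# 36456763392 is LoadBytes from the failed response above.
# 20000000 (20 MB/s) is an assumed throughput; measure your own environment.
TIMEOUT=$(estimate_timeout 36456763392 20000000)

# Column list copied from the original curl command.
SS_COLUMNS="ss_sold_date_sk,ss_sold_time_sk,ss_item_sk,ss_customer_sk,ss_cdemo_sk,ss_hdemo_sk,ss_addr_sk,ss_store_sk,ss_promo_sk,ss_ticket_number,ss_quantity,ss_wholesale_cost,ss_list_price,ss_sales_price,ss_ext_discount_amt,ss_ext_sales_price,ss_ext_wholesale_cost,ss_ext_list_price,ss_ext_tax,ss_coupon_amt,ss_net_paid,ss_net_paid_inc_tax,ss_net_profit"

# RUN_LOAD is a hypothetical switch so the script is a dry run by default.
if [ "${RUN_LOAD:-0}" = "1" ]; then
  curl --location-trusted -u root \
    -T /data1/software/tpcds1000x/store_sales_ext.dat \
    -H "timeout:${TIMEOUT}" \
    -H "column_separator:|" \
    -H "columns:${SS_COLUMNS}" \
    http://127.0.0.1:8030/api/tpcds1000x/store_sales_ext/_stream_load
else
  echo "dry run: would send header timeout:${TIMEOUT}"
fi
```

Run it with RUN_LOAD=1 to actually submit the load. If the larger timeout is still not enough, splitting the roughly 36 GB file into smaller pieces and loading them one at a time is another option worth considering.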

Hello, a question: in TMasterInfo(network_address=TNetworkAddress(hostname=172.22.65.4, port=9020, can this IP be set to a fixed value?