3.2.11 BE crash: JSON parsing consumes too much memory and the process runs out of memory, while the BE memTracker shows only 10 GB

colagy · October 29, 2024, 03:42

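To help locate your problem faster, please provide the following information. Thank you.
[Details] Detailed description of the problem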

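There is a SQL statement that kills the BE every time it runs. According to the be.WARNING log, the cause is JSON parsing, but the memTracker on the BE's port 8040 shows only 10 GB of memory in use, so this may be a memory leak that is not recorded by the memory allocation tracking. The process exceeded the 40 GB limit and was killed by the kernel. The be.WARNING log also contains a large number of FE RPC errors.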

W1029 11:01:52.103124   727 mem_hook.cpp:90] large memory alloc, query_id:1242dbc9-9d5e-4e0b-8778-991861c73ae3 instance: 1242dbc9-9d5e-4e0b-8778-991861c73ae4 acquire:1073741825 bytes, stack:
    @          0x6e80f75  malloc
    @          0xab5b5cc  operator new()
    @          0x3fcc5a3  std::__cxx11::basic_string<>::reserve()
    @          0xabe8c83  std::__cxx11::basic_stringbuf<>::overflow()
    @          0xabf0d9a  std::basic_streambuf<>::xsputn()
    @          0xabe19c7  std::ostream::write()
    @          0x7f80c87  WriteData()
    @          0x85706de  chop_write
    @          0x857e41a  Curl_readwrite
    @          0x8568e50  multi_runsingle
    @          0x856a8b6  curl_multi_perform
    @          0x85367fb  curl_easy_perform
    @          0x7f84e4a  Aws::Http::CurlHttpClient::MakeRequest()
    @          0x7ef736e  _ZNSt17_Function_handlerIFSt10shared_ptrIN3Aws4Http12HttpResponseEEvEZNKS1_6Client9AWSClient17AttemptOneRequestERKS0_INS2_11HttpRequestEERKNS1_23AmazonWebServiceRequestEPKcSG_SG_EUlvE1_E9_M_invokeERKSt9_Any_data
    @          0x7f652b5  smithy::components::tracing::TracingUtils::MakeCallWithTiming<>()
    @          0x7f3ef5b  Aws::Client::AWSClient::AttemptOneRequest()
    @          0x7f4715b  Aws::Client::AWSClient::AttemptExhaustively()
    @          0x7f49096  Aws::Client::AWSClient::MakeRequestWithUnparsedResponse()
    @          0x7f49561  Aws::Client::AWSClient::MakeRequestWithUnparsedResponse()
    @          0x7e8f9d7  _ZZNK3Aws2S38S3Client9GetObjectERKNS0_5Model16GetObjectRequestEENKUlvE_clEv
    @          0x7e8fb78  _ZNSt17_Function_handlerIFN3Aws5Utils7OutcomeINS0_2S35Model15GetObjectResultENS3_7S3ErrorEEEvEZNKS3_8S3Client9GetObjectERKNS4_16GetObjectRequestEEUlvE_E9_M_invokeERKSt9_Any_data
    @          0x7ed22ec  smithy::components::tracing::TracingUtils::MakeCallWithTiming<>()
    @          0x7e5ccb2  Aws::S3::S3Client::GetObject()
    @          0x4045263  starrocks::io::S3InputStream::read()
    @          0x6a07c99  starrocks::JsonReader::_read_file_broker()
    @          0x6a08a6f  starrocks::JsonReader::_read_and_parse_json()
    @          0x6a0f250  starrocks::JsonScanner::_open_next_reader()
    @          0x6a0f4a2  starrocks::JsonScanner::get_next()
    @          0x69de129  starrocks::connector::FileDataSource::get_next()
    @          0x6aa956f  starrocks::pipeline::ConnectorChunkSource::_read_chunk()
    @          0x6691dcd  starrocks::pipeline::ChunkSource::buffer_next_batch_chunks_blocking()
    @          0x63ebf60  _ZZN9starrocks8pipeline12ScanOperator18_trigger_next_scanEPNS_12RuntimeStateEiENKUlvE_clEv
W1029 11:01:57.219687   727 mem_hook.cpp:90] large memory alloc, query_id:1242dbc9-9d5e-4e0b-8778-991861c73ae3 instance: 1242dbc9-9d5e-4e0b-8778-991861c73ae4 acquire:1302446720 bytes, stack:
    @          0x6e80f75  malloc
    @          0xab5b5cc  operator new()
    @          0xab5b74d  operator new[]()
    @          0x6a17bd9  starrocks::JsonArrayParser::parse()
    @          0x6a08ba6  starrocks::JsonReader::_read_and_parse_json()
    @          0x6a0f250  starrocks::JsonScanner::_open_next_reader()
    @          0x6a0f4a2  starrocks::JsonScanner::get_next()
    @          0x69de129  starrocks::connector::FileDataSource::get_next()
    @          0x6aa956f  starrocks::pipeline::ConnectorChunkSource::_read_chunk()
    @          0x6691dcd  starrocks::pipeline::ChunkSource::buffer_next_batch_chunks_blocking()
    @          0x63ebf60  _ZZN9starrocks8pipeline12ScanOperator18_trigger_next_scanEPNS_12RuntimeStateEiENKUlvE_clEv
    @          0x65de30c  starrocks::workgroup::ScanExecutor::worker_thread()
    @          0x7013a6c  starrocks::ThreadPool::dispatch_thread()
    @          0x700c81a  starrocks::Thread::supervise_thread()
    @     0x7f1db2c93ac3  (unknown)
    @     0x7f1db2d24a04  clone
    @              (nil)  (unknown)
W1029 11:01:57.873612   727 mem_hook.cpp:90] large memory alloc, query_id:1242dbc9-9d5e-4e0b-8778-991861c73ae3 instance: 1242dbc9-9d5e-4e0b-8778-991861c73ae4 acquire:3125872164 bytes, stack:
    @          0x6e80f75  malloc
    @          0xab5b5cc  operator new()
    @          0xab5b74d  operator new[]()
    @          0xa0064a0  simdjson::haswell::dom_parser_implementation::set_capacity()
    @          0x9fed87d  simdjson::haswell::implementation::create_dom_parser_implementation()
    @          0x6a181f2  starrocks::JsonArrayParser::parse()
    @          0x6a08ba6  starrocks::JsonReader::_read_and_parse_json()
    @          0x6a0f250  starrocks::JsonScanner::_open_next_reader()
    @          0x6a0f4a2  starrocks::JsonScanner::get_next()
    @          0x69de129  starrocks::connector::FileDataSource::get_next()
    @          0x6aa956f  starrocks::pipeline::ConnectorChunkSource::_read_chunk()
    @          0x6691dcd  starrocks::pipeline::ChunkSource::buffer_next_batch_chunks_blocking()
    @          0x63ebf60  _ZZN9starrocks8pipeline12ScanOperator18_trigger_next_scanEPNS_12RuntimeStateEiENKUlvE_clEv
    @          0x65de30c  starrocks::workgroup::ScanExecutor::worker_thread()
    @          0x7013a6c  starrocks::ThreadPool::dispatch_thread()
    @          0x700c81a  starrocks::Thread::supervise_thread()
    @     0x7f1db2c93ac3  (unknown)
    @     0x7f1db2d24a04  clone
    @              (nil)  (unknown)
W1029 11:19:15.469710   713 pipeline_driver_executor.cpp:333] [Driver] Fail to report exec state: fragment_instance_id=586fc69f-95a4-11ef-af04-02421b3f38d0, status: Internal error: ReportExecStatus() to TNetworkAddress(hostname=kkshu-node00, port=9020) failed:
THRIFT_EAGAIN (timed out), retry_times=1
W1029 11:19:15.484565  1259 exec_state_reporter.cpp:188] ReportExecStatus() to TNetworkAddress(hostname=kkshu-node00, port=9020) failed:
THRIFT_EAGAIN (timed out)
W1029 11:19:15.484618  1259 pipeline_driver_executor.cpp:333] [Driver] Fail to report exec state: fragment_instance_id=8b5b8285-95a4-11ef-af04-02421b3f38d0, status: Internal error: ReportExecStatus() to TNetworkAddress(hostname=kkshu-node00, port=9020) failed:
THRIFT_EAGAIN (timed out), retry_times=1
W1029 11:19:15.466470   832 task_worker_pool.cpp:807] Fail to report resource_usage to kkshu-node00:9020, err=-1
W1029 11:19:18.493906  1260 exec_state_reporter.cpp:177] ReportExecStatus() to TNetworkAddress(hostname=kkshu-node00, port=9020) failed:
THRIFT_EAGAIN (timed out)
W1029 11:19:18.493947  1260 pipeline_driver_executor.cpp:333] [Driver] Fail to report exec state: fragment_instance_id=ff6b894c-95a3-11ef-af04-02421b3f38da, status: Internal error: ReportExecStatus() to TNetworkAddress(hostname=kkshu-node00, port=9020) failed:
THRIFT_EAGAIN (timed out), retry_times=1
W1029 11:19:20.073419   714 exec_state_reporter.cpp:177] ReportExecStatus() to TNetworkAddress(hostname=kkshu-node00, port=9020) failed:
THRIFT_EAGAIN (timed out)
W1029 11:19:20.073475   714 pipeline_driver_executor.cpp:333] [Driver] Fail to report exec state: fragment_instance_id=8f49d3b7-95a4-11ef-af04-02421b3f38d0, status: Internal error: ReportExecStatus() to TNetworkAddress(hostname=kkshu-node00, port=9020) failed:
THRIFT_EAGAIN (timed out), retry_times=1
W1029 11:19:52.140303   755 thrift_rpc_helper.cpp:129] Rpc error: FE RPC failure, address=TNetworkAddress(hostname=kkshu-node00, port=9020), reason=No more data to read.
W1029 11:20:45.897744   794 query_context.cpp:651] Retrying ReportExecStatus: write() send(): Broken pipe
W1029 11:20:54.534216   729 thrift_rpc_helper.cpp:129] Rpc error: FE RPC failure, address=TNetworkAddress(hostname=kkshu-node00, port=9020), reason=No more data to read.
W1029 11:20:54.842470   731 thrift_rpc_helper.cpp:129] Rpc error: FE RPC failure, address=TNetworkAddress(hostname=kkshu-node00, port=9020), reason=No more data to read.
W1029 11:21:19.015545   751 thrift_rpc_helper.cpp:129] Rpc error: FE RPC failure, address=TNetworkAddress(hostname=kkshu-node00, port=9020), reason=No more data to read.
W1029 11:22:46.362960   794 query_context.cpp:651] Retrying ReportExecStatus: write() send(): Broken pipe
W1029 11:23:16.814267   794 query_context.cpp:651] Retrying ReportExecStatus: write() send(): Broken pipe
W1029 11:23:47.491279   794 query_context.cpp:651] Retrying ReportExecStatus: write() send(): Broken pipe
W1029 11:24:17.831781   794 query_context.cpp:651] Retrying ReportExecStatus: write() send(): Broken pipe
W1029 11:24:48.138307   794 query_context.cpp:651] Retrying ReportExecStatus: write() send(): Broken pipe
W1029 11:24:50.195354   832 utils.cpp:126] Fail to report to master. host=kkshu-node00, port=9020, code=OK
W1029 11:24:50.195395   832 task_worker_pool.cpp:807] Fail to report resource_usage to kkshu-node00:9020, err=-1
W1029 11:24:51.760294   833 utils.cpp:126] Fail to report to master. host=kkshu-node00, port=9020, code=OK
W1029 11:24:51.760327   833 task_worker_pool.cpp:613] Fail to report task to kkshu-node00:9020, err=-1
W1029 11:24:53.138372   794 query_context.cpp:665] ReportExecStatus() to TNetworkAddress(hostname=kkshu-node00, port=9020) failed:
THRIFT_EAGAIN (timed out)
W1029 11:24:56.254365   832 utils.cpp:126] Fail to report to master. host=kkshu-node00, port=9020, code=OK
W1029 11:24:56.254410   832 task_worker_pool.cpp:807] Fail to report resource_usage to kkshu-node00:9020, err=-1
W1029 11:24:58.755378   831 utils.cpp:126] Fail to report to master. host=kkshu-node00, port=9020, code=OK
W1029 11:24:58.755421   831 task_worker_pool.cpp:758] Fail to report workgroup to kkshu-node00:9020, err=-1
W1029 11:25:02.254326   832 utils.cpp:126] Fail to report to master. host=kkshu-node00, port=9020, code=OK
W1029 11:25:02.254370   832 task_worker_pool.cpp:807] Fail to report resource_usage to kkshu-node00:9020, err=-1
W1029 11:25:02.557188   724 thrift_rpc_helper.cpp:129] Rpc error: FE RPC failure, address=TNetworkAddress(hostname=kkshu-node00, port=9020), reason=No more data to read.
W1029 11:25:06.761379   833 utils.cpp:126] Fail to report to master. host=kkshu-node00, port=9020, code=OK
W1029 11:25:06.761425   833 task_worker_pool.cpp:613] Fail to report task to kkshu-node00:9020, err=-1
W1029 11:25:07.657123   724 thrift_rpc_helper.cpp:129] Rpc error: FE RPC failure, address=TNetworkAddress(hostname=kkshu-node00, port=9020), reason=THRIFT_EAGAIN (timed out)
W1029 11:25:07.757511   724 pipeline_driver.cpp:357] push_chunk returns not ok status Internal error: auto increment allocate failed, err msg:
be/src/exec/tablet_sink.cpp:779 StorageEngine::instance()->get_next_increment_id_interval(table_id, null_rows, ids)
be/src/exec/tablet_sink.cpp:719 _fill_auto_increment_id_internal(chunk, slot, _schema->table_id())
be/src/exec/tablet_sink.cpp:602 _fill_auto_increment_id(chunk)
W1029 11:25:07.757570   724 pipeline_driver_executor.cpp:170] [Driver] Process error, query_id=ff6b894c-95a3-11ef-af04-02421b3f38cf, instance_id=ff6b894c-95a3-11ef-af04-02421b3f38d0, status=Internal error: auto increment allocate failed, err msg:
be/src/exec/tablet_sink.cpp:779 StorageEngine::instance()->get_next_increment_id_interval(table_id, null_rows, ids)
be/src/exec/tablet_sink.cpp:719 _fill_auto_increment_id_internal(chunk, slot, _schema->table_id())
be/src/exec/tablet_sink.cpp:602 _fill_auto_increment_id(chunk)
W1029 11:25:07.924798   724 tablet_sink_sender.cpp:248] close channel failed. channel_name=NodeChannel[10001], load_info=load_id=ff6b894c-95a3-11ef-af04-02421b3f38cf, txn_id: 1134856, parallel=1, compress_type=2, error_msg=Cancelled by pipeline engine
W1029 11:25:08.255371   832 utils.cpp:126] Fail to report to master. host=kkshu-node00, port=9020, code=OK
W1029 11:25:08.255417   832 task_worker_pool.cpp:807] Fail to report resource_usage to kkshu-node00:9020, err=-1
W1029 11:25:08.756377   831 utils.cpp:126] Fail to report to master. host=kkshu-node00, port=9020, code=OK
W1029 11:25:08.756421   831 task_worker_pool.cpp:758] Fail to report workgroup to kkshu-node00:9020, err=-1
W1029 11:25:11.676950   713 exec_state_reporter.cpp:177] ReportExecStatus() to TNetworkAddress(hostname=kkshu-node00, port=9020) failed:
THRIFT_EAGAIN (timed out)
W1029 11:25:11.676999   713 pipeline_driver_executor.cpp:333] [Driver] Fail to report exec state: fragment_instance_id=586fc69f-95a4-11ef-af04-02421b3f38d0, status: Internal error: ReportExecStatus() to TNetworkAddress(hostname=kkshu-node00, port=9020) failed:
THRIFT_EAGAIN (timed out), retry_times=1
W1029 11:25:11.729423  1562 exec_state_reporter.cpp:177] ReportExecStatus() to TNetworkAddress(hostname=kkshu-node00, port=9020) failed:
THRIFT_EAGAIN (timed out)
W1029 11:25:11.729468  1562 pipeline_driver_executor.cpp:333] [Driver] Fail to report exec state: fragment_instance_id=586fc69f-95a4-11ef-af04-02421b3f38d1, status: Internal error: ReportExecStatus() to TNetworkAddress(hostname=kkshu-node00, port=9020) failed:
THRIFT_EAGAIN (timed out), retry_times=1
W1029 11:25:13.025063   719 thrift_rpc_helper.cpp:129] Rpc error: FE RPC failure, address=TNetworkAddress(hostname=kkshu-node00, port=9020), reason=THRIFT_EAGAIN (timed out)
W1029 11:25:14.255106   832 utils.cpp:120] Fail to report to master: THRIFT_EAGAIN (timed out)
W1029 11:25:14.255159   832 task_worker_pool.cpp:807] Fail to report resource_usage to kkshu-node00:9020, err=-1
W1029 11:25:16.247275   716 tablet_sink_sender.cpp:248] close channel failed. channel_name=NodeChannel[10001], load_info=load_id=ff6b894c-95a3-11ef-af04-02421b3f38cf, txn_id: 1134856, parallel=1, compress_type=2, error_msg=Cancelled by pipeline engine
W1029 11:25:16.275563   716 tablet_sink_sender.cpp:248] close channel failed. channel_name=NodeChannel[10001], load_info=load_id=ff6b894c-95a3-11ef-af04-02421b3f38cf, txn_id: 1134856, parallel=1, compress_type=2, error_msg=no associated load channel ff6b894c-95a3-11ef-af04-02421b3f38cf
W1029 11:25:16.275615   716 pipeline_driver.cpp:771] [Driver] failed to finish operator called by cancelling operator [fragment_id=ff6b894c-95a3-11ef-af04-02421b3f38d0] [driver=query_id=ff6b894c-95a3-11ef-af04-02421b3f38cf fragment_id=ff6b894c-95a3-11ef-af04-02421b3f38d0 driver=driver_65_12, status=OUTPUT_FULL, operator-chain: [analytic_source_65_0x7faeb93f9590(O) -> project_66_0x7faeb93f9f90(X) -> olap_table_sink_-1_0x7faeb93fa990(X)]] [operator=olap_table_sink_-1_0x7faeb93fa990(X)] [error=no associated load channel ff6b894c-95a3-11ef-af04-02421b3f38cf]
W1029 11:25:16.280792   716 tablet_sink_sender.cpp:296] close channel failed. channel_name=NodeChannel[10001], load_info=load_id=ff6b894c-95a3-11ef-af04-02421b3f38cf, txn_id: 1134856, parallel=1, compress_type=2, error_msg=Cancelled by pipeline engine
W1029 11:25:16.281117   716 tablet_sink_sender.cpp:248] close channel failed. channel_name=NodeChannel[10001], load_info=load_id=ff6b894c-95a3-11ef-af04-02421b3f38cf, txn_id: 1134856, parallel=1, compress_type=2, error_msg=no associated load channel ff6b894c-95a3-11ef-af04-02421b3f38cf
W1029 11:25:16.281149   716 pipeline_driver.cpp:771] [Driver] failed to finish operator called by cancelling operator [fragment_id=ff6b894c-95a3-11ef-af04-02421b3f38d0] [driver=query_id=ff6b894c-95a3-11ef-af04-02421b3f38cf fragment_id=ff6b894c-95a3-11ef-af04-02421b3f38d0 driver=driver_65_15, status=OUTPUT_FULL, operator-chain: [analytic_source_65_0x7faeb9461e90(O) -> project_66_0x7faeb9462890(X) -> olap_table_sink_-1_0x7faeb94bf290(X)]] [operator=olap_table_sink_-1_0x7faeb94bf290(X)] [error=no associated load channel ff6b894c-95a3-11ef-af04-02421b3f38cf]
W1029 11:25:16.285954   719 tablet_sink_sender.cpp:296] close channel failed. channel_name=NodeChannel[10001], load_info=load_id=ff6b894c-95a3-11ef-af04-02421b3f38cf, txn_id: 1134856, parallel=1, compress_type=2, error_msg=Cancelled by pipeline engine
W1029 11:25:16.285954   722 tablet_sink_sender.cpp:296] close channel failed. channel_name=NodeChannel[10001], load_info=load_id=ff6b894c-95a3-11ef-af04-02421b3f38cf, txn_id: 1134856, parallel=1, compress_type=2, error_msg=Cancelled by pipeline engine
W1029 11:25:16.310937   716 tablet_sink_sender.cpp:296] close channel failed. channel_name=NodeChannel[10001], load_info=load_id=ff6b894c-95a3-11ef-af04-02421b3f38cf, txn_id: 1134856, parallel=1, compress_type=2, error_msg=Cancelled by pipeline engine
W1029 11:25:16.316108   718 tablet_sink_sender.cpp:296] close channel failed. channel_name=NodeChannel[10001], load_info=load_id=ff6b894c-95a3-11ef-af04-02421b3f38cf, txn_id: 1134856, parallel=1, compress_type=2, error_msg=Cancelled by pipeline engine
W1029 11:25:16.377255   926 thrift_rpc_helper.cpp:129] Rpc error: FE RPC failure, address=TNetworkAddress(hostname=kkshu-node00, port=9020), reason=invalid TType
W1029 11:25:23.142292   794 query_context.cpp:651] Retrying ReportExecStatus: write() send(): Broken pipe
W1029 11:25:53.145725   794 query_context.cpp:651] Retrying ReportExecStatus: write() send(): Broken pipe
W1029 11:26:23.222381   794 query_context.cpp:651] Retrying ReportExecStatus: write() send(): Broken pipe
W1029 11:26:50.080577   749 thrift_rpc_helper.cpp:129] Rpc error: FE RPC failure, address=TNetworkAddress(hostname=kkshu-node00, port=9020), reason=No more data to read.
W1029 11:27:19.632344   925 thrift_rpc_helper.cpp:129] Rpc error: FE RPC failure, address=TNetworkAddress(hostname=kkshu-node00, port=9020), reason=No more data to read.
W1029 11:27:19.736292   925 thrift_rpc_helper.cpp:129] Rpc error: FE RPC failure, address=TNetworkAddress(hostname=kkshu-node00, port=9020), reason=No more data to read.
W1029 11:27:19.854425   925 thrift_rpc_helper.cpp:129] Rpc error: FE RPC failure, address=TNetworkAddress(hostname=kkshu-node00, port=9020), reason=No more data to read.
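
To put the `large memory alloc` warnings above in perspective, here is a minimal sketch that sums the `acquire:` sizes per query. The regular expression is only inferred from the three warning lines in this post, and the function name `sum_large_allocs` is hypothetical, not part of StarRocks. Note that these are requested allocation sizes: they do not show how much memory was resident at the same time.

```python
import re
from collections import defaultdict

# Pattern inferred from the mem_hook.cpp warning lines in this post (an assumption,
# not an official log format specification).
ALLOC_RE = re.compile(
    r"large memory alloc, query_id:(?P<query>[0-9a-f-]+) "
    r"instance: (?P<instance>[0-9a-f-]+) acquire:(?P<bytes>\d+) bytes"
)


def sum_large_allocs(lines):
    """Return {query_id: total bytes requested} over all large-alloc warnings."""
    totals = defaultdict(int)
    for line in lines:
        match = ALLOC_RE.search(line)
        if match:
            totals[match.group("query")] += int(match.group("bytes"))
    return dict(totals)


# The three warning headers copied from the log above (stack frames omitted).
SAMPLE = [
    "W1029 11:01:52.103124   727 mem_hook.cpp:90] large memory alloc, "
    "query_id:1242dbc9-9d5e-4e0b-8778-991861c73ae3 "
    "instance: 1242dbc9-9d5e-4e0b-8778-991861c73ae4 acquire:1073741825 bytes, stack:",
    "W1029 11:01:57.219687   727 mem_hook.cpp:90] large memory alloc, "
    "query_id:1242dbc9-9d5e-4e0b-8778-991861c73ae3 "
    "instance: 1242dbc9-9d5e-4e0b-8778-991861c73ae4 acquire:1302446720 bytes, stack:",
    "W1029 11:01:57.873612   727 mem_hook.cpp:90] large memory alloc, "
    "query_id:1242dbc9-9d5e-4e0b-8778-991861c73ae3 "
    "instance: 1242dbc9-9d5e-4e0b-8778-991861c73ae4 acquire:3125872164 bytes, stack:",
]

if __name__ == "__main__":
    for query_id, total in sum_large_allocs(SAMPLE).items():
        print(f"{query_id}: {total} bytes ({total / 2**30:.2f} GiB)")
```

For the three warnings shown, the requests add up to about 5.1 GiB for a single query within roughly six seconds: one buffer while reading the file from S3, one in `JsonArrayParser::parse()`, and one for the simdjson parser capacity. To run it against a real log, pass `open("be.WARNING")` (path assumed) instead of `SAMPLE`.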

be.out

start time: Mon Sep  9 21:30:36 CST 2024, server uptime:  21:30:36 up 3 days,  6:28,  0 users,  load average: 0.72, 0.40, 0.21
start time: Mon Sep  9 21:31:09 CST 2024, server uptime:  21:31:09 up 3 days,  6:28,  0 users,  load average: 0.57, 0.40, 0.22
Using local file /opt/starrocks/be/lib/starrocks_be.
Argument "MSWin32" isn't numeric in numeric eq (==) at /opt/starrocks/be/bin/jeprof line 5222.
Argument "linux" isn't numeric in numeric eq (==) at /opt/starrocks/be/bin/jeprof line 5222.
Using local file /opt/starrocks/be/log/heap_profile.25.1387117716.
No nodes to print
Using local file /opt/starrocks/be/lib/starrocks_be.
Argument "MSWin32" isn't numeric in numeric eq (==) at /opt/starrocks/be/bin/jeprof line 5222.
Argument "linux" isn't numeric in numeric eq (==) at /opt/starrocks/be/bin/jeprof line 5222.
Using local file /opt/starrocks/be/log/heap_profile.25.1701788696.
No nodes to print
start time: Sun Sep 29 14:41:33 CST 2024, server uptime:  14:41:33 up 22 days, 23:39,  0 users,  load average: 9.01, 8.48, 7.36
start time: Sun Sep 29 19:27:01 CST 2024, server uptime:  19:27:02 up 23 days,  4:24,  0 users,  load average: 5.86, 6.23, 6.82
start time: Sun Sep 29 23:12:13 CST 2024, server uptime:  23:12:13 up 23 days,  8:09,  0 users,  load average: 2.89, 2.01, 1.88
start time: Sun Sep 29 23:12:41 CST 2024, server uptime:  23:12:41 up 23 days,  8:10,  0 users,  load average: 2.09, 1.91, 1.86
start time: Fri Oct 25 23:55:33 CST 2024, server uptime:  23:55:33 up 49 days,  8:53,  0 users,  load average: 45.64, 58.69, 35.39
start time: Sat Oct 26 01:17:11 CST 2024, server uptime:  01:17:11 up 49 days, 10:14,  0 users,  load average: 41.18, 13.21, 5.22
start time: Sat Oct 26 03:05:08 CST 2024, server uptime:  03:05:08 up 49 days, 12:02,  0 users,  load average: 101.67, 40.64, 15.55
3.2.11 RELEASE (build 10a5f0e)
query_id:1f32cd62-9304-11ef-a20a-02421b3f38cf, fragment_instance:1f32cd62-9304-11ef-a20a-02421b3f38d7
tracker:process consumption: 2410412008
tracker:query_pool consumption: 1209501832
tracker:query_pool/connector_scan consumption: 0
tracker:load consumption: 377603936
tracker:metadata consumption: 116042398
tracker:tablet_metadata consumption: 86865075
tracker:rowset_metadata consumption: 25761925
tracker:segment_metadata consumption: 173270
tracker:column_metadata consumption: 3242128
tracker:tablet_schema consumption: 4198203
tracker:segment_zonemap consumption: 70132
tracker:short_key_index consumption: 8735
tracker:column_zonemap_index consumption: 135008
tracker:ordinal_index consumption: 1266320
tracker:bitmap_index consumption: 0
tracker:bloom_filter_index consumption: 0
tracker:compaction consumption: 0
tracker:schema_change consumption: 0
tracker:column_pool consumption: 0
tracker:page_cache consumption: 0
tracker:update consumption: 43842870
tracker:chunk_allocator consumption: 221170448
tracker:clone consumption: 0
tracker:consistency consumption: 0
tracker:datacache consumption: 0
tracker:replication consumption: 0
*** Aborted at 1729883139 (unix time) try "date -d @1729883139" if you are using GNU date ***
PC: @     0x7f0d733ad007 (unknown)
*** SIGSEGV (@0x7f0c5421bbb0) received by PID 25 (TID 0x7f0cdd1c6640) from PID 1411496880; stack trace: ***
    @          0x849d15a google::(anonymous namespace)::FailureSignalHandler()
    @     0x7f0d73240520 (unknown)
    @     0x7f0d733ad007 (unknown)
    @          0x3fa234d starrocks::FixedLengthColumnBase<>::append()
    @          0x4238dca starrocks::Chunk::append()
    @          0x65cca63 starrocks::spill::OrderedMemTable::append()
    @          0x654deee starrocks::spill::RawSpillerWriter::spill<>()
    @          0x6550245 starrocks::spill::Spiller::spill<>()
    @          0x6647e56 starrocks::pipeline::SpillProcessOperator::pull_chunk()
    @          0x63e184e starrocks::pipeline::PipelineDriver::process()
    @          0x6c745be starrocks::pipeline::GlobalDriverExecutor::_worker_thread()
    @          0x7013a6c starrocks::ThreadPool::dispatch_thread()
    @          0x700c81a starrocks::Thread::supervise_thread()
    @     0x7f0d73292ac3 (unknown)
    @     0x7f0d73323a04 clone
    @                0x0 (unknown)
start time: Mon Oct 28 15:03:49 CST 2024, server uptime:  15:03:49 up 52 days, 1 min,  0 users,  load average: 50.32, 37.67, 32.82
start time: Mon Oct 28 15:09:37 CST 2024, server uptime:  15:09:37 up 52 days, 7 min,  0 users,  load average: 56.39, 39.37, 33.98
F1028 16:02:35.853122   848 storage_engine.cpp:596] meet too many error disks, process exit. max_ra    @     0x7fc7afb05ac3 (unknown)
    @     0x7fc7afb96a04 clone
    @                0x0 (unknown)
start time: Mon Oct 28 16:03:20 CST 2024, server uptime:  16:03:20 up 52 days,  1:00,  0 users,  load average: 28.10, 16.79, 15.96
start time: Tue Oct 29 03:17:56 CST 2024, server uptime:  03:17:56 up 52 days, 12:15,  0 users,  load average: 37.72, 15.73, 8.24
start time: Tue Oct 29 03:19:37 CST 2024, server uptime:  03:19:37 up 52 days, 12:17,  0 users,  load average: 42.87, 20.38, 10.48
start time: Tue Oct 29 03:21:13 CST 2024, server uptime:  03:21:13 up 52 days, 12:18,  0 users,  load average: 45.21, 23.66, 12.40
start time: Tue Oct 29 03:34:34 CST 2024, server uptime:  03:34:34 up 52 days, 12:32,  0 users,  load average: 32.37, 11.18, 8.71
start time: Tue Oct 29 03:36:04 CST 2024, server uptime:  03:36:04 up 52 days, 12:33,  0 users,  load average: 40.16, 17.54, 11.10
start time: Tue Oct 29 09:47:29 CST 2024, server uptime:  09:47:29 up 52 days, 18:44,  0 users,  load average: 34.14, 14.90, 6.84
start time: Tue Oct 29 10:01:46 CST 2024, server uptime:  10:01:46 up 52 days, 18:59,  0 users,  load average: 49.03, 21.86, 11.94
start time: Tue Oct 29 10:41:04 CST 2024, server uptime:  10:41:04 up 52 days, 19:38,  0 users,  load average: 39.92, 26.24, 16.39
start time: Tue Oct 29 10:45:04 CST 2024, server uptime:  10:45:04 up 52 days, 19:42,  0 users,  load average: 43.09, 31.92, 20.77
start time: Tue Oct 29 10:50:27 CST 2024, server uptime:  10:50:27 up 52 days, 19:47,  0 users,  load average: 68.81, 38.93, 25.85
Using local file /opt/starrocks/be/lib/starrocks_be.
Argument "MSWin32" isn't numeric in numeric eq (==) at /opt/starrocks/be/bin/jeprof line 5222.
Argument "linux" isn't numeric in numeric eq (==) at /opt/starrocks/be/bin/jeprof line 5222.
Using local file /opt/starrocks/be/log/heap_profile.25.1789525831.
Dropping nodes with <= 59.8 MB; edges with <= 12.0 abs(MB)
start time: Tue Oct 29 11:11:26 CST 2024, server uptime:  11:11:26 up 52 days, 20:08,  0 users,  load average: 39.03, 24.79, 19.11
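
The `tracker:` lines in be.out are easier to compare once they are converted to a readable unit. The following is a small sketch that parses them and prints the largest trackers first. The line pattern is inferred from the dump in this post, and `parse_trackers` is a hypothetical helper name. Only memory that goes through the trackers is listed in such a dump, so untracked allocations, if there are any, would not appear here.

```python
import re

# Pattern inferred from the "tracker:<name> consumption: <bytes>" lines in be.out
# (an assumption based on this post, not a documented format).
TRACKER_RE = re.compile(r"^tracker:(?P<name>\S+) consumption: (?P<bytes>\d+)$")


def parse_trackers(lines):
    """Return {tracker name: consumption in bytes} for every tracker line."""
    trackers = {}
    for line in lines:
        match = TRACKER_RE.match(line.strip())
        if match:
            trackers[match.group("name")] = int(match.group("bytes"))
    return trackers


# A few lines copied from the be.out dump above.
SAMPLE = [
    "tracker:process consumption: 2410412008",
    "tracker:query_pool consumption: 1209501832",
    "tracker:query_pool/connector_scan consumption: 0",
    "tracker:load consumption: 377603936",
    "tracker:metadata consumption: 116042398",
    "tracker:chunk_allocator consumption: 221170448",
]

if __name__ == "__main__":
    trackers = parse_trackers(SAMPLE)
    # Largest consumers first, shown in GiB.
    for name, size in sorted(trackers.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:<28} {size / 2**30:8.3f} GiB")
```

In the dump above, the `process` tracker reports roughly 2.2 GiB and `query_pool/connector_scan` reports 0, which is the kind of figure to compare against the resident memory the operating system reports for the BE process.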

huyufei · March 21, 2025, 02:33

Is there any way to work around this problem?

colagy · March 21, 2025, 02:35

Lowering the parallelism helps somewhat.

huyufei · March 21, 2025, 02:43

Mine is an insert into select SQL statement, and its parallelism is only 1.

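colagy · March 21, 2025, 03:12

JSON parsing here does consume a lot of memory. I don't know whether the community has a way to solve this. For now there doesn't seem to be any good way to work around it.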