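To help us locate your problem faster, please provide the following information. Thank you.
[Details] Detailed description of the problem
There is a SQL statement that brings the BE down every time it runs. The be.WARNING log suggests that JSON parsing is the cause, but the memTracker page on BE port 8040 shows only 10 GB of memory in use, so this may be a memory leak that the memory allocation accounting does not record. The process exceeded the kernel's 40 GB limit and was killed by the kernel. The be.WARNING log also contains a large number of FE RPC errors.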
W1029 11:01:52.103124 727 mem_hook.cpp:90] large memory alloc, query_id:1242dbc9-9d5e-4e0b-8778-991861c73ae3 instance: 1242dbc9-9d5e-4e0b-8778-991861c73ae4 acquire:1073741825 bytes, stack:
@ 0x6e80f75 malloc
@ 0xab5b5cc operator new()
@ 0x3fcc5a3 std::__cxx11::basic_string<>::reserve()
@ 0xabe8c83 std::__cxx11::basic_stringbuf<>::overflow()
@ 0xabf0d9a std::basic_streambuf<>::xsputn()
@ 0xabe19c7 std::ostream::write()
@ 0x7f80c87 WriteData()
@ 0x85706de chop_write
@ 0x857e41a Curl_readwrite
@ 0x8568e50 multi_runsingle
@ 0x856a8b6 curl_multi_perform
@ 0x85367fb curl_easy_perform
@ 0x7f84e4a Aws::Http::CurlHttpClient::MakeRequest()
@ 0x7ef736e _ZNSt17_Function_handlerIFSt10shared_ptrIN3Aws4Http12HttpResponseEEvEZNKS1_6Client9AWSClient17AttemptOneRequestERKS0_INS2_11HttpRequestEERKNS1_23AmazonWebServiceRequestEPKcSG_SG_EUlvE1_E9_M_invokeERKSt9_Any_data
@ 0x7f652b5 smithy::components::tracing::TracingUtils::MakeCallWithTiming<>()
@ 0x7f3ef5b Aws::Client::AWSClient::AttemptOneRequest()
@ 0x7f4715b Aws::Client::AWSClient::AttemptExhaustively()
@ 0x7f49096 Aws::Client::AWSClient::MakeRequestWithUnparsedResponse()
@ 0x7f49561 Aws::Client::AWSClient::MakeRequestWithUnparsedResponse()
@ 0x7e8f9d7 _ZZNK3Aws2S38S3Client9GetObjectERKNS0_5Model16GetObjectRequestEENKUlvE_clEv
@ 0x7e8fb78 _ZNSt17_Function_handlerIFN3Aws5Utils7OutcomeINS0_2S35Model15GetObjectResultENS3_7S3ErrorEEEvEZNKS3_8S3Client9GetObjectERKNS4_16GetObjectRequestEEUlvE_E9_M_invokeERKSt9_Any_data
@ 0x7ed22ec smithy::components::tracing::TracingUtils::MakeCallWithTiming<>()
@ 0x7e5ccb2 Aws::S3::S3Client::GetObject()
@ 0x4045263 starrocks::io::S3InputStream::read()
@ 0x6a07c99 starrocks::JsonReader::_read_file_broker()
@ 0x6a08a6f starrocks::JsonReader::_read_and_parse_json()
@ 0x6a0f250 starrocks::JsonScanner::_open_next_reader()
@ 0x6a0f4a2 starrocks::JsonScanner::get_next()
@ 0x69de129 starrocks::connector::FileDataSource::get_next()
@ 0x6aa956f starrocks::pipeline::ConnectorChunkSource::_read_chunk()
@ 0x6691dcd starrocks::pipeline::ChunkSource::buffer_next_batch_chunks_blocking()
@ 0x63ebf60 _ZZN9starrocks8pipeline12ScanOperator18_trigger_next_scanEPNS_12RuntimeStateEiENKUlvE_clEv
W1029 11:01:57.219687 727 mem_hook.cpp:90] large memory alloc, query_id:1242dbc9-9d5e-4e0b-8778-991861c73ae3 instance: 1242dbc9-9d5e-4e0b-8778-991861c73ae4 acquire:1302446720 bytes, stack:
@ 0x6e80f75 malloc
@ 0xab5b5cc operator new()
@ 0xab5b74d operator new[]()
@ 0x6a17bd9 starrocks::JsonArrayParser::parse()
@ 0x6a08ba6 starrocks::JsonReader::_read_and_parse_json()
@ 0x6a0f250 starrocks::JsonScanner::_open_next_reader()
@ 0x6a0f4a2 starrocks::JsonScanner::get_next()
@ 0x69de129 starrocks::connector::FileDataSource::get_next()
@ 0x6aa956f starrocks::pipeline::ConnectorChunkSource::_read_chunk()
@ 0x6691dcd starrocks::pipeline::ChunkSource::buffer_next_batch_chunks_blocking()
@ 0x63ebf60 _ZZN9starrocks8pipeline12ScanOperator18_trigger_next_scanEPNS_12RuntimeStateEiENKUlvE_clEv
@ 0x65de30c starrocks::workgroup::ScanExecutor::worker_thread()
@ 0x7013a6c starrocks::ThreadPool::dispatch_thread()
@ 0x700c81a starrocks::Thread::supervise_thread()
@ 0x7f1db2c93ac3 (unknown)
@ 0x7f1db2d24a04 clone
@ (nil) (unknown)
W1029 11:01:57.873612 727 mem_hook.cpp:90] large memory alloc, query_id:1242dbc9-9d5e-4e0b-8778-991861c73ae3 instance: 1242dbc9-9d5e-4e0b-8778-991861c73ae4 acquire:3125872164 bytes, stack:
@ 0x6e80f75 malloc
@ 0xab5b5cc operator new()
@ 0xab5b74d operator new[]()
@ 0xa0064a0 simdjson::haswell::dom_parser_implementation::set_capacity()
@ 0x9fed87d simdjson::haswell::implementation::create_dom_parser_implementation()
@ 0x6a181f2 starrocks::JsonArrayParser::parse()
@ 0x6a08ba6 starrocks::JsonReader::_read_and_parse_json()
@ 0x6a0f250 starrocks::JsonScanner::_open_next_reader()
@ 0x6a0f4a2 starrocks::JsonScanner::get_next()
@ 0x69de129 starrocks::connector::FileDataSource::get_next()
@ 0x6aa956f starrocks::pipeline::ConnectorChunkSource::_read_chunk()
@ 0x6691dcd starrocks::pipeline::ChunkSource::buffer_next_batch_chunks_blocking()
@ 0x63ebf60 _ZZN9starrocks8pipeline12ScanOperator18_trigger_next_scanEPNS_12RuntimeStateEiENKUlvE_clEv
@ 0x65de30c starrocks::workgroup::ScanExecutor::worker_thread()
@ 0x7013a6c starrocks::ThreadPool::dispatch_thread()
@ 0x700c81a starrocks::Thread::supervise_thread()
@ 0x7f1db2c93ac3 (unknown)
@ 0x7f1db2d24a04 clone
@ (nil) (unknown)
W1029 11:19:15.469710 713 pipeline_driver_executor.cpp:333] [Driver] Fail to report exec state: fragment_instance_id=586fc69f-95a4-11ef-af04-02421b3f38d0, status: Internal error: ReportExecStatus() to TNetworkAddress(hostname=kkshu-node00, port=9020) failed:
THRIFT_EAGAIN (timed out), retry_times=1
W1029 11:19:15.484565 1259 exec_state_reporter.cpp:188] ReportExecStatus() to TNetworkAddress(hostname=kkshu-node00, port=9020) failed:
THRIFT_EAGAIN (timed out)
W1029 11:19:15.484618 1259 pipeline_driver_executor.cpp:333] [Driver] Fail to report exec state: fragment_instance_id=8b5b8285-95a4-11ef-af04-02421b3f38d0, status: Internal error: ReportExecStatus() to TNetworkAddress(hostname=kkshu-node00, port=9020) failed:
THRIFT_EAGAIN (timed out), retry_times=1
W1029 11:19:15.466470 832 task_worker_pool.cpp:807] Fail to report resource_usage to kkshu-node00:9020, err=-1
W1029 11:19:18.493906 1260 exec_state_reporter.cpp:177] ReportExecStatus() to TNetworkAddress(hostname=kkshu-node00, port=9020) failed:
THRIFT_EAGAIN (timed out)
W1029 11:19:18.493947 1260 pipeline_driver_executor.cpp:333] [Driver] Fail to report exec state: fragment_instance_id=ff6b894c-95a3-11ef-af04-02421b3f38da, status: Internal error: ReportExecStatus() to TNetworkAddress(hostname=kkshu-node00, port=9020) failed:
THRIFT_EAGAIN (timed out), retry_times=1
W1029 11:19:20.073419 714 exec_state_reporter.cpp:177] ReportExecStatus() to TNetworkAddress(hostname=kkshu-node00, port=9020) failed:
THRIFT_EAGAIN (timed out)
W1029 11:19:20.073475 714 pipeline_driver_executor.cpp:333] [Driver] Fail to report exec state: fragment_instance_id=8f49d3b7-95a4-11ef-af04-02421b3f38d0, status: Internal error: ReportExecStatus() to TNetworkAddress(hostname=kkshu-node00, port=9020) failed:
THRIFT_EAGAIN (timed out), retry_times=1
W1029 11:19:52.140303 755 thrift_rpc_helper.cpp:129] Rpc error: FE RPC failure, address=TNetworkAddress(hostname=kkshu-node00, port=9020), reason=No more data to read.
W1029 11:20:45.897744 794 query_context.cpp:651] Retrying ReportExecStatus: write() send(): Broken pipe
W1029 11:20:54.534216 729 thrift_rpc_helper.cpp:129] Rpc error: FE RPC failure, address=TNetworkAddress(hostname=kkshu-node00, port=9020), reason=No more data to read.
W1029 11:20:54.842470 731 thrift_rpc_helper.cpp:129] Rpc error: FE RPC failure, address=TNetworkAddress(hostname=kkshu-node00, port=9020), reason=No more data to read.
W1029 11:21:19.015545 751 thrift_rpc_helper.cpp:129] Rpc error: FE RPC failure, address=TNetworkAddress(hostname=kkshu-node00, port=9020), reason=No more data to read.
W1029 11:22:46.362960 794 query_context.cpp:651] Retrying ReportExecStatus: write() send(): Broken pipe
W1029 11:23:16.814267 794 query_context.cpp:651] Retrying ReportExecStatus: write() send(): Broken pipe
W1029 11:23:47.491279 794 query_context.cpp:651] Retrying ReportExecStatus: write() send(): Broken pipe
W1029 11:24:17.831781 794 query_context.cpp:651] Retrying ReportExecStatus: write() send(): Broken pipe
W1029 11:24:48.138307 794 query_context.cpp:651] Retrying ReportExecStatus: write() send(): Broken pipe
W1029 11:24:50.195354 832 utils.cpp:126] Fail to report to master. host=kkshu-node00, port=9020, code=OK
W1029 11:24:50.195395 832 task_worker_pool.cpp:807] Fail to report resource_usage to kkshu-node00:9020, err=-1
W1029 11:24:51.760294 833 utils.cpp:126] Fail to report to master. host=kkshu-node00, port=9020, code=OK
W1029 11:24:51.760327 833 task_worker_pool.cpp:613] Fail to report task to kkshu-node00:9020, err=-1
W1029 11:24:53.138372 794 query_context.cpp:665] ReportExecStatus() to TNetworkAddress(hostname=kkshu-node00, port=9020) failed:
THRIFT_EAGAIN (timed out)
W1029 11:24:56.254365 832 utils.cpp:126] Fail to report to master. host=kkshu-node00, port=9020, code=OK
W1029 11:24:56.254410 832 task_worker_pool.cpp:807] Fail to report resource_usage to kkshu-node00:9020, err=-1
W1029 11:24:58.755378 831 utils.cpp:126] Fail to report to master. host=kkshu-node00, port=9020, code=OK
W1029 11:24:58.755421 831 task_worker_pool.cpp:758] Fail to report workgroup to kkshu-node00:9020, err=-1
W1029 11:25:02.254326 832 utils.cpp:126] Fail to report to master. host=kkshu-node00, port=9020, code=OK
W1029 11:25:02.254370 832 task_worker_pool.cpp:807] Fail to report resource_usage to kkshu-node00:9020, err=-1
W1029 11:25:02.557188 724 thrift_rpc_helper.cpp:129] Rpc error: FE RPC failure, address=TNetworkAddress(hostname=kkshu-node00, port=9020), reason=No more data to read.
W1029 11:25:06.761379 833 utils.cpp:126] Fail to report to master. host=kkshu-node00, port=9020, code=OK
W1029 11:25:06.761425 833 task_worker_pool.cpp:613] Fail to report task to kkshu-node00:9020, err=-1
W1029 11:25:07.657123 724 thrift_rpc_helper.cpp:129] Rpc error: FE RPC failure, address=TNetworkAddress(hostname=kkshu-node00, port=9020), reason=THRIFT_EAGAIN (timed out)
W1029 11:25:07.757511 724 pipeline_driver.cpp:357] push_chunk returns not ok status Internal error: auto increment allocate failed, err msg:
be/src/exec/tablet_sink.cpp:779 StorageEngine::instance()->get_next_increment_id_interval(table_id, null_rows, ids)
be/src/exec/tablet_sink.cpp:719 _fill_auto_increment_id_internal(chunk, slot, _schema->table_id())
be/src/exec/tablet_sink.cpp:602 _fill_auto_increment_id(chunk)
W1029 11:25:07.757570 724 pipeline_driver_executor.cpp:170] [Driver] Process error, query_id=ff6b894c-95a3-11ef-af04-02421b3f38cf, instance_id=ff6b894c-95a3-11ef-af04-02421b3f38d0, status=Internal error: auto increment allocate failed, err msg:
be/src/exec/tablet_sink.cpp:779 StorageEngine::instance()->get_next_increment_id_interval(table_id, null_rows, ids)
be/src/exec/tablet_sink.cpp:719 _fill_auto_increment_id_internal(chunk, slot, _schema->table_id())
be/src/exec/tablet_sink.cpp:602 _fill_auto_increment_id(chunk)
W1029 11:25:07.924798 724 tablet_sink_sender.cpp:248] close channel failed. channel_name=NodeChannel[10001], load_info=load_id=ff6b894c-95a3-11ef-af04-02421b3f38cf, txn_id: 1134856, parallel=1, compress_type=2, error_msg=Cancelled by pipeline engine
W1029 11:25:08.255371 832 utils.cpp:126] Fail to report to master. host=kkshu-node00, port=9020, code=OK
W1029 11:25:08.255417 832 task_worker_pool.cpp:807] Fail to report resource_usage to kkshu-node00:9020, err=-1
W1029 11:25:08.756377 831 utils.cpp:126] Fail to report to master. host=kkshu-node00, port=9020, code=OK
W1029 11:25:08.756421 831 task_worker_pool.cpp:758] Fail to report workgroup to kkshu-node00:9020, err=-1
W1029 11:25:11.676950 713 exec_state_reporter.cpp:177] ReportExecStatus() to TNetworkAddress(hostname=kkshu-node00, port=9020) failed:
THRIFT_EAGAIN (timed out)
W1029 11:25:11.676999 713 pipeline_driver_executor.cpp:333] [Driver] Fail to report exec state: fragment_instance_id=586fc69f-95a4-11ef-af04-02421b3f38d0, status: Internal error: ReportExecStatus() to TNetworkAddress(hostname=kkshu-node00, port=9020) failed:
THRIFT_EAGAIN (timed out), retry_times=1
W1029 11:25:11.729423 1562 exec_state_reporter.cpp:177] ReportExecStatus() to TNetworkAddress(hostname=kkshu-node00, port=9020) failed:
THRIFT_EAGAIN (timed out)
W1029 11:25:11.729468 1562 pipeline_driver_executor.cpp:333] [Driver] Fail to report exec state: fragment_instance_id=586fc69f-95a4-11ef-af04-02421b3f38d1, status: Internal error: ReportExecStatus() to TNetworkAddress(hostname=kkshu-node00, port=9020) failed:
THRIFT_EAGAIN (timed out), retry_times=1
W1029 11:25:13.025063 719 thrift_rpc_helper.cpp:129] Rpc error: FE RPC failure, address=TNetworkAddress(hostname=kkshu-node00, port=9020), reason=THRIFT_EAGAIN (timed out)
W1029 11:25:14.255106 832 utils.cpp:120] Fail to report to master: THRIFT_EAGAIN (timed out)
W1029 11:25:14.255159 832 task_worker_pool.cpp:807] Fail to report resource_usage to kkshu-node00:9020, err=-1
W1029 11:25:16.247275 716 tablet_sink_sender.cpp:248] close channel failed. channel_name=NodeChannel[10001], load_info=load_id=ff6b894c-95a3-11ef-af04-02421b3f38cf, txn_id: 1134856, parallel=1, compress_type=2, error_msg=Cancelled by pipeline engine
W1029 11:25:16.275563 716 tablet_sink_sender.cpp:248] close channel failed. channel_name=NodeChannel[10001], load_info=load_id=ff6b894c-95a3-11ef-af04-02421b3f38cf, txn_id: 1134856, parallel=1, compress_type=2, error_msg=no associated load channel ff6b894c-95a3-11ef-af04-02421b3f38cf
W1029 11:25:16.275615 716 pipeline_driver.cpp:771] [Driver] failed to finish operator called by cancelling operator [fragment_id=ff6b894c-95a3-11ef-af04-02421b3f38d0] [driver=query_id=ff6b894c-95a3-11ef-af04-02421b3f38cf fragment_id=ff6b894c-95a3-11ef-af04-02421b3f38d0 driver=driver_65_12, status=OUTPUT_FULL, operator-chain: [analytic_source_65_0x7faeb93f9590(O) -> project_66_0x7faeb93f9f90(X) -> olap_table_sink_-1_0x7faeb93fa990(X)]] [operator=olap_table_sink_-1_0x7faeb93fa990(X)] [error=no associated load channel ff6b894c-95a3-11ef-af04-02421b3f38cf]
W1029 11:25:16.280792 716 tablet_sink_sender.cpp:296] close channel failed. channel_name=NodeChannel[10001], load_info=load_id=ff6b894c-95a3-11ef-af04-02421b3f38cf, txn_id: 1134856, parallel=1, compress_type=2, error_msg=Cancelled by pipeline engine
W1029 11:25:16.281117 716 tablet_sink_sender.cpp:248] close channel failed. channel_name=NodeChannel[10001], load_info=load_id=ff6b894c-95a3-11ef-af04-02421b3f38cf, txn_id: 1134856, parallel=1, compress_type=2, error_msg=no associated load channel ff6b894c-95a3-11ef-af04-02421b3f38cf
W1029 11:25:16.281149 716 pipeline_driver.cpp:771] [Driver] failed to finish operator called by cancelling operator [fragment_id=ff6b894c-95a3-11ef-af04-02421b3f38d0] [driver=query_id=ff6b894c-95a3-11ef-af04-02421b3f38cf fragment_id=ff6b894c-95a3-11ef-af04-02421b3f38d0 driver=driver_65_15, status=OUTPUT_FULL, operator-chain: [analytic_source_65_0x7faeb9461e90(O) -> project_66_0x7faeb9462890(X) -> olap_table_sink_-1_0x7faeb94bf290(X)]] [operator=olap_table_sink_-1_0x7faeb94bf290(X)] [error=no associated load channel ff6b894c-95a3-11ef-af04-02421b3f38cf]
W1029 11:25:16.285954 719 tablet_sink_sender.cpp:296] close channel failed. channel_name=NodeChannel[10001], load_info=load_id=ff6b894c-95a3-11ef-af04-02421b3f38cf, txn_id: 1134856, parallel=1, compress_type=2, error_msg=Cancelled by pipeline engine
W1029 11:25:16.285954 722 tablet_sink_sender.cpp:296] close channel failed. channel_name=NodeChannel[10001], load_info=load_id=ff6b894c-95a3-11ef-af04-02421b3f38cf, txn_id: 1134856, parallel=1, compress_type=2, error_msg=Cancelled by pipeline engine
W1029 11:25:16.310937 716 tablet_sink_sender.cpp:296] close channel failed. channel_name=NodeChannel[10001], load_info=load_id=ff6b894c-95a3-11ef-af04-02421b3f38cf, txn_id: 1134856, parallel=1, compress_type=2, error_msg=Cancelled by pipeline engine
W1029 11:25:16.316108 718 tablet_sink_sender.cpp:296] close channel failed. channel_name=NodeChannel[10001], load_info=load_id=ff6b894c-95a3-11ef-af04-02421b3f38cf, txn_id: 1134856, parallel=1, compress_type=2, error_msg=Cancelled by pipeline engine
W1029 11:25:16.377255 926 thrift_rpc_helper.cpp:129] Rpc error: FE RPC failure, address=TNetworkAddress(hostname=kkshu-node00, port=9020), reason=invalid TType
W1029 11:25:23.142292 794 query_context.cpp:651] Retrying ReportExecStatus: write() send(): Broken pipe
W1029 11:25:53.145725 794 query_context.cpp:651] Retrying ReportExecStatus: write() send(): Broken pipe
W1029 11:26:23.222381 794 query_context.cpp:651] Retrying ReportExecStatus: write() send(): Broken pipe
W1029 11:26:50.080577 749 thrift_rpc_helper.cpp:129] Rpc error: FE RPC failure, address=TNetworkAddress(hostname=kkshu-node00, port=9020), reason=No more data to read.
W1029 11:27:19.632344 925 thrift_rpc_helper.cpp:129] Rpc error: FE RPC failure, address=TNetworkAddress(hostname=kkshu-node00, port=9020), reason=No more data to read.
W1029 11:27:19.736292 925 thrift_rpc_helper.cpp:129] Rpc error: FE RPC failure, address=TNetworkAddress(hostname=kkshu-node00, port=9020), reason=No more data to read.
W1029 11:27:19.854425 925 thrift_rpc_helper.cpp:129] Rpc error: FE RPC failure, address=TNetworkAddress(hostname=kkshu-node00, port=9020), reason=No more data to read.
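To gauge how much memory the JSON load path requested, the `large memory alloc` warnings above can be totaled per query. This is a minimal sketch, assuming the be.WARNING line format shown in the paste (`query_id:<id> ... acquire:<n> bytes`); the function name `sum_large_allocs` is hypothetical. The total counts requested bytes only and does not subtract buffers that were freed later.

```python
import re
from collections import defaultdict

# Matches warning lines of the form seen in the be.WARNING excerpt, e.g.
#   ... mem_hook.cpp:90] large memory alloc, query_id:<id> instance: <id> acquire:<n> bytes, stack:
ALLOC_RE = re.compile(r"large memory alloc, query_id:(\S+) .*?acquire:(\d+) bytes")


def sum_large_allocs(lines):
    """Return {query_id: total bytes requested} over 'large memory alloc' warnings.

    Hypothetical helper: lines that do not match (stack frames, other warnings)
    are ignored.
    """
    totals = defaultdict(int)
    for line in lines:
        match = ALLOC_RE.search(line)
        if match:
            totals[match.group(1)] += int(match.group(2))
    return dict(totals)


if __name__ == "__main__":
    # The three warning headers from the log above; stack frames omitted.
    qid = "1242dbc9-9d5e-4e0b-8778-991861c73ae3"
    inst = "1242dbc9-9d5e-4e0b-8778-991861c73ae4"
    prefix = "W1029 11:01:52.103124 727 mem_hook.cpp:90] large memory alloc, "
    sample = [
        f"{prefix}query_id:{qid} instance: {inst} acquire:1073741825 bytes, stack:",
        f"{prefix}query_id:{qid} instance: {inst} acquire:1302446720 bytes, stack:",
        f"{prefix}query_id:{qid} instance: {inst} acquire:3125872164 bytes, stack:",
    ]
    for query_id, total in sum_large_allocs(sample).items():
        print(f"{query_id}: {total} bytes ({total / 2**30:.2f} GiB)")
```

In practice the whole log file can be passed in, for example `sum_large_allocs(open("be.WARNING"))`, where the file path is an assumption about the local deployment.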
be.out
start time: Mon Sep 9 21:30:36 CST 2024, server uptime: 21:30:36 up 3 days, 6:28, 0 users, load average: 0.72, 0.40, 0.21
start time: Mon Sep 9 21:31:09 CST 2024, server uptime: 21:31:09 up 3 days, 6:28, 0 users, load average: 0.57, 0.40, 0.22
Using local file /opt/starrocks/be/lib/starrocks_be.
Argument "MSWin32" isn't numeric in numeric eq (==) at /opt/starrocks/be/bin/jeprof line 5222.
Argument "linux" isn't numeric in numeric eq (==) at /opt/starrocks/be/bin/jeprof line 5222.
Using local file /opt/starrocks/be/log/heap_profile.25.1387117716.
No nodes to print
Using local file /opt/starrocks/be/lib/starrocks_be.
Argument "MSWin32" isn't numeric in numeric eq (==) at /opt/starrocks/be/bin/jeprof line 5222.
Argument "linux" isn't numeric in numeric eq (==) at /opt/starrocks/be/bin/jeprof line 5222.
Using local file /opt/starrocks/be/log/heap_profile.25.1701788696.
No nodes to print
start time: Sun Sep 29 14:41:33 CST 2024, server uptime: 14:41:33 up 22 days, 23:39, 0 users, load average: 9.01, 8.48, 7.36
start time: Sun Sep 29 19:27:01 CST 2024, server uptime: 19:27:02 up 23 days, 4:24, 0 users, load average: 5.86, 6.23, 6.82
start time: Sun Sep 29 23:12:13 CST 2024, server uptime: 23:12:13 up 23 days, 8:09, 0 users, load average: 2.89, 2.01, 1.88
start time: Sun Sep 29 23:12:41 CST 2024, server uptime: 23:12:41 up 23 days, 8:10, 0 users, load average: 2.09, 1.91, 1.86
start time: Fri Oct 25 23:55:33 CST 2024, server uptime: 23:55:33 up 49 days, 8:53, 0 users, load average: 45.64, 58.69, 35.39
start time: Sat Oct 26 01:17:11 CST 2024, server uptime: 01:17:11 up 49 days, 10:14, 0 users, load average: 41.18, 13.21, 5.22
start time: Sat Oct 26 03:05:08 CST 2024, server uptime: 03:05:08 up 49 days, 12:02, 0 users, load average: 101.67, 40.64, 15.55
3.2.11 RELEASE (build 10a5f0e)
query_id:1f32cd62-9304-11ef-a20a-02421b3f38cf, fragment_instance:1f32cd62-9304-11ef-a20a-02421b3f38d7
tracker:process consumption: 2410412008
tracker:query_pool consumption: 1209501832
tracker:query_pool/connector_scan consumption: 0
tracker:load consumption: 377603936
tracker:metadata consumption: 116042398
tracker:tablet_metadata consumption: 86865075
tracker:rowset_metadata consumption: 25761925
tracker:segment_metadata consumption: 173270
tracker:column_metadata consumption: 3242128
tracker:tablet_schema consumption: 4198203
tracker:segment_zonemap consumption: 70132
tracker:short_key_index consumption: 8735
tracker:column_zonemap_index consumption: 135008
tracker:ordinal_index consumption: 1266320
tracker:bitmap_index consumption: 0
tracker:bloom_filter_index consumption: 0
tracker:compaction consumption: 0
tracker:schema_change consumption: 0
tracker:column_pool consumption: 0
tracker:page_cache consumption: 0
tracker:update consumption: 43842870
tracker:chunk_allocator consumption: 221170448
tracker:clone consumption: 0
tracker:consistency consumption: 0
tracker:datacache consumption: 0
tracker:replication consumption: 0
*** Aborted at 1729883139 (unix time) try "date -d @1729883139" if you are using GNU date ***
PC: @ 0x7f0d733ad007 (unknown)
*** SIGSEGV (@0x7f0c5421bbb0) received by PID 25 (TID 0x7f0cdd1c6640) from PID 1411496880; stack trace: ***
@ 0x849d15a google::(anonymous namespace)::FailureSignalHandler()
@ 0x7f0d73240520 (unknown)
@ 0x7f0d733ad007 (unknown)
@ 0x3fa234d starrocks::FixedLengthColumnBase<>::append()
@ 0x4238dca starrocks::Chunk::append()
@ 0x65cca63 starrocks::spill::OrderedMemTable::append()
@ 0x654deee starrocks::spill::RawSpillerWriter::spill<>()
@ 0x6550245 starrocks::spill::Spiller::spill<>()
@ 0x6647e56 starrocks::pipeline::SpillProcessOperator::pull_chunk()
@ 0x63e184e starrocks::pipeline::PipelineDriver::process()
@ 0x6c745be starrocks::pipeline::GlobalDriverExecutor::_worker_thread()
@ 0x7013a6c starrocks::ThreadPool::dispatch_thread()
@ 0x700c81a starrocks::Thread::supervise_thread()
@ 0x7f0d73292ac3 (unknown)
@ 0x7f0d73323a04 clone
@ 0x0 (unknown)
start time: Mon Oct 28 15:03:49 CST 2024, server uptime: 15:03:49 up 52 days, 1 min, 0 users, load average: 50.32, 37.67, 32.82
start time: Mon Oct 28 15:09:37 CST 2024, server uptime: 15:09:37 up 52 days, 7 min, 0 users, load average: 56.39, 39.37, 33.98
F1028 16:02:35.853122 848 storage_engine.cpp:596] meet too many error disks, process exit. max_ra @ 0x7fc7afb05ac3 (unknown)
@ 0x7fc7afb96a04 clone
@ 0x0 (unknown)
start time: Mon Oct 28 16:03:20 CST 2024, server uptime: 16:03:20 up 52 days, 1:00, 0 users, load average: 28.10, 16.79, 15.96
start time: Tue Oct 29 03:17:56 CST 2024, server uptime: 03:17:56 up 52 days, 12:15, 0 users, load average: 37.72, 15.73, 8.24
start time: Tue Oct 29 03:19:37 CST 2024, server uptime: 03:19:37 up 52 days, 12:17, 0 users, load average: 42.87, 20.38, 10.48
start time: Tue Oct 29 03:21:13 CST 2024, server uptime: 03:21:13 up 52 days, 12:18, 0 users, load average: 45.21, 23.66, 12.40
start time: Tue Oct 29 03:34:34 CST 2024, server uptime: 03:34:34 up 52 days, 12:32, 0 users, load average: 32.37, 11.18, 8.71
start time: Tue Oct 29 03:36:04 CST 2024, server uptime: 03:36:04 up 52 days, 12:33, 0 users, load average: 40.16, 17.54, 11.10
start time: Tue Oct 29 09:47:29 CST 2024, server uptime: 09:47:29 up 52 days, 18:44, 0 users, load average: 34.14, 14.90, 6.84
start time: Tue Oct 29 10:01:46 CST 2024, server uptime: 10:01:46 up 52 days, 18:59, 0 users, load average: 49.03, 21.86, 11.94
start time: Tue Oct 29 10:41:04 CST 2024, server uptime: 10:41:04 up 52 days, 19:38, 0 users, load average: 39.92, 26.24, 16.39
start time: Tue Oct 29 10:45:04 CST 2024, server uptime: 10:45:04 up 52 days, 19:42, 0 users, load average: 43.09, 31.92, 20.77
start time: Tue Oct 29 10:50:27 CST 2024, server uptime: 10:50:27 up 52 days, 19:47, 0 users, load average: 68.81, 38.93, 25.85
Using local file /opt/starrocks/be/lib/starrocks_be.
Argument "MSWin32" isn't numeric in numeric eq (==) at /opt/starrocks/be/bin/jeprof line 5222.
Argument "linux" isn't numeric in numeric eq (==) at /opt/starrocks/be/bin/jeprof line 5222.
Using local file /opt/starrocks/be/log/heap_profile.25.1789525831.
Dropping nodes with <= 59.8 MB; edges with <= 12.0 abs(MB)
start time: Tue Oct 29 11:11:26 CST 2024, server uptime: 11:11:26 up 52 days, 20:08, 0 users, load average: 39.03, 24.79, 19.11
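The memory-tracker dump that be.out printed before the SIGSEGV is easier to read in human units. This is a minimal sketch, assuming each line has the form `tracker:<name> consumption: <bytes>` as in the paste above; `parse_trackers` and `fmt_gib` are hypothetical helper names, not part of StarRocks.

```python
import re

# One tracker per line, e.g. "tracker:query_pool consumption: 1209501832".
TRACKER_RE = re.compile(r"^tracker:(\S+) consumption: (\d+)$")


def parse_trackers(lines):
    """Return {tracker name: bytes} for every tracker line; other lines are skipped."""
    trackers = {}
    for line in lines:
        match = TRACKER_RE.match(line.strip())
        if match:
            trackers[match.group(1)] = int(match.group(2))
    return trackers


def fmt_gib(num_bytes):
    """Format a byte count as GiB with two decimals."""
    return f"{num_bytes / 2**30:.2f} GiB"


if __name__ == "__main__":
    # A few lines copied from the be.out dump above.
    sample = [
        "tracker:process consumption: 2410412008",
        "tracker:query_pool consumption: 1209501832",
        "tracker:load consumption: 377603936",
        "tracker:chunk_allocator consumption: 221170448",
    ]
    trackers = parse_trackers(sample)
    # Print the largest consumers first.
    for name, used in sorted(trackers.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:20s} {fmt_gib(used)}")
```

A large gap between the `process` tracker and the resident memory that the operating system reports for the BE would be consistent with the untracked allocation suspected in the description.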
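[Background] What operations were performed?
[Business impact]
[Shared-data (storage-compute separation) or not]
[StarRocks version] e.g. 1.18.2
[Cluster size] e.g. 3 FE (1 follower + 2 observers) + 5 BE (FE and BE co-located)
[Machine info] vCPUs / memory / NIC, e.g. 48C / 64G / 10 GbE
[Contact] So that we can reach you promptly for log information while resolving the problem, please add your contact details, e.g. Community Group 4 - Xiao Li, or an email address. Thank you.
[Attachments]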
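- fe.log / be.INFO / relevant screenshots
- Slow query:
  - Profile information: obtain the Profile and use it to analyze the query bottleneck
  - Parallelism: show variables like '%parallel_fragment_exec_instance_num%';
  - Whether pipeline is enabled: show variables like '%pipeline%';
  - Screenshots of BE node CPU and memory usage
- Query error:
  - query_dump (how to obtain the query_dump file)
- BE crash:
  - be.out
  - coredump (how to obtain a coredump)
- External table query error:
  - be.out and fe.warn.log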