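【Details】StarRocks BE fails to start, and queries crash the BE
【Background】One BE node fails to start
【Business impact】
【StarRocks version】2.5.3
【Cluster size】e.g. 2 FE + 3 BE
【Machine info】x86_64, 16 CPUs, 32 GB memory
【Contact】StarRocks community group 3, @ θ
【Attachments】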
ldd --version
ldd (GNU libc) 2.28
OS:
BigCloud Enterprise Linux For Euler release 21.10 (LTS-SP2)
be.INFO
I0620 08:42:47.104465 1205922 txn_manager.cpp:285] Commit txn successfully. tablet: 10208, txn_id: 2770, rowsetid: 02000000000005133943c0eb2dd73034f1b1f54563526bb8 #segment:0 #delfile:0
I0620 08:42:47.104466 1205922 data_dir.cpp:337] Added committed rowset=02000000000005133943c0eb2dd73034f1b1f54563526bb8 tablet=10208 schema hash=1612420876 txn_id: 2770
I0620 08:42:47.110098 1205988 fragment_mgr.cpp:516] FragmentMgr cancel worker start working.
I0620 08:42:47.116009 1205837 exec_env.cpp:173] [PIPELINE] Exec thread pool: thread_num=16
I0620 08:42:47.179982 1206390 runtime_filter_worker.cpp:760] RuntimeFilterWorker start working.
I0620 08:42:47.180194 1206392 profile_report_worker.cpp:99] ProfileReportWorker start working.
I0620 08:42:47.180380 1206393 result_buffer_mgr.cpp:132] result buffer manager cancel thread begin.
I0620 08:42:47.182157 1205837 load_path_mgr.cpp:55] Load path configured to [/opt/cluster/data/mpp/storage/mini_download]
I0620 08:42:47.191814 1206495 compaction_manager.cpp:57] start compaction scheduler
I0620 08:42:47.192272 1206497 storage_engine.cpp:609] start to check compaction
I0620 08:42:47.193277 1206503 olap_server.cpp:667] begin to do tablet meta checkpoint:/opt/cluster/data/mpp/storage
I0620 08:42:47.193517 1206506 olap_server.cpp:617] try to perform path gc by tablet!
I0620 08:42:47.193552 1205837 olap_server.cpp:208] All backgroud threads of storage engine have started.
I0620 08:42:47.194677 1205837 thrift_server.cpp:375] heartbeat has started listening port on 9050
I0620 08:42:47.194700 1205837 backend_base.cpp:66] StarRocksInternalService has started listening port on 9060
I0620 08:42:47.194977 1205837 thrift_server.cpp:375] BackendService has started listening port on 9060
I0620 08:42:47.201607 1205837 server.cpp:1070] Server[starrocks::BackendInternalServiceImpl<starrocks::PInternalService>+starrocks::LakeServiceImpl+starrocks::BackendInternalServiceImpl<doris::PBackendService>] is serving on port=8060.
I0620 08:42:47.201661 1205837 server.cpp:1073] Check out h
The errors differ somewhat from day to day.
be.out
tracker:clone consumption: 0
tracker:consistency consumption: 0
*** Aborted at 1687221767 (unix time) try "date -d @1687221767" if you are using GNU date ***
PC: @ 0x7fa9f0e5c60b gsignal
*** SIGABRT (@0x3e80012664d) received by PID 1205837 (TID 0x7fa9f0e05fc0) from PID 1205837; stack trace: ***
@ 0x5769222 google::(anonymous namespace)::FailureSignalHandler()
@ 0x7fa9f11834c0 (unknown)
@ 0x7fa9f0e5c60b gsignal
@ 0x7fa9f0e5d931 abort
@ 0x2a31bdc _ZN9__gnu_cxx27__verbose_terminate_handlerEv.cold
@ 0x7c308b6 __cxxabiv1::__terminate()
@ 0x7c30921 std::terminate()
@ 0x7c30a74 __cxa_throw
@ 0x2a33ce0 std::__throw_system_error()
@ 0x7cabc59 std::_M_start_thread()
@ 0x4cb50d0 starrocks::EvHttpServer::start()
@ 0x4740d7a starrocks::HttpServiceBE::start()
@ 0x473e91c start_be()
@ 0x2a37450 main
@ 0x7fa9f0e48b27 __libc_start_main
@ 0x2b5172f (unknown)
@ 0x0 (unknown)
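The trace above ends with std::_M_start_thread() throwing std::system_error inside starrocks::EvHttpServer::start(), which usually means the OS refused to create a new thread (for example EAGAIN when a process or thread limit is reached). As a rough diagnostic sketch, not a confirmed root cause, the script below prints the limits that commonly matter. It assumes a Linux host with standard /proc paths and must be run as the same user that starts the BE; the helper names `read_int` and `thread_limits` are made up for this illustration.

```python
import resource
from pathlib import Path


def read_int(path):
    """Return the first integer stored in a /proc-style file, or None if unavailable."""
    try:
        return int(Path(path).read_text().split()[0])
    except (OSError, ValueError, IndexError):
        return None


def thread_limits(pid="self"):
    """Collect limits that can make thread creation fail.

    `pid` may be "self" or the PID of a running BE process (assumption:
    /proc/<pid>/status is readable by the current user).
    """
    # RLIMIT_NPROC is the per-user limit on processes and threads ("ulimit -u").
    # It is read for the current process, so run this as the BE user.
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)

    threads = None
    try:
        for line in Path(f"/proc/{pid}/status").read_text().splitlines():
            if line.startswith("Threads:"):
                threads = int(line.split()[1])
                break
    except (OSError, ValueError, IndexError):
        pass

    return {
        "nproc_soft": soft,  # -1 means unlimited
        "nproc_hard": hard,
        "threads_in_process": threads,
        "kernel_threads_max": read_int("/proc/sys/kernel/threads-max"),
        "kernel_pid_max": read_int("/proc/sys/kernel/pid_max"),
        "vm_max_map_count": read_int("/proc/sys/vm/max_map_count"),
    }


if __name__ == "__main__":
    for name, value in thread_limits().items():
        print(f"{name}: {value}")
```

If `nproc_soft` is small compared with the number of threads the BE starts (the log shows 16 pipeline exec threads alone, plus many background workers), raising that limit for the BE user would be the first thing to try.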
be.WARNING
W0620 08:42:47.262043 1205837 stack_util.cpp:128] 2023-06-20 08:42:47.262009, query_id=00000000-0000-0000-0000-000000000000, fragment_instance_id=0000000
dmesg -T
[Tue Jun 20 08:44:05 2023] audit: type=1110 audit(1687221841.941:1287572): pid=1206665 uid=0 auid=1002 ses=176348 msg='op=PAM:setcred grantors=pam_env,pam_faillock,pam_unix acct="aspmon" exe="/usr/sbin/crond" hostname=? addr=? terminal=cron res=success'
[Tue Jun 20 08:44:05 2023] audit: type=1105 audit(1687221841.943:1287573): pid=1206666 uid=0 auid=1002 ses=176349 msg='op=PAM:session_open grantors=pam_loginuid,pam_keyinit,pam_limits,pam_systemd acct="aspmon" exe="/usr/sbin/crond" hostname=? addr=? terminal=cron res=success'
[Tue Jun 20 08:44:05 2023] audit: type=1110 audit(1687221841.944:1287574): pid=1206666 uid=0 auid=1002 ses=176349 msg='op=PAM:setcred grantors=pam_env,pam_faillock,pam_unix acct="aspmon" exe="/usr/sbin/crond" hostname=? addr=? terminal=cron res=success'
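Problem 2:
About 3.8 million rows were loaded from an ES external table into an internal table with an INSERT INTO ... SELECT statement. The table's DDL has 6 columns, and the table is partitioned and bucketed by a time column.
select colum1, count(1) from table_1 group by colum1; -- works fine
select count(1) from table_1 ; -- works fine
select * from table_1 limit 10000, 10 ; -- works fine
select * from table_1; -- crashes the BE immediately
select * from table_1 order by time_type_column limit 10000, 10; -- crashes the BE immediately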
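Since the aggregate and small-window queries succeed while full scans crash the BE, one possible next step is to run a full scan of each column separately and see whether a single column triggers the crash. The sketch below only generates those statements. Only `table_1`, `colum1` and `time_type_column` appear in this post; the helper name `column_probe_queries` and the remaining column names are hypothetical placeholders for the real DDL.

```python
def column_probe_queries(table, columns):
    """Build one single-column full-scan query per column.

    Running them one at a time against a test BE can show whether the
    crash is tied to a specific column rather than to SELECT * in general.
    """
    return [f"SELECT {col} FROM {table};" for col in columns]


if __name__ == "__main__":
    # Hypothetical column list: replace col_3 .. col_6 with the real names.
    columns = ["colum1", "time_type_column", "col_3", "col_4", "col_5", "col_6"]
    for stmt in column_probe_queries("table_1", columns):
        print(stmt)
```

If every single-column query succeeds, the crash is more likely related to result size or to the threading problem from the first issue than to the data in one column.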