Deployment:
Host   Role
       FE   BE   Broker
ip1    √    √    √
ip2    √    √    √
ip3    √    √    √
The three servers (purchased in 2016):
OS: CentOS Linux release 7.6.1810 (Core)
CPU: Intel® Xeon® CPU E5-2658A v3 @ 2.20GHz
CPU cores (logical): 48
Memory: 256G
Disks: 1.1TB*10 (STS)
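Problem description:
The cluster was previously running StarRocks 2.3.0-RC01 and ran for some time without any problems. Recently, be.out in the log directory of every BE started showing the following error:
After I consulted the official technical staff, they said I needed to upgrade to 2.3.3.
So I upgraded the BEs and FEs from 2.3 to 2.3.3. After the upgrade everything starts successfully, but after running for a short while all FEs stay alive while all BEs go down again, still reporting
BE configuration of the cluster: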
# INFO, WARNING, ERROR, FATAL
sys_log_level = INFO
# ports for admin, web, heartbeat service
be_port = 9060
webserver_port = 8040
heartbeat_service_port = 9050
brpc_port = 8060
# Choose one if there is more than one IP besides the loopback address.
# Note that at most one IP should match this list.
# If no IP matches this rule, one is chosen at random.
# Use CIDR format, e.g. 10.10.10.0/24
# Default value is empty.
priority_networks = 192.168.235.0/24
# data root path, separated by ';'
# you can specify the storage medium of each root path, HDD or SSD, separated by ','
# e.g.:
# storage_root_path = /data1,medium:HDD;/data2,medium:SSD;/data3
# /data1, HDD;
# /data2, SSD;
# /data3, HDD(default);
# Default value is ${STARROCKS_HOME}/storage, you should create it by hand.
storage_root_path = /data2/StarRocks/storage,medium:SSD;/data3/StarRocks/storage,medium:SSD;/data4/StarRocks/storage,medium:SSD;/data5/StarRocks/storage,medium:SSD;/data6/StarRocks/storage,medium:SSD;/data7/StarRocks/storage,medium:SSD;/data8/StarRocks/storage,medium:SSD;/data9/StarRocks/storage,medium:SSD;/data10/StarRocks/storage,medium:SSD
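To make the structure of the `storage_root_path` value above easier to check, here is a minimal sketch that splits it into (path, medium) pairs. The helper `parse_storage_root_path` is hypothetical and not part of StarRocks; it only follows the `;` and `,medium:` syntax described in the comments, and assumes HDD when no medium is given.

```python
def parse_storage_root_path(value):
    """Split a storage_root_path value into (path, medium) pairs.

    Hypothetical helper for illustration only; not a StarRocks API.
    Entries are separated by ';'. An optional ',medium:XXX' suffix names
    the storage medium, and HDD is assumed when the suffix is absent.
    """
    entries = []
    for item in value.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing ';'
        path, _, option = item.partition(",")
        medium = "HDD"
        if option.strip().lower().startswith("medium:"):
            medium = option.split(":", 1)[1].strip().upper()
        entries.append((path.strip(), medium))
    return entries


# The value from this cluster's be.conf: /data2 through /data10, all marked SSD.
value = ";".join(f"/data{i}/StarRocks/storage,medium:SSD" for i in range(2, 11))
roots = parse_storage_root_path(value)
print(len(roots))  # 9
print(roots[0])    # ('/data2/StarRocks/storage', 'SSD')
```

With this cluster's value the sketch reports nine storage directories, all declared as SSD.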
# Advanced configurations
sys_log_dir = ${STARROCKS_HOME}/log
sys_log_roll_mode = SIZE-MB-1024
sys_log_roll_num = 10
sys_log_verbose_modules = *
log_buffer_level = -1
default_rowset_type = beta
# Parameter tuning
default_storage_medium = SSD
push_worker_count_normal_priority=5
sys_log_roll_mode=SIZE-MB-300
sys_log_roll_num=50
fragment_pool_queue_size=2048
mem_limit=200G
# Maximum memory a single SQL statement may use: 16G
exec_mem_limit=17179869184
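A quick arithmetic check of the two memory values above. It is only a sketch, and it assumes that `G` means GiB both in `mem_limit` and in the 256G of physical memory listed in the server specs.

```python
GIB = 1024 ** 3

exec_mem_limit = 17179869184   # bytes, from be.conf above
mem_limit_gib = 200            # mem_limit=200G (assumed to mean GiB)
host_memory_gib = 256          # from the server specs (assumed to mean GiB)

print(exec_mem_limit / GIB)              # 16.0
print(mem_limit_gib / host_memory_gib)   # 0.78125
print(host_memory_gib - mem_limit_gib)   # 56
```

So `exec_mem_limit` is exactly 16 GiB, and `mem_limit` gives the BE about 78% of the machine, leaving roughly 56G for the FE, the Broker, and the operating system that share each host.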
# Fix for the "too many tablet versions" problem
cumulative_compaction_num_threads_per_disk = 4
base_compaction_num_threads_per_disk = 2
cumulative_compaction_check_interval_seconds = 10
#tablet_max_pending_versions=10000
tablet_max_versions=20000
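The per-disk compaction settings above multiply with the number of storage directories. The sketch below works that out for this cluster; it assumes that every entry in `storage_root_path` counts as one disk and that the `*_per_disk` values are applied to each of them, which is what the parameter names suggest but is not confirmed here.

```python
storage_dirs = 9          # entries in storage_root_path above
cumulative_per_disk = 4   # cumulative_compaction_num_threads_per_disk
base_per_disk = 2         # base_compaction_num_threads_per_disk
logical_cores = 48        # from the server specs

cumulative_threads = storage_dirs * cumulative_per_disk
base_threads = storage_dirs * base_per_disk
total_threads = cumulative_threads + base_threads

print(cumulative_threads, base_threads, total_threads)  # 36 18 54
print(total_threads > logical_cores)                    # True
```

Under these assumptions a BE could run up to 54 compaction threads, which is more than the 48 logical cores on each host.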
# Fix for "Txn number exceeds the limit. txn_count: 116, limit: 100"
max_runnings_transactions_per_txn_map=1000
# Settings for real-time load stress testing
stream_load_default_timeout_second = 10800
streaming_load_max_mb=102400
streaming_load_max_batch_size_mb=102400
flush_thread_num_per_store=8
olap_table_sink_send_interval_ms=1
load_process_max_memory_limit_percent=50
enable_new_load_on_memory_limit_exceeded=true
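For reference, the load-related limits above can be put on a common scale. This is a sketch that assumes `load_process_max_memory_limit_percent` is taken as a percentage of `mem_limit` and that MB here means MiB; neither is verified against the 2.3.3 source.

```python
mem_limit_gib = 200             # mem_limit=200G
load_percent = 50               # load_process_max_memory_limit_percent
streaming_load_max_mb = 102400  # streaming_load_max_mb

load_budget_gib = mem_limit_gib * load_percent / 100
streaming_load_max_gib = streaming_load_max_mb / 1024

print(load_budget_gib)         # 100.0
print(streaming_load_max_gib)  # 100.0
```

Under these assumptions a single Stream Load is allowed to be as large as the whole load memory budget of one BE.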
# Set the base compaction time window
base_compaction_start_hour=3
base_compaction_end_hour=8
# Compaction tuning
base_compaction_check_interval_seconds=300
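The configuration as pasted sets `sys_log_roll_mode` and `sys_log_roll_num` twice with different values. A minimal sketch for spotting such repeats is shown below; `find_duplicate_keys` is a hypothetical helper, the sample is a shortened excerpt of the file above, and it assumes both occurrences are active (not commented out) in the real be.conf. Which value the BE actually uses in that case is not established here.

```python
def find_duplicate_keys(conf_text):
    """Return {key: [values in file order]} for keys assigned more than once.

    Blank lines, lines starting with '#', and lines without '=' are skipped.
    """
    seen = {}
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        seen.setdefault(key.strip(), []).append(value.strip())
    return {key: values for key, values in seen.items() if len(values) > 1}


# Shortened excerpt of the be.conf shown above.
sample = """
sys_log_roll_mode = SIZE-MB-1024
sys_log_roll_num = 10
default_storage_medium = SSD
sys_log_roll_mode=SIZE-MB-300
sys_log_roll_num=50
"""

duplicates = find_duplicate_keys(sample)
for key, values in duplicates.items():
    print(key, values)
```

Running the same function over the full file, for example with `find_duplicate_keys(open("be.conf").read())`, would list every key that appears more than once.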