3.5.3: BE compiled without AVX2 still reports Illegal instruction

【Details】On 3.5.3, the BE compiled without AVX2 still reports Illegal instruction.
【Background】rm -rf /var/local/thirdparty/installed
cd /root/starrocks/thirdparty
export THIRD_PARTY_BUILD_WITH_AVX2=OFF
bash build-thirdparty.sh
ln -s /root/starrocks/thirdparty/installed /var/local/thirdparty/installed
cd /root/starrocks
bash build.sh --be --without-avx2
【Shared-data (storage-compute separation)?】
【StarRocks version】3.5.2
【Machine info】32 GB RAM, CentOS 7.9, a virtual machine without AVX2 support, compiled using Docker
【Contact】Email 972263102@qq.com
【Attachment】
image

Can you produce a coredump so we can look at the crash stack? We need to see which instruction is failing.

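Thanks for the answer. I'm not sure how to get a coredump here: when I start the BE there is no log, only "Illegal instruction", and no process is left running.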

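ulimit -c unlimited

This allows core files to be generated.

cat /proc/sys/kernel/core_pattern

Check where core files are written. If you are not sure, you can run

echo 'core.%p' > /proc/sys/kernel/core_pattern

to have the coredump written to the current directory.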
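
Before overwriting `core_pattern`, it may help to see how the kernel will interpret the current value. The following is a minimal sketch assuming a Linux host; `describe_core_pattern` is a hypothetical helper name used only for illustration.

```shell
# Hypothetical helper: classify a core_pattern value.
describe_core_pattern() {
  case "$1" in
    '|'*) echo "piped to a handler program" ;;
    /*)   echo "written to an absolute path" ;;
    *)    echo "written relative to the working directory of the crashing process" ;;
  esac
}

# Only runs where the kernel exposes the setting.
if [ -r /proc/sys/kernel/core_pattern ]; then
  describe_core_pattern "$(cat /proc/sys/kernel/core_pattern)"
fi
```

A pattern such as `core.%p` falls into the last case. Writing to `/proc/sys/kernel/core_pattern` needs root, and with a piped pattern the core is handed to a handler program, so the file may not show up where the BE was started.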

(gdb) bt full
#0 0x0000000003fcce41 in _GLOBAL__sub_I_type.cc ()
No symbol table info available.
#1 0x000000000ef4230d in __libc_csu_init ()
No symbol table info available.
#2 0x00002ad86120c4e5 in __libc_start_main (main=0x3c7c230 , argc=1, argv=0x7ffe3b73c928, init=0xef422c0 <__libc_csu_init>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffe3b73c918)
at …/csu/libc-start.c:225
result = <optimized out>
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {47108808184876, 47108808184384, 7698728878082, 42949672968, 0, 1, 140729895864616, 140729895864632}, mask_was_saved = 1609011864}}, priv = {pad = {0x1, 0x2ad85fe79290,
0x2ad85eaf79c3 <_dl_init+275>, 0x2ad85ed0b150}, data = {prev = 0x1, cleanup = 0x2ad85fe79290, canceltype = 1588558275}}}
not_first_call = <optimized out>
#3 0x0000000003feaf0a in _start ()
No symbol table info available.

Run this in gdb:

disass 0x0000000003fcce41

Dump of assembler code for function _GLOBAL__sub_I_type.cc:
0x0000000003fcce40 <+0>: push %rbp
0x0000000003fcce41 <+1>: vpxor %xmm0,%xmm0,%xmm0
0x0000000003fcce45 <+5>: lea 0x10dbac84(%rip),%rsi # 0x14d87ad0 <_ZN5arrow12_GLOBAL__N_118g_signed_int_typesE>
0x0000000003fcce4c <+12>: mov %rsp,%rbp
0x0000000003fcce4f <+15>: push %r12
0x0000000003fcce51 <+17>: lea 0xfc26570(%rip),%r12 # 0x13bf33c8
0x0000000003fcce58 <+24>: push %rbx
0x0000000003fcce59 <+25>: lea 0x7f4bf40(%rip),%rbx # 0xbf18da0 <_ZNSt6vectorISt10shared_ptrIN5arrow8DataTypeEESaIS3_EED2Ev>
0x0000000003fcce60 <+32>: mov %r12,%rdx
0x0000000003fcce63 <+35>: vmovdqa %xmm0,0x10dbac65(%rip) # 0x14d87ad0 <_ZN5arrow12_GLOBAL__N_118g_signed_int_typesE>
0x0000000003fcce6b <+43>: movq $0x0,0x10dbac6a(%rip) # 0x14d87ae0 <_ZN5arrow12_GLOBAL__N_118g_signed_int_typesE+16>
0x0000000003fcce76 <+54>: mov %rbx,%rdi
0x0000000003fcce79 <+57>: callq 0x3610130 __cxa_atexit@plt
0x0000000003fcce7e <+62>: vpxor %xmm0,%xmm0,%xmm0
0x0000000003fcce82 <+66>: mov %r12,%rdx
0x0000000003fcce85 <+69>: mov %rbx,%rdi
0x0000000003fcce88 <+72>: lea 0x10dbac21(%rip),%rsi # 0x14d87ab0 <_ZN5arrow12_GLOBAL__N_120g_unsigned_int_typesE>
0x0000000003fcce8f <+79>: vmovdqa %xmm0,0x10dbac19(%rip) # 0x14d87ab0 <_ZN5arrow12_GLOBAL__N_120g_unsigned_int_typesE>
0x0000000003fcce97 <+87>: movq $0x0,0x10dbac1e(%rip) # 0x14d87ac0 <_ZN5arrow12_GLOBAL__N_120g_unsigned_int_typesE+16>
0x0000000003fccea2 <+98>: callq 0x3610130 __cxa_atexit@plt
0x0000000003fccea7 <+103>: vpxor %xmm0,%xmm0,%xmm0
0x0000000003fcceab <+107>: mov %r12,%rdx
0x0000000003fcceae <+110>: mov %rbx,%rdi
0x0000000003fcceb1 <+113>: lea 0x10dbabd8(%rip),%rsi # 0x14d87a90 <_ZN5arrow12_GLOBAL__N_111g_int_typesE>
0x0000000003fcceb8 <+120>: vmovdqa %xmm0,0x10dbabd0(%rip) # 0x14d87a90 <_ZN5arrow12_GLOBAL__N_111g_int_typesE>
0x0000000003fccec0 <+128>: movq $0x0,0x10dbabd5(%rip) # 0x14d87aa0 <_ZN5arrow12_GLOBAL__N_111g_int_typesE+16>
0x0000000003fccecb <+139>: callq 0x3610130 __cxa_atexit@plt
0x0000000003fcced0 <+144>: vpxor %xmm0,%xmm0,%xmm0
0x0000000003fcced4 <+148>: mov %r12,%rdx
0x0000000003fcced7 <+151>: mov %rbx,%rdi
0x0000000003fcceda <+154>: lea 0x10dbab8f(%rip),%rsi # 0x14d87a70 <_ZN5arrow12_GLOBAL__N_116g_floating_typesE>
0x0000000003fccee1 <+161>: vmovdqa %xmm0,0x10dbab87(%rip) # 0x14d87a70 <_ZN5arrow12_GLOBAL__N_116g_floating_typesE>
0x0000000003fccee9 <+169>: movq $0x0,0x10dbab8c(%rip) # 0x14d87a80 <_ZN5arrow12_GLOBAL__N_116g_floating_typesE+16>
0x0000000003fccef4 <+180>: callq 0x3610130 __cxa_atexit@plt
0x0000000003fccef9 <+185>: vpxor %xmm0,%xmm0,%xmm0
0x0000000003fccefd <+189>: mov %r12,%rdx
0x0000000003fccf00 <+192>: mov %rbx,%rdi
0x0000000003fccf03 <+195>: lea 0x10dbab46(%rip),%rsi # 0x14d87a50 <_ZN5arrow12_GLOBAL__N_115g_numeric_typesE>
0x0000000003fccf0a <+202>: vmovdqa %xmm0,0x10dbab3e(%rip) # 0x14d87a50 <_ZN5arrow12_GLOBAL__N_115g_numeric_typesE>
0x0000000003fccf12 <+210>: movq $0x0,0x10dbab43(%rip) # 0x14d87a60 <_ZN5arrow12_GLOBAL__N_115g_numeric_typesE+16>
0x0000000003fccf1d <+221>: callq 0x3610130 __cxa_atexit@plt
0x0000000003fccf22 <+226>: vpxor %xmm0,%xmm0,%xmm0
0x0000000003fccf26 <+230>: mov %r12,%rdx
0x0000000003fccf29 <+233>: mov %rbx,%rdi
0x0000000003fccf2c <+236>: lea 0x10dbaafd(%rip),%rsi # 0x14d87a30 <_ZN5arrow12_GLOBAL__N_119g_base_binary_typesE>
0x0000000003fccf33 <+243>: vmovdqa %xmm0,0x10dbaaf5(%rip) # 0x14d87a30 <_ZN5arrow12_GLOBAL__N_119g_base_binary_typesE>
0x0000000003fccf3b <+251>: movq $0x0,0x10dbaafa(%rip) # 0x14d87a40 <_ZN5arrow12_GLOBAL__N_119g_base_binary_typesE+16>
0x0000000003fccf46 <+262>: callq 0x3610130 __cxa_atexit@plt
0x0000000003fccf4b <+267>: vpxor %xmm0,%xmm0,%xmm0
0x0000000003fccf4f <+271>: mov %r12,%rdx
0x0000000003fccf52 <+274>: mov %rbx,%rdi
0x0000000003fccf55 <+277>: lea 0x10dbaab4(%rip),%rsi # 0x14d87a10 <_ZN5arrow12_GLOBAL__N_116g_temporal_typesE>
0x0000000003fccf5c <+284>: vmovdqa %xmm0,0x10dbaaac(%rip) # 0x14d87a10 <_ZN5arrow12_GLOBAL__N_116g_temporal_typesE>
0x0000000003fccf64 <+292>: movq $0x0,0x10dbaab1(%rip) # 0x14d87a20 <_ZN5arrow12_GLOBAL__N_116g_temporal_typesE+16>
0x0000000003fccf6f <+303>: callq 0x3610130 __cxa_atexit@plt
0x0000000003fccf74 <+308>: vpxor %xmm0,%xmm0,%xmm0
0x0000000003fccf78 <+312>: mov %r12,%rdx
0x0000000003fccf7b <+315>: mov %rbx,%rdi
0x0000000003fccf7e <+318>: lea 0x10dbaa6b(%rip),%rsi # 0x14d879f0 <_ZN5arrow12_GLOBAL__N_116g_interval_typesE>
0x0000000003fccf85 <+325>: vmovdqa %xmm0,0x10dbaa63(%rip) # 0x14d879f0 <_ZN5arrow12_GLOBAL__N_116g_interval_typesE>
0x0000000003fccf8d <+333>: movq $0x0,0x10dbaa68(%rip) # 0x14d87a00 <_ZN5arrow12_GLOBAL__N_116g_interval_typesE+16>
0x0000000003fccf98 <+344>: callq 0x3610130 __cxa_atexit@plt
0x0000000003fccf9d <+349>: vpxor %xmm0,%xmm0,%xmm0
0x0000000003fccfa1 <+353>: mov %r12,%rdx
0x0000000003fccfa4 <+356>: mov %rbx,%rdi
0x0000000003fccfa7 <+359>: lea 0x10dbaa22(%rip),%rsi # 0x14d879d0 <_ZN5arrow12_GLOBAL__N_116g_duration_typesE>
0x0000000003fccfae <+366>: vmovdqa %xmm0,0x10dbaa1a(%rip) # 0x14d879d0 <_ZN5arrow12_GLOBAL__N_116g_duration_typesE>
0x0000000003fccfb6 <+374>: movq $0x0,0x10dbaa1f(%rip) # 0x14d879e0 <_ZN5arrow12_GLOBAL__N_116g_duration_typesE+16>
0x0000000003fccfc1 <+385>: callq 0x3610130 __cxa_atexit@plt
0x0000000003fccfc6 <+390>: vpxor %xmm0,%xmm0,%xmm0
0x0000000003fccfca <+394>: mov %r12,%rdx
0x0000000003fccfcd <+397>: mov %rbx,%rdi
0x0000000003fccfd0 <+400>: lea 0x10dba9d9(%rip),%rsi # 0x14d879b0 <_ZN5arrow12_GLOBAL__N_117g_primitive_typesE>
0x0000000003fccfd7 <+407>: pop %rbx
0x0000000003fccfd8 <+408>: pop %r12
0x0000000003fccfda <+410>: movq $0x0,0x10dba9db(%rip) # 0x14d879c0 <_ZN5arrow12_GLOBAL__N_117g_primitive_typesE+16>
0x0000000003fccfe5 <+421>: pop %rbp
0x0000000003fccfe6 <+422>: vmovdqa %xmm0,0x10dba9c2(%rip) # 0x14d879b0 <_ZN5arrow12_GLOBAL__N_117g_primitive_typesE>
0x0000000003fccfee <+430>: jmpq 0x3610130 __cxa_atexit@plt
End of assembler dump.
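
The faulting address from the backtrace, 0x3fcce41, is the `vpxor %xmm0,%xmm0,%xmm0` at offset +1. The `v` prefix marks a VEX-encoded instruction, which needs at least AVX support on the CPU. A rough way to spot such instructions in a listing is sketched below; `count_vex_like` is a hypothetical helper, and the regular expression is a text heuristic rather than a real instruction decoder.

```shell
# Hypothetical helper: count lines on stdin that look like VEX-encoded
# vector instructions (a v-prefixed mnemonic with an xmm/ymm/zmm operand).
# Heuristic only: it pattern-matches text and does not decode opcodes.
count_vex_like() {
  grep -cE '(^|[[:space:]])v[a-z0-9]+[[:space:]]+[^#]*%[xyz]mm[0-9]+' || true
}

# Example on three lines taken from the dump above.
count_vex_like <<'EOF'
0x0000000003fcce40 <+0>: push %rbp
0x0000000003fcce41 <+1>: vpxor %xmm0,%xmm0,%xmm0
0x0000000003fcce63 <+35>: vmovdqa %xmm0,0x10dbac65(%rip)
EOF
```

The same filter could be fed from `objdump -d` on a built binary or a static library to get a quick idea of whether it was compiled with AVX enabled; the file to inspect is up to you.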

Looks like this is related to the arrow thirdparty library being built with AVX2 enabled.

https://github.com/StarRocks/starrocks/blob/main/thirdparty/build-thirdparty.sh#L878-L879

Manually change -DARROW_SIMD_LEVEL=AVX2 to -DARROW_SIMD_LEVEL=DEFAULT

then rebuild the thirdparty libraries and see whether that helps.
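
The manual edit can also be done as a substitution. This is a sketch that assumes the flag appears literally as `-DARROW_SIMD_LEVEL=AVX2` in your checkout of `build-thirdparty.sh` (the linked line numbers may differ between versions); `set_arrow_simd_level` is a hypothetical helper, and the path is the one used in the build commands earlier in the thread.

```shell
# Hypothetical helper: rewrite the Arrow SIMD level in a build script,
# keeping a .bak copy of the original file next to it.
set_arrow_simd_level() {
  script="$1"
  level="$2"
  sed -i.bak -E "s/-DARROW_SIMD_LEVEL=[A-Za-z0-9_]+/-DARROW_SIMD_LEVEL=${level}/g" "$script"
}

# Assumed path, taken from the build commands earlier in the thread.
target=/root/starrocks/thirdparty/build-thirdparty.sh
if [ -f "$target" ]; then
  set_arrow_simd_level "$target" DEFAULT
  grep -n 'ARROW_SIMD_LEVEL' "$target"
fi
```

The final `grep` prints the affected lines so the change can be checked before rebuilding the thirdparty libraries.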

I don't quite follow. Are the thirdparty build, the SR build, and the machine it runs on afterwards all different machines?

The thirdparty and SR builds run in Docker on the same VM, and the BE runs on a VM carved out of the same host, so they are the same.

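lscpu

Check which instruction sets the CPU supports.

Run lscpu in the VM where the BE runs as well.

I suspect the physical machine itself supports AVX2 and the instructions are just disabled inside the VM.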

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 15
Model: 107
Model name: QEMU Virtual CPU version 2.5+
Stepping: 1
CPU MHz: 2194.916
BogoMIPS: 4389.83
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 4096K
L3 cache: 16384K
NUMA node0 CPU(s): 0-3
Flags: fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx lm constant_tsc nopl xtopology eagerfpu pni ssse3 cx16 sse4_1 sse4_2 x2apic popcnt aes hypervisor lahf_lm
This is the VM where the BE is started.

Run lscpu on the VM that compiles the code as well.

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 15
Model: 107
Model name: QEMU Virtual CPU version 2.5+
Stepping: 1
CPU MHz: 2194.916
BogoMIPS: 4389.83
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 4096K
L3 cache: 16384K
NUMA node0 CPU(s): 0-3
Flags: fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx lm constant_tsc nopl xtopology eagerfpu pni ssse3 cx16 sse4_1 sse4_2 x2apic popcnt aes hypervisor lahf_lm
This was run inside the Docker container that compiles the BE.
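
The two flag lists are identical, and they can be checked mechanically rather than by eye. A minimal sketch, where `has_cpu_flag` is a hypothetical helper and the flag string is copied from the lscpu output above:

```shell
# Hypothetical helper: succeed if the space-separated flag list in $1
# contains the exact flag named in $2 (so "avx" does not match "avx2").
has_cpu_flag() {
  case " $1 " in
    *" $2 "*) return 0 ;;
    *)        return 1 ;;
  esac
}

# Flags line copied from the lscpu output above.
flags="fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx lm constant_tsc nopl xtopology eagerfpu pni ssse3 cx16 sse4_1 sse4_2 x2apic popcnt aes hypervisor lahf_lm"

for f in sse4_2 avx avx2; do
  if has_cpu_flag "$flags" "$f"; then
    echo "$f: present"
  else
    echo "$f: missing"
  fi
done
```

On a live machine the list can be taken from the `flags` line of `/proc/cpuinfo` instead. Both VMs lack `avx` as well as `avx2`, so even a 128-bit VEX instruction such as `vpxor %xmm0,%xmm0,%xmm0` faults. Note that a compiler emits whatever instruction set its options request, independent of the CPU it runs on, so a build host without AVX can still produce a binary that contains AVX instructions.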

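That doesn't make sense. The CPU that compiles the code doesn't support AVX2 instructions, so how could it produce a binary that contains AVX2 instructions?

I'm baffled too. When I use the Docker version of StarRocks on the VM, the BE fails at startup in the same way. Is there no non-AVX2 image for the Docker version either?

The official image is built with AVX2 instructions enabled, so it definitely won't run on your machine.

You have to rebuild the thirdparty libraries from scratch, build the code, generate starrocks_be, and then run it.

Thanks for the help. I rebuilt it this way and the BE started successfully. Thank you very much.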