Can the RPC timeout be increased?

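Version: 2.5.11
Users report that data exports are disconnected automatically after 1 hour.
The BE log shows the following errors, and their timestamps match the time the front end reported the error and exited: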

W1110 16:29:29.511858 26917 tablet_updates.cpp:1286] wait_for_version slow(8601ms) version:5225010.1 tablet:266800 #version:301 [5224726 5225010.1@299 5225011] pending: rowsets:6[id/seg/row/del/byte/compaction]: [0/1/2955550/1/370.67 MB/-114.67 MB],[1/1/2024671/772/280.80 MB/-24.27 MB],[2/1/6464395/194770/897.69 MB/-506.45 MB],[214668/0/0/0/0/256.00 MB],[214669/1/1120181/0/151.63 MB/104.37 MB],[214670/0/0/0/0/256.00 MB]
W1110 16:29:29.690209 26788 tablet_updates.cpp:1286] wait_for_version slow(3388ms) version:5225011 tablet:266800 #version:301 [5224726 5225011@300 5225011] pending: rowsets:6[id/seg/row/del/byte/compaction]: [0/1/2955550/1/370.67 MB/-114.67 MB],[1/1/2024671/772/280.80 MB/-24.27 MB],[2/1/6464395/194770/897.69 MB/-506.45 MB],[214668/0/0/0/0/256.00 MB],[214669/1/1120181/0/151.63 MB/104.37 MB],[214670/0/0/0/0/256.00 MB]
W1110 16:29:31.995507 26889 disposable_closure.h:38] brpc failed, error=RPC call is timed out, error_text=[E1008]Reached timeout=3600000ms @10.133.58.42:8060
W1110 16:29:31.995609 26889 sink_buffer.cpp:356] transmit chunk rpc failed:e3cb0bc9-7f9a-11ee-87d6-28e424bb04d1
W1110 16:29:31.999105 26987 disposable_closure.h:38] brpc failed, error=RPC call is timed out, error_text=[E1008]Reached timeout=3600000ms @10.133.58.42:8060
W1110 16:29:31.999157 26987 sink_buffer.cpp:356] transmit chunk rpc failed:e3cb0bc9-7f9a-11ee-87d6-28e424bb04d1
W1110 16:29:32.033440 26978 disposable_closure.h:38] brpc failed, error=RPC call is timed out, error_text=[E1008]Reached timeout=3600000ms @10.133.58.42:8060
W1110 16:29:32.033533 26978 sink_buffer.cpp:356] transmit chunk rpc failed:e3cb0bc9-7f9a-11ee-87d6-28e424bb04d1
W1110 16:29:32.067183 26999 disposable_closure.h:38] brpc failed, error=RPC call is timed out, error_text=[E1008]Reached timeout=3600000ms @10.133.58.42:8060
W1110 16:29:32.067219 26999 sink_buffer.cpp:356] transmit chunk rpc failed:e3cb0bc9-7f9a-11ee-87d6-28e424bb04d1
W1110 16:29:32.093322 26969 disposable_closure.h:38] brpc failed, error=RPC call is timed out, error_text=[E1008]Reached timeout=3600000ms @10.133.58.42:8060
W1110 16:29:32.093392 26969 sink_buffer.cpp:356] transmit chunk rpc failed:e3cb0bc9-7f9a-11ee-87d6-28e424bb04d1
W1110 16:29:32.098137 26972 disposable_closure.h:38] brpc failed, error=RPC call is timed out, error_text=[E1008]Reached timeout=3600000ms @10.133.58.42:8060
W1110 16:29:32.098204 26972 sink_buffer.cpp:356] transmit chunk rpc failed:e3cb0bc9-7f9a-11ee-87d6-28e424bb04d1
W1110 16:29:32.115928 26971 disposable_closure.h:38] brpc failed, error=RPC call is timed out, error_text=[E1008]Reached timeout=3600000ms @10.133.58.42:8060
W1110 16:29:32.116001 26971 sink_buffer.cpp:356] transmit chunk rpc failed:e3cb0bc9-7f9a-11ee-87d6-28e424bb04d1
W1110 16:29:32.119375 26963 disposable_closure.h:38] brpc failed, error=RPC call is timed out, error_text=[E1008]Reached timeout=3600000ms @10.133.58.42:8060
W1110 16:29:32.119410 26963 sink_buffer.cpp:356] transmit chunk rpc failed:e3cb0bc9-7f9a-11ee-87d6-28e424bb04d1
W1110 16:29:32.120908 26972 disposable_closure.h:38] brpc failed, error=RPC call is timed out, error_text=[E1008]Reached timeout=3600000ms @10.133.58.42:8060
W1110 16:29:32.120980 26972 sink_buffer.cpp:356] transmit chunk rpc failed:e3cb0bc9-7f9a-11ee-87d6-28e424bb04d1
W1110 16:29:52.848373   371 agent_server.cpp:477] fail to make_snapshot. tablet_id:266831 msg:Not found: get_rowsets_for_snapshot: no version to clone tablet:266831 #version:304 [5239223 5239512@303 5239512] #pending:0 request_version:5239513,
W1110 16:30:00.883904 25800 pipeline_driver.cpp:613] fragment_id 53e000f3-7fa3-11ee-8a42-fa163e281a6c driver query_id=53e000f3-7fa3-11ee-8a42-fa163e281a59 fragment_id=53e000f3-7fa3-11ee-8a42-fa163e281a6c driver=0x7fbaea891410, status=INPUT_EMPTY, operator-chain: [aggregate_blocking_source_69_0x7fbaea493b10(X) -> local_sort_sink_70_0x7fbaea890510(X)] cancels operator local_sort_sink_70_0x7fbaea890510(X) with finished error runtime state is cancelled
W1110 16:30:00.883970 25800 pipeline_driver.cpp:613] fragment_id 53e000f3-7fa3-11ee-8a42-fa163e281a6c driver query_id=53e000f3-7fa3-11ee-8a42-fa163e281a59 fragment_id=53e000f3-7fa3-11ee-8a42-fa163e281a6c driver=0x7fbaea893210, status=INPUT_EMPTY, operator-chain: [aggregate_blocking_source_69_0x7fbaea891b90(X) -> local_sort_sink_70_0x7fbaea892810(X)] cancels operator local_sort_sink_70_0x7fbaea892810(X) with finished error runtime state is cancelled
W1110 16:30:00.883989 25800 pipeline_driver.cpp:613] fragment_id 53e000f3-7fa3-11ee-8a42-fa163e281a6c driver query_id=53e000f3-7fa3-11ee-8a42-fa163e281a59 fragment_id=53e000f3-7fa3-11ee-8a42-fa163e281a6c driver=0x7fbaea894d90, status=INPUT_EMPTY, operator-chain: [aggregate_blocking_source_69_0x7fbaea893710(O) -> local_sort_sink_70_0x7fbaea894390(X)] cancels operator local_sort_sink_70_0x7fbaea894390(X) with finished error runtime state is cancelled
W1110 16:30:00.884001 25800 pipeline_driver.cpp:613] fragment_id 53e000f3-7fa3-11ee-8a42-fa163e281a6c driver query_id=53e000f3-7fa3-11ee-8a42-fa163e281a59 fragment_id=53e000f3-7fa3-11ee-8a42-fa163e281a6c driver=0x7fbaec39b910, status=INPUT_EMPTY, operator-chain: [aggregate_blocking_source_69_0x7fbaec39a510(X) -> local_sort_sink_70_0x7fbaec39af10(X)] cancels operator local_sort_sink_70_0x7fbaec39af10(X) with finished error runtime state is cancelled
W1110 16:30:00.884012 25800 pipeline_driver.cpp:613] fragment_id 53e000f3-7fa3-11ee-8a42-fa163e281a6c driver query_id=53e000f3-7fa3-11ee-8a42-fa163e281a59 fragment_id=53e000f3-7fa3-11ee-8a42-fa163e281a6c driver=0x7fbaec39d710, status=INPUT_EMPTY, operator-chain: [aggregate_blocking_source_69_0x7fbaec39c090(X) -> local_sort_sink_70_0x7fbaec39cd10(X)] cancels operator local_sort_sink_70_0x7fbaec39cd10(X) with finished error runtime state is cancelled
W1110 16:30:00.884023 25800 pipeline_driver.cpp:613] fragment_id 53e000f3-7fa3-11ee-8a42-fa163e281a6c driver query_id=53e000f3-7fa3-11ee-8a42-fa163e281a59 fragment_id=53e000f3-7fa3-11ee-8a42-fa163e281a6c driver=0x7fba8adbd290, status=INPUT_EMPTY, operator-chain: [aggregate_blocking_source_69_0x7fbaec39de90(X) -> local_sort_sink_70_0x7fbaec39e890(X)] cancels operator local_sort_sink_70_0x7fbaec39e890(X) with finished error runtime state is cancelled
W1110 16:30:00.884037 25800 pipeline_driver.cpp:613] fragment_id 53e000f3-7fa3-11ee-8a42-fa163e281a6c driver query_id=53e000f3-7fa3-11ee-8a42-fa163e281a59 fragment_id=53e000f3-7fa3-11ee-8a42-fa163e281a6c driver=0x7fba8adbee10, status=INPUT_EMPTY, operator-chain: [aggregate_blocking_source_69_0x7fba8adbda10(X) -> local_sort_sink_70_0x7fba8adbe410(X)] cancels operator local_sort_sink_70_0x7fba8adbe410(X) with finished error runtime state is cancelled
W1110 16:30:00.884055 25800 pipeline_driver.cpp:613] fragment_id 53e000f3-7fa3-11ee-8a42-fa163e281a6c driver query_id=53e000f3-7fa3-11ee-8a42-fa163e281a59 fragment_id=53e000f3-7fa3-11ee-8a42-fa163e281a6c driver=0x7fba8adc0710, status=INPUT_EMPTY, operator-chain: [aggregate_blocking_source_69_0x7fba8adbf310(X) -> local_sort_sink_70_0x7fba8adbfd10(X)] cancels operator local_sort_sink_70_0x7fba8adbfd10(X) with finished error runtime state is cancelled
W1110 16:30:00.884083 25800 pipeline_driver.cpp:613] fragment_id 53e000f3-7fa3-11ee-8a42-fa163e281a6c driver query_id=53e000f3-7fa3-11ee-8a42-fa163e281a59 fragment_id=53e000f3-7fa3-11ee-8a42-fa163e281a6c driver=0x7fb996b4b010, status=INPUT_EMPTY, operator-chain: [aggregate_blocking_source_69_0x7fba8adc0c10(X) -> local_sort_sink_70_0x7fba8adc1610(X)] cancels operator local_sort_sink_70_0x7fba8adc1610(X) with finished error runtime state is cancelled
W1110 16:30:00.884110 25800 pipeline_driver.cpp:613] fragment_id 53e000f3-7fa3-11ee-8a42-fa163e281a6c driver query_id=53e000f3-7fa3-11ee-8a42-fa163e281a59 fragment_id=53e000f3-7fa3-11ee-8a42-fa163e281a6c driver=0x7fb996b4c910, status=INPUT_EMPTY, operator-chain: [aggregate_blocking_source_69_0x7fb996b4b510(X) -> local_sort_sink_70_0x7fb996b4bf10(X)] cancels operator local_sort_sink_70_0x7fb996b4bf10(X) with finished error runtime state is cancelled
W1110 16:30:00.884124 25800 pipeline_driver.cpp:613] fragment_id 53e000f3-7fa3-11ee-8a42-fa163e281a6c driver query_id=53e000f3-7fa3-11ee-8a42-fa163e281a59 fragment_id=53e000f3-7fa3-11ee-8a42-fa163e281a6c driver=0x7fb996b4e210, status=INPUT_EMPTY, operator-chain: [aggregate_blocking_source_69_0x7fb996b4ce10(X) -> local_sort_sink_70_0x7fb996b4d810(X)] cancels operator local_sort_sink_70_0x7fb996b4d810(X) with finished error runtime state is cancelled
W1110 16:30:00.884136 25800 pipeline_driver.cpp:613] fragment_id 53e000f3-7fa3-11ee-8a42-fa163e281a6c driver query_id=53e000f3-7fa3-11ee-8a42-fa163e281a59 fragment_id=53e000f3-7fa3-11ee-8a42-fa163e281a6c driver=0x7fb996b4fd90, status=INPUT_EMPTY, operator-chain: [aggregate_blocking_source_69_0x7fb996b4e710(X) -> local_sort_sink_70_0x7fb996b4f110(X)] cancels operator local_sort_sink_70_0x7fb996b4f110(X) with finished error runtime state is cancelled
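
As a quick check on the log above, the following minimal sketch extracts the timeout value from the brpc failure lines and converts it to minutes. The helper name brpc_timeouts and the regular expression are illustrative assumptions based only on the line format shown here, not part of StarRocks or brpc.

```python
import re

# Two lines copied verbatim from the BE log above.
LOG_LINES = [
    "W1110 16:29:31.995507 26889 disposable_closure.h:38] brpc failed, "
    "error=RPC call is timed out, error_text=[E1008]Reached "
    "timeout=3600000ms @10.133.58.42:8060",
    "W1110 16:29:31.995609 26889 sink_buffer.cpp:356] transmit chunk rpc "
    "failed:e3cb0bc9-7f9a-11ee-87d6-28e424bb04d1",
]

# Assumed pattern, derived from the "Reached timeout=...ms @host:port" text.
TIMEOUT_RE = re.compile(r"Reached timeout=(\d+)ms @(\S+)")


def brpc_timeouts(lines):
    """Return (timeout_minutes, peer) for each line reporting a brpc timeout."""
    found = []
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            # Convert milliseconds to minutes.
            found.append((int(match.group(1)) / 60_000, match.group(2)))
    return found


if __name__ == "__main__":
    for minutes, peer in brpc_timeouts(LOG_LINES):
        print(f"{peer}: RPC timed out after {minutes:.0f} minutes")
```

For the lines above this gives 60 minutes, which is consistent with the export being cut off after about 1 hour.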

Can this timeout value be increased?

Do you still have the logs from this node? If it has not been restarted, please capture a pstack: pstack $be_pid > pstack.log

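Users report that it now disconnects almost exactly at 1 hour or 1 hour 9 minutes :joy:
Below are the BE log, the FE log, and the BE pstack output, collected around the time of the disconnect.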

be.WARNING.log (5.4 KB) fe.log (23.4 KB) pstack.log (1.1 MB)

In later versions, this timeout will be kept consistent with the query's.