
rdma_performance intermittently reports E110, E112 and E101 errors #3365

Description

@daming6

Describe the bug
Run the rdma_performance test on a single machine over loopback, using these commands:
taskset -c 50-86 ./server -use_rdma 0 -thread_num 37 &
taskset -c 6-32 ./client -thread_num 27 -queue_depth 78 -attachment_size 16788 -use_rdma 0 -connection_type pooled
Errors like the following occur intermittently:
2026-06-25 14:51:11,661 INFO: 49491 ScriptThread Module:session_manager.py:run_cmd[367] Log:kp_receive:I0625 14:51:10.816046 23666 0 src/brpc/server.cpp:1262 StartInternal] Server[DummyServerOf(/home/ci/brpc/pkgs/example/rdma_performance/client)] is serving on port=8001.
I0625 14:51:10.816567 23666 0 src/brpc/server.cpp:1265 StartInternal] Check out http://localhost.localdomain:8001 in web browser.
[Threads: 27, Depth: 78, Attachment: 16788B, RDMA: no, Echo: no]
E0625 14:51:11.348613 23671 8589936900 client.cpp:158 HandleResponse] RPC call failed: [E112]Not connected to 0.0.0.0:8002 yet, server_id=1
E0625 14:51:11.348613 23674 4294969867 client.cpp:158 HandleResponse] RPC call failed: [E101]Fail to connect brpc::Socket{id=499 addr=0.0.0.0:8002} (0xffff640436c0): Network is unreachable
E0625 14:51:11.348663 23674 4294969870 client.cpp:158 HandleResponse] RPC call failed: [E101]Fail to connect brpc::Socket{id=110 addr=0.0.0.0:8002} (0xffff5803f000): Network is unreachable
E0625 14:51:11.348655 23677 4294969093 client.cpp:158 HandleResponse] RPC call failed: [E112]Not connected to 0.0.0.0:8002 yet, server_id=1
E0625 14:51:11.348683 23674 4294969877 client.cpp:158 HandleResponse] RPC call failed: [E110]Fail to connect brpc::Socket{id=268 addr=0.0.0.0:8002} (0xffff5003f680): Connection timed out
E0625 14:51:11.348672 23675 4294968835 client.cpp:158 HandleResponse] RPC call failed: [E112]Not connected to 0.0.0.0:8002 yet, server_id=1
E0625 14:51:11.348705 23677 4294969092 client.cpp:158 HandleResponse] RPC call failed: [E112]Not connected to 0.0.0.0:8002 yet, server_id=1
E0625 14:51:11.348702 23674 4294969902 client.cpp:158 HandleResponse] RPC call failed: [E110]Fail to connect brpc::Socket{id=120 addr=0.0.0.0:8002} (0xffff58041080): Connection timed out
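
For reference, the numbers in these codes line up with Linux errno values: 101 is ENETUNREACH, 110 is ETIMEDOUT and 112 is EHOSTDOWN (Linux on aarch64 uses the generic errno numbering). The sketch below assumes brpc reports these failures with the plain system errno. E101 and E110 carry the C library's own text ("Network is unreachable", "Connection timed out"), while for E112 brpc prints its own "Not connected to ... yet" message. `errno_name` and `errno_text` are hypothetical helpers written only for this note.

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>
#include <string>

// Hypothetical helper: symbolic name for the three codes seen in the log.
// Assumes the "[E<n>]" prefix carries a Linux errno value.
std::string errno_name(int code) {
    switch (code) {
        case ENETUNREACH: return "ENETUNREACH";  // 101 on Linux
        case ETIMEDOUT:   return "ETIMEDOUT";    // 110 on Linux
        case EHOSTDOWN:   return "EHOSTDOWN";    // 112 on Linux
        default:          return "unknown";
    }
}

// Hypothetical helper: the C library's description of an errno value,
// e.g. "Connection timed out" for ETIMEDOUT on glibc.
std::string errno_text(int code) {
    return std::strerror(code);
}
```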

After re-running 1-3 times, the errors do not recur and the performance test completes normally.
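After raising the system parameter net.core.somaxconn (via sysctl) to 65535, matching the hard-coded backlog (the maximum length of the accept queue, i.e. the full-connection queue), and re-running the performance test many times, the E112 and E101 errors no longer appear; only the intermittent E110 timeout errors remain.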
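
On Linux, listen(2) silently truncates a backlog larger than /proc/sys/net/core/somaxconn to that value, so a hard-coded backlog of 65535 only takes full effect once somaxconn is at least 65535 (the default is 4096 on kernels 5.4 and later, 128 before that). A minimal sketch of that rule, taking the 65535 figure from the observation above rather than from the brpc source; `effective_backlog` and `read_somaxconn` are hypothetical names:

```cpp
#include <algorithm>
#include <cassert>
#include <fstream>
#include <optional>

// listen(2) on Linux: a backlog above net.core.somaxconn is silently
// truncated, so the accept-queue limit is the smaller of the two.
int effective_backlog(int requested, int somaxconn) {
    return std::min(requested, somaxconn);
}

// Reads the current net.core.somaxconn; returns nullopt if the proc file
// cannot be read (for example, when not running on Linux).
std::optional<int> read_somaxconn() {
    std::ifstream in("/proc/sys/net/core/somaxconn");
    int value = 0;
    if (in >> value) {
        return value;
    }
    return std::nullopt;
}
```

With the default somaxconn of 4096, `effective_backlog(65535, 4096)` yields 4096; after the sysctl change described above it yields the full 65535.
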
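What causes these errors? Set aside the case where the client's concurrent data volume is so large that the server cannot keep up within the default timeout: the client load in the example above is not large. Even in high-load scenarios, re-running the test 1-3 times also lets it finish normally without errors.
Is there a way to fix or work around this?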
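
One possible mitigation, assuming at least part of the failures come from the client starting before the server (launched in the background with `&` immediately before it) is listening on port 8002: wait until the port accepts a plain TCP connection before launching ./client. This is a sketch only; it does not explain the ENETUNREACH errors, and `wait_for_port` is a hypothetical helper, not part of brpc or the example.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <netinet/in.h>
#include <sys/socket.h>
#include <thread>
#include <unistd.h>

// Hypothetical helper: returns true once a TCP connect() to 127.0.0.1:port
// succeeds, retrying up to `attempts` times and sleeping `interval` between tries.
bool wait_for_port(uint16_t port, int attempts,
                   std::chrono::milliseconds interval) {
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    for (int i = 0; i < attempts; ++i) {
        int fd = ::socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0) {
            return false;  // cannot even create a socket; give up
        }
        int rc = ::connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr));
        ::close(fd);
        if (rc == 0) {
            return true;  // something is listening and completed the handshake
        }
        std::this_thread::sleep_for(interval);
    }
    return false;
}
```

For example, `wait_for_port(8002, 50, std::chrono::milliseconds(100))` waits about 5 seconds at most. A shell loop performing the same check between the two taskset commands would work just as well.
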

To Reproduce
Run the rdma_performance test on a single machine over loopback, using these commands:
taskset -c 50-86 ./server -use_rdma 0 -thread_num 37 &
taskset -c 6-32 ./client -thread_num 27 -queue_depth 78 -attachment_size 16788 -use_rdma 0 -connection_type pooled
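
For scale, the client flags above bound the load. Assuming each of the `-thread_num` threads keeps up to `-queue_depth` requests outstanding, the arithmetic is below. With `-connection_type pooled`, brpc opens a new pooled connection whenever no idle one is available, so a startup burst could try to open on the order of that many connections to port 8002 at once (an assumption, not something the logs show).

```cpp
#include <cassert>
#include <cstdint>

// Client flags from the reproduction command.
constexpr int64_t kThreadNum = 27;           // -thread_num
constexpr int64_t kQueueDepth = 78;          // -queue_depth
constexpr int64_t kAttachmentBytes = 16788;  // -attachment_size

// Assumption: each client thread keeps up to queue_depth requests in flight.
constexpr int64_t kMaxInFlight = kThreadNum * kQueueDepth;

// Attachment payload only; protobuf and protocol headers are not counted.
constexpr int64_t kAttachmentBytesInFlight = kMaxInFlight * kAttachmentBytes;

static_assert(kMaxInFlight == 2106, "27 * 78");
static_assert(kAttachmentBytesInFlight == 35355528, "2106 * 16788, about 33.7 MiB");
```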

Expected behavior
When the client's concurrent data volume is not large, the test case should succeed on the first run, with no E112, E101 or E110 errors.

Versions
OS: openEuler 24.03 (LTS-SP2)
Compiler: gcc 12.3.1
brpc: 1.16
protobuf: protobuf-25.1-12.oe2403sp2.aarch64

Additional context/screenshots
