QTcpSocket spamming EALREADY
-
Hello everyone
I've run into a problem with QTcpSocket spamming the connect system call. I'm using Qt-5.15.12 and Debian 12.2.
I have a remote machine (also Debian 12.2) where a QTcpServer is listening for incoming connections, and on my local machine I want to connect to that QTcpServer.
For that purpose I create a QTcpSocket and first bind it to a specific address, since the local machine has several interfaces. Then I start a timer that makes the QTcpSocket reconnect every 30 seconds if it is in Unconnected or Bound state, since the network could be unstable and the route to the remote machine won't always be available. I want the QTcpSocket to reconnect after unexpected disconnections or errors and after a network failure or outage.
As long as the network is stable everything is fine: the connection is established, and when the remote side stops listening for some reason the local socket properly reports that, gets disconnected and then tries to connect back. Problems start when the network is unstable. When the destination host becomes unreachable after some time, the socket goes haywire, hogging the CPU. I'm tracking my program's system calls with strace and seeing a lot of connect system calls which return -1 with EAGAIN, EALREADY or EINPROGRESS.
So I decided to test this by starting with the network already down. I used a rather crude way of simulating a downed network: both the remote and local machines are virtual machines in Proxmox Virtual Environment, and there you can just check the Disconnect checkbox (link_down=1).
So the program starts as usual, bind is successful and the timer is started, but when the first connection attempt happens the following occurs:
- in the strace output the first connect system call returns EINPROGRESS, which is okay, since the socket is non-blocking and the connect operation cannot be completed immediately

    connect(10, {sa_family=AF_INET, sin_port=htons(9543), sin_addr=inet_addr("172.26.20.10")}, 16) = -1 EINPROGRESS (Operation now in progress)

- the second connect returns EHOSTUNREACH, which is also understandable

    connect(10, {sa_family=AF_INET, sin_port=htons(9543), sin_addr=inet_addr("172.26.20.10")}, 16) = -1 EHOSTUNREACH (No route to host)

- the next connect returns EINPROGRESS again, which is also probably fine

    connect(10, {sa_family=AF_INET, sin_port=htons(9543), sin_addr=inet_addr("172.26.20.10")}, 16) = -1 EINPROGRESS (Operation now in progress)

- and finally the program starts to call the connect system call dozens or even hundreds of times per second, with every connect call returning EALREADY and preceded by a poll system call. These rapid calls cause high CPU usage every time the socket tries to connect to an unreachable host. This series of poll+connect calls ends with EHOSTUNREACH, and with the next connection attempt it repeats again

    poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=8, events=POLLIN}, {fd=10, events=POLLOUT}, {fd=11, events=POLLIN}, {fd=14, events=POLLIN}], 6, 415) = 1 ([{fd=10, revents=POLLERR}])
    connect(10, {sa_family=AF_INET, sin_port=htons(9543), sin_addr=inet_addr("172.26.20.10")}, 16) = -1 EALREADY (Operation already in progress)
    poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=8, events=POLLIN}, {fd=10, events=POLLOUT}, {fd=11, events=POLLIN}, {fd=14, events=POLLIN}], 6, 414) = 1 ([{fd=10, revents=POLLERR}])
    connect(10, {sa_family=AF_INET, sin_port=htons(9543), sin_addr=inet_addr("172.26.20.10")}, 16) = -1 EALREADY (Operation already in progress)
    poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=8, events=POLLIN}, {fd=10, events=POLLOUT}, {fd=11, events=POLLIN}, {fd=14, events=POLLIN}], 6, 412) = 1 ([{fd=10, revents=POLLERR}])
    connect(10, {sa_family=AF_INET, sin_port=htons(9543), sin_addr=inet_addr("172.26.20.10")}, 16) = -1 EALREADY (Operation already in progress)
    ...
    ...
    ...
    poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=8, events=POLLIN}, {fd=10, events=POLLOUT}, {fd=11, events=POLLIN}, {fd=14, events=POLLIN}], 6, 344) = 1 ([{fd=10, revents=POLLOUT|POLLERR|POLLHUP}])
    connect(10, {sa_family=AF_INET, sin_port=htons(9543), sin_addr=inet_addr("172.26.20.10")}, 16) = -1 EHOSTUNREACH (No route to host)

Even after the network stabilises, if the socket successfully establishes the connection, the system call spamming doesn't stop and connect keeps returning EALREADY.
The connect(2) manual page says the following:

    EALREADY The socket is nonblocking and a previous connection attempt has not yet been completed.

I get that previous attempts might not have completed, but extending the reconnect interval didn't help, and watch -n1 "ss -ptna | grep 9543" showed that sockets were closing in between reconnections even with the original time interval.
I don't want to use a blocking socket, I want to rely on the signal/slot mechanism.
So I wanted to know what could cause this behaviour. I can't tell where Linux and Qt are acting as they should and where the problems begin.
I omitted some details in methods' bodies (mainly logging) and my own implementation of keepalive/heartbeat check for established connection.
Please let me know if you need any additional information
Thanks in advance,
Daniel

myclass.hpp

    class MyClass : public QObject
    {
        Q_OBJECT
    public:
        MyClass();
        ~MyClass();
        bool Init();
    private:
        QTcpSocket socket;
        int timerId{ 0 };
    private:
        void timerEvent(QTimerEvent *) override;
    private slots:
        void onConnected();
        void onReadyRead();
        void onDisconnected();
        void onErrorOccurred(QTcpSocket::SocketError);
        void onStateChanged(QTcpSocket::SocketState);
    };

myclass.cpp

    MyClass::MyClass()
    {
        connect(&socket, &QTcpSocket::connected, this, &MyClass::onConnected);
        connect(&socket, &QTcpSocket::readyRead, this, &MyClass::onReadyRead);
        connect(&socket, &QTcpSocket::disconnected, this, &MyClass::onDisconnected);
        connect(&socket, &QTcpSocket::errorOccurred, this, &MyClass::onErrorOccurred);
        connect(&socket, &QTcpSocket::stateChanged, this, &MyClass::onStateChanged);
    }

    MyClass::~MyClass()
    {
        socket.abort();
        disconnect(&socket, &QTcpSocket::stateChanged, this, &MyClass::onStateChanged);
        disconnect(&socket, &QTcpSocket::errorOccurred, this, &MyClass::onErrorOccurred);
        disconnect(&socket, &QTcpSocket::disconnected, this, &MyClass::onDisconnected);
        disconnect(&socket, &QTcpSocket::readyRead, this, &MyClass::onReadyRead);
        disconnect(&socket, &QTcpSocket::connected, this, &MyClass::onConnected);
    }

    bool MyClass::Init()
    {
        QHostAddress bindAddress("172.20.20.1");
        if (!socket.bind(bindAddress))
            return false;
        timerId = startTimer(std::chrono::milliseconds(30000), Qt::TimerType::VeryCoarseTimer);
        if (!timerId)
            return false;
        return true;
    }

    void MyClass::timerEvent(QTimerEvent *) /*override*/
    {
        switch (socket.state()) {
        case QTcpSocket::SocketState::UnconnectedState:
        case QTcpSocket::SocketState::BoundState: {
            QHostAddress hostAddress("172.26.20.10");
            socket.connectToHost(hostAddress, 9543, QIODevice::OpenModeFlag::ReadWrite);
            break;
        }
        default:
            break;
        }
    }

    /* slots */
    void MyClass::onConnected()
    {
        // inform about successful connection
    }

    void MyClass::onReadyRead()
    {
        // internal details omitted
    }

    void MyClass::onDisconnected()
    {
        socket.close();
        QHostAddress bindAddress("172.20.20.1");
        if (!socket.bind(bindAddress))
            return; // warning about failed bind omitted
    }

    void MyClass::onErrorOccurred(QTcpSocket::SocketError socketError)
    {
        QTcpSocket *pSenderTcpSocket{ qobject_cast<QTcpSocket *>(sender()) };
        if (!pSenderTcpSocket)
            return; // warn about failed cast omitted
        switch (socketError) {
        case QTcpSocket::SocketError::ConnectionRefusedError:
            break; // don't want to reset socket just because other side is not listening
        default: {
            // inform about occurred error
            if (pSenderTcpSocket->state() == QTcpSocket::SocketState::ConnectedState)
                pSenderTcpSocket->abort();
        }
        }
    }

    void MyClass::onStateChanged(QTcpSocket::SocketState socketState)
    {
        // inform about changed state
    }

-
hi @xstyx
The issue is probably in your onErrorOccurred handler. You only call abort() when the socket is in ConnectedState, but when the network is down, errors happen while the socket is still in ConnectingState, so abort() never gets called and the socket gets stuck there indefinitely.
Qt's event loop then keeps polling for POLLOUT, gets POLLERR back, calls connect() again internally, receives EALREADY because the previous attempt is still "in progress", and repeats that hundreds of times per second. That's your CPU spike.
Your timer doesn't help either since it only reconnects from UnconnectedState or BoundState. A socket stuck in ConnectingState just falls through to the default case and gets ignored.
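To make the two paragraphs above concrete, here is a small stand-alone model of the timer's decision (plain C++, not Qt code). The SocketState enum only mirrors a subset of the QAbstractSocket::SocketState names, and TimerAction, postedTimerRule and watchdogTimerRule are hypothetical names invented for this sketch. The second function is an assumed variant that treats a connection attempt older than maxConnectMs as stuck; the posted code does not do this.

```cpp
#include <cassert>

// Stand-in for a subset of QAbstractSocket::SocketState (assumption: names
// only mirror Qt's, this is not the Qt enum).
enum class SocketState { Unconnected, HostLookup, Connecting, Connected, Bound, Closing };

// What the periodic timer decides to do (hypothetical type for this sketch).
enum class TimerAction { Reconnect, AbortAndRebind, Nothing };

// Models the rule in the posted timerEvent(): only Unconnected and Bound
// sockets are reconnected, every other state is left alone.
TimerAction postedTimerRule(SocketState state)
{
    switch (state) {
    case SocketState::Unconnected:
    case SocketState::Bound:
        return TimerAction::Reconnect;
    default:
        return TimerAction::Nothing;
    }
}

// Assumed variant: a lookup/connect attempt that has been pending for at
// least maxConnectMs is treated as stuck and reset.
TimerAction watchdogTimerRule(SocketState state, int msInState, int maxConnectMs)
{
    assert(maxConnectMs > 0);
    const bool attemptPending = state == SocketState::Connecting
                                || state == SocketState::HostLookup;
    if (attemptPending && msInState >= maxConnectMs)
        return TimerAction::AbortAndRebind;
    return postedTimerRule(state);
}
```

With the posted rule a socket in Connecting state is never touched by the timer, while the variant asks for an abort plus re-bind once the attempt has been pending for maxConnectMs.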
I would suggest calling abort() unconditionally in the default branch, then re-bind manually afterwards, since abort() won't emit disconnected() if the socket wasn't fully connected yet.
    default:
        pSenderTcpSocket->abort();
        pSenderTcpSocket->bind(QHostAddress("172.20.20.1"));
        break;

-
@J.Hilk So I've tried your solution and discovered some things
As I understand it (please correct me if I'm wrong), if the socket wasn't able to connect fully with the network already down, it will not reset its bind address even after abort() in the default section, which is strange since abort() should reset the socket.
Here is what I saw: the socket is successfully bound at the start of the program; after the first connection attempt fails it emits QTcpSocket::SocketError::NetworkError and calls my handler onErrorOccurred. There the socket aborts and tries to re-bind() itself, which fails with the bind system call returning -1 EINVAL (Invalid argument). I have to note that the address is clearly not invalid, as seen in the bind system call below.
The re-bind happens inside the if statement's condition, and the failed bind immediately emits QTcpSocket::SocketError::NetworkError (strangely neither AddressInUseError nor SocketAddressNotAvailableError) and calls onErrorOccurred recursively, so I never reach the if's body and get rapid logging messages (several hundred per second) about the error and the failing bind system calls.

    bind(10, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("172.20.20.1")}, 16) = -1 EINVAL (Invalid argument)

Did I get it right that in this case the bind address simply doesn't get reset?
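On the recursive onErrorOccurred calls described above: one common way to break that kind of loop is a re-entrancy flag in the handler. The following is only a stand-alone sketch in plain C++, not Qt code. FakeSocket is a hypothetical stand-in whose bind() always fails and reports the failure by calling the error callback synchronously, which is the behaviour described in this post; ErrorHandler and handleError are likewise made-up names. It does not explain the EINVAL itself, it only shows how the handler can avoid re-entering.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-in for a socket: bind() always fails and reports the
// failure by invoking the error callback before it returns.
struct FakeSocket {
    std::function<void()> onError;
    int bindCalls = 0;

    bool bind()
    {
        ++bindCalls;
        if (onError)
            onError();  // synchronous error notification
        return false;
    }
};

// Error handler with a re-entrancy flag (hypothetical class for this sketch).
class ErrorHandler {
public:
    explicit ErrorHandler(FakeSocket &socket) : socket_(socket) {}

    void handleError()
    {
        if (handling_)
            return;  // nested call triggered by our own bind(): ignore it
        handling_ = true;
        if (!socket_.bind())
            ++failedBinds_;  // the body after a failed bind is now reached
        handling_ = false;
    }

    int failedBinds() const { return failedBinds_; }

private:
    FakeSocket &socket_;
    bool handling_ = false;
    int failedBinds_ = 0;
};
```

Wiring the callback back into the handler, as a signal/slot connection would, then produces exactly one bind attempt per error instead of an endless recursion.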
Interestingly, this time the first connect system call didn't return EINPROGRESS but EHOSTUNREACH

    connect(10, {sa_family=AF_INET, sin_port=htons(9543), sin_addr=inet_addr("172.26.20.10")}, 16) = -1 EHOSTUNREACH (No route to host)

And one detail I found: using Proxmox link_down=1 and using ip route add unreachable cause different behaviour. With link_down=1 I get the original error with system call spamming, but ip route add unreachable seems to work just fine, with no rapid system calls, although I wasn't able to verify whether the bind address stays the same or gets reset.
The default section of onErrorOccurred now looks like this:

    default: {
        pSenderTcpSocket->abort();
        if (!pSenderTcpSocket->bind(QHostAddress("172.20.20.1"))) {
            // log some debug messages and pSenderTcpSocket->errorString()
            // execution never gets here
        }
        break;
    }

-
@xstyx I also noticed one thing
I assume now it's a Proxmox thing because the behaviour is different when using link_down and when using ip route add unreachable.
I verified that with ip route add unreachable everything works fine: the socket tries to connect, enters HostLookupState, then ConnectingState, then UnconnectedState, finally emits NetworkError, and tries again after the interval, without spamming system calls.
But with link_down the socket tries to connect, enters HostLookupState, then ConnectingState, and here the system call spamming starts, even before the socket enters UnconnectedState and emits NetworkError. By the time onErrorOccurred gets called the spamming has already ended, so I kind of have no way of handling the error.
Strange behaviour, maybe someone could clarify this.
-
The EINVAL on re-bind right after abort() might be because abort() closes and recreates the socket descriptor internally, and that process possibly isn't fully completed synchronously. Calling bind() immediately within the same slot invocation could be too early, and the failed bind() then emitting NetworkError and re-entering onErrorOccurred would explain your new spamming loop.
It might be worth trying to defer the re-bind using QTimer::singleShot(0, ...). It's more of a hack than a proper fix, but it could help narrow down whether timing is actually the issue here:

    default:
        pSenderTcpSocket->abort();
        QTimer::singleShot(0, this, [this]() {
            socket.bind(QHostAddress("172.20.20.1"));
        });
        break;

As for the link_down case, I think you're right that there's not much you can do in onErrorOccurred, because the spamming seems to happen entirely before it fires. My guess is that the kernel keeps signaling POLLERR on the fd while Qt's event loop keeps responding to it, all while the socket is still stuck in ConnectingState. That would also explain why ip route add unreachable behaves differently: the kernel can reject that immediately rather than leaving the connection attempt hanging.
One approach that might help is implementing a connection timeout manually alongside your connectToHost call, since Qt doesn't expose one directly. I'd make the timer a member so it can be stopped cleanly when the connection succeeds:

    // in your class definition
    QTimer m_connectTimeoutTimer;

    // in your constructor
    m_connectTimeoutTimer.setSingleShot(true);
    connect(&m_connectTimeoutTimer, &QTimer::timeout, this, [this]() {
        if (socket.state() == QAbstractSocket::ConnectingState
            || socket.state() == QAbstractSocket::HostLookupState) {
            socket.abort();
            QTimer::singleShot(0, this, [this]() {
                socket.bind(QHostAddress("172.20.20.1"));
            });
        }
    });

    // when starting a connection attempt
    socket.connectToHost(QHostAddress("172.26.20.10"), 9543);
    m_connectTimeoutTimer.start(10000);

    // in onConnected()
    m_connectTimeoutTimer.stop();

This would at least force a clean reset if the socket gets stuck in ConnectingState indefinitely. Not sure if that fully solves the underlying issue, but it should prevent the CPU spike from running unchecked. Adjust the timeout to whatever makes sense for your reconnect interval.
-
@J.Hilk
I did a little digging on the Qt Jira and found a problem similar to my own.
I'm now almost completely sure that this spamming is a Qt thing, since the spamming on the first connection, before onErrorOccurred can even fire, happens only if the socket's connection attempt gets rejected, either via Proxmox link_down=1 or with the following nftables rules:

    nft add table ip filter
    nft add chain ip filter output { type filter hook output priority 0 \; }
    nft add rule ip filter output ip daddr 172.26.20.10 reject

But if we change reject to drop

    nft add rule ip filter output ip daddr 172.26.20.10 drop

everything works as intended, as it did with ip route add unreachable, which also drops connection attempts.
I also found out that dropped connection attempts are just silently dropped, whereas rejected connections are reported back via an ICMP response. I get that ICMP belongs to the network layer of TCP/IP and we're sitting on the transport layer, but it seems that Qt maybe doesn't properly handle the ICMP responses of rejected connections? Again, please correct me if I'm wrong.
I downloaded Qt source code from here and applied changes shown in the codereview, then recompiled the Network module. With the new libQt5Network.so.5.15.8 the first connection attempt is still spamming but not as much, and the following connection attempts stop immediately with no spamming, which looks promising.
Sadly, as soon as I delete the reject rule, with the first established connection the socket starts spamming EAGAIN again, so this solution doesn't protect against that. I will keep looking into it.
Maybe with this additional info something could be said or found about a possible complete solution.
One thing I also want to mention: in the original post I wrote that I use Qt-5.15.12, but it is actually Qt-5.15.8.
-
dear op: there is a difference between a TCP SYN reject and a TCP SYN drop. A reject sends a (nak) signal back, whereas a drop leaves the sender not knowing whether the recipient ever received the SYN in the first place. I think you need to re-look at your methodology for determining whether the server is "available". I'm less inclined to believe it is a Qt framework bug. You may need to do a deep dive on the QAbstractSocket::SocketError type, understand exactly why each of those conditions can occur, and react accordingly.
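For what it's worth, the reject/drop difference can be observed from user space with a plain POSIX non-blocking connect, outside of Qt. Below is a minimal sketch, assuming Linux and IPv4; probeConnect is a hypothetical helper, not a Qt or libc function. It calls connect() once, waits for the socket with poll(), and then reads the outcome from SO_ERROR instead of calling connect() again. A rejected attempt should come back quickly with an errno such as ECONNREFUSED or EHOSTUNREACH, while a silently dropped one should run into the timeout.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: tries one TCP connect to ip:port and closes the socket
// again. Returns 0 if the connection was established, otherwise an errno
// value (ETIMEDOUT if nothing happened within timeoutMs).
int probeConnect(const char *ip, unsigned short port, int timeoutMs)
{
    assert(timeoutMs >= 0);

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return EINVAL;  // not a dotted-quad IPv4 address

    const int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return errno;

    int result = 0;
    const int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        result = errno;
    } else if (connect(fd, reinterpret_cast<sockaddr *>(&addr), sizeof addr) < 0) {
        if (errno != EINPROGRESS) {
            result = errno;  // immediate failure, e.g. EHOSTUNREACH
        } else {
            // The attempt is pending: wait for it rather than calling
            // connect() again, which would only report EALREADY.
            pollfd pfd{};
            pfd.fd = fd;
            pfd.events = POLLOUT;
            const int ready = poll(&pfd, 1, timeoutMs);
            if (ready < 0) {
                result = errno;
            } else if (ready == 0) {
                result = ETIMEDOUT;  // no answer at all, e.g. packets dropped
            } else {
                // Fetch the outcome of the pending attempt.
                int soError = 0;
                socklen_t len = sizeof soError;
                if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &soError, &len) < 0) {
                    result = errno;
                } else if (soError != 0) {
                    result = soError;
                } else {
                    // Defensive check: no error reported, so make sure the
                    // socket really has a peer before claiming success.
                    sockaddr_in peer{};
                    socklen_t peerLen = sizeof peer;
                    if (getpeername(fd, reinterpret_cast<sockaddr *>(&peer), &peerLen) < 0)
                        result = errno;
                }
            }
        }
    }
    close(fd);
    return result;
}
```

Something like probeConnect("172.26.20.10", 9543, 3000) could be run once with the reject rule and once with the drop rule in place to compare the returned errno values; the address and port are simply the ones used earlier in this thread.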