bareos: master a79789a4

Author: Sebastian Sura
Committer: Bareos Bot
Branch: master
Timestamp: 2024-01-23 10:32
Parent: master cd082f1b
Pending
Changeset connection-pool: fix data race

Some operations were improperly synchronized. Take cleanup() for
example:

```
 | for (i = connections_->size() - 1; i >= 0; i--) {
1|   connection = connections_->get(i);
 |   Dmsg2(800, "checking connection %s (%d)\n", connection->name(), i);
2|   if (!connection->check()) {
 |     Dmsg2(120, "connection %s (%d) is terminated => removed\n",
 |           connection->name(), i);
 |     connections_->remove(i);
4|     delete (connection);
 |   }
 | }
```
We don't lock connections_ or connection in any way here. This means
that not only could we get a NULL returned at (1), but we also have to
account for the fact that connection could be deleted from under us by
a different thread at any moment, even while we are holding its lock.
This happens if two threads call cleanup() at the same time and one is
at (2) while the other is at (4).

Similarly, the check() function just calls WaitDataIntr() on the socket
without ensuring exclusive access (for example by locking the
connection!). WaitDataIntr() is not a const function, so it is not safe
to call without exclusive access. It might look safe, since the
function just waits, but it can in fact write to internal data (e.g.
b_errno in case of an error), which can definitely cause problems.
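The following is a minimal sketch, not the Bareos implementation, of how both problems go away once a single mutex covers the whole check/remove/delete sequence and the vector owns its elements through std::unique_ptr. `Connection`, `Pool`, `last_errno` and `run_concurrent_cleanup()` are hypothetical stand-ins. check() is deliberately non-const to mirror the point that WaitDataIntr() may write state such as b_errno, so it is only ever called with the pool mutex held.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Counts destructor calls so we can verify each connection is deleted once.
std::atomic<int> destroyed{0};

// Hypothetical stand-in for a pooled connection.
struct Connection {
  bool alive;
  int last_errno = 0;  // written by check(), like b_errno in WaitDataIntr()
  explicit Connection(bool a) : alive(a) {}
  ~Connection() { ++destroyed; }

  // Non-const: probing may record an error, so the caller needs
  // exclusive access (here: the pool mutex).
  bool check() {
    if (!alive) last_errno = 1;  // hypothetical error code
    return alive;
  }
};

class Pool {
 public:
  void add(bool alive) {
    std::lock_guard<std::mutex> guard(mutex_);
    connections_.push_back(std::make_unique<Connection>(alive));
  }

  // One lock covers check, remove and delete, so two concurrent calls
  // can never both delete the same connection.
  void cleanup() {
    std::lock_guard<std::mutex> guard(mutex_);
    for (std::size_t i = connections_.size(); i-- > 0;) {
      if (!connections_[i]->check()) {
        connections_.erase(connections_.begin() + i);  // unique_ptr deletes it
      }
    }
  }

  std::size_t size() {
    std::lock_guard<std::mutex> guard(mutex_);
    return connections_.size();
  }

 private:
  std::mutex mutex_;
  std::vector<std::unique_ptr<Connection>> connections_;
};

// Fills an empty pool with 50 live and 50 dead connections, then runs
// two cleanup() calls concurrently and returns how many remain.
std::size_t run_concurrent_cleanup(Pool& pool) {
  assert(pool.size() == 0);
  for (int i = 0; i < 100; ++i) pool.add(i % 2 == 0);  // odd i => dead
  std::thread t1([&pool] { pool.cleanup(); });
  std::thread t2([&pool] { pool.cleanup(); });
  t1.join();
  t2.join();
  return pool.size();
}
```

On a fresh Pool, run_concurrent_cleanup() leaves the 50 live connections in place and each of the 50 dead ones is destroyed exactly once, regardless of how the two threads interleave.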

Connection::in_use is also very misleading. While it does not suffer
from a data race (it is an atomic value), its interpretation does: if
you read false from it, you do not actually know whether the connection
is unused or whether some thread is using it and has yet to update the
bool.
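To illustrate the interpretation problem, here is a small sketch using a hypothetical `in_use` flag. A separate load() followed by store() is race-free at the memory level, yet still lets two threads both read false and both claim the connection; std::atomic<bool>::exchange() reads and sets in one step. exchange() is shown only to make the difference concrete: the commit itself drops the flag in favor of the pool lock.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical connection with an in_use flag, as in the old design.
struct Connection {
  std::atomic<bool> in_use{false};
};

// Racy: between load() and store() another thread may also read false,
// so two threads can both believe they own the connection. The flag has
// no data race; the conclusion drawn from the value read is the problem.
bool try_acquire_racy(Connection& c) {
  if (c.in_use.load()) return false;
  c.in_use.store(true);
  return true;
}

// exchange() sets true and returns the previous value atomically,
// so exactly one thread observes false.
bool try_acquire(Connection& c) { return !c.in_use.exchange(true); }

// Counts how many of n threads manage to acquire the same connection.
int count_owners(int n) {
  assert(n > 0);
  Connection c;
  std::atomic<int> owners{0};
  std::vector<std::thread> threads;
  for (int i = 0; i < n; ++i) {
    threads.emplace_back([&] {
      if (try_acquire(c)) ++owners;
    });
  }
  for (auto& t : threads) t.join();
  return owners.load();
}
```

With try_acquire(), count_owners(n) returns 1 for any n > 0; swapping in try_acquire_racy() can yield more than one owner, depending on scheduling.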

All these problems, and some more, led to the decision to rewrite this
code completely.

The basic idea is that the connection pool is now simply a vector of
connections protected by one lock. The connections themselves do not
have a lock.

The lock is owned by the vector. The only way to interact with the
connections inside the pool is by locking the whole vector. This
eliminates all the problems above.
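One way to make "lock the whole vector" the only way in is a wrapper whose data is private and reachable only through a handle that holds the mutex. The sketch below assumes hypothetical `Synchronized<T>`, `Locked` and `add_and_count()` names; it illustrates the pattern and does not reproduce the actual connection_pool.h interface.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical wrapper: the data is private and can only be reached
// through a Locked handle, which holds the mutex for as long as it lives.
template <typename T>
class Synchronized {
 public:
  class Locked {
   public:
    Locked(std::mutex& m, T& data) : guard_(m), data_(data) {}
    T* operator->() { return &data_; }
    T& operator*() { return data_; }

   private:
    std::unique_lock<std::mutex> guard_;  // released in the destructor
    T& data_;
  };

  Locked lock() { return Locked(mutex_, data_); }

 private:
  std::mutex mutex_;
  T data_;
};

// Usage: adds a name under the lock and returns the new size.
std::size_t add_and_count(Synchronized<std::vector<std::string>>& pool,
                          const std::string& name) {
  auto locked = pool.lock();  // mutex held until `locked` is destroyed
  locked->push_back(name);
  assert(locked->back() == name);
  return locked->size();
}
```

Because the mutex is released in the handle's destructor, forgetting to unlock is impossible, and code that has no handle has no way to reach the vector.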

The connections themselves are now also an RAII type. They own the
socket they hold, which means they take care of closing/destroying the
socket once they leave scope (similarly to a unique pointer).
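A minimal sketch of the RAII idea, assuming a hypothetical `Socket` type with a close() method and a hypothetical `take_last()` helper: a std::unique_ptr with a custom deleter closes the socket exactly once when its owner goes out of scope, and the owning `Connection` becomes move-only. Taking a connection out of the pool then transfers ownership of the socket to the caller.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Counts how often a socket was closed.
int sockets_closed = 0;

// Hypothetical socket type.
struct Socket {
  void close() { ++sockets_closed; }
};

// Deleter that closes before freeing, so unique_ptr<Socket, Closer>
// means "close on scope exit". It is never called for a null pointer.
struct Closer {
  void operator()(Socket* s) const {
    s->close();
    delete s;
  }
};

// Hypothetical RAII connection: move-only, owns its socket.
class Connection {
 public:
  Connection(std::string name, Socket* socket)
      : name_(std::move(name)), socket_(socket) {}

  // Declaring the moves deletes the copies: one socket, one owner.
  Connection(Connection&&) = default;
  Connection& operator=(Connection&&) = default;

  const std::string& name() const { return name_; }
  Socket* socket() const { return socket_.get(); }

 private:
  std::string name_;
  std::unique_ptr<Socket, Closer> socket_;
};

// Takes the last connection out of a pool; the caller now owns the socket.
Connection take_last(std::vector<Connection>& pool) {
  assert(!pool.empty());
  Connection c = std::move(pool.back());
  pool.pop_back();  // destroys the moved-from object; its socket is null
  return c;
}
```

Moving a connection out leaves a null socket behind, so destroying the moved-from object (for example in pop_back()) does not close anything.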

mod - core/src/dird/fd_cmds.cc
mod - core/src/dird/fd_cmds.h
mod - core/src/dird/job.cc
mod - core/src/dird/socket_server.cc
mod - core/src/dird/ua_status.cc
mod - core/src/lib/connection_pool.cc
mod - core/src/lib/connection_pool.h