View Issue Details

ID: 0001082
Project: bareos-core
Category: director
View Status: public
Last Update: 2019-07-10 17:45
Reporter: jurgengoedbloed
Assigned To:
Priority: normal
Severity: crash
Reproducibility: sometimes
Status: acknowledged
Resolution: open
Platform: Linux
OS: CentOS
OS Version: 7
Product Version: 18.2.5
Summary: 0001082: Bareos director crashes with segfault when restarting or reload from console
Description: After a config change, I reloaded the bareos director and it crashed with a segfault.

After I restart the Bareos director, it seems to run for about a minute, but then it crashes again. Sometimes almost immediately, sometimes after a couple of minutes.

At startup, Bareos doesn't complain about a config error. After a while it just stops.

I already had the same issue in the past, but then managed to get the system up and running after waiting a considerable amount of time (> 1 day) and then restarting the director. It had then been running and backing up for over two weeks.

The director and storage daemon are running 18.2.5; all clients run 17.2.4 or 18.2.5, all running on CentOS 7. The director and storage daemon (both on the same machine) run on a fully patched CentOS 7 machine.

I've had the same issue with the director on version 17.2.4 and a self-compiled 17.2.5

I suspect that it has to do with the fact that all clients use the 'client initiated connection' and something goes wrong as soon as clients reconnect after restart of the director. A race condition, lack of resources..?
Steps To Reproduce: When the crash occurs:
- Start the bareos director
- Within a minute, the director will crash again.
Additional Information: As requested by Andreas, created this bug and attached the traceback file.
Tags: No tags attached.

Activities

jurgengoedbloed

2019-04-30 14:23

reporter  

bareos-dir.1640.bactrace (1,079 bytes)   
Attempt to dump current JCRs. njcrs=2
threadid=0x0000007f8e268bb8 JobId=0 JobStatus=R jcr=0x12a4b58 name=*JobMonitor*.2019-04-30_13.10.42_01
threadid=0x6200007f8e268bb8 killable=0 JobId=0 JobStatus=R jcr=0x12a4b58 name=*JobMonitor*.2019-04-30_13.10.42_01
	UseCount=1
	JobType=I JobLevel= 
	sched_time=30-Apr-2019 13:10 start_time=30-Apr-2019 13:10
	end_time=01-Jan-1970 01:00 wait_time=01-Jan-1970 01:00
	db=(nil) db_batch=(nil) batch_started=0
threadid=0xf000007f8e13fff7 JobId=0 JobStatus=R jcr=0x7f8e0c0008e8 name=*StatisticsCollector*.2019-04-30_13.10.42_02
threadid=0x6600007f8e13fff7 killable=0 JobId=0 JobStatus=R jcr=0x7f8e0c0008e8 name=*StatisticsCollector*.2019-04-30_13.10.42_02
	UseCount=1
	JobType=I JobLevel= 
	sched_time=30-Apr-2019 13:10 start_time=30-Apr-2019 13:10
	end_time=01-Jan-1970 01:00 wait_time=01-Jan-1970 01:00
	db=0x7f8e0c0024e8 db_batch=(nil) batch_started=0
BareosDb=0x7f8e0c0024e8 db_name=bareos db_user=bareos connected=true
	cmd="cats/sql_create.cc:390 mediatype record File already exists
" changes=0
	RWLOCK=0x7f8e0c0024f0 w_active=0 w_wait=0
arogge

2019-04-30 15:06

developer   ~0003349

Does the problem persist if you disable statistics collection?
jurgengoedbloed

2019-05-02 17:12

reporter   ~0003352

Yes. Statistics collection was already turned off.
The database tables 'devicestats' and 'jobstats' are also empty.
jurgengoedbloed

2019-05-02 17:25

reporter   ~0003353

To add to this...
The director had no statistics enabled.
The storage daemon has.
I have disabled it and restarted the storage daemon.
Then I restarted the director and it crashed again.

What I did then was the following:
- Stopped the storage daemon.
- Started the director and monitored whether it would keep running. It kept running.
- Then started the storage daemon.
- The director kept running.
- Ran a small test backup job: it ran fine.
Tonight a batch of backup jobs will run; tomorrow I will let you know the outcome.
jurgengoedbloed

2019-05-03 09:13

reporter   ~0003354

All backups ran fine last night.
Is there anything I can test or try?
arogge

2019-05-03 09:58

developer   ~0003355

You can check if you have a meaningful 'traceback' file next to the bactrace you attached.
If you have gdb and the debug packages installed (there is no performance penalty), a crash will produce a traceback file that shows exactly in which function and on which line the crash happened. This helps us a lot in tracking down the crash.
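
To illustrate what to look for in such a traceback file: in gdb backtraces, the frame listed directly below `<signal handler called>` is the one that was executing when the signal arrived. The following is a minimal Python sketch, assuming the file uses gdb's usual `Thread N (...)` and `#n ...` line layout. The function name `find_crash_frame` is hypothetical and not part of Bareos, and the sample lines are abbreviated from the bareos.63319.traceback attachment in this issue.

```python
import re


def find_crash_frame(traceback_text):
    """Return (thread_header, frame_line) for the frame that received the signal, or None."""
    thread = None
    lines = traceback_text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("Thread "):
            # Remember which thread the following frames belong to.
            thread = line.strip()
        elif "<signal handler called>" in line:
            # The next numbered frame is the code that was running when the signal hit.
            following = [l for l in lines[i + 1:] if re.match(r"#\d+\s", l)]
            if following:
                return thread, following[0].strip()
    return None


# Abbreviated sample (argument lists and paths shortened for readability).
sample = """Thread 4 (Thread 0x7f172d832700 (LWP 63333)):
#0  0x00007f1739d34179 in waitpid () from /lib64/libpthread.so.0
#1  0x00007f173a669624 in SignalHandler (sig=11) at signal.cc:241
#2  <signal handler called>
#3  Connection::check (this=0xaaaaaaaaaaaaaaaa, timeout_data=0) at connection_pool.cc:61
"""

if __name__ == "__main__":
    print(find_crash_frame(sample))
```

If the file contains no `<signal handler called>` marker, the function returns `None`, which usually means the debug symbols were not loaded or the dump is incomplete.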
jurgengoedbloed

2019-05-03 10:12

reporter  

bareos.1640.traceback (1,657 bytes)   
Created /var/lib/bareos/bareos-dir.core.1640 for doing postmortem debugging

warning: the debug information found in "/usr/lib/debug//usr/sbin/bareos-dir.debug" does not match "/usr/sbin/bareos-dir" (CRC mismatch).


warning: the debug information found in "/usr/lib/debug/usr/sbin/bareos-dir.debug" does not match "/usr/sbin/bareos-dir" (CRC mismatch).

[New LWP 1643]
[New LWP 1644]
[New LWP 1645]
[New LWP 1789]
[New LWP 1790]
[New LWP 1791]
[New LWP 1792]
[New LWP 1793]
[New LWP 1794]
[New LWP 1795]
[New LWP 1796]
[New LWP 1797]
[New LWP 1798]
[New LWP 1799]
[New LWP 1800]
[New LWP 1801]
[New LWP 1802]
[New LWP 1803]
[New LWP 1804]
[New LWP 1805]
[New LWP 1806]
[New LWP 1807]
[New LWP 1808]
[New LWP 1809]
[New LWP 1810]
[New LWP 1811]
[New LWP 1812]
[New LWP 1813]
[New LWP 1814]
[New LWP 1815]
[New LWP 1816]
[New LWP 1817]
[New LWP 1818]
[New LWP 1640]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

warning: the debug information found in "/usr/lib/debug//usr/lib64/bareos/backends/libbareoscats-postgresql.so.debug" does not match "/usr/lib64/bareos/backends/libbareoscats-postgresql.so" (CRC mismatch).


warning: the debug information found in "/usr/lib/debug/usr/lib64/bareos/backends/libbareoscats-postgresql.so.debug" does not match "/usr/lib64/bareos/backends/libbareoscats-postgresql.so" (CRC mismatch).

Core was generated by `/usr/sbin/bareos-dir'.
#0  0x00007f8e236e420d in poll () from /lib64/libc.so.6
$1 = 1701994850
$2 = 19234424
$3 = 19234488
/usr/lib/bareos/scripts/btraceback.gdb:4: Error in sourced command file:
No symbol table is loaded.  Use the "file" command.
jurgengoedbloed

2019-05-03 10:12

reporter   ~0003356

Yes, I have. Here is the corresponding traceback file.
arogge

2019-05-03 10:16

developer   ~0003357

From the traceback file (it is a simple text file) it looks like your debug packages don't match the binary packages you've got installed. Could you check this?
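
The mismatch shows up in the traceback as gdb warnings ending in "(CRC mismatch)". As a rough aid, a small Python sketch like the one below could list the affected binaries. It assumes the warning wording seen in bareos.1640.traceback; the helper name `mismatched_binaries` is made up for this example.

```python
import re

# Matches the gdb warning format seen in the attached traceback.
MISMATCH = re.compile(
    r'warning: the debug information found in "(?P<debug>[^"]+)" '
    r'does not match "(?P<binary>[^"]+)" \(CRC mismatch\)'
)


def mismatched_binaries(traceback_text):
    """Return the sorted, de-duplicated binaries whose debuginfo gdb rejected."""
    return sorted({m.group("binary") for m in MISMATCH.finditer(traceback_text)})


# Warning lines copied from bareos.1640.traceback.
sample = (
    'warning: the debug information found in "/usr/lib/debug//usr/sbin/bareos-dir.debug" '
    'does not match "/usr/sbin/bareos-dir" (CRC mismatch).\n'
    'warning: the debug information found in "/usr/lib/debug/usr/sbin/bareos-dir.debug" '
    'does not match "/usr/sbin/bareos-dir" (CRC mismatch).\n'
    'warning: the debug information found in '
    '"/usr/lib/debug//usr/lib64/bareos/backends/libbareoscats-postgresql.so.debug" '
    'does not match "/usr/lib64/bareos/backends/libbareoscats-postgresql.so" (CRC mismatch).\n'
)

if __name__ == "__main__":
    for binary in mismatched_binaries(sample):
        print(binary)
```

An empty result suggests the installed debuginfo matches the binaries, at least as far as gdb reported.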
jurgengoedbloed

2019-05-03 10:55

reporter   ~0003358

I installed from the bareos repository.
Noticed that the package bareos-debuginfo was still 18.2.4rc. Updated to 18.2.5 and restarted director and storage (they are on the same machine).
If that is what you meant, then the versions should be the same now.
jurgengoedbloed

2019-05-03 11:29

reporter   ~0003359

Here is a new traceback file
bareos.63319.traceback (12,707 bytes)   
Created /var/lib/bareos/bareos-dir.core.63319 for doing postmortem debugging
[New LWP 63321]
[New LWP 63323]
[New LWP 63325]
[New LWP 63333]
[New LWP 63374]
[New LWP 63375]
[New LWP 63376]
[New LWP 63377]
[New LWP 63378]
[New LWP 63379]
[New LWP 63380]
[New LWP 63381]
[New LWP 63382]
[New LWP 63383]
[New LWP 63384]
[New LWP 63385]
[New LWP 63319]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/bareos-dir'.
#0  0x00007f1739d3397d in accept () from /lib64/libpthread.so.0
$1 = '\000' <repeats 127 times>
$2 = 0x22d2e78 "bareos-dir"
$3 = 0x22d2eb8 "/usr/sbin/bareos-dir"
$4 = 0x232ef68 "PostgreSQL"
$5 = 0x7f173a68ea25 "18.2.5 (30 January 2019)"
$6 = 0x7f173a68ea0b "Linux-4.4.92-6.18-default"
$7 = 0x7f173a68ea04 "redhat"
$8 = 0x7f173a68ef68 "CentOS Linux release 7.6.1810 (Core) "
$9 = "prd-mgmt-bareosstore1", '\000' <repeats 234 times>
$10 = 0x7f173a68ef90 "redhat CentOS Linux release 7.6.1810 (Core) "
Environment variable "TestName" not defined.
#0  0x00007f1739d3397d in accept () from /lib64/libpthread.so.0
#1  0x00007f173a63b6bb in BnetThreadServerTcp (addr_list=addr_list@entry=0x242a818, max_clients=<optimized out>, sockfds=<optimized out>, client_wq=client_wq@entry=0x6d6180 <directordaemon::socket_workq>, nokeepalive=<optimized out>, HandleConnectionRequest=HandleConnectionRequest@entry=0x446c40 <directordaemon::HandleConnectionRequest(ConfigurationParser*, void*)>, config=<optimized out>, server_state=<optimized out>, server_state@entry=0x6d6160 <directordaemon::server_state>) at /usr/src/debug/bareos-18.2.5/src/lib/bnet_server_tcp.cc:356
#2  0x0000000000446c34 in directordaemon::connect_thread (arg=0x242a818) at /usr/src/debug/bareos-18.2.5/src/dird/socket_server.cc:132
#3  0x00007f1739d2cdd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f17385d9ead in clone () from /lib64/libc.so.6

Thread 17 (Thread 0x7f173b7a6880 (LWP 63319)):
#0  0x00007f1739d33e3d in nanosleep () from /lib64/libpthread.so.0
#1  0x00007f173a64a344 in Bmicrosleep (sec=sec@entry=60, usec=usec@entry=0) at /usr/src/debug/bareos-18.2.5/src/lib/bsys.cc:171
#2  0x000000000044cd0d in directordaemon::wait_for_next_job (one_shot_job_to_run=<optimized out>) at /usr/src/debug/bareos-18.2.5/src/dird/scheduler.cc:131
#3  0x000000000041dc81 in main (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/bareos-18.2.5/src/dird/dird.cc:449

Thread 16 (Thread 0x7f16feffd700 (LWP 63385)):
#0  0x00007f1739d30d12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f173a677a5e in workq_server (arg=0x6d6180 <directordaemon::socket_workq>) at /usr/src/debug/bareos-18.2.5/src/lib/workq.cc:210
#2  0x00007f1739d2cdd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f17385d9ead in clone () from /lib64/libc.so.6

Thread 15 (Thread 0x7f16ff7fe700 (LWP 63384)):
#0  0x00007f1739d30d12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f173a677a5e in workq_server (arg=0x6d6180 <directordaemon::socket_workq>) at /usr/src/debug/bareos-18.2.5/src/lib/workq.cc:210
#2  0x00007f1739d2cdd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f17385d9ead in clone () from /lib64/libc.so.6

Thread 14 (Thread 0x7f16fffff700 (LWP 63383)):
#0  0x00007f1739d30d12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f173a677a5e in workq_server (arg=0x6d6180 <directordaemon::socket_workq>) at /usr/src/debug/bareos-18.2.5/src/lib/workq.cc:210
#2  0x00007f1739d2cdd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f17385d9ead in clone () from /lib64/libc.so.6

Thread 13 (Thread 0x7f171cff9700 (LWP 63382)):
#0  0x00007f1739d30d12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f173a677a5e in workq_server (arg=0x6d6180 <directordaemon::socket_workq>) at /usr/src/debug/bareos-18.2.5/src/lib/workq.cc:210
#2  0x00007f1739d2cdd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f17385d9ead in clone () from /lib64/libc.so.6

Thread 12 (Thread 0x7f171d7fa700 (LWP 63381)):
#0  0x00007f1739d30d12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f173a677a5e in workq_server (arg=0x6d6180 <directordaemon::socket_workq>) at /usr/src/debug/bareos-18.2.5/src/lib/workq.cc:210
#2  0x00007f1739d2cdd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f17385d9ead in clone () from /lib64/libc.so.6

Thread 11 (Thread 0x7f171e7fc700 (LWP 63380)):
#0  0x00007f1739d30d12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f173a677a5e in workq_server (arg=0x6d6180 <directordaemon::socket_workq>) at /usr/src/debug/bareos-18.2.5/src/lib/workq.cc:210
#2  0x00007f1739d2cdd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f17385d9ead in clone () from /lib64/libc.so.6

Thread 10 (Thread 0x7f171effd700 (LWP 63379)):
#0  0x00007f1739d30d12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f173a677a5e in workq_server (arg=0x6d6180 <directordaemon::socket_workq>) at /usr/src/debug/bareos-18.2.5/src/lib/workq.cc:210
#2  0x00007f1739d2cdd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f17385d9ead in clone () from /lib64/libc.so.6

Thread 9 (Thread 0x7f171f7fe700 (LWP 63378)):
#0  0x00007f1739d30d12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f173a677a5e in workq_server (arg=0x6d6180 <directordaemon::socket_workq>) at /usr/src/debug/bareos-18.2.5/src/lib/workq.cc:210
#2  0x00007f1739d2cdd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f17385d9ead in clone () from /lib64/libc.so.6

Thread 8 (Thread 0x7f16c3fff700 (LWP 63377)):
#0  0x00007f1739d30d12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f173a677a5e in workq_server (arg=0x6d6180 <directordaemon::socket_workq>) at /usr/src/debug/bareos-18.2.5/src/lib/workq.cc:210
#2  0x00007f1739d2cdd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f17385d9ead in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x7f171dffb700 (LWP 63376)):
#0  0x00007f1739d30d12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f173a677a5e in workq_server (arg=0x6d6180 <directordaemon::socket_workq>) at /usr/src/debug/bareos-18.2.5/src/lib/workq.cc:210
#2  0x00007f1739d2cdd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f17385d9ead in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7f16a3fff700 (LWP 63375)):
#0  0x00007f1739d30d12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f173a677a5e in workq_server (arg=0x6d6180 <directordaemon::socket_workq>) at /usr/src/debug/bareos-18.2.5/src/lib/workq.cc:210
#2  0x00007f1739d2cdd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f17385d9ead in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7f172c830700 (LWP 63374)):
#0  0x00007f1739d30d12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f173a677a5e in workq_server (arg=0x6d6180 <directordaemon::socket_workq>) at /usr/src/debug/bareos-18.2.5/src/lib/workq.cc:210
#2  0x00007f1739d2cdd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f17385d9ead in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7f172d832700 (LWP 63333)):
#0  0x00007f1739d34179 in waitpid () from /lib64/libpthread.so.0
#1  0x00007f173a669624 in SignalHandler (sig=11) at /usr/src/debug/bareos-18.2.5/src/lib/signal.cc:241
#2  <signal handler called>
#3  Connection::check (this=this@entry=0xaaaaaaaaaaaaaaaa, timeout_data=timeout_data@entry=0) at /usr/src/debug/bareos-18.2.5/src/lib/connection_pool.cc:61
#4  0x00007f173a64e092 in ConnectionPool::cleanup (this=this@entry=0x2344718) at /usr/src/debug/bareos-18.2.5/src/lib/connection_pool.cc:140
#5  0x00007f173a64e17f in ConnectionPool::add (this=this@entry=0x2344718, connection=connection@entry=0x7f1724007aa8) at /usr/src/debug/bareos-18.2.5/src/lib/connection_pool.cc:156
#6  0x00007f173a64e29d in ConnectionPool::add_connection (this=this@entry=0x2344718, name=name@entry=0x7f172d831c90 "tst-civ-nominatim", fd_protocol_version=fd_protocol_version@entry=54, socket=socket@entry=0x7f1728038478, authenticated=authenticated@entry=true) at /usr/src/debug/bareos-18.2.5/src/lib/connection_pool.cc:168
#7  0x0000000000490d6f in directordaemon::HandleFiledConnection (connections=0x2344718, fd=fd@entry=0x7f1728038478, client_name=client_name@entry=0x7f172d831c90 "tst-civ-nominatim", fd_protocol_version=54) at /usr/src/debug/bareos-18.2.5/src/dird/fd_cmds.cc:1365
#8  0x0000000000446f4e in directordaemon::HandleConnectionRequest (config=0x22d43d0, arg=0x7f1728038478) at /usr/src/debug/bareos-18.2.5/src/dird/socket_server.cc:117
#9  0x00007f173a677b9d in workq_server (arg=0x6d6180 <directordaemon::socket_workq>) at /usr/src/debug/bareos-18.2.5/src/lib/workq.cc:232
#10 0x00007f1739d2cdd5 in start_thread () from /lib64/libpthread.so.0
#11 0x00007f17385d9ead in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f171ffff700 (LWP 63325)):
#0  0x00007f1739d30d12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000000000044de37 in wait_for_next_run () at /usr/src/debug/bareos-18.2.5/src/dird/stats.cc:110
#2  directordaemon::statistics_thread (arg=<optimized out>) at /usr/src/debug/bareos-18.2.5/src/dird/stats.cc:293
#3  0x00007f1739d2cdd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f17385d9ead in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f172d031700 (LWP 63323)):
#0  0x00007f1739d30d12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f173a6771b7 in watchdog_thread (arg=<optimized out>) at /usr/src/debug/bareos-18.2.5/src/lib/watchdog.cc:313
#2  0x00007f1739d2cdd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f17385d9ead in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f172e033700 (LWP 63321)):
#0  0x00007f1739d3397d in accept () from /lib64/libpthread.so.0
#1  0x00007f173a63b6bb in BnetThreadServerTcp (addr_list=addr_list@entry=0x242a818, max_clients=<optimized out>, sockfds=<optimized out>, client_wq=client_wq@entry=0x6d6180 <directordaemon::socket_workq>, nokeepalive=<optimized out>, HandleConnectionRequest=HandleConnectionRequest@entry=0x446c40 <directordaemon::HandleConnectionRequest(ConfigurationParser*, void*)>, config=<optimized out>, server_state=<optimized out>, server_state@entry=0x6d6160 <directordaemon::server_state>) at /usr/src/debug/bareos-18.2.5/src/lib/bnet_server_tcp.cc:356
#2  0x0000000000446c34 in directordaemon::connect_thread (arg=0x242a818) at /usr/src/debug/bareos-18.2.5/src/dird/socket_server.cc:132
#3  0x00007f1739d2cdd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f17385d9ead in clone () from /lib64/libc.so.6
#0  0x00007f1739d3397d in accept () from /lib64/libpthread.so.0
No symbol table info available.
#1  0x00007f173a63b6bb in BnetThreadServerTcp (addr_list=addr_list@entry=0x242a818, max_clients=<optimized out>, sockfds=<optimized out>, client_wq=client_wq@entry=0x6d6180 <directordaemon::socket_workq>, nokeepalive=<optimized out>, HandleConnectionRequest=HandleConnectionRequest@entry=0x446c40 <directordaemon::HandleConnectionRequest(ConfigurationParser*, void*)>, config=<optimized out>, server_state=<optimized out>, server_state@entry=0x6d6160 <directordaemon::server_state>) at /usr/src/debug/bareos-18.2.5/src/lib/bnet_server_tcp.cc:356
356	               newsockfd = accept(fd_ptr->fd, &cli_addr, &clilen);
cnt = <optimized out>
cli_addr = {sa_family = 2, sa_data = "\334\374\271\252\a\214\000\000\000\000\000\000\000"}
tlog = <optimized out>
value = 1
ipaddr = 0x0
newsockfd = <optimized out>
clilen = 16
fd_ptr = 0x7f172e0321b0
events = 195
pfds = 0x7f172e032190
status = <optimized out>
buf = "185.170.7.250", '\000' <repeats 114 times>
allbuf = '\000' <repeats 1080 times>...
cleanup_object = {sockfds_ = <optimized out>, client_wq_ = 0x6d6180 <directordaemon::socket_workq>}
next = <optimized out>
to_free = <optimized out>
nfds = <optimized out>
#2  0x0000000000446c34 in directordaemon::connect_thread (arg=0x242a818) at /usr/src/debug/bareos-18.2.5/src/dird/socket_server.cc:132
132	                      HandleConnectionRequest, my_config, &server_state);
No locals.
#3  0x00007f1739d2cdd5 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#4  0x00007f17385d9ead in clone () from /lib64/libc.so.6
No symbol table info available.
#0  0x0000000000000000 in ?? ()
No symbol table info available.
#0  0x0000000000000000 in ?? ()
No symbol table info available.
#0  0x0000000000000000 in ?? ()
No symbol table info available.
arogge

2019-05-03 11:58

developer   ~0003360

From what I see you're right: you're using client initiated connection and something is wrong with the connection-pool.
When the client 'tst-civ-nominatim' connects to the director, the director should add that connection to the connection pool. However, it looks like the connection pool had already been destroyed at this point.
Is this reproducible with just one client?
I will have to reproduce it myself so we can write a test for this, and I would be glad if there was a simple way to reproduce it.
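
For readers following along: in the traceback, `ConnectionPool::add` calls `ConnectionPool::cleanup`, which then calls `Connection::check` on an invalid pointer, while many workq threads are accepting clients at the same time. One general way to keep a pool consistent under that access pattern is to run the cleanup and the insertion under a single lock. The Python sketch below only illustrates that technique. It is not the Bareos implementation, whether 18.2.5 lacks such locking is not established in this issue, and all class and function names are hypothetical.

```python
import threading
import time


class Connection:
    """Hypothetical stand-in for a pooled client connection."""

    def __init__(self, name):
        self.name = name
        self.created = time.monotonic()

    def check(self, max_age):
        # A connection counts as usable while it is younger than max_age seconds.
        return time.monotonic() - self.created < max_age


class ConnectionPool:
    """Pool whose cleanup and add steps run under a single lock."""

    def __init__(self, max_age=3600.0):
        self._lock = threading.Lock()
        self._connections = []
        self._max_age = max_age

    def add(self, connection):
        with self._lock:
            # Drop stale entries first, then append, all while holding the lock,
            # so no other thread can modify the list during the scan.
            self._connections = [c for c in self._connections if c.check(self._max_age)]
            self._connections.append(connection)

    def __len__(self):
        with self._lock:
            return len(self._connections)


def simulate_reconnect(client_count):
    """Let client_count threads register at once, as after a director restart."""
    pool = ConnectionPool()
    threads = [
        threading.Thread(target=pool.add, args=(Connection(f"client-{i}"),))
        for i in range(client_count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(pool)
```

Calling `simulate_reconnect(50)` leaves all 50 connections in the pool. A test for the real director would need many file daemons reconnecting concurrently, which matches the reporter's observation that the crash depends on the number of clients.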
jurgengoedbloed

2019-05-03 17:10

reporter   ~0003361

I will test this after the weekend and let you know.
jurgengoedbloed

2019-05-09 15:34

reporter   ~0003363

I did some tests, but at the moment the director is running fine and I cannot reproduce the crash.
I'll let you know once the director crashes again.
jurgengoedbloed

2019-05-15 08:29

reporter   ~0003369

I got another crash.
Disabled access from all filedaemons by blocking them with iptables, except for one host.
In this situation, the director keeps on running.
jurgengoedbloed

2019-05-15 08:42

reporter   ~0003370

After a minute or so, I removed the iptables block rule. All clients are now connected, the director now seems to run fine.
jurgengoedbloed

2019-05-24 15:03

reporter   ~0003381

It seems that we have crossed a threshold in the number of clients.

I block access to the director except for a small number of clients (<30).
Start the director -> runs fine and shows client initiated connection clients.
As soon as I remove the blockage, the director crashes.

To rule out 'bad clients', I have tried blocking different parts of the network, but that did not help.

The only success I'm having now is this:
- Block all clients
- Stop storage daemon (runs on the same machine)
- Start director
- Allow clients subnet by subnet
- Remove last blockage
- Start storage.

Anything I can do to test?
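
The staged restart described above could be written down as an ordered command list, for example with the Python sketch below. It only builds the commands and executes nothing. Several details are assumptions and not taken from this report: the exact iptables rule form, the director listening on port 9101 (the Bareos default), the systemd unit names `bareos-dir` and `bareos-sd`, and the example subnets, which are documentation addresses.

```python
def staged_restart_plan(subnets, director_port=9101):
    """Build the ordered shell commands for the staged restart; nothing is executed."""
    # Assumed rule form: drop TCP traffic to the director port per client subnet.
    block = [
        f"iptables -I INPUT -p tcp --dport {director_port} -s {net} -j DROP"
        for net in subnets
    ]
    # Deleting the same rule re-admits that subnet; run these one at a time.
    unblock = [
        f"iptables -D INPUT -p tcp --dport {director_port} -s {net} -j DROP"
        for net in subnets
    ]
    return (
        block
        + ["systemctl stop bareos-sd", "systemctl start bareos-dir"]
        + unblock
        + ["systemctl start bareos-sd"]
    )


if __name__ == "__main__":
    # Hypothetical subnets for illustration only.
    for command in staged_restart_plan(["192.0.2.0/25", "192.0.2.128/25"]):
        print(command)
```

Pausing between the unblock commands corresponds to allowing the clients subnet by subnet.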

Issue History

Date Modified Username Field Change
2019-04-30 14:23 jurgengoedbloed New Issue
2019-04-30 14:23 jurgengoedbloed File Added: bareos-dir.1640.bactrace
2019-04-30 15:06 arogge Assigned To => arogge
2019-04-30 15:06 arogge Status new => feedback
2019-04-30 15:06 arogge Note Added: 0003349
2019-05-02 17:12 jurgengoedbloed Note Added: 0003352
2019-05-02 17:12 jurgengoedbloed Status feedback => assigned
2019-05-02 17:25 jurgengoedbloed Note Added: 0003353
2019-05-03 09:13 jurgengoedbloed Note Added: 0003354
2019-05-03 09:58 arogge Note Added: 0003355
2019-05-03 10:01 arogge Status assigned => feedback
2019-05-03 10:12 jurgengoedbloed File Added: bareos.1640.traceback
2019-05-03 10:12 jurgengoedbloed Note Added: 0003356
2019-05-03 10:12 jurgengoedbloed Status feedback => assigned
2019-05-03 10:16 arogge Status assigned => feedback
2019-05-03 10:16 arogge Note Added: 0003357
2019-05-03 10:55 jurgengoedbloed Note Added: 0003358
2019-05-03 10:55 jurgengoedbloed Status feedback => assigned
2019-05-03 11:29 jurgengoedbloed File Added: bareos.63319.traceback
2019-05-03 11:29 jurgengoedbloed Note Added: 0003359
2019-05-03 11:58 arogge Status assigned => acknowledged
2019-05-03 11:58 arogge Note Added: 0003360
2019-05-03 17:10 jurgengoedbloed Note Added: 0003361
2019-05-09 15:34 jurgengoedbloed Note Added: 0003363
2019-05-15 08:29 jurgengoedbloed Note Added: 0003369
2019-05-15 08:42 jurgengoedbloed Note Added: 0003370
2019-05-24 15:03 jurgengoedbloed Note Added: 0003381
2019-07-10 17:45 arogge Assigned To arogge =>