View Issue Details

IDProjectCategoryView StatusLast Update
0000414bareos-corestorage daemonpublic2019-12-18 15:25
Reporterscottles Assigned To 
PrioritynormalSeveritycrashReproducibilityrandom
Status closedResolutionfixed 
PlatformLinuxOSDebianOS Version7
Product Version14.2.2 
Summary0000414: Bareos storage daemon crashes during backups
DescriptionSince upgrading to Bareos 14.2.2-46.1 from 13.2.3-33.1, on the 13th of January, the storage daemon on our file storage server crashes almost every night whilst backups or copy migrations are running, preventing nightly backups from completing.

The log file for the storage daemon outputs a message along the following lines:

28-Jan 02:57 bareos-storage: ABORTING due to ERROR in smartall.c:209
Overrun buffer: len=129 addr=15ef608 allocated: vol_mgr.c:896 called from dlist.c:326


The file volumes are are stored on a btrfs volume (we are using upstream 3.14.26 kernel compiled from kernel.org). Usually once a week a copy migration job runs to migrate the latest file based backups to tape storage, however this process also triggers the storage daemon on our file storage server to crash.
Steps To ReproduceIt appears to occur randomly during our routine nightly file based backups.
Additional InformationI have included the log from the storage daemon. If you need any further logs or information I will be happy to provide as needed.
TagsNo tags attached.

Relationships

related to 0000522 closedstephand storage daemon crashes ocassionally when starting a new job 
related to 0000552 closed SD crashes in -current 
related to 0000483 closedpstorz bareos-sd crash during backup 
child of 0000474 closed Release bareos-14.2.6 

Activities

scottles

scottles

2015-01-28 18:56

reporter  

bareos.log (12,349 bytes)   
26-Jan 13:29 bareos-storage JobId 16312: Ready to read from volume "0062" on device "FileStorage1" (/filestorage/volumes).
26-Jan 13:32 bareos-storage JobId 16312: Forward spacing Volume "0062" to file:block 4:2905371111.
26-Jan 14:33 bareos-storage JobId 16312: End of Volume at file 24 on device "FileStorage1" (/filestorage/volumes), Volume "0062"
26-Jan 14:33 bareos-storage JobId 16312: Ready to read from volume "0055" on device "FileStorage1" (/filestorage/volumes).
26-Jan 14:33 bareos-storage JobId 16312: Forward spacing Volume "0055" to file:block 0:1048784.
26-Jan 15:09 bareos-storage: ABORTING due to ERROR in smartall.c:209
Overrun buffer: len=129 addr=17d70c8 allocated: vol_mgr.c:277 called from vol_mgr.c:306
26-Jan 18:46 bareos-storage JobId 16372: Ready to read from volume "0062" on device "FileStorage1" (/filestorage/volumes).
26-Jan 19:03 bareos-storage JobId 16372: Forward spacing Volume "0062" to file:block 4:2871816679.
26-Jan 19:05 bareos-storage JobId 16374: Ready to read from volume "0062" on device "FileStorage1" (/filestorage/volumes).
26-Jan 19:05 bareos-storage JobId 16374: Forward spacing Volume "0062" to file:block 4:2831970791.
26-Jan 19:07 bareos-storage JobId 16376: Ready to read from volume "0038" on device "FileStorage1" (/filestorage/volumes).
26-Jan 19:07 bareos-storage JobId 16376: Forward spacing Volume "0038" to file:block 0:64719.
26-Jan 19:09 bareos-storage JobId 16378: Ready to read from volume "0038" on device "FileStorage1" (/filestorage/volumes).
26-Jan 19:09 bareos-storage JobId 16378: Forward spacing Volume "0038" to file:block 0:207.
26-Jan 19:11 bareos-storage: ABORTING due to ERROR in smartall.c:209
Overrun buffer: len=129 addr=ce56a8 allocated: vol_mgr.c:277 called from vol_mgr.c:306
27-Jan 11:45 bareos-storage JobId 16516: Ready to read from volume "0062" on device "FileStorage1" (/filestorage/volumes).
27-Jan 11:45 bareos-storage JobId 16516: Forward spacing Volume "0062" to file:block 4:2871816679.
27-Jan 11:48 bareos-storage JobId 16518: Ready to read from volume "0062" on device "FileStorage1" (/filestorage/volumes).
27-Jan 11:48 bareos-storage JobId 16518: Forward spacing Volume "0062" to file:block 4:2831970791.
27-Jan 11:50 bareos-storage JobId 16520: Ready to read from volume "0038" on device "FileStorage1" (/filestorage/volumes).
27-Jan 11:50 bareos-storage JobId 16520: Forward spacing Volume "0038" to file:block 0:64719.
27-Jan 22:00 bareos-storage JobId 16553: Volume "0053" previously written, moving to end of data.
27-Jan 22:00 bareos-storage JobId 16553: Warning: For Volume "0053":
The sizes do not match! Volume=103332549838 Catalog=103307384014
Correcting Catalog
27-Jan 22:00 bareos-storage JobId 16551: Volume "0046" previously written, moving to end of data.
27-Jan 22:00 bareos-storage JobId 16551: Warning: For Volume "0046":
The sizes do not match! Volume=14670917709 Catalog=14387802195
Correcting Catalog
27-Jan 22:00 bareos-storage JobId 16557: Volume "0096" previously written, moving to end of data.
27-Jan 22:00 bareos-storage JobId 16557: Warning: For Volume "0096":
The sizes do not match! Volume=28462383680 Catalog=28023030336
Correcting Catalog
27-Jan 22:00 bareos-storage JobId 16553: Elapsed time=00:00:53, Transfer rate=8.926 M Bytes/second
27-Jan 22:00 bareos-storage JobId 16553: Sending spooled attrs to the Director. Despooling 8,033,467 bytes ...
27-Jan 22:01 bareos-storage JobId 16561: Elapsed time=00:01:36, Transfer rate=3.810 M Bytes/second
27-Jan 22:01 bareos-storage JobId 16561: Sending spooled attrs to the Director. Despooling 1,143,180 bytes ...
27-Jan 22:04 bareos-storage JobId 16558: User defined maximum volume capacity 107,374,182,400 exceeded on device "FileStorage1" (/filestorage/volumes).
27-Jan 22:04 bareos-storage JobId 16558: End of medium on Volume "0053" Bytes=107,373,990,340 Blocks=102,381 at 27-Jan-2015 22:04.
27-Jan 22:20 bareos-storage JobId 16558: Recycled volume "0058" on device "FileStorage1" (/filestorage/volumes), all previous data lost.
27-Jan 22:20 bareos-storage JobId 16558: New volume "0058" mounted on device "FileStorage1" (/filestorage/volumes) at 27-Jan-2015 22:20.
27-Jan 22:23 bareos-storage JobId 16554: Elapsed time=00:23:15, Transfer rate=2.530 M Bytes/second
27-Jan 22:23 bareos-storage JobId 16554: Sending spooled attrs to the Director. Despooling 36,091,410 bytes ...
27-Jan 22:47 bareos-storage JobId 16559: Elapsed time=00:24:06, Transfer rate=7.633 M Bytes/second
27-Jan 22:47 bareos-storage JobId 16559: Sending spooled attrs to the Director. Despooling 26,338,623 bytes ...
27-Jan 22:49 bareos-storage JobId 16552: Elapsed time=00:49:21, Transfer rate=11.04 M Bytes/second
27-Jan 22:49 bareos-storage JobId 16552: Sending spooled attrs to the Director. Despooling 9,153,557 bytes ...
27-Jan 22:54 bareos-storage JobId 16563: Elapsed time=00:54:12, Transfer rate=1.627 M Bytes/second
27-Jan 22:54 bareos-storage JobId 16563: Sending spooled attrs to the Director. Despooling 445,297,833 bytes ...
27-Jan 22:57 bareos-storage JobId 16568: Elapsed time=00:33:38, Transfer rate=10.13 M Bytes/second
27-Jan 22:57 bareos-storage JobId 16568: Sending spooled attrs to the Director. Despooling 13,976,559 bytes ...
27-Jan 23:00 bareos-storage JobId 16564: Elapsed time=00:10:41, Transfer rate=2.705 M Bytes/second
27-Jan 23:00 bareos-storage JobId 16564: Sending spooled attrs to the Director. Despooling 12,219,655 bytes ...
27-Jan 23:00 bareos-storage JobId 16562: User defined maximum volume capacity 107,374,182,400 exceeded on device "FileStorage2" (/filestorage/volumes).
27-Jan 23:00 bareos-storage JobId 16562: End of medium on Volume "0046" Bytes=107,374,025,657 Blocks=102,133 at 27-Jan-2015 23:00.
27-Jan 23:01 bareos-storage JobId 16562: Recycled volume "0007" on device "FileStorage2" (/filestorage/volumes), all previous data lost.
27-Jan 23:01 bareos-storage JobId 16562: New volume "0007" mounted on device "FileStorage2" (/filestorage/volumes) at 27-Jan-2015 23:01.
27-Jan 23:10 bareos-storage JobId 16569: Elapsed time=00:22:19, Transfer rate=2.785 M Bytes/second
27-Jan 23:10 bareos-storage JobId 16569: Sending spooled attrs to the Director. Despooling 15,894,901 bytes ...
27-Jan 23:56 bareos-storage JobId 16566: Elapsed time=00:46:37, Transfer rate=5.636 M Bytes/second
27-Jan 23:56 bareos-storage JobId 16566: Sending spooled attrs to the Director. Despooling 54,140,188 bytes ...
28-Jan 00:01 bareos-storage JobId 16562: User defined maximum volume capacity 107,374,182,400 exceeded on device "FileStorage2" (/filestorage/volumes).
28-Jan 00:01 bareos-storage JobId 16562: End of medium on Volume "0007" Bytes=107,373,139,932 Blocks=102,400 at 28-Jan-2015 00:01.
28-Jan 00:03 bareos-storage JobId 16562: Recycled volume "0036" on device "FileStorage2" (/filestorage/volumes), all previous data lost.
28-Jan 00:03 bareos-storage JobId 16562: New volume "0036" mounted on device "FileStorage2" (/filestorage/volumes) at 28-Jan-2015 00:03.
28-Jan 00:15 bareos-storage JobId 16551: Elapsed time=02:15:21, Transfer rate=4.592 M Bytes/second
28-Jan 00:15 bareos-storage JobId 16551: Sending spooled attrs to the Director. Despooling 56,252,739 bytes ...
28-Jan 00:24 bareos-storage JobId 16560: Elapsed time=02:22:13, Transfer rate=2.871 M Bytes/second
28-Jan 00:24 bareos-storage JobId 16560: Sending spooled attrs to the Director. Despooling 117,011,081 bytes ...
28-Jan 00:30 bareos-storage JobId 16576: Elapsed time=00:33:39, Transfer rate=2.690 M Bytes/second
28-Jan 00:30 bareos-storage JobId 16576: Sending spooled attrs to the Director. Despooling 36,243,056 bytes ...
28-Jan 00:48 bareos-storage JobId 16579: User defined maximum volume capacity 107,374,182,400 exceeded on device "FileStorage1" (/filestorage/volumes).
28-Jan 00:48 bareos-storage JobId 16579: End of medium on Volume "0058" Bytes=107,373,290,049 Blocks=102,401 at 28-Jan-2015 00:48.
28-Jan 00:48 bareos-storage JobId 16579: Recycled volume "0104" on device "FileStorage1" (/filestorage/volumes), all previous data lost.
28-Jan 00:48 bareos-storage JobId 16579: New volume "0104" mounted on device "FileStorage1" (/filestorage/volumes) at 28-Jan-2015 00:48.
28-Jan 01:06 bareos-storage JobId 16562: User defined maximum volume capacity 107,374,182,400 exceeded on device "FileStorage2" (/filestorage/volumes).
28-Jan 01:06 bareos-storage JobId 16562: End of medium on Volume "0036" Bytes=107,373,238,407 Blocks=102,400 at 28-Jan-2015 01:06.
28-Jan 01:07 bareos-storage JobId 16562: Recycled volume "0071" on device "FileStorage2" (/filestorage/volumes), all previous data lost.
28-Jan 01:07 bareos-storage JobId 16562: New volume "0071" mounted on device "FileStorage2" (/filestorage/volumes) at 28-Jan-2015 01:07.
28-Jan 01:36 bareos-storage JobId 16565: Elapsed time=01:05:24, Transfer rate=2.499 M Bytes/second
28-Jan 01:36 bareos-storage JobId 16565: Sending spooled attrs to the Director. Despooling 282,891,508 bytes ...
28-Jan 01:59 bareos-storage JobId 16575: User defined maximum volume capacity 107,374,182,400 exceeded on device "FileStorage2" (/filestorage/volumes).
28-Jan 01:59 bareos-storage JobId 16575: End of medium on Volume "0071" Bytes=107,373,179,552 Blocks=102,400 at 28-Jan-2015 01:59.
28-Jan 02:00 bareos-storage JobId 16575: Recycled volume "0022" on device "FileStorage2" (/filestorage/volumes), all previous data lost.
28-Jan 02:00 bareos-storage JobId 16575: New volume "0022" mounted on device "FileStorage2" (/filestorage/volumes) at 28-Jan-2015 02:00.
28-Jan 02:04 bareos-storage JobId 16575: Elapsed time=01:48:12, Transfer rate=8.891 M Bytes/second
28-Jan 02:04 bareos-storage JobId 16575: Sending spooled attrs to the Director. Despooling 84,680,155 bytes ...
28-Jan 02:07 bareos-storage JobId 16581: User defined maximum volume capacity 107,374,182,400 exceeded on device "FileStorage1" (/filestorage/volumes).
28-Jan 02:07 bareos-storage JobId 16581: End of medium on Volume "0104" Bytes=107,373,133,951 Blocks=102,399 at 28-Jan-2015 02:07.
28-Jan 02:08 bareos-storage JobId 16581: Recycled volume "0028" on device "FileStorage1" (/filestorage/volumes), all previous data lost.
28-Jan 02:08 bareos-storage JobId 16581: New volume "0028" mounted on device "FileStorage1" (/filestorage/volumes) at 28-Jan-2015 02:08.
28-Jan 02:09 bareos-storage JobId 16562: Elapsed time=04:06:41, Transfer rate=8.062 M Bytes/second
28-Jan 02:09 bareos-storage JobId 16562: Sending spooled attrs to the Director. Despooling 344,122,159 bytes ...
28-Jan 02:21 bareos-storage JobId 16556: Elapsed time=04:21:32, Transfer rate=4.728 M Bytes/second
28-Jan 02:21 bareos-storage JobId 16556: Sending spooled attrs to the Director. Despooling 285,120,110 bytes ...
28-Jan 02:26 bareos-storage JobId 16557: User defined maximum volume capacity 107,374,182,400 exceeded on device "FileStorage3" (/filestorage/volumes).
28-Jan 02:26 bareos-storage JobId 16557: End of medium on Volume "0096" Bytes=107,374,019,002 Blocks=101,982 at 28-Jan-2015 02:26.
28-Jan 02:27 bareos-storage JobId 16557: Recycled volume "0014" on device "FileStorage3" (/filestorage/volumes), all previous data lost.
28-Jan 02:27 bareos-storage JobId 16557: New volume "0014" mounted on device "FileStorage3" (/filestorage/volumes) at 28-Jan-2015 02:27.
28-Jan 02:29 bareos-storage JobId 16573: Elapsed time=00:24:55, Transfer rate=5.192 M Bytes/second
28-Jan 02:29 bareos-storage JobId 16573: Sending spooled attrs to the Director. Despooling 82,710,650 bytes ...
28-Jan 02:40 bareos-storage JobId 16558: Elapsed time=04:23:36, Transfer rate=5.757 M Bytes/second
28-Jan 02:40 bareos-storage JobId 16558: Sending spooled attrs to the Director. Despooling 478,272,299 bytes ...
28-Jan 02:55 bareos-storage JobId 16580: User defined maximum volume capacity 107,374,182,400 exceeded on device "FileStorage2" (/filestorage/volumes).
28-Jan 02:55 bareos-storage JobId 16580: End of medium on Volume "0022" Bytes=107,373,414,339 Blocks=102,401 at 28-Jan-2015 02:55.
28-Jan 02:57 bareos-storage: ABORTING due to ERROR in smartall.c:209
Overrun buffer: len=129 addr=15ef608 allocated: vol_mgr.c:896 called from dlist.c:326
28-Jan 02:57 bareos-storage JobId 16580: Recycled volume "0015" on device "FileStorage2" (/filestorage/volumes), all previous data lost.
28-Jan 02:57 bareos-storage JobId 16580: New volume "0015" mounted on device "FileStorage2" (/filestorage/volumes) at 28-Jan-2015 02:57.
bareos.log (12,349 bytes)   
pstorz

pstorz

2015-01-29 10:11

administrator   ~0001231

Hello scottles,

thank you for submitting this bugreport. Now we need to catch the bug.

In order to do this I would like to ask you to do the following:

Can you please install the packages "bareos-dbg" and "gdb"?
This will enable us to get backtraces of the crashed daemon.

This should produce backtraces when the next crash happens, and that will hopefully give us a bit more information when this problem happens.

Please provide the info that the backtrace produces when it happens.


Thank you very much

Philipp
scottles

scottles

2015-01-29 12:12

reporter   ~0001233

Hi Philipp,

I have installed bareos-dbg and gdb. I assume the debugging output just goes in to the mail spool?
pstorz

pstorz

2015-01-29 12:30

administrator   ~0001234

Hello,

the output goes to /var/lib/bareos. There will be the .traceback file and the .core files:



Here is an example of a trace ouput:
============================
Created /var/lib/bareos/bareos-dir.core.5118 for doing postmortem debugging
[New LWP 5120]
[New LWP 5122]
[New LWP 5123]
[New LWP 5124]
[New LWP 5118]

warning: Can't read pathname for load map: Input/output error.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/sbin/bareos-dir'.
#0 0x00007fbcb27528ad in nanosleep () from /lib/x86_64-linux-gnu/libpthread.so.0
$1 = 0x6987c0 "ci-Debian-7-0-64-918a1ff57eeaf1d1-dir"
$2 = 0x1767088 "bareos-dir"
$3 = 0x17670c8 "/usr/sbin/bareos-dir"
$4 = 0x179fa18 "PostgreSQL"
$5 = 0x7fbcb3650d55 "14.2.2 (12 December 2014)"
$6 = 0x7fbcb3650d41 "x86_64-pc-linux-gnu"
$7 = 0x7fbcb3650d3a "debian"
$8 = 0x7fbcb3650d1c "Debian GNU/Linux 7.0 (wheezy)"
$9 = "ci-Debian-7-0-64-918a1ff57eeaf1d1", '\000' <repeats 222 times>
$10 = 0x7fbcb36512b8 "debian Debian GNU/Linux 7.0 (wheezy)"
Environment variable "TestName" not defined.
#0 0x00007fbcb27528ad in nanosleep () from /lib/x86_64-linux-gnu/libpthread.so.0
0000001 0x00007fbcb3622c59 in bmicrosleep (sec=30, usec=0) at bsys.c:100
0000002 0x00007fbcb3632c54 in check_deadlock () at lockmgr.c:567
0000003 0x00007fbcb274ab50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
0000004 0x00007fbcb16e2e6d in clone () from /lib/x86_64-linux-gnu/libc.so.6
0000005 0x0000000000000000 in ?? ()

Thread 5 (LWP 5118):
#0 0x00007fbcb2752c1d in waitpid () from /lib/x86_64-linux-gnu/libpthread.so.0
0000001 0x00007fbcb3640d01 in signal_handler (sig=11) at signal.c:228
0000002 <signal handler called>
0000003 0x00007fbcb27528ad in nanosleep () from /lib/x86_64-linux-gnu/libpthread.so.0
0000004 0x00007fbcb3622c59 in bmicrosleep (sec=60, usec=0) at bsys.c:100
0000005 0x000000000044097e in wait_for_next_job (one_shot_job_to_run=<optimized out>) at scheduler.c:124
0000006 0x000000000040ed37 in main (argc=<optimized out>, argv=<optimized out>) at dird.c:375

Thread 4 (LWP 5124):
#0 0x00007fbcb274f64b in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
0000001 0x00007fbcb3633210 in bthread_cond_timedwait_p (cond=0x699620, m=0x6995e0, abstime=0x7fbcacc18d10, file=0x47b05d "stats.c", line=117) at lockmgr.c:817
0000002 0x0000000000440f9a in wait_for_next_run () at stats.c:117
0000003 statistics_thread_runner (arg=arg@entry=0x0) at stats.c:290
0000004 0x00007fbcb3632bb2 in lmgr_thread_launcher (x=0x1799178) at lockmgr.c:932
0000005 0x00007fbcb274ab50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
0000006 0x00007fbcb16e2e6d in clone () from /lib/x86_64-linux-gnu/libc.so.6
0000007 0x0000000000000000 in ?? ()

Thread 3 (LWP 5123):
#0 0x00007fbcb274f64b in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
0000001 0x00007fbcb3633210 in bthread_cond_timedwait_p (cond=0x7fbcb3863680, m=0x7fbcb3863640, abstime=0x7fbcad419de0, file=0x7fbcb3655152 "watchdog.c", line=313) at lockmgr.c:817
0000002 0x00007fbcb364a40a in watchdog_thread (arg=arg@entry=0x0) at watchdog.c:313
0000003 0x00007fbcb3632bb2 in lmgr_thread_launcher (x=0x1795ff8) at lockmgr.c:932
0000004 0x00007fbcb274ab50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
0000005 0x00007fbcb16e2e6d in clone () from /lib/x86_64-linux-gnu/libc.so.6
0000006 0x0000000000000000 in ?? ()

Thread 2 (LWP 5122):
#0 0x00007fbcb16d8223 in poll () from /lib/x86_64-linux-gnu/libc.so.6
0000001 0x00007fbcb3617933 in bnet_thread_server_tcp (addr_list=0x7fbcadc19f30, max_clients=20, sockfds=0x1798048, client_wq=0x6996a0, nokeepalive=false, handle_client_request=0x460790 <handle_UA_client_request(void*)>) at bnet_server_tcp.c:298
0000002 0x000000000046037f in connect_thread (arg=arg@entry=0x1767f88) at ua_server.c:92
0000003 0x00007fbcb3632bb2 in lmgr_thread_launcher (x=0x1796e28) at lockmgr.c:932
0000004 0x00007fbcb274ab50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
0000005 0x00007fbcb16e2e6d in clone () from /lib/x86_64-linux-gnu/libc.so.6
0000006 0x0000000000000000 in ?? ()

Thread 1 (LWP 5120):
#0 0x00007fbcb27528ad in nanosleep () from /lib/x86_64-linux-gnu/libpthread.so.0
0000001 0x00007fbcb3622c59 in bmicrosleep (sec=30, usec=0) at bsys.c:100
0000002 0x00007fbcb3632c54 in check_deadlock () at lockmgr.c:567
0000003 0x00007fbcb274ab50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
0000004 0x00007fbcb16e2e6d in clone () from /lib/x86_64-linux-gnu/libc.so.6
0000005 0x0000000000000000 in ?? ()
#0 0x00007fbcb27528ad in nanosleep () from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
0000001 0x00007fbcb3622c59 in bmicrosleep (sec=30, usec=0) at bsys.c:100
100 bsys.c: No such file or directory.
timeout = {tv_sec = 30, tv_nsec = 0}
tv = {tv_sec = 0, tv_usec = 140448440198644}
tz = {tz_minuteswest = 0, tz_dsttime = 0}
status = <optimized out>
0000002 0x00007fbcb3632c54 in check_deadlock () at lockmgr.c:567
567 lockmgr.c: No such file or directory.
__cancel_buf = {__cancel_jmp_buf = {{__cancel_jmp_buf = {0, 2602541748363333316, 140448424612256, 140448402151872, 140448458272832, 3, -2637007306732990780, -2637012440356090172}, __mask_was_saved = 0}}, __pad = {0x7fbcb11e8f60, 0x0, 0x18, 0x7fbcb11e8f60}}
__cancel_routine = 0x7fbcb3632b50 <cln_hdl(void*)>
__not_first_call = <optimized out>
old = 0
0000003 0x00007fbcb274ab50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
0000004 0x00007fbcb16e2e6d in clone () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
0000005 0x0000000000000000 in ?? ()
No symbol table info available.
#0 0x0000000000000000 in ?? ()
No symbol table info available.
#0 0x0000000000000000 in ?? ()
No symbol table info available.
----- END of /var/lib/bareos/bareos.5118.traceback ----
----- checking for /usr/bin/bareos-dir in traceback ----
Created /var/lib/bareos/bareos-dir.core.5118 for doing postmortem debugging
Core was generated by `/usr/sbin/bareos-dir'.
$1 = 0x6987c0 "ci-Debian-7-0-64-918a1ff57eeaf1d1-dir"
$2 = 0x1767088 "bareos-dir"
$3 = 0x17670c8 "/usr/sbin/bareos-dir"
0000006 0x000000000040ed37 in main (argc=<optimized out>, argv=<optimized out>) at dird.c:375
100 bsys.c: No such file or directory.
567 lockmgr.c: No such file or directory.

=======================



scottles

scottles

2015-01-29 12:44

reporter   ~0001235

Great, thanks. I will provide a traceback and core files the next time it crashes (I suspect it will crash sometime over the weekend).
scottles

scottles

2015-02-03 12:15

reporter   ~0001253

Hi, last night the bareos-sd daemon crashed, but the traceback file is less than helpful and no core file was created. It contained the following:

root@bareos-storage:/var/lib/bareos# cat bareos.3315.traceback
Created /var/lib/bareos/bareos-sd.core.3315 for doing postmortem debugging
ptrace: Operation not permitted.
/var/lib/bareos/3315: No such file or directory.
$1 = 0
/usr/lib/bareos/scripts/btraceback.gdb:2: Error in sourced command file:
No symbol "exename" in current context.


Googling around suggests I need to ensure /usr/bin/gdb has the setuid flag set as gdb needs to run as root. I have set the setuid bit on gdb and will have to wait for bareos-sd to crash again.
scottles

scottles

2015-02-04 11:45

reporter   ~0001256

Success! Here is the traceback:

Created /var/lib/bareos/bareos-sd.core.10735 for doing postmortem debugging
[New LWP 10736]
[New LWP 10739]
[New LWP 11434]
[New LWP 11435]
[New LWP 11436]
[New LWP 11439]
[New LWP 13094]
[New LWP 13146]
[New LWP 13164]
[New LWP 13172]
[New LWP 13215]
[New LWP 13293]
[New LWP 13957]
[New LWP 14035]
[New LWP 14072]
[New LWP 14092]
[New LWP 10735]

warning: Can't read pathname for load map: Input/output error.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7ffff0c00000
Core was generated by `/usr/sbin/bareos-sd'.
#0 0x00007f9c6312691d in nanosleep () from /lib/x86_64-linux-gnu/libpthread.so.0
$1 = 0x626220 "bareos-storage"
$2 = 0x1068058 "bareos-sd"
$3 = 0x1068098 "/usr/sbin/bareos-sd"
$4 = 0x0
$5 = 0x7f9c64035d55 "14.2.2 (12 December 2014)"
$6 = 0x7f9c64035d41 "x86_64-pc-linux-gnu"
$7 = 0x7f9c64035d3a "debian"
$8 = 0x7f9c64035d1c "Debian GNU/Linux 7.0 (wheezy)"
$9 = "bareos-storage", '\000' <repeats 241 times>
$10 = 0x7f9c640362b8 "debian Debian GNU/Linux 7.0 (wheezy)"
Environment variable "TestName" not defined.
#0 0x00007f9c6312691d in nanosleep () from /lib/x86_64-linux-gnu/libpthread.so.0
0000001 0x00007f9c64007c59 in bmicrosleep (sec=30, usec=0) at bsys.c:100
0000002 0x00007f9c64017c54 in check_deadlock () at lockmgr.c:567
0000003 0x00007f9c6311eb50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
0000004 0x00007f9c6209c70d in clone () from /lib/x86_64-linux-gnu/libc.so.6
0000005 0x0000000000000000 in ?? ()

Thread 17 (LWP 10735):
#0 0x00007f9c62091ac3 in poll () from /lib/x86_64-linux-gnu/libc.so.6
0000001 0x00007f9c63ffc933 in bnet_thread_server_tcp (addr_list=0x7ffff0a7b4c0, max_clients=41, sockfds=0x108fe48, client_wq=0x626600, nokeepalive=false, handle_client_request=0x410f60 <handle_connection_request(void*)>) at bnet_server_tcp.c:298
0000002 0x0000000000409021 in main (argc=<optimized out>, argv=<optimized out>) at stored.c:299

Thread 16 (LWP 14092):
#0 0x00007f9c63123344 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
0000001 0x00007f9c64022e86 in rwl_writelock_p (rwl=0x7f9c646b28a0, file=<optimized out>, line=<optimized out>) at rwlock.c:236
0000002 0x00007f9c64490a51 in _lock_reservations (file=file@entry=0x7f9c644a6685 "reserve.c", line=line@entry=354) at reserve.c:104
0000003 0x00007f9c644933e7 in use_device_cmd (jcr=0x13b75f8) at reserve.c:354
0000004 use_cmd (jcr=0x13b75f8) at reserve.c:67
0000005 0x000000000041136f in handle_director_connection (dir=0x7f9c5c30f7e8, job_name=<optimized out>) at dir_cmd.c:281
0000006 handle_connection_request (arg=0x7f9c5c30f7e8) at dir_cmd.c:362
0000007 0x00007f9c6402fc17 in workq_server (arg=arg@entry=0x626600) at workq.c:335
0000008 0x00007f9c64017bb2 in lmgr_thread_launcher (x=0x7f9c5c009248) at lockmgr.c:932
0000009 0x00007f9c6311eb50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
0000010 0x00007f9c6209c70d in clone () from /lib/x86_64-linux-gnu/libc.so.6
0000011 0x0000000000000000 in ?? ()

Thread 15 (LWP 14072):
#0 0x00007f9c63126c8d in waitpid () from /lib/x86_64-linux-gnu/libpthread.so.0
0000001 0x00007f9c64025d01 in signal_handler (sig=11) at signal.c:228
0000002 <signal handler called>
0000003 e_msg (file=<optimized out>, line=<optimized out>, type=<optimized out>, level=<optimized out>, fmt=0x7f9c640388e8 "Overrun buffer: len=%d addr=%p allocated: %s:%d called from %s:%d\n") at message.c:1403
0000004 0x00007f9c64026a81 in sm_free (file=0x7f9c6403440c "dlist.c", line=326, fp=0x10b2e08) at smartall.c:209
0000005 0x00007f9c64010cb2 in dlist::destroy (this=0x10a74a8) at dlist.c:326
0000006 0x00007f9c6449b894 in ~dlist (this=<optimized out>, __in_chrg=<optimized out>) at ../lib/dlist.h:62
0000007 free_volume_list () at vol_mgr.c:788
0000008 0x00007f9c6449d705 in free_temp_vol_list (temp_vol_list=0x10a74a8) at vol_mgr.c:923
0000009 0x00007f9c64492c67 in find_suitable_device_for_job (jcr=jcr@entry=0x11b5f28, rctx=...) at reserve.c:506
0000010 0x00007f9c6449352b in use_device_cmd (jcr=0x11b5f28) at reserve.c:330
0000011 use_cmd (jcr=0x11b5f28) at reserve.c:67
0000012 0x000000000041136f in handle_director_connection (dir=0x7f9c5c415fa8, job_name=<optimized out>) at dir_cmd.c:281
0000013 handle_connection_request (arg=0x7f9c5c415fa8) at dir_cmd.c:362
0000014 0x00007f9c6402fc17 in workq_server (arg=arg@entry=0x626600) at workq.c:335
0000015 0x00007f9c64017bb2 in lmgr_thread_launcher (x=0x7f9c5c412098) at lockmgr.c:932
0000016 0x00007f9c6311eb50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
0000017 0x00007f9c6209c70d in clone () from /lib/x86_64-linux-gnu/libc.so.6
#18 0x0000000000000000 in ?? ()

Thread 14 (LWP 14035):
#0 0x00007f9c63123344 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
0000001 0x00007f9c640182c5 in bthread_cond_wait_p (cond=0x1091000, m=0x1090f48, file=0x7f9c644a0bcf "lock.c", line=387) at lockmgr.c:832
0000002 0x00007f9c6448b527 in rLock (this=0x1090f08, locked=<optimized out>) at lock.c:387
0000003 DEVICE::rLock (this=0x1090f08, locked=<optimized out>) at lock.c:370
0000004 0x00007f9c6448140e in DCR::write_block_to_device (this=0x10b5078) at block.c:368
0000005 0x00007f9c6448fe75 in DCR::write_record (this=0x10b5078) at record.c:470
0000006 0x0000000000409a7c in do_append_data (jcr=0x10949e8, bs=0x7f9c5c4138a8, what=0x41f780 "FD") at append.c:222
0000007 0x0000000000411d03 in append_data_cmd (jcr=0x10949e8) at fd_cmds.c:269
0000008 0x0000000000412121 in do_fd_commands (jcr=0x10949e8) at fd_cmds.c:225
0000009 0x0000000000412322 in run_job (jcr=0x10949e8) at fd_cmds.c:181
0000010 0x0000000000412d45 in do_job_run (jcr=0x10949e8) at job.c:237
0000011 0x000000000041136f in handle_director_connection (dir=0x7f9c5c310098, job_name=<optimized out>) at dir_cmd.c:281
0000012 handle_connection_request (arg=0x7f9c5c310098) at dir_cmd.c:362
0000013 0x00007f9c6402fc17 in workq_server (arg=arg@entry=0x626600) at workq.c:335
0000014 0x00007f9c64017bb2 in lmgr_thread_launcher (x=0x7f9c5c412098) at lockmgr.c:932
0000015 0x00007f9c6311eb50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
0000016 0x00007f9c6209c70d in clone () from /lib/x86_64-linux-gnu/libc.so.6
0000017 0x0000000000000000 in ?? ()

Thread 13 (LWP 13957):
#0 0x00007f9c63123344 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
0000001 0x00007f9c640182c5 in bthread_cond_wait_p (cond=0x1091000, m=0x1090f48, file=0x7f9c644a0bcf "lock.c", line=387) at lockmgr.c:832
0000002 0x00007f9c6448b527 in rLock (this=0x1090f08, locked=<optimized out>) at lock.c:387
0000003 DEVICE::rLock (this=0x1090f08, locked=<optimized out>) at lock.c:370
0000004 0x00007f9c6448140e in DCR::write_block_to_device (this=0x109de48) at block.c:368
0000005 0x00007f9c6448fe75 in DCR::write_record (this=0x109de48) at record.c:470
0000006 0x0000000000409a7c in do_append_data (jcr=0x10a6fe8, bs=0x7f9c5c413d48, what=0x41f780 "FD") at append.c:222
0000007 0x0000000000411d03 in append_data_cmd (jcr=0x10a6fe8) at fd_cmds.c:269
0000008 0x0000000000412121 in do_fd_commands (jcr=0x10a6fe8) at fd_cmds.c:225
0000009 0x0000000000412322 in run_job (jcr=0x10a6fe8) at fd_cmds.c:181
0000010 0x0000000000412d45 in do_job_run (jcr=0x10a6fe8) at job.c:237
0000011 0x000000000041136f in handle_director_connection (dir=0x7f9c5c4141a8, job_name=<optimized out>) at dir_cmd.c:281
0000012 handle_connection_request (arg=0x7f9c5c4141a8) at dir_cmd.c:362
0000013 0x00007f9c6402fc17 in workq_server (arg=arg@entry=0x626600) at workq.c:335
0000014 0x00007f9c64017bb2 in lmgr_thread_launcher (x=0x7f9c5c006668) at lockmgr.c:932
0000015 0x00007f9c6311eb50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
0000016 0x00007f9c6209c70d in clone () from /lib/x86_64-linux-gnu/libc.so.6
0000017 0x0000000000000000 in ?? ()

Thread 12 (LWP 13293):
#0 0x00007f9c63123344 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
0000001 0x00007f9c64022e86 in rwl_writelock_p (rwl=0x7f9c646b28a0, file=<optimized out>, line=<optimized out>) at rwlock.c:236
0000002 0x00007f9c64490a51 in _lock_reservations (file=file@entry=0x7f9c644a6685 "reserve.c", line=line@entry=354) at reserve.c:104
0000003 0x00007f9c644933e7 in use_device_cmd (jcr=0x108e308) at reserve.c:354
0000004 use_cmd (jcr=0x108e308) at reserve.c:67
0000005 0x000000000041136f in handle_director_connection (dir=0x7f9c5c412968, job_name=<optimized out>) at dir_cmd.c:281
0000006 handle_connection_request (arg=0x7f9c5c412968) at dir_cmd.c:362
0000007 0x00007f9c6402fc17 in workq_server (arg=arg@entry=0x626600) at workq.c:335
0000008 0x00007f9c64017bb2 in lmgr_thread_launcher (x=0x7f9c5c30ff18) at lockmgr.c:932
0000009 0x00007f9c6311eb50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
0000010 0x00007f9c6209c70d in clone () from /lib/x86_64-linux-gnu/libc.so.6
0000011 0x0000000000000000 in ?? ()

Thread 11 (LWP 13215):
#0 0x00007f9c631261fd in read () from /lib/x86_64-linux-gnu/libpthread.so.0
0000001 0x00007f9c64005a9f in read_nbytes (nbytes=<optimized out>, ptr=0x7f9c50fe882c "\234\177", this=<optimized out>) at bsock_tcp.c:906
0000002 BSOCK_TCP::read_nbytes (this=0x7f9c5c0067e8, ptr=<optimized out>, nbytes=4) at bsock_tcp.c:890
0000003 0x00007f9c640053cc in BSOCK_TCP::recv (this=0x7f9c5c0067e8) at bsock_tcp.c:478
0000004 0x00007f9c63ffbc8d in bget_msg (sock=0x7f9c5c0067e8) at bget_msg.c:50
0000005 0x0000000000409978 in do_append_data (jcr=0x1097b68, bs=0x7f9c5c0067e8, what=0x41f780 "FD") at append.c:208
0000006 0x0000000000411d03 in append_data_cmd (jcr=0x1097b68) at fd_cmds.c:269
0000007 0x0000000000412121 in do_fd_commands (jcr=0x1097b68) at fd_cmds.c:225
0000008 0x0000000000412322 in run_job (jcr=0x1097b68) at fd_cmds.c:181
0000009 0x0000000000412d45 in do_job_run (jcr=0x1097b68) at job.c:237
0000010 0x000000000041136f in handle_director_connection (dir=0x7f9c5c413768, job_name=<optimized out>) at dir_cmd.c:281
0000011 handle_connection_request (arg=0x7f9c5c413768) at dir_cmd.c:362
0000012 0x00007f9c6402fc17 in workq_server (arg=arg@entry=0x626600) at workq.c:335
0000013 0x00007f9c64017bb2 in lmgr_thread_launcher (x=0x7f9c5c30ff18) at lockmgr.c:932
0000014 0x00007f9c6311eb50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
0000015 0x00007f9c6209c70d in clone () from /lib/x86_64-linux-gnu/libc.so.6
0000016 0x0000000000000000 in ?? ()

Thread 10 (LWP 13172):
#0 0x00007f9c63123344 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
0000001 0x00007f9c640182c5 in bthread_cond_wait_p (cond=0x1091000, m=0x1090f48, file=0x7f9c644a0bcf "lock.c", line=387) at lockmgr.c:832
0000002 0x00007f9c6448b527 in rLock (this=0x1090f08, locked=<optimized out>) at lock.c:387
0000003 DEVICE::rLock (this=0x1090f08, locked=<optimized out>) at lock.c:370
0000004 0x00007f9c6448140e in DCR::write_block_to_device (this=0x18ba248) at block.c:368
0000005 0x00007f9c6448fe75 in DCR::write_record (this=0x18ba248) at record.c:470
0000006 0x0000000000409a7c in do_append_data (jcr=0x10d54f8, bs=0x7f9c5c414468, what=0x41f780 "FD") at append.c:222
0000007 0x0000000000411d03 in append_data_cmd (jcr=0x10d54f8) at fd_cmds.c:269
0000008 0x0000000000412121 in do_fd_commands (jcr=0x10d54f8) at fd_cmds.c:225
0000009 0x0000000000412322 in run_job (jcr=0x10d54f8) at fd_cmds.c:181
0000010 0x0000000000412d45 in do_job_run (jcr=0x10d54f8) at job.c:237
0000011 0x000000000041136f in handle_director_connection (dir=0x7f9c5c411f58, job_name=<optimized out>) at dir_cmd.c:281
0000012 handle_connection_request (arg=0x7f9c5c411f58) at dir_cmd.c:362
0000013 0x00007f9c6402fc17 in workq_server (arg=arg@entry=0x626600) at workq.c:335
0000014 0x00007f9c64017bb2 in lmgr_thread_launcher (x=0x7f9c5c30ff18) at lockmgr.c:932
0000015 0x00007f9c6311eb50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
0000016 0x00007f9c6209c70d in clone () from /lib/x86_64-linux-gnu/libc.so.6
0000017 0x0000000000000000 in ?? ()

Thread 9 (LWP 13164):
#0 0x00007f9c631261fd in read () from /lib/x86_64-linux-gnu/libpthread.so.0
0000001 0x00007f9c64005a9f in read_nbytes (nbytes=<optimized out>, ptr=0x7f9c5c417dd4 "\022\373\353<\216k.́W\363\340\034dūs\272F\363\354\330\374\001\320o\277\263\177\376\066\260s\364\317\061\\\213\357\222\033zl\376\351\037\031\346w\361%\363Df\351\260\061\003r\017l\255\261\063\247\354\275\371\060\067\203\237A\374\305\306\033\177:\204\263\322g\274W\355[\035\b\307e\303\331s\272M\006\350\241\334\021\377\250\236-\317 ~\227\017\360\312\337\373\334@?\371P~\205ܰ\025\216\215\223ɻ\363\263\373\005\362P8\223\330^,\243\226\305\177\313\a\360\345b\037q\253\343;\237s;\016[\203\023zu\215r^\216LJ\313\333\373\277\372\030\350", <incomplete sequence \371>, this=<optimized out>) at bsock_tcp.c:906
0000002 BSOCK_TCP::read_nbytes (this=0x7f9c5c20f578, ptr=<optimized out>, nbytes=43549) at bsock_tcp.c:890
0000003 0x00007f9c64005606 in BSOCK_TCP::recv (this=0x7f9c5c20f578) at bsock_tcp.c:545
0000004 0x00007f9c63ffbc8d in bget_msg (sock=0x7f9c5c20f578) at bget_msg.c:50
0000005 0x0000000000409978 in do_append_data (jcr=0x109d268, bs=0x7f9c5c20f578, what=0x41f780 "FD") at append.c:208
0000006 0x0000000000411d03 in append_data_cmd (jcr=0x109d268) at fd_cmds.c:269
0000007 0x0000000000412121 in do_fd_commands (jcr=0x109d268) at fd_cmds.c:225
0000008 0x0000000000412322 in run_job (jcr=0x109d268) at fd_cmds.c:181
0000009 0x0000000000412d45 in do_job_run (jcr=0x109d268) at job.c:237
0000010 0x000000000041136f in handle_director_connection (dir=0x7f9c5c005448, job_name=<optimized out>) at dir_cmd.c:281
0000011 handle_connection_request (arg=0x7f9c5c005448) at dir_cmd.c:362
0000012 0x00007f9c6402fc17 in workq_server (arg=arg@entry=0x626600) at workq.c:335
0000013 0x00007f9c64017bb2 in lmgr_thread_launcher (x=0x7f9c5c30ff18) at lockmgr.c:932
0000014 0x00007f9c6311eb50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
0000015 0x00007f9c6209c70d in clone () from /lib/x86_64-linux-gnu/libc.so.6
0000016 0x0000000000000000 in ?? ()

Thread 8 (LWP 13146):
#0 0x00007f9c631261fd in read () from /lib/x86_64-linux-gnu/libpthread.so.0
0000001 0x00007f9c64005a9f in read_nbytes (nbytes=<optimized out>, ptr=0x1143e34 "0\303\034l:\300\212\267k\002I\ti\366=U\001\070\"\213п\n]|\275<\371\201\022\375^ \314\017û\370\255\005ė\351\322a\361}co\222\332H\332%\357\270\341K\237]\300f\335\332\326\v\177_\223\243\342\024\005\266n\214\253\264$\350(\223\303E5F\351c\240x?\306*p\f\322(q*\332@\\\372^=\027\064\324?rz+uG\242\220zf\aۣ\202}\270Gm\314&CG\001\253\337\326jZ\323\374\327\255\372\365\244\062~\023[\t\fǧ\267\250\210\200\211#\005\365P\322\006\f\361\034\256\317\022'P\272\301V\264\263\232\a\035\061\263;\204\026\063Z\231!\370\vVŰ\020v\267ͺ\330", <incomplete sequence \367\203>..., this=<optimized out>) at bsock_tcp.c:906
0000002 BSOCK_TCP::read_nbytes (this=0x7f9c5c4120e8, ptr=<optimized out>, nbytes=59207) at bsock_tcp.c:890
0000003 0x00007f9c64005606 in BSOCK_TCP::recv (this=0x7f9c5c4120e8) at bsock_tcp.c:545
0000004 0x00007f9c63ffbc8d in bget_msg (sock=0x7f9c5c4120e8) at bget_msg.c:50
0000005 0x0000000000409978 in do_append_data (jcr=0x10a46a8, bs=0x7f9c5c4120e8, what=0x41f780 "FD") at append.c:208
0000006 0x0000000000411d03 in append_data_cmd (jcr=0x10a46a8) at fd_cmds.c:269
0000007 0x0000000000412121 in do_fd_commands (jcr=0x10a46a8) at fd_cmds.c:225
0000008 0x0000000000412322 in run_job (jcr=0x10a46a8) at fd_cmds.c:181
0000009 0x0000000000412d45 in do_job_run (jcr=0x10a46a8) at job.c:237
0000010 0x000000000041136f in handle_director_connection (dir=0x7f9c5c00dbc8, job_name=<optimized out>) at dir_cmd.c:281
0000011 handle_connection_request (arg=0x7f9c5c00dbc8) at dir_cmd.c:362
0000012 0x00007f9c6402fc17 in workq_server (arg=arg@entry=0x626600) at workq.c:335
0000013 0x00007f9c64017bb2 in lmgr_thread_launcher (x=0x7f9c5c30ff18) at lockmgr.c:932
0000014 0x00007f9c6311eb50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
0000015 0x00007f9c6209c70d in clone () from /lib/x86_64-linux-gnu/libc.so.6
0000016 0x0000000000000000 in ?? ()

Thread 7 (LWP 13094):
#0 0x00007f9c631261fd in read () from /lib/x86_64-linux-gnu/libpthread.so.0
0000001 0x00007f9c64005a9f in read_nbytes (nbytes=<optimized out>, ptr=0x7f9c567f382c "V\026", this=<optimized out>) at bsock_tcp.c:906
0000002 BSOCK_TCP::read_nbytes (this=0x7f9c5c310608, ptr=<optimized out>, nbytes=4) at bsock_tcp.c:890
0000003 0x00007f9c640053cc in BSOCK_TCP::recv (this=0x7f9c5c310608) at bsock_tcp.c:478
0000004 0x00007f9c63ffbc8d in bget_msg (sock=0x7f9c5c310608) at bget_msg.c:50
0000005 0x0000000000409978 in do_append_data (jcr=0x10ac308, bs=0x7f9c5c310608, what=0x41f780 "FD") at append.c:208
0000006 0x0000000000411d03 in append_data_cmd (jcr=0x10ac308) at fd_cmds.c:269
0000007 0x0000000000412121 in do_fd_commands (jcr=0x10ac308) at fd_cmds.c:225
0000008 0x0000000000412322 in run_job (jcr=0x10ac308) at fd_cmds.c:181
0000009 0x0000000000412d45 in do_job_run (jcr=0x10ac308) at job.c:237
0000010 0x000000000041136f in handle_director_connection (dir=0x7f9c5c414328, job_name=<optimized out>) at dir_cmd.c:281
0000011 handle_connection_request (arg=0x7f9c5c414328) at dir_cmd.c:362
0000012 0x00007f9c6402fc17 in workq_server (arg=arg@entry=0x626600) at workq.c:335
0000013 0x00007f9c64017bb2 in lmgr_thread_launcher (x=0x7f9c5c00bb88) at lockmgr.c:932
0000014 0x00007f9c6311eb50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
0000015 0x00007f9c6209c70d in clone () from /lib/x86_64-linux-gnu/libc.so.6
0000016 0x0000000000000000 in ?? ()

Thread 6 (LWP 11439):
#0 0x00007f9c631261fd in read () from /lib/x86_64-linux-gnu/libpthread.so.0
0000001 0x00007f9c64005a9f in read_nbytes (nbytes=<optimized out>, ptr=0x11093bc "\030\367`\372\377\376\374\210\002&QoBB\204\225\203\371;O\217\001\347X\273\212K\027\376`\310\005\274#\306z\270\027n\250\003\023*\352\060=\020S\020]\261\216\204\373\212\217\066:R;qb9\021SY\233\270\b$X],K\270y\016\033\277\377\261\023A\201;s\337@ӥ F\316\202\067\374!Ɏ.\"[\026\324\376\r\352\307\326n;\273\241\032\266", this=<optimized out>) at bsock_tcp.c:906
0000002 BSOCK_TCP::read_nbytes (this=0x7f9c5c413628, ptr=<optimized out>, nbytes=65058) at bsock_tcp.c:890
0000003 0x00007f9c64005606 in BSOCK_TCP::recv (this=0x7f9c5c413628) at bsock_tcp.c:545
0000004 0x00007f9c63ffbc8d in bget_msg (sock=0x7f9c5c413628) at bget_msg.c:50
0000005 0x0000000000409978 in do_append_data (jcr=0x10a88a8, bs=0x7f9c5c413628, what=0x41f780 "FD") at append.c:208
0000006 0x0000000000411d03 in append_data_cmd (jcr=0x10a88a8) at fd_cmds.c:269
0000007 0x0000000000412121 in do_fd_commands (jcr=0x10a88a8) at fd_cmds.c:225
0000008 0x0000000000412322 in run_job (jcr=0x10a88a8) at fd_cmds.c:181
0000009 0x0000000000412d45 in do_job_run (jcr=0x10a88a8) at job.c:237
0000010 0x000000000041136f in handle_director_connection (dir=0x7f9c5c007dd8, job_name=<optimized out>) at dir_cmd.c:281
0000011 handle_connection_request (arg=0x7f9c5c007dd8) at dir_cmd.c:362
0000012 0x00007f9c6402fc17 in workq_server (arg=arg@entry=0x626600) at workq.c:335
0000013 0x00007f9c64017bb2 in lmgr_thread_launcher (x=0x7f9c5c009248) at lockmgr.c:932
0000014 0x00007f9c6311eb50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
0000015 0x00007f9c6209c70d in clone () from /lib/x86_64-linux-gnu/libc.so.6
0000016 0x0000000000000000 in ?? ()

Thread 5 (LWP 11436):
#0 0x00007f9c631261fd in read () from /lib/x86_64-linux-gnu/libpthread.so.0
0000001 0x00007f9c64005a9f in read_nbytes (nbytes=<optimized out>, ptr=0x7f9c58ff882c "\234\177", this=<optimized out>) at bsock_tcp.c:906
0000002 BSOCK_TCP::read_nbytes (this=0x7f9c5c414b88, ptr=<optimized out>, nbytes=4) at bsock_tcp.c:890
0000003 0x00007f9c640053cc in BSOCK_TCP::recv (this=0x7f9c5c414b88) at bsock_tcp.c:478
0000004 0x00007f9c63ffbc8d in bget_msg (sock=0x7f9c5c414b88) at bget_msg.c:50
0000005 0x0000000000409978 in do_append_data (jcr=0x10a5468, bs=0x7f9c5c414b88, what=0x41f780 "FD") at append.c:208
0000006 0x0000000000411d03 in append_data_cmd (jcr=0x10a5468) at fd_cmds.c:269
0000007 0x0000000000412121 in do_fd_commands (jcr=0x10a5468) at fd_cmds.c:225
0000008 0x0000000000412322 in run_job (jcr=0x10a5468) at fd_cmds.c:181
0000009 0x0000000000412d45 in do_job_run (jcr=0x10a5468) at job.c:237
0000010 0x000000000041136f in handle_director_connection (dir=0x7f9c5c003e58, job_name=<optimized out>) at dir_cmd.c:281
0000011 handle_connection_request (arg=0x7f9c5c003e58) at dir_cmd.c:362
0000012 0x00007f9c6402fc17 in workq_server (arg=arg@entry=0x626600) at workq.c:335
0000013 0x00007f9c64017bb2 in lmgr_thread_launcher (x=0x7f9c5c0052c8) at lockmgr.c:932
0000014 0x00007f9c6311eb50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
0000015 0x00007f9c6209c70d in clone () from /lib/x86_64-linux-gnu/libc.so.6
0000016 0x0000000000000000 in ?? ()

Thread 4 (LWP 11435):
#0 0x00007f9c631261fd in read () from /lib/x86_64-linux-gnu/libpthread.so.0
0000001 0x00007f9c64005a9f in read_nbytes (nbytes=<optimized out>, ptr=0x7f9c5c442a14 "\223f\020\244\035\"cK%H\313\302\307\302\306F\307C\317\034\216\224ԣjr\307>ĺ\272\033){\313\v", this=<optimized out>) at bsock_tcp.c:906
0000002 BSOCK_TCP::read_nbytes (this=0x7f9c5c20e1d8, ptr=<optimized out>, nbytes=65528) at bsock_tcp.c:890
0000003 0x00007f9c64005606 in BSOCK_TCP::recv (this=0x7f9c5c20e1d8) at bsock_tcp.c:545
0000004 0x00007f9c63ffbc8d in bget_msg (sock=0x7f9c5c20e1d8) at bget_msg.c:50
0000005 0x0000000000409978 in do_append_data (jcr=0x10a2408, bs=0x7f9c5c20e1d8, what=0x41f780 "FD") at append.c:208
0000006 0x0000000000411d03 in append_data_cmd (jcr=0x10a2408) at fd_cmds.c:269
0000007 0x0000000000412121 in do_fd_commands (jcr=0x10a2408) at fd_cmds.c:225
0000008 0x0000000000412322 in run_job (jcr=0x10a2408) at fd_cmds.c:181
0000009 0x0000000000412d45 in do_job_run (jcr=0x10a2408) at job.c:237
0000010 0x000000000041136f in handle_director_connection (dir=0x7f9c5c002868, job_name=<optimized out>) at dir_cmd.c:281
0000011 handle_connection_request (arg=0x7f9c5c002868) at dir_cmd.c:362
0000012 0x00007f9c6402fc17 in workq_server (arg=arg@entry=0x626600) at workq.c:335
0000013 0x00007f9c64017bb2 in lmgr_thread_launcher (x=0x7f9c5c003cd8) at lockmgr.c:932
0000014 0x00007f9c6311eb50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
0000015 0x00007f9c6209c70d in clone () from /lib/x86_64-linux-gnu/libc.so.6
0000016 0x0000000000000000 in ?? ()

Thread 3 (LWP 11434):
#0 0x00007f9c62097817 in ftruncate64 () from /lib/x86_64-linux-gnu/libc.so.6
0000001 0x00007f9c6449e11b in unix_file_device::d_truncate (this=0x1090f08, dcr=0x12b6e18) at backends/unix_file_device.c:222
0000002 0x00007f9c64487d87 in truncate (dcr=0x12b6e18, this=<optimized out>) at dev.h:470
0000003 DCR::rewrite_volume_label (this=0x12b6e18, recycle=true) at label.c:533
0000004 0x00007f9c6448d90b in DCR::mount_next_write_volume (this=0x12b6e18) at mount.c:272
0000005 0x00007f9c64486a39 in fixup_device_block_write_error (dcr=0x12b6e18, retries=4) at device.c:118
0000006 0x00007f9c644813bb in DCR::write_block_to_device (this=0x12b6e18) at block.c:408
0000007 0x00007f9c6448fe75 in DCR::write_record (this=0x12b6e18) at record.c:470
0000008 0x0000000000409a7c in do_append_data (jcr=0x10a0f88, bs=0x7f9c5c30fb08, what=0x41f780 "FD") at append.c:222
0000009 0x0000000000411d03 in append_data_cmd (jcr=0x10a0f88) at fd_cmds.c:269
0000010 0x0000000000412121 in do_fd_commands (jcr=0x10a0f88) at fd_cmds.c:225
0000011 0x0000000000412322 in run_job (jcr=0x10a0f88) at fd_cmds.c:181
0000012 0x0000000000412d45 in do_job_run (jcr=0x10a0f88) at job.c:237
0000013 0x000000000041136f in handle_director_connection (dir=0x7f9c5c001278, job_name=<optimized out>) at dir_cmd.c:281
0000014 handle_connection_request (arg=0x7f9c5c001278) at dir_cmd.c:362
0000015 0x00007f9c6402fc17 in workq_server (arg=arg@entry=0x626600) at workq.c:335
0000016 0x00007f9c64017bb2 in lmgr_thread_launcher (x=0x7f9c5c0026e8) at lockmgr.c:932
0000017 0x00007f9c6311eb50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#18 0x00007f9c6209c70d in clone () from /lib/x86_64-linux-gnu/libc.so.6
#19 0x0000000000000000 in ?? ()

Thread 2 (LWP 10739):
#0 0x00007f9c631236bb in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
0000001 0x00007f9c64018210 in bthread_cond_timedwait_p (cond=0x7f9c64248680, m=0x7f9c64248640, abstime=0x7f9c60b9cde0, file=0x7f9c6403a152 "watchdog.c", line=313) at lockmgr.c:817
0000002 0x00007f9c6402f40a in watchdog_thread (arg=arg@entry=0x0) at watchdog.c:313
0000003 0x00007f9c64017bb2 in lmgr_thread_launcher (x=0x108fc78) at lockmgr.c:932
0000004 0x00007f9c6311eb50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
0000005 0x00007f9c6209c70d in clone () from /lib/x86_64-linux-gnu/libc.so.6
0000006 0x0000000000000000 in ?? ()

Thread 1 (LWP 10736):
#0 0x00007f9c6312691d in nanosleep () from /lib/x86_64-linux-gnu/libpthread.so.0
0000001 0x00007f9c64007c59 in bmicrosleep (sec=30, usec=0) at bsys.c:100
0000002 0x00007f9c64017c54 in check_deadlock () at lockmgr.c:567
0000003 0x00007f9c6311eb50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
0000004 0x00007f9c6209c70d in clone () from /lib/x86_64-linux-gnu/libc.so.6
0000005 0x0000000000000000 in ?? ()
#0 0x00007f9c6312691d in nanosleep () from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
0000001 0x00007f9c64007c59 in bmicrosleep (sec=30, usec=0) at bsys.c:100
100 bsys.c: No such file or directory.
timeout = {tv_sec = 30, tv_nsec = 0}
tv = {tv_sec = 0, tv_usec = 140309669443060}
tz = {tz_minuteswest = 0, tz_dsttime = 0}
status = <optimized out>
0000002 0x00007f9c64017c54 in check_deadlock () at lockmgr.c:567
567 lockmgr.c: No such file or directory.
__cancel_buf = {__cancel_jmp_buf = {{__cancel_jmp_buf = {0, -3419864286863657408, 140309653787168, 140309631203776, 140309681307712, 3, 3437074444671153728, 3437086053367445056}, __mask_was_saved = 0}}, __pad = {0x7f9c61b9ef60, 0x0, 0x18, 0x7f9c61b9ef60}}
__cancel_routine = 0x7f9c64017b50 <cln_hdl(void*)>
__not_first_call = <optimized out>
old = 0
0000003 0x00007f9c6311eb50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
0000004 0x00007f9c6209c70d in clone () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
0000005 0x0000000000000000 in ?? ()
No symbol table info available.
#0 0x0000000000000000 in ?? ()
No symbol table info available.
#0 0x0000000000000000 in ?? ()
No symbol table info available.
scottles

scottles

2015-02-04 11:47

reporter   ~0001257

The core file is 13MB compressed (tehre is a 2MB upload limit here). If you need it I can make it available on one of our web servers.
pstorz

pstorz

2015-02-05 09:02

administrator   ~0001258

Hello Scottles,

Thank you very much for our help.

Yes, please make the core file available so that I can download it.


Best regards,

Philipp
scottles

scottles

2015-02-05 12:02

reporter   ~0001259

Please find a copy of the core file here:

https://shodan.netsight.co.uk/bareos/bareos-sd.core.10735.gz
pstorz

pstorz

2015-02-05 12:27

administrator   ~0001260

I successfully downloaded the core file. Thanks
scottles

scottles

2015-02-05 12:29

reporter   ~0001261

Ok, I will remove the core file from the web server. Let me know if you need access to it again.
scottles

scottles

2015-02-10 17:02

reporter   ~0001276

Hey, I can provide additional tracebacks and core dumps if required as the daemon has crashed twice more since the last trace back was provided.
pstorz

pstorz

2015-03-31 14:36

administrator   ~0001667

We did some tests and were not able to reproduce it reliably.

As we do not see this problem in other installations we work on other tasks for now.
mvwieringen

mvwieringen

2015-11-13 17:27

developer   ~0001960

Fix committed to bareos bareos-15.2 branch with changesetid 5766.
mvwieringen

mvwieringen

2015-11-17 12:01

developer   ~0001975

Fix committed to bareos bareos-14.2 branch with changesetid 5818.

Related Changesets

bareos: master da1f26aa

2015-02-08 22:29

mvwieringen

Ported: N/A

Details Diff
Volume manager cleanup

- Cleanup comments
- Make concurrent read from filedevice configurable
- Use find_read_volume() instead of is_on_read_volume_list()
- Make free_volume_list() a generic function.

Instead of some trickery with using some temp vars give
free_volume_list() two args e.g. a comment what about what its freeing
and a dlist with the actual volume list it needs to free. Then we can
reuse this in 3 places to delete any volume list. We also don't have
to lock the vol_list when freeing the temp_vol_list as it now no longer
"misuses" the global dlist pointer.

Issues 0000414: Bareos storage daemon crashes during backups
Affected Issues
0000414
mod - src/stored/reserve.c Diff File
mod - src/stored/stored_conf.c Diff File
mod - src/stored/stored_conf.h Diff File
mod - src/stored/vol_mgr.c Diff File

bareos: bareos-14.2 8dc8375d

2015-04-25 17:47

mvwieringen

Ported: N/A

Details Diff
Overhaul of the vol_mgr.

Remove some new code that seems to give some race conditions.
This essentially restores the volume manager to the way it worked in 13.2.

Additionally we changed the following:
- Cleanup comments
- Make free_volume_list() a generic function.
Instead of some trickery with using some temp vars give
free_volume_list() two args e.g. a comment what about what its freeing
and a dlist with the actual volume list it needs to free. Then we can
reuse this in 3 places to delete any volume list. We also don't
have to lock the vol_list when freeing the temp_vol_list as it
now no longer "misuses" the global dlist pointer.

Issues 0000414: Bareos storage daemon crashes during backups
Affected Issues
0000414
mod - src/stored/reserve.c Diff File
mod - src/stored/vol_mgr.c Diff File

bareos: bareos-15.2 0b6435d7

2015-11-12 18:39

pstorz


Committer: mvwieringen

Ported: N/A

Details Diff
Fix random crashes on sd

The block variable was set to the dcr->block, but that can be altered in
the call to dev->set_label_blocksize(dcr).

When that happens, the code goes on with the wrong block.
We removed the whole local variable as it makes no sense and is only
referenced 3 times when calling empty_block()

Fixes 0000414: Bareos storage daemon crashes during backups
Fixse 0000483: bareos-sd crash during backup
Fixes 0000522: storage daemon crashes ocassionally when starting a new job
Fixes 0000552: SD crashes in -current

Signed-off-by: Marco van Wieringen <marco.van.wieringen@bareos.com>
Affected Issues
0000414, 0000483, 0000522, 0000552, 0000564
mod - src/stored/label.c Diff File

bareos: bareos-14.2 3a09212c

2015-11-12 18:39

pstorz


Committer: mvwieringen

Ported: N/A

Details Diff
Fix random crashes on sd

The block variable was set to the dcr->block, but that can be altered in
the call to dev->set_label_blocksize(dcr).

When that happens, the code goes on with the wrong block.
We removed the whole local variable as it makes no sense and is only
referenced 3 times when calling empty_block()

Fixes 0000414: Bareos storage daemon crashes during backups
Fixse 0000483: bareos-sd crash during backup
Fixes 0000522: storage daemon crashes ocassionally when starting a new job
Fixes 0000552: SD crashes in -current

Signed-off-by: Marco van Wieringen <marco.van.wieringen@bareos.com>
Affected Issues
0000414, 0000522, 0000552
mod - src/stored/label.c Diff File

Issue History

Date Modified Username Field Change
2015-01-28 18:56 scottles New Issue
2015-01-28 18:56 scottles File Added: bareos.log
2015-01-29 09:49 pstorz Assigned To => pstorz
2015-01-29 09:49 pstorz Status new => assigned
2015-01-29 10:11 pstorz Note Added: 0001231
2015-01-29 10:11 pstorz Status assigned => feedback
2015-01-29 12:12 scottles Note Added: 0001233
2015-01-29 12:12 scottles Status feedback => assigned
2015-01-29 12:30 pstorz Note Added: 0001234
2015-01-29 12:30 pstorz Status assigned => feedback
2015-01-29 12:44 scottles Note Added: 0001235
2015-01-29 12:44 scottles Status feedback => assigned
2015-02-03 12:15 scottles Note Added: 0001253
2015-02-04 11:45 scottles Note Added: 0001256
2015-02-04 11:47 scottles Note Added: 0001257
2015-02-05 09:02 pstorz Note Added: 0001258
2015-02-05 09:02 pstorz Status assigned => feedback
2015-02-05 12:02 scottles Note Added: 0001259
2015-02-05 12:02 scottles Status feedback => assigned
2015-02-05 12:27 pstorz Note Added: 0001260
2015-02-05 12:29 scottles Note Added: 0001261
2015-02-10 17:02 scottles Note Added: 0001276
2015-03-31 14:34 pstorz Assigned To pstorz =>
2015-03-31 14:36 pstorz Note Added: 0001667
2015-03-31 14:36 pstorz Status assigned => confirmed
2015-06-02 22:36 mvwieringen Changeset attached => bareos master da1f26aa
2015-06-17 12:13 joergs Relationship added child of 0000474
2015-06-17 20:26 mvwieringen Changeset attached => bareos bareos-14.2 8dc8375d
2015-11-13 10:19 stephand Relationship added related to 0000522
2015-11-13 10:19 stephand Relationship added related to 0000552
2015-11-13 10:19 stephand Relationship added related to 0000483
2015-11-13 17:27 mvwieringen Changeset attached => bareos bareos-15.2 0b6435d7
2015-11-13 17:27 mvwieringen Note Added: 0001960
2015-11-13 17:27 mvwieringen Status confirmed => resolved
2015-11-13 17:27 mvwieringen Resolution open => fixed
2015-11-17 12:01 mvwieringen Changeset attached => bareos bareos-14.2 3a09212c
2015-11-17 12:01 mvwieringen Note Added: 0001975
2019-12-18 15:25 arogge Status resolved => closed