View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0000414 | bareos-core | storage daemon | public | 2015-01-28 18:56 | 2019-12-18 15:25 |
Reporter | scottles | Assigned To | |||
Priority | normal | Severity | crash | Reproducibility | random |
Status | closed | Resolution | fixed | ||
Platform | Linux | OS | Debian | OS Version | 7 |
Product Version | 14.2.2 | ||||
Summary | 0000414: Bareos storage daemon crashes during backups | ||||
Description | Since upgrading to Bareos 14.2.2-46.1 from 13.2.3-33.1, on the 13th of January, the storage daemon on our file storage server crashes almost every night whilst backups or copy migrations are running, preventing nightly backups from completing. The log file for the storage daemon outputs a message along the following lines: 28-Jan 02:57 bareos-storage: ABORTING due to ERROR in smartall.c:209 Overrun buffer: len=129 addr=15ef608 allocated: vol_mgr.c:896 called from dlist.c:326 The file volumes are are stored on a btrfs volume (we are using upstream 3.14.26 kernel compiled from kernel.org). Usually once a week a copy migration job runs to migrate the latest file based backups to tape storage, however this process also triggers the storage daemon on our file storage server to crash. | ||||
Steps To Reproduce | It appears to occur randomly during our routine nightly file based backups. | ||||
Additional Information | I have included the log from the storage daemon. If you need any further logs or information I will be happy to provide as needed. | ||||
Tags | No tags attached. | ||||
bareos.log (12,349 bytes)
26-Jan 13:29 bareos-storage JobId 16312: Ready to read from volume "0062" on device "FileStorage1" (/filestorage/volumes). 26-Jan 13:32 bareos-storage JobId 16312: Forward spacing Volume "0062" to file:block 4:2905371111. 26-Jan 14:33 bareos-storage JobId 16312: End of Volume at file 24 on device "FileStorage1" (/filestorage/volumes), Volume "0062" 26-Jan 14:33 bareos-storage JobId 16312: Ready to read from volume "0055" on device "FileStorage1" (/filestorage/volumes). 26-Jan 14:33 bareos-storage JobId 16312: Forward spacing Volume "0055" to file:block 0:1048784. 26-Jan 15:09 bareos-storage: ABORTING due to ERROR in smartall.c:209 Overrun buffer: len=129 addr=17d70c8 allocated: vol_mgr.c:277 called from vol_mgr.c:306 26-Jan 18:46 bareos-storage JobId 16372: Ready to read from volume "0062" on device "FileStorage1" (/filestorage/volumes). 26-Jan 19:03 bareos-storage JobId 16372: Forward spacing Volume "0062" to file:block 4:2871816679. 26-Jan 19:05 bareos-storage JobId 16374: Ready to read from volume "0062" on device "FileStorage1" (/filestorage/volumes). 26-Jan 19:05 bareos-storage JobId 16374: Forward spacing Volume "0062" to file:block 4:2831970791. 26-Jan 19:07 bareos-storage JobId 16376: Ready to read from volume "0038" on device "FileStorage1" (/filestorage/volumes). 26-Jan 19:07 bareos-storage JobId 16376: Forward spacing Volume "0038" to file:block 0:64719. 26-Jan 19:09 bareos-storage JobId 16378: Ready to read from volume "0038" on device "FileStorage1" (/filestorage/volumes). 26-Jan 19:09 bareos-storage JobId 16378: Forward spacing Volume "0038" to file:block 0:207. 26-Jan 19:11 bareos-storage: ABORTING due to ERROR in smartall.c:209 Overrun buffer: len=129 addr=ce56a8 allocated: vol_mgr.c:277 called from vol_mgr.c:306 27-Jan 11:45 bareos-storage JobId 16516: Ready to read from volume "0062" on device "FileStorage1" (/filestorage/volumes). 27-Jan 11:45 bareos-storage JobId 16516: Forward spacing Volume "0062" to file:block 4:2871816679. 27-Jan 11:48 bareos-storage JobId 16518: Ready to read from volume "0062" on device "FileStorage1" (/filestorage/volumes). 27-Jan 11:48 bareos-storage JobId 16518: Forward spacing Volume "0062" to file:block 4:2831970791. 27-Jan 11:50 bareos-storage JobId 16520: Ready to read from volume "0038" on device "FileStorage1" (/filestorage/volumes). 27-Jan 11:50 bareos-storage JobId 16520: Forward spacing Volume "0038" to file:block 0:64719. 27-Jan 22:00 bareos-storage JobId 16553: Volume "0053" previously written, moving to end of data. 27-Jan 22:00 bareos-storage JobId 16553: Warning: For Volume "0053": The sizes do not match! Volume=103332549838 Catalog=103307384014 Correcting Catalog 27-Jan 22:00 bareos-storage JobId 16551: Volume "0046" previously written, moving to end of data. 27-Jan 22:00 bareos-storage JobId 16551: Warning: For Volume "0046": The sizes do not match! Volume=14670917709 Catalog=14387802195 Correcting Catalog 27-Jan 22:00 bareos-storage JobId 16557: Volume "0096" previously written, moving to end of data. 27-Jan 22:00 bareos-storage JobId 16557: Warning: For Volume "0096": The sizes do not match! Volume=28462383680 Catalog=28023030336 Correcting Catalog 27-Jan 22:00 bareos-storage JobId 16553: Elapsed time=00:00:53, Transfer rate=8.926 M Bytes/second 27-Jan 22:00 bareos-storage JobId 16553: Sending spooled attrs to the Director. Despooling 8,033,467 bytes ... 27-Jan 22:01 bareos-storage JobId 16561: Elapsed time=00:01:36, Transfer rate=3.810 M Bytes/second 27-Jan 22:01 bareos-storage JobId 16561: Sending spooled attrs to the Director. Despooling 1,143,180 bytes ... 27-Jan 22:04 bareos-storage JobId 16558: User defined maximum volume capacity 107,374,182,400 exceeded on device "FileStorage1" (/filestorage/volumes). 27-Jan 22:04 bareos-storage JobId 16558: End of medium on Volume "0053" Bytes=107,373,990,340 Blocks=102,381 at 27-Jan-2015 22:04. 27-Jan 22:20 bareos-storage JobId 16558: Recycled volume "0058" on device "FileStorage1" (/filestorage/volumes), all previous data lost. 27-Jan 22:20 bareos-storage JobId 16558: New volume "0058" mounted on device "FileStorage1" (/filestorage/volumes) at 27-Jan-2015 22:20. 27-Jan 22:23 bareos-storage JobId 16554: Elapsed time=00:23:15, Transfer rate=2.530 M Bytes/second 27-Jan 22:23 bareos-storage JobId 16554: Sending spooled attrs to the Director. Despooling 36,091,410 bytes ... 27-Jan 22:47 bareos-storage JobId 16559: Elapsed time=00:24:06, Transfer rate=7.633 M Bytes/second 27-Jan 22:47 bareos-storage JobId 16559: Sending spooled attrs to the Director. Despooling 26,338,623 bytes ... 27-Jan 22:49 bareos-storage JobId 16552: Elapsed time=00:49:21, Transfer rate=11.04 M Bytes/second 27-Jan 22:49 bareos-storage JobId 16552: Sending spooled attrs to the Director. Despooling 9,153,557 bytes ... 27-Jan 22:54 bareos-storage JobId 16563: Elapsed time=00:54:12, Transfer rate=1.627 M Bytes/second 27-Jan 22:54 bareos-storage JobId 16563: Sending spooled attrs to the Director. Despooling 445,297,833 bytes ... 27-Jan 22:57 bareos-storage JobId 16568: Elapsed time=00:33:38, Transfer rate=10.13 M Bytes/second 27-Jan 22:57 bareos-storage JobId 16568: Sending spooled attrs to the Director. Despooling 13,976,559 bytes ... 27-Jan 23:00 bareos-storage JobId 16564: Elapsed time=00:10:41, Transfer rate=2.705 M Bytes/second 27-Jan 23:00 bareos-storage JobId 16564: Sending spooled attrs to the Director. Despooling 12,219,655 bytes ... 27-Jan 23:00 bareos-storage JobId 16562: User defined maximum volume capacity 107,374,182,400 exceeded on device "FileStorage2" (/filestorage/volumes). 27-Jan 23:00 bareos-storage JobId 16562: End of medium on Volume "0046" Bytes=107,374,025,657 Blocks=102,133 at 27-Jan-2015 23:00. 27-Jan 23:01 bareos-storage JobId 16562: Recycled volume "0007" on device "FileStorage2" (/filestorage/volumes), all previous data lost. 27-Jan 23:01 bareos-storage JobId 16562: New volume "0007" mounted on device "FileStorage2" (/filestorage/volumes) at 27-Jan-2015 23:01. 27-Jan 23:10 bareos-storage JobId 16569: Elapsed time=00:22:19, Transfer rate=2.785 M Bytes/second 27-Jan 23:10 bareos-storage JobId 16569: Sending spooled attrs to the Director. Despooling 15,894,901 bytes ... 27-Jan 23:56 bareos-storage JobId 16566: Elapsed time=00:46:37, Transfer rate=5.636 M Bytes/second 27-Jan 23:56 bareos-storage JobId 16566: Sending spooled attrs to the Director. Despooling 54,140,188 bytes ... 28-Jan 00:01 bareos-storage JobId 16562: User defined maximum volume capacity 107,374,182,400 exceeded on device "FileStorage2" (/filestorage/volumes). 28-Jan 00:01 bareos-storage JobId 16562: End of medium on Volume "0007" Bytes=107,373,139,932 Blocks=102,400 at 28-Jan-2015 00:01. 28-Jan 00:03 bareos-storage JobId 16562: Recycled volume "0036" on device "FileStorage2" (/filestorage/volumes), all previous data lost. 28-Jan 00:03 bareos-storage JobId 16562: New volume "0036" mounted on device "FileStorage2" (/filestorage/volumes) at 28-Jan-2015 00:03. 28-Jan 00:15 bareos-storage JobId 16551: Elapsed time=02:15:21, Transfer rate=4.592 M Bytes/second 28-Jan 00:15 bareos-storage JobId 16551: Sending spooled attrs to the Director. Despooling 56,252,739 bytes ... 28-Jan 00:24 bareos-storage JobId 16560: Elapsed time=02:22:13, Transfer rate=2.871 M Bytes/second 28-Jan 00:24 bareos-storage JobId 16560: Sending spooled attrs to the Director. Despooling 117,011,081 bytes ... 28-Jan 00:30 bareos-storage JobId 16576: Elapsed time=00:33:39, Transfer rate=2.690 M Bytes/second 28-Jan 00:30 bareos-storage JobId 16576: Sending spooled attrs to the Director. Despooling 36,243,056 bytes ... 28-Jan 00:48 bareos-storage JobId 16579: User defined maximum volume capacity 107,374,182,400 exceeded on device "FileStorage1" (/filestorage/volumes). 28-Jan 00:48 bareos-storage JobId 16579: End of medium on Volume "0058" Bytes=107,373,290,049 Blocks=102,401 at 28-Jan-2015 00:48. 28-Jan 00:48 bareos-storage JobId 16579: Recycled volume "0104" on device "FileStorage1" (/filestorage/volumes), all previous data lost. 28-Jan 00:48 bareos-storage JobId 16579: New volume "0104" mounted on device "FileStorage1" (/filestorage/volumes) at 28-Jan-2015 00:48. 28-Jan 01:06 bareos-storage JobId 16562: User defined maximum volume capacity 107,374,182,400 exceeded on device "FileStorage2" (/filestorage/volumes). 28-Jan 01:06 bareos-storage JobId 16562: End of medium on Volume "0036" Bytes=107,373,238,407 Blocks=102,400 at 28-Jan-2015 01:06. 28-Jan 01:07 bareos-storage JobId 16562: Recycled volume "0071" on device "FileStorage2" (/filestorage/volumes), all previous data lost. 28-Jan 01:07 bareos-storage JobId 16562: New volume "0071" mounted on device "FileStorage2" (/filestorage/volumes) at 28-Jan-2015 01:07. 28-Jan 01:36 bareos-storage JobId 16565: Elapsed time=01:05:24, Transfer rate=2.499 M Bytes/second 28-Jan 01:36 bareos-storage JobId 16565: Sending spooled attrs to the Director. Despooling 282,891,508 bytes ... 28-Jan 01:59 bareos-storage JobId 16575: User defined maximum volume capacity 107,374,182,400 exceeded on device "FileStorage2" (/filestorage/volumes). 28-Jan 01:59 bareos-storage JobId 16575: End of medium on Volume "0071" Bytes=107,373,179,552 Blocks=102,400 at 28-Jan-2015 01:59. 28-Jan 02:00 bareos-storage JobId 16575: Recycled volume "0022" on device "FileStorage2" (/filestorage/volumes), all previous data lost. 28-Jan 02:00 bareos-storage JobId 16575: New volume "0022" mounted on device "FileStorage2" (/filestorage/volumes) at 28-Jan-2015 02:00. 28-Jan 02:04 bareos-storage JobId 16575: Elapsed time=01:48:12, Transfer rate=8.891 M Bytes/second 28-Jan 02:04 bareos-storage JobId 16575: Sending spooled attrs to the Director. Despooling 84,680,155 bytes ... 28-Jan 02:07 bareos-storage JobId 16581: User defined maximum volume capacity 107,374,182,400 exceeded on device "FileStorage1" (/filestorage/volumes). 28-Jan 02:07 bareos-storage JobId 16581: End of medium on Volume "0104" Bytes=107,373,133,951 Blocks=102,399 at 28-Jan-2015 02:07. 28-Jan 02:08 bareos-storage JobId 16581: Recycled volume "0028" on device "FileStorage1" (/filestorage/volumes), all previous data lost. 28-Jan 02:08 bareos-storage JobId 16581: New volume "0028" mounted on device "FileStorage1" (/filestorage/volumes) at 28-Jan-2015 02:08. 28-Jan 02:09 bareos-storage JobId 16562: Elapsed time=04:06:41, Transfer rate=8.062 M Bytes/second 28-Jan 02:09 bareos-storage JobId 16562: Sending spooled attrs to the Director. Despooling 344,122,159 bytes ... 28-Jan 02:21 bareos-storage JobId 16556: Elapsed time=04:21:32, Transfer rate=4.728 M Bytes/second 28-Jan 02:21 bareos-storage JobId 16556: Sending spooled attrs to the Director. Despooling 285,120,110 bytes ... 28-Jan 02:26 bareos-storage JobId 16557: User defined maximum volume capacity 107,374,182,400 exceeded on device "FileStorage3" (/filestorage/volumes). 28-Jan 02:26 bareos-storage JobId 16557: End of medium on Volume "0096" Bytes=107,374,019,002 Blocks=101,982 at 28-Jan-2015 02:26. 28-Jan 02:27 bareos-storage JobId 16557: Recycled volume "0014" on device "FileStorage3" (/filestorage/volumes), all previous data lost. 28-Jan 02:27 bareos-storage JobId 16557: New volume "0014" mounted on device "FileStorage3" (/filestorage/volumes) at 28-Jan-2015 02:27. 28-Jan 02:29 bareos-storage JobId 16573: Elapsed time=00:24:55, Transfer rate=5.192 M Bytes/second 28-Jan 02:29 bareos-storage JobId 16573: Sending spooled attrs to the Director. Despooling 82,710,650 bytes ... 28-Jan 02:40 bareos-storage JobId 16558: Elapsed time=04:23:36, Transfer rate=5.757 M Bytes/second 28-Jan 02:40 bareos-storage JobId 16558: Sending spooled attrs to the Director. Despooling 478,272,299 bytes ... 28-Jan 02:55 bareos-storage JobId 16580: User defined maximum volume capacity 107,374,182,400 exceeded on device "FileStorage2" (/filestorage/volumes). 28-Jan 02:55 bareos-storage JobId 16580: End of medium on Volume "0022" Bytes=107,373,414,339 Blocks=102,401 at 28-Jan-2015 02:55. 28-Jan 02:57 bareos-storage: ABORTING due to ERROR in smartall.c:209 Overrun buffer: len=129 addr=15ef608 allocated: vol_mgr.c:896 called from dlist.c:326 28-Jan 02:57 bareos-storage JobId 16580: Recycled volume "0015" on device "FileStorage2" (/filestorage/volumes), all previous data lost. 28-Jan 02:57 bareos-storage JobId 16580: New volume "0015" mounted on device "FileStorage2" (/filestorage/volumes) at 28-Jan-2015 02:57. |
|
Hello scottles, thank you for submitting this bugreport. Now we need to catch the bug. In order to do this I would like to ask you to do the following: Can you please install the packages "bareos-dbg" and "gdb"? This will enable us to get backtraces of the crashed daemon. This should produce backtraces when the next crash happens, and that will hopefully give us a bit more information when this problem happens. Please provide the info that the backtrace produces when it happens. Thank you very much Philipp |
|
Hi Philipp, I have installed bareos-dbg and gdb. I assume the debugging output just goes in to the mail spool? |
|
Hello, the output goes to /var/lib/bareos. There will be the .traceback file and the .core files: Here is an example of a trace ouput: ============================ Created /var/lib/bareos/bareos-dir.core.5118 for doing postmortem debugging [New LWP 5120] [New LWP 5122] [New LWP 5123] [New LWP 5124] [New LWP 5118] warning: Can't read pathname for load map: Input/output error. [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". Core was generated by `/usr/sbin/bareos-dir'. #0 0x00007fbcb27528ad in nanosleep () from /lib/x86_64-linux-gnu/libpthread.so.0 $1 = 0x6987c0 "ci-Debian-7-0-64-918a1ff57eeaf1d1-dir" $2 = 0x1767088 "bareos-dir" $3 = 0x17670c8 "/usr/sbin/bareos-dir" $4 = 0x179fa18 "PostgreSQL" $5 = 0x7fbcb3650d55 "14.2.2 (12 December 2014)" $6 = 0x7fbcb3650d41 "x86_64-pc-linux-gnu" $7 = 0x7fbcb3650d3a "debian" $8 = 0x7fbcb3650d1c "Debian GNU/Linux 7.0 (wheezy)" $9 = "ci-Debian-7-0-64-918a1ff57eeaf1d1", '\000' <repeats 222 times> $10 = 0x7fbcb36512b8 "debian Debian GNU/Linux 7.0 (wheezy)" Environment variable "TestName" not defined. #0 0x00007fbcb27528ad in nanosleep () from /lib/x86_64-linux-gnu/libpthread.so.0 0000001 0x00007fbcb3622c59 in bmicrosleep (sec=30, usec=0) at bsys.c:100 0000002 0x00007fbcb3632c54 in check_deadlock () at lockmgr.c:567 0000003 0x00007fbcb274ab50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 0000004 0x00007fbcb16e2e6d in clone () from /lib/x86_64-linux-gnu/libc.so.6 0000005 0x0000000000000000 in ?? () Thread 5 (LWP 5118): #0 0x00007fbcb2752c1d in waitpid () from /lib/x86_64-linux-gnu/libpthread.so.0 0000001 0x00007fbcb3640d01 in signal_handler (sig=11) at signal.c:228 0000002 <signal handler called> 0000003 0x00007fbcb27528ad in nanosleep () from /lib/x86_64-linux-gnu/libpthread.so.0 0000004 0x00007fbcb3622c59 in bmicrosleep (sec=60, usec=0) at bsys.c:100 0000005 0x000000000044097e in wait_for_next_job (one_shot_job_to_run=<optimized out>) at scheduler.c:124 0000006 0x000000000040ed37 in main (argc=<optimized out>, argv=<optimized out>) at dird.c:375 Thread 4 (LWP 5124): #0 0x00007fbcb274f64b in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0 0000001 0x00007fbcb3633210 in bthread_cond_timedwait_p (cond=0x699620, m=0x6995e0, abstime=0x7fbcacc18d10, file=0x47b05d "stats.c", line=117) at lockmgr.c:817 0000002 0x0000000000440f9a in wait_for_next_run () at stats.c:117 0000003 statistics_thread_runner (arg=arg@entry=0x0) at stats.c:290 0000004 0x00007fbcb3632bb2 in lmgr_thread_launcher (x=0x1799178) at lockmgr.c:932 0000005 0x00007fbcb274ab50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 0000006 0x00007fbcb16e2e6d in clone () from /lib/x86_64-linux-gnu/libc.so.6 0000007 0x0000000000000000 in ?? () Thread 3 (LWP 5123): #0 0x00007fbcb274f64b in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0 0000001 0x00007fbcb3633210 in bthread_cond_timedwait_p (cond=0x7fbcb3863680, m=0x7fbcb3863640, abstime=0x7fbcad419de0, file=0x7fbcb3655152 "watchdog.c", line=313) at lockmgr.c:817 0000002 0x00007fbcb364a40a in watchdog_thread (arg=arg@entry=0x0) at watchdog.c:313 0000003 0x00007fbcb3632bb2 in lmgr_thread_launcher (x=0x1795ff8) at lockmgr.c:932 0000004 0x00007fbcb274ab50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 0000005 0x00007fbcb16e2e6d in clone () from /lib/x86_64-linux-gnu/libc.so.6 0000006 0x0000000000000000 in ?? () Thread 2 (LWP 5122): #0 0x00007fbcb16d8223 in poll () from /lib/x86_64-linux-gnu/libc.so.6 0000001 0x00007fbcb3617933 in bnet_thread_server_tcp (addr_list=0x7fbcadc19f30, max_clients=20, sockfds=0x1798048, client_wq=0x6996a0, nokeepalive=false, handle_client_request=0x460790 <handle_UA_client_request(void*)>) at bnet_server_tcp.c:298 0000002 0x000000000046037f in connect_thread (arg=arg@entry=0x1767f88) at ua_server.c:92 0000003 0x00007fbcb3632bb2 in lmgr_thread_launcher (x=0x1796e28) at lockmgr.c:932 0000004 0x00007fbcb274ab50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 0000005 0x00007fbcb16e2e6d in clone () from /lib/x86_64-linux-gnu/libc.so.6 0000006 0x0000000000000000 in ?? () Thread 1 (LWP 5120): #0 0x00007fbcb27528ad in nanosleep () from /lib/x86_64-linux-gnu/libpthread.so.0 0000001 0x00007fbcb3622c59 in bmicrosleep (sec=30, usec=0) at bsys.c:100 0000002 0x00007fbcb3632c54 in check_deadlock () at lockmgr.c:567 0000003 0x00007fbcb274ab50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 0000004 0x00007fbcb16e2e6d in clone () from /lib/x86_64-linux-gnu/libc.so.6 0000005 0x0000000000000000 in ?? () #0 0x00007fbcb27528ad in nanosleep () from /lib/x86_64-linux-gnu/libpthread.so.0 No symbol table info available. 0000001 0x00007fbcb3622c59 in bmicrosleep (sec=30, usec=0) at bsys.c:100 100 bsys.c: No such file or directory. timeout = {tv_sec = 30, tv_nsec = 0} tv = {tv_sec = 0, tv_usec = 140448440198644} tz = {tz_minuteswest = 0, tz_dsttime = 0} status = <optimized out> 0000002 0x00007fbcb3632c54 in check_deadlock () at lockmgr.c:567 567 lockmgr.c: No such file or directory. __cancel_buf = {__cancel_jmp_buf = {{__cancel_jmp_buf = {0, 2602541748363333316, 140448424612256, 140448402151872, 140448458272832, 3, -2637007306732990780, -2637012440356090172}, __mask_was_saved = 0}}, __pad = {0x7fbcb11e8f60, 0x0, 0x18, 0x7fbcb11e8f60}} __cancel_routine = 0x7fbcb3632b50 <cln_hdl(void*)> __not_first_call = <optimized out> old = 0 0000003 0x00007fbcb274ab50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 No symbol table info available. 0000004 0x00007fbcb16e2e6d in clone () from /lib/x86_64-linux-gnu/libc.so.6 No symbol table info available. 0000005 0x0000000000000000 in ?? () No symbol table info available. #0 0x0000000000000000 in ?? () No symbol table info available. #0 0x0000000000000000 in ?? () No symbol table info available. ----- END of /var/lib/bareos/bareos.5118.traceback ---- ----- checking for /usr/bin/bareos-dir in traceback ---- Created /var/lib/bareos/bareos-dir.core.5118 for doing postmortem debugging Core was generated by `/usr/sbin/bareos-dir'. $1 = 0x6987c0 "ci-Debian-7-0-64-918a1ff57eeaf1d1-dir" $2 = 0x1767088 "bareos-dir" $3 = 0x17670c8 "/usr/sbin/bareos-dir" 0000006 0x000000000040ed37 in main (argc=<optimized out>, argv=<optimized out>) at dird.c:375 100 bsys.c: No such file or directory. 567 lockmgr.c: No such file or directory. ======================= |
|
Great, thanks. I will provide a traceback and core files the next time it crashes (I suspect it will crash sometime over the weekend). | |
Hi, last night the bareos-sd daemon crashed, but the traceback file is less than helpful and no core file was created. It contained the following: root@bareos-storage:/var/lib/bareos# cat bareos.3315.traceback Created /var/lib/bareos/bareos-sd.core.3315 for doing postmortem debugging ptrace: Operation not permitted. /var/lib/bareos/3315: No such file or directory. $1 = 0 /usr/lib/bareos/scripts/btraceback.gdb:2: Error in sourced command file: No symbol "exename" in current context. Googling around suggests I need to ensure /usr/bin/gdb has the setuid flag set as gdb needs to run as root. I have set the setuid bit on gdb and will have to wait for bareos-sd to crash again. |
|
Success! Here is the traceback: Created /var/lib/bareos/bareos-sd.core.10735 for doing postmortem debugging [New LWP 10736] [New LWP 10739] [New LWP 11434] [New LWP 11435] [New LWP 11436] [New LWP 11439] [New LWP 13094] [New LWP 13146] [New LWP 13164] [New LWP 13172] [New LWP 13215] [New LWP 13293] [New LWP 13957] [New LWP 14035] [New LWP 14072] [New LWP 14092] [New LWP 10735] warning: Can't read pathname for load map: Input/output error. [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7ffff0c00000 Core was generated by `/usr/sbin/bareos-sd'. #0 0x00007f9c6312691d in nanosleep () from /lib/x86_64-linux-gnu/libpthread.so.0 $1 = 0x626220 "bareos-storage" $2 = 0x1068058 "bareos-sd" $3 = 0x1068098 "/usr/sbin/bareos-sd" $4 = 0x0 $5 = 0x7f9c64035d55 "14.2.2 (12 December 2014)" $6 = 0x7f9c64035d41 "x86_64-pc-linux-gnu" $7 = 0x7f9c64035d3a "debian" $8 = 0x7f9c64035d1c "Debian GNU/Linux 7.0 (wheezy)" $9 = "bareos-storage", '\000' <repeats 241 times> $10 = 0x7f9c640362b8 "debian Debian GNU/Linux 7.0 (wheezy)" Environment variable "TestName" not defined. #0 0x00007f9c6312691d in nanosleep () from /lib/x86_64-linux-gnu/libpthread.so.0 0000001 0x00007f9c64007c59 in bmicrosleep (sec=30, usec=0) at bsys.c:100 0000002 0x00007f9c64017c54 in check_deadlock () at lockmgr.c:567 0000003 0x00007f9c6311eb50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 0000004 0x00007f9c6209c70d in clone () from /lib/x86_64-linux-gnu/libc.so.6 0000005 0x0000000000000000 in ?? () Thread 17 (LWP 10735): #0 0x00007f9c62091ac3 in poll () from /lib/x86_64-linux-gnu/libc.so.6 0000001 0x00007f9c63ffc933 in bnet_thread_server_tcp (addr_list=0x7ffff0a7b4c0, max_clients=41, sockfds=0x108fe48, client_wq=0x626600, nokeepalive=false, handle_client_request=0x410f60 <handle_connection_request(void*)>) at bnet_server_tcp.c:298 0000002 0x0000000000409021 in main (argc=<optimized out>, argv=<optimized out>) at stored.c:299 Thread 16 (LWP 14092): #0 0x00007f9c63123344 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0 0000001 0x00007f9c64022e86 in rwl_writelock_p (rwl=0x7f9c646b28a0, file=<optimized out>, line=<optimized out>) at rwlock.c:236 0000002 0x00007f9c64490a51 in _lock_reservations (file=file@entry=0x7f9c644a6685 "reserve.c", line=line@entry=354) at reserve.c:104 0000003 0x00007f9c644933e7 in use_device_cmd (jcr=0x13b75f8) at reserve.c:354 0000004 use_cmd (jcr=0x13b75f8) at reserve.c:67 0000005 0x000000000041136f in handle_director_connection (dir=0x7f9c5c30f7e8, job_name=<optimized out>) at dir_cmd.c:281 0000006 handle_connection_request (arg=0x7f9c5c30f7e8) at dir_cmd.c:362 0000007 0x00007f9c6402fc17 in workq_server (arg=arg@entry=0x626600) at workq.c:335 0000008 0x00007f9c64017bb2 in lmgr_thread_launcher (x=0x7f9c5c009248) at lockmgr.c:932 0000009 0x00007f9c6311eb50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 0000010 0x00007f9c6209c70d in clone () from /lib/x86_64-linux-gnu/libc.so.6 0000011 0x0000000000000000 in ?? () Thread 15 (LWP 14072): #0 0x00007f9c63126c8d in waitpid () from /lib/x86_64-linux-gnu/libpthread.so.0 0000001 0x00007f9c64025d01 in signal_handler (sig=11) at signal.c:228 0000002 <signal handler called> 0000003 e_msg (file=<optimized out>, line=<optimized out>, type=<optimized out>, level=<optimized out>, fmt=0x7f9c640388e8 "Overrun buffer: len=%d addr=%p allocated: %s:%d called from %s:%d\n") at message.c:1403 0000004 0x00007f9c64026a81 in sm_free (file=0x7f9c6403440c "dlist.c", line=326, fp=0x10b2e08) at smartall.c:209 0000005 0x00007f9c64010cb2 in dlist::destroy (this=0x10a74a8) at dlist.c:326 0000006 0x00007f9c6449b894 in ~dlist (this=<optimized out>, __in_chrg=<optimized out>) at ../lib/dlist.h:62 0000007 free_volume_list () at vol_mgr.c:788 0000008 0x00007f9c6449d705 in free_temp_vol_list (temp_vol_list=0x10a74a8) at vol_mgr.c:923 0000009 0x00007f9c64492c67 in find_suitable_device_for_job (jcr=jcr@entry=0x11b5f28, rctx=...) at reserve.c:506 0000010 0x00007f9c6449352b in use_device_cmd (jcr=0x11b5f28) at reserve.c:330 0000011 use_cmd (jcr=0x11b5f28) at reserve.c:67 0000012 0x000000000041136f in handle_director_connection (dir=0x7f9c5c415fa8, job_name=<optimized out>) at dir_cmd.c:281 0000013 handle_connection_request (arg=0x7f9c5c415fa8) at dir_cmd.c:362 0000014 0x00007f9c6402fc17 in workq_server (arg=arg@entry=0x626600) at workq.c:335 0000015 0x00007f9c64017bb2 in lmgr_thread_launcher (x=0x7f9c5c412098) at lockmgr.c:932 0000016 0x00007f9c6311eb50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 0000017 0x00007f9c6209c70d in clone () from /lib/x86_64-linux-gnu/libc.so.6 #18 0x0000000000000000 in ?? () Thread 14 (LWP 14035): #0 0x00007f9c63123344 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0 0000001 0x00007f9c640182c5 in bthread_cond_wait_p (cond=0x1091000, m=0x1090f48, file=0x7f9c644a0bcf "lock.c", line=387) at lockmgr.c:832 0000002 0x00007f9c6448b527 in rLock (this=0x1090f08, locked=<optimized out>) at lock.c:387 0000003 DEVICE::rLock (this=0x1090f08, locked=<optimized out>) at lock.c:370 0000004 0x00007f9c6448140e in DCR::write_block_to_device (this=0x10b5078) at block.c:368 0000005 0x00007f9c6448fe75 in DCR::write_record (this=0x10b5078) at record.c:470 0000006 0x0000000000409a7c in do_append_data (jcr=0x10949e8, bs=0x7f9c5c4138a8, what=0x41f780 "FD") at append.c:222 0000007 0x0000000000411d03 in append_data_cmd (jcr=0x10949e8) at fd_cmds.c:269 0000008 0x0000000000412121 in do_fd_commands (jcr=0x10949e8) at fd_cmds.c:225 0000009 0x0000000000412322 in run_job (jcr=0x10949e8) at fd_cmds.c:181 0000010 0x0000000000412d45 in do_job_run (jcr=0x10949e8) at job.c:237 0000011 0x000000000041136f in handle_director_connection (dir=0x7f9c5c310098, job_name=<optimized out>) at dir_cmd.c:281 0000012 handle_connection_request (arg=0x7f9c5c310098) at dir_cmd.c:362 0000013 0x00007f9c6402fc17 in workq_server (arg=arg@entry=0x626600) at workq.c:335 0000014 0x00007f9c64017bb2 in lmgr_thread_launcher (x=0x7f9c5c412098) at lockmgr.c:932 0000015 0x00007f9c6311eb50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 0000016 0x00007f9c6209c70d in clone () from /lib/x86_64-linux-gnu/libc.so.6 0000017 0x0000000000000000 in ?? () Thread 13 (LWP 13957): #0 0x00007f9c63123344 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0 0000001 0x00007f9c640182c5 in bthread_cond_wait_p (cond=0x1091000, m=0x1090f48, file=0x7f9c644a0bcf "lock.c", line=387) at lockmgr.c:832 0000002 0x00007f9c6448b527 in rLock (this=0x1090f08, locked=<optimized out>) at lock.c:387 0000003 DEVICE::rLock (this=0x1090f08, locked=<optimized out>) at lock.c:370 0000004 0x00007f9c6448140e in DCR::write_block_to_device (this=0x109de48) at block.c:368 0000005 0x00007f9c6448fe75 in DCR::write_record (this=0x109de48) at record.c:470 0000006 0x0000000000409a7c in do_append_data (jcr=0x10a6fe8, bs=0x7f9c5c413d48, what=0x41f780 "FD") at append.c:222 0000007 0x0000000000411d03 in append_data_cmd (jcr=0x10a6fe8) at fd_cmds.c:269 0000008 0x0000000000412121 in do_fd_commands (jcr=0x10a6fe8) at fd_cmds.c:225 0000009 0x0000000000412322 in run_job (jcr=0x10a6fe8) at fd_cmds.c:181 0000010 0x0000000000412d45 in do_job_run (jcr=0x10a6fe8) at job.c:237 0000011 0x000000000041136f in handle_director_connection (dir=0x7f9c5c4141a8, job_name=<optimized out>) at dir_cmd.c:281 0000012 handle_connection_request (arg=0x7f9c5c4141a8) at dir_cmd.c:362 0000013 0x00007f9c6402fc17 in workq_server (arg=arg@entry=0x626600) at workq.c:335 0000014 0x00007f9c64017bb2 in lmgr_thread_launcher (x=0x7f9c5c006668) at lockmgr.c:932 0000015 0x00007f9c6311eb50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 0000016 0x00007f9c6209c70d in clone () from /lib/x86_64-linux-gnu/libc.so.6 0000017 0x0000000000000000 in ?? () Thread 12 (LWP 13293): #0 0x00007f9c63123344 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0 0000001 0x00007f9c64022e86 in rwl_writelock_p (rwl=0x7f9c646b28a0, file=<optimized out>, line=<optimized out>) at rwlock.c:236 0000002 0x00007f9c64490a51 in _lock_reservations (file=file@entry=0x7f9c644a6685 "reserve.c", line=line@entry=354) at reserve.c:104 0000003 0x00007f9c644933e7 in use_device_cmd (jcr=0x108e308) at reserve.c:354 0000004 use_cmd (jcr=0x108e308) at reserve.c:67 0000005 0x000000000041136f in handle_director_connection (dir=0x7f9c5c412968, job_name=<optimized out>) at dir_cmd.c:281 0000006 handle_connection_request (arg=0x7f9c5c412968) at dir_cmd.c:362 0000007 0x00007f9c6402fc17 in workq_server (arg=arg@entry=0x626600) at workq.c:335 0000008 0x00007f9c64017bb2 in lmgr_thread_launcher (x=0x7f9c5c30ff18) at lockmgr.c:932 0000009 0x00007f9c6311eb50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 0000010 0x00007f9c6209c70d in clone () from /lib/x86_64-linux-gnu/libc.so.6 0000011 0x0000000000000000 in ?? () Thread 11 (LWP 13215): #0 0x00007f9c631261fd in read () from /lib/x86_64-linux-gnu/libpthread.so.0 0000001 0x00007f9c64005a9f in read_nbytes (nbytes=<optimized out>, ptr=0x7f9c50fe882c "\234\177", this=<optimized out>) at bsock_tcp.c:906 0000002 BSOCK_TCP::read_nbytes (this=0x7f9c5c0067e8, ptr=<optimized out>, nbytes=4) at bsock_tcp.c:890 0000003 0x00007f9c640053cc in BSOCK_TCP::recv (this=0x7f9c5c0067e8) at bsock_tcp.c:478 0000004 0x00007f9c63ffbc8d in bget_msg (sock=0x7f9c5c0067e8) at bget_msg.c:50 0000005 0x0000000000409978 in do_append_data (jcr=0x1097b68, bs=0x7f9c5c0067e8, what=0x41f780 "FD") at append.c:208 0000006 0x0000000000411d03 in append_data_cmd (jcr=0x1097b68) at fd_cmds.c:269 0000007 0x0000000000412121 in do_fd_commands (jcr=0x1097b68) at fd_cmds.c:225 0000008 0x0000000000412322 in run_job (jcr=0x1097b68) at fd_cmds.c:181 0000009 0x0000000000412d45 in do_job_run (jcr=0x1097b68) at job.c:237 0000010 0x000000000041136f in handle_director_connection (dir=0x7f9c5c413768, job_name=<optimized out>) at dir_cmd.c:281 0000011 handle_connection_request (arg=0x7f9c5c413768) at dir_cmd.c:362 0000012 0x00007f9c6402fc17 in workq_server (arg=arg@entry=0x626600) at workq.c:335 0000013 0x00007f9c64017bb2 in lmgr_thread_launcher (x=0x7f9c5c30ff18) at lockmgr.c:932 0000014 0x00007f9c6311eb50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 0000015 0x00007f9c6209c70d in clone () from /lib/x86_64-linux-gnu/libc.so.6 0000016 0x0000000000000000 in ?? () Thread 10 (LWP 13172): #0 0x00007f9c63123344 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0 0000001 0x00007f9c640182c5 in bthread_cond_wait_p (cond=0x1091000, m=0x1090f48, file=0x7f9c644a0bcf "lock.c", line=387) at lockmgr.c:832 0000002 0x00007f9c6448b527 in rLock (this=0x1090f08, locked=<optimized out>) at lock.c:387 0000003 DEVICE::rLock (this=0x1090f08, locked=<optimized out>) at lock.c:370 0000004 0x00007f9c6448140e in DCR::write_block_to_device (this=0x18ba248) at block.c:368 0000005 0x00007f9c6448fe75 in DCR::write_record (this=0x18ba248) at record.c:470 0000006 0x0000000000409a7c in do_append_data (jcr=0x10d54f8, bs=0x7f9c5c414468, what=0x41f780 "FD") at append.c:222 0000007 0x0000000000411d03 in append_data_cmd (jcr=0x10d54f8) at fd_cmds.c:269 0000008 0x0000000000412121 in do_fd_commands (jcr=0x10d54f8) at fd_cmds.c:225 0000009 0x0000000000412322 in run_job (jcr=0x10d54f8) at fd_cmds.c:181 0000010 0x0000000000412d45 in do_job_run (jcr=0x10d54f8) at job.c:237 0000011 0x000000000041136f in handle_director_connection (dir=0x7f9c5c411f58, job_name=<optimized out>) at dir_cmd.c:281 0000012 handle_connection_request (arg=0x7f9c5c411f58) at dir_cmd.c:362 0000013 0x00007f9c6402fc17 in workq_server (arg=arg@entry=0x626600) at workq.c:335 0000014 0x00007f9c64017bb2 in lmgr_thread_launcher (x=0x7f9c5c30ff18) at lockmgr.c:932 0000015 0x00007f9c6311eb50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 0000016 0x00007f9c6209c70d in clone () from /lib/x86_64-linux-gnu/libc.so.6 0000017 0x0000000000000000 in ?? () Thread 9 (LWP 13164): #0 0x00007f9c631261fd in read () from /lib/x86_64-linux-gnu/libpthread.so.0 0000001 0x00007f9c64005a9f in read_nbytes (nbytes=<optimized out>, ptr=0x7f9c5c417dd4 "\022\373\353<\216k.́W\363\340\034dūs\272F\363\354\330\374\001\320o\277\263\177\376\066\260s\364\317\061\\\213\357\222\033zl\376\351\037\031\346w\361%\363Df\351\260\061\003r\017l\255\261\063\247\354\275\371\060\067\203\237A\374\305\306\033\177:\204\263\322g\274W\355[\035\b\307e\303\331s\272M\006\350\241\334\021\377\250\236-\317 ~\227\017\360\312\337\373\334@?\371P~\205ܰ\025\216\215\223ɻ\363\263\373\005\362P8\223\330^,\243\226\305\177\313\a\360\345b\037q\253\343;\237s;\016[\203\023zu\215r^\216LJ\313\333\373\277\372\030\350", <incomplete sequence \371>, this=<optimized out>) at bsock_tcp.c:906 0000002 BSOCK_TCP::read_nbytes (this=0x7f9c5c20f578, ptr=<optimized out>, nbytes=43549) at bsock_tcp.c:890 0000003 0x00007f9c64005606 in BSOCK_TCP::recv (this=0x7f9c5c20f578) at bsock_tcp.c:545 0000004 0x00007f9c63ffbc8d in bget_msg (sock=0x7f9c5c20f578) at bget_msg.c:50 0000005 0x0000000000409978 in do_append_data (jcr=0x109d268, bs=0x7f9c5c20f578, what=0x41f780 "FD") at append.c:208 0000006 0x0000000000411d03 in append_data_cmd (jcr=0x109d268) at fd_cmds.c:269 0000007 0x0000000000412121 in do_fd_commands (jcr=0x109d268) at fd_cmds.c:225 0000008 0x0000000000412322 in run_job (jcr=0x109d268) at fd_cmds.c:181 0000009 0x0000000000412d45 in do_job_run (jcr=0x109d268) at job.c:237 0000010 0x000000000041136f in handle_director_connection (dir=0x7f9c5c005448, job_name=<optimized out>) at dir_cmd.c:281 0000011 handle_connection_request (arg=0x7f9c5c005448) at dir_cmd.c:362 0000012 0x00007f9c6402fc17 in workq_server (arg=arg@entry=0x626600) at workq.c:335 0000013 0x00007f9c64017bb2 in lmgr_thread_launcher (x=0x7f9c5c30ff18) at lockmgr.c:932 0000014 0x00007f9c6311eb50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 0000015 0x00007f9c6209c70d in clone () from /lib/x86_64-linux-gnu/libc.so.6 0000016 0x0000000000000000 in ?? () Thread 8 (LWP 13146): #0 0x00007f9c631261fd in read () from /lib/x86_64-linux-gnu/libpthread.so.0 0000001 0x00007f9c64005a9f in read_nbytes (nbytes=<optimized out>, ptr=0x1143e34 "0\303\034l:\300\212\267k\002I\ti\366=U\001\070\"\213п\n]|\275<\371\201\022\375^ \314\017û\370\255\005ė\351\322a\361}co\222\332H\332%\357\270\341K\237]\300f\335\332\326\v\177_\223\243\342\024\005\266n\214\253\264$\350(\223\303E5F\351c\240x?\306*p\f\322(q*\332@\\\372^=\027\064\324?rz+uG\242\220zf\aۣ\202}\270Gm\314&CG\001\253\337\326jZ\323\374\327\255\372\365\244\062~\023[\t\fǧ\267\250\210\200\211#\005\365P\322\006\f\361\034\256\317\022'P\272\301V\264\263\232\a\035\061\263;\204\026\063Z\231!\370\vVŰ\020v\267ͺ\330", <incomplete sequence \367\203>..., this=<optimized out>) at bsock_tcp.c:906 0000002 BSOCK_TCP::read_nbytes (this=0x7f9c5c4120e8, ptr=<optimized out>, nbytes=59207) at bsock_tcp.c:890 0000003 0x00007f9c64005606 in BSOCK_TCP::recv (this=0x7f9c5c4120e8) at bsock_tcp.c:545 0000004 0x00007f9c63ffbc8d in bget_msg (sock=0x7f9c5c4120e8) at bget_msg.c:50 0000005 0x0000000000409978 in do_append_data (jcr=0x10a46a8, bs=0x7f9c5c4120e8, what=0x41f780 "FD") at append.c:208 0000006 0x0000000000411d03 in append_data_cmd (jcr=0x10a46a8) at fd_cmds.c:269 0000007 0x0000000000412121 in do_fd_commands (jcr=0x10a46a8) at fd_cmds.c:225 0000008 0x0000000000412322 in run_job (jcr=0x10a46a8) at fd_cmds.c:181 0000009 0x0000000000412d45 in do_job_run (jcr=0x10a46a8) at job.c:237 0000010 0x000000000041136f in handle_director_connection (dir=0x7f9c5c00dbc8, job_name=<optimized out>) at dir_cmd.c:281 0000011 handle_connection_request (arg=0x7f9c5c00dbc8) at dir_cmd.c:362 0000012 0x00007f9c6402fc17 in workq_server (arg=arg@entry=0x626600) at workq.c:335 0000013 0x00007f9c64017bb2 in lmgr_thread_launcher (x=0x7f9c5c30ff18) at lockmgr.c:932 0000014 0x00007f9c6311eb50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 0000015 0x00007f9c6209c70d in clone () from /lib/x86_64-linux-gnu/libc.so.6 0000016 0x0000000000000000 in ?? () Thread 7 (LWP 13094): #0 0x00007f9c631261fd in read () from /lib/x86_64-linux-gnu/libpthread.so.0 0000001 0x00007f9c64005a9f in read_nbytes (nbytes=<optimized out>, ptr=0x7f9c567f382c "V\026", this=<optimized out>) at bsock_tcp.c:906 0000002 BSOCK_TCP::read_nbytes (this=0x7f9c5c310608, ptr=<optimized out>, nbytes=4) at bsock_tcp.c:890 0000003 0x00007f9c640053cc in BSOCK_TCP::recv (this=0x7f9c5c310608) at bsock_tcp.c:478 0000004 0x00007f9c63ffbc8d in bget_msg (sock=0x7f9c5c310608) at bget_msg.c:50 0000005 0x0000000000409978 in do_append_data (jcr=0x10ac308, bs=0x7f9c5c310608, what=0x41f780 "FD") at append.c:208 0000006 0x0000000000411d03 in append_data_cmd (jcr=0x10ac308) at fd_cmds.c:269 0000007 0x0000000000412121 in do_fd_commands (jcr=0x10ac308) at fd_cmds.c:225 0000008 0x0000000000412322 in run_job (jcr=0x10ac308) at fd_cmds.c:181 0000009 0x0000000000412d45 in do_job_run (jcr=0x10ac308) at job.c:237 0000010 0x000000000041136f in handle_director_connection (dir=0x7f9c5c414328, job_name=<optimized out>) at dir_cmd.c:281 0000011 handle_connection_request (arg=0x7f9c5c414328) at dir_cmd.c:362 0000012 0x00007f9c6402fc17 in workq_server (arg=arg@entry=0x626600) at workq.c:335 0000013 0x00007f9c64017bb2 in lmgr_thread_launcher (x=0x7f9c5c00bb88) at lockmgr.c:932 0000014 0x00007f9c6311eb50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 0000015 0x00007f9c6209c70d in clone () from /lib/x86_64-linux-gnu/libc.so.6 0000016 0x0000000000000000 in ?? () Thread 6 (LWP 11439): #0 0x00007f9c631261fd in read () from /lib/x86_64-linux-gnu/libpthread.so.0 0000001 0x00007f9c64005a9f in read_nbytes (nbytes=<optimized out>, ptr=0x11093bc "\030\367`\372\377\376\374\210\002&QoBB\204\225\203\371;O\217\001\347X\273\212K\027\376`\310\005\274#\306z\270\027n\250\003\023*\352\060=\020S\020]\261\216\204\373\212\217\066:R;qb9\021SY\233\270\b$X],K\270y\016\033\277\377\261\023A\201;s\337@ӥ F\316\202\067\374!Ɏ.\"[\026\324\376\r\352\307\326n;\273\241\032\266", this=<optimized out>) at bsock_tcp.c:906 0000002 BSOCK_TCP::read_nbytes (this=0x7f9c5c413628, ptr=<optimized out>, nbytes=65058) at bsock_tcp.c:890 0000003 0x00007f9c64005606 in BSOCK_TCP::recv (this=0x7f9c5c413628) at bsock_tcp.c:545 0000004 0x00007f9c63ffbc8d in bget_msg (sock=0x7f9c5c413628) at bget_msg.c:50 0000005 0x0000000000409978 in do_append_data (jcr=0x10a88a8, bs=0x7f9c5c413628, what=0x41f780 "FD") at append.c:208 0000006 0x0000000000411d03 in append_data_cmd (jcr=0x10a88a8) at fd_cmds.c:269 0000007 0x0000000000412121 in do_fd_commands (jcr=0x10a88a8) at fd_cmds.c:225 0000008 0x0000000000412322 in run_job (jcr=0x10a88a8) at fd_cmds.c:181 0000009 0x0000000000412d45 in do_job_run (jcr=0x10a88a8) at job.c:237 0000010 0x000000000041136f in handle_director_connection (dir=0x7f9c5c007dd8, job_name=<optimized out>) at dir_cmd.c:281 0000011 handle_connection_request (arg=0x7f9c5c007dd8) at dir_cmd.c:362 0000012 0x00007f9c6402fc17 in workq_server (arg=arg@entry=0x626600) at workq.c:335 0000013 0x00007f9c64017bb2 in lmgr_thread_launcher (x=0x7f9c5c009248) at lockmgr.c:932 0000014 0x00007f9c6311eb50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 0000015 0x00007f9c6209c70d in clone () from /lib/x86_64-linux-gnu/libc.so.6 0000016 0x0000000000000000 in ?? () Thread 5 (LWP 11436): #0 0x00007f9c631261fd in read () from /lib/x86_64-linux-gnu/libpthread.so.0 0000001 0x00007f9c64005a9f in read_nbytes (nbytes=<optimized out>, ptr=0x7f9c58ff882c "\234\177", this=<optimized out>) at bsock_tcp.c:906 0000002 BSOCK_TCP::read_nbytes (this=0x7f9c5c414b88, ptr=<optimized out>, nbytes=4) at bsock_tcp.c:890 0000003 0x00007f9c640053cc in BSOCK_TCP::recv (this=0x7f9c5c414b88) at bsock_tcp.c:478 0000004 0x00007f9c63ffbc8d in bget_msg (sock=0x7f9c5c414b88) at bget_msg.c:50 0000005 0x0000000000409978 in do_append_data (jcr=0x10a5468, bs=0x7f9c5c414b88, what=0x41f780 "FD") at append.c:208 0000006 0x0000000000411d03 in append_data_cmd (jcr=0x10a5468) at fd_cmds.c:269 0000007 0x0000000000412121 in do_fd_commands (jcr=0x10a5468) at fd_cmds.c:225 0000008 0x0000000000412322 in run_job (jcr=0x10a5468) at fd_cmds.c:181 0000009 0x0000000000412d45 in do_job_run (jcr=0x10a5468) at job.c:237 0000010 0x000000000041136f in handle_director_connection (dir=0x7f9c5c003e58, job_name=<optimized out>) at dir_cmd.c:281 0000011 handle_connection_request (arg=0x7f9c5c003e58) at dir_cmd.c:362 0000012 0x00007f9c6402fc17 in workq_server (arg=arg@entry=0x626600) at workq.c:335 0000013 0x00007f9c64017bb2 in lmgr_thread_launcher (x=0x7f9c5c0052c8) at lockmgr.c:932 0000014 0x00007f9c6311eb50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 0000015 0x00007f9c6209c70d in clone () from /lib/x86_64-linux-gnu/libc.so.6 0000016 0x0000000000000000 in ?? () Thread 4 (LWP 11435): #0 0x00007f9c631261fd in read () from /lib/x86_64-linux-gnu/libpthread.so.0 0000001 0x00007f9c64005a9f in read_nbytes (nbytes=<optimized out>, ptr=0x7f9c5c442a14 "\223f\020\244\035\"cK%H\313\302\307\302\306F\307C\317\034\216\224ԣjr\307>ĺ\272\033){\313\v", this=<optimized out>) at bsock_tcp.c:906 0000002 BSOCK_TCP::read_nbytes (this=0x7f9c5c20e1d8, ptr=<optimized out>, nbytes=65528) at bsock_tcp.c:890 0000003 0x00007f9c64005606 in BSOCK_TCP::recv (this=0x7f9c5c20e1d8) at bsock_tcp.c:545 0000004 0x00007f9c63ffbc8d in bget_msg (sock=0x7f9c5c20e1d8) at bget_msg.c:50 0000005 0x0000000000409978 in do_append_data (jcr=0x10a2408, bs=0x7f9c5c20e1d8, what=0x41f780 "FD") at append.c:208 0000006 0x0000000000411d03 in append_data_cmd (jcr=0x10a2408) at fd_cmds.c:269 0000007 0x0000000000412121 in do_fd_commands (jcr=0x10a2408) at fd_cmds.c:225 0000008 0x0000000000412322 in run_job (jcr=0x10a2408) at fd_cmds.c:181 0000009 0x0000000000412d45 in do_job_run (jcr=0x10a2408) at job.c:237 0000010 0x000000000041136f in handle_director_connection (dir=0x7f9c5c002868, job_name=<optimized out>) at dir_cmd.c:281 0000011 handle_connection_request (arg=0x7f9c5c002868) at dir_cmd.c:362 0000012 0x00007f9c6402fc17 in workq_server (arg=arg@entry=0x626600) at workq.c:335 0000013 0x00007f9c64017bb2 in lmgr_thread_launcher (x=0x7f9c5c003cd8) at lockmgr.c:932 0000014 0x00007f9c6311eb50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 0000015 0x00007f9c6209c70d in clone () from /lib/x86_64-linux-gnu/libc.so.6 0000016 0x0000000000000000 in ?? () Thread 3 (LWP 11434): #0 0x00007f9c62097817 in ftruncate64 () from /lib/x86_64-linux-gnu/libc.so.6 0000001 0x00007f9c6449e11b in unix_file_device::d_truncate (this=0x1090f08, dcr=0x12b6e18) at backends/unix_file_device.c:222 0000002 0x00007f9c64487d87 in truncate (dcr=0x12b6e18, this=<optimized out>) at dev.h:470 0000003 DCR::rewrite_volume_label (this=0x12b6e18, recycle=true) at label.c:533 0000004 0x00007f9c6448d90b in DCR::mount_next_write_volume (this=0x12b6e18) at mount.c:272 0000005 0x00007f9c64486a39 in fixup_device_block_write_error (dcr=0x12b6e18, retries=4) at device.c:118 0000006 0x00007f9c644813bb in DCR::write_block_to_device (this=0x12b6e18) at block.c:408 0000007 0x00007f9c6448fe75 in DCR::write_record (this=0x12b6e18) at record.c:470 0000008 0x0000000000409a7c in do_append_data (jcr=0x10a0f88, bs=0x7f9c5c30fb08, what=0x41f780 "FD") at append.c:222 0000009 0x0000000000411d03 in append_data_cmd (jcr=0x10a0f88) at fd_cmds.c:269 0000010 0x0000000000412121 in do_fd_commands (jcr=0x10a0f88) at fd_cmds.c:225 0000011 0x0000000000412322 in run_job (jcr=0x10a0f88) at fd_cmds.c:181 0000012 0x0000000000412d45 in do_job_run (jcr=0x10a0f88) at job.c:237 0000013 0x000000000041136f in handle_director_connection (dir=0x7f9c5c001278, job_name=<optimized out>) at dir_cmd.c:281 0000014 handle_connection_request (arg=0x7f9c5c001278) at dir_cmd.c:362 0000015 0x00007f9c6402fc17 in workq_server (arg=arg@entry=0x626600) at workq.c:335 0000016 0x00007f9c64017bb2 in lmgr_thread_launcher (x=0x7f9c5c0026e8) at lockmgr.c:932 0000017 0x00007f9c6311eb50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #18 0x00007f9c6209c70d in clone () from /lib/x86_64-linux-gnu/libc.so.6 #19 0x0000000000000000 in ?? () Thread 2 (LWP 10739): #0 0x00007f9c631236bb in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0 0000001 0x00007f9c64018210 in bthread_cond_timedwait_p (cond=0x7f9c64248680, m=0x7f9c64248640, abstime=0x7f9c60b9cde0, file=0x7f9c6403a152 "watchdog.c", line=313) at lockmgr.c:817 0000002 0x00007f9c6402f40a in watchdog_thread (arg=arg@entry=0x0) at watchdog.c:313 0000003 0x00007f9c64017bb2 in lmgr_thread_launcher (x=0x108fc78) at lockmgr.c:932 0000004 0x00007f9c6311eb50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 0000005 0x00007f9c6209c70d in clone () from /lib/x86_64-linux-gnu/libc.so.6 0000006 0x0000000000000000 in ?? () Thread 1 (LWP 10736): #0 0x00007f9c6312691d in nanosleep () from /lib/x86_64-linux-gnu/libpthread.so.0 0000001 0x00007f9c64007c59 in bmicrosleep (sec=30, usec=0) at bsys.c:100 0000002 0x00007f9c64017c54 in check_deadlock () at lockmgr.c:567 0000003 0x00007f9c6311eb50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 0000004 0x00007f9c6209c70d in clone () from /lib/x86_64-linux-gnu/libc.so.6 0000005 0x0000000000000000 in ?? () #0 0x00007f9c6312691d in nanosleep () from /lib/x86_64-linux-gnu/libpthread.so.0 No symbol table info available. 0000001 0x00007f9c64007c59 in bmicrosleep (sec=30, usec=0) at bsys.c:100 100 bsys.c: No such file or directory. timeout = {tv_sec = 30, tv_nsec = 0} tv = {tv_sec = 0, tv_usec = 140309669443060} tz = {tz_minuteswest = 0, tz_dsttime = 0} status = <optimized out> 0000002 0x00007f9c64017c54 in check_deadlock () at lockmgr.c:567 567 lockmgr.c: No such file or directory. __cancel_buf = {__cancel_jmp_buf = {{__cancel_jmp_buf = {0, -3419864286863657408, 140309653787168, 140309631203776, 140309681307712, 3, 3437074444671153728, 3437086053367445056}, __mask_was_saved = 0}}, __pad = {0x7f9c61b9ef60, 0x0, 0x18, 0x7f9c61b9ef60}} __cancel_routine = 0x7f9c64017b50 <cln_hdl(void*)> __not_first_call = <optimized out> old = 0 0000003 0x00007f9c6311eb50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 No symbol table info available. 0000004 0x00007f9c6209c70d in clone () from /lib/x86_64-linux-gnu/libc.so.6 No symbol table info available. 0000005 0x0000000000000000 in ?? () No symbol table info available. #0 0x0000000000000000 in ?? () No symbol table info available. #0 0x0000000000000000 in ?? () No symbol table info available. |
|
The core file is 13MB compressed (tehre is a 2MB upload limit here). If you need it I can make it available on one of our web servers. | |
Hello Scottles, Thank you very much for our help. Yes, please make the core file available so that I can download it. Best regards, Philipp |
|
Please find a copy of the core file here: https://shodan.netsight.co.uk/bareos/bareos-sd.core.10735.gz |
|
I successfully downloaded the core file. Thanks | |
Ok, I will remove the core file from the web server. Let me know if you need access to it again. | |
Hey, I can provide additional tracebacks and core dumps if required as the daemon has crashed twice more since the last trace back was provided. | |
We did some tests and were not able to reproduce it reliably. As we do not see this problem in other installations we work on other tasks for now. |
|
Fix committed to bareos bareos-15.2 branch with changesetid 5766. | |
Fix committed to bareos bareos-14.2 branch with changesetid 5818. | |
bareos: master da1f26aa 2015-02-08 22:29 Ported: N/A Details Diff |
Volume manager cleanup - Cleanup comments - Make concurrent read from filedevice configurable - Use find_read_volume() instead of is_on_read_volume_list() - Make free_volume_list() a generic function. Instead of some trickery with using some temp vars give free_volume_list() two args e.g. a comment what about what its freeing and a dlist with the actual volume list it needs to free. Then we can reuse this in 3 places to delete any volume list. We also don't have to lock the vol_list when freeing the temp_vol_list as it now no longer "misuses" the global dlist pointer. Issues 0000414: Bareos storage daemon crashes during backups |
Affected Issues 0000414 |
|
mod - src/stored/reserve.c | Diff File | ||
mod - src/stored/stored_conf.c | Diff File | ||
mod - src/stored/stored_conf.h | Diff File | ||
mod - src/stored/vol_mgr.c | Diff File | ||
bareos: bareos-14.2 8dc8375d 2015-04-25 17:47 Ported: N/A Details Diff |
Overhaul of the vol_mgr. Remove some new code that seems to give some race conditions. This essentially restores the volume manager to the way it worked in 13.2. Additionally we changed the following: - Cleanup comments - Make free_volume_list() a generic function. Instead of some trickery with using some temp vars give free_volume_list() two args e.g. a comment what about what its freeing and a dlist with the actual volume list it needs to free. Then we can reuse this in 3 places to delete any volume list. We also don't have to lock the vol_list when freeing the temp_vol_list as it now no longer "misuses" the global dlist pointer. Issues 0000414: Bareos storage daemon crashes during backups |
Affected Issues 0000414 |
|
mod - src/stored/reserve.c | Diff File | ||
mod - src/stored/vol_mgr.c | Diff File | ||
bareos: bareos-15.2 0b6435d7 2015-11-12 18:39 Committer: mvwieringen Ported: N/A Details Diff |
Fix random crashes on sd The block variable was set to the dcr->block, but that can be altered in the call to dev->set_label_blocksize(dcr). When that happens, the code goes on with the wrong block. We removed the whole local variable as it makes no sense and is only referenced 3 times when calling empty_block() Fixes 0000414: Bareos storage daemon crashes during backups Fixse 0000483: bareos-sd crash during backup Fixes 0000522: storage daemon crashes ocassionally when starting a new job Fixes 0000552: SD crashes in -current Signed-off-by: Marco van Wieringen <marco.van.wieringen@bareos.com> |
Affected Issues 0000414, 0000483, 0000522, 0000552, 0000564 |
|
mod - src/stored/label.c | Diff File | ||
bareos: bareos-14.2 3a09212c 2015-11-12 18:39 Committer: mvwieringen Ported: N/A Details Diff |
Fix random crashes on sd The block variable was set to the dcr->block, but that can be altered in the call to dev->set_label_blocksize(dcr). When that happens, the code goes on with the wrong block. We removed the whole local variable as it makes no sense and is only referenced 3 times when calling empty_block() Fixes 0000414: Bareos storage daemon crashes during backups Fixse 0000483: bareos-sd crash during backup Fixes 0000522: storage daemon crashes ocassionally when starting a new job Fixes 0000552: SD crashes in -current Signed-off-by: Marco van Wieringen <marco.van.wieringen@bareos.com> |
Affected Issues 0000414, 0000522, 0000552 |
|
mod - src/stored/label.c | Diff File |
Date Modified | Username | Field | Change |
---|---|---|---|
2015-01-28 18:56 | scottles | New Issue | |
2015-01-28 18:56 | scottles | File Added: bareos.log | |
2015-01-29 09:49 | pstorz | Assigned To | => pstorz |
2015-01-29 09:49 | pstorz | Status | new => assigned |
2015-01-29 10:11 | pstorz | Note Added: 0001231 | |
2015-01-29 10:11 | pstorz | Status | assigned => feedback |
2015-01-29 12:12 | scottles | Note Added: 0001233 | |
2015-01-29 12:12 | scottles | Status | feedback => assigned |
2015-01-29 12:30 | pstorz | Note Added: 0001234 | |
2015-01-29 12:30 | pstorz | Status | assigned => feedback |
2015-01-29 12:44 | scottles | Note Added: 0001235 | |
2015-01-29 12:44 | scottles | Status | feedback => assigned |
2015-02-03 12:15 | scottles | Note Added: 0001253 | |
2015-02-04 11:45 | scottles | Note Added: 0001256 | |
2015-02-04 11:47 | scottles | Note Added: 0001257 | |
2015-02-05 09:02 | pstorz | Note Added: 0001258 | |
2015-02-05 09:02 | pstorz | Status | assigned => feedback |
2015-02-05 12:02 | scottles | Note Added: 0001259 | |
2015-02-05 12:02 | scottles | Status | feedback => assigned |
2015-02-05 12:27 | pstorz | Note Added: 0001260 | |
2015-02-05 12:29 | scottles | Note Added: 0001261 | |
2015-02-10 17:02 | scottles | Note Added: 0001276 | |
2015-03-31 14:34 | pstorz | Assigned To | pstorz => |
2015-03-31 14:36 | pstorz | Note Added: 0001667 | |
2015-03-31 14:36 | pstorz | Status | assigned => confirmed |
2015-06-02 22:36 | mvwieringen | Changeset attached | => bareos master da1f26aa |
2015-06-17 12:13 | joergs | Relationship added | child of 0000474 |
2015-06-17 20:26 | mvwieringen | Changeset attached | => bareos bareos-14.2 8dc8375d |
2015-11-13 10:19 | stephand | Relationship added | related to 0000522 |
2015-11-13 10:19 | stephand | Relationship added | related to 0000552 |
2015-11-13 10:19 | stephand | Relationship added | related to 0000483 |
2015-11-13 17:27 | mvwieringen | Changeset attached | => bareos bareos-15.2 0b6435d7 |
2015-11-13 17:27 | mvwieringen | Note Added: 0001960 | |
2015-11-13 17:27 | mvwieringen | Status | confirmed => resolved |
2015-11-13 17:27 | mvwieringen | Resolution | open => fixed |
2015-11-17 12:01 | mvwieringen | Changeset attached | => bareos bareos-14.2 3a09212c |
2015-11-17 12:01 | mvwieringen | Note Added: 0001975 | |
2019-12-18 15:25 | arogge | Status | resolved => closed |