View Issue Details

ID: 0000552
Project: bareos-core
Category: storage daemon
View Status: public
Last Update: 2019-12-18 15:25
Reporter: avantsysadm@avant.ca
Assigned To:
Priority: normal
Severity: crash
Reproducibility: sometimes
Status: closed
Resolution: fixed
Platform: Linux
OS: CentOS
OS Version: 6
Product Version: 15.4.0
Summary: 0000552: SD crashes in -current
Description: While attempting to reproduce a next-tape-selection problem on the mailing list, my SD crashed. Maybe this backtrace is useful. I was trying to run a "status storage=TL1000" at the time.
Additional Information:
Created /var/lib/bareos/bareos-sd.core.7501 for doing postmortem debugging
Missing separate debuginfo for
Try: yum --enablerepo='*-debug*' install /usr/lib/debug/.build-id/fa/be1ca508dffca0ce7e6bffdc6197edd22e4583
[New Thread 7503]
[New Thread 7505]
[New Thread 7506]
[New Thread 18715]
[New Thread 7501]
[Thread debugging using libthread_db enabled]
Core was generated by `/usr/sbin/bareos-sd -g bareos -c /etc/bareos/bareos-sd.conf'.
#0 0x00007fefedf13fbd in nanosleep () at ../sysdeps/unix/syscall-template.S:82
82 T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
$1 = '\000' <repeats 127 times>
$2 = 0x2027068 "bareos-sd"
$3 = 0x20270a8 "/usr/sbin/bareos-sd"
$4 = 0x0
$5 = 0x7fefef0156d2 "15.4.0 (03 October 2015)"
$6 = 0x7fefef0156eb "x86_64-redhat-linux-gnu"
$7 = 0x7fefef015703 "redhat"
$8 = 0x7fefef01570a "CentOS release 6.6 (Final)"
$9 = "backup1.ad.avant.ca", '\000' <repeats 236 times>
$10 = 0x7fefef015c48 "redhat CentOS release 6.6 (Final)"
Environment variable "TestName" not defined.
#0 0x00007fefedf13fbd in nanosleep () at ../sysdeps/unix/syscall-template.S:82
#1 0x00007fefeefe51f2 in bmicrosleep (sec=30, usec=0) at bsys.c:171
#2 0x00007fefeeff5f31 in check_deadlock () at lockmgr.c:566
#3 0x00007fefedf0ca51 in start_thread (arg=0x7fefe596c700) at pthread_create.c:301
#4 0x00007fefecea693d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 5 (Thread 0x7fefefb1e7e0 (LWP 7501)):
#0 0x00007fefece9d113 in __poll (fds=<value optimized out>, nfds=<value optimized out>, timeout=<value optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87
#1 0x00007fefeefd9386 in bnet_thread_server_tcp (addr_list=0x204d198, max_clients=1076814496, sockfds=0x204c698, client_wq=0x628540, nokeepalive=false, handle_client_request=0x1) at bnet_server_tcp.c:298
#2 0x000000000041d78e in main (argc=<value optimized out>, argv=<value optimized out>) at stored.c:325

Thread 4 (Thread 0x7fefdffff700 (LWP 18715)):
#0 0x00007fefedf1432d in __libc_waitpid (pid=<value optimized out>, stat_loc=<value optimized out>, options=<value optimized out>) at ../sysdeps/unix/sysv/linux/waitpid.c:41
#1 0x00007fefef0054d1 in signal_handler (sig=11) at signal.c:240
#2 <signal handler called>
#3 smart_alloc_msg (file=<value optimized out>, line=229, fmt=0x7fefef018580 "Overrun buffer: len=%d addr=%p allocated: %s:%d called from %s:%d\n") at smartall.c:113
#4 0x00007fefef006792 in sm_free (file=0x7fefef015275 "mem_pool.c", line=254, fp=0x7fefb4001248) at smartall.c:230
#5 0x00007fefeeff7004 in sm_free_pool_memory (fname=<value optimized out>, lineno=<value optimized out>, obuf=0x7fefb4001260 "") at mem_pool.c:254
#6 0x00007fefef45ca6a in DEVICE::set_blocksizes (this=0x7fefd8003c18, dcr=0x7fefb4112e78) at dev.c:484
#7 0x00007fefef462c43 in read_dev_volume_label (dcr=0x7fefb4112e78) at label.c:286
#8 0x00007fefef464fcd in DCR::check_volume_label (this=0x7fefb4112e78, ask=@0x7fefdfffe8bf, autochanger=@0x7fefdfffe8be) at mount.c:431
#9 0x00007fefef465cce in DCR::mount_next_write_volume (this=0x7fefb4112e78) at mount.c:259
#10 0x00007fefef44f0dc in acquire_device_for_append (dcr=0x7fefb4112e78) at acquire.c:436
#11 0x000000000040892c in do_append_data (jcr=0x7fefb4001a18, bs=0x204d198, what=0x420750 "FD") at append.c:76
#12 0x00000000004114c3 in append_data_cmd (jcr=0x7fefb4001a18) at fd_cmds.c:269
#13 0x0000000000410c99 in do_fd_commands (jcr=0x7fefb4001a18) at fd_cmds.c:225
#14 0x0000000000411640 in run_job (jcr=0x7fefb4001a18) at fd_cmds.c:181
#15 0x0000000000412757 in do_job_run (jcr=0x7fefb4001a18) at job.c:237
#16 0x00000000004109cf in handle_director_connection (dir=0x2054588) at dir_cmd.c:286
#17 0x00000000004198ab in handle_connection_request (arg=0x2054588) at socket_server.c:99
#18 0x00007fefef00f77d in workq_server (arg=0x628540) at workq.c:335
#19 0x00007fefeeff5e6d in lmgr_thread_launcher (x=0x204c6f8) at lockmgr.c:926
#20 0x00007fefedf0ca51 in start_thread (arg=0x7fefdffff700) at pthread_create.c:301
#21 0x00007fefecea693d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 3 (Thread 0x7fefdebfd700 (LWP 7506)):
#0 0x00007fefece9d113 in __poll (fds=<value optimized out>, nfds=<value optimized out>, timeout=<value optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87
#1 0x0000000000416edc in ndmp_thread_server (arg=0x628310) at ndmp_tape.c:1467
#2 0x00007fefeeff5e6d in lmgr_thread_launcher (x=0x204d678) at lockmgr.c:926
#3 0x00007fefedf0ca51 in start_thread (arg=0x7fefdebfd700) at pthread_create.c:301
#4 0x00007fefecea693d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 2 (Thread 0x7fefdf5fe700 (LWP 7505)):
#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:239
#1 0x00007fefeeff4c40 in bthread_cond_timedwait_p (cond=0x7fefef229b20, m=0x7fefef229ae0, abstime=0x7fefdf5fdd60, file=0x7fefef019d0a "watchdog.c", line=313) at lockmgr.c:811
#2 0x00007fefef00f2d8 in watchdog_thread (arg=<value optimized out>) at watchdog.c:313
#3 0x00007fefeeff5e6d in lmgr_thread_launcher (x=0x204cf48) at lockmgr.c:926
#4 0x00007fefedf0ca51 in start_thread (arg=0x7fefdf5fe700) at pthread_create.c:301
#5 0x00007fefecea693d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 1 (Thread 0x7fefe596c700 (LWP 7503)):
#0 0x00007fefedf13fbd in nanosleep () at ../sysdeps/unix/syscall-template.S:82
#1 0x00007fefeefe51f2 in bmicrosleep (sec=30, usec=0) at bsys.c:171
#2 0x00007fefeeff5f31 in check_deadlock () at lockmgr.c:566
#3 0x00007fefedf0ca51 in start_thread (arg=0x7fefe596c700) at pthread_create.c:301
#4 0x00007fefecea693d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
#0 0x00007fefedf13fbd in nanosleep () at ../sysdeps/unix/syscall-template.S:82
82 T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
No locals.
#1 0x00007fefeefe51f2 in bmicrosleep (sec=30, usec=0) at bsys.c:171
171 status = nanosleep(&timeout, NULL);
timeout = {tv_sec = 30, tv_nsec = 0}
tv = {tv_sec = 3, tv_usec = 140668483622179}
tz = {tz_minuteswest = 0, tz_dsttime = 0}
status = <value optimized out>
#2 0x00007fefeeff5f31 in check_deadlock () at lockmgr.c:566
566 while (!bmicrosleep(30, 0)) {
__clframe = {__cancel_routine = 0x7fefeeff5a20 <cln_hdl(void*)>, __cancel_arg = 0x0, __do_it = 1, __cancel_type = <value optimized out>}
old = 0
#3 0x00007fefedf0ca51 in start_thread (arg=0x7fefe596c700) at pthread_create.c:301
301 THREAD_SETMEM (pd, result, CALL_THREAD_FCT (pd));
__res = <value optimized out>
pd = 0x7fefe596c700
now = <value optimized out>
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140668325775104, 8087716692990061641, 140668468073312, 140668325775808, 0, 3, -8078722887656444855, -8078740185181098935}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
not_first_call = <value optimized out>
pagesize_m1 = <value optimized out>
sp = <value optimized out>
freesize = <value optimized out>
#4 0x00007fefecea693d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
115 call *%rax
No locals.
#0 0x0000000000000000 in ?? ()
No symbol table info available.
#0 0x0000000000000000 in ?? ()
No symbol table info available.
#0 0x0000000000000000 in ?? ()
No symbol table info available.
Tags: No tags attached.

Relationships

related to 0000414 closed Bareos storage daemon crashes during backups 
child of 0000474 closed Release bareos-14.2.6 

Activities

maik

2015-11-06 16:46

administrator   ~0001925

Are the packages from download.bareos.org (nightly)? Which version?
maik

2015-11-06 16:47

administrator   ~0001926

Which package versions are you using?
avantsysadm@avant.ca

2015-11-06 16:48

reporter   ~0001927

Yes, sorry, should have been more specific...

bareos-storage-15.4.0.git.1446221083.2c08394-1171.1.el6.x86_64

(all the other pkgs match that version)
maik

2015-11-06 17:02

administrator   ~0001930

OK, thanks for the information.
Does it also happen on 15.2.1? If not, we will take care of it later and concentrate on 15.2.1-related issues first.
avantsysadm@avant.ca

2015-11-06 17:13

reporter   ~0001934

I have only observed this on this nightly build so far.

(Perhaps "nightly" or "experimental" should be an option, too, when reporting bugs? See 0000538.)
mvwieringen

2015-11-13 17:27

developer   ~0001962

Fix committed to bareos bareos-15.2 branch with changesetid 5766.
mvwieringen

2015-11-17 12:01

developer   ~0001977

Fix committed to bareos bareos-14.2 branch with changesetid 5818.

Related Changesets

bareos: bareos-15.2 0b6435d7

2015-11-12 18:39

pstorz


Committer: mvwieringen

Ported: N/A

Fix random crashes on sd

The block variable was set to the dcr->block, but that can be altered in
the call to dev->set_label_blocksize(dcr).

When that happens, the code goes on with the wrong block.
We removed the whole local variable as it makes no sense and is only
referenced 3 times when calling empty_block()

Fixes 0000414: Bareos storage daemon crashes during backups
Fixes 0000483: bareos-sd crash during backup
Fixes 0000522: storage daemon crashes occasionally when starting a new job
Fixes 0000552: SD crashes in -current

Signed-off-by: Marco van Wieringen <marco.van.wieringen@bareos.com>
Affected Issues
0000414, 0000483, 0000522, 0000552, 0000564
mod - src/stored/label.c

bareos: bareos-14.2 3a09212c

2015-11-12 18:39

pstorz


Committer: mvwieringen

Ported: N/A

Fix random crashes on sd

The block variable was set to the dcr->block, but that can be altered in
the call to dev->set_label_blocksize(dcr).

When that happens, the code goes on with the wrong block.
We removed the whole local variable as it makes no sense and is only
referenced 3 times when calling empty_block()

Fixes 0000414: Bareos storage daemon crashes during backups
Fixes 0000483: bareos-sd crash during backup
Fixes 0000522: storage daemon crashes occasionally when starting a new job
Fixes 0000552: SD crashes in -current

Signed-off-by: Marco van Wieringen <marco.van.wieringen@bareos.com>
Affected Issues
0000414, 0000522, 0000552
mod - src/stored/label.c
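
The commit message above describes a stale-pointer pattern: a local copy of dcr->block is taken, a later call replaces dcr->block, and the code then keeps using the old copy. The following is a minimal C++ sketch of that pattern and of the fix (dropping the local variable). It is not the Bareos source: the Block and Dcr types, the size-based replacement rule, and read_label_fixed() are hypothetical stand-ins, and only the names set_label_blocksize and empty_block are borrowed from the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a device block buffer (not the Bareos type).
struct Block {
  std::vector<char> buf;
  std::size_t used = 0;
  explicit Block(std::size_t size) : buf(size) {}
};

// Hypothetical stand-in for the device control record that owns the block.
struct Dcr {
  std::unique_ptr<Block> block;
};

// Assumption: like the call named in the commit message, this may replace
// dcr.block. Here it does so whenever the requested size differs.
void set_label_blocksize(Dcr& dcr, std::size_t label_size) {
  if (!dcr.block || dcr.block->buf.size() != label_size) {
    dcr.block = std::make_unique<Block>(label_size);  // old Block is destroyed
  }
}

// Marks the block as containing no data.
void empty_block(Block& b) { b.used = 0; }

// Buggy shape, shown only as a comment because it is undefined behavior
// whenever the block gets replaced:
//
//   Block* block = dcr.block.get();      // local copy taken too early
//   set_label_blocksize(dcr, new_size);  // may destroy *block
//   empty_block(*block);                 // writes through a dangling pointer
//
// Fixed shape: no cached pointer; always go through dcr.block.
void read_label_fixed(Dcr& dcr, std::size_t label_size) {
  set_label_blocksize(dcr, label_size);
  assert(dcr.block != nullptr);
  empty_block(*dcr.block);
}
```

Calling read_label_fixed() on a Dcr whose block has a different size than the label size leaves dcr.block pointing at the freshly allocated block with used reset to 0. In the buggy shape the reset would land in freed memory, which would be consistent with the "Overrun buffer" report from sm_free in the backtrace, although that link is an inference rather than something the changeset states.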

Issue History

Date Modified Username Field Change
2015-11-05 23:50 avantsysadm@avant.ca New Issue
2015-11-06 16:46 maik Note Added: 0001925
2015-11-06 16:47 maik Note Added: 0001926
2015-11-06 16:47 maik Status new => feedback
2015-11-06 16:48 avantsysadm@avant.ca Note Added: 0001927
2015-11-06 16:48 avantsysadm@avant.ca Status feedback => new
2015-11-06 17:02 maik Note Added: 0001930
2015-11-06 17:02 maik Status new => feedback
2015-11-06 17:13 avantsysadm@avant.ca Note Added: 0001934
2015-11-06 17:13 avantsysadm@avant.ca Status feedback => new
2015-11-06 17:17 maik Status new => acknowledged
2015-11-06 17:17 maik Product Version => 15.4.0
2015-11-13 10:19 stephand Relationship added related to 0000414
2015-11-13 17:27 mvwieringen Changeset attached => bareos bareos-15.2 0b6435d7
2015-11-13 17:27 mvwieringen Note Added: 0001962
2015-11-13 17:27 mvwieringen Status acknowledged => resolved
2015-11-13 17:27 mvwieringen Resolution open => fixed
2015-11-17 12:01 mvwieringen Changeset attached => bareos bareos-14.2 3a09212c
2015-11-17 12:01 mvwieringen Note Added: 0001977
2015-11-30 18:45 joergs Relationship added child of 0000474
2019-12-18 15:25 arogge Status resolved => closed