View Issue Details

IDProjectCategoryView StatusLast Update
0000564bareos-core[All Projects] directorpublic2017-06-08 15:27
Reporteravantsysadm@avant.caAssigned To 
PrioritynormalSeveritycrashReproducibilityalways
Status closedResolutionfixed 
PlatformLinuxOSCentOSOS Version6
Product Version15.2.1 
Fixed in Version 
Summary0000564: director crashes when inventorying library
Descriptionrunning the second RC build (I think...):


Created /var/lib/bareos/bareos-dir.core.2707 for doing postmortem debugging
Missing separate debuginfo for
Try: yum --enablerepo='*-debug*' install /usr/lib/debug/.build-id/fa/be1ca508dffca0ce7e6bffdc6197edd22e4583
[New Thread 2710]
[New Thread 2743]
[New Thread 2744]
[New Thread 2745]
[New Thread 3700]
[New Thread 3911]
[New Thread 3986]
[New Thread 2707]
[Thread debugging using libthread_db enabled]
Core was generated by `/usr/sbin/bareos-dir -g bareos -c /etc/bareos/bareos-dir.conf'.
#0 0x00007f0f2d89cfbd in nanosleep () at ../sysdeps/unix/syscall-template.S:82
82 T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
$1 = '\000' <repeats 127 times>
$2 = 0x1b8a068 "bareos-dir"
$3 = 0x1b8a0a8 "/usr/sbin/bareos-dir"
$4 = 0x1bcb3c8 "PostgreSQL"
$5 = 0x7f0f2e99ee12 "15.2.1 (24 August 2015)"
$6 = 0x7f0f2e99ee2a "x86_64-redhat-linux-gnu"
$7 = 0x7f0f2e99ee42 "redhat"
$8 = 0x7f0f2e99ee49 "CentOS release 6.6 (Final)"
$9 = "backup1.ad.avant.ca", '\000' <repeats 236 times>
$10 = 0x7f0f2e99f388 "redhat CentOS release 6.6 (Final)"
Environment variable "TestName" not defined.
#0 0x00007f0f2d89cfbd in nanosleep () at ../sysdeps/unix/syscall-template.S:82
0000001 0x00007f0f2e96e352 in bmicrosleep (sec=30, usec=0) at bsys.c:171
0000002 0x00007f0f2e97f041 in check_deadlock () at lockmgr.c:566
0000003 0x00007f0f2d895a51 in start_thread (arg=0x7f0f252f5700) at pthread_create.c:301
0000004 0x00007f0f2c82f93d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 8 (Thread 0x7f0f2faa57e0 (LWP 2707)):
#0 0x00007f0f2d89cfbd in nanosleep () at ../sysdeps/unix/syscall-template.S:82
0000001 0x00007f0f2e96e352 in bmicrosleep (sec=60, usec=0) at bsys.c:171
0000002 0x00000000004447e0 in wait_for_next_job (one_shot_job_to_run=<value optimized out>) at scheduler.c:124
0000003 0x0000000000422665 in main (argc=<value optimized out>, argv=<value optimized out>) at dird.c:393

Thread 7 (Thread 0x7f0f121fc700 (LWP 3986)):
#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:239
0000001 0x00007f0f2e97dd50 in bthread_cond_timedwait_p (cond=0x6a8c68, m=0x6a8c40, abstime=0x7f0f121fbd60, file=0x7f0f2e9a383c "workq.c", line=312) at lockmgr.c:811
0000002 0x00007f0f2e998f63 in workq_server (arg=0x6a8c40) at workq.c:312
0000003 0x00007f0f2e97ef7d in lmgr_thread_launcher (x=0x7f0f140018d8) at lockmgr.c:926
0000004 0x00007f0f2d895a51 in start_thread (arg=0x7f0f121fc700) at pthread_create.c:301
0000005 0x00007f0f2c82f93d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 6 (Thread 0x7f0f13fff700 (LWP 3911)):
#0 0x00007f0f2d89d32d in __libc_waitpid (pid=<value optimized out>, stat_loc=<value optimized out>, options=<value optimized out>) at ../sysdeps/unix/sysv/linux/waitpid.c:41
0000001 0x00007f0f2e98ebd1 in signal_handler (sig=11) at signal.c:240
0000002 <signal handler called>
0000003 0x00007f0f2e969396 in bvsnprintf(char *, int32_t, const char *, typedef __va_list_tag __va_list_tag *) (buffer=0x7f0e5a248050 "", maxlen=-1646561203, format=<value optimized out>, args=0x7f0f13ffe510) at bsnprintf.c:413
0000004 0x00007f0f2e969fc7 in BSOCK::fsend (this=0x7f0f140012a8, fmt=0x48eaf6 "%s") at bsock.c:216
0000005 0x0000000000453e09 in UAContext::send_msg (this=0x7f0f08003fb8, fmt=0x48eaf6 "%s") at ua_output.c:1374
0000006 0x0000000000442b02 in get_volume_name_from_SD (ua=0x7f0f08003fb8, Slot=<value optimized out>, drive=<value optimized out>) at sd_cmds.c:195
0000007 0x000000000046f77a in update_slots (ua=0x7f0f08003fb8) at ua_update.c:1084
0000008 0x00000000004723f8 in update_cmd (ua=0x7f0f08003fb8, cmd=<value optimized out>) at ua_update.c:106
0000009 0x000000000044a919 in do_a_command (ua=0x7f0f08003fb8) at ua_cmds.c:331
0000010 0x000000000046807c in handle_UA_client_request (user=0x7f0f140012a8) at ua_server.c:97
0000011 0x000000000043eebd in handle_connection_request (arg=0x7f0f140012a8) at socket_server.c:83
0000012 0x00007f0f2e998e7d in workq_server (arg=0x6a8c40) at workq.c:335
0000013 0x00007f0f2e97ef7d in lmgr_thread_launcher (x=0x7f0f14001478) at lockmgr.c:926
0000014 0x00007f0f2d895a51 in start_thread (arg=0x7f0f13fff700) at pthread_create.c:301
0000015 0x00007f0f2c82f93d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 5 (Thread 0x7f0f135fe700 (LWP 3700)):
#0 0x00007f0f2d89c7dd in read () at ../sysdeps/unix/syscall-template.S:82
0000001 0x00007f0f2e96afd6 in BSOCK_TCP::read_nbytes (this=0x7f0f0001b6a8, ptr=0x7f0f135fd6cc "9", nbytes=4) at bsock_tcp.c:906
0000002 0x00007f0f2e96b476 in BSOCK_TCP::recv (this=0x7f0f0001b6a8) at bsock_tcp.c:478
0000003 0x0000000000442ae7 in get_volume_name_from_SD (ua=0x7f0f00011d08, Slot=<value optimized out>, drive=<value optimized out>) at sd_cmds.c:194
0000004 0x000000000046f77a in update_slots (ua=0x7f0f00011d08) at ua_update.c:1084
0000005 0x00000000004723f8 in update_cmd (ua=0x7f0f00011d08, cmd=<value optimized out>) at ua_update.c:106
0000006 0x000000000044a919 in do_a_command (ua=0x7f0f00011d08) at ua_cmds.c:331
0000007 0x000000000046807c in handle_UA_client_request (user=0x7f0f14001168) at ua_server.c:97
0000008 0x000000000043eebd in handle_connection_request (arg=0x7f0f14001168) at socket_server.c:83
0000009 0x00007f0f2e998e7d in workq_server (arg=0x6a8c40) at workq.c:335
0000010 0x00007f0f2e97ef7d in lmgr_thread_launcher (x=0x7f0f14002618) at lockmgr.c:926
0000011 0x00007f0f2d895a51 in start_thread (arg=0x7f0f135fe700) at pthread_create.c:301
0000012 0x00007f0f2c82f93d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 4 (Thread 0x7f0f211d3700 (LWP 2745)):
#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:239
0000001 0x00007f0f2e97dd50 in bthread_cond_timedwait_p (cond=0x6a8ee0, m=0x6a8fc0, abstime=0x7f0f211d2c70, file=0x48490c "stats.c", line=117) at lockmgr.c:811
0000002 0x0000000000444f2c in wait_for_next_run (arg=<value optimized out>) at stats.c:117
0000003 statistics_thread_runner (arg=<value optimized out>) at stats.c:172
0000004 0x00007f0f2e97ef7d in lmgr_thread_launcher (x=0x1bc1ea8) at lockmgr.c:926
0000005 0x00007f0f2d895a51 in start_thread (arg=0x7f0f211d3700) at pthread_create.c:301
0000006 0x00007f0f2c82f93d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 3 (Thread 0x7f0f21bd4700 (LWP 2744)):
#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:239
0000001 0x00007f0f2e97dd50 in bthread_cond_timedwait_p (cond=0x7f0f2ebb3820, m=0x7f0f2ebb37e0, abstime=0x7f0f21bd3d60, file=0x7f0f2e9a357a "watchdog.c", line=313) at lockmgr.c:811
0000002 0x00007f0f2e9989d8 in watchdog_thread (arg=<value optimized out>) at watchdog.c:313
0000003 0x00007f0f2e97ef7d in lmgr_thread_launcher (x=0x1bc4088) at lockmgr.c:926
0000004 0x00007f0f2d895a51 in start_thread (arg=0x7f0f21bd4700) at pthread_create.c:301
0000005 0x00007f0f2c82f93d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 2 (Thread 0x7f0f225d5700 (LWP 2743)):
#0 0x00007f0f2c826113 in __poll (fds=<value optimized out>, nfds=<value optimized out>, timeout=<value optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87
0000001 0x00007f0f2e962516 in bnet_thread_server_tcp (addr_list=0x7f0f140014c8, max_clients=576535952, sockfds=0x7f0f14001078, client_wq=0x6a8c40, nokeepalive=false, handle_client_request=0x1) at bnet_server_tcp.c:298
0000002 0x000000000043ee2f in connect_thread (arg=0x1b8c328) at socket_server.c:100
0000003 0x00007f0f2e97ef7d in lmgr_thread_launcher (x=0x1bc34e8) at lockmgr.c:926
0000004 0x00007f0f2d895a51 in start_thread (arg=0x7f0f225d5700) at pthread_create.c:301
0000005 0x00007f0f2c82f93d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 1 (Thread 0x7f0f252f5700 (LWP 2710)):
#0 0x00007f0f2d89cfbd in nanosleep () at ../sysdeps/unix/syscall-template.S:82
0000001 0x00007f0f2e96e352 in bmicrosleep (sec=30, usec=0) at bsys.c:171
0000002 0x00007f0f2e97f041 in check_deadlock () at lockmgr.c:566
0000003 0x00007f0f2d895a51 in start_thread (arg=0x7f0f252f5700) at pthread_create.c:301
0000004 0x00007f0f2c82f93d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
#0 0x00007f0f2d89cfbd in nanosleep () at ../sysdeps/unix/syscall-template.S:82
82 T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
No locals.
0000001 0x00007f0f2e96e352 in bmicrosleep (sec=30, usec=0) at bsys.c:171
171 status = nanosleep(&timeout, NULL);
timeout = {tv_sec = 30, tv_nsec = 0}
tv = {tv_sec = 3, tv_usec = 139703182943795}
tz = {tz_minuteswest = 0, tz_dsttime = 0}
status = <value optimized out>
0000002 0x00007f0f2e97f041 in check_deadlock () at lockmgr.c:566
566 while (!bmicrosleep(30, 0)) {
__clframe = {__cancel_routine = 0x7f0f2e97eb30 <cln_hdl(void*)>, __cancel_arg = 0x0, __do_it = 1, __cancel_type = <value optimized out>}
old = 0
0000003 0x00007f0f2d895a51 in start_thread (arg=0x7f0f252f5700) at pthread_create.c:301
301 THREAD_SETMEM (pd, result, CALL_THREAD_FCT (pd));
__res = <value optimized out>
pd = 0x7f0f252f5700
now = <value optimized out>
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139703025096448, -6184079751837869380, 139703167394656, 139703025097152, 0, 3, 6067464034012404412, 6067447283278191292}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
not_first_call = <value optimized out>
pagesize_m1 = <value optimized out>
sp = <value optimized out>
freesize = <value optimized out>
0000004 0x00007f0f2c82f93d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
115 call *%rax
No locals.
#0 0x0000000000000000 in ?? ()
No symbol table info available.
#0 0x0000000000000000 in ?? ()
No symbol table info available.
#0 0x0000000000000000 in ?? ()
No symbol table info available.
Steps To Reproducerun two inventories simultaneously?
Additional Information*update storage=TL1000 scan
Update choice:
     1: Volume parameters
     2: Pool from resource
     3: Slots from autochanger
     4: Long term statistics
Choose catalog item to update (1-4): 3
Connecting to Storage daemon TL1000 at backup1.ad.avant.ca:9103 ...
3306 Issuing autochanger "slots" command.
Device "autochanger-tl1000" has 7 slots.
Connecting to Storage daemon TL1000 at backup1.ad.avant.ca:9103 ...
3306 Issuing autochanger "list" command.
Connecting to Storage daemon TL1000 at backup1.ad.avant.ca:9103 ...
3307 Issuing autochanger "unload slot 3, drive 0" command.
3304 Issuing autochanger "load slot 1, drive 0" command.
3305 Autochanger "load slot 1, drive 0", status is OK.
generic_tape_device.c:140 Unable to open device "tapedrive-tl1000" (/dev/tape/by-id/scsi-350016977299e1010-nst): ERR=Input/output error
3001 Volume=000001L6 Slot=1
Catalog record for Volume "000001L6" is up to date.
Connecting to Storage daemon TL1000 at backup1.ad.avant.ca:9103 ...
3307 Issuing autochanger "unload slot 1, drive 0" command.
3304 Issuing autochanger "load slot 2, drive 0" command.
3305 Autochanger "load slot 2, drive 0", status is OK.
generic_tape_device.c:140 Unable to open device "tapedrive-tl1000" (/dev/tape/by-id/scsi-350016977299e1010-nst): ERR=Input/output error
3001 Volume=000010L6 Slot=2
Catalog record for Volume "000010L6" is up to date.
Connecting to Storage daemon TL1000 at backup1.ad.avant.ca:9103 ...
3307 Issuing autochanger "unload slot 2, drive 0" command.
[root@backup1 ~]#


while in another window, simultaneously:
[root@backup1 ~]# bconsole
Connecting to Director backup1.ad.avant.ca:9101
1000 OK: backup1-dir Version: 15.2.1 (24 August 2015)
Enter a period to cancel a command.
*list storages
Automatically selected Catalog: MyCatalog
Using Catalog "MyCatalog"
+-----------+--------+-------------+
| storageid | name | autochanger |
+-----------+--------+-------------+
| 1 | File | 0 |
| 2 | TL1000 | 0 |
| 3 | PV124T | 0 |
+-----------+--------+-------------+
*update storage=PV124T slots scan
Using Catalog "MyCatalog"
Connecting to Storage daemon PV124T at backup1.ad.avant.ca:9103 ...
3306 Issuing autochanger "slots" command.
Device "autochanger-pv124t" has 16 slots.
Connecting to Storage daemon PV124T at backup1.ad.avant.ca:9103 ...
3306 Issuing autochanger "list" command.
Connecting to Storage daemon PV124T at backup1.ad.avant.ca:9103 ...
3307 Issuing autochanger "unload slot 15, drive 0" command.
3304 Issuing autochanger "load slot 1, drive 0" command.
3305 Autochanger "load slot 1, drive 0", status is OK.
[root@backup1 ~]#
TagsNo tags attached.
bareos-master: impact
bareos-master: action
bareos-19.2: impact
bareos-19.2: action
bareos-18.2: impact
bareos-18.2: action
bareos-17.2: impact
bareos-17.2: action
bareos-16.2: impact
bareos-16.2: action
bareos-15.2: impact
bareos-15.2: action
bareos-14.2: impact
bareos-14.2: action
bareos-13.2: impact
bareos-13.2: action
bareos-12.4: impact
bareos-12.4: action

Activities

pstorz

pstorz

2015-11-16 11:52

administrator   ~0001965

Hello,

please verify if in current nightly build the problem still exists,


Thanks

avantsysadm@avant.ca

avantsysadm@avant.ca

2015-11-16 18:11

reporter   ~0001966

Can't tell... now I get this:

*status storage=TL1000 slots
Connecting to Storage daemon TL1000 at backup1.ad.avant.ca:9103 ...
3306 Issuing autochanger "slots" command.
Device "autochanger-tl1000" has 7 slots.
Connecting to Storage daemon TL1000 at backup1.ad.avant.ca:9103 ...
3306 Issuing autochanger "listall" command.
 Slot | Volume Name | Status | Media Type | Pool |
------+------------------+-----------+----------------+--------------------------|
    1 | 000001L6 | Used | LTO6 | LTO6 |
    2 | 000010L6 | Full | LTO6 | LTO6 |
    3 | 000012L6 | Append | LTO6 | LTO6 |
    4 | 000006L6 | Append | LTO6 | LTO6 |
    5 | 000005L6 | Append | LTO6 | LTO6 |
    6 | 000009L6 | Full | LTO6 | LTO6 |
    7@| ? | ? | ? | ? |
*mount storage=TL1000 drive=0 slot=3
Connecting to Storage daemon TL1000 at backup1.ad.avant.ca:9103 ...
3304 Issuing autochanger "load slot 3, drive 0" command.
3992 Bad autochanger "load slot 3, drive 0": ERR=Child died from signal 15: Termination.
Results=Program killed by BAREOS (timeout)

3901 Unable to open device ""tapedrive-tl1000" (/dev/tape/by-id/scsi-350016977299e1010-nst)": ERR=generic_tape_device.c:140 Unable to open device "tapedrive-tl1000" (/dev/tape/by-id/scsi-350016977299e1010-nst): ERR=Input/output error
avantsysadm@avant.ca

avantsysadm@avant.ca

2015-11-16 18:13

reporter   ~0001967

That was with bareos-15.4.0.git.1447429466.dcea847-1173.1.el6.x86_64 (et al.)
avantsysadm@avant.ca

avantsysadm@avant.ca

2015-11-16 19:23

reporter   ~0001969

Never mind... that was the result of a mis-labeled cleaning cartridge. :-(

Current nightly build does *not* crash while doing two "update storage slots scan" commands simultaneously. It hasn't finished, but it's already proceeded much further than ever before (i.e. it got past the first tape on each library).
pstorz

pstorz

2015-11-17 08:20

administrator   ~0001970

fixed by: changeset bareos: bareos-15.2 0b6435d7
pstorz

pstorz

2015-11-23 17:00

administrator   ~0002012

Fix committed to bareos bareos-15.2 branch with changesetid 5923.

Related Changesets

bareos: bareos-15.2 0b6435d7

2015-11-12 17:39:51

pstorz


Committer: mvwieringen

Ported: N/A

Details Diff
Fix random crashes on sd

The block variable was set to the dcr->block, but that can be altered in
the call to dev->set_label_blocksize(dcr).

When that happens, the code goes on with the wrong block.
We removed the whole local variable as it makes no sense and is only
referenced 3 times when calling empty_block()

Fixes 0000414: Bareos storage daemon crashes during backups
Fixse 0000483: bareos-sd crash during backup
Fixes 0000522: storage daemon crashes ocassionally when starting a new job
Fixes 0000552: SD crashes in -current

Signed-off-by: Marco van Wieringen <marco.van.wieringen@bareos.com>
Affected Issues
0000414, 0000483, 0000522, 0000552, 0000564
mod - src/stored/label.c Diff File

bareos: bareos-15.2 e652bb0a

2015-11-23 15:52:04

pstorz

Ported: N/A

Details Diff
Fix bsock::fsend problem when msglen is zero

Fixes 0000564 :director crashes when inventorying library
Affected Issues
0000549, 0000564
mod - src/lib/bsock.c Diff File

Issue History

Date Modified Username Field Change
2015-11-13 17:00 avantsysadm@avant.ca New Issue
2015-11-16 11:52 pstorz Note Added: 0001965
2015-11-16 11:52 pstorz Status new => feedback
2015-11-16 18:11 avantsysadm@avant.ca Note Added: 0001966
2015-11-16 18:11 avantsysadm@avant.ca Status feedback => new
2015-11-16 18:13 avantsysadm@avant.ca Note Added: 0001967
2015-11-16 19:23 avantsysadm@avant.ca Note Added: 0001969
2015-11-17 08:19 pstorz Changeset attached => bareos bareos-15.2 0b6435d7
2015-11-17 08:20 pstorz Note Added: 0001970
2015-11-17 08:20 pstorz Status new => closed
2015-11-17 08:20 pstorz Resolution open => fixed
2015-11-23 17:00 pstorz Changeset attached => bareos bareos-15.2 e652bb0a
2015-11-23 17:00 pstorz Note Added: 0002012
2015-11-23 17:00 pstorz Status closed => resolved
2017-06-08 15:27 joergs Status resolved => closed