View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0000564 | bareos-core | director | public | 2015-11-13 17:00 | 2017-06-08 15:27 |
Reporter | avantsysadm@avant.ca | Assigned To | |||
Priority | normal | Severity | crash | Reproducibility | always |
Status | closed | Resolution | fixed | ||
Platform | Linux | OS | CentOS | OS Version | 6 |
Product Version | 15.2.1 | ||||
Summary | 0000564: director crashes when inventorying library | ||||
Description | running the second RC build (I think...): Created /var/lib/bareos/bareos-dir.core.2707 for doing postmortem debugging Missing separate debuginfo for Try: yum --enablerepo='*-debug*' install /usr/lib/debug/.build-id/fa/be1ca508dffca0ce7e6bffdc6197edd22e4583 [New Thread 2710] [New Thread 2743] [New Thread 2744] [New Thread 2745] [New Thread 3700] [New Thread 3911] [New Thread 3986] [New Thread 2707] [Thread debugging using libthread_db enabled] Core was generated by `/usr/sbin/bareos-dir -g bareos -c /etc/bareos/bareos-dir.conf'. #0 0x00007f0f2d89cfbd in nanosleep () at ../sysdeps/unix/syscall-template.S:82 82 T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS) $1 = '\000' <repeats 127 times> $2 = 0x1b8a068 "bareos-dir" $3 = 0x1b8a0a8 "/usr/sbin/bareos-dir" $4 = 0x1bcb3c8 "PostgreSQL" $5 = 0x7f0f2e99ee12 "15.2.1 (24 August 2015)" $6 = 0x7f0f2e99ee2a "x86_64-redhat-linux-gnu" $7 = 0x7f0f2e99ee42 "redhat" $8 = 0x7f0f2e99ee49 "CentOS release 6.6 (Final)" $9 = "backup1.ad.avant.ca", '\000' <repeats 236 times> $10 = 0x7f0f2e99f388 "redhat CentOS release 6.6 (Final)" Environment variable "TestName" not defined. #0 0x00007f0f2d89cfbd in nanosleep () at ../sysdeps/unix/syscall-template.S:82 0000001 0x00007f0f2e96e352 in bmicrosleep (sec=30, usec=0) at bsys.c:171 0000002 0x00007f0f2e97f041 in check_deadlock () at lockmgr.c:566 0000003 0x00007f0f2d895a51 in start_thread (arg=0x7f0f252f5700) at pthread_create.c:301 0000004 0x00007f0f2c82f93d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115 Thread 8 (Thread 0x7f0f2faa57e0 (LWP 2707)): #0 0x00007f0f2d89cfbd in nanosleep () at ../sysdeps/unix/syscall-template.S:82 0000001 0x00007f0f2e96e352 in bmicrosleep (sec=60, usec=0) at bsys.c:171 0000002 0x00000000004447e0 in wait_for_next_job (one_shot_job_to_run=<value optimized out>) at scheduler.c:124 0000003 0x0000000000422665 in main (argc=<value optimized out>, argv=<value optimized out>) at dird.c:393 Thread 7 (Thread 0x7f0f121fc700 (LWP 3986)): #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:239 0000001 0x00007f0f2e97dd50 in bthread_cond_timedwait_p (cond=0x6a8c68, m=0x6a8c40, abstime=0x7f0f121fbd60, file=0x7f0f2e9a383c "workq.c", line=312) at lockmgr.c:811 0000002 0x00007f0f2e998f63 in workq_server (arg=0x6a8c40) at workq.c:312 0000003 0x00007f0f2e97ef7d in lmgr_thread_launcher (x=0x7f0f140018d8) at lockmgr.c:926 0000004 0x00007f0f2d895a51 in start_thread (arg=0x7f0f121fc700) at pthread_create.c:301 0000005 0x00007f0f2c82f93d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115 Thread 6 (Thread 0x7f0f13fff700 (LWP 3911)): #0 0x00007f0f2d89d32d in __libc_waitpid (pid=<value optimized out>, stat_loc=<value optimized out>, options=<value optimized out>) at ../sysdeps/unix/sysv/linux/waitpid.c:41 0000001 0x00007f0f2e98ebd1 in signal_handler (sig=11) at signal.c:240 0000002 <signal handler called> 0000003 0x00007f0f2e969396 in bvsnprintf(char *, int32_t, const char *, typedef __va_list_tag __va_list_tag *) (buffer=0x7f0e5a248050 "", maxlen=-1646561203, format=<value optimized out>, args=0x7f0f13ffe510) at bsnprintf.c:413 0000004 0x00007f0f2e969fc7 in BSOCK::fsend (this=0x7f0f140012a8, fmt=0x48eaf6 "%s") at bsock.c:216 0000005 0x0000000000453e09 in UAContext::send_msg (this=0x7f0f08003fb8, fmt=0x48eaf6 "%s") at ua_output.c:1374 0000006 0x0000000000442b02 in get_volume_name_from_SD (ua=0x7f0f08003fb8, Slot=<value optimized out>, drive=<value optimized out>) at sd_cmds.c:195 0000007 0x000000000046f77a in update_slots (ua=0x7f0f08003fb8) at ua_update.c:1084 0000008 0x00000000004723f8 in update_cmd (ua=0x7f0f08003fb8, cmd=<value optimized out>) at ua_update.c:106 0000009 0x000000000044a919 in do_a_command (ua=0x7f0f08003fb8) at ua_cmds.c:331 0000010 0x000000000046807c in handle_UA_client_request (user=0x7f0f140012a8) at ua_server.c:97 0000011 0x000000000043eebd in handle_connection_request (arg=0x7f0f140012a8) at socket_server.c:83 0000012 0x00007f0f2e998e7d in workq_server (arg=0x6a8c40) at workq.c:335 0000013 0x00007f0f2e97ef7d in lmgr_thread_launcher (x=0x7f0f14001478) at lockmgr.c:926 0000014 0x00007f0f2d895a51 in start_thread (arg=0x7f0f13fff700) at pthread_create.c:301 0000015 0x00007f0f2c82f93d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115 Thread 5 (Thread 0x7f0f135fe700 (LWP 3700)): #0 0x00007f0f2d89c7dd in read () at ../sysdeps/unix/syscall-template.S:82 0000001 0x00007f0f2e96afd6 in BSOCK_TCP::read_nbytes (this=0x7f0f0001b6a8, ptr=0x7f0f135fd6cc "9", nbytes=4) at bsock_tcp.c:906 0000002 0x00007f0f2e96b476 in BSOCK_TCP::recv (this=0x7f0f0001b6a8) at bsock_tcp.c:478 0000003 0x0000000000442ae7 in get_volume_name_from_SD (ua=0x7f0f00011d08, Slot=<value optimized out>, drive=<value optimized out>) at sd_cmds.c:194 0000004 0x000000000046f77a in update_slots (ua=0x7f0f00011d08) at ua_update.c:1084 0000005 0x00000000004723f8 in update_cmd (ua=0x7f0f00011d08, cmd=<value optimized out>) at ua_update.c:106 0000006 0x000000000044a919 in do_a_command (ua=0x7f0f00011d08) at ua_cmds.c:331 0000007 0x000000000046807c in handle_UA_client_request (user=0x7f0f14001168) at ua_server.c:97 0000008 0x000000000043eebd in handle_connection_request (arg=0x7f0f14001168) at socket_server.c:83 0000009 0x00007f0f2e998e7d in workq_server (arg=0x6a8c40) at workq.c:335 0000010 0x00007f0f2e97ef7d in lmgr_thread_launcher (x=0x7f0f14002618) at lockmgr.c:926 0000011 0x00007f0f2d895a51 in start_thread (arg=0x7f0f135fe700) at pthread_create.c:301 0000012 0x00007f0f2c82f93d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115 Thread 4 (Thread 0x7f0f211d3700 (LWP 2745)): #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:239 0000001 0x00007f0f2e97dd50 in bthread_cond_timedwait_p (cond=0x6a8ee0, m=0x6a8fc0, abstime=0x7f0f211d2c70, file=0x48490c "stats.c", line=117) at lockmgr.c:811 0000002 0x0000000000444f2c in wait_for_next_run (arg=<value optimized out>) at stats.c:117 0000003 statistics_thread_runner (arg=<value optimized out>) at stats.c:172 0000004 0x00007f0f2e97ef7d in lmgr_thread_launcher (x=0x1bc1ea8) at lockmgr.c:926 0000005 0x00007f0f2d895a51 in start_thread (arg=0x7f0f211d3700) at pthread_create.c:301 0000006 0x00007f0f2c82f93d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115 Thread 3 (Thread 0x7f0f21bd4700 (LWP 2744)): #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:239 0000001 0x00007f0f2e97dd50 in bthread_cond_timedwait_p (cond=0x7f0f2ebb3820, m=0x7f0f2ebb37e0, abstime=0x7f0f21bd3d60, file=0x7f0f2e9a357a "watchdog.c", line=313) at lockmgr.c:811 0000002 0x00007f0f2e9989d8 in watchdog_thread (arg=<value optimized out>) at watchdog.c:313 0000003 0x00007f0f2e97ef7d in lmgr_thread_launcher (x=0x1bc4088) at lockmgr.c:926 0000004 0x00007f0f2d895a51 in start_thread (arg=0x7f0f21bd4700) at pthread_create.c:301 0000005 0x00007f0f2c82f93d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115 Thread 2 (Thread 0x7f0f225d5700 (LWP 2743)): #0 0x00007f0f2c826113 in __poll (fds=<value optimized out>, nfds=<value optimized out>, timeout=<value optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87 0000001 0x00007f0f2e962516 in bnet_thread_server_tcp (addr_list=0x7f0f140014c8, max_clients=576535952, sockfds=0x7f0f14001078, client_wq=0x6a8c40, nokeepalive=false, handle_client_request=0x1) at bnet_server_tcp.c:298 0000002 0x000000000043ee2f in connect_thread (arg=0x1b8c328) at socket_server.c:100 0000003 0x00007f0f2e97ef7d in lmgr_thread_launcher (x=0x1bc34e8) at lockmgr.c:926 0000004 0x00007f0f2d895a51 in start_thread (arg=0x7f0f225d5700) at pthread_create.c:301 0000005 0x00007f0f2c82f93d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115 Thread 1 (Thread 0x7f0f252f5700 (LWP 2710)): #0 0x00007f0f2d89cfbd in nanosleep () at ../sysdeps/unix/syscall-template.S:82 0000001 0x00007f0f2e96e352 in bmicrosleep (sec=30, usec=0) at bsys.c:171 0000002 0x00007f0f2e97f041 in check_deadlock () at lockmgr.c:566 0000003 0x00007f0f2d895a51 in start_thread (arg=0x7f0f252f5700) at pthread_create.c:301 0000004 0x00007f0f2c82f93d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115 #0 0x00007f0f2d89cfbd in nanosleep () at ../sysdeps/unix/syscall-template.S:82 82 T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS) No locals. 0000001 0x00007f0f2e96e352 in bmicrosleep (sec=30, usec=0) at bsys.c:171 171 status = nanosleep(&timeout, NULL); timeout = {tv_sec = 30, tv_nsec = 0} tv = {tv_sec = 3, tv_usec = 139703182943795} tz = {tz_minuteswest = 0, tz_dsttime = 0} status = <value optimized out> 0000002 0x00007f0f2e97f041 in check_deadlock () at lockmgr.c:566 566 while (!bmicrosleep(30, 0)) { __clframe = {__cancel_routine = 0x7f0f2e97eb30 <cln_hdl(void*)>, __cancel_arg = 0x0, __do_it = 1, __cancel_type = <value optimized out>} old = 0 0000003 0x00007f0f2d895a51 in start_thread (arg=0x7f0f252f5700) at pthread_create.c:301 301 THREAD_SETMEM (pd, result, CALL_THREAD_FCT (pd)); __res = <value optimized out> pd = 0x7f0f252f5700 now = <value optimized out> unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139703025096448, -6184079751837869380, 139703167394656, 139703025097152, 0, 3, 6067464034012404412, 6067447283278191292}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}} not_first_call = <value optimized out> pagesize_m1 = <value optimized out> sp = <value optimized out> freesize = <value optimized out> 0000004 0x00007f0f2c82f93d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115 115 call *%rax No locals. #0 0x0000000000000000 in ?? () No symbol table info available. #0 0x0000000000000000 in ?? () No symbol table info available. #0 0x0000000000000000 in ?? () No symbol table info available. | ||||
Steps To Reproduce | run two inventories simultaneously? | ||||
Additional Information | *update storage=TL1000 scan Update choice: 1: Volume parameters 2: Pool from resource 3: Slots from autochanger 4: Long term statistics Choose catalog item to update (1-4): 3 Connecting to Storage daemon TL1000 at backup1.ad.avant.ca:9103 ... 3306 Issuing autochanger "slots" command. Device "autochanger-tl1000" has 7 slots. Connecting to Storage daemon TL1000 at backup1.ad.avant.ca:9103 ... 3306 Issuing autochanger "list" command. Connecting to Storage daemon TL1000 at backup1.ad.avant.ca:9103 ... 3307 Issuing autochanger "unload slot 3, drive 0" command. 3304 Issuing autochanger "load slot 1, drive 0" command. 3305 Autochanger "load slot 1, drive 0", status is OK. generic_tape_device.c:140 Unable to open device "tapedrive-tl1000" (/dev/tape/by-id/scsi-350016977299e1010-nst): ERR=Input/output error 3001 Volume=000001L6 Slot=1 Catalog record for Volume "000001L6" is up to date. Connecting to Storage daemon TL1000 at backup1.ad.avant.ca:9103 ... 3307 Issuing autochanger "unload slot 1, drive 0" command. 3304 Issuing autochanger "load slot 2, drive 0" command. 3305 Autochanger "load slot 2, drive 0", status is OK. generic_tape_device.c:140 Unable to open device "tapedrive-tl1000" (/dev/tape/by-id/scsi-350016977299e1010-nst): ERR=Input/output error 3001 Volume=000010L6 Slot=2 Catalog record for Volume "000010L6" is up to date. Connecting to Storage daemon TL1000 at backup1.ad.avant.ca:9103 ... 3307 Issuing autochanger "unload slot 2, drive 0" command. [root@backup1 ~]# while in another window, simultaneously: [root@backup1 ~]# bconsole Connecting to Director backup1.ad.avant.ca:9101 1000 OK: backup1-dir Version: 15.2.1 (24 August 2015) Enter a period to cancel a command. *list storages Automatically selected Catalog: MyCatalog Using Catalog "MyCatalog" +-----------+--------+-------------+ | storageid | name | autochanger | +-----------+--------+-------------+ | 1 | File | 0 | | 2 | TL1000 | 0 | | 3 | PV124T | 0 | +-----------+--------+-------------+ *update storage=PV124T slots scan Using Catalog "MyCatalog" Connecting to Storage daemon PV124T at backup1.ad.avant.ca:9103 ... 3306 Issuing autochanger "slots" command. Device "autochanger-pv124t" has 16 slots. Connecting to Storage daemon PV124T at backup1.ad.avant.ca:9103 ... 3306 Issuing autochanger "list" command. Connecting to Storage daemon PV124T at backup1.ad.avant.ca:9103 ... 3307 Issuing autochanger "unload slot 15, drive 0" command. 3304 Issuing autochanger "load slot 1, drive 0" command. 3305 Autochanger "load slot 1, drive 0", status is OK. [root@backup1 ~]# | ||||
Tags | No tags attached. | ||||
Hello, please verify if in current nightly build the problem still exists, Thanks |
|
Can't tell... now I get this: *status storage=TL1000 slots Connecting to Storage daemon TL1000 at backup1.ad.avant.ca:9103 ... 3306 Issuing autochanger "slots" command. Device "autochanger-tl1000" has 7 slots. Connecting to Storage daemon TL1000 at backup1.ad.avant.ca:9103 ... 3306 Issuing autochanger "listall" command. Slot | Volume Name | Status | Media Type | Pool | ------+------------------+-----------+----------------+--------------------------| 1 | 000001L6 | Used | LTO6 | LTO6 | 2 | 000010L6 | Full | LTO6 | LTO6 | 3 | 000012L6 | Append | LTO6 | LTO6 | 4 | 000006L6 | Append | LTO6 | LTO6 | 5 | 000005L6 | Append | LTO6 | LTO6 | 6 | 000009L6 | Full | LTO6 | LTO6 | 7@| ? | ? | ? | ? | *mount storage=TL1000 drive=0 slot=3 Connecting to Storage daemon TL1000 at backup1.ad.avant.ca:9103 ... 3304 Issuing autochanger "load slot 3, drive 0" command. 3992 Bad autochanger "load slot 3, drive 0": ERR=Child died from signal 15: Termination. Results=Program killed by BAREOS (timeout) 3901 Unable to open device ""tapedrive-tl1000" (/dev/tape/by-id/scsi-350016977299e1010-nst)": ERR=generic_tape_device.c:140 Unable to open device "tapedrive-tl1000" (/dev/tape/by-id/scsi-350016977299e1010-nst): ERR=Input/output error |
|
That was with bareos-15.4.0.git.1447429466.dcea847-1173.1.el6.x86_64 (et al.) | |
Never mind... that was the result of a mis-labeled cleaning cartridge. :-( Current nightly build does *not* crash while doing two "update storage slots scan" commands simultaneously. It hasn't finished, but it's already proceeded much further than ever before (i.e. it got past the first tape on each library). |
|
fixed by: changeset bareos: bareos-15.2 0b6435d7 | |
Fix committed to bareos bareos-15.2 branch with changesetid 5923. | |
bareos: bareos-15.2 0b6435d7 2015-11-12 18:39 Committer: mvwieringen Ported: N/A Details Diff |
Fix random crashes on sd The block variable was set to the dcr->block, but that can be altered in the call to dev->set_label_blocksize(dcr). When that happens, the code goes on with the wrong block. We removed the whole local variable as it makes no sense and is only referenced 3 times when calling empty_block() Fixes 0000414: Bareos storage daemon crashes during backups Fixse 0000483: bareos-sd crash during backup Fixes 0000522: storage daemon crashes ocassionally when starting a new job Fixes 0000552: SD crashes in -current Signed-off-by: Marco van Wieringen <marco.van.wieringen@bareos.com> |
Affected Issues 0000414, 0000483, 0000522, 0000552, 0000564 |
|
mod - src/stored/label.c | Diff File | ||
bareos: bareos-15.2 e652bb0a 2015-11-23 16:52 Ported: N/A Details Diff |
Fix bsock::fsend problem when msglen is zero Fixes 0000564 :director crashes when inventorying library |
Affected Issues 0000564 |
|
mod - src/lib/bsock.c | Diff File |
Date Modified | Username | Field | Change |
---|---|---|---|
2015-11-13 17:00 | avantsysadm@avant.ca | New Issue | |
2015-11-16 11:52 | pstorz | Note Added: 0001965 | |
2015-11-16 11:52 | pstorz | Status | new => feedback |
2015-11-16 18:11 | avantsysadm@avant.ca | Note Added: 0001966 | |
2015-11-16 18:11 | avantsysadm@avant.ca | Status | feedback => new |
2015-11-16 18:13 | avantsysadm@avant.ca | Note Added: 0001967 | |
2015-11-16 19:23 | avantsysadm@avant.ca | Note Added: 0001969 | |
2015-11-17 08:19 | pstorz | Changeset attached | => bareos bareos-15.2 0b6435d7 |
2015-11-17 08:20 | pstorz | Note Added: 0001970 | |
2015-11-17 08:20 | pstorz | Status | new => closed |
2015-11-17 08:20 | pstorz | Resolution | open => fixed |
2015-11-23 17:00 | pstorz | Changeset attached | => bareos bareos-15.2 e652bb0a |
2015-11-23 17:00 | pstorz | Note Added: 0002012 | |
2015-11-23 17:00 | pstorz | Status | closed => resolved |
2017-06-08 15:27 | joergs | Status | resolved => closed |