View Issue Details
| ID | Project | Category | View Status | Date Submitted | Last Update |
|---|---|---|---|---|---|
| 0000899 | bareos-core | director | public | 2018-01-27 09:10 | 2018-06-22 17:16 |
| Reporter | zendx | Assigned To | joergs | ||
| Priority | high | Severity | major | Reproducibility | sometimes |
| Status | closed | Resolution | won't fix | ||
| Platform | Linux | OS | Debian | OS Version | 8 |
| Product Version | 17.2.4 | ||||
| Summary | 0000899: Bareos crashes with some clients | ||||
| Description | When I try to start a job for some clients, the director crashes. There are no errors in the logs, but I have a backtrace. There are two problem clients: one running CentOS 7 x64 (FD version 17.2.4) and the other running WS 2008 (FD version 17.2.4). | ||||
| Additional Information | Backtrace attached. | ||||
| Tags | Collect Statistics | ||||
| child of | 0000903 | closed | director crashes some time after a reload if Collect Statistic is enabled |
|
9gn7SD8M.txt (9,121 bytes)
Created /var/lib/bareos/bareos-dir.core.25321 for doing postmortem debugging
[New LWP 25322]
[New LWP 25325]
[New LWP 25326]
[New LWP 25327]
[New LWP 25361]
[New LWP 25363]
[New LWP 25321]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/sbin/bareos-dir'.
#0 0x00007f6fde63414d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
81 ../sysdeps/unix/syscall-template.S: No such file or directory.
$1 = 0x6b6b80 <my_name> "bareos-dir"
$2 = 0x17ab758 "bareos-dir"
$3 = 0x17ab798 "/usr/sbin/bareos-dir"
$4 = 0x7f6fc4012498 "PostgreSQL"
$5 = 0x7f6fdfd710a6 "17.2.4 (21 Sep 2017)"
$6 = 0x7f6fdfd71092 "x86_64-pc-linux-gnu"
$7 = 0x7f6fdfd7108b "debian"
$8 = 0x7f6fdfd7106d "Debian GNU/Linux 8.0 (jessie)"
$9 = "bareos", '\000' <repeats 249 times>
$10 = 0x7f6fdfd71588 "debian Debian GNU/Linux 8.0 (jessie)"
Environment variable "TestName" not defined.
#0 0x00007f6fde63414d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007f6fdfd40854 in bmicrosleep (sec=sec@entry=30, usec=usec@entry=0) at bsys.c:171
#2 0x00007f6fdfd5165c in check_deadlock () at lockmgr.c:568
#3 0x00007f6fde62d064 in start_thread (arg=0x7f6fdd436700) at pthread_create.c:309
#4 0x00007f6fdd93c62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Thread 7 (Thread 0x7f6fe0e78740 (LWP 25321)):
#0 0x00007f6fde63414d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007f6fdfd40854 in bmicrosleep (sec=60, usec=0) at bsys.c:171
#2 0x0000000000447959 in wait_for_next_job (one_shot_job_to_run=0x0) at scheduler.c:126
#3 0x000000000040f9d6 in main (argc=<optimized out>, argv=<optimized out>) at dird.c:434
Thread 6 (Thread 0x7f6fceffd700 (LWP 25363)):
#0 0x00007f6fde633a9d in read () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007f6fdfd3ea60 in BSOCK_TCP::read_nbytes (this=0x7f6fb8001c78, ptr=<optimized out>, nbytes=4) at bsock_tcp.c:978
#2 0x00007f6fdfd3e2df in BSOCK_TCP::recv (this=0x7f6fb8001c78) at bsock_tcp.c:550
#3 0x0000000000426bcf in bget_dirmsg (bs=0x7f6fb8001c78, allow_any_message=false) at getmsg.c:154
#4 0x0000000000433fcc in msg_thread (arg=0xb, arg@entry=0x18101c8) at msgchan.c:434
#5 0x00007f6fdfd516ef in lmgr_thread_launcher (x=0x7f6fb8002f48) at lockmgr.c:928
#6 0x00007f6fde62d064 in start_thread (arg=0x7f6fceffd700) at pthread_create.c:309
#7 0x00007f6fdd93c62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Thread 5 (Thread 0x7f6fcf7fe700 (LWP 25361)):
#0 0x00007f6fde633a9d in read () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007f6fdfd3ea60 in BSOCK_TCP::read_nbytes (this=0x7f6fb8004008, ptr=<optimized out>, nbytes=4) at bsock_tcp.c:978
#2 0x00007f6fdfd3e2df in BSOCK_TCP::recv (this=0x7f6fb8004008) at bsock_tcp.c:550
#3 0x0000000000426bcf in bget_dirmsg (bs=0x7f6fb8004008, allow_any_message=false) at getmsg.c:154
#4 0x0000000000412692 in wait_for_job_termination (jcr=0x18101c8, timeout=0) at backup.c:715
#5 0x00000000004147f9 in do_native_backup (jcr=jcr@entry=0x18101c8) at backup.c:650
#6 0x0000000000429684 in job_thread (arg=0x18101c8) at job.c:514
#7 0x000000000042eba1 in jobq_server (arg=0x6b7240 <job_queue>) at jobq.c:485
#8 0x00007f6fdfd516ef in lmgr_thread_launcher (x=0x18192c8) at lockmgr.c:928
#9 0x00007f6fde62d064 in start_thread (arg=0x7f6fcf7fe700) at pthread_create.c:309
#10 0x00007f6fdd93c62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Thread 4 (Thread 0x7f6fd501e700 (LWP 25327)):
#0 0x00007f6fde634489 in __libc_waitpid (pid=25862, stat_loc=0x7f6fd501c6cc, options=0) at ../sysdeps/unix/sysv/linux/waitpid.c:40
#1 0x00007f6fdfd61154 in signal_handler (sig=7) at signal.c:240
#2 <signal handler called>
#3 0x00007f6fdfd36563 in bnet_host2ipaddrs (host=0xaaaaaaaaaaaaaaaa <error: Cannot access memory at address 0xaaaaaaaaaaaaaaaa>, family=0, errstr=0x7f6fd501d028) at bnet.c:418
#4 0x00007f6fdfd3fc06 in BSOCK_TCP::open (this=0x7f6fc000bfb8, jcr=0x38, name=0x18 <error: Cannot access memory at address 0x18>, host=0x7f6fc0000020 "", service=0x0, port=-1431655766, heart_beat=-6148914691236517206, fatal=0x7f6fd501dbdc) at bsock_tcp.c:182
#5 0x00007f6fdfd3ebc5 in BSOCK_TCP::connect (this=0x7f6fc000bfb8, jcr=0x7f6fc0001078, retry_interval=2, max_retry_time=140117939322912, max_retry_time@entry=1, heart_beat=0, heart_beat@entry=-6148914691236517206, name=0x47c661 "Storage daemon", host=0xaaaaaaaaaaaaaaaa <error: Cannot access memory at address 0xaaaaaaaaaaaaaaaa>, service=0x0, port=-1431655766, verbose=false) at bsock_tcp.c:115
#6 0x00000000004452de in connect_to_storage_daemon (jcr=0x7f6fc0001078, retry_interval=<optimized out>, max_retry_time=<optimized out>, verbose=<optimized out>) at sd_cmds.c:118
#7 0x000000000044802b in statistics_thread_runner (arg=0x7f6fc000c4b8, arg@entry=0x0) at stats.c:233
#8 0x00007f6fdfd516ef in lmgr_thread_launcher (x=0x180d558) at lockmgr.c:928
#9 0x00007f6fde62d064 in start_thread (arg=0x7f6fd501e700) at pthread_create.c:309
#10 0x00007f6fdd93c62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Thread 3 (Thread 0x7f6fcd01e700 (LWP 25326)):
#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1 0x00007f6fdfd51b8c in bthread_cond_timedwait_p (cond=cond@entry=0x7f6fdff87dc0 <_ZL5timer>, m=m@entry=0x7f6fdff87e00 <_ZL11timer_mutex>, abstime=abstime@entry=0x7f6fcd01de20, file=file@entry=0x7f6fdfd75952 "watchdog.c", line=line@entry=313) at lockmgr.c:813
#2 0x00007f6fdfd6a3fd in watchdog_thread (arg=arg@entry=0x0) at watchdog.c:313
#3 0x00007f6fdfd516ef in lmgr_thread_launcher (x=0x180d558) at lockmgr.c:928
#4 0x00007f6fde62d064 in start_thread (arg=0x7f6fcd01e700) at pthread_create.c:309
#5 0x00007f6fdd93c62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Thread 2 (Thread 0x7f6fd581f700 (LWP 25325)):
#0 0x00007f6fdd933aed in poll () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007f6fdfd35723 in bnet_thread_server_tcp (addr_list=addr_list@entry=0x17c74d8, max_clients=<optimized out>, sockfds=<optimized out>, client_wq=client_wq@entry=0x6b7540 <socket_workq>, nokeepalive=<optimized out>, handle_client_request=handle_client_request@entry=0x442160 <handle_connection_request(void*)>) at bnet_server_tcp.c:306
#2 0x00000000004423ef in connect_thread (arg=arg@entry=0x17c74d8) at socket_server.c:115
#3 0x00007f6fdfd516ef in lmgr_thread_launcher (x=0x180e848) at lockmgr.c:928
#4 0x00007f6fde62d064 in start_thread (arg=0x7f6fd581f700) at pthread_create.c:309
#5 0x00007f6fdd93c62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Thread 1 (Thread 0x7f6fdd436700 (LWP 25322)):
#0 0x00007f6fde63414d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007f6fdfd40854 in bmicrosleep (sec=sec@entry=30, usec=usec@entry=0) at bsys.c:171
#2 0x00007f6fdfd5165c in check_deadlock () at lockmgr.c:568
#3 0x00007f6fde62d064 in start_thread (arg=0x7f6fdd436700) at pthread_create.c:309
#4 0x00007f6fdd93c62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
#0 0x00007f6fde63414d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
81 in ../sysdeps/unix/syscall-template.S
No locals.
#1 0x00007f6fdfd40854 in bmicrosleep (sec=sec@entry=30, usec=usec@entry=0) at bsys.c:171
171 bsys.c: No such file or directory.
timeout = {tv_sec = 30, tv_nsec = 0}
tv = {tv_sec = 0, tv_usec = 0}
tz = {tz_minuteswest = -582785280, tz_dsttime = 32623}
status = <optimized out>
#2 0x00007f6fdfd5165c in check_deadlock () at lockmgr.c:568
568 lockmgr.c: No such file or directory.
__cancel_buf = {__cancel_jmp_buf = {{__cancel_jmp_buf = {0, 3039939368786552139, 0, 140118491484256, 0, 140118430279424, -3103065658840095413, -3103060313982671541}, __mask_was_saved = 0}}, __pad = {0x7f6fdd435f30, 0x0, 0x7f6fdd436700, 0x7f6fdd436700}}
__cancel_routine = 0x7f6fdfd51770 <cln_hdl(void*)>
__not_first_call = <optimized out>
old = 0
#3 0x00007f6fde62d064 in start_thread (arg=0x7f6fdd436700) at pthread_create.c:309
309 pthread_create.c: No such file or directory.
__res = <optimized out>
pd = 0x7f6fdd436700
now = <optimized out>
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140118430279424, 3039939368786552139, 0, 140118491484256, 0, 140118430279424, -3103065658850581173, -3103063180958145205}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
not_first_call = <optimized out>
pagesize_m1 = <optimized out>
sp = <optimized out>
freesize = <optimized out>
__PRETTY_FUNCTION__ = "start_thread"
#4 0x00007f6fdd93c62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
111 ../sysdeps/unix/sysv/linux/x86_64/clone.S: No such file or directory.
No locals.
#0 0x0000000000000000 in ?? ()
No symbol table info available.
#0 0x0000000000000000 in ?? ()
No symbol table info available.
#0 0x0000000000000000 in ?? ()
No symbol table info available. |
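A note on reading the dump above: in a multi-thread gdb backtrace, the thread that took the fatal signal is typically the one whose stack contains `<signal handler called>`. Here that is Thread 4 (`sig=7`, which is SIGBUS on Linux x86_64), and the frames beneath the handler lead down to `statistics_thread_runner`. The following is a minimal Python sketch for picking such threads out of a saved backtrace. The function name `find_faulting_threads` and the regular expressions are illustrative assumptions fitted to the format of this attachment, not part of Bareos or gdb.

```python
import re

# Thread headers, e.g. "Thread 4 (Thread 0x7f6fd501e700 (LWP 25327)):"
THREAD_RE = re.compile(r"^Thread (\d+) \(Thread 0x[0-9a-f]+ \(LWP (\d+)\)\):")
# Frame lines, e.g. "#3 0x... in bnet_host2ipaddrs (host=...) at bnet.c:418"
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-f]+ in )?(\S+) \(")
SIGNAL_MARK = "<signal handler called>"


def find_faulting_threads(backtrace):
    """Map thread number -> function names found below the signal-handler frame."""
    result = {}
    current = None     # thread number being parsed; None before the first header
    in_fault = False   # True once the signal marker was seen in the current thread
    for raw in backtrace.splitlines():
        line = raw.strip()
        header = THREAD_RE.match(line)
        if header:
            current = int(header.group(1))
            in_fault = False
            continue
        if current is None:
            continue
        if SIGNAL_MARK in line:
            in_fault = True
            result[current] = []
            continue
        if in_fault:
            frame = FRAME_RE.match(line)
            if frame is None:
                continue
            if int(frame.group(1)) == 0:
                # Frame numbering restarted: a different stack begins here.
                in_fault = False
                continue
            result[current].append(frame.group(2))
    return result
```

Run against the attachment, this should report only Thread 4, with `statistics_thread_runner` as the outermost Bareos function in the faulting stack.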
|
|
New backtrace (the director crashed in the middle of a running job, with about 20 GB of 40 GB backed up): https://pastebin.com/sMigZd3T |
|
|
Thank you for this report. I see that the problem is related to the statistics thread. You may try to disable it via the Collect Statistics directive: http://doc.bareos.org/master/html/bareos-manual-main-reference.html#directiveDirStorageCollect%20Statistics A few additional questions: Have you used the packages from http://download.bareos.org/? Does the backup work okay with your other clients? Does the backup always fail on these two clients? Have you configured storages that are no longer available? |
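For reference, the directive linked above belongs to the Storage resource in the Director configuration. A minimal sketch of what disabling it might look like follows. The file path, resource name, address, password, device, and media type are placeholders and not taken from the reporter's setup; only the `Collect Statistics` line is the point here.

```
# /etc/bareos/bareos-dir.d/storage/File.conf  (hypothetical path and resource name)
Storage {
  Name = File
  Address = sd.example.com      # placeholder
  Password = "secret"           # placeholder
  Device = FileStorage          # placeholder
  Media Type = File             # placeholder
  Collect Statistics = no       # stop the statistics thread from polling this storage
}
```

Since the related issue 0000903 mentions crashes after a reload, a full restart of the Director may be the safer way to apply the change.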
|
|
Yes, I have enabled statistics collection. I'll try to disable it... 1. I'm using the Debian repo from download.bareos.com. 2. Randomly. For example, by restarting a failed job for the same client I got a successful backup (1 successful backup per 4 crashes). 3. Other clients have fewer files, so maybe that is the answer. 4. No unavailable or failed storages. Tonight I will run the backup again (with statistics collection disabled). |
|
| Any news on this? Was the problem related to the statistics thread? If so, there is now a fix for it, already released in the source and soon to be released as a package. | |
| Received no feedback, and I think this has been fixed in 17.2.6. Closing ticket. | |
| Date Modified | Username | Field | Change |
|---|---|---|---|
| 2018-01-27 09:10 | zendx | New Issue | |
| 2018-01-27 09:10 | zendx | File Added: 9gn7SD8M.txt | |
| 2018-01-27 10:00 | zendx | Note Added: 0002885 | |
| 2018-01-27 19:34 | joergs | Priority | normal => high |
| 2018-01-27 19:34 | joergs | Severity | minor => major |
| 2018-01-27 19:34 | joergs | Reproducibility | always => sometimes |
| 2018-01-27 19:34 | joergs | Status | new => acknowledged |
| 2018-01-27 19:45 | joergs | Note Added: 0002886 | |
| 2018-01-27 19:45 | joergs | Status | acknowledged => feedback |
| 2018-01-27 19:46 | joergs | Tag Attached: Collect Statistics | |
| 2018-01-27 20:17 | zendx | Note Added: 0002888 | |
| 2018-01-27 20:17 | zendx | Status | feedback => new |
| 2018-02-02 14:40 | joergs | Relationship added | child of 0000903 |
| 2018-06-08 13:52 | joergs | Note Added: 0003035 | |
| 2018-06-08 13:52 | joergs | Assigned To | => joergs |
| 2018-06-08 13:52 | joergs | Status | new => feedback |
| 2018-06-22 17:16 | joergs | Note Added: 0003050 | |
| 2018-06-22 17:16 | joergs | Status | feedback => closed |
| 2018-06-22 17:16 | joergs | Resolution | open => won't fix |