View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0000899 | bareos-core | director | public | 2018-01-27 09:10 | 2018-06-22 17:16 |
Reporter | zendx | Assigned To | joergs | ||
Priority | high | Severity | major | Reproducibility | sometimes |
Status | closed | Resolution | won't fix | ||
Platform | Linux | OS | Debian | OS Version | 8 |
Product Version | 17.2.4 | ||||
Summary | 0000899: Bareos crashes with some clients | ||||
Description | When I try to start a job for some clients, the director crashes. There are no errors in the logs, but I have a backtrace. There are two problem clients: one running CentOS 7 x64 (FD version 17.2.4) and the other running Windows Server 2008 (FD version 17.2.4). | ||||
Additional Information | Backtrace attached. | ||||
Tags | Collect Statistics | ||||
child of | 0000903 | closed | director crashes some time after a reload if Collect Statistic is enabled |
9gn7SD8M.txt (9,121 bytes)
Created /var/lib/bareos/bareos-dir.core.25321 for doing postmortem debugging
[New LWP 25322]
[New LWP 25325]
[New LWP 25326]
[New LWP 25327]
[New LWP 25361]
[New LWP 25363]
[New LWP 25321]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/sbin/bareos-dir'.
#0 0x00007f6fde63414d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
81 ../sysdeps/unix/syscall-template.S: No such file or directory.
$1 = 0x6b6b80 <my_name> "bareos-dir"
$2 = 0x17ab758 "bareos-dir"
$3 = 0x17ab798 "/usr/sbin/bareos-dir"
$4 = 0x7f6fc4012498 "PostgreSQL"
$5 = 0x7f6fdfd710a6 "17.2.4 (21 Sep 2017)"
$6 = 0x7f6fdfd71092 "x86_64-pc-linux-gnu"
$7 = 0x7f6fdfd7108b "debian"
$8 = 0x7f6fdfd7106d "Debian GNU/Linux 8.0 (jessie)"
$9 = "bareos", '\000' <repeats 249 times>
$10 = 0x7f6fdfd71588 "debian Debian GNU/Linux 8.0 (jessie)"
Environment variable "TestName" not defined.
#0 0x00007f6fde63414d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007f6fdfd40854 in bmicrosleep (sec=sec@entry=30, usec=usec@entry=0) at bsys.c:171
#2 0x00007f6fdfd5165c in check_deadlock () at lockmgr.c:568
#3 0x00007f6fde62d064 in start_thread (arg=0x7f6fdd436700) at pthread_create.c:309
#4 0x00007f6fdd93c62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 7 (Thread 0x7f6fe0e78740 (LWP 25321)):
#0 0x00007f6fde63414d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007f6fdfd40854 in bmicrosleep (sec=60, usec=0) at bsys.c:171
#2 0x0000000000447959 in wait_for_next_job (one_shot_job_to_run=0x0) at scheduler.c:126
#3 0x000000000040f9d6 in main (argc=<optimized out>, argv=<optimized out>) at dird.c:434

Thread 6 (Thread 0x7f6fceffd700 (LWP 25363)):
#0 0x00007f6fde633a9d in read () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007f6fdfd3ea60 in BSOCK_TCP::read_nbytes (this=0x7f6fb8001c78, ptr=<optimized out>, nbytes=4) at bsock_tcp.c:978
#2 0x00007f6fdfd3e2df in BSOCK_TCP::recv (this=0x7f6fb8001c78) at bsock_tcp.c:550
#3 0x0000000000426bcf in bget_dirmsg (bs=0x7f6fb8001c78, allow_any_message=false) at getmsg.c:154
#4 0x0000000000433fcc in msg_thread (arg=0xb, arg@entry=0x18101c8) at msgchan.c:434
#5 0x00007f6fdfd516ef in lmgr_thread_launcher (x=0x7f6fb8002f48) at lockmgr.c:928
#6 0x00007f6fde62d064 in start_thread (arg=0x7f6fceffd700) at pthread_create.c:309
#7 0x00007f6fdd93c62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 5 (Thread 0x7f6fcf7fe700 (LWP 25361)):
#0 0x00007f6fde633a9d in read () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007f6fdfd3ea60 in BSOCK_TCP::read_nbytes (this=0x7f6fb8004008, ptr=<optimized out>, nbytes=4) at bsock_tcp.c:978
#2 0x00007f6fdfd3e2df in BSOCK_TCP::recv (this=0x7f6fb8004008) at bsock_tcp.c:550
#3 0x0000000000426bcf in bget_dirmsg (bs=0x7f6fb8004008, allow_any_message=false) at getmsg.c:154
#4 0x0000000000412692 in wait_for_job_termination (jcr=0x18101c8, timeout=0) at backup.c:715
#5 0x00000000004147f9 in do_native_backup (jcr=jcr@entry=0x18101c8) at backup.c:650
#6 0x0000000000429684 in job_thread (arg=0x18101c8) at job.c:514
#7 0x000000000042eba1 in jobq_server (arg=0x6b7240 <job_queue>) at jobq.c:485
#8 0x00007f6fdfd516ef in lmgr_thread_launcher (x=0x18192c8) at lockmgr.c:928
#9 0x00007f6fde62d064 in start_thread (arg=0x7f6fcf7fe700) at pthread_create.c:309
#10 0x00007f6fdd93c62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 4 (Thread 0x7f6fd501e700 (LWP 25327)):
#0 0x00007f6fde634489 in __libc_waitpid (pid=25862, stat_loc=0x7f6fd501c6cc, options=0) at ../sysdeps/unix/sysv/linux/waitpid.c:40
#1 0x00007f6fdfd61154 in signal_handler (sig=7) at signal.c:240
#2 <signal handler called>
#3 0x00007f6fdfd36563 in bnet_host2ipaddrs (host=0xaaaaaaaaaaaaaaaa <error: Cannot access memory at address 0xaaaaaaaaaaaaaaaa>, family=0, errstr=0x7f6fd501d028) at bnet.c:418
#4 0x00007f6fdfd3fc06 in BSOCK_TCP::open (this=0x7f6fc000bfb8, jcr=0x38, name=0x18 <error: Cannot access memory at address 0x18>, host=0x7f6fc0000020 "", service=0x0, port=-1431655766, heart_beat=-6148914691236517206, fatal=0x7f6fd501dbdc) at bsock_tcp.c:182
#5 0x00007f6fdfd3ebc5 in BSOCK_TCP::connect (this=0x7f6fc000bfb8, jcr=0x7f6fc0001078, retry_interval=2, max_retry_time=140117939322912, max_retry_time@entry=1, heart_beat=0, heart_beat@entry=-6148914691236517206, name=0x47c661 "Storage daemon", host=0xaaaaaaaaaaaaaaaa <error: Cannot access memory at address 0xaaaaaaaaaaaaaaaa>, service=0x0, port=-1431655766, verbose=false) at bsock_tcp.c:115
#6 0x00000000004452de in connect_to_storage_daemon (jcr=0x7f6fc0001078, retry_interval=<optimized out>, max_retry_time=<optimized out>, verbose=<optimized out>) at sd_cmds.c:118
#7 0x000000000044802b in statistics_thread_runner (arg=0x7f6fc000c4b8, arg@entry=0x0) at stats.c:233
#8 0x00007f6fdfd516ef in lmgr_thread_launcher (x=0x180d558) at lockmgr.c:928
#9 0x00007f6fde62d064 in start_thread (arg=0x7f6fd501e700) at pthread_create.c:309
#10 0x00007f6fdd93c62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 3 (Thread 0x7f6fcd01e700 (LWP 25326)):
#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1 0x00007f6fdfd51b8c in bthread_cond_timedwait_p (cond=cond@entry=0x7f6fdff87dc0 <_ZL5timer>, m=m@entry=0x7f6fdff87e00 <_ZL11timer_mutex>, abstime=abstime@entry=0x7f6fcd01de20, file=file@entry=0x7f6fdfd75952 "watchdog.c", line=line@entry=313) at lockmgr.c:813
#2 0x00007f6fdfd6a3fd in watchdog_thread (arg=arg@entry=0x0) at watchdog.c:313
#3 0x00007f6fdfd516ef in lmgr_thread_launcher (x=0x180d558) at lockmgr.c:928
#4 0x00007f6fde62d064 in start_thread (arg=0x7f6fcd01e700) at pthread_create.c:309
#5 0x00007f6fdd93c62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 2 (Thread 0x7f6fd581f700 (LWP 25325)):
#0 0x00007f6fdd933aed in poll () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007f6fdfd35723 in bnet_thread_server_tcp (addr_list=addr_list@entry=0x17c74d8, max_clients=<optimized out>, sockfds=<optimized out>, client_wq=client_wq@entry=0x6b7540 <socket_workq>, nokeepalive=<optimized out>, handle_client_request=handle_client_request@entry=0x442160 <handle_connection_request(void*)>) at bnet_server_tcp.c:306
#2 0x00000000004423ef in connect_thread (arg=arg@entry=0x17c74d8) at socket_server.c:115
#3 0x00007f6fdfd516ef in lmgr_thread_launcher (x=0x180e848) at lockmgr.c:928
#4 0x00007f6fde62d064 in start_thread (arg=0x7f6fd581f700) at pthread_create.c:309
#5 0x00007f6fdd93c62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 1 (Thread 0x7f6fdd436700 (LWP 25322)):
#0 0x00007f6fde63414d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007f6fdfd40854 in bmicrosleep (sec=sec@entry=30, usec=usec@entry=0) at bsys.c:171
#2 0x00007f6fdfd5165c in check_deadlock () at lockmgr.c:568
#3 0x00007f6fde62d064 in start_thread (arg=0x7f6fdd436700) at pthread_create.c:309
#4 0x00007f6fdd93c62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

#0 0x00007f6fde63414d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
81 in ../sysdeps/unix/syscall-template.S
No locals.
#1 0x00007f6fdfd40854 in bmicrosleep (sec=sec@entry=30, usec=usec@entry=0) at bsys.c:171
171 bsys.c: No such file or directory.
timeout = {tv_sec = 30, tv_nsec = 0}
tv = {tv_sec = 0, tv_usec = 0}
tz = {tz_minuteswest = -582785280, tz_dsttime = 32623}
status = <optimized out>
#2 0x00007f6fdfd5165c in check_deadlock () at lockmgr.c:568
568 lockmgr.c: No such file or directory.
__cancel_buf = {__cancel_jmp_buf = {{__cancel_jmp_buf = {0, 3039939368786552139, 0, 140118491484256, 0, 140118430279424, -3103065658840095413, -3103060313982671541}, __mask_was_saved = 0}}, __pad = {0x7f6fdd435f30, 0x0, 0x7f6fdd436700, 0x7f6fdd436700}}
__cancel_routine = 0x7f6fdfd51770 <cln_hdl(void*)>
__not_first_call = <optimized out>
old = 0
#3 0x00007f6fde62d064 in start_thread (arg=0x7f6fdd436700) at pthread_create.c:309
309 pthread_create.c: No such file or directory.
__res = <optimized out>
pd = 0x7f6fdd436700
now = <optimized out>
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140118430279424, 3039939368786552139, 0, 140118491484256, 0, 140118430279424, -3103065658850581173, -3103063180958145205}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
not_first_call = <optimized out>
pagesize_m1 = <optimized out>
sp = <optimized out>
freesize = <optimized out>
__PRETTY_FUNCTION__ = "start_thread"
#4 0x00007f6fdd93c62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
111 ../sysdeps/unix/sysv/linux/x86_64/clone.S: No such file or directory.
No locals.
#0 0x0000000000000000 in ?? ()
No symbol table info available.
#0 0x0000000000000000 in ?? ()
No symbol table info available.
#0 0x0000000000000000 in ?? ()
No symbol table info available. |
|
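In the attached trace, the only thread whose stack contains a `<signal handler called>` frame is Thread 4, and the frames beneath that marker lead from `statistics_thread_runner` through `connect_to_storage_daemon` to `bnet_host2ipaddrs`. The following is a minimal sketch of how such a thread could be picked out of a gdb dump automatically. It assumes thread headers and frames are formatted as in this attachment; the function name `crashing_threads` and the regular expressions are illustrative and not part of Bareos or gdb.

```python
import re

# Header format as it appears in the attached trace, e.g.
# "Thread 4 (Thread 0x7f6fd501e700 (LWP 25327)):"
THREAD_RE = re.compile(r"Thread (\d+) \(Thread 0x[0-9a-f]+ \(LWP (\d+)\)\):")
# A frame such as "#3 0x00007f... in bnet_host2ipaddrs (host=...";
# the address part is optional because gdb omits it for some frames.
FRAME_RE = re.compile(r"#(\d+)\s+(?:0x[0-9a-f]+ in )?(\S+) \(")
SIGNAL_MARK = "<signal handler called>"


def crashing_threads(backtrace):
    """Return (thread_no, lwp, frames) for every thread whose stack
    contains a signal handler frame. `frames` lists the function names
    found after the marker, innermost first."""
    results = []
    headers = list(THREAD_RE.finditer(backtrace))
    for i, header in enumerate(headers):
        # A thread's text runs until the next thread header (or end of input).
        end = headers[i + 1].start() if i + 1 < len(headers) else len(backtrace)
        body = backtrace[header.end():end]
        mark = body.find(SIGNAL_MARK)
        if mark == -1:
            continue  # this thread was not interrupted by a signal
        interrupted = body[mark + len(SIGNAL_MARK):]
        frames = [m.group(2) for m in FRAME_RE.finditer(interrupted)]
        results.append((int(header.group(1)), int(header.group(2)), frames))
    return results
```

Because the function searches by pattern rather than line by line, it works whether or not the dump's line breaks were preserved.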
New backtrace (crashed in the middle of a running job, with about 20 GB of 40 GB backed up): https://pastebin.com/sMigZd3T |
|
Thank you for this report. I see that the problem is related to the statistics thread. You may try to disable it via the Collect Statistics directive: http://doc.bareos.org/master/html/bareos-manual-main-reference.html#directiveDirStorageCollect%20Statistics A few additional questions: Have you used the packages from http://download.bareos.org/? Does the backup work okay with your other clients? Does the backup always fail on these two clients? Have you configured storages that are no longer available? |
|
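For reference, the suggested workaround could look roughly like the fragment below. This is a sketch only: it assumes a Storage resource in the Director configuration, and the resource name, address, password, device and media type are placeholders, not values from this report. The only line relevant to the workaround is `Collect Statistics`.

```
# Director configuration, Storage resource (all values except the
# last directive are hypothetical placeholders)
Storage {
  Name = File
  Address = sd.example.com
  Password = "secret"
  Device = FileStorage
  Media Type = File
  Collect Statistics = no   # stop the Director's statistics thread from polling this storage
}
```

The Director has to re-read its configuration (reload or restart) before the change takes effect.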
Yes, I have enabled statistics collection. I'll try to disable it... 1. I'm using the Debian repo from download.bareos.com. 2. Randomly. For example, when restarting a failed job for the same client, I got a successful backup (1 successful backup per 4 crashes). 3. Other clients have fewer files, so maybe that is the answer. 4. No unavailable or failed storages. Tonight I will run the backup again (with statistics collection disabled). |
|
Any news on this? Was the problem related to the statistics thread? If yes, there is now a fix for it, already released in the source and soon to be released as a package. | |
Received no feedback, and I think this has been fixed in 17.2.6. Closing ticket. | |
Date Modified | Username | Field | Change |
---|---|---|---|
2018-01-27 09:10 | zendx | New Issue | |
2018-01-27 09:10 | zendx | File Added: 9gn7SD8M.txt | |
2018-01-27 10:00 | zendx | Note Added: 0002885 | |
2018-01-27 19:34 | joergs | Priority | normal => high |
2018-01-27 19:34 | joergs | Severity | minor => major |
2018-01-27 19:34 | joergs | Reproducibility | always => sometimes |
2018-01-27 19:34 | joergs | Status | new => acknowledged |
2018-01-27 19:45 | joergs | Note Added: 0002886 | |
2018-01-27 19:45 | joergs | Status | acknowledged => feedback |
2018-01-27 19:46 | joergs | Tag Attached: Collect Statistics | |
2018-01-27 20:17 | zendx | Note Added: 0002888 | |
2018-01-27 20:17 | zendx | Status | feedback => new |
2018-02-02 14:40 | joergs | Relationship added | child of 0000903 |
2018-06-08 13:52 | joergs | Note Added: 0003035 | |
2018-06-08 13:52 | joergs | Assigned To | => joergs |
2018-06-08 13:52 | joergs | Status | new => feedback |
2018-06-22 17:16 | joergs | Note Added: 0003050 | |
2018-06-22 17:16 | joergs | Status | feedback => closed |
2018-06-22 17:16 | joergs | Resolution | open => won't fix |