View Issue Details
| Field | Value |
|---|---|
| ID | 0001405 |
| Project | bareos-core |
| Category | storage daemon |
| View Status | public |
| Date Submitted | 2021-12-02 12:01 |
| Last Update | 2023-09-05 09:42 |
| Reporter | gusevmk |
| Assigned To | bruno-at-bareos |
| Priority | urgent |
| Severity | major |
| Reproducibility | sometimes |
| Status | closed |
| Resolution | unable to reproduce |
| Platform | Linux |
| OS | RHEL |
| OS Version | 8.2 |
| Product Version | 19.2.11 |
| Summary | 0001405: Storage daemon does not answer when there are no available devices |
Description

UPD: the version is 19.2.7.

I run into trouble when there are no more available devices on the storage daemon. The Storage resource (storage = pgbkps01.mgmt):

```
Storage {
  Name = "pgbkps01_loop"
  Address = "10.*.*.192"
  Password = "***"
  Device = "local-storage-0","local-storage-1","local-storage-2","local-storage-3"
  MediaType = "File"
  MaximumConcurrentJobs = 4
  SddPort = 9103
}
```

There are 4 jobs running at the moment. When a new job is started, it does not queue, but ends with this error:

```
02-Dec 13:13 bareos-dir JobId 1903: Fatal error: Authorization key rejected bareos-dir.
02-Dec 13:13 bareos-dir JobId 1903: Fatal error: Director unable to authenticate with Storage daemon at "10.*.*.192:9103".
Possible causes: Passwords or names not the same or
                 TLS negotiation problem or
                 Maximum Concurrent Jobs exceeded on the SD or
                 SD networking messed up (restart daemon).
```

`status` on the storage fails as well:

```
*status storage=pgbkps01_loop
Connecting to Storage daemon pgbkps01_loop at 10.*.*.192:9103
Failed to connect to Storage daemon pgbkps01_loop
```

When one of the 4 jobs ends, there is no problem.

I checked the bareos-sd threads with strace:

```
ps -eLo pid,tid,ppid,user:11,comm,state,wchan | grep bareos-sd
3256361 3256361       1 bareos      bareos-sd  S  x64_sys_poll
3256361 3256363       1 bareos      bareos-sd  S  -
3256361 1332308       1 bareos      bareos-sd  S  x64_sys_poll
3256361 2252738       1 bareos      bareos-sd  S  x64_sys_poll
3256361 2407428       1 bareos      bareos-sd  S  x64_sys_poll
```

While the problem occurs:

```
strace -p 3256361:
accept(3, {sa_family=AF_INET, sin_port=htons(48954), sin_addr=inet_addr("10.76.74.192")}, [128->16]) = 6
setsockopt(6, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
clone(child_stack=0x7ff13d75feb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7ff13d7609d0, tls=0x7ff13d760700, child_tidptr=0x7ff13d7609d0) = 2458575
futex(0x2487490, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, {tv_sec=1638442869, tv_nsec=440823368}, FUTEX_BITSET_MATCH_ANY) = 0
futex(0x2487418, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x24874c0, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x2487440, FUTEX_WAKE_PRIVATE, 1) = 1

strace -p 3256363
strace: Process 3256363 attached
restart_syscall(<... resuming interrupted futex ...>
```

Thread 3256363 stays blocked in the interrupted futex.

While there is no problem, the trace of 3256361 is identical, but thread 3256363 keeps making progress:

```
strace -p 3256363
strace: Process 3256363 attached
restart_syscall(<... resuming interrupted futex ...>
futex(0x7ff1508ce340, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7ff1508ce328, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, {tv_sec=1638442749, tv_nsec=490329000}, FUTEX_BITSET_MATCH_ANY) = 0
futex(0x7ff1508ce340, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7ff1508ce32c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, {tv_sec=1638442807, tv_nsec=745975000}, FUTEX_BITSET_MATCH_ANY) = 0
futex(0x7ff1508ce340, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7ff1508ce328, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, {tv_sec=1638442807, tv_nsec=747064000}, FUTEX_BITSET_MATCH_ANY) = 0
futex(0x7ff1508ce340, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7ff1508ce32c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, {tv_sec=1638442807, tv_nsec=792002000}, FUTEX_BITSET_MATCH_ANY) = -1 EAGAIN (Resource temporarily unavailable)
futex(0x7ff1508ce340, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7ff1508ce328, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, {tv_sec=1638442807, tv_nsec=792408000}, FUTEX_BITSET_MATCH_ANY
```

I think there is some inter-process communication issue here.
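For a daemon blocked on a futex like this, a full thread backtrace usually says more than strace about which lock each thread is waiting on. A minimal sketch, assuming gdb and the bareos-storage debuginfo packages are installed (3256361 is the PID from the `ps -eLo` output above):

```sh
# Attach once, dump a backtrace of every bareos-sd thread, and detach.
# The daemon is only paused for the duration of the dump.
gdb -p 3256361 -batch -ex 'thread apply all bt'
```
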
Tags: No tags attached.
Note 0004379 (gusevmk, 2021-12-04)

This problem happens when the director configuration is reloaded. Steps to reproduce (see the sketch after this list):

1. Set Maximum Concurrent Jobs = 4 on the storage.
2. Run 4 simultaneous jobs.
3. Run one more job: all OK, this job is queued.
4. Reload the director configuration (add a new job/schedule).
5. Start one more job: this job starts and fails, because there are no more available devices / the Maximum Concurrent Jobs limit is exceeded on the storage daemon.
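
A bconsole session walking through these steps might look like the following sketch; the job name `backup-pg` is hypothetical, and the queue/fail behaviour in steps 3 and 5 is as the reporter describes:

```sh
*run job=backup-pg level=Full yes   # repeat until 4 jobs are running on the SD
*run job=backup-pg level=Full yes   # 5th job: correctly waits in the queue
*reload                             # reload the director configuration
*run job=backup-pg level=Full yes   # 6th job: fails instead of queueing
```
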
Note 0004380 (bruno-at-bareos, 2021-12-06)

Could you provide your bareos-sd configuration (normally bareos-sd.d/storage/bareos-sd.conf) and also the director configuration (usually bareos-dir.d/director/bareos-dir.conf)? If you have changed the default values, you have to check that they are high enough. The defaults are:

Maximum Concurrent Jobs (Sd->Storage) = PINT32 20
Maximum Connections (Sd->Storage) = PINT32 42

As you can see, Maximum Connections is higher than Maximum Concurrent Jobs, so that commands (like status) can still reach the SD.
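
For reference, a minimal SD Storage resource carrying the two directives and the defaults quoted above might look like this sketch; the resource name is hypothetical, not the reporter's real configuration:

```
# bareos-sd.d/storage/bareos-sd.conf -- sketch with the compiled-in defaults
Storage {
  Name = "pgbkps01-sd"               # hypothetical name
  Maximum Concurrent Jobs = 20       # default; must cover all parallel jobs
  Maximum Connections = 42           # default; kept above the job limit so
                                     # status/console commands still get through
}
```
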
Note 0004408 (bruno-at-bareos, 2021-12-22)

Any news here?
Note 0004446 (embareossed, 2022-01-03, last edited 2022-01-09)

[EDIT]: Sorry, I saw the strace output, and it appeared to be the same as my own situation. Trouble is, I am now seeing something similar to the OP's situation, but not quite the same. I will start a separate ticket for mine. Please disregard my original post here.
Note 0005373 (bruno-at-bareos, 2023-09-05)

Not reproducible.
Date Modified | Username | Field | Change |
---|---|---|---|
2021-12-02 12:01 | gusevmk | New Issue | |
2021-12-04 06:52 | gusevmk | Note Added: 0004379 | |
2021-12-06 14:17 | bruno-at-bareos | Note Added: 0004380 | |
2021-12-06 14:18 | bruno-at-bareos | Description Updated | |
2021-12-22 16:35 | bruno-at-bareos | Note Added: 0004408 | |
2022-01-03 22:35 | embareossed | Note Added: 0004446 | |
2022-01-03 22:36 | embareossed | Note Edited: 0004446 | |
2022-01-04 00:01 | embareossed | Note Edited: 0004446 | |
2022-01-04 00:06 | embareossed | Note Edited: 0004446 | |
2022-01-04 00:11 | embareossed | Note Edited: 0004446 | |
2022-01-09 23:29 | embareossed | Note Edited: 0004446 | |
2023-09-05 09:42 | bruno-at-bareos | Assigned To | => bruno-at-bareos |
2023-09-05 09:42 | bruno-at-bareos | Status | new => closed |
2023-09-05 09:42 | bruno-at-bareos | Resolution | open => unable to reproduce |
2023-09-05 09:42 | bruno-at-bareos | Note Added: 0005373 |