View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0001089 | bareos-core | storage daemon | public | 2019-06-06 10:36 | 2019-07-08 10:56 |
Reporter | fbrendel | Assigned To | arogge | ||
Priority | normal | Severity | block | Reproducibility | always |
Status | closed | Resolution | unable to reproduce | ||
Platform | Linux | OS | CentOS | OS Version | 7 |
Summary | 0001089: bareos-sd hang writing VirtualFull to tape | ||||
Description | During a VirtualFull backup the SD spools the data to the SpoolDirectory but then it doesn't write anything to tape. The futex() calls (see strace) repeating until I reboot the machine. Killing bareos-sd via SIGKILL leaves bareos-sd as a zombie. The Bareos Version is 18.2.6. Director: JobDefs { Name = lt-Prod Type = Backup Level = VirtualFull Schedule = Quarterly Messages = Standard Pool = ConProd Priority = 13 Write Bootstrap = "/var/lib/bareos/%c.bsr" Accurate = yes Spool Data = Yes Prefer Mounted Volumes = no Spool Attributes = yes File Set = LinuxAll Run Script { Console = "update jobid=%i jobtype=A" Runs When = After Runs On Client = No Runs On Failure = No } } Storage { Name = elbct3-msl2024 Address = 192.168.192.168 Password = "xxxxxxxxxxxxxxxxxxxxx" Media Type = LTO-6 Device = MSL2024 Auto Changer = yes Maximum Concurrent Jobs = 2 Allow Compression = No Collect Statistics = yes TLS enable = no } Storage: Autochanger { Name = MSL2024 Device = Drive0 Device = Drive1 Changer Command = "/usr/lib/bareos/scripts/mtx-changer %c %o %S %a %d" Changer Device = /dev/tape/by-id/scsi-35001438016033b76 } Device { Name = Drive0 Media Type = LTO-6 Archive Device = /dev/tape/by-id/scsi-35001438016033b77-nst Autochanger = yes Always Open = yes Automatic Mount = yes Maximum Concurrent Jobs = 1 Drive Index = 0 Maximum Block Size = 524288 Maximum File Size = 21474836480 Maximum Spool Size = 214748364800 Spool Directory = /mnt/ssd/spool Label Media = yes } Device { Name = Drive1 Media Type = LTO-6 Archive Device = /dev/tape/by-id/scsi-35001438016033b7a-nst Autochanger = yes Always Open = yes Automatic Mount = yes Maximum Concurrent Jobs = 1 Drive Index = 1 Maximum Block Size = 524288 Maximum File Size = 21474836480 Maximum Spool Size = 214748364800 Spool Directory = /mnt/ssd/spool Label Media = yes } Storage { Name = elbct3-sd Maximum Concurrent Jobs = 33 Device Reserve By Media Type = yes Collect Device Statistics = yes Collect Job Statistics = yes TLS Enable = no } | ||||
Steps To Reproduce | 1. write eof to tapes 2. label barcode 3. update slots 4. start job | ||||
Additional Information | These are the last lines from Bareos log 06-Jun 10:10 elbct3-sd JobId 3117: Forward spacing Volume "IncProd-0179" to file:block 0:215202049. 06-Jun 10:10 elbct3-sd JobId 3117: End of Volume at file 0 on device "FileInc1" (/mnt/cephfs/bareos/storage/incremental/), Volume "IncProd-0179" 06-Jun 10:10 elbct3-sd JobId 3117: Ready to read from volume "IncProd-0206" on device "FileInc1" (/mnt/cephfs/bareos/storage/incremental/). 06-Jun 10:10 elbct3-sd JobId 3117: Forward spacing Volume "IncProd-0206" to file:block 0:217961589. 06-Jun 10:10 elbct3-sd JobId 3117: End of Volume at file 0 on device "FileInc1" (/mnt/cephfs/bareos/storage/incremental/), Volume "IncProd-0206" 06-Jun 10:10 elbct3-sd JobId 3117: End of all volumes. 06-Jun 10:10 elbct3-sd JobId 3117: Committing spooled data to Volume "5000182". Despooling 9,842,645,367 bytes ... and these are from strace. [pid 14587] write(8, "\0\0\0\247UpdCat Job=inc-elbcg1.2019-0"..., 171) = 171 [pid 14587] write(8, "\0\0\0\266UpdCat Job=inc-elbcg1.2019-0"..., 186) = 186 [pid 14587] write(8, "\0\0\0`UpdCat Job=inc-elbcg1.2019-0"..., 100) = 100 [pid 14587] write(8, "\0\0\0\271UpdCat Job=inc-elbcg1.2019-0"..., 189) = 189 [pid 14587] read(6, "\211\261b\221\0\0\374\0\0\0\0zBB02\0\0\1z\\\346\217I\0\0\1\231\377\377\377\343"..., 64512) = 64512 [pid 14587] write(8, "\0\0\0`UpdCat Job=inc-elbcg1.2019-0"..., 100) = 100 [pid 14587] write(8, "\0\0\0\266UpdCat Job=inc-elbcg1.2019-0"..., 186) = 186 [pid 14587] write(8, "\0\0\0`UpdCat Job=inc-elbcg1.2019-0"..., 100) = 100 [pid 14587] write(8, "\0\0\0\271UpdCat Job=inc-elbcg1.2019-0"..., 189) = 189 [pid 14587] write(8, "\0\0\0`UpdCat Job=inc-elbcg1.2019-0"..., 100) = 100 [pid 14587] write(8, "\0\0\0\272UpdCat Job=inc-elbcg1.2019-0"..., 190) = 190 [pid 14587] write(8, "\0\0\0`UpdCat Job=inc-elbcg1.2019-0"..., 100) = 100 [pid 14587] write(8, "\0\0\0\243UpdCat Job=inc-elbcg1.2019-0"..., 167) = 167 [pid 14587] write(8, "\0\0\0\236UpdCat Job=inc-elbcg1.2019-0"..., 162) = 162 [pid 14587] getcwd("/root", 4098) = 6 [pid 14587] getcwd("/root", 4098) = 6 [pid 14587] write(5, "\0\0\0\307Jmsg Job=inc-elbcg1.2019-05-"..., 203) = 203 [pid 14587] getcwd("/root", 4098) = 6 [pid 14587] write(5, "\0\0\0mJmsg Job=inc-elbcg1.2019-05-"..., 113) = 113 [pid 14587] write(7, "=a\2\0\227a\2\0\355i\7\0", 12) = 12 [pid 14587] write(7, "UUUUUUUUUUUUUUUUUUUUUUUU\0\2a=\377\377\377\343"..., 485869) = 485869 [pid 14587] getcwd("/root", 4098) = 6 [pid 14587] write(5, "\0\0\0\251Jmsg Job=inc-elbcg1.2019-05-"..., 173) = 173 [pid 14587] write(5, "\0\0\0;Status Job=inc-elbcg1.2019-0"..., 63) = 63 [pid 14587] mprotect(0x7fa63c2b9000, 524288, PROT_READ|PROT_WRITE) = 0 [pid 14587] lseek(7, 0, SEEK_SET) = 0 [pid 14587] fadvise64(7, 0, 0, POSIX_FADV_WILLNEED <unfinished ...> [pid 13807] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) [pid 13807] futex(0x62b180, FUTEX_WAKE_PRIVATE, 1) = 0 [pid 13807] futex(0x62b144, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 7, {1559808672, 42190000}, ffffffff) = -1 ETIMEDOUT (Connection timed out) [pid 13807] futex(0x62b180, FUTEX_WAKE_PRIVATE, 1) = 0 [pid 13807] futex(0x62b144, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 9, {1559808702, 42716000}, ffffffff <unfinished ...> [pid 13806] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) [pid 13806] futex(0x7fa64bac9e80, FUTEX_WAKE_PRIVATE, 1) = 0 [pid 13806] futex(0x7fa64bac9e44, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 21, {1559808740, 603142000}, ffffffff <unfinished ...> [pid 13807] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) [pid 13807] futex(0x62b180, FUTEX_WAKE_PRIVATE, 1) = 0 [pid 13807] futex(0x62b144, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 11, {1559808732, 43432000}, ffffffff) = -1 ETIMEDOUT (Connection timed out) [pid 13807] futex(0x62b180, FUTEX_WAKE_PRIVATE, 1) = 0 [pid 13807] futex(0x62b144, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 13, {1559808762, 44086000}, ffffffff <unfinished ...> [pid 13806] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) [pid 13806] futex(0x7fa64bac9e80, FUTEX_WAKE_PRIVATE, 1) = 0 [pid 13806] futex(0x7fa64bac9e44, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 23, {1559808800, 603732000}, ffffffff <unfinished ...> [pid 13807] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) [pid 13807] futex(0x62b180, FUTEX_WAKE_PRIVATE, 1) = 0 [pid 13807] futex(0x62b144, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 15, {1559808792, 44812000}, ffffffff) = -1 ETIMEDOUT (Connection timed out) [pid 13807] futex(0x62b180, FUTEX_WAKE_PRIVATE, 1) = 0 [pid 13807] futex(0x62b144, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 17, {1559808822, 45483000}, ffffffff <unfinished ...> [pid 13806] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) [pid 13806] futex(0x7fa64bac9e80, FUTEX_WAKE_PRIVATE, 1) = 0 [pid 13806] futex(0x7fa64bac9e44, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 25, {1559808860, 604274000}, ffffffff <unfinished ...> | ||||
Tags | No tags attached. | ||||
Debian 9.9 and Bareos 18.2.5 works fine. | |
So you're seeing this issue with 18.2.6, but not 18.2.5? | |
Yes, but that was a self compiled RPM. We've purchased a Bareos subscription recently and are now in the process of upgrading all daemons to 18.2.6. I'd say you can close the issue since I can't reproduce that anyway. Thanks :-) |
|
Date Modified | Username | Field | Change |
---|---|---|---|
2019-06-06 10:36 | fbrendel | New Issue | |
2019-06-13 15:29 | fbrendel | Note Added: 0003391 | |
2019-07-04 16:16 | arogge | Assigned To | => arogge |
2019-07-04 16:16 | arogge | Status | new => feedback |
2019-07-04 16:16 | arogge | Note Added: 0003415 | |
2019-07-04 16:16 | arogge | Assigned To | arogge => |
2019-07-04 16:26 | fbrendel | Note Added: 0003416 | |
2019-07-04 16:26 | fbrendel | Status | feedback => new |
2019-07-08 10:56 | arogge | Assigned To | => arogge |
2019-07-08 10:56 | arogge | Status | new => closed |
2019-07-08 10:56 | arogge | Resolution | open => unable to reproduce |