View Issue Details

IDProjectCategoryView StatusLast Update
0001089bareos-corestorage daemonpublic2019-07-08 10:56
Reporterfbrendel Assigned Toarogge  
PrioritynormalSeverityblockReproducibilityalways
Status closedResolutionunable to reproduce 
PlatformLinuxOSCentOSOS Version7
Summary0001089: bareos-sd hang writing VirtualFull to tape
DescriptionDuring a VirtualFull backup the SD spools the data to the SpoolDirectory but then it doesn't write anything to tape.
The futex() calls (see strace) repeating until I reboot the machine.
Killing bareos-sd via SIGKILL leaves bareos-sd as a zombie.

The Bareos Version is 18.2.6.

Director:
JobDefs {
  Name = lt-Prod
  Type = Backup
  Level = VirtualFull
  Schedule = Quarterly
  Messages = Standard
  Pool = ConProd
  Priority = 13
  Write Bootstrap = "/var/lib/bareos/%c.bsr"
  Accurate = yes
  Spool Data = Yes
  Prefer Mounted Volumes = no
  Spool Attributes = yes
  File Set = LinuxAll
  Run Script {
        Console = "update jobid=%i jobtype=A"
        Runs When = After
        Runs On Client = No
        Runs On Failure = No
  }
}
Storage {
  Name = elbct3-msl2024
  Address = 192.168.192.168
  Password = "xxxxxxxxxxxxxxxxxxxxx"
  Media Type = LTO-6
  Device = MSL2024
  Auto Changer = yes
  Maximum Concurrent Jobs = 2
  Allow Compression = No
  Collect Statistics = yes
  TLS enable = no
}

Storage:
Autochanger {
  Name = MSL2024
  Device = Drive0
  Device = Drive1
  Changer Command = "/usr/lib/bareos/scripts/mtx-changer %c %o %S %a %d"
  Changer Device = /dev/tape/by-id/scsi-35001438016033b76
}
Device {
  Name = Drive0
  Media Type = LTO-6
  Archive Device = /dev/tape/by-id/scsi-35001438016033b77-nst
  Autochanger = yes
  Always Open = yes
  Automatic Mount = yes
  Maximum Concurrent Jobs = 1
  Drive Index = 0
  Maximum Block Size = 524288
  Maximum File Size = 21474836480
  Maximum Spool Size = 214748364800
  Spool Directory = /mnt/ssd/spool
  Label Media = yes
}

Device {
  Name = Drive1
  Media Type = LTO-6
  Archive Device = /dev/tape/by-id/scsi-35001438016033b7a-nst
  Autochanger = yes
  Always Open = yes
  Automatic Mount = yes
  Maximum Concurrent Jobs = 1
  Drive Index = 1
  Maximum Block Size = 524288
  Maximum File Size = 21474836480
  Maximum Spool Size = 214748364800
  Spool Directory = /mnt/ssd/spool
  Label Media = yes
}
Storage {
  Name = elbct3-sd
  Maximum Concurrent Jobs = 33
  Device Reserve By Media Type = yes
  Collect Device Statistics = yes
  Collect Job Statistics = yes
  TLS Enable = no
}
Steps To Reproduce1. write eof to tapes
2. label barcode
3. update slots
4. start job
Additional InformationThese are the last lines from Bareos log

06-Jun 10:10 elbct3-sd JobId 3117: Forward spacing Volume "IncProd-0179" to file:block 0:215202049.
06-Jun 10:10 elbct3-sd JobId 3117: End of Volume at file 0 on device "FileInc1" (/mnt/cephfs/bareos/storage/incremental/), Volume "IncProd-0179"
06-Jun 10:10 elbct3-sd JobId 3117: Ready to read from volume "IncProd-0206" on device "FileInc1" (/mnt/cephfs/bareos/storage/incremental/).
06-Jun 10:10 elbct3-sd JobId 3117: Forward spacing Volume "IncProd-0206" to file:block 0:217961589.
06-Jun 10:10 elbct3-sd JobId 3117: End of Volume at file 0 on device "FileInc1" (/mnt/cephfs/bareos/storage/incremental/), Volume "IncProd-0206"
06-Jun 10:10 elbct3-sd JobId 3117: End of all volumes.
06-Jun 10:10 elbct3-sd JobId 3117: Committing spooled data to Volume "5000182". Despooling 9,842,645,367 bytes ...


and these are from strace.

[pid 14587] write(8, "\0\0\0\247UpdCat Job=inc-elbcg1.2019-0"..., 171) = 171
[pid 14587] write(8, "\0\0\0\266UpdCat Job=inc-elbcg1.2019-0"..., 186) = 186
[pid 14587] write(8, "\0\0\0`UpdCat Job=inc-elbcg1.2019-0"..., 100) = 100
[pid 14587] write(8, "\0\0\0\271UpdCat Job=inc-elbcg1.2019-0"..., 189) = 189
[pid 14587] read(6, "\211\261b\221\0\0\374\0\0\0\0zBB02\0\0\1z\\\346\217I\0\0\1\231\377\377\377\343"..., 64512) = 64512
[pid 14587] write(8, "\0\0\0`UpdCat Job=inc-elbcg1.2019-0"..., 100) = 100
[pid 14587] write(8, "\0\0\0\266UpdCat Job=inc-elbcg1.2019-0"..., 186) = 186
[pid 14587] write(8, "\0\0\0`UpdCat Job=inc-elbcg1.2019-0"..., 100) = 100
[pid 14587] write(8, "\0\0\0\271UpdCat Job=inc-elbcg1.2019-0"..., 189) = 189
[pid 14587] write(8, "\0\0\0`UpdCat Job=inc-elbcg1.2019-0"..., 100) = 100
[pid 14587] write(8, "\0\0\0\272UpdCat Job=inc-elbcg1.2019-0"..., 190) = 190
[pid 14587] write(8, "\0\0\0`UpdCat Job=inc-elbcg1.2019-0"..., 100) = 100
[pid 14587] write(8, "\0\0\0\243UpdCat Job=inc-elbcg1.2019-0"..., 167) = 167
[pid 14587] write(8, "\0\0\0\236UpdCat Job=inc-elbcg1.2019-0"..., 162) = 162
[pid 14587] getcwd("/root", 4098) = 6
[pid 14587] getcwd("/root", 4098) = 6
[pid 14587] write(5, "\0\0\0\307Jmsg Job=inc-elbcg1.2019-05-"..., 203) = 203
[pid 14587] getcwd("/root", 4098) = 6
[pid 14587] write(5, "\0\0\0mJmsg Job=inc-elbcg1.2019-05-"..., 113) = 113
[pid 14587] write(7, "=a\2\0\227a\2\0\355i\7\0", 12) = 12
[pid 14587] write(7, "UUUUUUUUUUUUUUUUUUUUUUUU\0\2a=\377\377\377\343"..., 485869) = 485869
[pid 14587] getcwd("/root", 4098) = 6
[pid 14587] write(5, "\0\0\0\251Jmsg Job=inc-elbcg1.2019-05-"..., 173) = 173
[pid 14587] write(5, "\0\0\0;Status Job=inc-elbcg1.2019-0"..., 63) = 63
[pid 14587] mprotect(0x7fa63c2b9000, 524288, PROT_READ|PROT_WRITE) = 0
[pid 14587] lseek(7, 0, SEEK_SET) = 0
[pid 14587] fadvise64(7, 0, 0, POSIX_FADV_WILLNEED <unfinished ...>
[pid 13807] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out)
[pid 13807] futex(0x62b180, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 13807] futex(0x62b144, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 7, {1559808672, 42190000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
[pid 13807] futex(0x62b180, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 13807] futex(0x62b144, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 9, {1559808702, 42716000}, ffffffff <unfinished ...>
[pid 13806] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out)
[pid 13806] futex(0x7fa64bac9e80, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 13806] futex(0x7fa64bac9e44, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 21, {1559808740, 603142000}, ffffffff <unfinished ...>
[pid 13807] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out)
[pid 13807] futex(0x62b180, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 13807] futex(0x62b144, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 11, {1559808732, 43432000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
[pid 13807] futex(0x62b180, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 13807] futex(0x62b144, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 13, {1559808762, 44086000}, ffffffff <unfinished ...>
[pid 13806] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out)
[pid 13806] futex(0x7fa64bac9e80, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 13806] futex(0x7fa64bac9e44, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 23, {1559808800, 603732000}, ffffffff <unfinished ...>
[pid 13807] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out)
[pid 13807] futex(0x62b180, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 13807] futex(0x62b144, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 15, {1559808792, 44812000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
[pid 13807] futex(0x62b180, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 13807] futex(0x62b144, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 17, {1559808822, 45483000}, ffffffff <unfinished ...>
[pid 13806] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out)
[pid 13806] futex(0x7fa64bac9e80, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 13806] futex(0x7fa64bac9e44, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 25, {1559808860, 604274000}, ffffffff <unfinished ...>
TagsNo tags attached.

Activities

fbrendel

fbrendel

2019-06-13 15:29

reporter   ~0003391

Debian 9.9 and Bareos 18.2.5 works fine.
arogge

arogge

2019-07-04 16:16

manager   ~0003415

So you're seeing this issue with 18.2.6, but not 18.2.5?
fbrendel

fbrendel

2019-07-04 16:26

reporter   ~0003416

Yes, but that was a self compiled RPM.

We've purchased a Bareos subscription recently and are now in the process of upgrading all daemons to 18.2.6.
I'd say you can close the issue since I can't reproduce that anyway.

Thanks :-)

Issue History

Date Modified Username Field Change
2019-06-06 10:36 fbrendel New Issue
2019-06-13 15:29 fbrendel Note Added: 0003391
2019-07-04 16:16 arogge Assigned To => arogge
2019-07-04 16:16 arogge Status new => feedback
2019-07-04 16:16 arogge Note Added: 0003415
2019-07-04 16:16 arogge Assigned To arogge =>
2019-07-04 16:26 fbrendel Note Added: 0003416
2019-07-04 16:26 fbrendel Status feedback => new
2019-07-08 10:56 arogge Assigned To => arogge
2019-07-08 10:56 arogge Status new => closed
2019-07-08 10:56 arogge Resolution open => unable to reproduce