View Issue Details

ID: 0001474
Project: bareos-core
Category: storage daemon
View Status: public
Last Update: 2022-08-02 13:44
Reporter: jens
Assigned To:
Priority: normal
Severity: crash
Reproducibility: always
Status: new
Resolution: open
Platform: Linux
OS: Debian
OS Version: 9
Product Version: 19.2.12
Summary: 0001474: bareos-sd crashing on VirtualFull with SIGSEGV, ../src/lib/serial.cc file not found
Description:
When running the 'always incremental' backup scheme, the storage daemon crashes with a segmentation fault on the VirtualFull backup triggered by consolidation.

Job error:
bareos-dir JobId 1267: Fatal error: Director's comm line to SD dropped.

GDB debug:
bareos-sd (200): stored/mac.cc:159-154 joblevel from SOS_LABEL is now F
bareos-sd (130): stored/label.cc:672-154 session_label record=ec015288
bareos-sd (150): stored/label.cc:718-154 Write sesson_label record JobId=154 FI=SOS_LABEL SessId=1 Strm=154 len=165 remainder=0
bareos-sd (150): stored/label.cc:722-154 Leave WriteSessionLabel Block=1351364161d File=0d
bareos-sd (200): stored/mac.cc:221-154 before write JobId=154 FI=1 SessId=1 Strm=UNIX-Attributes-EX len=123
Thread 4 "bareos-sd" received signal SIGSEGV, Segmentation fault.

[Switching to Thread 0x7ffff4c5b700 (LWP 2271)]
serial_uint32 (ptr=ptr@entry=0x7ffff4c5aa70, v=<optimized out>) at ../../../src/lib/serial.cc:76
76 ../../../src/lib/serial.cc: No such file or directory.


I am running daily incrementals into the 'File' pool, consolidating every 4 days into the 'FileCons' pool, a VirtualFull every 1st Monday of the month into the 'LongTerm-Disk' pool,
and finally a migration to tape every 2nd Monday of the month from the 'LongTerm-Disk' pool into the 'LongTerm-Tape' pool.

BareOS version: 19.2.7
BareOS director and storage daemon on the same machine.
Disk storage on CEPH mount
Tape storage on Fujitsu Eternus LT2 tape library with 1 LTO-7 drive

---------------------------------------------------------------------------------------------------
Storage Device config:

FileStorage with 10 devices, all in the same 1st folder:

Device {
  Name = FileStorage
  Media Type = File
  Archive Device = /storage/backup/bareos_Incremental # storage location
  LabelMedia = yes # lets Bareos label unlabeled media
  Random Access = yes # allow this device to be used by any job
  AutomaticMount = yes # when device opened, read it
  RemovableMedia = no # fixed media ( no tape, no usb )
  AlwaysOpen = no
  Auto Inflate = both # auto-decompress incoming and outgoing streams
  Auto Deflate = both # auto-compress incoming and outgoing streams (backup-server-side compression)
  Auto Deflate Algorithm = LZ4HC # compression algorithm
}
....

FileStorageCons with 10 devices, all in the same 2nd folder:

Device {
  Name = FileStorageCons
  Media Type = FileCons
  Archive Device = /storage/backup/bareos_Consolidate # storage location
  LabelMedia = yes # lets Bareos label unlabeled media
  Random Access = yes # allow this device to be used by any job
  AutomaticMount = yes # when device opened, read it
  RemovableMedia = no # fixed media ( no tape, no usb )
  AlwaysOpen = no
  Auto Inflate = both # auto-decompress incoming and outgoing streams
  Auto Deflate = both # auto-compress incoming and outgoing streams (backup-server-side compression)
  Auto Deflate Algorithm = LZ4HC # compression algorithm
}
...

FileStorageVault with 10 devices, all in the same 3rd folder:

Device {
  Name = FileStorageVault
  Media Type = FileVLT
  Archive Device = /storage/backup/bareos_LongTermDisk # storage location
  LabelMedia = yes # lets Bareos label unlabeled media
  Random Access = yes # allow this device to be used by any job
  AutomaticMount = yes # when device opened, read it
  RemovableMedia = no # fixed media ( no tape, no usb )
  AlwaysOpen = no
  Auto Inflate = both # auto-decompress incoming and outgoing streams
  Auto Deflate = both # auto-compress incoming and outgoing streams (backup-server-side compression)
  Auto Deflate Algorithm = LZ4HC # compression algorithm
}
....

Tape Device:

Device {
  Name = IBM-ULTRIUM-HH7
  Device Type = Tape
  DriveIndex = 0
  ArchiveDevice = /dev/nst0
  Media Type = IBM-LTO-7
  AutoChanger = yes
  AutomaticMount = yes
  LabelMedia = yes
  RemovableMedia = yes
  Autoselect = yes
  MaximumFileSize = 10GB
  Spool Directory = /storage/scratch
  Maximum Spool Size = 2199023255552 # maximum total spool size in bytes (2Tbyte)
}

---------------------------------------------------------------------------------------------------
Pool Config:

Pool {
  Name = AI-Incremental # name of the media pool
  Pool Type = Backup # pool type
  Recycle = yes # BAReOS can automatically recycle volumes from that pool
  AutoPrune = yes # automatically prune expired volumes
  Volume Retention = 72 days
  Storage = File # storage device to be used
  Maximum Volume Bytes = 10G # maximum file size per volume
  Maximum Volumes = 500 # maximum allowed total number of volumes in pool
  Label Format = "AI-Incremental_" # volumes will be labeled "AI-Incremental_<volume-id>"
  Volume Use Duration = 36 days # volume will be no longer used than
  Next Pool = AI-Consolidate # next pool for consolidation
  Job Retention = 72 days
  File Retention = 36 days
}

Pool {
  Name = AI-Consolidate # name of the media pool
  Pool Type = Backup # pool type
  Recycle = yes # BAReOS can automatically recycle volumes from that pool
  AutoPrune = yes # automatically prune expired volumes
  Volume Retention = 366 days
  Job Retention = 180 days
  File Retention = 93 days
  Storage = FileCons # storage device to be used
  Maximum Volume Bytes = 10G # maximum file size per volume
  Maximum Volumes = 1000 # maximum allowed total number of volumes in pool
  Label Format = "AI-Consolidate_" # volumes will be labeled "AI-Consolidate_<volume-id>"
  Volume Use Duration = 2 days # volume will be no longer used than
  Next Pool = LongTerm-Disk # next pool for long term backups to disk
}

Pool {
  Name = LongTerm-Disk # name of the media pool
  Pool Type = Backup # pool type
  Recycle = yes # BAReOS can automatically recycle volumes from that pool
  AutoPrune = yes # automatically prune expired volumes
  Volume Retention = 732 days
  Job Retention = 732 days
  File Retention = 366 days
  Storage = FileVLT # storage device to be used
  Maximum Volume Bytes = 10G # maximum file size per volume
  Maximum Volumes = 1000 # maximum allowed total number of volumes in pool
  Label Format = "LongTerm-Disk_" # volumes will be labeled "LongTerm-Disk_<volume-id>"
  Volume Use Duration = 2 days # volume will be no longer used than
  Next Pool = LongTerm-Tape # next pool for long term backups to tape
  Migration Time = 2 days # Jobs older than 2 days in this pool will be migrated to 'Next Pool'
}

Pool {
  Name = LongTerm-Tape
  Pool Type = Backup
  Recycle = yes # Bareos can automatically recycle Volumes
  AutoPrune = yes # Prune expired volumes
  Volume Retention = 732 days # How long should the Backups be kept? (0000012)
  Job Retention = 732 days
  File Retention = 366 days
  Storage = TapeLibrary # Physical Media
  Maximum Block Size = 1048576
  Recycle Pool = Scratch
  Cleaning Prefix = "CLN"
}

---------------------------------------------------------------------------------------------------
JobDefs:

JobDefs {
  Name = AI-Incremental
  Type = Backup
  Level = Incremental
  Storage = File
  Messages = Standard
  Pool = AI-Incremental
  Incremental Backup Pool = AI-Incremental
  Full Backup Pool = AI-Consolidate
  Accurate = yes
  Allow Mixed Priority = yes
  Always Incremental = yes
  Always Incremental Job Retention = 36 days
  Always Incremental Keep Number = 14
  Always Incremental Max Full Age = 31 days
}

JobDefs {
  Name = AI-Consolidate
  Type = Consolidate
  Storage = File
  Messages = Standard
  Pool = AI-Consolidate
  Priority = 25
  Write Bootstrap = "/storage/bootstrap/%c.bsr"
  Incremental Backup Pool = AI-Incremental
  Full Backup Pool = AI-Consolidate
  Max Full Consolidations = 1
  Prune Volumes = yes
  Accurate = yes
}

JobDefs {
  Name = LongTermDisk
  Type = Backup
  Level = VirtualFull
  Messages = Standard
  Pool = AI-Consolidate
  Priority = 30
  Write Bootstrap = "/storage/bootstrap/%c.bsr"
  Accurate = yes
  Run Script {
    console = "update jobid=%1 jobtype=A"
    Runs When = After
    Runs On Client = No
    Runs On Failure = No
  }
}

JobDefs {
  Name = "LongTermTape"
  Pool = LongTerm-Disk
  Messages = Standard
  Type = Migrate
}


---------------------------------------------------------------------------------------------------
Job Config ( per client )

Job {
  Name = "Incr-<client>"
  Description = "<client> always incremental 36d retention"
  Client = <client>
  Jobdefs = AI-Incremental
  FileSet="fileset-<client>"
  Schedule = "daily_incremental_<client>"
  # Write Bootstrap file for disaster recovery.
  Write Bootstrap = "/storage/bootstrap/%j.bsr"
  # The higher the number the lower the job priority
  Priority = 15
  Run Script {
    Console = ".bvfs_update jobid=%i"
    RunsWhen = After
    RunsOnClient = No
  }
}

Job {
  Name = "AI-Consolidate"
  Description = "consolidation of 'always incremental' jobs"
  Client = backup.mgmt.drs
  FileSet = SelfTest
  Jobdefs = AI-Consolidate
  Schedule = consolidate

  # The higher the number the lower the job priority
  Priority = 25
}

Job {
  Name = "VFull-<client>"
  Description = "<client> monthly virtual full"
  Messages = Standard
  Client = <client>
  Type = Backup
  Level = VirtualFull
  Jobdefs = LongTermDisk
  FileSet=fileset-<client>
  Pool = AI-Consolidate
  Schedule = virtual-full_<client>
  Priority = 30
  Run Script {
    Console = ".bvfs_update"
    RunsWhen = After
    RunsOnClient = No
  }
}

Job {
  Name = "migrate-2-tape"
  Description = "monthly migration of virtual full backups from LongTerm-Disk to LongTerm-Tape pool"
  Jobdefs = LongTermTape
  Selection Type = PoolTime
  Schedule = "migrate-2-tape"
  Priority = 15
  Run Script {
    Console = ".bvfs_update jobid=%i"
    RunsWhen = After
    RunsOnClient = No
  }
}

---------------------------------------------------------------------------------------------------
Schedule config:

Schedule {
  Name = "daily_incremental_<client>"
  Run = daily at 02:00
}

Schedule {
  Name = "consolidate"
  Run = Incremental 3/4 at 00:00
}

Schedule {
  Name = "virtual-full_<client>"
  Run = 1st monday at 10:00
}

Schedule {
  Name = "migrate-2-tape"
  Run = 2nd monday at 8:00
}

---------------------------------------------------------------------------------------------------
Tags: No tags attached.

Activities

bruno-at-bareos (developer) 2022-07-27 16:43   ~0004688

Could you check the SD's working directory (/var/lib/bareos) for any other trace, backtrace, and debug files?
If you have them, please attach them (compressed, if necessary).

jens (reporter) 2022-07-27 17:00   ~0004690

Debug files attached in a private note.

bruno-at-bareos (developer) 2022-07-28 09:34   ~0004697

What is the reason behind running 19.2 instead of upgrading to 21?

jens (reporter) 2022-07-28 13:06   ~0004699

1. Missing a comprehensive, easy-to-follow step-by-step guide on how to upgrade.
2. Lack of confidence in a flawless upgrade procedure that does not render existing backup data unusable.
3. Lack of experience and skilled personnel, making the rollout of a new version a major effort.
4. Limited access to online repositories for updating local mirrors, resulting in very long lead times for new versions.

jens (reporter) 2022-07-28 13:09   ~0004700
Last edited: 2022-07-28 13:12

For the above reasons I am a little hesitant to take on the effort of upgrading.
Currently I would consider an upgrade only if it is the only way to get this issue resolved, and I would need confirmation from your end first.
My hope is that something is simply wrong in my configuration, or that I am running an adverse setup, and that changing either might resolve the issue.

bruno-at-bareos (developer) 2022-08-01 11:59   ~0004701

Hi Jens,

Thanks for the additional details.

Does this crash happen each time a consolidation VirtualFull is created?

bruno-at-bareos (developer) 2022-08-01 12:04   ~0004702

Maybe related to a fix in 19.2.9 (available with subscription):
 - fix a memory corruption when autolabeling with increased maximum block size
https://docs.bareos.org/bareos-19.2/Appendix/ReleaseNotes.html#id12

jens (reporter) 2022-08-01 12:05   ~0004703

Hi Bruno,

so far, yes, that is my experience.
It fails every time, even when repeating or manually rescheduling the failed job through the web UI during idle hours when nothing else is running on the director.

jens (reporter) 2022-08-01 12:14   ~0004704

The "fix a memory corruption when autolabeling with increased maximum block size" indeed could be a lead,
as I see the following in the job logs:

Warning: For Volume "AI-Consolidate_0118": The sizes do not match!
Volume=64574484 Catalog=32964717
Correcting Catalog

bruno-at-bareos (developer) 2022-08-02 13:42   ~0004705

Hi Jens, a quick note about the "sizes do not match" warning: it is unrelated. Aborted or failed jobs can have this effect.

This fix was introduced with commit https://github.com/bareos/bareos/commit/0086b852d, and 19.2.9 contains it.

Issue History

Date Modified Username Field Change
2022-07-27 16:12 jens New Issue
2022-07-27 16:43 bruno-at-bareos Note Added: 0004688
2022-07-27 17:00 jens Note Added: 0004690
2022-07-28 09:34 bruno-at-bareos Note Added: 0004697
2022-07-28 13:06 jens Note Added: 0004699
2022-07-28 13:09 jens Note Added: 0004700
2022-07-28 13:12 jens Note Edited: 0004700
2022-08-01 11:59 bruno-at-bareos Note Added: 0004701
2022-08-01 12:04 bruno-at-bareos Note Added: 0004702
2022-08-01 12:05 jens Note Added: 0004703
2022-08-01 12:14 jens Note Added: 0004704
2022-08-02 13:42 bruno-at-bareos Note Added: 0004705