View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0001474 | bareos-core | storage daemon | public | 2022-07-27 16:12 | 2022-10-04 10:28 |
Reporter | jens | Assigned To | bruno-at-bareos | ||
Priority | normal | Severity | crash | Reproducibility | always |
Status | resolved | Resolution | fixed | ||
Platform | Linux | OS | Debian | OS Version | 9 |
Product Version | 19.2.12 | ||||
Summary | 0001474: bareos-sd crashing on VirtualFull with SIGSEGV ../src/lib/serial.cc file not found | ||||
Description |
When running the 'always incremental' backup scheme, the storage daemon crashes with a segmentation fault on the VirtualFull backup triggered by consolidation.

Job error:

    bareos-dir JobId 1267: Fatal error: Director's comm line to SD dropped.

GDB debug:

    bareos-sd (200): stored/mac.cc:159-154 joblevel from SOS_LABEL is now F
    bareos-sd (130): stored/label.cc:672-154 session_label record=ec015288
    bareos-sd (150): stored/label.cc:718-154 Write sesson_label record JobId=154 FI=SOS_LABEL SessId=1 Strm=154 len=165 remainder=0
    bareos-sd (150): stored/label.cc:722-154 Leave WriteSessionLabel Block=1351364161d File=0d
    bareos-sd (200): stored/mac.cc:221-154 before write JobId=154 FI=1 SessId=1 Strm=UNIX-Attributes-EX len=123
    Thread 4 "bareos-sd" received signal SIGSEGV, Segmentation fault.
    [Switching to Thread 0x7ffff4c5b700 (LWP 2271)]
    serial_uint32 (ptr=ptr@entry=0x7ffff4c5aa70, v=<optimized out>) at ../../../src/lib/serial.cc:76
    76 ../../../src/lib/serial.cc: No such file or directory.

I am running daily incrementals into the 'File' pool, consolidating every 4 days into the 'FileCons' pool, a virtual full every 1st Monday of a month into the 'LongTerm-Disk' pool, and finally a migration to tape every 2nd Monday of a month from the 'LongTerm-Disk' pool into the 'LongTerm-Tape' pool.

Bareos version: 19.2.7
Bareos director and storage daemon on the same machine.
Disk storage on a CEPH mount.
Tape storage on a Fujitsu Eternus LT2 tape library with 1 LTO-7 drive.

---------------------------------------------------------------------------------------------------

Storage Device config:

FileStorage with 10 devices, all into the same 1st folder:

    Device {
      Name = FileStorage
      Media Type = File
      Archive Device = /storage/backup/bareos_Incremental  # storage location
      LabelMedia = yes                 # lets Bareos label unlabeled media
      Random Access = yes              # allow this device to be used by any job
      AutomaticMount = yes             # when device opened, read it
      RemovableMedia = no              # fixed media (no tape, no usb)
      AlwaysOpen = no
      Auto Inflate = both              # auto-decompress in-/out-stream
      Auto Deflate = both              # auto-compress in-/out-stream (backup server side compression)
      Auto Deflate Algorithm = LZ4HC   # compression algorithm
    }
    ....

FileStorageCons with 10 devices, all into the same 2nd folder:

    Device {
      Name = FileStorageCons
      Media Type = FileCons
      Archive Device = /storage/backup/bareos_Consolidate  # storage location
      LabelMedia = yes                 # lets Bareos label unlabeled media
      Random Access = yes              # allow this device to be used by any job
      AutomaticMount = yes             # when device opened, read it
      RemovableMedia = no              # fixed media (no tape, no usb)
      AlwaysOpen = no
      Auto Inflate = both              # auto-decompress in-/out-stream
      Auto Deflate = both              # auto-compress in-/out-stream (backup server side compression)
      Auto Deflate Algorithm = LZ4HC   # compression algorithm
    }
    ...

FileStorageVault with 10 devices, all into the same 3rd folder:

    Device {
      Name = FileStorageVault
      Media Type = FileVLT
      Archive Device = /storage/backup/bareos_LongTermDisk  # storage location
      LabelMedia = yes                 # lets Bareos label unlabeled media
      Random Access = yes              # allow this device to be used by any job
      AutomaticMount = yes             # when device opened, read it
      RemovableMedia = no              # fixed media (no tape, no usb)
      AlwaysOpen = no
      Auto Inflate = both              # auto-decompress in-/out-stream
      Auto Deflate = both              # auto-compress in-/out-stream (backup server side compression)
      Auto Deflate Algorithm = LZ4HC   # compression algorithm
    }
    ....

Tape Device:

    Device {
      Name = IBM-ULTRIUM-HH7
      Device Type = Tape
      DriveIndex = 0
      ArchiveDevice = /dev/nst0
      Media Type = IBM-LTO-7
      AutoChanger = yes
      AutomaticMount = yes
      LabelMedia = yes
      RemovableMedia = yes
      Autoselect = yes
      MaximumFileSize = 10GB
      Spool Directory = /storage/scratch
      Maximum Spool Size = 2199023255552  # maximum total spool size in bytes (2 TByte)
    }

---------------------------------------------------------------------------------------------------

Pool Config:

    Pool {
      Name = AI-Incremental              # name of the media pool
      Pool Type = Backup                 # pool type
      Recycle = yes                      # Bareos can automatically recycle volumes from that pool
      AutoPrune = yes                    # automatically prune expired volumes
      Volume Retention = 72 days
      Storage = File                     # storage device to be used
      Maximum Volume Bytes = 10G         # maximum file size per volume
      Maximum Volumes = 500              # maximum allowed total number of volumes in pool
      Label Format = "AI-Incremental_"   # volumes will be labeled "AI-Incremental_-<volume-id>"
      Volume Use Duration = 36 days      # volume will not be used for longer than this
      Next Pool = AI-Consolidate         # next pool for consolidation
      Job Retention = 72 days
      File Retention = 36 days
    }

    Pool {
      Name = AI-Consolidate              # name of the media pool
      Pool Type = Backup                 # pool type
      Recycle = yes                      # Bareos can automatically recycle volumes from that pool
      AutoPrune = yes                    # automatically prune expired volumes
      Volume Retention = 366 days
      Job Retention = 180 days
      File Retention = 93 days
      Storage = FileCons                 # storage device to be used
      Maximum Volume Bytes = 10G         # maximum file size per volume
      Maximum Volumes = 1000             # maximum allowed total number of volumes in pool
      Label Format = "AI-Consolidate_"   # volumes will be labeled "AI-Consolidate_-<volume-id>"
      Volume Use Duration = 2 days       # volume will not be used for longer than this
      Next Pool = LongTerm-Disk          # next pool for long term backups to disk
    }

    Pool {
      Name = LongTerm-Disk               # name of the media pool
      Pool Type = Backup                 # pool type
      Recycle = yes                      # Bareos can automatically recycle volumes from that pool
      AutoPrune = yes                    # automatically prune expired volumes
      Volume Retention = 732 days
      Job Retention = 732 days
      File Retention = 366 days
      Storage = FileVLT                  # storage device to be used
      Maximum Volume Bytes = 10G         # maximum file size per volume
      Maximum Volumes = 1000             # maximum allowed total number of volumes in pool
      Label Format = "LongTerm-Disk_"    # volumes will be labeled "LongTerm-Disk_<volume-id>"
      Volume Use Duration = 2 days       # volume will not be used for longer than this
      Next Pool = LongTerm-Tape          # next pool for long term backups to tape
      Migration Time = 2 days            # jobs older than 2 days in this pool will be migrated to 'Next Pool'
    }

    Pool {
      Name = LongTerm-Tape
      Pool Type = Backup
      Recycle = yes                      # Bareos can automatically recycle volumes
      AutoPrune = yes                    # prune expired volumes
      Volume Retention = 732 days        # how long should the backups be kept?
      Job Retention = 732 days
      File Retention = 366 days
      Storage = TapeLibrary              # physical media
      Maximum Block Size = 1048576
      Recycle Pool = Scratch
      Cleaning Prefix = "CLN"
    }

---------------------------------------------------------------------------------------------------

JobDefs:

    JobDefs {
      Name = AI-Incremental
      Type = Backup
      Level = Incremental
      Storage = File
      Messages = Standard
      Pool = AI-Incremental
      Incremental Backup Pool = AI-Incremental
      Full Backup Pool = AI-Consolidate
      Accurate = yes
      Allow Mixed Priority = yes
      Always Incremental = yes
      Always Incremental Job Retention = 36 days
      Always Incremental Keep Number = 14
      Always Incremental Max Full Age = 31 days
    }

    JobDefs {
      Name = AI-Consolidate
      Type = Consolidate
      Storage = File
      Messages = Standard
      Pool = AI-Consolidate
      Priority = 25
      Write Bootstrap = "/storage/bootstrap/%c.bsr"
      Incremental Backup Pool = AI-Incremental
      Full Backup Pool = AI-Consolidate
      Max Full Consolidations = 1
      Prune Volumes = yes
      Accurate = yes
    }

    JobDefs {
      Name = LongTermDisk
      Type = Backup
      Level = VirtualFull
      Messages = Standard
      Pool = AI-Consolidate
      Priority = 30
      Write Bootstrap = "/storage/bootstrap/%c.bsr"
      Accurate = yes
      Run Script {
        console = "update jobid=%1 jobtype=A"
        Runs When = After
        Runs On Client = No
        Runs On Failure = No
      }
    }

    JobDefs {
      Name = "LongTermTape"
      Pool = LongTerm-Disk
      Messages = Standard
      Type = Migrate
    }

---------------------------------------------------------------------------------------------------

Job Config (per client):

    Job {
      Name = "Incr-<client>"
      Description = "<client> always incremental 36d retention"
      Client = <client>
      Jobdefs = AI-Incremental
      FileSet = "fileset-<client>"
      Schedule = "daily_incremental_<client>"
      # Write Bootstrap file for disaster recovery.
      Write Bootstrap = "/storage/bootstrap/%j.bsr"
      # The higher the number, the lower the job priority
      Priority = 15
      Run Script {
        Console = ".bvfs_update jobid=%i"
        RunsWhen = After
        RunsOnClient = No
      }
    }

    Job {
      Name = "AI-Consolidate"
      Description = "consolidation of 'always incremental' jobs"
      Client = backup.mgmt.drs
      FileSet = SelfTest
      Jobdefs = AI-Consolidate
      Schedule = consolidate
      # The higher the number, the lower the job priority
      Priority = 25
    }

    Job {
      Name = "VFull-<client>"
      Description = "<client> monthly virtual full"
      Messages = Standard
      Client = <client>
      Type = Backup
      Level = VirtualFull
      Jobdefs = LongTermDisk
      FileSet = fileset-<client>
      Pool = AI-Consolidate
      Schedule = virtual-full_<client>
      Priority = 30
      Run Script {
        Console = ".bvfs_update"
        RunsWhen = After
        RunsOnClient = No
      }
    }

    Job {
      Name = "migrate-2-tape"
      Description = "monthly migration of virtual full backups from LongTerm-Disk to LongTerm-Tape pool"
      Jobdefs = LongTermTape
      Selection Type = PoolTime
      Schedule = "migrate-2-tape"
      Priority = 15
      Run Script {
        Console = ".bvfs_update jobid=%i"
        RunsWhen = After
        RunsOnClient = No
      }
    }

---------------------------------------------------------------------------------------------------

Schedule config:

    Schedule {
      Name = "daily_incremental_<client>"
      Run = daily at 02:00
    }
    Schedule {
      Name = "consolidate"
      Run = Incremental 3/4 at 00:00
    }
    Schedule {
      Name = "virtual-full_<client>"
      Run = 1st monday at 10:00
    }
    Schedule {
      Name = "migrate-2-tape"
      Run = 2nd monday at 8:00
    }

---------------------------------------------------------------------------------------------------
Tags | No tags attached. | ||||
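For illustration, a minimal sketch of how a backtrace like the one above can be captured by attaching gdb to the running storage daemon before the consolidation VirtualFull starts; the "No such file or directory" line only means gdb cannot find the serial.cc source file on this host, the stack frames themselves remain usable:

```sh
# attach gdb to the running storage daemon (director and SD run on the same machine here)
gdb -p "$(pidof bareos-sd)"

# inside gdb:
#   (gdb) continue               # let the consolidation VirtualFull run until the SIGSEGV fires
#   (gdb) thread apply all bt    # full backtrace of every thread, like the excerpt quoted above
#   (gdb) detach
#   (gdb) quit
```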
Could you check the SD working directory (/var/lib/bareos) for any other trace, backtrace and debug files? If you have them, please attach them (compressed if necessary). |
|
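A minimal collection sketch for that request, assuming the default working directory; the file name patterns are assumptions and should be adjusted to whatever is actually present:

```sh
# look for crash artefacts the storage daemon may have left behind
ls -l /var/lib/bareos/*.trace /var/lib/bareos/*.bactrace /var/lib/bareos/core* 2>/dev/null

# bundle whatever exists into one compressed archive for the tracker
cd /var/lib/bareos && tar czf /tmp/bareos-sd-traces.tar.gz *.trace *.bactrace core* 2>/dev/null
```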
Debug files attached in private note. |
What is the reason behind running 19.2 instead of upgrading to 21? |
1. A missing comprehensive and easy-to-follow step-by-step guide on how to upgrade.
2. Lack of confidence that the upgrade procedure would go flawlessly without rendering the backup data unusable.
3. Lack of experience and skilled personnel, so rolling out a new version means a major effort.
4. Limited access to online repositories for updating local mirrors, which means a very long lead time to get new versions. |
|
For the above reasons I am a little hesitant to take on the effort of upgrading. At the moment I would consider an update only if it is the only way to get the issue resolved, and I would need confirmation from your end first. My hope is that there is simply something wrong in my configuration, or that I am running an adverse setup, and that changing either would resolve the issue. |
|
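For what it is worth, a rough sketch of what a package-based upgrade on Debian usually involves, assuming a PostgreSQL catalog and the packaged upgrade scripts; the repository URL is a placeholder, and the package and service names should be checked against the Bareos documentation for the target release:

```sh
# dump the catalog first so the backup metadata stays recoverable whatever happens
su postgres -c "pg_dump bareos" > /var/backups/bareos-catalog.sql

# stop the daemons before touching packages or the catalog schema
systemctl stop bareos-director bareos-storage bareos-filedaemon

# point apt at the Bareos 21 repository (placeholder URL -- take the real one from the
# Bareos documentation or the subscription portal) and pull the new packages
echo "deb https://<bareos-repository-for-your-distribution>/ /" > /etc/apt/sources.list.d/bareos.list
apt-get update && apt-get install bareos bareos-database-postgresql

# upgrade the catalog schema to the new version, then restart
su postgres -c /usr/lib/bareos/scripts/update_bareos_tables
systemctl start bareos-director bareos-storage bareos-filedaemon
```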
Hi Jens, thanks for the additional information. Does this crash happen every time a consolidation VF is created? |
|
Maybe related to a fix in 19.2.9 (available with subscription): "fix a memory corruption when autolabeling with increased maximum block size", see https://docs.bareos.org/bareos-19.2/Appendix/ReleaseNotes.html#id12 |
|
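As a quick check, a small sketch of how the running versions and the increased block size setting could be confirmed from bconsole; the storage and pool names are the ones from the configuration above, and `show pool=...` is assumed to be available in this bconsole version:

```sh
# director/console version plus storage daemon status (which reports the SD version)
printf 'version\nstatus storage=TapeLibrary\n' | bconsole

# display the pool resource that carries the increased Maximum Block Size
echo "show pool=LongTerm-Tape" | bconsole
```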
Hi Bruno, so far, yes, that is my experience; it fails every time, also when repeating or manually rescheduling the failed job through the web UI during idle hours when nothing else is running on the director. |
|
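A hedged sketch of re-running the failing VirtualFull from bconsole with SD debugging switched on, which reproduces the kind of trace quoted in the description; the job, level and storage names are taken from the configuration above, and the debug level is only an example:

```sh
# raise the storage daemon debug level and let it write a trace file in its working directory
echo "setdebug level=200 trace=1 storage=FileVLT" | bconsole

# re-run the monthly virtual full for one client and pick up the job messages
printf 'run job=VFull-<client> level=VirtualFull yes\nmessages\n' | bconsole
```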
The "fix a memory corruption when autolabeling with increased maximum block size" could indeed be a lead, as I see the following in the job logs:
Warning: For Volume "AI-Consolidate_0118": The sizes do not match! Volume=64574484 Catalog=32964717 Correcting Catalog |
|
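For reference, a small sketch of how the catalog record for that volume can be inspected from bconsole; the volume name is the one from the warning, and VolBytes is assumed to be the catalog-side figure in that message:

```sh
# long listing of the catalog record for the volume named in the warning
echo "llist volume=AI-Consolidate_0118" | bconsole
```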
Hi Jens, a quick note about the "sizes do not match" warning: it is unrelated; aborted or failed jobs can have this effect. The fix was introduced with commit https://github.com/bareos/bareos/commit/0086b852d and 19.2.9 contains it. |
|
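A sketch of how one could check, from a clone of the public source tree, which release tags already contain that commit; note that subscription-only maintenance releases such as 19.2.9 may not be tagged in the public repository:

```sh
# clone the public source tree and list every tag that already contains the fix commit
git clone https://github.com/bareos/bareos.git && cd bareos
git tag --contains 0086b852d
```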
Closing, as a fix already exists. |
The fix is present in the source code and in the published subscription binaries. |
Date Modified | Username | Field | Change |
---|---|---|---|
2022-07-27 16:12 | jens | New Issue | |
2022-07-27 16:43 | bruno-at-bareos | Note Added: 0004688 | |
2022-07-27 17:00 | jens | Note Added: 0004690 | |
2022-07-28 09:34 | bruno-at-bareos | Note Added: 0004697 | |
2022-07-28 13:06 | jens | Note Added: 0004699 | |
2022-07-28 13:09 | jens | Note Added: 0004700 | |
2022-07-28 13:12 | jens | Note Edited: 0004700 | |
2022-08-01 11:59 | bruno-at-bareos | Note Added: 0004701 | |
2022-08-01 12:04 | bruno-at-bareos | Note Added: 0004702 | |
2022-08-01 12:05 | jens | Note Added: 0004703 | |
2022-08-01 12:14 | jens | Note Added: 0004704 | |
2022-08-02 13:42 | bruno-at-bareos | Note Added: 0004705 | |
2022-10-04 10:27 | bruno-at-bareos | Note Added: 0004800 | |
2022-10-04 10:28 | bruno-at-bareos | Assigned To | => bruno-at-bareos |
2022-10-04 10:28 | bruno-at-bareos | Status | new => resolved |
2022-10-04 10:28 | bruno-at-bareos | Resolution | open => fixed |
2022-10-04 10:28 | bruno-at-bareos | Note Added: 0004801 |