View Issue Details

ID: 0001202
Project: bareos-core
Category: storage daemon
View Status: public
Last Update: 2023-08-23 13:56
Reporter: stephand
Assigned To: bruno-at-bareos
Priority: high
Severity: major
Reproducibility: always
Status: closed
Resolution: fixed
Product Version: 19.2.6
Summary: 0001202: droplet backend (S3): Using MaximumConcurrentJobs > 1 causes restore errors although backup job terminated with "Backup OK"
Description: When using the droplet S3 bareos-sd backend with MaximumConcurrentJobs > 1 and running concurrent jobs, the backup terminates with "Backup OK", but problems occur on restore.

This issue can be mitigated by using multiple storage devices, each with MaximumConcurrentJobs = 1, as described in
https://docs.bareos.org/TasksAndConcepts/StorageBackends.html#configuration or
https://docs.bareos.org/Configuration/StorageDaemon.html#autochanger-resource
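The documented workaround can be sketched like this (a hypothetical fragment based on the linked documentation; the second device and the resource names are illustrative, not taken from this report): several identical droplet devices, each limited to one concurrent job, grouped in an Autochanger resource so the director can still run jobs in parallel.

```
Device {
  Name = S3_Dev1
  Media Type = S3_T1
  Device Type = droplet
  Device Options = "profile=/etc/bareos/bareos-sd.d/device/droplet/droplet.profile,bucket=bareos-backup-test1-bucket1,chunksize=100M"
  Archive Device = S3 Object Storage
  Maximum Concurrent Jobs = 1          # one job per device
  Label Media = yes
  Random Access = yes
  Automatic Mount = yes
  Removable Media = no
}

Device {
  Name = S3_Dev2                       # identical to S3_Dev1 except for the name
  Media Type = S3_T1
  Device Type = droplet
  Device Options = "profile=/etc/bareos/bareos-sd.d/device/droplet/droplet.profile,bucket=bareos-backup-test1-bucket1,chunksize=100M"
  Archive Device = S3 Object Storage
  Maximum Concurrent Jobs = 1
  Label Media = yes
  Random Access = yes
  Automatic Mount = yes
  Removable Media = no
}

Autochanger {
  Name = S3_Autochanger
  Device = S3_Dev1, S3_Dev2
  Changer Device = /dev/null
  Changer Command = ""
}
```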
Steps To Reproduce:
1. Configure as in the attached config
2. Run two concurrent jobs, can be done like this:
    echo -e "run job=S3Data1 yes\nrun job=S3Data2 yes" |bconsole
3. Try to restore
Additional Information: The problematic part seems to happen when the first job terminates while the second is still writing:

25-Feb 17:42 bareos-sd JobId 238: Releasing device "S3_Dev1" (S3).
25-Feb 17:42 bareos-sd JobId 239: End of Volume "S3Full-0" at 1:4189926 on device "S3_Dev1" (S3). Write of 64512 bytes got -1.
25-Feb 17:42 bareos-sd JobId 239: End of medium on Volume "S3Full-0" Bytes=4,299,157,223 Blocks=66,642 at 25-Feb-2020 17:42.
25-Feb 17:42 bareos-dir JobId 239: Created new Volume "S3Full-1" in catalog.

Restore of the first job restores the files but never terminates; the bareos-sd debug messages repeat the following every 10 s until the process is terminated:
25-Feb-2020 22:22:08.359365 bareos-sd (100): backends/droplet_device.cc:949-250 get chunked_remote_volume_size(S3Full-0)
25-Feb-2020 22:22:08.363639 bareos-sd (100): backends/droplet_device.cc:240-250 chunk /S3Full-0/0000 exists. Calling callback.
25-Feb-2020 22:22:08.367575 bareos-sd (100): backends/droplet_device.cc:240-250 chunk /S3Full-0/0001 exists. Calling callback.
...
25-Feb-2020 22:22:08.568750 bareos-sd (100): backends/droplet_device.cc:240-250 chunk /S3Full-0/0040 exists. Calling callback.
25-Feb-2020 22:22:08.572301 bareos-sd (100): backends/droplet_device.cc:257-250 chunk /S3Full-0/0041 does not exists. Exiting.
25-Feb-2020 22:22:08.572366 bareos-sd (100): backends/droplet_device.cc:960-250 Size of volume /S3Full-0: 4246192871
25-Feb-2020 22:22:08.572402 bareos-sd (100): backends/chunked_device.cc:1256-250 volume: S3Full-0, chunked_remote_volume_size = 4246192871, VolCatInfo.VolCatBytes = 4299157223
25-Feb-2020 22:22:08.572436 bareos-sd (100): backends/chunked_device.cc:1262-250 volume S3Full-0 is pending, as 'remote volume size' = 4246192871 < 'catalog volume size' = 4299157223
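The check driving this retry loop can be sketched in Python. This is a simplified model of the behaviour visible in the debug log above, not the actual droplet_device/chunked_device code: the SD sums the sizes of consecutive chunks (/S3Full-0/0000, /0001, ...) until one is missing, then compares the sum with the catalog's VolCatBytes; while remote < catalog, the volume stays "pending" and the restore never finishes.

```python
def chunked_remote_volume_size(chunks: dict) -> int:
    """Sum chunk sizes in index order until the first missing chunk.

    `chunks` maps chunk index -> chunk size in bytes (a stand-in for
    what is stored in the S3 bucket).
    """
    total = 0
    index = 0
    while index in chunks:      # "chunk /S3Full-0/%04d exists. Calling callback."
        total += chunks[index]
        index += 1
    return total                # "chunk %04d does not exists. Exiting."


def volume_is_pending(chunks: dict, catalog_bytes: int) -> bool:
    """Volume stays 'pending' while the remote size is below the catalog size."""
    return chunked_remote_volume_size(chunks) < catalog_bytes


# Numbers from the log: the bucket holds 4,246,192,871 bytes of chunks,
# but the catalog recorded VolCatBytes = 4,299,157,223 -- so the volume
# is forever "pending" and the size check is retried every 10 s.
remote_size = 4246192871
catalog_size = 4299157223
print(remote_size < catalog_size)   # the condition that makes the restore hang
```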

Restoring the second job fails with

25-Feb 22:28 bareos2-fd JobId 253: Error: findlib/attribs.cc:441 File size of restored file /tmp/bareos-restores/data/2/testfile4.txt not correct. Original 545837338, restored 492912238.

When larger chunk sizes are configured, e.g. chunksize=1000M, the following error message also appears before the "File size not correct" error:

21-Feb 17:36 bareos-sd JobId 13: Error: stored/block.cc:1127 Volume data error at 0:3145669842! Short block of 58157 bytes on device "S3_Wasabi1" (S3) discarded.
21-Feb 17:36 bareos-sd JobId 13: Error: stored/read_record.cc:256 stored/block.cc:1127 Volume data error at 0:3145669842! Short block of 58157 bytes on device "S3_Wasabi1" (S3) discarded.
Tags: No tags attached.

Activities

stephand
2020-02-26 00:24
developer
S3_Dev1.conf (1,516 bytes)   
Device {
  Name = S3_Dev1
  Media Type = S3_T1
  Archive Device = S3 Object Storage

  #
  # Device Options:
  #    profile=      - Droplet profile path, e.g. /etc/bareos/bareos-sd.d/device/droplet/droplet.profile
  #    acl=          - Canned ACL
  #    storageclass= - Storage Class to use.
  #    bucket=       - Bucket to store objects in.
  #    chunksize=    - Size of Volume Chunks (default = 10 Mb)
  #    iothreads=    - Number of IO-threads to use for upload (use blocking uploads if not defined)
  #    ioslots=      - Number of IO-slots per IO-thread (0-255, default 10)
  #    retries=      - Number of retries if a write fails (0-255, default = 0, which means unlimited retries)
  #    mmap=         - Use mmap to allocate Chunk memory instead of malloc().
  #    location=     - Deprecated. If required (AWS only), it has to be set in the Droplet profile.
  #

  # testing:
  #Device Options = "profile=/etc/bareos/bareos-sd.d/droplet/droplet.profile,bucket=bareos-backup-test1-bucket1,chunksize=100M,iothreads=0,retries=1"

  # performance:
  Device Options = "profile=/etc/bareos/bareos-sd.d/device/droplet/droplet.profile,bucket=bareos-backup-test1-bucket1,chunksize=100M"

  Device Type = droplet
  Label Media = yes                    # lets Bareos label unlabeled media
  Random Access = yes
  Automatic Mount = yes                # when device opened, read it
  Removable Media = no
  Always Open = no
  Description = "S3 device on bareos-backup-test1-bucket1"
  Maximum Concurrent Jobs = 2
}

stephand
2020-02-26 00:25
developer

S3_Storage1.conf (253 bytes)   
Storage {
  Name = S3_Storage1
  Address = bareostest.example.com                # N.B. Use a fully qualified name here (do not use "localhost" here).
  Password = "rrandomsecret"
  Device = S3_Dev1
  Media Type = S3_T1
  Maximum Concurrent Jobs = 2
}

stephand
2020-02-26 00:26
developer

S3Job.conf (282 bytes)   
JobDefs {
  Name = "S3Job"
  Type = Backup
  Level = Full
  Client = bareos-fd
  FileSet = "SelfTest"
  Schedule = "WeeklyCycle"
  Storage = S3_Storage1
  Messages = Standard
  Pool = S3Full
  Priority = 10
  Write Bootstrap = "/var/lib/bareos/%c.bsr"
  Full Backup Pool = S3Full
}
stephand
2020-02-26 00:26
developer

S3testjobs.conf (189 bytes)   
Job {
  Name = "S3Data1"
  JobDefs = "S3Job"
  FileSet = "Data1Set"
  Client = "bareos1-fd"
}

Job {
  Name = "S3Data2"
  JobDefs = "S3Job"
  FileSet = "Data2Set"
  Client = "bareos2-fd"
}
stephand
2020-02-26 00:27
developer

testfilesets.conf (221 bytes)   
FileSet {
  Name = "Data1Set"
  Include {
    Options {
      signature = SHA1
    }
    File = /data/1
  }
}

FileSet {
  Name = "Data2Set"
  Include {
    Options {
      signature = SHA1
    }
    File = /data/2
  }
}
stephand
2020-02-26 00:27
developer

gen_testfile.py (228 bytes)   
#!/usr/bin/python
import string

# Write roughly 512 MiB of numbered ASCII lines to stdout,
# one line per iteration: "<line number>:<a..zA..Z>".
max_size = 1024 * 1024 * 512
line_number = 0
size = 0

while size < max_size:
    line = str(line_number) + ":" + string.ascii_letters
    print(line)
    size += len(line)
    line_number += 1

stephand
2020-02-26 00:29
developer ~0003858

To reproduce with the attached config, create test files like this:
mkdir -p /data/1
mkdir -p /data/2
./gen_testfile.py > /data/1/testfile1.txt
./gen_testfile.py > /data/1/testfile2.txt
./gen_testfile.py > /data/2/testfile1.txt
./gen_testfile.py > /data/2/testfile2.txt
bruno-at-bareos
2023-08-23 13:56
manager ~0005348

MaximumConcurrentJobs = 1 is recommended for all disk-based jobs; it will be enforced to 1 in Bareos 23.

Issue History

Date Modified Username Field Change
2020-02-26 00:15 stephand New Issue
2020-02-26 00:24 stephand File Added: S3_Dev1.conf
2020-02-26 00:25 stephand File Added: S3_Storage1.conf
2020-02-26 00:26 stephand File Added: S3Job.conf
2020-02-26 00:26 stephand File Added: S3testjobs.conf
2020-02-26 00:27 stephand File Added: testfilesets.conf
2020-02-26 00:27 stephand File Added: gen_testfile.py
2020-02-26 00:29 stephand Note Added: 0003858
2023-08-23 13:56 bruno-at-bareos Assigned To => bruno-at-bareos
2023-08-23 13:56 bruno-at-bareos Status new => closed
2023-08-23 13:56 bruno-at-bareos Resolution open => fixed
2023-08-23 13:56 bruno-at-bareos Note Added: 0005348