View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0000670 | bareos-core | vmware plugin | public | 2016-06-25 16:30 | 2023-12-13 13:32 |
Reporter | zacha | Assigned To | stephand | ||
Priority | high | Severity | crash | Reproducibility | always |
Status | closed | Resolution | fixed | ||
Platform | Linux | OS | Debian | OS Version | 8 |
Product Version | 15.4.0 | ||||
Summary | 0000670: backup of large vmdks corrupt | ||||
Description | bareos-vmware-plugin 16.2.2, bareos-vmware-vix-disklib5 5.5.4-2454786, bareos-vadp-dumper 16.2.2, ESXi 5.5. Backup of large vmdk files (>2TB nominal capacity) is corrupt. | ||||
Steps To Reproduce |
1. Create a virtual disk of size > 2TB; it can be thin provisioned. Write any data to the disk.
2. Create a second disk, e.g. of size 1TB, also thin provisioned, and copy exactly the same data to the smaller disk. It does not need to be much data, just a gigabyte or so.
3. Take a full backup of the virtual machine.
4. Try to restore the machine.

Results:
1. The size of the backup will be smaller than the actual disk. We realized that backing up a 4TB disk currently filled with 1.8TB of data resulted in a ~240GB backup, which is of course not possible - no compression is used in the backup job, we compress transparently in the FC target.
2. Trying to restore results in something like:

25-Jun 16:13 valerian-fd JobId 200273: Fatal error: python-fd: check_dumper(): bareos_vadp_dumper returncode: 1 error output: Warning: VixDiskLib: Invalid configuration file parameter. Failed to read configuration file. Failed to create Logical Disk for /tmp/bareos-restores/[ha-10k] augustus-old/augustus-old_2.vmdk, One of the parameters supplied is invalid [16000]
25-Jun 16:13 valerian-sd JobId 200273: Error: bsock_tcp.c:405 Write error sending 65536 bytes to client:127.0.0.1:9103: ERR=Connection reset by peer
25-Jun 16:13 valerian-fd JobId 200273: Error: restore.c:1239 Write error on /tmp/bareos-restores/VMS/onesty-tech-cb/support/augustus-old/[ha-10k] augustus-old/augustus-old_2.vmdk: Broken pipe
25-Jun 16:13 valerian-sd JobId 200273: Fatal error: read.c:154 Error sending to File daemon. ERR=Connection reset by peer
25-Jun 16:13 valerian-fd JobId 200273: Fatal error: python-fd: check_dumper(): bareos_vadp_dumper returncode: 1 error output: Warning: VixDiskLib: Invalid configuration file parameter. Failed to read configuration file. Failed to create Logical Disk for /tmp/bareos-restores/[ha-10k] augustus-old/augustus-old_2.vmdk, One of the parameters supplied is invalid [16000]

I assume that this might be related to a change in metadata, possibly with disks > 2TB? If you like, I can provide two example disks filled with exactly the same data; one will back up correctly, one will not. What is really ugly is that the backup completes without any error (backup OK). We only noticed this occasionally because we wondered why the backup was so much smaller than expected. (A scripted sketch of step 1 follows after the Tags field below.)
Tags | No tags attached. | ||||
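A minimal sketch of step 1 of the reproduction above, using pyVmomi. The script is not part of the original report; the vCenter host, credentials, and VM name are placeholders, and error handling and task waiting are omitted. It attaches a thin-provisioned 3 TB disk to an existing test VM:

```
# Minimal sketch, not from the original report: attach a thin-provisioned 3 TB
# disk to an existing test VM with pyVmomi. Host, credentials and VM name are
# placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

VCENTER = "vcenter.example.com"                            # placeholder
USER, PASSWORD = "administrator@vsphere.local", "secret"   # placeholders
VM_NAME = "testvm"                                         # placeholder
CAPACITY_KB = 3 * 1024 * 1024 * 1024                       # 3 TB expressed in KB

ctx = ssl._create_unverified_context()   # lab setup, no certificate validation
si = SmartConnect(host=VCENTER, user=USER, pwd=PASSWORD, sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    vm = next(v for v in view.view if v.name == VM_NAME)

    # Reuse the VM's existing SCSI controller and pick a free unit number
    # (unit 7 is reserved for the controller itself).
    controller = next(d for d in vm.config.hardware.device
                      if isinstance(d, vim.vm.device.VirtualSCSIController))
    used = {d.unitNumber for d in vm.config.hardware.device
            if getattr(d, "controllerKey", None) == controller.key}
    unit = next(u for u in range(16) if u != 7 and u not in used)

    disk = vim.vm.device.VirtualDisk(
        controllerKey=controller.key,
        unitNumber=unit,
        capacityInKB=CAPACITY_KB,
        backing=vim.vm.device.VirtualDisk.FlatVer2BackingInfo(
            diskMode="persistent", thinProvisioned=True))
    change = vim.vm.device.VirtualDeviceSpec(
        operation=vim.vm.device.VirtualDeviceSpec.Operation.add,
        fileOperation=vim.vm.device.VirtualDeviceSpec.FileOperation.create,
        device=disk)
    vm.ReconfigVM_Task(spec=vim.vm.ConfigSpec(deviceChange=[change]))
finally:
    Disconnect(si)
```

After copying the same test data onto this disk and a smaller comparison disk, the full backup and restore can be run as described in the steps above.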
Maybe the problem is related to this issue, which was resolved by the VDDK 6.0.2 release. Please refer to this link: https://www.vmware.com/support/developer/vddk/vddk-602-releasenotes.html "VDDK cannot HotAdd a > 2TB disk on VVol datastores. When trying to open a > 2TB virtual disk for writing on VVol datastores, the following error message appeared: “Failed to hot-add SCSI targets: Vmomi::MethodFault::Exception: vim.fault.GenericVmConfigFault.” The fix is actually not in VDDK but in ESXi hosts, which should be upgraded to vSphere 6.0 U2." |
|
I cannot see any connection between the two issues; at least it is not obvious to me. What led you to the assumption that they might be related? Just that both involve > 2TB vmdk files? We don't even use Virtual Volumes on vSphere, just plain old datastores. The > 2TB virtual disk works correctly; it is just that the backed-up files are corrupt. Is anyone able to back up vmdk files of > 2TB CONFIGURED size? Again, the disk does not have to contain 2TB of data to test this, it just has to have > 2TB configured, so this can easily be tested with ANY datastore. |
|
This bug entry is very old, and nobody else has added any comment. To be sure that this is a bug in the Bareos code, any information would be appreciated on whether this problem still exists with newer vSphere versions (6.0/6.5) and the current Bareos version 17.2, where the VMware Plugin uses VDDK 6.5. |
|
I never resolved this. It was easy to reproduce as shown above, but I no longer have any possibility to reproduce it because I am not working in that environment anymore, so I am not able to contribute. |
Tested on vSphere 6.7, ESXi 6.7u2 -- local storage only (single host):

- Created a new VM with a single disk 3TB in size and installed the OS.
- Reset CBT (needed because some of my testing messed with the CBT tracking, making it think the entire disk needed to be backed up; this shouldn't normally happen to others, and activating CBT per the manual should be fine).
- Backed up as per the instructions in the manual (the backup is about 1GB, so maybe this isn't enough to trigger the issue, but I think we're good based on reading the docs). Note: I ended up accidentally performing the backup while the machine was powered off, but since the backup works on a snapshot this shouldn't matter.
- Restored as per the instructions in the manual.
- Booted the VM and all looks good.

I'll be putting this into my environment soon enough to back up some larger VMs, though I'm not sure yet whether I am going to put it on the 2TB VMs or not (I'm starting to become inclined to do so, as this may make life much easier). This will however take a bit longer, as I'm still moving datasets around and the 2TB+ machines haven't been provisioned yet.

Per https://kb.vmware.com/s/article/2058287, HotAdd/Extend was not permitted past 2TB for ESXi below 6.5. In addition, per https://code.vmware.com/doc/vddkDataStruct.5.5.html (linked from the Bareos manual), the HotAdd limit for VMFS3 was also determined by block size (an 8MB block size allowed 2TB, 4MB allowed 1TB, etc.); this went away in VMFS5.

Developers probably already know this, but for a bit of history to make it easier to find: vSphere Storage APIs – Data Protection (formerly known as VADP), as noted on the vddkDataStruct page, will choose one of three methods to download the virtual disk image. When the bareos-fd is running inside a guest VM that has direct access to the same storage, the preferred method is HotAdd to the VM (the VMware API actually performs a guest config change on the system holding bareos-fd to map the existing VMDK to it as an additional attached disk), which for disks larger than 2TB requires ESXi 6.5 or above. If anyone cannot upgrade to 6.5 and still hits this issue, I believe they should be able to use the bareos-vmware plugin transport parameter and switch to one of the other transports to avoid this (a rough pre-flight check along these lines is sketched after this note).

If I put this on the larger datasets and see it occur I will post so, but at the moment I'm feeling pretty confident based on VMware's documentation that I shouldn't hit this, as I'm running 6.7. |
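As a rough illustration of the limits cited in this note (not part of the original comment), the sketch below uses pyVmomi to flag virtual disks larger than 2 TB on hosts older than ESXi 6.5, where HotAdd is not available per the KB article above. The function name, the simple version comparison, and the 2 TB threshold are assumptions for illustration; it expects a pyVmomi VirtualMachine object such as the one looked up in the earlier sketch.

```
# Rough pre-flight sketch (assumption, not Bareos code): flag disks over 2 TB
# on hosts older than ESXi 6.5, where HotAdd is not available according to
# https://kb.vmware.com/s/article/2058287.
from pyVmomi import vim

TWO_TB_KB = 2 * 1024 * 1024 * 1024   # 2 TB expressed in KB


def hotadd_warnings(vm):
    """Return human-readable warnings for disks of `vm` that exceed the
    HotAdd size limit on the ESXi host currently running the VM."""
    esxi_version = vm.runtime.host.summary.config.product.version  # e.g. "6.0.0"
    major, minor = (int(x) for x in esxi_version.split(".")[:2])
    warnings = []
    for dev in vm.config.hardware.device:
        if not isinstance(dev, vim.vm.device.VirtualDisk):
            continue
        if dev.capacityInKB > TWO_TB_KB and (major, minor) < (6, 5):
            warnings.append(
                "%s is %d KB (> 2 TB) but the host runs ESXi %s; "
                "HotAdd will not work, consider another transport "
                "for the VMware plugin" %
                (dev.deviceInfo.label, dev.capacityInKB, esxi_version))
    return warnings
```

Whether to fall back to another transport (via the plugin transport parameter mentioned in the note) is then a configuration decision; see the Bareos VMware plugin documentation for the accepted values.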
|
Hello,

vSphere 6.5, ESXi 6.5, Bareos 19.2.7

I tested restoring a virtual machine with a large vmdk (3TB) disk. The machine only restores correctly when restored to its original location. However, a restore with the option to write to a local vmdk file (localvmdk=yes) does not work.

Restore job log:

bareos-dir: Start Restore Job RestoreFiles.2020-09-04_15.02.29_48
  Connected Storage daemon at bareos:9103, encryption: TLS_CHACHA20_POLY1305_SHA256
  Using Device "VTL1-Drive1" to read.
  Connected Client: bareos-vmware-fd at 10.201.xx.xx:9102, encryption: TLS_CHACHA20_POLY1305_SHA256
  Handshake: Immediate TLS
  Encryption: TLS_CHACHA20_POLY1305_SHA256
bareos-sd: Ready to read from volume "VT1004L3" on device "VTL1-Drive1" (/dev/tape/by-id/scsi-1IBM_ULT3580-TD3_0521027762-nst).
  Forward spacing Volume "VT1004L3" to file:block 55:0.
bareos-vmware-fd-fd: Connected Storage daemon at bareos:9103, encryption: TLS_CHACHA20_POLY1305_SHA256
bareos-vmware-fd-fd: Fatal error: python-fd: check_dumper(): bareos_vadp_dumper returncode: 1 error output: Failed to create Logical Disk for /tmp/bareos-restores/[3PARb] w10test/w10test_1.vmdk, One of the parameters supplied is invalid [16000]
  Error: filed/restore.cc:1296 Write error on /tmp/bareos-restores/VMS/FLT_Datacenter/TestEnvironment/w10test/[3PARb] w10test/w10test_1.vmdk: Broken pipe
  Fatal error: python-fd: check_dumper(): bareos_vadp_dumper returncode: 1 error output: Failed to create Logical Disk for /tmp/bareos-restores/[3PARb] w10test/w10test_1.vmdk, One of the parameters supplied is invalid [16000]
  Fatal error: python-fd: plugin_io[IO_CLOSE]: bareos_vadp_dumper returncode: 1
bareos-sd: Error: lib/bsock_tcp.cc:435 Wrote 40626 bytes to client:10.201.xx.xx:9103, but only 32768 accepted.
  Fatal error: stored/read.cc:159 Error sending to File daemon. ERR=Connection reset by peer
  Error: lib/bsock_tcp.cc:475 Socket has errors=1 on call to client:10.201.xx.xx:9103
  Releasing device "VTL1-Drive1" (/dev/tape/by-id/scsi-1IBM_ULT3580-TD3_0521027762-nst).
bareos-dir: Error: Bareos bareos-dir 19.2.7 (16Apr20):
  Build OS: Linux-3.10.0-1062.18.1.el7.x86_64 debian Debian GNU/Linux 10 (buster)
  JobId: 471
  Job: RestoreFiles.2020-09-04_15.02.29_48
  Restore Client: bareos-vmware-fd
  Start time: 04-Sep-2020 15:02:31
  End time: 04-Sep-2020 15:07:53
  Elapsed time: 5 mins 22 secs
  Files Expected: 4
  Files Restored: 2
  Bytes Restored: 29,709,370,403
  Rate: 92265.1 KB/s
  FD Errors: 2
  FD termination status: Fatal Error
  SD termination status: Fatal Error
  Bareos binary info: bareos.org build: Get official binaries and vendor support on bareos.com

Can anyone confirm this? |
|
This is probably fixed with https://github.com/bareos/bareos/pull/826. Please check if it works with packages from http://download.bareos.org/bareos/experimental/nightly/
Currently I can't verify with a disk > 2TB, as vSphere 7 doesn't let me overcommit more than the available datastore disk space. |
|
I have something similar. Restores normally work, but with a plugin/pipe backup it does not. What I see is that the SD wants to connect to port 9103 on the client:

XXXXXXXXXX-sd JobId 291630: Error: lib/bsock_tcp.cc:440 Wrote 49427 bytes to client:<IP6>:9103, but only 32768 accepted.

The previous reports show the same:

Error: lib/bsock_tcp.cc:435 Wrote 40626 bytes to client:10.201.xx.xx:9103, but only 32768 accepted.

That port is not in use on the client. So why is the SD connecting to that port? |
|
Ignore me. I made an error in the restore shell script. | |
Was fixed. |
Date Modified | Username | Field | Change |
---|---|---|---|
2016-06-25 16:30 | zacha | New Issue | |
2016-07-30 02:13 | Carl Lau | Note Added: 0002327 | |
2016-07-30 02:14 | Carl Lau | Note Edited: 0002327 | |
2016-11-28 09:04 | zacha | Note Added: 0002453 | |
2019-01-16 18:12 | stephand | Assigned To | => stephand |
2019-01-16 18:12 | stephand | Status | new => feedback |
2019-01-16 18:12 | stephand | Note Added: 0003201 | |
2019-01-16 22:03 | zacha | Note Added: 0003202 | |
2019-01-16 22:03 | zacha | Status | feedback => assigned |
2019-05-12 09:02 | cmlara | Note Added: 0003364 | |
2020-09-18 13:33 | lwidomski | Note Added: 0004041 | |
2021-06-10 17:34 | stephand | Status | assigned => feedback |
2021-06-10 17:34 | stephand | Note Added: 0004155 | |
2021-07-15 14:08 | derk@twistedbytes.eu | Note Added: 0004177 | |
2021-07-15 15:56 | derk@twistedbytes.eu | Note Added: 0004178 | |
2023-12-13 13:32 | bruno-at-bareos | Status | feedback => closed |
2023-12-13 13:32 | bruno-at-bareos | Resolution | open => fixed |
2023-12-13 13:32 | bruno-at-bareos | Note Added: 0005625 |