View Issue Details

ID: 0000670
Project: bareos-core
Category: vmware plugin
View Status: public
Last Update: 2023-12-13 13:32
Reporter: zacha
Assigned To: stephand
Priority: high
Severity: crash
Reproducibility: always
Status: closed
Resolution: fixed
Platform: Linux
OS: Debian
OS Version: 8
Product Version: 15.4.0
Summary: 0000670: backup of large vmdks corrupt
Description
bareos-vmware-plugin 16.2.2
bareos-vmware-vix-disklib5 5.5.4-2454786
bareos-vadp-dumper 16.2.2
esxi 5.5

backup of large vmdk files (>2TB nominal capacity) is corrupt.
Steps To Reproduce
Create a virtual disk of size > 2TB; it can be thin provisioned.
Write some data to the disk.

Create a second disk, e.g. of size 1TB, also thin provisioned.
Copy the exact same data to the smaller disk. It does not need to be much data, just a gig or so.

Take a full backup of the virtual machine.

Try to restore the machine.

1. The size of the backup will be smaller than the actual disk.
(We realized that backing up a 4TB disk currently filled with 1.8TB of data resulted in a ~240GB backup, which is of course not possible - no compression is used in the backup job; we compress transparently on the FC target.)
2. When trying to restore, it will fail with something like:

25-Jun 16:13 valerian-fd JobId 200273: Fatal error: python-fd: check_dumper(): bareos_vadp_dumper returncode: 1 error output:
Warning: VixDiskLib: Invalid configuration file parameter. Failed to read configuration file.
Failed to create Logical Disk for /tmp/bareos-restores/[ha-10k] augustus-old/augustus-old_2.vmdk, One of the parameters supplied is invalid [16000]

25-Jun 16:13 valerian-sd JobId 200273: Error: bsock_tcp.c:405 Write error sending 65536 bytes to client:127.0.0.1:9103: ERR=Connection reset by peer
25-Jun 16:13 valerian-fd JobId 200273: Error: restore.c:1239 Write error on /tmp/bareos-restores/VMS/onesty-tech-cb/support/augustus-old/[ha-10k] augustus-old/augustus-old_2.vmdk: Broken pipe
25-Jun 16:13 valerian-sd JobId 200273: Fatal error: read.c:154 Error sending to File daemon. ERR=Connection reset by peer
25-Jun 16:13 valerian-fd JobId 200273: Fatal error: python-fd: check_dumper(): bareos_vadp_dumper returncode: 1 error output:
Warning: VixDiskLib: Invalid configuration file parameter. Failed to read configuration file.
Failed to create Logical Disk for /tmp/bareos-restores/[ha-10k] augustus-old/augustus-old_2.vmdk, One of the parameters supplied is invalid [16000]


I assume this might be related to a change in metadata handling for disks > 2TB?

If you like, I can provide two example disks filled with exactly the same data; one will back up correctly, one will not.

What is really ugly is that the backup completes without any error (Backup OK). We only noticed this occasionally because we wondered why the backup was so much smaller than expected.
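
For anyone wanting to reproduce this quickly, a minimal sketch of creating the two test disks from the ESXi shell (datastore path and VM directory are placeholders, sizes as in the steps above; the disks still have to be attached to the test VM afterwards):

  # >2TB thin-provisioned disk (the one that produces the corrupt backup per this report)
  vmkfstools -c 3072G -d thin "/vmfs/volumes/datastore1/testvm/testvm_big.vmdk"
  # smaller control disk that backs up correctly
  vmkfstools -c 1024G -d thin "/vmfs/volumes/datastore1/testvm/testvm_small.vmdk"

After attaching both disks (e.g. via the vSphere client), copy the same data to each and run a full backup of the VM as usual.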
Tags: No tags attached.

Activities

Carl Lau

2016-07-30 02:13

reporter   ~0002327

Last edited: 2016-07-30 02:14

Maybe the problem is related to this issue, which was resolved by the VDDK 6.0.2 release.
Please refer to this link: https://www.vmware.com/support/developer/vddk/vddk-602-releasenotes.html

"VDDK cannot HotAdd a > 2TB disk on VVol datastores.
When trying to open a > 2TB virtual disk for writing on VVol datastores, the following error message appeared: “Failed to hot-add SCSI targets: Vmomi::MethodFault::Exception: vim.fault.GenericVmConfigFault.” The fix is actually not in VDDK but in ESXi hosts, which should be upgraded to vSphere 6.0 U2."

zacha

2016-11-28 09:04

reporter   ~0002453

I cannot see any connection between the two issues; at least it is not obvious to me. What led you to the assumption that they might be related? Just that both involve > 2TB vmdk files? We don't even use Virtual Volumes on vSphere, just plain old datastores.

The > 2TB virtual disk itself works correctly; it is just that the backed-up files are corrupt.

Is anyone able to back up vmdk files of > 2TB CONFIGURED size? Again, the disk does not have to contain 2TB of data to test this, it just has to have > 2TB configured, so this can easily be tested with ANY datastore.
stephand

2019-01-16 18:12

developer   ~0003201

This bug entry is very old, and nobody else has added any comment.

To be sure that this is a bug in the Bareos code, any information on whether this problem still exists with newer vSphere versions (6.0/6.5) and the current Bareos version 17.2, where the VMware Plugin uses VDDK 6.5, would be appreciated.
zacha

2019-01-16 22:03

reporter   ~0003202

I never resolved this. It was easy to reproduce as shown above, but I no longer have any way to reproduce it because I am not working in that environment anymore, so I am not able to contribute.
cmlara

2019-05-12 09:02

reporter   ~0003364

Tested on
vSphere 6.7
ESXi 6.7u2 -- Local storage only (single host)

Created a new VM with a single 3TB disk.
Installed the OS.
Reset CBT (needed because some of my testing messed with the CBT tracking, making it think the entire disk needed to be backed up; this shouldn't normally happen to others, and activating CBT per the manual should be fine).
Backed up as per the instructions in the manual; a FileSet sketch is shown below (the backup is about 1GB, so maybe this isn't enough to trigger the issue, but I think we're good based on reading the docs).
(Note: I ended up accidentally performing the backup while the machine was powered off, but since the backup works on a snapshot this shouldn't matter.)

Restored as per the instructions in the manual.
Booted the VM and everything looks good.

I'll be putting this into my environment soon enough to back up some larger VMs, though I'm not sure yet if I am going to put it on the 2TB VMs or not (I'm starting to become inclined to do so, as this may make life much easier). This will however take a bit longer, as I'm still moving datasets around and the 2TB+ machines haven't been provisioned yet.
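
For reference, a backup FileSet for the plugin looks roughly like the sketch below, following the pattern in the manual (module path, datacenter, folder, VM name, vCenter host and credentials are placeholders, not the values actually used here):

  FileSet {
    Name = "vm-3tb-test"
    Include {
      Options {
        Signature = MD5
      }
      # bareos-fd-vmware plugin string; all option values below are examples
      Plugin = "python:module_path=/usr/lib64/bareos/plugins/vmware_plugin:module_name=bareos-fd-vmware:dc=mydc1:folder=/:vmname=testvm3tb:vcserver=vcenter.example.com:vcuser=bakadm@vsphere.local:vcpass=secret"
    }
  }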

Per https://kb.vmware.com/s/article/2058287, HotAdd/Extend was not permitted past 2TB for ESXi below 6.5.
In addition, per https://code.vmware.com/doc/vddkDataStruct.5.5.html (linked from the Bareos manual), the HotAdd limit for VMFS3 was also determined by block size (an 8MB block size allowed 2TB, 4MB allowed 1TB, etc.); this limit went away in VMFS5.

Developers probably already know this, but for a bit of history to make it easier to find:
vSphere Storage APIs – Data Protection (formerly known as VADP), as noted on the vddkDataStruct page, will choose one of three methods for downloading the virtual disk image.

When the bareos-fd is running inside a guest VM that has direct access to the same storage, the preferred method is HotAdd (the VMware API actually performs a guest config change on the system running bareos-fd to map the existing VMDK to it as an additional attached disk), which for disks larger than 2TB requires ESXi 6.5 or above.

If anyone cannot upgrade to 6.5 and still hits this issue, I believe they should be able to use the bareos-vmware plugin transport parameter and switch to one of the other transports to avoid this.
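
As a hedged sketch, forcing a different transport would mean appending the transport option to the plugin string in the FileSet, e.g. (all other option values are placeholders; nbd/nbdssl are the usual non-HotAdd VDDK transport modes):

  Plugin = "python:module_path=/usr/lib64/bareos/plugins/vmware_plugin:module_name=bareos-fd-vmware:dc=mydc1:folder=/:vmname=testvm3tb:vcserver=vcenter.example.com:vcuser=bakadm@vsphere.local:vcpass=secret:transport=nbdssl"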

If I put this on the larger datasets and see it occur I will post here, but at the moment I'm feeling pretty confident, based on VMware's documentation, that I shouldn't hit this as I'm running 6.7.
lwidomski

2020-09-18 13:33

reporter   ~0004041

Hello,
vSphere 6.5
ESXi 6.5
Bareos 19.2.7

I tested restoring a virtual machine with a large (3TB) vmdk disk. The machine only restores correctly when restored to its original location. However, restoring to a local vmdk file (localvmdk=yes) does not work.
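
For context, this is roughly how such a restore is driven from bconsole per the manual's description of restoring to a local VMDK file (a hedged sketch; the JobId and paths are placeholders and the exact prompt wording may differ between versions):

  *restore jobid=471 where=/tmp/bareos-restores/ all done
  ...
  OK to run? (yes/mod/no): mod
  (choose the "Plugin Options" parameter, then enter the plugin options string)
  python:localvmdk=yes
  OK to run? (yes/mod/no): yes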

Restore JOB log:

bareos-dir Start Restore Job RestoreFiles.2020-09-04_15.02.29_48
 Connected Storage daemon at bareos:9103, encryption: TLS_CHACHA20_POLY1305_SHA256
 Using Device "VTL1-Drive1" to read.
 Connected Client: bareos-vmware-fd at 10.201.xx.xx:9102, encryption: TLS_CHACHA20_POLY1305_SHA256
  Handshake: Immediate TLS
  Encryption: TLS_CHACHA20_POLY1305_SHA256
bareos-sd Ready to read from volume "VT1004L3" on device "VTL1-Drive1" (/dev/tape/by-id/scsi-1IBM_ULT3580-TD3_0521027762-nst).
 Forward spacing Volume "VT1004L3" to file:block 55:0.
bareos-vmware-fd-fd Connected Storage daemon at bareos:9103, encryption: TLS_CHACHA20_POLY1305_SHA256
bareos-vmware-fd-fd Fatal error: python-fd: check_dumper(): bareos_vadp_dumper returncode: 1 error output:
Failed to create Logical Disk for /tmp/bareos-restores/[3PARb] w10test/w10test_1.vmdk, One of the parameters supplied is invalid [16000]
 
Error: filed/restore.cc:1296 Write error on /tmp/bareos-restores/VMS/FLT_Datacenter/TestEnvironment/w10test/[3PARb] w10test/w10test_1.vmdk: Przerwany potok (Broken pipe)
 Fatal error: python-fd: check_dumper(): bareos_vadp_dumper returncode: 1 error output:
Failed to create Logical Disk for /tmp/bareos-restores/[3PARb] w10test/w10test_1.vmdk, One of the parameters supplied is invalid [16000]
 Fatal error: python-fd: plugin_io[IO_CLOSE]: bareos_vadp_dumper returncode: 1
bareos-sd
Error: lib/bsock_tcp.cc:435 Wrote 40626 bytes to client:10.201.xx.xx:9103, but only 32768 accepted.
 Fatal error: stored/read.cc:159 Error sending to File daemon. ERR=Połączenie zerwane przez drugą stronę (Connection reset by peer)
 
Error: lib/bsock_tcp.cc:475 Socket has errors=1 on call to client:10.201.xx.xx:9103
 Releasing device "VTL1-Drive1" (/dev/tape/by-id/scsi-1IBM_ULT3580-TD3_0521027762-nst).
bareos-dir
Error: Bareos bareos-dir 19.2.7 (16Apr20):
  Build OS: Linux-3.10.0-1062.18.1.el7.x86_64 debian Debian GNU/Linux 10 (buster)
  JobId: 471
  Job: RestoreFiles.2020-09-04_15.02.29_48
  Restore Client: bareos-vmware-fd
  Start time: 04-wrz-2020 15:02:31
  End time: 04-wrz-2020 15:07:53
  Elapsed time: 5 mins 22 secs
  Files Expected: 4
  Files Restored: 2
  Bytes Restored: 29,709,370,403
  Rate: 92265.1 KB/s
  FD Errors: 2
  FD termination status: Fatal Error
  SD termination status: Fatal Error
  Bareos binary info: bareos.org build: Get official binaries and vendor support on bareos.com

Can anyone confirm this?
stephand

2021-06-10 17:34

developer   ~0004155

This is probably fixed by https://github.com/bareos/bareos/pull/826.
Please check if it works with packages from http://download.bareos.org/bareos/experimental/nightly/
Currently I can't verify with a disk > 2TB, as vSphere 7 doesn't let me overcommit more than the available datastore disk space.
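
A hedged sketch for pulling the nightly packages on Debian; the distribution subdirectory below is an assumption, so please verify the actual layout under the URL above (repository key import is omitted here):

  # assumed repository path, check download.bareos.org for the correct subdirectory
  echo 'deb http://download.bareos.org/bareos/experimental/nightly/Debian_10 /' \
    > /etc/apt/sources.list.d/bareos-nightly.list
  apt-get update
  apt-get install bareos-filedaemon bareos-vmware-plugin bareos-vadp-dumper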
derk@twistedbytes.eu

2021-07-15 14:08

reporter   ~0004177

I have something similar. Restores normally work, but with a plugin/pipe backup they do not.
What I see is that the SD wants to connect to port 9103 on the client:

XXXXXXXXXX-sd JobId 291630: Error: lib/bsock_tcp.cc:440 Wrote 49427 bytes to client:<IP6>:9103, but only 32768 accepted.
The previous reports show the same:
Error: lib/bsock_tcp.cc:435 Wrote 40626 bytes to client:10.201.xx.xx:9103, but only 32768 accepted.

That port is not in use on the client. So why is the SD connecting to that port?
derk@twistedbytes.eu

2021-07-15 15:56

reporter   ~0004178

Ignore me. I made an error in the restore shell script.
bruno-at-bareos

2023-12-13 13:32

manager   ~0005625

This was fixed.

Issue History

Date Modified Username Field Change
2016-06-25 16:30 zacha New Issue
2016-07-30 02:13 Carl Lau Note Added: 0002327
2016-07-30 02:14 Carl Lau Note Edited: 0002327
2016-11-28 09:04 zacha Note Added: 0002453
2019-01-16 18:12 stephand Assigned To => stephand
2019-01-16 18:12 stephand Status new => feedback
2019-01-16 18:12 stephand Note Added: 0003201
2019-01-16 22:03 zacha Note Added: 0003202
2019-01-16 22:03 zacha Status feedback => assigned
2019-05-12 09:02 cmlara Note Added: 0003364
2020-09-18 13:33 lwidomski Note Added: 0004041
2021-06-10 17:34 stephand Status assigned => feedback
2021-06-10 17:34 stephand Note Added: 0004155
2021-07-15 14:08 derk@twistedbytes.eu Note Added: 0004177
2021-07-15 15:56 derk@twistedbytes.eu Note Added: 0004178
2023-12-13 13:32 bruno-at-bareos Status feedback => closed
2023-12-13 13:32 bruno-at-bareos Resolution open => fixed
2023-12-13 13:32 bruno-at-bareos Note Added: 0005625