View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0000670 | bareos-core | vmware plugin | public | 2016-06-25 16:30 | 2023-12-13 13:32 |
Reporter | zacha | Assigned To | stephand | ||
Priority | high | Severity | crash | Reproducibility | always |
Status | closed | Resolution | fixed | ||
Platform | Linux | OS | Debian | OS Version | 8 |
Product Version | 15.4.0 | ||||
Summary | 0000670: backup of large vmdks corrupt | ||||
Description | bareos-vmware-plugin 16.2.2, bareos-vmware-vix-disklib5 5.5.4-2454786, bareos-vadp-dumper 16.2.2, ESXi 5.5. Backup of large vmdk files (>2TB nominal capacity) is corrupt. | ||||
Steps To Reproduce |
1. Create a virtual disk of size > 2TB; it can be thin provisioned. Write any data to the disk.
2. Create a second disk, e.g. of size 1TB, also thin provisioned, and copy exactly the same data to the smaller disk. It does not need to be much data, just a gigabyte or so.
3. Take a full backup of the virtual machine.
4. Try to restore the machine.

Results:
1. The size of the backup will be smaller than the actual disk. We realized that backing up a 4TB disk currently filled with 1.8TB of data resulted in a ~240GB backup, which is of course not possible - no compression is used in the backup job, we compress transparently in the FC target.
2. Trying to restore results in something like:

25-Jun 16:13 valerian-fd JobId 200273: Fatal error: python-fd: check_dumper(): bareos_vadp_dumper returncode: 1 error output: Warning: VixDiskLib: Invalid configuration file parameter. Failed to read configuration file. Failed to create Logical Disk for /tmp/bareos-restores/[ha-10k] augustus-old/augustus-old_2.vmdk, One of the parameters supplied is invalid [16000]
25-Jun 16:13 valerian-sd JobId 200273: Error: bsock_tcp.c:405 Write error sending 65536 bytes to client:127.0.0.1:9103: ERR=Connection reset by peer
25-Jun 16:13 valerian-fd JobId 200273: Error: restore.c:1239 Write error on /tmp/bareos-restores/VMS/onesty-tech-cb/support/augustus-old/[ha-10k] augustus-old/augustus-old_2.vmdk: Broken pipe
25-Jun 16:13 valerian-sd JobId 200273: Fatal error: read.c:154 Error sending to File daemon. ERR=Connection reset by peer
25-Jun 16:13 valerian-fd JobId 200273: Fatal error: python-fd: check_dumper(): bareos_vadp_dumper returncode: 1 error output: Warning: VixDiskLib: Invalid configuration file parameter. Failed to read configuration file. Failed to create Logical Disk for /tmp/bareos-restores/[ha-10k] augustus-old/augustus-old_2.vmdk, One of the parameters supplied is invalid [16000]

I assume that this might be related to a change in metadata, possibly with disks > 2TB? If you like, I can provide two example disks filled with exactly the same data; one will back up correctly, one will not. What is really ugly is that the backup completes without any error (backup OK). We only noticed this occasionally because we wondered why the backup was so much smaller than expected. (A scripted sketch of step 1 follows after the Tags field below.)
Tags | No tags attached. | ||||
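A minimal sketch of step 1 of the reproduction above, using pyVmomi. The script is not part of the original report; the vCenter host, credentials, and VM name are placeholders, and error handling and task waiting are omitted. It attaches a thin-provisioned 3 TB disk to an existing test VM:

```
# Minimal sketch, not from the original report: attach a thin-provisioned 3 TB
# disk to an existing test VM with pyVmomi. Host, credentials and VM name are
# placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

VCENTER = "vcenter.example.com"                            # placeholder
USER, PASSWORD = "administrator@vsphere.local", "secret"   # placeholders
VM_NAME = "testvm"                                         # placeholder
CAPACITY_KB = 3 * 1024 * 1024 * 1024                       # 3 TB expressed in KB

ctx = ssl._create_unverified_context()   # lab setup, no certificate validation
si = SmartConnect(host=VCENTER, user=USER, pwd=PASSWORD, sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    vm = next(v for v in view.view if v.name == VM_NAME)

    # Reuse the VM's existing SCSI controller and pick a free unit number
    # (unit 7 is reserved for the controller itself).
    controller = next(d for d in vm.config.hardware.device
                      if isinstance(d, vim.vm.device.VirtualSCSIController))
    used = {d.unitNumber for d in vm.config.hardware.device
            if getattr(d, "controllerKey", None) == controller.key}
    unit = next(u for u in range(16) if u != 7 and u not in used)

    disk = vim.vm.device.VirtualDisk(
        controllerKey=controller.key,
        unitNumber=unit,
        capacityInKB=CAPACITY_KB,
        backing=vim.vm.device.VirtualDisk.FlatVer2BackingInfo(
            diskMode="persistent", thinProvisioned=True))
    change = vim.vm.device.VirtualDeviceSpec(
        operation=vim.vm.device.VirtualDeviceSpec.Operation.add,
        fileOperation=vim.vm.device.VirtualDeviceSpec.FileOperation.create,
        device=disk)
    vm.ReconfigVM_Task(spec=vim.vm.ConfigSpec(deviceChange=[change]))
finally:
    Disconnect(si)
```

After copying the same test data onto this disk and a smaller comparison disk, the full backup and restore can be run as described in the steps above.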
Maybe the problem is related to this issue, which was resolved by the VDDK 6.0.2 release. Please refer to this link: https://www.vmware.com/support/developer/vddk/vddk-602-releasenotes.html "VDDK cannot HotAdd a > 2TB disk on VVol datastores. When trying to open a > 2TB virtual disk for writing on VVol datastores, the following error message appeared: “Failed to hot-add SCSI targets: Vmomi::MethodFault::Exception: vim.fault.GenericVmConfigFault.” The fix is actually not in VDDK but in ESXi hosts, which should be upgraded to vSphere 6.0 U2." |
|
I cannot see any connection between the two issues; at least it is not obvious to me. What led you to the assumption that they might be related? Just that both involve > 2TB vmdk files? We don't even use Virtual Volumes on vSphere, just plain old datastores. The > 2TB virtual disk works correctly; it is just that the backed-up files are corrupt. Is anyone able to back up vmdk files of > 2TB CONFIGURED size? Again, the disk does not have to contain 2TB of data to test this, it just has to have > 2TB configured, so this can easily be tested with ANY datastore. |
|
This bug entry is very old, and nobody else has added any comment. To be sure that this is a bug in the Bareos code, any information would be appreciated on whether this problem still exists with newer vSphere versions (6.0/6.5) and the current Bareos version 17.2, where the VMware Plugin uses VDDK 6.5. |
|
I never resolved this. It was easy to reproduce as shown above, but I no longer have any possibility to reproduce it because I am not working in that environment anymore, so I am not able to contribute. |
Tested on vSphere 6.7, ESXi 6.7u2 -- local storage only (single host):

- Created a new VM with a single disk 3TB in size and installed the OS.
- Reset CBT (needed because some of my testing messed with the CBT tracking, making it think the entire disk needed to be backed up; this shouldn't normally happen to others, and activating CBT per the manual should be fine).
- Backed up as per the instructions in the manual (the backup is about 1GB, so maybe this isn't enough to trigger the issue, but I think we're good based on reading the docs). Note: I ended up accidentally performing the backup while the machine was powered off, but since the backup works on a snapshot this shouldn't matter.
- Restored as per the instructions in the manual.
- Booted the VM and all looks good.

I'll be putting this into my environment soon enough to back up some larger VMs, though I'm not sure yet whether I am going to put it on the 2TB VMs or not (I'm starting to become inclined to do so, as this may make life much easier). This will however take a bit longer, as I'm still moving datasets around and the 2TB+ machines haven't been provisioned yet.

Per https://kb.vmware.com/s/article/2058287, HotAdd/Extend was not permitted past 2TB for ESXi below 6.5. In addition, per https://code.vmware.com/doc/vddkDataStruct.5.5.html (linked from the Bareos manual), the HotAdd limit for VMFS3 was also determined by block size (an 8MB block size allowed 2TB, 4MB allowed 1TB, etc.); this went away in VMFS5.

Developers probably already know this, but for a bit of history to make it easier to find: vSphere Storage APIs – Data Protection (formerly known as VADP), as noted on the vddkDataStruct page, will choose one of three methods to download the virtual disk image. When the bareos-fd is running inside a guest VM that has direct access to the same storage, the preferred method is HotAdd to the VM (the VMware API actually performs a guest config change on the system holding bareos-fd to map the existing VMDK to it as an additional attached disk), which for disks larger than 2TB requires ESXi 6.5 or above. If anyone cannot upgrade to 6.5 and still hits this issue, I believe they should be able to use the bareos-vmware plugin transport parameter and switch to one of the other transports to avoid this (a rough pre-flight check along these lines is sketched after this note).

If I put this on the larger datasets and see it occur I will post so, but at the moment I'm feeling pretty confident based on VMware's documentation that I shouldn't hit this, as I'm running 6.7. |
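As a rough illustration of the limits cited in this note (not part of the original comment), the sketch below uses pyVmomi to flag virtual disks larger than 2 TB on hosts older than ESXi 6.5, where HotAdd is not available per the KB article above. The function name, the simple version comparison, and the 2 TB threshold are assumptions for illustration; it expects a pyVmomi VirtualMachine object such as the one looked up in the earlier sketch.

```
# Rough pre-flight sketch (assumption, not Bareos code): flag disks over 2 TB
# on hosts older than ESXi 6.5, where HotAdd is not available according to
# https://kb.vmware.com/s/article/2058287.
from pyVmomi import vim

TWO_TB_KB = 2 * 1024 * 1024 * 1024   # 2 TB expressed in KB


def hotadd_warnings(vm):
    """Return human-readable warnings for disks of `vm` that exceed the
    HotAdd size limit on the ESXi host currently running the VM."""
    esxi_version = vm.runtime.host.summary.config.product.version  # e.g. "6.0.0"
    major, minor = (int(x) for x in esxi_version.split(".")[:2])
    warnings = []
    for dev in vm.config.hardware.device:
        if not isinstance(dev, vim.vm.device.VirtualDisk):
            continue
        if dev.capacityInKB > TWO_TB_KB and (major, minor) < (6, 5):
            warnings.append(
                "%s is %d KB (> 2 TB) but the host runs ESXi %s; "
                "HotAdd will not work, consider another transport "
                "for the VMware plugin" %
                (dev.deviceInfo.label, dev.capacityInKB, esxi_version))
    return warnings
```

Whether to fall back to another transport (via the plugin transport parameter mentioned in the note) is then a configuration decision; see the Bareos VMware plugin documentation for the accepted values.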
|
Hello,

vSphere 6.5, ESXi 6.5, Bareos 19.2.7

I tested restoring a virtual machine with a large vmdk (3TB) disk. The machine only restores correctly when restored to its original location. However, a restore with the option to write to a local vmdk file (localvmdk=yes) does not work.

Restore job log:

bareos-dir: Start Restore Job RestoreFiles.2020-09-04_15.02.29_48
  Connected Storage daemon at bareos:9103, encryption: TLS_CHACHA20_POLY1305_SHA256
  Using Device "VTL1-Drive1" to read.
  Connected Client: bareos-vmware-fd at 10.201.xx.xx:9102, encryption: TLS_CHACHA20_POLY1305_SHA256
  Handshake: Immediate TLS
  Encryption: TLS_CHACHA20_POLY1305_SHA256
bareos-sd: Ready to read from volume "VT1004L3" on device "VTL1-Drive1" (/dev/tape/by-id/scsi-1IBM_ULT3580-TD3_0521027762-nst).
  Forward spacing Volume "VT1004L3" to file:block 55:0.
bareos-vmware-fd-fd: Connected Storage daemon at bareos:9103, encryption: TLS_CHACHA20_POLY1305_SHA256
bareos-vmware-fd-fd: Fatal error: python-fd: check_dumper(): bareos_vadp_dumper returncode: 1 error output: Failed to create Logical Disk for /tmp/bareos-restores/[3PARb] w10test/w10test_1.vmdk, One of the parameters supplied is invalid [16000]
  Error: filed/restore.cc:1296 Write error on /tmp/bareos-restores/VMS/FLT_Datacenter/TestEnvironment/w10test/[3PARb] w10test/w10test_1.vmdk: Broken pipe
  Fatal error: python-fd: check_dumper(): bareos_vadp_dumper returncode: 1 error output: Failed to create Logical Disk for /tmp/bareos-restores/[3PARb] w10test/w10test_1.vmdk, One of the parameters supplied is invalid [16000]
  Fatal error: python-fd: plugin_io[IO_CLOSE]: bareos_vadp_dumper returncode: 1
bareos-sd: Error: lib/bsock_tcp.cc:435 Wrote 40626 bytes to client:10.201.xx.xx:9103, but only 32768 accepted.
  Fatal error: stored/read.cc:159 Error sending to File daemon. ERR=Connection reset by peer
  Error: lib/bsock_tcp.cc:475 Socket has errors=1 on call to client:10.201.xx.xx:9103
  Releasing device "VTL1-Drive1" (/dev/tape/by-id/scsi-1IBM_ULT3580-TD3_0521027762-nst).
bareos-dir: Error: Bareos bareos-dir 19.2.7 (16Apr20):
  Build OS: Linux-3.10.0-1062.18.1.el7.x86_64 debian Debian GNU/Linux 10 (buster)
  JobId: 471
  Job: RestoreFiles.2020-09-04_15.02.29_48
  Restore Client: bareos-vmware-fd
  Start time: 04-Sep-2020 15:02:31
  End time: 04-Sep-2020 15:07:53
  Elapsed time: 5 mins 22 secs
  Files Expected: 4
  Files Restored: 2
  Bytes Restored: 29,709,370,403
  Rate: 92265.1 KB/s
  FD Errors: 2
  FD termination status: Fatal Error
  SD termination status: Fatal Error
  Bareos binary info: bareos.org build: Get official binaries and vendor support on bareos.com

Can anyone confirm this? |
|
This is probably fixed with https://github.com/bareos/bareos/pull/826. Please check if it works with packages from http://download.bareos.org/bareos/experimental/nightly/
Currently I can't verify with a disk > 2TB, as vSphere 7 doesn't let me overcommit more than the available datastore disk space. |
|
I have something similar. Restores normally work, but with a plugin/pipe backup it does not. What I see is that the SD wants to connect to port 9103 on the client:

XXXXXXXXXX-sd JobId 291630: Error: lib/bsock_tcp.cc:440 Wrote 49427 bytes to client:<IP6>:9103, but only 32768 accepted.

The previous reports show the same:

Error: lib/bsock_tcp.cc:435 Wrote 40626 bytes to client:10.201.xx.xx:9103, but only 32768 accepted.

That port is not in use on the client. So why is the SD connecting to that port? |
|
Ignore me. I made an error in the restore shell script. | |
Was fixed. |
Date Modified | Username | Field | Change |
---|---|---|---|
2016-06-25 16:30 | zacha | New Issue | |
2016-07-30 02:13 | Carl Lau | Note Added: 0002327 | |
2016-07-30 02:14 | Carl Lau | Note Edited: 0002327 | |
2016-11-28 09:04 | zacha | Note Added: 0002453 | |
2019-01-16 18:12 | stephand | Assigned To | => stephand |
2019-01-16 18:12 | stephand | Status | new => feedback |
2019-01-16 18:12 | stephand | Note Added: 0003201 | |
2019-01-16 22:03 | zacha | Note Added: 0003202 | |
2019-01-16 22:03 | zacha | Status | feedback => assigned |
2019-05-12 09:02 | cmlara | Note Added: 0003364 | |
2020-09-18 13:33 | lwidomski | Note Added: 0004041 | |
2021-06-10 17:34 | stephand | Status | assigned => feedback |
2021-06-10 17:34 | stephand | Note Added: 0004155 | |
2021-07-15 14:08 | derk@twistedbytes.eu | Note Added: 0004177 | |
2021-07-15 15:56 | derk@twistedbytes.eu | Note Added: 0004178 | |
2023-12-13 13:32 | bruno-at-bareos | Status | feedback => closed |
2023-12-13 13:32 | bruno-at-bareos | Resolution | open => fixed |
2023-12-13 13:32 | bruno-at-bareos | Note Added: 0005625 |