View Issue Details

ID: 0001180
Project: bareos-core
Category: storage daemon
View Status: public
Last Update: 2020-02-11 17:19
Reporter: antiduh
Assigned To: arogge
Priority: immediate
Severity: block
Reproducibility: always
Status: closed
Resolution: fixed
Platform: x86-64
OS: FreeBSD
OS Version: 12.1
Product Version: 19.2.5
Fixed in Version: 19.2.6
Summary: 0001180: CRC checksum algorithm changed between 18.2.7 and 19.2.5, all volumes fail
Description: I tried upgrading from 18.2.7 to 19.2.5 today.

After upgrading, every time stored tries to load a volume, it fails with a CRC checksum error while reading the first block of the volume:

06-Feb 16:37 HomeFile JobId 8110: Error: stored/block.cc:350 Volume data error at 0:0!
Block checksum mismatch in block=0 len=192: calc=f625ccaf blk=64bbb13a
06-Feb 16:37 HomeFile JobId 8110: Warning: Volume "Incr-0012" not on device "FileStorage" (/space/bareos).
06-Feb 16:37 HomeFile JobId 8110: Marking Volume "Incr-0012" in Error in Catalog.

06-Feb 16:43 HomeFile JobId 8111: Error: stored/block.cc:350 Volume data error at 0:0!
Block checksum mismatch in block=0 len=208: calc=8dc2aa1c blk=466e7c3c
06-Feb 16:37 angst-fd JobId 8111: ACL support is enabled
06-Feb 16:43 HomeFile JobId 8111: Warning: Volume "Full-0005" not on device "FileStorage" (/space/bareos).
06-Feb 16:43 HomeFile JobId 8111: Marking Volume "Full-0005" in Error in Catalog.

I suspect the files are fine:

- The files are on a raidz1 3-disk ZFS pool, and ZFS reports no sha checksum errors.
- The volumes were working just fine this morning before the update.
- The drives are in good health and there are no IO errors reported by the OS.

I noticed that 6 months ago, you completely replaced the CRC implementation used:
https://github.com/bareos/bareos/commit/838aef14bddc69241221900ba962690ac9e26203#diff-5d8d24f7d7fbd164fc11057003dd30c5

Any chance you accidentally installed a CRC implementation that gives different answers than the historical implementation? I would expect that you tested this or that someone else would've run into this, but it's a suspicious coincidence.

For what it's worth, Bareos is able to plow ahead and save the backup to a fresh volume after recycling.

I'm running on FreeBSD 12.1 on both the client and server machine.
     FreeBSD masheen.redacted.com 12.1-STABLE FreeBSD 12.1-STABLE r357422 GENERIC amd64

Downgrading back down to 18.2.5 fixes everything aside from the volumes that are now tainted by the 19.2.5 CRC.
Steps To Reproduce:
1) Install 18.2.7
2) Create a fresh instance using file volumes
3) Back up some files
4) Install 19.2.5
5) Attempt to restore, observe failure
Tags: No tags attached.

Relationships

related to 0001177 (closed, arogge): Release Bareos 19.2.6

Activities

arogge

2020-02-07 07:50

manager   ~0003740

Hi,

thanks for writing a report.
The correctness of the CRC is tested automatically. However, such a test may or may not be 100% accurate. As far as I can tell the old and the new algorithm both yield the exact same results.
Nevertheless I'll try to reproduce your issue.

Do you see the same issue with volumes created on 19 when read in 18?
arogge

2020-02-07 08:00

manager   ~0003741

For what it's worth: if you run something like ZFS and don't require the checksumming, you can disable it with "Block Checksum" in the SD's device configuration. (I'm not encouraging you to do so and it will be hard to go back, but you can work around the problem with this).
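
For readers unfamiliar with the directive mentioned in the note above, the following is a minimal sketch of where it would go in the storage daemon's device configuration. The device name and archive path are taken from the log lines in the description; the `Media Type` value is an assumption for illustration, and a real Device resource will usually carry further settings. As the note says, this only works around the problem and is hard to reverse.

```
# Sketch of a bareos-sd Device resource with block checksumming disabled.
# "Media Type = File" is an assumed value, not taken from the report.
Device {
  Name = FileStorage
  Media Type = File
  Archive Device = /space/bareos
  Block Checksum = no
}
```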
arogge

2020-02-07 08:13

manager   ~0003742

I can use 18.2's "bls" on a volume written with 19.2, so at least it doesn't seem to be an obvious problem.
arogge

2020-02-07 08:20

manager   ~0003743

Is it possible to dump the first few blocks of your volumes using dd and attach these, so I have some test-data to work with? I know that this might contain some sensitive data, so please make sure it doesn't. However, it would really help me.
If you cannot provide this, maybe you could run a test yourself by trying to read the volumes written with 18.2/19.2 on FreeBSD with a bls/bscan/bextract on Linux (and 18.2/19.2).
arogge

2020-02-07 09:15

manager   ~0003745

I can reproduce this on FreeBSD.
However, it works on Linux and when I copy the volume from FreeBSD to a Linux machine it can be read.
So it looks like the block checksumming is broken on FreeBSD.
arogge

2020-02-07 11:29

manager   ~0003746

Looks like the endianness is not detected correctly on FreeBSD.
arogge

2020-02-07 17:05

manager   ~0003750

I have created a PR that will fix the problem on GitHub: https://github.com/bareos/bareos/pull/412
Testing packages are building right now and should show up at https://download.bareos.org/bareos/experimental/CD/PR-412/ later today.
I would be glad if you could check that this change actually fixes your problem. I have tested the change, but I also did the testing for the original change and it looked good to all of us.
antiduh

2020-02-07 23:00

reporter   ~0003753

That's great news, I'll do a test sometime in the next few hours and let you know.

Thank you so much for the quick turnaround time on this!
antiduh

2020-02-09 18:46

reporter   ~0003755

Took me a while to get the patch tested, but it looks like it's working just fine now. Thanks!
arogge

2020-02-10 10:39

manager   ~0003757

The fix has been merged into the master branch and will be backported to 19.2. The next release, 19.2.6, will contain the fix.
arogge

2020-02-10 11:22

manager   ~0003760

Fix committed to bareos bareos-19.2 branch with changesetid 12810.
arogge

2020-02-11 17:19

manager   ~0003786

Fixed in Bareos 19.2.6

Related Changesets

bareos: master 4e482a27

2020-02-07 11:30

arogge

Ported: N/A

tests: test crc32 with a real label block

Bug 0001180: CRC checksum algorithm changed between 18.2.7 and 19.2.5

Previously the crc32 tests did only rudimentary changes, but did not
check with a real bareos block. This patch now adds a dumped label block
from a test-installation and calculates the checksum for that.
This patch also changes the pattern for another test, so it triggers on
an endianness problem too.
Affected Issues
0001180
mod - core/src/tests/test_crc32.cc
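
The point about test patterns in the changeset above can be illustrated with a small sketch. It is not the code from test_crc32.cc. It uses a bit-by-bit CRC-32 with the common reflected polynomial 0xEDB88320 (whether these are exactly the parameters Bareos uses is an assumption here), plus a hypothetical helper, `SwapBytesInWords`, that serves as a simplified stand-in for a word-at-a-time implementation reading its input with the wrong byte order. A pattern that looks the same after byte reversal cannot expose such a mistake, while an asymmetric pattern can.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Bit-by-bit CRC-32 (reflected polynomial 0xEDB88320, initial value and
// final XOR 0xFFFFFFFF). It only ever handles single bytes, so the host
// byte order cannot influence the result.
std::uint32_t Crc32Bytewise(const std::vector<unsigned char>& data) {
  std::uint32_t crc = 0xFFFFFFFFu;
  for (unsigned char byte : data) {
    crc ^= byte;
    for (int bit = 0; bit < 8; ++bit) {
      crc = (crc & 1u) ? ((crc >> 1) ^ 0xEDB88320u) : (crc >> 1);
    }
  }
  return crc ^ 0xFFFFFFFFu;
}

// Hypothetical helper: reverses the bytes inside every complete 4-byte
// group. This is a simplified model of input being read word by word with
// the wrong byte order, not a reproduction of the actual bug.
std::vector<unsigned char> SwapBytesInWords(std::vector<unsigned char> data) {
  for (std::size_t i = 0; i + 4 <= data.size(); i += 4) {
    std::swap(data[i], data[i + 3]);
    std::swap(data[i + 1], data[i + 2]);
  }
  return data;
}
```

With these definitions, the uniform pattern `AA AA AA AA` yields the same checksum before and after `SwapBytesInWords`, so a test built on it passes even if the byte order is wrong. The pattern `01 02 03 04` yields two different checksums, so it would catch the problem. The ASCII string "123456789" gives the well-known check value 0xCBF43926, which is a convenient sanity check for the reference function itself.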

bareos: bareos-19.2 bf4250b8

2020-02-07 11:30

arogge

Ported: N/A

tests: test crc32 with a real label block

Bug 0001180: CRC checksum algorithm changed between 18.2.7 and 19.2.5

Previously the crc32 tests did only rudimentary changes, but did not
check with a real bareos block. This patch now adds a dumped label block
from a test-installation and calculates the checksum for that.
This patch also changes the pattern for another test, so it triggers on
an endianness problem too.

(cherry picked from commit 4e482a27661ae6221e077811b714e6cb985fdb5e)
Affected Issues
0001180
mod - core/src/tests/test_crc32.cc

bareos: master ee0b908a

2020-02-07 17:58

arogge

Ported: N/A

stored: use correct algorithm on FreeBSD

Fixes 0001180: CRC checksum algorithm changed between 18.2.7 and 19.2.5

Previously crc32.cc did not detect when it couldn't
find out what endianness the machine was. This is now
fixed so that
1. FreeBSD detects endianness correctly
2. the compile fails when there is no __BYTE_ORDER
Affected Issues
0001180
mod - core/src/stored/crc32/crc32.cc
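
The failure mode described in the changeset above (a missing byte-order macro going unnoticed) can be sketched as follows. This is not the code from crc32.cc. The `DEMO_*` macro names are hypothetical and chosen only for this illustration. It relies on one standard preprocessor rule: inside `#if`, an identifier that is not defined as a macro is replaced by 0, so a comparison against it quietly evaluates to false instead of producing an error.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical macro name, used for this sketch only.
#define DEMO_LITTLE_ENDIAN 1234
// DEMO_BYTE_ORDER is deliberately NOT defined. It stands in for a platform
// whose headers do not provide the macro the code expects.

// Pitfall: the undefined DEMO_BYTE_ORDER becomes 0, so this is "0 == 1234".
// The test is false and the #else branch is chosen without any diagnostic.
#if DEMO_BYTE_ORDER == DEMO_LITTLE_ENDIAN
constexpr bool kNaiveCheckSaysLittleEndian = true;
#else
constexpr bool kNaiveCheckSaysLittleEndian = false;  // chosen silently
#endif

// Guarded variant: first check that the macro exists at all. Real code
// would put
//   #error "cannot determine byte order"
// in the #else branch; a flag is used here so the sketch still compiles.
#if defined(DEMO_BYTE_ORDER)
constexpr bool kByteOrderKnown = true;
#else
constexpr bool kByteOrderKnown = false;
#endif

// Runtime probe, independent of any macro: inspect the first byte of the
// in-memory representation of the 32-bit value 1.
inline bool HostIsLittleEndian() {
  const std::uint32_t probe = 1;
  unsigned char first_byte = 0;
  std::memcpy(&first_byte, &probe, 1);
  return first_byte == 1;
}
```

On a little-endian machine such as the reporter's amd64 host, `HostIsLittleEndian()` returns true while `kNaiveCheckSaysLittleEndian` is false, which is the kind of silent disagreement a compile-time `#error` prevents. Building with a warning such as GCC's or Clang's `-Wundef` also flags the unguarded comparison.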

bareos: bareos-19.2 ace6c834

2020-02-07 17:58

arogge

Ported: N/A

stored: use correct algorithm on FreeBSD

Fixes 0001180: CRC checksum algorithm changed between 18.2.7 and 19.2.5

Previously crc32.cc did not detect when it couldn't
find out what endianness the machine was. This is now
fixed so that
1. FreeBSD detects endianness correctly
2. the compile fails when there is no __BYTE_ORDER

(cherry picked from commit ee0b908a19cd740379e483da01d3054c9ccfd0a9)
Affected Issues
0001180
mod - core/src/stored/crc32/crc32.cc

Issue History

Date Modified Username Field Change
2020-02-07 00:24 antiduh New Issue
2020-02-07 07:50 arogge Note Added: 0003740
2020-02-07 08:00 arogge Note Added: 0003741
2020-02-07 08:13 arogge Note Added: 0003742
2020-02-07 08:20 arogge Assigned To => arogge
2020-02-07 08:20 arogge Status new => feedback
2020-02-07 08:20 arogge Note Added: 0003743
2020-02-07 09:15 arogge Status feedback => confirmed
2020-02-07 09:15 arogge Note Added: 0003745
2020-02-07 11:29 arogge Note Added: 0003746
2020-02-07 17:05 arogge Note Added: 0003750
2020-02-07 23:00 antiduh Note Added: 0003753
2020-02-09 18:46 antiduh Note Added: 0003755
2020-02-10 10:37 arogge Relationship added related to 0001177
2020-02-10 10:39 arogge Status confirmed => resolved
2020-02-10 10:39 arogge Resolution open => fixed
2020-02-10 10:39 arogge Fixed in Version => 19.2.6
2020-02-10 10:39 arogge Note Added: 0003757
2020-02-10 11:22 arogge Changeset attached => bareos master ee0b908a
2020-02-10 11:22 arogge Changeset attached => bareos master 4e482a27
2020-02-10 11:22 arogge Changeset attached => bareos bareos-19.2 ace6c834
2020-02-10 11:22 arogge Changeset attached => bareos bareos-19.2 bf4250b8
2020-02-10 11:22 arogge Note Added: 0003760
2020-02-11 17:19 arogge Status resolved => closed
2020-02-11 17:19 arogge Note Added: 0003786