View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0000435 | bareos-core | storage daemon | public | 2015-03-10 13:41 | 2015-07-10 14:09 |
Reporter | rOger | Assigned To | |||
Priority | normal | Severity | major | Reproducibility | always |
Status | closed | Resolution | no change required | ||
Platform | Linux | OS | Ubuntu | OS Version | 12.04 |
Product Version | 14.2.2 | ||||
Summary | 0000435: bareos-sd with scsicrypto plugin and LTO6 drive will not reset drive properly after encryption key was used | ||||
Description | We are using a HP Ultrium-6 drive with LTO5 tapes and scsicrypto plugin enabled. After a successful backup job on a volume A another job on volume B will fail with Read/Write Error. The best work around we found is to power cycle the drive. After that the first backup run works like expected - the second one will fail again. | ||||
Steps To Reproduce | Basically it is easy; every time the backup starts the access to the drive will fail with a read/write error, see example logs of bareos in "Additional Information". Also if you trigger the backup job manually it will fail with the same error message. If you first do a power cycle of the drive (and of course with the same tape as before) the job will run without issues. As a remark: we want to have more than one backup on a volume (tape) so we defined the corresponding pool like: Pool { Name = WeeklyData Pool Type = Backup AutoPrune = Yes VolumeRetention = 2 weeks Maximum Volumes = 5 Maximum Volume Jobs = 3 Recycle = Yes Storage = TapeLTO5 Label Format = "WeeklyData-LTO5-" } | ||||
Additional Information | Environment: * Ubuntu 12.04, latest patches, no HWE (means still kernel 3.2.0-76-generic #111-Ubuntu SMP) * Tape Drive ** HP StoreEver Ultrium-6 6250 External SAS (HP ProductID: EH970A) ** Firmware 353D/Standalone (latest one) * The tests mentioned in the bareos manual where all successful Log lines of bareos: 01-Mar 23:30 fs0-sd JobId 45: Recycled volume "WeeklyData-LTO5-0001" on device "TapeStorageLTO5" (/dev/nst0), all previous data lost. 01-Mar 23:30 fs0-sd JobId 45: Error: block.c:631 Write error at 0:1 on device "TapeStorageLTO5" (/dev/nst0). ERR=Device or resource busy. 01-Mar 23:30 fs0-sd JobId 45: Error: block.c:645 Write error on fd=6 at file:blk 0:1 on device "TapeStorageLTO5" (/dev/nst0). ERR=Device or resource busy. 01-Mar 23:31 fs0-sd JobId 45: Error: block.c:1004 Read error on fd=6 at file:blk 0:4294967295 on device "TapeStorageLTO5" (/dev/nst0). ERR=Device or resource busy. 01-Mar 23:31 fs0-sd JobId 45: Error: Re-read last block at EOT failed. ERR=block.c:1004 Read error on fd=6 at file:blk 0:4294967295 on device "TapeStorageLTO5" (/dev/nst0). 01-Mar 23:31 fs0-sd JobId 45: End of medium on Volume "WeeklyData-LTO5-0001" Bytes=64,512 Blocks=0 at 01-Mar-2015 23:31. 01-Mar 23:31 fs0-sd JobId 45: Please mount append Volume "WeeklyData-LTO5-0003" or label a new one for: More information or configuration snippets are available upon request. | ||||
Tags | No tags attached. | ||||
Think for starters we need a debug output level 200 of the storage daemon to see what the plugin is doing and what not and why. e.g. bareos-sd -f -d 200 > logfile 2>&1 run as user bareos via su - bareos -s /bin/bash And probably the exact config of the tape device in the bareos-sd.conf |
|
As requested I attached the desired files (for security reasons we removed keys). | |
Ok log shows that the plugin thinks it doesn't need to clear the encryption key because no encryption is enabled on the drive. Given you have the following in your config: Query Crypto Status = yes; It means that the plugin queries the actual encryption status of the drive in determining ic crypto is enabled. You could change Query Crypto Status = yes; to Query Crypto Status = no; then it doesn't query the drive but in our testing that always worked ok. I would be interested in the output of status storage=<storage_name> when the volume is loaded as that prints the encryption status of the drive and uses the same low level SCSI page so we can see what the drive encryption status is. |
|
20150317-bareos-status-storage.txt (2,917 bytes)
*status storage=TapeLTO5 Connecting to Storage daemon TapeLTO5 at fs0-sd.terreactive.ch:9103 fs0-sd Version: 14.2.2 (12 December 2014) x86_64-pc-linux-gnu ubuntu Ubuntu 12.04 LTS Daemon started 11-Mar-15 11:47. Jobs: run=9, running=0. Heap: heap=270,336 smbytes=197,003 max_bytes=2,291,483 bufs=155 max_bufs=185 Sizes: boffset_t=8 size_t=8 int32_t=4 int64_t=8 mode=0 bwlimit=0kB/s Plugin Info: Plugin : scsicrypto-sd.so Description: SCSI Encryption Storage Daemon Plugin Version : 1, Date: March 2012 Author : Marco van Wieringen License : Bareos AGPLv3 Usage : (No usage yet) Running Jobs: No Jobs running. ==== Jobs waiting to reserve a drive: ==== Terminated Jobs: JobId Level Files Bytes Status Finished Name =================================================================== 62 Incr 391 121.8 M OK 10-Mar-15 23:16 DataBackup 63 Incr 367 1.257 G OK 11-Mar-15 23:16 DataBackup 64 Incr 463 63.22 M OK 12-Mar-15 23:16 DataBackup 65 Incr 324 46.41 M OK 13-Mar-15 23:16 DataBackup 66 Incr 4,958 1.709 G OK 14-Mar-15 23:16 DataBackup 67 Incr 238 18.92 M OK 15-Mar-15 23:16 DataBackup 68 Full 17 1.019 M Cancel 16-Mar-15 08:21 DataBackup 69 Full 17 1.019 M Cancel 16-Mar-15 08:45 DataBackup 70 Full 17 1.019 M Cancel 16-Mar-15 08:49 DataBackup 71 Incr 237 18.14 M OK 16-Mar-15 23:16 DataBackup ==== Device status: Device "FileStorage" (/bareos) is not open. Configured device capabilities: EOF BSR BSF FSR FSF EOM !REM RACCESS AUTOMOUNT LABEL !ANONVOLS ALWAYSOPEN Device state: !OPENED !TAPE !LABEL !MALLOC !APPEND !READ !EOT !WEOT !EOF !NEXTVOL !SHORT !MOUNTED num_writers=0 reserves=0 block=0 Attached Jobs: Device parameters: Archive name: /bareos Device name: FileStorage File=0 block=0 Min block=64512 Max block=64512 == Device "TapeStorageLTO5" (/dev/nst0) is not open. Configured device capabilities: EOF BSR BSF FSR FSF EOM REM !RACCESS AUTOMOUNT !LABEL !ANONVOLS !ALWAYSOPEN Device state: !OPENED TAPE !LABEL !MALLOC !APPEND !READ !EOT !WEOT !EOF !NEXTVOL !SHORT !MOUNTED num_writers=0 reserves=0 block=0 Attached Jobs: Device parameters: Archive name: /dev/nst0 Device name: TapeStorageLTO5 File=0 block=0 Min block=64512 Max block=64512 == Device "TapeStorageLTO6" (/dev/nst0) is not open. Configured device capabilities: EOF BSR BSF FSR FSF EOM REM !RACCESS AUTOMOUNT !LABEL !ANONVOLS !ALWAYSOPEN Device state: !OPENED TAPE !LABEL !MALLOC !APPEND !READ !EOT !WEOT !EOF !NEXTVOL !SHORT !MOUNTED num_writers=0 reserves=0 block=0 Attached Jobs: Device parameters: Archive name: /dev/nst0 Device name: TapeStorageLTO6 File=0 block=0 Min block=0 Max block=1048576 == ==== Used Volume status: ==== ==== |
|
I attached the output of status storage=TapeLTO5. The tape was still in the drive. I'm not sure what you want to see. But due the fact we set AlwaysOpen=no I assume that you do not see what you expected. I tried thereafter a "mount" and then the output looks basically like the one already attached except the end with the info about Used Volume status: ---- 8< -------- 8< -------- 8< -------- 8< -------- 8< ---- Used Volume status: WeeklyData-LTO5-0001 on device "TapeStorageLTO5" (/dev/nst0) Reader=0 writers=0 reserves=0 volinuse=0 ==== ==== ---- 8< -------- 8< -------- 8< -------- 8< -------- 8< ---- Even the change of the mentioned flag (Query Crypto Status to "no") does not mean that the issue is solved, right? |
|
I was somewhat lured into thinking that the plugin didn't want to clear the key but it seems there are multiple backups in the debug log so the real info somehow got lost. Its seems this is what happens: fs0-sd: scsicrypto-sd.c:401-0 scsicrypto-sd: Loading new crypto key <key> ... fs0-sd: block.c:591-0 ===== write retry=1 status=-1 errno=16: ERR=Device or resource busy fs0-sd: block.c:591-0 ===== write retry=2 status=-1 errno=16: ERR=Device or resource busy fs0-sd: block.c:591-0 ===== write retry=3 status=-1 errno=16: ERR=Device or resource busy ... fs0-sd: scsicrypto-sd.c:457-0 scsicrypto-sd: Clearing crypto key So the plugin is setting the key and clearing it but after setting it the drive seems to go into some weird state where it doesn't want things to be written to it. Normally that should not happen even when you changed keys over time it will just mean you cannot read older data on the tape because of changing the key. But in this case it puts the drive into some weird mode. Maybe it makes more sense to test the encryption of the drive outside of the storage daemon using bscrypto (which essentially uses the same low level routines) on a test tape. e.g. disable the bareos-sd and insert a test tape into the drive. # bscrypto -s - /dev/sg? Type Some key^D # bscrypto -e /dev/sg? # tar cvf /dev/nst0 /etc # bscrypto -c /dev/sg? First sets a key, second gets the drive encryption status. Then we write some data using tar last clears the key again. That should at least show how the drive reacts to the crypto operations. |
|
I tried the thing you suggested: # bscrypto -s - /dev/sg5 Type Some key^D # bscrypto -e /dev/sg5 Drive encryption status: Encryption Mode: Encrypt Decryption Mode: Mixed Raw Decryption Mode Disabled (RDMD): Enabled Volume Contains Encrypted Logical Blocks (VCELB): Enabled Logical Block encryption parameters: Application Managed Key Associated Data (KAD) Descriptor: Normal key # tar cvf /dev/nst0 /etc # bscrypto -c /dev/sg5 After this reading from the tape was not possible -> OK because key was missing. Then I setup the key again and tried to reread the tape. Worked -> OK I cleared the key and ejected the tape and mounted it again. After setting up the key again I was able to read the backup from tape. -> OK So in general it seems to work. Could it be that the positioning operation on the tape could produce the error? Background: we want to make three backups on the same tape. I got a similar error in the case I forgot to rewind the tape to the beginning when I tried to do a restore/test with tar. |
|
I just uploaded a new log file of the storage daemon. From the logs it looks for me that bareos is not able to append to the tape if there is already a backup on it. As you can see in the logs it mounts the tape with the encryption key. Then it tries to write the "session_label". I do not understand the logs; it mentions that the writing of the label seems to be ok ("Leave" log line) but then the following block writing operation fails: This operation fails according to the logs with the lines "retry=[1..3] status=-1 errno=16: ERR=Device or resource busy". After some tries it give up on this tape and want a new one (volume). I'm not sure about this block write error if it belongs to the labeling or already data will be written. Is there a way to confirm this issue somehow? Could we add some debug code to get more debug information? |
|
The new log shows exactly the same as before. After setting the crypto key the drive gets inaccessible for any further write. Some things that are strange in your sd config: - Two storage entries accessing the same physical drive (May work but not really within the design envelope) Maybe change the media type to LTO (instead of LTO4/LTO5) - Always open to no on tape No idea why you set this but for tapes that is most of the time not the best option. e.g. is keeps also rewinding tapes this way to re-read the tape label. B.T.W. the tape label gets reread but it doesn't get updated after its written once to the tape. We also made the actual tape label unencrypted as then we can determine a encrypted tape vs a empty tape without a label. Have you tried using the default blocksize on the tape instead of this big blocksize ? Any messages in the unix messages given that the device gives an EBUSY I wonder if the ST driver logs something. |
|
* Two storage entries: As I understood this is to specify which media the drive supports. Unfortunatly a LTO6 drive does not automatically support all older LTO tapes; as an example LTO4 is not supported on such drives. I'm not aware that different media types could be specifed in "Media Type" so this was for me the "cleanest" solution. To be sure that this config section does not interferer with "our" problem I changed the device to /dev/nst1 * Always Open: We set this option to NO because we anyway eject the tape after a backup operation to bring it to a different location (keyword BCM). So when we set this option to YES we just have to unmount it after a backup job by hand - anyway as soon as the tape gets ejected it was rewound before. So in my opinion I do not see any benefit of setting this value to YES. Yes, we tried also with the default block size with the same error. Unfortunatly the ST driver does not log anything - is there a special option to force the ST driver to log something? |
|
You could try if setting always open to yes and release (e.g. not unmount) the device after the backup to see if that changes anything. Given the exact same setting using bscrypto work it seems the drive is somewhat sick as it locks itself with device busy after loading a key. Never seen that on any piece of other hardware so I'm out of ideas. Given the ST driver also doesn't log anything only option left if trying with always open = yes. If that also doesn't bring anything then I guess we are out of options for a non subscription bug report. |
|
As it turned out the problem was the setting Maximum Blocksize = 1048576; We removed it so it reverted to default and as it looks now it works like expected (as mentioned before). In my opinion the documentation should be adjusted because we followed the section to optimize the tape speed which suggests to adjust this parameter according to the defined procedure in the manual. Another side note: You must be careful with mounting of the tape; in our experience the encryption was only active if we mounted the tape manual before a scheduled task wrote data on it. From my point of view this bug can be closed. |
|
Given that each vendor implements its hardware different it can happen that it doesn't support hardware encryption with bigger blocksizes that was also one of the reasons that I advised earlier to test with the default blocksize. As to mounting the tape every trace I saw until now showed things worked fine as to loading the key and the right key was cleared or loaded when needed. The key gets loaded when the tape label is verified by an event that is then triggered and that is true for any invocation of a backup e.g. mounted volume or not. I will close the ticket and ask the original author of the tape tuning document to add a section on the EBUSY error the Linux ST driver seems to issue when it has buffering problems with bigger blocksizes. |
|
Date Modified | Username | Field | Change |
---|---|---|---|
2015-03-10 13:41 | rOger | New Issue | |
2015-03-10 16:29 | mvwieringen | Note Added: 0001305 | |
2015-03-10 16:30 | mvwieringen | Status | new => feedback |
2015-03-16 10:10 | rOger | File Added: bareos-sd.conf | |
2015-03-16 10:10 | rOger | File Added: 20150316-bareos-sd_NOT_WORKING.log | |
2015-03-16 10:11 | rOger | Note Added: 0001317 | |
2015-03-16 10:11 | rOger | Status | feedback => new |
2015-03-16 18:08 | mvwieringen | Note Added: 0001318 | |
2015-03-16 18:08 | mvwieringen | Status | new => feedback |
2015-03-17 08:20 | rOger | File Added: 20150317-bareos-status-storage.txt | |
2015-03-17 08:28 | rOger | Note Added: 0001319 | |
2015-03-17 08:28 | rOger | Status | feedback => new |
2015-03-17 14:43 | mvwieringen | Note Added: 0001320 | |
2015-03-17 14:43 | mvwieringen | Status | new => feedback |
2015-03-17 16:23 | rOger | Note Added: 0001322 | |
2015-03-17 16:23 | rOger | Status | feedback => new |
2015-03-23 10:18 | rOger | File Added: 20150323_bareos-sd.log | |
2015-03-23 10:59 | rOger | Note Added: 0001333 | |
2015-03-24 11:21 | mvwieringen | Note Added: 0001336 | |
2015-03-24 11:21 | mvwieringen | Status | new => feedback |
2015-03-26 16:43 | rOger | Note Added: 0001653 | |
2015-03-26 16:43 | rOger | Status | feedback => new |
2015-03-26 17:05 | mvwieringen | Note Added: 0001654 | |
2015-03-26 17:05 | mvwieringen | Status | new => feedback |
2015-07-01 08:53 | rOger | Note Added: 0001779 | |
2015-07-01 08:53 | rOger | Status | feedback => new |
2015-07-10 14:07 | mvwieringen | Note Added: 0001787 | |
2015-07-10 14:09 | mvwieringen | Status | new => closed |
2015-07-10 14:09 | mvwieringen | Resolution | open => no change required |