View Issue Details

ID: 0000200
Project: bareos-core
Category: storage daemon
View Status: public
Last Update: 2015-03-25 19:19
Reporter: zacha
Assigned To: (none)
Priority: normal
Severity: minor
Reproducibility: always
Status: closed
Resolution: fixed
Platform: Linux
OS: Debian
OS Version: 6
Product Version: 13.2.0
Fixed in Version: 13.3.0
Summary: 0000200: Storage Daemon Tape Alert Useless / Defective
Description: Although there is no documentation on how to use the tape alert directive (Alert Command) in the SD, apart from the comments in the sample file, it seems to be defective, or at least useless. In the sample config the command is

smartctl -H -l error %c

where %c is the changer device, not the tape drive. This means it can only be used with a library, if at all, because I doubt that most libraries report SMART data for their drives.

If you change the command line to

smartctl -H -l error %a

the tape drive is then queried, but no data can be read from it because it is blocked by bacula at that time:

04-Jul 13:05 hbnas-sd JobId 11: Alert: Smartctl open device: /dev/nst0 failed: Device or resource busy
04-Jul 13:05 hbnas-sd JobId 11: 3997 Bad alert command: sh -c 'smartctl -H -l error /dev/nst0': ERR=Child exited with code 2.

If you change the tape drive's config so that the device is not always open, it still does not work:

04-Jul 13:10 hbnas-sd JobId 15: Alert: Smartctl open device: /dev/nst0 failed: INQUIRY failed
04-Jul 13:10 hbnas-sd JobId 15: 3997 Bad alert command: sh -c 'smartctl -H -l error /dev/nst0': ERR=Child exited with code 2.

While the bacula SD is stopped, the command runs without problems. And if the drive is configured not to be "always open", you can even run it while the SD is running and idle. It seems that the command is, at the very least, issued at the wrong time: the drive should be queried at the beginning of the job, as soon as a tape has been mounted (or if one is already mounted).
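Both failures above end with "Child exited with code 2". The smartctl(8) man page documents the exit status as a bit mask, and bit 1 (value 2) means the device could not be opened, which matches the "Device or resource busy" and "INQUIRY failed" messages. A small Python sketch that decodes that bit mask; the descriptions are paraphrased from the man page and the function name is hypothetical.

```python
# Bit meanings paraphrased from the RETURN VALUES section of smartctl(8).
SMARTCTL_EXIT_BITS = {
    0: "command line did not parse",
    1: "device open failed (or device did not identify itself)",
    2: "a SMART or other device command failed, or a checksum error",
    3: "SMART status check returned DISK FAILING",
    4: "prefail attributes at or below threshold",
    5: "some attributes were at or below threshold in the past",
    6: "device error log contains records of errors",
    7: "device self-test log contains records of errors",
}


def decode_smartctl_exit(status: int) -> list[str]:
    """Return the description of every bit set in a smartctl exit status."""
    return [text for bit, text in sorted(SMARTCTL_EXIT_BITS.items())
            if status & (1 << bit)]


if __name__ == "__main__":
    # The storage daemon reported "Child exited with code 2" in both log excerpts.
    for reason in decode_smartctl_exit(2):
        print(reason)
```

With status 2 only bit 1 is set, so the single reason printed is the failed device open.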
TagsNo tags attached.

Activities

zacha

2013-07-04 13:23

reporter   ~0000489

Of course I mean bareos-sd, not bacula.
mvwieringen adm

2013-07-04 14:45

administrator   ~0000490

I think you should run the tape alert command against a SCSI generic device on Linux.

On Solaris I have this:

/opt/ELMbareos/sbin/tapeinfo -f /dev/scsi/sequential/c0t0d0 | /bin/sed -n /TapeAlert/p

This opens the tape drive (/dev/rmt/0ubn) via the sgen (SCSI generic) driver.

tapeinfo is part of the mtx package, so you might have better luck with that.
mvwieringen adm

2013-07-04 14:46

administrator   ~0000491

I think you have to hardcode the sg device, however, i.e. not use a % expansion.
zacha

2013-07-04 15:45

reporter   ~0000492

Hello!

I will try that, but I doubt I can query the sg device while the nst device is in use. In fact, my test box doesn't even have sg devices, and I have to find out why first. In any case, this does not help if you have multiple tape devices (which I assume will be the majority of cases): you would always query one specific drive, not the one currently used by the job. Of course, one could call a script with %a and let it find the corresponding sg device, but without documentation I would not know what this script should return.

As for tapeinfo: I have already tried both tapeinfo and smartctl (both are installed), and tapeinfo cannot query the device either while the job is running.
zacha

2013-07-04 15:46

reporter   ~0000493

tapeinfo -f /dev/nst0
cannot open SCSI device '/dev/nst0' - Device or resource busy
mvwieringen

2013-07-04 15:56

developer   ~0000495

Opening the tape device is never going to work, i.e. you can only open
the tape device once, so whether it is tapeinfo or smartctl doesn't matter.
We test with mhvtl on Linux, and there we get a set of 10 sg devices
(4 drives, 1 robot interface, 4 other drives and another robot interface).
And I'm sure you CAN open an SG device in parallel; you only have
to map your sg devices with lsscsi. As for nst device names changing
on Linux, I thought that had eventually been fixed with udev rules.
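`lsscsi -g` prints one line per SCSI device, with the primary device node and the matching SCSI generic node as the last two columns. A minimal Python sketch that maps a tape node such as /dev/nst0 to its sg node from that output. It assumes that column layout, that /dev/nstN and /dev/stN name the same drive, and uses made-up sample lines rather than output from this report; the function name is hypothetical.

```python
import re
from typing import Optional

# Illustrative `lsscsi -g` output; vendor/model strings are made up.
SAMPLE_LSSCSI = """\
[0:0:0:0]    disk    ATA      EXAMPLE DISK     1.00  /dev/sda   /dev/sg0
[2:0:1:0]    tape    HP       Ultrium 3-SCSI   G24W  /dev/st0   /dev/sg1
[2:0:2:0]    tape    HP       Ultrium 3-SCSI   G24W  /dev/st1   /dev/sg2
[2:0:3:0]    mediumx HP       MSL G3 Series    3.00  -          /dev/sg3
"""


def sg_for_tape(tape_node: str, lsscsi_output: str) -> Optional[str]:
    """Return the /dev/sgN node that lsscsi -g lists next to a tape node.

    /dev/nstN (no-rewind) and /dev/stN refer to the same drive, so the
    leading "n" is dropped before comparing with lsscsi's st column.
    """
    match = re.fullmatch(r"/dev/n?st(\d+)", tape_node)
    if match is None:
        return None
    st_node = f"/dev/st{match.group(1)}"
    for line in lsscsi_output.splitlines():
        fields = line.split()
        # The model column may contain spaces, so index from the end:
        # the last two columns are the primary node and the sg node.
        if len(fields) >= 2 and fields[-2] == st_node:
            return fields[-1]
    return None


if __name__ == "__main__":
    print(sg_for_tape("/dev/nst0", SAMPLE_LSSCSI))  # /dev/sg1
```

Indexing from the end of each line avoids having to parse the free-form vendor and model columns.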
zacha

2013-07-04 16:11

reporter   ~0000496

Hello Marco. I was just googling a bit and found out that this is an issue with some versions of udev on Debian Squeeze: no sg devices are created. I will test further and see if I can get one. Our production system does have sg devices; I will give further feedback as soon as I have been able to test with one.
zacha

2013-07-08 15:54

reporter   ~0000511

Hello!

I tried again with a SCSI generic device. It is indeed possible to open the sg device while the nst/st device is in use. The tape alert query seems to work; at least I don't see any warnings in my director messages when using this:

Alert Command = "sh -c 'tapeinfo -f /dev/sg5 |grep TapeAlert|cat'"

But as I said before, it would be a nice enhancement if one could use a variable in this command. The Alert Command should also be documented. Is it possible to react in some fashion when a tape alert is active?
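A note on the trailing `|cat` in that command: a shell pipeline's exit status is that of its last command, and grep exits with 1 when it selects no lines. Without `cat`, a drive with no active TapeAlert flags would therefore presumably make the command exit non-zero and trigger a "Bad alert command" message like the ones quoted earlier. The Python sketch below demonstrates the exit-status behaviour with `sh`; the `printf` stand-in for tapeinfo output is an assumption and does not exercise tapeinfo itself.

```python
import subprocess


def pipeline_status(command: str) -> int:
    """Run a command via `sh -c` (as the Alert Command above does) and return its exit status."""
    return subprocess.run(["sh", "-c", command],
                          stdout=subprocess.DEVNULL).returncode


# Stand-in for tapeinfo output from a drive with no active TapeAlert flags.
NO_ALERTS = "printf 'Product Type: Tape Drive\\nReady: yes\\n'"

if __name__ == "__main__":
    # grep selects nothing and exits 1, so the whole pipeline exits 1.
    print(pipeline_status(NO_ALERTS + " | grep TapeAlert"))        # 1
    # cat exits 0, and the pipeline takes the status of its last command.
    print(pipeline_status(NO_ALERTS + " | grep TapeAlert | cat"))  # 0
```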
mvwieringen

2013-07-09 08:56

developer   ~0000512

I looked into the config parser, and you can also specify the changer device
as part of the definition of a non-changer setup, i.e. in the device section
of the drive. I agree that this is not very intuitive, but that is how it
was designed in Bacula. The original Bacula documentation also has a section
about the alert command, so there is documentation.

Currently the alert command is also kind of useless, as its output is only
added to the job report as information (e.g. to explain why the job failed).

Changing that would be a feature request, which first needs some design
work on what would be really useful. I would say we need to discuss it on
one of the Bareos mailing lists first, then create a feature request in
Mantis, and then, when we find the time, there will be an implementation.

For now you should just define the changer device in your device section
and use the %c substitution.
zacha

2013-07-09 09:28

reporter   ~0000513

Hello Marco,

thanks again for the reply. What you suggest is a good workaround for a single tape drive, BUT it does not work, at least not with our changer devices.

host:~# tapeinfo -f /dev/changer1
Product Type: Medium Changer
Vendor ID: 'HP '
Product ID: 'MSL6000 Series '
Revision: '2.00'
Attached Changer API: No
SerialNumber: '80000090 '
SCSI ID: 0
SCSI LUN: 0
Ready: yes


host:~# tapeinfo -f /dev/changer0
Product Type: Medium Changer
Vendor ID: 'OVERLAND'
Product ID: 'NEO Series '
Revision: '0510'
Attached Changer API: No
SerialNumber: 'XXXXX'
TapeAlert[14]: Undefined.
TapeAlert[15]: Undefined.
SCSI ID: 1
SCSI LUN: 1
Ready: yes

I don't know why the Overland does not report the TapeAlert flags correctly; at least it has the corresponding fields. But even if it reported them, there would be no easy way to match the reported drives with the ones currently in use. In general, though, it would be good to know which tape drive currently has a problem, so that bacula could avoid using it for further scheduled backups until the operator has fixed the problem (e.g. cleaned the drive).

The MSL6000 is not a real tape library but a VTL for D2D backups, so a real MSL might report the tape alerts correctly.
mvwieringen

2013-07-09 10:26

developer   ~0000514

I have read the TapeAlert web pages, and it seems a changer should do the
right thing, so it might just be a problem with these particular
devices.

http://www.tapealert.org/archives/23

As to reacting to tape alerts (which also covers changer alerts):
I think it makes more sense to reuse some of the low-level SCSI code we
added to Bareos for SCSI crypto support, create an extra storage daemon
event, and then hook in a storage daemon plugin (analogous to how SCSI crypto
works) that interprets the tape alerts and does something smart with them.
The problem with tapeinfo and smartctl is that they return the information
as text, which we then have to parse again to perform the actual action.
As we already have most of the code for issuing low-level SCSI commands on
the important platforms, a storage daemon plugin makes more sense.
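To illustrate the text-parsing step described above: a minimal Python sketch that extracts `TapeAlert[N]: text` lines from tapeinfo output into a mapping from flag number to description. It assumes the line format seen in the tapeinfo output quoted in this report; the function name is hypothetical and nothing here is part of Bareos.

```python
import re

# Matches lines such as "TapeAlert[14]: Undefined." from tapeinfo.
TAPEALERT_LINE = re.compile(r"^TapeAlert\[(\d+)\]:\s*(.*)$")


def parse_tapealerts(tapeinfo_output: str) -> dict[int, str]:
    """Map each reported TapeAlert flag number to its description text."""
    alerts = {}
    for line in tapeinfo_output.splitlines():
        match = TAPEALERT_LINE.match(line.strip())
        if match:
            alerts[int(match.group(1))] = match.group(2)
    return alerts


# Excerpt of the /dev/changer0 output quoted earlier in this report.
SAMPLE = """\
Product Type: Medium Changer
Vendor ID: 'OVERLAND'
TapeAlert[14]: Undefined.
TapeAlert[15]: Undefined.
Ready: yes
"""

if __name__ == "__main__":
    print(parse_tapealerts(SAMPLE))  # {14: 'Undefined.', 15: 'Undefined.'}
```

This kind of parsing depends on the exact wording of each tool's output, which is the fragility a SCSI-level plugin would avoid.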
zacha

2013-07-09 10:42

reporter   ~0000515

Hello.

I agree that the changer should report the tape alerts correctly and that this is most probably a bug in our changer's firmware. But what should Bareos do with the information that SOME tape drive in the changer has a problem? It still does not know how that drive is referenced in its SD config. There has to be at least a mapping from the changer's drive numbering to the Bareos internal drive numbering, because a job should not be affected if another drive has a problem; it should continue as normal. Conversely, it should somehow detect whether the drive the current job is using has a tape alert set. If one added a dedicated config option for every tape drive, let's say

"Query Device Name = /dev/sg?"

and made it available as, let's say, %q, it could

a) be used directly in the tape alert command, and
b) tell Bareos directly which drive has a problem.

This would be very handy for any type of device. Of course, the main part is then reacting to the error in a suitable way.
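The per-drive matching proposed above can be sketched in Python: each drive carries its own query (sg) device, so an alert can be attributed to a named drive, and a job only needs to react when its own drive is affected. The drive names, device paths and alert data are all hypothetical, and this is not how Bareos implements it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Drive:
    name: str          # hypothetical drive name from the SD config
    query_device: str  # hypothetical per-drive SCSI generic device


def drives_with_alerts(drives: list[Drive],
                       alerts_by_device: dict[str, dict[int, str]]) -> set[str]:
    """Names of drives whose query device reported at least one TapeAlert flag."""
    return {d.name for d in drives if alerts_by_device.get(d.query_device)}


def job_drive_affected(job_drive: str, drives: list[Drive],
                       alerts_by_device: dict[str, dict[int, str]]) -> bool:
    """True only if the drive used by the current job has an active alert."""
    return job_drive in drives_with_alerts(drives, alerts_by_device)


DRIVES = [Drive("Drive-0", "/dev/sg4"), Drive("Drive-1", "/dev/sg5")]
# Hypothetical query results (flag number -> text), keyed by query device.
ALERTS = {"/dev/sg4": {}, "/dev/sg5": {20: "Clean now"}}

if __name__ == "__main__":
    print(sorted(drives_with_alerts(DRIVES, ALERTS)))     # ['Drive-1']
    print(job_drive_affected("Drive-0", DRIVES, ALERTS))  # False
```

A job on Drive-0 continues normally even though Drive-1 has an alert, which is the behaviour asked for above.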
mvwieringen adm

2013-08-13 03:12

administrator   ~0000577

Fix committed to bareos master branch with changesetid 611.
mvwieringen

2015-03-25 16:51

developer   ~0001492

Fix committed to bareos2015 bareos-14.2 branch with changesetid 5099.
joergs

2015-03-25 19:19

developer   ~0001639

Due to the reimport of the GitHub repository to bugs.bareos.org, the status of some tickets has been changed. These tickets will be closed again.
Sorry for the noise.

Related Changesets

bareos: master d518ff2b

2013-07-09 11:25

mvwieringen adm

Ported: N/A

Add config option for storing a diagnostic device.

For some setups with an autochanger you might want to query the
individual drives for tape alerts. As you cannot open the tape
device twice you need to access the drive via a SCSI generic
device. We now have a per device diagnostic device config variable
which you can expand using a %D in the tape alert cmdline. Normally
you should query the autochanger for tape alerts and that should also
report any tape drive errors but some devices implement this poorly and
as such it doesn't work. This option allows you to work around that and
actually ask the drive for any tape alerts.

Fixes 0000200: Storage Daemon Tape Alert Useless / Defective
Affected Issues
0000200
mod - src/stored/stored_conf.h
mod - src/stored/stored_conf.c
mod - src/stored/sd_plugins.c
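The changeset above makes a per-device diagnostic device available as %D in the tape alert command line. A minimal Python sketch of how such % codes might be expanded per device. The codes (%a archive device, %c changer device, %D diagnostic device) are the ones discussed in this report; the function, the `%%` handling for a literal percent sign, and leaving unknown codes untouched are assumptions for illustration, not the storage daemon's actual implementation.

```python
def expand_alert_command(template: str, codes: dict[str, str]) -> str:
    """Replace %X sequences in template using codes; keep unknown ones as-is."""
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch == "%" and i + 1 < len(template):
            code = template[i + 1]
            if code == "%":
                out.append("%")          # "%%" -> literal "%"
            elif code in codes:
                out.append(codes[code])
            else:
                out.append(ch + code)    # unknown code: leave untouched
            i += 2
        else:
            out.append(ch)               # ordinary char, or a trailing "%"
            i += 1
    return "".join(out)


# Hypothetical per-device values for one drive in an autochanger.
DEVICE_CODES = {
    "a": "/dev/nst0",      # archive (tape) device
    "c": "/dev/changer0",  # changer device
    "D": "/dev/sg5",       # diagnostic device (SCSI generic node)
}

if __name__ == "__main__":
    cmd = "sh -c 'tapeinfo -f %D | grep TapeAlert | cat'"
    print(expand_alert_command(cmd, DEVICE_CODES))
    # sh -c 'tapeinfo -f /dev/sg5 | grep TapeAlert | cat'
```

Because each Device resource supplies its own values, one alert command template can serve every drive without hardcoding an sg node.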

bareos2015: bareos-14.2 6280d609

2013-07-09 13:25

mvwieringen

Ported: N/A

Add config option for storing a diagnostic device.

For some setups with an autochanger you might want to query the
individual drives for tape alerts. As you cannot open the tape
device twice you need to access the drive via a SCSI generic
device. We now have a per device diagnostic device config variable
which you can expand using a %D in the tape alert cmdline. Normally
you should query the autochanger for tape alerts and that should also
report any tape drive errors but some devices implement this poorly and
as such it doesn't work. This option allows you to work around that and
actually ask the drive for any tape alerts.

Fixes 0000200: Storage Daemon Tape Alert Useless / Defective
Affected Issues
0000200
mod - src/stored/sd_plugins.c
mod - src/stored/stored_conf.c
mod - src/stored/stored_conf.h

Issue History

Date Modified Username Field Change
2013-07-04 13:21 zacha New Issue
2013-07-04 13:23 zacha Note Added: 0000489
2013-07-04 14:45 mvwieringen adm Note Added: 0000490
2013-07-04 14:46 mvwieringen adm Note Added: 0000491
2013-07-04 14:46 mvwieringen adm Assigned To => mvwieringen adm
2013-07-04 14:46 mvwieringen adm Status new => feedback
2013-07-04 15:21 mvwieringen adm Assigned To mvwieringen adm => mvwieringen
2013-07-04 15:21 mvwieringen adm Status feedback => assigned
2013-07-04 15:45 zacha Note Added: 0000492
2013-07-04 15:46 zacha Note Added: 0000493
2013-07-04 15:56 mvwieringen Note Added: 0000495
2013-07-04 15:56 mvwieringen Status assigned => feedback
2013-07-04 16:08 mvwieringen Assigned To mvwieringen => pstorz
2013-07-04 16:08 mvwieringen Status feedback => assigned
2013-07-04 16:11 zacha Note Added: 0000496
2013-07-08 15:54 zacha Note Added: 0000511
2013-07-09 08:56 mvwieringen Note Added: 0000512
2013-07-09 08:56 mvwieringen Status assigned => feedback
2013-07-09 09:28 zacha Note Added: 0000513
2013-07-09 09:28 zacha Status feedback => assigned
2013-07-09 10:26 mvwieringen Note Added: 0000514
2013-07-09 10:42 zacha Note Added: 0000515
2013-07-10 10:03 mvwieringen Changeset attached => bareos master 2952144e
2013-07-10 10:03 mvwieringen Assigned To pstorz => mvwieringen
2013-07-10 10:03 mvwieringen Status assigned => resolved
2013-07-10 10:03 mvwieringen Resolution open => fixed
2013-07-12 17:20 mvwieringen adm Assigned To mvwieringen =>
2013-07-12 17:20 mvwieringen adm Status resolved => closed
2013-07-12 17:20 mvwieringen adm Fixed in Version => 13.3.0
2013-08-13 03:12 mvwieringen adm Changeset attached => bareos master d518ff2b
2013-08-13 03:12 mvwieringen adm Note Added: 0000577
2013-08-13 03:12 mvwieringen adm Assigned To => mvwieringen adm
2013-08-13 03:12 mvwieringen adm Status closed => resolved
2013-08-13 09:42 mvwieringen adm Assigned To mvwieringen adm =>
2013-08-13 09:42 mvwieringen adm Status resolved => closed
2015-03-25 16:51 mvwieringen Changeset attached => bareos2015 bareos-14.2 6280d609
2015-03-25 16:51 mvwieringen Note Added: 0001492
2015-03-25 16:51 mvwieringen Status closed => resolved
2015-03-25 19:19 joergs Note Added: 0001639
2015-03-25 19:19 joergs Status resolved => closed