View Issue Details
| ID | Project | Category | View Status | Date Submitted | Last Update | 
|---|---|---|---|---|---|
| 0000200 | bareos-core | storage daemon | public | 2013-07-04 13:21 | 2015-03-25 19:19 | 
| Reporter | zacha | Assigned To | |||
| Priority | normal | Severity | minor | Reproducibility | always | 
| Status | closed | Resolution | fixed | ||
| Platform | Linux | OS | Debian | OS Version | 6 | 
| Product Version | 13.2.0 | ||||
| Fixed in Version | 13.3.0 | ||||
| Summary | 0000200: Storage Daemon Tape Alert Useless / Defective | ||||
| Description | Apart from the comments in the sample file, there is no documentation on how to use the Tape Alert directive in the SD, and it seems to be defective, or at least useless. In the sample config the command is smartctl -H -l error %c, where %c is the changer device and not the tape drive. This means it can only be used with a library, if at all, because I doubt that most libraries report SMART data for their drives. If you change the command line to smartctl -H -l error %a, the tape drive is queried, but no data can be read from it because bacula has it open at that time: 04-Jul 13:05 hbnas-sd JobId 11: Alert: Smartctl open device: /dev/nst0 failed: Device or resource busy 04-Jul 13:05 hbnas-sd JobId 11: 3997 Bad alert command: sh -c 'smartctl -H -l error /dev/nst0': ERR=Child exited with code 2. If you change the tape drive's config so that it is not always open, it still does not work: 04-Jul 13:10 hbnas-sd JobId 15: Alert: Smartctl open device: /dev/nst0 failed: INQUIRY failed 04-Jul 13:10 hbnas-sd JobId 15: 3997 Bad alert command: sh -c 'smartctl -H -l error /dev/nst0': ERR=Child exited with code 2. While the bacula sd is stopped, the command runs without problems. And if you configure the drive not to be "always open", you can even run it while the sd is running and idle. It seems that the command is, at the very least, issued at the wrong time. The tape drive should be queried at the beginning of the job, as soon as a tape has been mounted (or if one is already mounted). | ||||
| Tags | No tags attached. | ||||
| Of course I mean bareos-sd, not bacula. | |
| I think you should run the tapealert query against a SCSI generic device on Linux. On Solaris I have this: /opt/ELMbareos/sbin/tapeinfo -f /dev/scsi/sequential/c0t0d0 | /bin/sed -n /TapeAlert/p which opens the tape drive (/dev/rmt/0ubn) via the sgen (SCSI generic) driver. tapeinfo is part of the mtx package, so you might have better luck with that. | |
| I think you have to hardcode the sg device, however, i.e. not use a % expansion. | |
| Hello! I will try that, but I doubt I can query the sg device while the nst device is in use. In fact, on my test box I don't even have sg devices, and I have to find out why first. Anyway, this does not help if you have multiple tape devices (which I assume is the majority of cases), because you would always query one specific drive, not the one currently used by the job. Of course one could call a script with %a and let it find the corresponding sg device, but without documentation I would not know what this script should return. As for tapeinfo, I already tried both tapeinfo and smartctl (I have both installed, by the way), but tapeinfo cannot query the device while the job is running either. | |
| tapeinfo -f /dev/nst0 cannot open SCSI device '/dev/nst0' - Device or resource busy | |
| Opening the tape device is never going to work, as you can only open the tape device once, so it doesn't matter whether you use tapeinfo or smartctl. We test with mhvtl on Linux, and there we get a set of 10 sg devices (4 drives, 1 robot interface, 4 other drives and another robot interface). And I'm also sure you CAN open an sg device in parallel; you only have to map your sg devices with lsscsi. As for nst device names changing on Linux, I thought that was eventually fixed with udev rules. | |
| Hello Marco. I was just googling a bit and found out that this is an issue with some versions of udev on Debian Squeeze, where no sg devices are created. I will test further and see if I can get one. Our production system does have sg devices; I will give further feedback as soon as I have been able to test with the sg device. | |
| Hello! I tried again with the SCSI generic device. Indeed, it is possible to open the sg device while the nst/st device is in use. The tapealert query seems to work; at least I don't see any warnings in my director messages when using this one: Alert Command = "sh -c 'tapeinfo -f /dev/sg5 |grep TapeAlert|cat'" But, as I said before, it would be a nice enhancement if one could use a variable in this command. The Alert Command should be documented too. Is it possible to react in some fashion when a tape alert is active? | |
| I looked into the config parser, and you can also specify the changer command as part of the definition of a non-changer setup, i.e. in the device section of the drive. I agree that this is not very intuitive, but that is how it was designed in Bacula. The original Bacula documentation also has a section about the alert command, so there is documentation. Currently the alert command is also rather useless, as its output is only added to the job report as information (e.g. as to why the job failed). Changing that would be a feature request that first needs some design work on what would be really useful. I would say we should first discuss that on one of the Bareos mailing lists, then create a feature request in Mantis, and then, when we find the time, there will be an implementation. For now you should just define the changer command in your device section and use the %c substitution. | |
| Hello Marco, thanks again for the reply. What you describe is a good workaround for a single tape drive, BUT it does not work, at least with our changer devices. host:~# tapeinfo -f /dev/changer1 Product Type: Medium Changer Vendor ID: 'HP ' Product ID: 'MSL6000 Series ' Revision: '2.00' Attached Changer API: No SerialNumber: '80000090 ' SCSI ID: 0 SCSI LUN: 0 Ready: yes host:~# tapeinfo -f /dev/changer0 Product Type: Medium Changer Vendor ID: 'OVERLAND' Product ID: 'NEO Series ' Revision: '0510' Attached Changer API: No SerialNumber: 'XXXXX' TapeAlert[14]: Undefined. TapeAlert[15]: Undefined. SCSI ID: 1 SCSI LUN: 1 Ready: yes I don't know why the Overland does not report the TapeAlert flags correctly; at least it has the corresponding fields. But even if it reported them, there would be no easy way to match the reported drives with the ones currently in use. In general, though, it would be good to know which tape drive currently has a problem, so that bacula could avoid using it for further scheduled backups until the operator has fixed the problem (e.g. cleaned the drive). The MSL6000 is not a real tape library but a VTL for D2D backups, so a real MSL might report the tapealert correctly. | |
| I have read the TapeAlert web pages, and it seems a changer should do the right thing, so it may just be a problem with these particular devices. http://www.tapealert.org/archives/23 As for reacting to tape alerts (which also covers changer alerts): I think it makes more sense to reuse some of the low-level SCSI code we added to Bareos for SCSI crypto support, create an extra storage daemon event, and then hook in a storage daemon plugin (analogous to how SCSI crypto works) that interprets the tape alerts and does something smart with them. The problem with tapeinfo and smartctl is that they return the information as text, which we then have to parse again to perform the actual action. As we already have most of the code for issuing low-level SCSI commands on the important platforms, a storage daemon plugin makes more sense. | |
| Hello. I agree that the changer should report the tape alert correctly and that this is most probably a malfunction of our changer's firmware. But what should bareos do with the information that SOME tape device in the changer has a problem? It still does not know how that tape device is referenced in its SD config. There has to be at least a mapping from the changer's drive numbering to bareos' internal drive numbering: a job should not be affected if another drive has a problem, it should simply continue as normal, and conversely it should somehow react if the drive the current job is using has a tape alert set. If one added a dedicated config option for every tape drive, say "Query Device Name = /dev/sg?", and made it available as, say, %q, then a) it could be used directly in the tape alert command, and b) bareos would know directly which drive has a problem. This would be very handy for any type of device. Of course, the main part afterwards is reacting to the error in a suitable way. | |
| Fix committed to bareos master branch with changesetid 611. | |
| Fix committed to bareos2015 bareos-14.2 branch with changesetid 5099. | |
| Due to the reimport of the Github repository to bugs.bareos.org, the status of some tickets has been changed. These tickets will be closed again. Sorry for the noise. | |
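
The notes above pipe tapeinfo output through grep and point out that tapeinfo and smartctl only return TapeAlert information as text that has to be parsed again before anything can act on it. The following is a minimal Python sketch of that parsing step, assuming tapeinfo prints one line of the form "TapeAlert[N]: description" for each reported flag, as in the Overland output quoted above. parse_tapealerts and query_tapealerts are hypothetical helpers, not part of Bareos or mtx, and /dev/sg5 is only the device name used in the note.

```python
from __future__ import annotations

import re
import subprocess

# Matches lines such as "TapeAlert[14]: Undefined." as printed by tapeinfo (mtx package).
TAPEALERT_LINE = re.compile(r"^\s*TapeAlert\[(\d+)\]:\s*(.*?)\s*$")


def parse_tapealerts(tapeinfo_output: str) -> dict[int, str]:
    """Map each reported TapeAlert flag number to its description text."""
    alerts: dict[int, str] = {}
    for line in tapeinfo_output.splitlines():
        match = TAPEALERT_LINE.match(line)
        if match:
            alerts[int(match.group(1))] = match.group(2)
    return alerts


def query_tapealerts(sg_device: str) -> dict[int, str]:
    """Run tapeinfo against a SCSI generic node (e.g. /dev/sg5) and parse its output.

    Hypothetical helper: raises subprocess.CalledProcessError if tapeinfo exits
    non-zero, for example when the device cannot be opened.
    """
    result = subprocess.run(
        ["tapeinfo", "-f", sg_device],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_tapealerts(result.stdout)
```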
| bareos: master d518ff2b 2013-07-09 11:25 Ported: N/A | Add config option for storing a diagnostic device. For some setups with an autochanger you might want to query the individual drives for tape alerts. As you cannot open the tape device twice you need to access the drive via a SCSI generic device. We now have a per device diagnostic device config variable which you can expand using a %D in the tape alert cmdline. Normally you should query the autochanger for tape alerts and that should also report any tape drive errors but some devices implement this poorly and as such it doesn't work. This option allows you to work around that and actually ask the drive for any tape alerts. Fixes 0000200: Storage Daemon Tape Alert Useless / Defective | Affected Issues 0000200 |
| mod - src/stored/stored_conf.h |
| mod - src/stored/stored_conf.c |
| mod - src/stored/sd_plugins.c |
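
The changeset above lets the tape alert command line reference a per-device diagnostic device as %D. As a rough illustration of how such %-codes are substituted into the command before it is run, here is a minimal Python sketch. expand_alert_command and the codes mapping are hypothetical names; the sketch only knows the codes discussed in this issue (%a archive device, %c changer device, %D diagnostic device), treats %% as a literal percent sign by assumption, and is not the storage daemon's actual implementation, which supports more codes.

```python
from __future__ import annotations


def expand_alert_command(template: str, codes: dict[str, str]) -> str:
    """Substitute %X escapes in an alert command template.

    '%%' becomes a literal '%'; escapes not present in `codes` are copied unchanged.
    """
    out: list[str] = []
    i = 0
    while i < len(template):
        if template[i] == "%" and i + 1 < len(template):
            code = template[i + 1]
            if code == "%":
                out.append("%")
            elif code in codes:
                out.append(codes[code])
            else:
                out.append("%" + code)  # unknown escape: keep it as-is
            i += 2
        else:
            out.append(template[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    # Hypothetical device paths; %D stands for the drive's SCSI generic node.
    codes = {"a": "/dev/nst0", "c": "/dev/changer0", "D": "/dev/sg5"}
    # prints: sh -c 'tapeinfo -f /dev/sg5 | grep TapeAlert | cat'
    print(expand_alert_command("sh -c 'tapeinfo -f %D | grep TapeAlert | cat'", codes))
```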
| bareos2015: bareos-14.2 6280d609 2013-07-09 13:25 Ported: N/A | Add config option for storing a diagnostic device. For some setups with an autochanger you might want to query the individual drives for tape alerts. As you cannot open the tape device twice you need to access the drive via a SCSI generic device. We now have a per device diagnostic device config variable which you can expand using a %D in the tape alert cmdline. Normally you should query the autochanger for tape alerts and that should also report any tape drive errors but some devices implement this poorly and as such it doesn't work. This option allows you to work around that and actually ask the drive for any tape alerts. Fixes 0000200: Storage Daemon Tape Alert Useless / Defective | Affected Issues 0000200 |
| mod - src/stored/sd_plugins.c |
| mod - src/stored/stored_conf.c |
| mod - src/stored/stored_conf.h |
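
With this fix the administrator still has to name the SCSI generic node of each drive, which ties back to the earlier notes about mapping sg devices with lsscsi and about calling a helper script with %a. On Linux the same lookup can also be done through sysfs. The following minimal Python sketch assumes the usual sysfs layout, where /sys/class/scsi_tape/<name>/device leads to the SCSI device directory and that directory contains a scsi_generic/sgN entry when the sg driver is loaded. find_sg_device is a hypothetical helper, not part of Bareos or mtx, and its result is best cross-checked against lsscsi -g before it goes into a config. It returns None when no sg node is found, for example on a system where udev created no sg devices, as reported for Debian Squeeze above.

```python
from __future__ import annotations

from pathlib import Path


def find_sg_device(tape_device: str, sysfs_root: str = "/sys") -> str | None:
    """Return the /dev/sgN node belonging to a Linux tape device such as /dev/nst0.

    Assumes /sys/class/scsi_tape/<name>/device leads to the SCSI device directory,
    which has a scsi_generic/sgN subdirectory when the sg driver is loaded.
    Returns None if the tape device or its sg node cannot be found.
    """
    name = Path(tape_device).name  # "/dev/nst0" -> "nst0"
    sg_dir = Path(sysfs_root, "class", "scsi_tape", name, "device", "scsi_generic")
    if not sg_dir.is_dir():
        return None  # unknown tape device, or no sg node (e.g. sg driver not loaded)
    nodes = sorted(entry.name for entry in sg_dir.iterdir())
    return f"/dev/{nodes[0]}" if nodes else None
```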
| Date Modified | Username | Field | Change | 
|---|---|---|---|
| 2013-07-04 13:21 | zacha | New Issue | |
| 2013-07-04 13:23 | zacha | Note Added: 0000489 | |
| 2013-07-04 14:45 |  | Note Added: 0000490 | |
| 2013-07-04 14:46 |  | Note Added: 0000491 | |
| 2013-07-04 14:46 |  | Assigned To | => mvwieringen adm | 
| 2013-07-04 14:46 |  | Status | new => feedback | 
| 2013-07-04 15:21 |  | Assigned To | mvwieringen adm => mvwieringen | 
| 2013-07-04 15:21 |  | Status | feedback => assigned | 
| 2013-07-04 15:45 | zacha | Note Added: 0000492 | |
| 2013-07-04 15:46 | zacha | Note Added: 0000493 | |
| 2013-07-04 15:56 | mvwieringen | Note Added: 0000495 | |
| 2013-07-04 15:56 | mvwieringen | Status | assigned => feedback | 
| 2013-07-04 16:08 | mvwieringen | Assigned To | mvwieringen => pstorz | 
| 2013-07-04 16:08 | mvwieringen | Status | feedback => assigned | 
| 2013-07-04 16:11 | zacha | Note Added: 0000496 | |
| 2013-07-08 15:54 | zacha | Note Added: 0000511 | |
| 2013-07-09 08:56 | mvwieringen | Note Added: 0000512 | |
| 2013-07-09 08:56 | mvwieringen | Status | assigned => feedback | 
| 2013-07-09 09:28 | zacha | Note Added: 0000513 | |
| 2013-07-09 09:28 | zacha | Status | feedback => assigned | 
| 2013-07-09 10:26 | mvwieringen | Note Added: 0000514 | |
| 2013-07-09 10:42 | zacha | Note Added: 0000515 | |
| 2013-07-10 10:03 | mvwieringen | Changeset attached | => bareos master 2952144e | 
| 2013-07-10 10:03 | mvwieringen | Assigned To | pstorz => mvwieringen | 
| 2013-07-10 10:03 | mvwieringen | Status | assigned => resolved | 
| 2013-07-10 10:03 | mvwieringen | Resolution | open => fixed | 
| 2013-07-12 17:20 |  | Assigned To | mvwieringen => | 
| 2013-07-12 17:20 |  | Status | resolved => closed | 
| 2013-07-12 17:20 |  | Fixed in Version | => 13.3.0 | 
| 2013-08-13 03:12 |  | Changeset attached | => bareos master d518ff2b | 
| 2013-08-13 03:12 |  | Note Added: 0000577 | |
| 2013-08-13 03:12 |  | Assigned To | => mvwieringen adm | 
| 2013-08-13 03:12 |  | Status | closed => resolved | 
| 2013-08-13 09:42 |  | Assigned To | mvwieringen adm => | 
| 2013-08-13 09:42 |  | Status | resolved => closed | 
| 2015-03-25 16:51 | mvwieringen | Changeset attached | => bareos2015 bareos-14.2 6280d609 | 
| 2015-03-25 16:51 | mvwieringen | Note Added: 0001492 | |
| 2015-03-25 16:51 | mvwieringen | Status | closed => resolved | 
| 2015-03-25 19:19 | joergs | Note Added: 0001639 | |
| 2015-03-25 19:19 | joergs | Status | resolved => closed | 


