View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0000691 | bareos-core | storage daemon | public | 2016-08-24 10:31 | 2019-12-18 15:25 |
Reporter | medicmomcilo | Assigned To | |||
Priority | normal | Severity | major | Reproducibility | random |
Status | closed | Resolution | fixed | ||
Platform | Linux | OS | Ubuntu | OS Version | 14.04 |
Product Version | 15.2.2 | ||||
Summary | 0000691: Backup job fails with "No Volume name given." | ||||
Description | We have occasional failed backup job due to 'No volume name given.' error. Job can be manually re-ran and it is successful. This job is running perfectly fine most of the time. Digging through the code I found that this error message should only be seen if using AutoChanger, which we're not. From the src/stored/dev.c: /* * If this is a virtual autochanger (i.e. changer_res != NULL) we simply use * the device name, assuming it has been appropriately setup by the "autochanger". */ if (!device->changer_res || device->changer_command[0] == 0) { if (VolCatInfo.VolCatName[0] == 0) { Mmsg(errmsg, _("Could not open file device %s. No Volume name given.\n"), print_name()); clear_opened(); return; } if (!IsPathSeparator(archive_name.c_str()[strlen(archive_name.c_str())-1])) { pm_strcat(archive_name, "/"); } pm_strcat(archive_name, getVolCatName()); } Do let me know what additional information if any, you require. | ||||
Steps To Reproduce | Unfortunately this is random occurance. | ||||
Tags | No tags attached. | ||||
Hello, We encounter the same error randomly with the same product version (and OS version). As for medicmomcilo, we can send additional information and logs. Thank you, Mickaël |
|
So what backends are you using? A single tape or disk? (or some other backend). | |
Hello, We use disks: several pools containing multiple media stored on disk. |
|
Hi, We are encountering the same issue. We are running CentOS 7.2.1511. uname -a reports: Linux 3.10.0-327.22.2.el7.x86_64 0000001 SMP Thu Jun 23 17:05:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux We also, are only using disk backends. Cheers, Ian |
|
Hi, We have multiple disk devices defined and that is all. However, we don't use autochanger device. Kind regards, Momcilo. |
|
Hi, Just upgraded to 16.2.4. We are still getting the same error on random jobs. What more information do you require to get this fixed? Cheers, Ian |
|
Confirmed, as several environments are affected by this problem, | |
> What more information do you require to get this fixed? Well, a setup to reproduce the problem would be perfect. Not being able to reproduce the problem makes it hard to fix it. Maybe a debug trace might bring some insight. You can enable the debug log by the bconsole command: setdebug level=100 trace=1 dir Also director configuration and some information about the environment would be helpful. |
|
Hi, I'm also getting these errors, please see my comments on the users group, maybe the info provided helps. https://groups.google.com/forum/?fromgroups=#!topic/bareos-users/Nbv-_fRWxaY I have debugging enabled for dir and sd, but it's kind of difficult to identify the related sections, because there's no timestamp in the tracefiles. I assume the number after filename:xxx- is the jobid, and I can tell you that, one of my jobs hich failed had 0 bytes, and even though failed. I cannot find the jobid at all in the sd trace, but there are some lines in the dir trace. let me try to upload the traces. thanks, Robert |
|
failed-job.log (3,258 bytes)
15-Dec 14:23 muc1pro-backup-1-dir JobId 8021: Start Backup JobId 8021, Job=bg1nc0502.2016-12-15_14.23.18_33 15-Dec 14:23 muc1pro-backup-1-dir JobId 8021: Using Device "DiskDeviceIncr2" to write. 15-Dec 14:23 muc1pro-backup-1-dir JobId 8021: Sending Accurate information. 15-Dec 14:23 muc1pro-backup-1-sd JobId 8021: Warning: mount.c:247 Open device "DiskDeviceIncr2" (/data/bareos-storage/Incr) Volume "" failed: ERR=Could not open file device "DiskDeviceIncr2" (/data/bareos-storage/Incr). No Volume name given. 15-Dec 14:23 muc1pro-backup-1-sd JobId 8021: Warning: mount.c:247 Open device "DiskDeviceIncr2" (/data/bareos-storage/Incr) Volume "" failed: ERR=Could not open file device "DiskDeviceIncr2" (/data/bareos-storage/Incr). No Volume name given. 15-Dec 14:23 muc1pro-backup-1-sd JobId 8021: Warning: mount.c:247 Open device "DiskDeviceIncr2" (/data/bareos-storage/Incr) Volume "" failed: ERR=Could not open file device "DiskDeviceIncr2" (/data/bareos-storage/Incr). No Volume name given. 15-Dec 14:23 muc1pro-backup-1-sd JobId 8021: Warning: mount.c:247 Open device "DiskDeviceIncr2" (/data/bareos-storage/Incr) Volume "" failed: ERR=Could not open file device "DiskDeviceIncr2" (/data/bareos-storage/Incr). No Volume name given. 15-Dec 14:23 muc1pro-backup-1-sd JobId 8021: Warning: mount.c:247 Open device "DiskDeviceIncr2" (/data/bareos-storage/Incr) Volume "" failed: ERR=Could not open file device "DiskDeviceIncr2" (/data/bareos-storage/Incr). No Volume name given. 15-Dec 14:23 bg1nc0502 JobId 8021: Fatal error: dir_cmd.c:2211 Bad response to Append Data command. Wanted 3000 OK data , got 3903 Error append data 15-Dec 14:23 muc1pro-backup-1-sd JobId 8021: Fatal error: Too many errors trying to mount device "DiskDeviceIncr2" (/data/bareos-storage/Incr). 15-Dec 14:23 muc1pro-backup-1-dir JobId 8021: Error: Bareos muc1pro-backup-1-dir 15.2.3 (07Mar16): Build OS: x86_64-redhat-linux-gnu redhat CentOS release 6.7 (Final) JobId: 8021 Job: bg1nc0502.2016-12-15_14.23.18_33 Backup Level: Incremental, since=2016-12-14 00:25:59 Client: "bg1nc0502" 14.2.2 (12Dec14) x86_64-redhat-linux-gnu,redhat,CentOS release 5.11 (Final) FileSet: "bg1nc0502" 2016-11-30 16:15:55 Pool: "DiskDaily" (From command line) Catalog: "MyCatalog" (From Job resource) Storage: "DiskStorageIncr" (From Pool resource) Scheduled time: 15-Dec-2016 14:23:16 Start time: 15-Dec-2016 14:23:20 End time: 15-Dec-2016 14:23:21 Elapsed time: 1 sec Priority: 10 FD Files Written: 0 SD Files Written: 0 FD Bytes Written: 0 (0 B) SD Bytes Written: 0 (0 B) Rate: 0.0 KB/s Software Compression: None VSS: no Encryption: no Accurate: yes Volume name(s): Volume Session Id: 77 Volume Session Time: 1481703237 Last Volume Bytes: 0 (0 B) Non-fatal FD errors: 1 SD Errors: 1 FD termination status: Fatal Error SD termination status: Fatal Error Termination: *** Backup Error *** |
|
The xxx after the filename is the line number in the source code. | |
I see, and after that comes the job-id, right? would you please remove the trace file from this ticket? thank you. |
|
hi, I'm also getting messages like: 16-Dec 00:37 muc1pro-backup-1-sd JobId 8102: Job tw1kc0624.2016-12-16_00.13.09_12 is waiting. Cannot find any appendable volumes. Please use the "label" command to create a new Volume for: Storage: "DiskDeviceIncr2" (/data/bareos-storage/Incr) Pool: DiskDaily Media type: FileIncr although the pool limit is 600 Volumes and there are currently only 310 in use, and plenty of diskspace. If I try to run a single job, which requires new volumes to be labeled, it seem to work, but when the nightly schedule runs ( 5 in parallel) randomly throws either "No volume name given" or "Cannot find any appendable volumes" cheers, Robert |
|
Hi joergs, Our setup is as follows: Dedicated physical server is a Dell PowerEdge R710: - 2 x Intel Xeon X5650 Quad Core with Hyper Threading enabled, - 64GB RAM, - 6 x 900GB SAS in Raid5 (50GB /, 4.45TB /backup) - additional 6TB SAS via iSCSI (Dell Equallogic SANs) - 4 x 1GbE NICs connected to iSCSI VLAN in team config (round robin). - 4 x 1GbE NICs connected to Prod VLAN in team config (round robin). Running CentOS 7.2.1511 Was running Bareos 15.2 up until 1 month ago. Now running Bareos 16.2.4 - Upgraded and rejigged config files to match the new setup. Currently using MariaDB, but in the process of migrating to PGSQL. External storage node running in Google Cloud for Full & Diff migrations. Problem has persisted across versions. However this morning, I have just noted that all hourly incrementals were failing. I restarted the bareos-dir service. Now the subsequent jobs are successful. Attaching Bareos configuration as backup.tar.gz Once the jobs start failing again, I will enable debugging and rerun a job. Cheers, Ian |
|
hi all, I took a look into the trace file of the director, and seems the the problem is in the volume selection. I have defined a storage with 5 different devices pointing to the same folder. the storage with max 5 jobs and each device with max 1 job. looking at the trace file, here's the SELECT count for each job to find the volume: JobID : SELECT COUONT 10472 :10 10473 :121 10474 :189 10475 :291 10476 :189 used this command: for i in `seq 2 6` ; do echo -n 1047$i : ; grep -- "-1047$i" muc1pro-backup-1-dir.trace | egrep "SELECT" | wc -l ; done for the first job after the last select I see: Find_next_vol MediaId=3735 MediaId=3735 means volume=Incr-3735 in all the other 4 jobs, after some selects, the same volume is found but then comes a "unwanted_volumes=Incr-3735" and continues to run other selects. after the last select I see : Find_next_vol MediaId=0 , but there's no volume with mediaid = 0 , and most likely therefore it throws a no volume name given... here you can see the selects and find_next_vol records in the trace: grep -- "-10472" muc1pro-backup-1-dir.trace | egrep --color=always "SELECT|Find_next_vol|unwanted" | less -r replace 10472 in the grep with other jobid's from above. attaching the trace file. see the attached affected jobid's above. bareos-15.2.3-trace-centos6.tar.gz Thanks, Robert |
|
Hi all, we hit the same issue from time to time. CentOS 6, Bareos 15.2.2-37.1 No trace, only logs: 01-Mär 17:38 bareos-dir JobId 439358: console command: run BeforeJob "prune files client=clientname yes" 01-Mär 17:38 bareos-dir JobId 439358: Start Backup JobId 439358, Job=jobname.2017-03-01_17.38.33_29 01-Mär 17:38 bareos-dir JobId 439358: Using Device "FileStorageName0" to write. 01-Mär 17:39 bareos-dir JobId 439358: Fatal error: Catalog error creating JobMedia record. sql_create.c:153 Update Media record UPDATE Media SET EndFile=0, EndBlock=0 WHERE MediaId=0 failed: ERR= 01-Mär 17:39 bareos-sd JobId 439358: Fatal error: Error creating JobMedia record: 1992 Create JobMedia error 01-Mär 17:39 clientname JobId 439358: Fatal error: dir_cmd.c:2400 Bad response to Append Data command. Wanted 3000 OK data 01-Mär 17:39 bareos-sd JobId 439358: Marking Volume "VOLUME41965" in Error in Catalog. 01-Mär 17:39 bareos-sd JobId 439358: Fatal error: askdir.c:357 NULL Volume name. This shouldn't happen!!! 01-Mär 17:39 bareos-sd JobId 439358: Fatal error: Job 439358 canceled. Regards, Christian |
|
Hi there. Sam issue, pretty sure it's because of multiple devices defined (needed to forever incremental). |
|
I think I managed to track down properly this bug. It seems to be introduced by a large fix for split brain problem[1]. Looks like there is a race condition for volumes on backup start. This causes a lot of volumes to be 'reserved' and marked as 'unwanted' for other backups. If a request is made for a volume that is 'unwanted' then a request should be repeated. However, only case where this is properly handled is if you have ran out of all volumes and in need to purge/recycle oldest. Only in such case it would retry on failure to get volume. I am attaching proposed patch which would retry on every failure. Patch is minimal and easy to review before you decide on implementing or not. This is tested in small environment and proved to work fine. Couldn't confirm in our production due to build limitations we have. [1] https://github.com/bareos/bareos/commit/fa7a54a3b19f2f003a076b36532e277d98aaa812 Kind regards, Momo. |
|
0001-Fixes-bug-0000691-No-volume-name-given.patch (617 bytes)
From eb9e6b9f95c04fcedd04f9e76b5694a277f48dc7 Mon Sep 17 00:00:00 2001 From: Momcilo Medic <medicmomcilo@gmail.com> Date: Thu, 8 Jun 2017 15:10:17 +0200 Subject: [PATCH] Fixes bug 0000691 - 'No volume name given' --- src/cats/sql_find.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/cats/sql_find.c b/src/cats/sql_find.c index 77e2106f..3687dc1f 100644 --- a/src/cats/sql_find.c +++ b/src/cats/sql_find.c @@ -493,7 +493,7 @@ retry_fetch: break; } - if (!found_candidate && find_oldest) { + if (!found_candidate) { item++; goto retry_fetch; } -- 2.13.0 |
|
Fix committed to bareos bareos-16.2 branch with changesetid 7019. | |
Fix committed to bareos bareos-15.2 branch with changesetid 7026. | |
@Momo: thank you for your patch. We analysed this problem in a customer environment, and think the patch Philipp did make, solved the problem. It just did take longer, until we pushed it to github. | |
bareos: bareos-16.2 d1ff29b3 2017-04-25 12:50 Committer: joergs Ported: N/A Details Diff |
Don't return empty volname if volume is on unwanted vols list When the volume was on the unwanted_list, the loop was continued but the number of sql rows was not checked, so that the code continued and returned an empty volumename. Fixes 0000691: Backup job fails with "No Volume name given." |
Affected Issues 0000691 |
|
mod - src/cats/sql_find.c | Diff File | ||
bareos: bareos-15.2 627d8d7e 2017-04-25 12:50 Ported: N/A Details Diff |
Don't return empty volname if volume is on unwanted vols list When the volume was on the unwanted_list, the loop was continued but the number of sql rows was not checked, so that the code continued and returned an empty volumename. Fixes 0000691: Backup job fails with "No Volume name given." |
Affected Issues 0000691 |
|
mod - src/cats/sql_find.c | Diff File |
Date Modified | Username | Field | Change |
---|---|---|---|
2016-08-24 10:31 | medicmomcilo | New Issue | |
2016-09-21 10:37 | geovariances | Note Added: 0002356 | |
2016-09-22 17:16 | joergs | Note Added: 0002357 | |
2016-09-22 18:27 | geovariances | Note Added: 0002358 | |
2016-09-23 03:48 | ianbrown78 | Note Added: 0002359 | |
2016-09-25 12:19 | medicmomcilo | Note Added: 0002369 | |
2016-10-13 10:51 | joergs | Relationship added | related to 0000641 |
2016-10-13 10:52 | joergs | Relationship deleted | related to 0000641 |
2016-10-13 10:53 | joergs | Relationship added | related to 0000647 |
2016-12-05 05:07 | ianbrown78 | Note Added: 0002462 | |
2016-12-12 18:41 | joergs | Note Added: 0002467 | |
2016-12-12 18:41 | joergs | Status | new => confirmed |
2016-12-12 18:50 | joergs | Note Added: 0002468 | |
2016-12-15 15:32 | nrobert13 | Note Added: 0002471 | |
2016-12-15 15:43 | nrobert13 | File Added: bareos-dir.trace | |
2016-12-15 15:44 | nrobert13 | File Added: failed-job.log | |
2016-12-15 15:45 | joergs | Note Added: 0002472 | |
2016-12-15 15:48 | nrobert13 | Note Added: 0002473 | |
2016-12-15 15:49 | nrobert13 | Note Edited: 0002473 | |
2016-12-16 08:47 | nrobert13 | Note Added: 0002476 | |
2016-12-19 01:27 | ianbrown78 | Note Added: 0002477 | |
2016-12-19 01:27 | ianbrown78 | File Added: backup.tar.gz | |
2017-01-04 16:34 | nrobert13 | Note Added: 0002497 | |
2017-01-04 16:34 | nrobert13 | File Added: bareos-15.2.3-trace-centos6.tar.gz | |
2017-03-01 17:59 | DocFraggle | Note Added: 0002586 | |
2017-05-21 13:00 | daftu | Note Added: 0002643 | |
2017-06-08 12:43 | joergs | Relationship added | child of 0000794 |
2017-06-08 12:44 | joergs | Relationship added | child of 0000798 |
2017-06-08 15:17 | medicmomcilo | Note Added: 0002648 | |
2017-06-08 15:17 | medicmomcilo | File Added: 0001-Fixes-bug-0000691-No-volume-name-given.patch | |
2017-06-08 15:21 | joergs | Changeset attached | => bareos bareos-16.2 d1ff29b3 |
2017-06-08 15:21 | joergs | Note Added: 0002650 | |
2017-06-08 15:21 | joergs | Status | confirmed => resolved |
2017-06-08 15:21 | joergs | Resolution | open => fixed |
2017-06-08 15:23 | pstorz | Changeset attached | => bareos bareos-15.2 627d8d7e |
2017-06-08 15:23 | pstorz | Note Added: 0002653 | |
2017-06-08 16:31 | joergs | Note Added: 0002662 | |
2019-12-18 15:25 | arogge | Status | resolved => closed |