0000691: Backup job fails with "No Volume name given."

ID	Project	Category	View Status	Date Submitted	Last Update

0000691	bareos-core	storage daemon	public	2016-08-24 10:31	2019-12-18 15:25

Reporter	medicmomcilo	Assigned To
Priority	normal	Severity	major	Reproducibility	random
Status	closed	Resolution	fixed
Platform	Linux	OS	Ubuntu	OS Version	14.04
Product Version	15.2.2

Summary	0000691: Backup job fails with "No Volume name given."
Description	We have occasional failed backup job due to 'No volume name given.' error. Job can be manually re-ran and it is successful. This job is running perfectly fine most of the time. Digging through the code I found that this error message should only be seen if using AutoChanger, which we're not. From the src/stored/dev.c: /* * If this is a virtual autochanger (i.e. changer_res != NULL) we simply use * the device name, assuming it has been appropriately setup by the "autochanger". */ if (!device->changer_res \|\| device->changer_command[0] == 0) { if (VolCatInfo.VolCatName[0] == 0) { Mmsg(errmsg, _("Could not open file device %s. No Volume name given.\n"), print_name()); clear_opened(); return; } if (!IsPathSeparator(archive_name.c_str()[strlen(archive_name.c_str())-1])) { pm_strcat(archive_name, "/"); } pm_strcat(archive_name, getVolCatName()); } Do let me know what additional information if any, you require.
Steps To Reproduce	Unfortunately this is random occurance.
Tags	No tags attached.

geovariances 2016-09-21 10:37 reporter ~0002356	Hello, We encounter the same error randomly with the same product version (and OS version). As for medicmomcilo, we can send additional information and logs. Thank you, Mickaël

joergs 2016-09-22 17:16 developer ~0002357	So what backends are you using? A single tape or disk? (or some other backend).

geovariances 2016-09-22 18:27 reporter ~0002358	Hello, We use disks: several pools containing multiple media stored on disk.

ianbrown78 2016-09-23 03:48 reporter ~0002359	Hi, We are encountering the same issue. We are running CentOS 7.2.1511. uname -a reports: Linux 3.10.0-327.22.2.el7.x86_64 0000001 SMP Thu Jun 23 17:05:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux We also, are only using disk backends. Cheers, Ian

medicmomcilo 2016-09-25 12:19 reporter ~0002369	Hi, We have multiple disk devices defined and that is all. However, we don't use autochanger device. Kind regards, Momcilo.

ianbrown78 2016-12-05 05:07 reporter ~0002462	Hi, Just upgraded to 16.2.4. We are still getting the same error on random jobs. What more information do you require to get this fixed? Cheers, Ian

joergs 2016-12-12 18:41 developer ~0002467	Confirmed, as several environments are affected by this problem,

joergs 2016-12-12 18:50 developer ~0002468	> What more information do you require to get this fixed? Well, a setup to reproduce the problem would be perfect. Not being able to reproduce the problem makes it hard to fix it. Maybe a debug trace might bring some insight. You can enable the debug log by the bconsole command: setdebug level=100 trace=1 dir Also director configuration and some information about the environment would be helpful.

nrobert13 2016-12-15 15:32 reporter ~0002471	Hi, I'm also getting these errors, please see my comments on the users group, maybe the info provided helps. https://groups.google.com/forum/?fromgroups=#!topic/bareos-users/Nbv-_fRWxaY I have debugging enabled for dir and sd, but it's kind of difficult to identify the related sections, because there's no timestamp in the tracefiles. I assume the number after filename:xxx- is the jobid, and I can tell you that, one of my jobs hich failed had 0 bytes, and even though failed. I cannot find the jobid at all in the sd trace, but there are some lines in the dir trace. let me try to upload the traces. thanks, Robert

nrobert13 2016-12-15 15:43 reporter	bareos-dir.trace (40,333 bytes)

nrobert13 2016-12-15 15:44 reporter	failed-job.log (3,258 bytes) 15-Dec 14:23 muc1pro-backup-1-dir JobId 8021: Start Backup JobId 8021, Job=bg1nc0502.2016-12-15_14.23.18_33 15-Dec 14:23 muc1pro-backup-1-dir JobId 8021: Using Device "DiskDeviceIncr2" to write. 15-Dec 14:23 muc1pro-backup-1-dir JobId 8021: Sending Accurate information. 15-Dec 14:23 muc1pro-backup-1-sd JobId 8021: Warning: mount.c:247 Open device "DiskDeviceIncr2" (/data/bareos-storage/Incr) Volume "" failed: ERR=Could not open file device "DiskDeviceIncr2" (/data/bareos-storage/Incr). No Volume name given. 15-Dec 14:23 muc1pro-backup-1-sd JobId 8021: Warning: mount.c:247 Open device "DiskDeviceIncr2" (/data/bareos-storage/Incr) Volume "" failed: ERR=Could not open file device "DiskDeviceIncr2" (/data/bareos-storage/Incr). No Volume name given. 15-Dec 14:23 muc1pro-backup-1-sd JobId 8021: Warning: mount.c:247 Open device "DiskDeviceIncr2" (/data/bareos-storage/Incr) Volume "" failed: ERR=Could not open file device "DiskDeviceIncr2" (/data/bareos-storage/Incr). No Volume name given. 15-Dec 14:23 muc1pro-backup-1-sd JobId 8021: Warning: mount.c:247 Open device "DiskDeviceIncr2" (/data/bareos-storage/Incr) Volume "" failed: ERR=Could not open file device "DiskDeviceIncr2" (/data/bareos-storage/Incr). No Volume name given. 15-Dec 14:23 muc1pro-backup-1-sd JobId 8021: Warning: mount.c:247 Open device "DiskDeviceIncr2" (/data/bareos-storage/Incr) Volume "" failed: ERR=Could not open file device "DiskDeviceIncr2" (/data/bareos-storage/Incr). No Volume name given. 15-Dec 14:23 bg1nc0502 JobId 8021: Fatal error: dir_cmd.c:2211 Bad response to Append Data command. Wanted 3000 OK data , got 3903 Error append data 15-Dec 14:23 muc1pro-backup-1-sd JobId 8021: Fatal error: Too many errors trying to mount device "DiskDeviceIncr2" (/data/bareos-storage/Incr). 15-Dec 14:23 muc1pro-backup-1-dir JobId 8021: Error: Bareos muc1pro-backup-1-dir 15.2.3 (07Mar16): Build OS: x86_64-redhat-linux-gnu redhat CentOS release 6.7 (Final) JobId: 8021 Job: bg1nc0502.2016-12-15_14.23.18_33 Backup Level: Incremental, since=2016-12-14 00:25:59 Client: "bg1nc0502" 14.2.2 (12Dec14) x86_64-redhat-linux-gnu,redhat,CentOS release 5.11 (Final) FileSet: "bg1nc0502" 2016-11-30 16:15:55 Pool: "DiskDaily" (From command line) Catalog: "MyCatalog" (From Job resource) Storage: "DiskStorageIncr" (From Pool resource) Scheduled time: 15-Dec-2016 14:23:16 Start time: 15-Dec-2016 14:23:20 End time: 15-Dec-2016 14:23:21 Elapsed time: 1 sec Priority: 10 FD Files Written: 0 SD Files Written: 0 FD Bytes Written: 0 (0 B) SD Bytes Written: 0 (0 B) Rate: 0.0 KB/s Software Compression: None VSS: no Encryption: no Accurate: yes Volume name(s): Volume Session Id: 77 Volume Session Time: 1481703237 Last Volume Bytes: 0 (0 B) Non-fatal FD errors: 1 SD Errors: 1 FD termination status: Fatal Error SD termination status: Fatal Error Termination: * Backup Error * failed-job.log (3,258 bytes)

joergs 2016-12-15 15:45 developer ~0002472	The xxx after the filename is the line number in the source code.

nrobert13 2016-12-15 15:48 reporter ~0002473 Last edited: 2016-12-15 15:49	I see, and after that comes the job-id, right? would you please remove the trace file from this ticket? thank you.

nrobert13 2016-12-16 08:47 reporter ~0002476	hi, I'm also getting messages like: 16-Dec 00:37 muc1pro-backup-1-sd JobId 8102: Job tw1kc0624.2016-12-16_00.13.09_12 is waiting. Cannot find any appendable volumes. Please use the "label" command to create a new Volume for: Storage: "DiskDeviceIncr2" (/data/bareos-storage/Incr) Pool: DiskDaily Media type: FileIncr although the pool limit is 600 Volumes and there are currently only 310 in use, and plenty of diskspace. If I try to run a single job, which requires new volumes to be labeled, it seem to work, but when the nightly schedule runs ( 5 in parallel) randomly throws either "No volume name given" or "Cannot find any appendable volumes" cheers, Robert

ianbrown78 2016-12-19 01:27 reporter ~0002477	Hi joergs, Our setup is as follows: Dedicated physical server is a Dell PowerEdge R710: - 2 x Intel Xeon X5650 Quad Core with Hyper Threading enabled, - 64GB RAM, - 6 x 900GB SAS in Raid5 (50GB /, 4.45TB /backup) - additional 6TB SAS via iSCSI (Dell Equallogic SANs) - 4 x 1GbE NICs connected to iSCSI VLAN in team config (round robin). - 4 x 1GbE NICs connected to Prod VLAN in team config (round robin). Running CentOS 7.2.1511 Was running Bareos 15.2 up until 1 month ago. Now running Bareos 16.2.4 - Upgraded and rejigged config files to match the new setup. Currently using MariaDB, but in the process of migrating to PGSQL. External storage node running in Google Cloud for Full & Diff migrations. Problem has persisted across versions. However this morning, I have just noted that all hourly incrementals were failing. I restarted the bareos-dir service. Now the subsequent jobs are successful. Attaching Bareos configuration as backup.tar.gz Once the jobs start failing again, I will enable debugging and rerun a job. Cheers, Ian

ianbrown78 2016-12-19 01:27 reporter	backup.tar.gz (13,401 bytes)

nrobert13 2017-01-04 16:34 reporter ~0002497	hi all, I took a look into the trace file of the director, and seems the the problem is in the volume selection. I have defined a storage with 5 different devices pointing to the same folder. the storage with max 5 jobs and each device with max 1 job. looking at the trace file, here's the SELECT count for each job to find the volume: JobID : SELECT COUONT 10472 :10 10473 :121 10474 :189 10475 :291 10476 :189 used this command: for i in `seq 2 6` ; do echo -n 1047$i : ; grep -- "-1047$i" muc1pro-backup-1-dir.trace \| egrep "SELECT" \| wc -l ; done for the first job after the last select I see: Find_next_vol MediaId=3735 MediaId=3735 means volume=Incr-3735 in all the other 4 jobs, after some selects, the same volume is found but then comes a "unwanted_volumes=Incr-3735" and continues to run other selects. after the last select I see : Find_next_vol MediaId=0 , but there's no volume with mediaid = 0 , and most likely therefore it throws a no volume name given... here you can see the selects and find_next_vol records in the trace: grep -- "-10472" muc1pro-backup-1-dir.trace \| egrep --color=always "SELECT\|Find_next_vol\|unwanted" \| less -r replace 10472 in the grep with other jobid's from above. attaching the trace file. see the attached affected jobid's above. bareos-15.2.3-trace-centos6.tar.gz Thanks, Robert

nrobert13 2017-01-04 16:34 reporter	bareos-15.2.3-trace-centos6.tar.gz (40,117 bytes)

DocFraggle 2017-03-01 17:59 reporter ~0002586	Hi all, we hit the same issue from time to time. CentOS 6, Bareos 15.2.2-37.1 No trace, only logs: 01-Mär 17:38 bareos-dir JobId 439358: console command: run BeforeJob "prune files client=clientname yes" 01-Mär 17:38 bareos-dir JobId 439358: Start Backup JobId 439358, Job=jobname.2017-03-01_17.38.33_29 01-Mär 17:38 bareos-dir JobId 439358: Using Device "FileStorageName0" to write. 01-Mär 17:39 bareos-dir JobId 439358: Fatal error: Catalog error creating JobMedia record. sql_create.c:153 Update Media record UPDATE Media SET EndFile=0, EndBlock=0 WHERE MediaId=0 failed: ERR= 01-Mär 17:39 bareos-sd JobId 439358: Fatal error: Error creating JobMedia record: 1992 Create JobMedia error 01-Mär 17:39 clientname JobId 439358: Fatal error: dir_cmd.c:2400 Bad response to Append Data command. Wanted 3000 OK data 01-Mär 17:39 bareos-sd JobId 439358: Marking Volume "VOLUME41965" in Error in Catalog. 01-Mär 17:39 bareos-sd JobId 439358: Fatal error: askdir.c:357 NULL Volume name. This shouldn't happen!!! 01-Mär 17:39 bareos-sd JobId 439358: Fatal error: Job 439358 canceled. Regards, Christian

daftu 2017-05-21 13:00 reporter ~0002643	Hi there. Sam issue, pretty sure it's because of multiple devices defined (needed to forever incremental).

medicmomcilo 2017-06-08 15:17 reporter ~0002648	I think I managed to track down properly this bug. It seems to be introduced by a large fix for split brain problem[1]. Looks like there is a race condition for volumes on backup start. This causes a lot of volumes to be 'reserved' and marked as 'unwanted' for other backups. If a request is made for a volume that is 'unwanted' then a request should be repeated. However, only case where this is properly handled is if you have ran out of all volumes and in need to purge/recycle oldest. Only in such case it would retry on failure to get volume. I am attaching proposed patch which would retry on every failure. Patch is minimal and easy to review before you decide on implementing or not. This is tested in small environment and proved to work fine. Couldn't confirm in our production due to build limitations we have. [1] https://github.com/bareos/bareos/commit/fa7a54a3b19f2f003a076b36532e277d98aaa812 Kind regards, Momo.

medicmomcilo 2017-06-08 15:17 reporter	0001-Fixes-bug-0000691-No-volume-name-given.patch (617 bytes) From eb9e6b9f95c04fcedd04f9e76b5694a277f48dc7 Mon Sep 17 00:00:00 2001 From: Momcilo Medic <medicmomcilo@gmail.com> Date: Thu, 8 Jun 2017 15:10:17 +0200 Subject: [PATCH] Fixes bug 0000691 - 'No volume name given' --- src/cats/sql_find.c \| 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/cats/sql_find.c b/src/cats/sql_find.c index 77e2106f..3687dc1f 100644 --- a/src/cats/sql_find.c +++ b/src/cats/sql_find.c @@ -493,7 +493,7 @@ retry_fetch: break; } - if (!found_candidate && find_oldest) { + if (!found_candidate) { item++; goto retry_fetch; } -- 2.13.0 0001-Fixes-bug-0000691-No-volume-name-given.patch (617 bytes)

joergs 2017-06-08 15:21 developer ~0002650	Fix committed to bareos bareos-16.2 branch with changesetid 7019.

pstorz 2017-06-08 15:23 administrator ~0002653	Fix committed to bareos bareos-15.2 branch with changesetid 7026.

joergs 2017-06-08 16:31 developer ~0002662	@Momo: thank you for your patch. We analysed this problem in a customer environment, and think the patch Philipp did make, solved the problem. It just did take longer, until we pushed it to github.

bareos: bareos-16.2 d1ff29b3 2017-04-25 12:50 pstorz Committer: joergs Ported: N/A Details Diff	Don't return empty volname if volume is on unwanted vols list When the volume was on the unwanted_list, the loop was continued but the number of sql rows was not checked, so that the code continued and returned an empty volumename. Fixes 0000691: Backup job fails with "No Volume name given."	Affected Issues 0000691
	mod - src/cats/sql_find.c	Diff File
bareos: bareos-15.2 627d8d7e 2017-04-25 12:50 pstorz Ported: N/A Details Diff	Don't return empty volname if volume is on unwanted vols list When the volume was on the unwanted_list, the loop was continued but the number of sql rows was not checked, so that the code continued and returned an empty volumename. Fixes 0000691: Backup job fails with "No Volume name given."	Affected Issues 0000691
	mod - src/cats/sql_find.c	Diff File

Date Modified	Username	Field	Change
2016-08-24 10:31	medicmomcilo	New Issue
2016-09-21 10:37	geovariances	Note Added: 0002356
2016-09-22 17:16	joergs	Note Added: 0002357
2016-09-22 18:27	geovariances	Note Added: 0002358
2016-09-23 03:48	ianbrown78	Note Added: 0002359
2016-09-25 12:19	medicmomcilo	Note Added: 0002369
2016-10-13 10:51	joergs	Relationship added	related to 0000641
2016-10-13 10:52	joergs	Relationship deleted	related to 0000641
2016-10-13 10:53	joergs	Relationship added	related to 0000647
2016-12-05 05:07	ianbrown78	Note Added: 0002462
2016-12-12 18:41	joergs	Note Added: 0002467
2016-12-12 18:41	joergs	Status	new => confirmed
2016-12-12 18:50	joergs	Note Added: 0002468
2016-12-15 15:32	nrobert13	Note Added: 0002471
2016-12-15 15:43	nrobert13	File Added: bareos-dir.trace
2016-12-15 15:44	nrobert13	File Added: failed-job.log
2016-12-15 15:45	joergs	Note Added: 0002472
2016-12-15 15:48	nrobert13	Note Added: 0002473
2016-12-15 15:49	nrobert13	Note Edited: 0002473
2016-12-16 08:47	nrobert13	Note Added: 0002476
2016-12-19 01:27	ianbrown78	Note Added: 0002477
2016-12-19 01:27	ianbrown78	File Added: backup.tar.gz
2017-01-04 16:34	nrobert13	Note Added: 0002497
2017-01-04 16:34	nrobert13	File Added: bareos-15.2.3-trace-centos6.tar.gz
2017-03-01 17:59	DocFraggle	Note Added: 0002586
2017-05-21 13:00	daftu	Note Added: 0002643
2017-06-08 12:43	joergs	Relationship added	child of 0000794
2017-06-08 12:44	joergs	Relationship added	child of 0000798
2017-06-08 15:17	medicmomcilo	Note Added: 0002648
2017-06-08 15:17	medicmomcilo	File Added: 0001-Fixes-bug-0000691-No-volume-name-given.patch
2017-06-08 15:21	joergs	Changeset attached	=> bareos bareos-16.2 d1ff29b3
2017-06-08 15:21	joergs	Note Added: 0002650
2017-06-08 15:21	joergs	Status	confirmed => resolved
2017-06-08 15:21	joergs	Resolution	open => fixed
2017-06-08 15:23	pstorz	Changeset attached	=> bareos bareos-15.2 627d8d7e
2017-06-08 15:23	pstorz	Note Added: 0002653
2017-06-08 16:31	joergs	Note Added: 0002662
2019-12-18 15:25	arogge	Status	resolved => closed

Reporting new Issues is disabled, please Report new Issues at https://github.com/bareos/bareos/issues

View Issue Details

Relationships

Activities

Related Changesets

Issue History

related to	0000647	closed		Automatic mount of disk volumes intermittently fails
child of	0000794	closed	stephand	Release bareos-16.2.6
child of	0000798	closed	maik	Release Bareos 15.2.5