Bareos Bug Tracker
Bareos Bug Tracker

View Issue Details Jump to Notes ] Related Changesets ] Issue History ] Print ]
IDProjectCategoryView StatusDate SubmittedLast Update
0000691bareos-core[All Projects] storage daemonpublic2016-08-24 10:312017-06-08 16:31
Reportermedicmomcilo 
Assigned To 
PrioritynormalSeveritymajorReproducibilityrandom
StatusresolvedResolutionfixed 
PlatformLinuxOSUbuntuOS Version14.04
Product Version15.2.2 
Target VersionFixed in Version 
Summary0000691: Backup job fails with "No Volume name given."
DescriptionWe have occasional failed backup job due to 'No volume name given.' error.
Job can be manually re-ran and it is successful.

This job is running perfectly fine most of the time.

Digging through the code I found that this error message should only be seen if using AutoChanger, which we're not. From the src/stored/dev.c:

   /*
    * If this is a virtual autochanger (i.e. changer_res != NULL) we simply use
    * the device name, assuming it has been appropriately setup by the "autochanger".
    */
   if (!device->changer_res || device->changer_command[0] == 0) {
      if (VolCatInfo.VolCatName[0] == 0) {
         Mmsg(errmsg, _("Could not open file device %s. No Volume name given.\n"),
            print_name());
         clear_opened();
         return;
      }

      if (!IsPathSeparator(archive_name.c_str()[strlen(archive_name.c_str())-1])) {
         pm_strcat(archive_name, "/");
      }
      pm_strcat(archive_name, getVolCatName());
}

Do let me know what additional information if any, you require.
Steps To ReproduceUnfortunately this is random occurance.
TagsNo tags attached.
bareos-master: impactyes
bareos-master: actionwill care
bareos-16.2: impactyes
bareos-16.2: actionwill care
bareos-15.2: impactyes
bareos-15.2: action
bareos-14.2: impact
bareos-14.2: actionnone
bareos-13.2: impact
bareos-13.2: actionnone
bareos-12.4: impact
bareos-12.4: actionnone
Attached Files? file icon bareos-dir.trace [^] (40,333 bytes) 2016-12-15 15:43
log file icon failed-job.log [^] (3,258 bytes) 2016-12-15 15:44
gz file icon backup.tar.gz [^] (13,401 bytes) 2016-12-19 01:27
gz file icon bareos-15.2.3-trace-centos6.tar.gz [^] (40,117 bytes) 2017-01-04 16:34
patch file icon 0001-Fixes-bug-0000691-No-volume-name-given.patch [^] (617 bytes) 2017-06-08 15:17 [Show Content]

- Relationships
related to 0000647closed Automatic mount of disk volumes intermittently fails 
child of 0000794closedstephand Release bareos-16.2.6 
child of 0000798assignedmaik Release bareos-15.2.5 

-  Notes
(0002356)
geovariances (reporter)
2016-09-21 10:37

Hello,

We encounter the same error randomly with the same product version (and OS version).
As for medicmomcilo, we can send additional information and logs.

Thank you,

Mickaël
(0002357)
joergs (administrator)
2016-09-22 17:16

So what backends are you using? A single tape or disk? (or some other backend).
(0002358)
geovariances (reporter)
2016-09-22 18:27

Hello,

We use disks: several pools containing multiple media stored on disk.
(0002359)
ianbrown78 (reporter)
2016-09-23 03:48

Hi,

We are encountering the same issue.
We are running CentOS 7.2.1511.
uname -a reports: Linux 3.10.0-327.22.2.el7.x86_64 0000001 SMP Thu Jun 23 17:05:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
We also, are only using disk backends.

Cheers,
Ian
(0002369)
medicmomcilo (reporter)
2016-09-25 12:19

Hi,

We have multiple disk devices defined and that is all.
However, we don't use autochanger device.

Kind regards,
Momcilo.
(0002462)
ianbrown78 (reporter)
2016-12-05 05:07

Hi,

Just upgraded to 16.2.4.
We are still getting the same error on random jobs.

What more information do you require to get this fixed?

Cheers,
Ian
(0002467)
joergs (administrator)
2016-12-12 18:41

Confirmed, as several environments are affected by this problem,
(0002468)
joergs (administrator)
2016-12-12 18:50

> What more information do you require to get this fixed?

Well, a setup to reproduce the problem would be perfect. Not being able to reproduce the problem makes it hard to fix it.

Maybe a debug trace might bring some insight. You can enable the debug log by the bconsole command:
setdebug level=100 trace=1 dir

Also director configuration and some information about the environment would be helpful.
(0002471)
nrobert13 (reporter)
2016-12-15 15:32

Hi,

I'm also getting these errors, please see my comments on the users group, maybe the info provided helps.
https://groups.google.com/forum/?fromgroups=#!topic/bareos-users/Nbv-_fRWxaY [^]

I have debugging enabled for dir and sd, but it's kind of difficult to identify the related sections, because there's no timestamp in the tracefiles.

I assume the number after filename:xxx- is the jobid, and I can tell you that, one of my jobs hich failed had 0 bytes, and even though failed. I cannot find the jobid at all in the sd trace, but there are some lines in the dir trace.

let me try to upload the traces.

thanks,
Robert
(0002472)
joergs (administrator)
2016-12-15 15:45

The xxx after the filename is the line number in the source code.
(0002473)
nrobert13 (reporter)
2016-12-15 15:48
edited on: 2016-12-15 15:49

I see, and after that comes the job-id, right?

would you please remove the trace file from this ticket?

thank you.

(0002476)
nrobert13 (reporter)
2016-12-16 08:47

hi,

I'm also getting messages like:

16-Dec 00:37 muc1pro-backup-1-sd JobId 8102: Job tw1kc0624.2016-12-16_00.13.09_12 is waiting. Cannot find any appendable volumes.
Please use the "label" command to create a new Volume for:
    Storage: "DiskDeviceIncr2" (/data/bareos-storage/Incr)
    Pool: DiskDaily
    Media type: FileIncr

although the pool limit is 600 Volumes and there are currently only 310 in use, and plenty of diskspace. If I try to run a single job, which requires new volumes to be labeled, it seem to work, but when the nightly schedule runs ( 5 in parallel) randomly throws either "No volume name given" or "Cannot find any appendable volumes"

cheers,
Robert
(0002477)
ianbrown78 (reporter)
2016-12-19 01:27

Hi joergs,
Our setup is as follows:
Dedicated physical server is a Dell PowerEdge R710:
 - 2 x Intel Xeon X5650 Quad Core with Hyper Threading enabled,
 - 64GB RAM,
 - 6 x 900GB SAS in Raid5 (50GB /, 4.45TB /backup)
 - additional 6TB SAS via iSCSI (Dell Equallogic SANs)
 - 4 x 1GbE NICs connected to iSCSI VLAN in team config (round robin).
 - 4 x 1GbE NICs connected to Prod VLAN in team config (round robin).

Running CentOS 7.2.1511
Was running Bareos 15.2 up until 1 month ago.
Now running Bareos 16.2.4 - Upgraded and rejigged config files to match the new setup.
Currently using MariaDB, but in the process of migrating to PGSQL.
External storage node running in Google Cloud for Full & Diff migrations.

Problem has persisted across versions. However this morning, I have just noted that all hourly incrementals were failing. I restarted the bareos-dir service. Now the subsequent jobs are successful.

Attaching Bareos configuration as backup.tar.gz
Once the jobs start failing again, I will enable debugging and rerun a job.

Cheers,
Ian
(0002497)
nrobert13 (reporter)
2017-01-04 16:34

hi all,

I took a look into the trace file of the director, and seems the the problem is in the volume selection. I have defined a storage with 5 different devices pointing to the same folder. the storage with max 5 jobs and each device with max 1 job.

looking at the trace file, here's the SELECT count for each job to find the volume:

JobID : SELECT COUONT
10472 :10
10473 :121
10474 :189
10475 :291
10476 :189

used this command:
for i in `seq 2 6` ; do echo -n 1047$i : ; grep -- "-1047$i" muc1pro-backup-1-dir.trace | egrep "SELECT" | wc -l ; done


for the first job after the last select I see:
Find_next_vol MediaId=3735

MediaId=3735 means volume=Incr-3735

in all the other 4 jobs, after some selects, the same volume is found but then comes a "unwanted_volumes=Incr-3735" and continues to run other selects.
after the last select I see : Find_next_vol MediaId=0 , but there's no volume with mediaid = 0 , and most likely therefore it throws a no volume name given...

here you can see the selects and find_next_vol records in the trace:

grep -- "-10472" muc1pro-backup-1-dir.trace | egrep --color=always "SELECT|Find_next_vol|unwanted" | less -r

replace 10472 in the grep with other jobid's from above.

attaching the trace file. see the attached affected jobid's above.

bareos-15.2.3-trace-centos6.tar.gz

Thanks,
Robert
(0002586)
DocFraggle (reporter)
2017-03-01 17:59

Hi all,

we hit the same issue from time to time. CentOS 6, Bareos 15.2.2-37.1

No trace, only logs:

01-Mär 17:38 bareos-dir JobId 439358: console command: run BeforeJob "prune files client=clientname yes"
01-Mär 17:38 bareos-dir JobId 439358: Start Backup JobId 439358, Job=jobname.2017-03-01_17.38.33_29
01-Mär 17:38 bareos-dir JobId 439358: Using Device "FileStorageName0" to write.
01-Mär 17:39 bareos-dir JobId 439358: Fatal error: Catalog error creating JobMedia record. sql_create.c:153 Update Media record UPDATE Media SET EndFile=0, EndBlock=0 WHERE MediaId=0 failed: ERR=
01-Mär 17:39 bareos-sd JobId 439358: Fatal error: Error creating JobMedia record: 1992 Create JobMedia error
01-Mär 17:39 clientname JobId 439358: Fatal error: dir_cmd.c:2400 Bad response to Append Data command. Wanted 3000 OK data
01-Mär 17:39 bareos-sd JobId 439358: Marking Volume "VOLUME41965" in Error in Catalog.
01-Mär 17:39 bareos-sd JobId 439358: Fatal error: askdir.c:357 NULL Volume name. This shouldn't happen!!!
01-Mär 17:39 bareos-sd JobId 439358: Fatal error: Job 439358 canceled.

Regards, Christian
(0002643)
daftu (reporter)
2017-05-21 13:00

Hi there.
Sam issue, pretty sure it's because of multiple devices defined (needed to forever incremental).
(0002648)
medicmomcilo (reporter)
2017-06-08 15:17

I think I managed to track down properly this bug.
It seems to be introduced by a large fix for split brain problem[1].

Looks like there is a race condition for volumes on backup start.
This causes a lot of volumes to be 'reserved' and marked as 'unwanted' for other backups.
If a request is made for a volume that is 'unwanted' then a request should be repeated.
However, only case where this is properly handled is if you have ran out of all volumes and in need to purge/recycle oldest.
Only in such case it would retry on failure to get volume.

I am attaching proposed patch which would retry on every failure.
Patch is minimal and easy to review before you decide on implementing or not.

This is tested in small environment and proved to work fine.
Couldn't confirm in our production due to build limitations we have.

[1] https://github.com/bareos/bareos/commit/fa7a54a3b19f2f003a076b36532e277d98aaa812 [^]

Kind regards,
Momo.
(0002650)
joergs (administrator)
2017-06-08 15:21

Fix committed to bareos bareos-16.2 branch with changesetid 7019.
(0002653)
pstorz (administrator)
2017-06-08 15:23

Fix committed to bareos bareos-15.2 branch with changesetid 7026.
(0002662)
joergs (administrator)
2017-06-08 16:31

@Momo: thank you for your patch. We analysed this problem in a customer environment, and think the patch Philipp did make, solved the problem. It just did take longer, until we pushed it to github.

- Related Changesets
bareos: bareos-16.2 d1ff29b3
Timestamp: 2017-04-25 10:50:44
Author: pstorz
Committer: joergs
Ported: N/A
Details ] Diff ]
Don't return empty volname if volume is on unwanted vols list

When the volume was on the unwanted_list, the loop was
continued but the number of sql rows was not checked, so that
the code continued and returned an empty volumename.

Fixes 0000691: Backup job fails with "No Volume name given."
mod - src/cats/sql_find.c Diff ] File ]
bareos: bareos-15.2 627d8d7e
Timestamp: 2017-04-25 10:50:44
Author: pstorz
Ported: N/A
Details ] Diff ]
Don't return empty volname if volume is on unwanted vols list

When the volume was on the unwanted_list, the loop was
continued but the number of sql rows was not checked, so that
the code continued and returned an empty volumename.

Fixes 0000691: Backup job fails with "No Volume name given."
mod - src/cats/sql_find.c Diff ] File ]

- Issue History
Date Modified Username Field Change
2016-08-24 10:31 medicmomcilo New Issue
2016-09-21 10:37 geovariances Note Added: 0002356
2016-09-22 17:16 joergs Note Added: 0002357
2016-09-22 18:27 geovariances Note Added: 0002358
2016-09-23 03:48 ianbrown78 Note Added: 0002359
2016-09-25 12:19 medicmomcilo Note Added: 0002369
2016-10-13 10:51 joergs Relationship added related to 0000641
2016-10-13 10:52 joergs Relationship deleted related to 0000641
2016-10-13 10:53 joergs Relationship added related to 0000647
2016-12-05 05:07 ianbrown78 Note Added: 0002462
2016-12-12 18:41 joergs Note Added: 0002467
2016-12-12 18:41 joergs Status new => confirmed
2016-12-12 18:44 joergs bareos-master: impact => yes
2016-12-12 18:44 joergs bareos-master: action => will care
2016-12-12 18:44 joergs bareos-16.2: impact => yes
2016-12-12 18:44 joergs bareos-16.2: action => will care
2016-12-12 18:44 joergs bareos-15.2: impact => yes
2016-12-12 18:44 joergs bareos-14.2: action => none
2016-12-12 18:44 joergs bareos-13.2: action => none
2016-12-12 18:44 joergs bareos-12.4: action => none
2016-12-12 18:50 joergs Note Added: 0002468
2016-12-15 15:32 nrobert13 Note Added: 0002471
2016-12-15 15:43 nrobert13 File Added: bareos-dir.trace
2016-12-15 15:44 nrobert13 File Added: failed-job.log
2016-12-15 15:45 joergs Note Added: 0002472
2016-12-15 15:48 nrobert13 Note Added: 0002473
2016-12-15 15:49 nrobert13 Note Edited: 0002473 View Revisions
2016-12-16 08:47 nrobert13 Note Added: 0002476
2016-12-19 01:27 ianbrown78 Note Added: 0002477
2016-12-19 01:27 ianbrown78 File Added: backup.tar.gz
2017-01-04 16:34 nrobert13 Note Added: 0002497
2017-01-04 16:34 nrobert13 File Added: bareos-15.2.3-trace-centos6.tar.gz
2017-03-01 17:59 DocFraggle Note Added: 0002586
2017-05-21 13:00 daftu Note Added: 0002643
2017-06-08 12:43 joergs Relationship added child of 0000794
2017-06-08 12:44 joergs Relationship added child of 0000798
2017-06-08 15:17 medicmomcilo Note Added: 0002648
2017-06-08 15:17 medicmomcilo File Added: 0001-Fixes-bug-0000691-No-volume-name-given.patch
2017-06-08 15:21 joergs Changeset attached => bareos bareos-16.2 d1ff29b3
2017-06-08 15:21 joergs Note Added: 0002650
2017-06-08 15:21 joergs Status confirmed => resolved
2017-06-08 15:21 joergs Resolution open => fixed
2017-06-08 15:23 pstorz Changeset attached => bareos bareos-15.2 627d8d7e
2017-06-08 15:23 pstorz Note Added: 0002653
2017-06-08 16:31 joergs Note Added: 0002662


Copyright © 2000 - 2017 MantisBT Team
Powered by Mantis Bugtracker