View Issue Details

ID: 0001122
Project: bareos-core
Category: [All Projects] General
View Status: public
Last Update: 2019-10-18 19:32
Reporter: xyros
Assigned To:
Priority: normal
Severity: major
Reproducibility: always
Status: new
Resolution: open
Platform: Linux
OS: Ubuntu
OS Version: 16.04
Product Version: 18.2.6
Fixed in Version:
Summary: 0001122: Consolidate queues jobs, then orphans them indefinitely, but falsely reports status as "Consolidate OK" for the last job queued
Description

My Consolidate job never succeeds: it quickly terminates with "Consolidate OK" while leaving all the VirtualFull jobs it started queued and orphaned.

In the WebUI listing for the allegedly successful Consolidate run, the client whose job was queued last (by job ID) is always shown as the successful one; however, the level is "Incremental," nothing is actually done, and that client's VirtualFull job is in fact still queued up with all the other clients' jobs.

In bconsole the status is similar to this:

Running Jobs:
Console connected at 15-Oct-19 15:06
 JobId Level Name Status
======================================================================
   636 Virtual PandoraFMS.2019-10-15_14.33.02_06 is waiting on max Storage jobs
   637 Virtual MongoDB.2019-10-15_14.33.03_09 is waiting on max Storage jobs
   638 Virtual DNS-DHCP.2019-10-15_14.33.04_11 is waiting on max Storage jobs
   639 Virtual Desktop_1.2019-10-15_14.33.05_19 is waiting on max Storage jobs
   640 Virtual Desktop_2.2019-10-15_14.33.05_20 is waiting on max Storage jobs
   641 Virtual Desktop_3.2019-10-15_14.33.06_21 is waiting on max Storage jobs
====


Given the above output, for example, the WebUI would show the following:

    JobId  Name         Client            Type         Level        Files  Bytes   Errors  Status
      642  Consolidate  desktop3-fd.hq    Consolidate  Incremental      0  0.00 B       0  Success
      641  Desktop_3    desktop3-fd.hq    Backup       VirtualFull      0  0.00 B       0  Queued
      640  Desktop_2    desktop2-fd.hq    Backup       VirtualFull      0  0.00 B       0  Queued
      639  Desktop_1    desktop1-fd.hq    Backup       VirtualFull      0  0.00 B       0  Queued
      638  DNS-DHCP     dns-dhcp-fd.hq    Backup       VirtualFull      0  0.00 B       0  Queued
      637  MongoDB      mongodb-fd.hq     Backup       VirtualFull      0  0.00 B       0  Queued
      636  PandoraFMS   pandorafms-fd.hq  Backup       VirtualFull      0  0.00 B       0  Queued


I don't know if this has anything to do with the fact that I have multiple storage definitions: one for each VLAN the server is on, plus an additional one dedicated to the storage addressable on the default IP (see bareos-dir/storage/File.conf in the attached bareos.zip file). Technically this should not matter, but I get the impression Bareos has not been designed/tested to work elegantly in an environment where the server participates in multiple VLANs.
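For illustration, the storage layout looks roughly like the following abbreviated sketch (the addresses, device names, and passwords are placeholders, not my real values; the actual definitions are in the attached File.conf):

# Sketch of the per-VLAN storage definitions in bareos-dir
# (placeholder values, not the real config)
Storage {
  Name = File-AI-VLAN105          # storage addressed via VLAN 105
  Address = 10.1.105.10
  Password = "secret"
  Device = FileStorage-VLAN105
  Media Type = File
}

Storage {
  Name = File-AI-VLAN107          # storage addressed via VLAN 107
  Address = 10.1.107.10
  Password = "secret"
  Device = FileStorage-VLAN107
  Media Type = File
}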

The reason I'm using VLANs is so that connections do not have to go through a router to reach the clients. Therefore, the full network bandwidth of each LAN segment is available to the Bareos client/server data transfer.

I've tried debugging the Consolidate backup process using "bareos-dir -d 400 >> /var/log/bareos-dir.log"; however, I get nothing that clearly identifies the issue. I have attached a truncated log file containing activity starting with the queuing of the second-to-last job. I've cut the log off at the point where it gets stuck endlessly cycling with output like:

bareos-dir (50): dird/jobq.cc:951-0 Inc Rstore=File-AI-VLAN105 rncj=1
bareos-dir (50): dird/jobq.cc:1004-0 Fail to acquire Wstore=File-AI-VLAN105 wncj=1
bareos-dir (50): dird/jobq.cc:971-0 Dec Rstore=File-AI-VLAN105 rncj=0
bareos-dir (50): dird/jobq.cc:951-0 Inc Rstore=File-AI-VLAN105 rncj=1
bareos-dir (50): dird/jobq.cc:1004-0 Fail to acquire Wstore=File-AI-VLAN105 wncj=1
bareos-dir (50): dird/jobq.cc:971-0 Dec Rstore=File-AI-VLAN105 rncj=0
bareos-dir (50): dird/jobq.cc:951-0 Inc Rstore=File-AI-VLAN107 rncj=1
bareos-dir (50): dird/jobq.cc:1004-0 Fail to acquire Wstore=File-AI-VLAN107 wncj=1
bareos-dir (50): dird/jobq.cc:971-0 Dec Rstore=File-AI-VLAN107 rncj=0
bareos-dir (50): dird/jobq.cc:951-0 Inc Rstore=File-AI-VLAN107 rncj=1
bareos-dir (50): dird/jobq.cc:1004-0 Fail to acquire Wstore=File-AI-VLAN107 wncj=1
etc...
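If I'm reading those counters right, rncj and wncj are the read- and write-storage concurrent job counts, and the director repeatedly fails to acquire the write store because it is already at its limit (wncj=1). That would be consistent with the Storage resources running at the director's default of one concurrent job. Purely as a guess on my part (unverified; I don't know whether this actually avoids the deadlock), the relevant knob would be something like:

# Speculative sketch only: raise the per-storage concurrency limit in the
# director's Storage resource (Maximum Concurrent Jobs defaults to 1 there)
Storage {
  Name = File-AI-VLAN105
  ...
  Maximum Concurrent Jobs = 10
}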

For convenience, I have attached the most relevant excerpts of my configuration files (sanitized for privacy/security reasons).

I suspect there's a bug that is responsible for this; however, I'm unable to make heads or tails of what's going on.

Could someone please take a look?

Thanks
Steps To Reproduce

1. Place the Bareos host on a network switch (virtual or physical) with tagged VLANs.
2. Configure the Bareos host to have connectivity on three or more VLANs.
3. Make sure you have clients you can back up on each of the VLANs.
4. Use the attached config files as a reference for setting up the storages and jobs for testing (a rough sketch follows below).
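For orientation, the jobs follow the standard Bareos Always Incremental scheme. A minimal sketch of the job side (illustrative placeholders only, not a copy of the attached files):

# Per-client backup job using the Always Incremental scheme
Job {
  Name = "Desktop_3"
  Client = desktop3-fd.hq
  Storage = File-AI-VLAN105            # the storage on this client's VLAN
  Accurate = yes                       # required for Always Incremental
  Always Incremental = yes
  Always Incremental Job Retention = 7 days
  Always Incremental Keep Number = 7
  ...
}

# Single Consolidate job; it spawns one VirtualFull per eligible client
Job {
  Name = "Consolidate"
  Type = Consolidate
  Accurate = yes
  ...
}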
Tags: always incremental, consolidate

Activities

xyros (reporter), 2019-10-18 19:32

bareos.zip (9,113 bytes)
bareos-dir.log (41,361 bytes)

Issue History

Date Modified Username Field Change
2019-10-18 19:32 xyros New Issue
2019-10-18 19:32 xyros Tag Attached: always incremental
2019-10-18 19:32 xyros Tag Attached: consolidate
2019-10-18 19:32 xyros File Added: bareos.zip
2019-10-18 19:32 xyros File Added: bareos-dir.log