View Issue Details

ID: 0000528
Project: bareos-core
Category: director
View Status: public
Last Update: 2019-12-18 15:25
Reporter: jkhradil
Assigned To: pstorz
Priority: normal
Severity: minor
Reproducibility: always
Status: closed
Resolution: fixed
Summary: 0000528: Migration job hangs waiting for waiting for max Storage jobs
Description: Since version 15.2, migration (copy) jobs do not work when run from a schedule. The control job doesn't get the next pool setting applied and hangs waiting for max Storage jobs. This doesn't happen when the job is run manually, because in that case the next pool is set in the reset_restore_context function (ua_run.c).

Patch fixing this issue is attached.
Steps To Reproduce:
1) Define a migration or copy job as per the documentation
2) Set this job to run from a schedule and wait for the specified time
3) See the job hang waiting for max Storage jobs
Tags: No tags attached.

Activities

jkhradil

2015-10-01 09:38

reporter  

0001-Fix-migration-control-job-hanging-due-to-next-pool-n.patch (858 bytes)   
From 3bf26b78b49992c15c9b06891c1068eb686ae783 Mon Sep 17 00:00:00 2001
From: Jakub Hradil <jkhradil@gmail.com>
Date: Thu, 1 Oct 2015 09:12:38 +0200
Subject: [PATCH] Fix migration control job hanging due to next pool not being
 set when run from schedule

---
 src/dird/migrate.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/src/dird/migrate.c b/src/dird/migrate.c
index a8b9d1c..cdbdf31 100644
--- a/src/dird/migrate.c
+++ b/src/dird/migrate.c
@@ -1223,6 +1223,11 @@ bool do_migration_init(JCR *jcr)
        * one to the writing SD.
        */
       jcr->remote_replicate = !is_same_storage_daemon(jcr->res.rstore, jcr->res.wstore);
+   } else {
+      /*
+       * Set next pool even for control job, otherwise it will hang waiting for max Storage jobs
+       */
+      set_migration_next_pool(jcr, &pool);
    }
 
    return true;
-- 
2.4.3

pstorz

2015-10-01 13:10

administrator   ~0001856

Hello,

we have been using 15.2.1 for quite some time, and we also use copy jobs, but they have always worked.

Reproducing the problem in a regression test was not successful either.

Can you please specify how exactly to reproduce the problem?


Thank you very much

jkhradil

2015-10-01 15:29

reporter   ~0001857

My configuration files for the director and storage daemon are in the attached file bareos.zip. This configuration worked with version 14.2.4 on CentOS 7; after upgrading to version 15.2.1, the copy job hangs waiting for max Storage jobs.

The same happens with a build from master on Fedora.

Steps to reproduce:
1) create empty bareos database
2) use the attached config files
3) schedule Full and Copy jobs (DIR_Bareos_Schedule_Bareos_Backup, DIR_Bareos_Schedule_Bareos_Copy)
4) see full backup finish successfully
5) see copy job hang, director's status shows: "job" is waiting for max Storage jobs

Running this scenario under a debugger, I see it go through the do_migration_init(JCR *jcr) function only once, to init the control job. It jumps over the if (jcr->MigrateJobId != 0) block (this block is new since version 14.2) and never enters the do_migration_init(JCR *jcr) function a second time to init the copy job.

If I run the copy job manually, the do_migration_init(JCR *jcr) function is executed twice and the job runs successfully. The difference I can see is that in a scheduled job the rstorage and wstorage are the same, whereas in a manually run job they differ, since wstorage is set to the right value in the reset_restore_context function (ua_run.c).
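
As a rough illustration of the difference described above, here is a minimal toy model in C. It is not Bareos code: the types toy_storage, toy_pool and toy_job and the functions toy_job_init, toy_apply_next_pool and toy_same_storage are hypothetical names invented for this sketch. It only assumes what the comment states, namely that read and write storage start out identical and that the write storage differs once the next pool has been applied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy types; these are NOT the Bareos JCR structures. */
typedef struct {
    const char *name;
} toy_storage;

typedef struct toy_pool {
    const char *name;
    toy_storage *storage;        /* storage that backs this pool */
    struct toy_pool *next_pool;  /* destination pool for copy/migration */
} toy_pool;

typedef struct {
    toy_pool *pool;
    toy_storage *rstore;         /* storage to read from */
    toy_storage *wstore;         /* storage to write to */
} toy_job;

/* Initialise a job from its pool: both storages come from the same pool,
 * which mirrors the scheduled case where rstorage == wstorage. */
void toy_job_init(toy_job *job, toy_pool *pool)
{
    assert(job != NULL && pool != NULL);
    job->pool = pool;
    job->rstore = pool->storage;
    job->wstore = pool->storage;
}

/* Apply the next pool: the write storage now comes from the next pool.
 * Returns false when the pool has no next pool configured. */
bool toy_apply_next_pool(toy_job *job)
{
    assert(job != NULL && job->pool != NULL);
    if (job->pool->next_pool == NULL) {
        return false;
    }
    job->wstore = job->pool->next_pool->storage;
    return true;
}

/* True while the job would read from and write to the same storage. */
bool toy_same_storage(const toy_job *job)
{
    assert(job != NULL);
    return job->rstore == job->wstore;
}
```

In this model, a job that has only gone through toy_job_init reports toy_same_storage as true; after toy_apply_next_pool succeeds, the two storages differ, which corresponds to the manually run case.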

pstorz

2015-10-02 10:14

administrator   ~0001862

Hello,

looks like you forgot to upload the bareos.zip file. Is that right?

jkhradil

2015-10-02 11:38

reporter  

bareos.zip (4,422 bytes)

jkhradil

2015-10-02 11:39

reporter   ~0001863

Yeah, sorry about that. I selected the file, but forgot to click the Upload File button. Now it's uploaded.

pstorz

2015-10-02 12:19

administrator   ~0001864

Fixing your problem is very easy: when the director reports "waiting for max Storage jobs", you only have to increase the maximum concurrent jobs on your storage to something more than one.

I added

"Maximum Concurrent Jobs = 10"

to each of your Storages in your storage.conf, and everything works without any change.


However we will have a look at your patch anyway.

Thanks

best regards

Philipp

mvwieringen

2015-11-19 14:49

developer   ~0002004

Fix committed to bareos bareos-15.2 branch with changesetid 5893.

Related Changesets

bareos: bareos-15.2 4cc7481f

2015-11-17 16:27

pstorz


Committer: mvwieringen

Ported: N/A

migration control jobs don't count for concurrency

Migration control jobs do not touch the storage in any way
so they do not need to be counted when checking the maximum
concurrent jobs for storages.

Also did a cleanup of the code and comments along the way.

Fixes 0000528: Migration job hangs waiting for waiting for max Storage jobs

Signed-off-by: Marco van Wieringen <marco.van.wieringen@bareos.com>
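
The idea in this commit message can be sketched with a small, simplified model in C. This is not the actual jobq.c change: storage_slot_t, queued_job_t, try_acquire_storage, release_storage and the skip_control_jobs flag are hypothetical names used only for illustration. The sketch assumes that each storage has a concurrency limit and that, before the fix, a control job was counted against it like any other job.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model; not the real Bareos job queue. */
typedef struct {
    int max_concurrent_jobs;  /* limit configured for the storage */
    int running_jobs;         /* jobs currently counted against the limit */
} storage_slot_t;

typedef struct {
    bool is_control_job;      /* true for a migration/copy control job */
    bool holds_slot;          /* true while the job is counted */
} queued_job_t;

/* Try to admit a job on a storage. When skip_control_jobs is true,
 * control jobs are admitted without being counted, because they do not
 * touch the storage. Returns false if the job has to wait. */
bool try_acquire_storage(storage_slot_t *st, queued_job_t *job,
                         bool skip_control_jobs)
{
    assert(st != NULL && job != NULL);
    job->holds_slot = false;
    if (skip_control_jobs && job->is_control_job) {
        return true;          /* admitted, but not counted */
    }
    if (st->running_jobs >= st->max_concurrent_jobs) {
        return false;         /* "waiting for max Storage jobs" */
    }
    st->running_jobs++;
    job->holds_slot = true;
    return true;
}

/* Give back the slot, if the job was counted at all. */
void release_storage(storage_slot_t *st, queued_job_t *job)
{
    assert(st != NULL && job != NULL);
    if (job->holds_slot) {
        st->running_jobs--;
        job->holds_slot = false;
    }
}
```

With a limit of one and skip_control_jobs set to false, the control job takes the only slot and the copy job it starts has to wait. With skip_control_jobs set to true, the control job is not counted and the copy job is admitted. Raising the limit above one, as suggested earlier in the thread, has the same effect in this model.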
Affected Issues
0000528
mod - src/dird/jobq.c
mod - src/include/jcr.h

Issue History

Date Modified Username Field Change
2015-10-01 09:38 jkhradil New Issue
2015-10-01 09:38 jkhradil File Added: 0001-Fix-migration-control-job-hanging-due-to-next-pool-n.patch
2015-10-01 13:10 pstorz Note Added: 0001856
2015-10-01 13:10 pstorz Assigned To => pstorz
2015-10-01 13:10 pstorz Status new => feedback
2015-10-01 15:29 jkhradil Note Added: 0001857
2015-10-01 15:29 jkhradil Status feedback => assigned
2015-10-02 10:14 pstorz Note Added: 0001862
2015-10-02 11:38 jkhradil File Added: bareos.zip
2015-10-02 11:39 jkhradil Note Added: 0001863
2015-10-02 12:19 pstorz Note Added: 0001864
2015-11-19 14:49 mvwieringen Changeset attached => bareos bareos-15.2 4cc7481f
2015-11-19 14:49 mvwieringen Note Added: 0002004
2015-11-19 14:49 mvwieringen Status assigned => resolved
2015-11-19 14:49 mvwieringen Resolution open => fixed
2019-12-18 15:25 arogge Status resolved => closed