View Issue Details
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0001239||bareos-core||storage daemon||public||2020-05-12 11:27||2020-05-12 11:27|
|Priority||normal||Severity||major||Reproducibility||have not tried|
|Summary||0001239: A running job stole a tape from another running job|
I'm not sure whether this is a bug or a limitation, but yesterday I saw some unexpected behavior.
The setup is a tape library with 2 drives.
I started a restore job JOB_REC_1 that needed 4 tapes, let's say T1, T2, T3, T4. While it was reading from T3, a scheduled backup job JOB_BACK_1 started and used T4. So when JOB_REC_1 asked for T4, it was not available because JOB_BACK_1 was using it, which is expected. JOB_REC_1 started waiting for T4.
A few minutes later, other jobs JOB_BACK_2, JOB_BACK_3 and JOB_BACK_4 were queued (max 2 jobs in parallel in the Bareos config). When JOB_BACK_1 finished, JOB_BACK_2 started writing to the tape. I was a little surprised to see this, because I expected JOB_REC_1 to have priority for the tape. I waited a few minutes, then JOB_BACK_3 started. In its logs I saw that it was "Ready to append to end of Volume R0B018L7 at file=1097". At that moment, JOB_REC_1 unloaded the tape from JOB_BACK_3's drive and loaded it into its own drive. Of course JOB_BACK_3 got an I/O error: "Error: stored/block.cc:804 Write error at 1097:0 on device "Drive-2" (/dev/nst1). ERR=Erreur d'entrée/sortie." (in English: "Input/output error"), and Bareos changed the tape status to Full.
I think the restore itself succeeded, even though the job was reported as "failed" because Bareos found an entry in its database that was not written to the tape.
After I manually changed the tape status back to Append, the backup jobs succeeded (with a warning for JOB_BACK_3).
But I wonder if there is a bug in media locking…
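To make the behavior I expected more concrete, here is a minimal Python sketch of a first-come, first-served volume reservation. It is only a toy model and not Bareos code: the class `VolumeReservations` and its methods `request` and `release` are hypothetical names invented for this illustration, and I am assuming that a job waiting for a volume should be served before jobs that ask for it later.

```python
from collections import deque


class VolumeReservations:
    """Toy FIFO reservation table (hypothetical, not the Bareos implementation).

    Each volume has at most one holder; other jobs wait in arrival order.
    """

    def __init__(self):
        self.holder = {}   # volume name -> job currently using it
        self.waiters = {}  # volume name -> deque of jobs waiting for it

    def request(self, job, volume):
        """Return True if `job` gets `volume` now, False if it has to wait."""
        if volume not in self.holder:
            self.holder[volume] = job
            return True
        self.waiters.setdefault(volume, deque()).append(job)
        return False

    def release(self, job, volume):
        """Release `volume` and hand it to the longest-waiting job, if any.

        Returns the new holder, or None if nobody was waiting.
        """
        if self.holder.get(volume) != job:
            raise ValueError(f"{job} does not hold {volume}")
        queue = self.waiters.get(volume)
        if queue:
            self.holder[volume] = queue.popleft()
        else:
            del self.holder[volume]
        return self.holder.get(volume)


res = VolumeReservations()
res.request("JOB_BACK_1", "T4")  # backup is writing to T4
res.request("JOB_REC_1", "T4")   # restore asks for T4 and has to wait
res.request("JOB_BACK_2", "T4")  # next backup queues behind the restore
next_holder = res.release("JOB_BACK_1", "T4")
print(next_holder)  # JOB_REC_1
```

In this model the restore job gets T4 as soon as JOB_BACK_1 releases it, and no job can take a volume away from its current holder. What I observed was different on both points: JOB_BACK_2 got the tape first, and later the tape was pulled out from under JOB_BACK_3 while it was writing.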
|Tags||No tags attached.|