View Issue Details

ID: 0000909
Project: bareos-core
Category: [All Projects] director
View Status: public
Last Update: 2018-03-13 12:16
Reporter: rightmirem
Assigned To:
Status: new
Resolution: reopened
Platform: Intel
OS: Debian Jessie
OS Version: 8
Product Version:
Target Version:
Fixed in Version:
Summary: 0000909: "Reschedule on error" recognized, but not actually rescheduling the job
Description: I have been testing my backup's ability to recover from an error.

I have a job that has the following settings...

  Reschedule Interval = 1 minute
  Reschedule On Error = yes
  Reschedule Times = 5

... and I start it as a full. I then restart the Bareos director (to error out the job intentionally).

The log shows that the job has been rescheduled, but the job never starts. It should have started at 10:20, but by 10:26 "list jobs" in bconsole still reported nothing running.
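
For reference, a minimal sketch of where the three Reschedule directives listed above would sit in a Job resource of the Director configuration. The resource names are taken from the job report in the log below. The Level, Pool, and Messages lines are assumptions added only to make the resource complete, not the actual configuration in use.

  Job {
    Name = "backupJobName"
    Type = Backup
    Level = Full                    # assumption: the test run was started as a Full
    Client = "server-fd"
    FileSet = "backupJobName"
    Storage = "Tape"
    Pool = "6mo-Full"               # assumption: the log only shows this pool as the FullPool override
    Full Backup Pool = "6mo-Full"
    Messages = "Standard"           # assumption: not shown in the job report
    Reschedule On Error = yes       # reschedule the job when it terminates with an error
    Reschedule Interval = 1 minute  # delay before the re-run
    Reschedule Times = 5            # maximum number of reschedules
  }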

=== LOG ===
    01-Feb 10:19 server-dir JobId 569: Fatal error: Network error with FD during Backup: ERR=No data available
    01-Feb 10:19 server-sd JobId 569: Fatal error: append.c:245 Network error reading from FD. ERR=No data available
    01-Feb 10:19 server-sd JobId 569: Elapsed time=00:01:24, Transfer rate=112.6 M Bytes/second
    01-Feb 10:19 server-dir JobId 569: Error: Director's comm line to SD dropped.
    01-Feb 10:19 server-dir JobId 569: Fatal error: No Job status returned from FD.
    01-Feb 10:19 server-dir JobId 569: Error: Bareos server-dir 17.2.4 (21Sep17):
      Build OS: x86_64-pc-linux-gnu debian Debian GNU/Linux 8.0 (jessie)
      JobId: 569
      Job: backupJobName.2018-02-01_10.18.12_04
      Backup Level: Full
      Client: "server-fd" 17.2.4 (21Sep17) x86_64-pc-linux-gnu,debian,Debian GNU/Linux 8.0 (jessie),Debian_8.0,x86_64
      FileSet: "backupJobName" 2018-01-29 15:00:00
      Pool: "6mo-Full" (From Job FullPool override)
      Catalog: "MyCatalog" (From Client resource)
      Storage: "Tape" (From Job resource)
      Scheduled time: 01-Feb-2018 10:18:10
      Start time: 01-Feb-2018 10:18:14
      End time: 01-Feb-2018 10:19:45
      Elapsed time: 1 min 31 secs
      Priority: 10
      FD Files Written: 0
      SD Files Written: 0
      FD Bytes Written: 0 (0 B)
      SD Bytes Written: 1,042 (1.042 KB)
      Rate: 0.0 KB/s
      Software Compression: None
      VSS: no
      Encryption: no
      Accurate: no
      Volume name(s): DL011BL7
      Volume Session Id: 1
      Volume Session Time: 1517476667
      Last Volume Bytes: 5,035,703,887,872 (5.035 TB)
      Non-fatal FD errors: 2
      SD Errors: 0
      FD termination status: Error
      SD termination status: Error
      Termination: *** Backup Error ***

    01-Feb 10:19 server-dir JobId 569: Rescheduled Job backupJobName.2018-02-01_10.18.12_04 at 01-Feb-2018 10:19 to re-run in 60 seconds (01-Feb-2018 10:20).
    01-Feb 10:19 server-dir JobId 569: Job backupJobName.2018-02-01_10.18.12_04 waiting 60 seconds for scheduled start time.

Steps To Reproduce: I have scheduled a job with "Reschedule On Error" enabled.

I have both started the job manually and let the scheduler start it.

I have tried killing the job BOTH by killing the core Bareos process with "kill -9" AND by simply restarting bareos with the restart commands.

Regardless of the method to kill the job, the log recognizes the job ended on an error, and states it is rescheduling the job (in 60 seconds).

However, the job never actually restarts.
Additional Information: See the main issue description for the log data.
Tags: No tags attached.

2018-02-12 18:30

administrator   ~0002908

Reschedule on error is not intended to cover Bareos Director restart.
However, it should work if you restart the fd.


2018-02-20 15:16

reporter   ~0002932

Can we reopen this? I never got a notification that it was in progress.

So, is it indicative of a problem when the log TRIES to reschedule the job - but simply doesn't?


2018-02-20 15:23

reporter   ~0002933

OK. It DID work when I killed the fd.

However, can you tell me what sorts of errors WILL trigger a restart? I don't see that in the manual. We're not just concerned with file errors, but also...

- Tape drive failure.
- Accidental system restart or server power failure.
- OS crash or hang.
- Daemon crashes or hangs.


2018-03-13 12:16

reporter   ~0002945

This can be marked as resolved.

Issue History

Date Modified     Username    Field                Change
2018-02-12 09:50  rightmirem  New Issue
2018-02-12 18:30  joergs      Note Added: 0002908
2018-02-20 15:01  joergs      Status               new => closed
2018-02-20 15:01  joergs      Resolution           open => no change required
2018-02-20 15:16  rightmirem  Note Added: 0002932
2018-02-20 15:16  rightmirem  Status               closed => feedback
2018-02-20 15:16  rightmirem  Resolution           no change required => reopened
2018-02-20 15:23  rightmirem  Note Added: 0002933
2018-02-20 15:23  rightmirem  Status               feedback => new
2018-03-13 12:16  rightmirem  Note Added: 0002945