View Issue Details

IDProjectCategoryView StatusLast Update
0001012bareos-coredirectorpublic2018-09-25 17:01
Reporterstephand Assigned To 
PrioritynormalSeverityminorReproducibilityalways
Status newResolutionopen 
Product Version17.2.7 
Summary0001012: If a Job terminates with error then status client should not report OK for that job
DescriptionWhen a fatal SD error occurs while a restore job is running, nevertheless the "status client=<ClientName>" output shows OK status for that failed restore job.
Steps To ReproduceWhile the restore job is running, fill the volume currently being used with zeroes using dd. (Note: Never do this on a production system)

Example:
[root@vgr-c7bpdev2-test-pgsql storage]# ls -lah
total 299M
drwxrwxr-x. 2 bareos bareos 23 Sep 25 10:39 .
drwxrwx---. 3 bareos bareos 210 Sep 25 11:49 ..
-rw-r-----. 1 bareos bareos 299M Sep 25 10:46 Full-0001
[root@vgr-c7bpdev2-test-pgsql storage]# ll /dev/zero
crw-rw-rw-. 1 root root 1, 5 Sep 25 10:24 /dev/zero
[root@vgr-c7bpdev2-test-pgsql storage]# dd if=/dev/zero of=Full-0001 bs=1M count=300
300+0 records in
300+0 records out
314572800 bytes (315 MB) copied, 0.168414 s, 1.9 GB/s

This will cause a fatal SD error:

*restore client=bareos-fd select all done yes
Using Catalog "MyCatalog"
Automatically selected FileSet: Data1Set
+-------+-------+-----------+----------+---------------------+------------+
| jobid | level | jobfiles | jobbytes | starttime | volumename |
+-------+-------+-----------+----------+---------------------+------------+
| 1 | F | 1,001,111 | 0 | 2018-09-25 10:39:01 | Full-0001 |
+-------+-------+-----------+----------+---------------------+------------+
You have selected the following JobId: 1

Building directory tree for JobId(s) 1 ... +++++++++++++++++++++++++++++++++++++++++++++++++
1,000,000 files inserted into the tree and marked for extraction.
Bootstrap records written to /var/lib/bareos/bareos-dir.restore.4.bsr

The job will require the following
   Volume(s) Storage(s) SD Device(s)
===========================================================================
   
    Full-0001 File FileStorage

Volumes marked with "*" are online.


1,001,111 files selected to be restored.

Using Catalog "MyCatalog"
Job queued. JobId=7
*
You have messages.
*mes
25-Sep 12:28 bareos-dir JobId 7: Start Restore Job RestoreFiles.2018-09-25_12.28.13_41
25-Sep 12:28 bareos-dir JobId 7: Using Device "FileStorage" to read.
25-Sep 12:28 bareos-sd JobId 7: Ready to read from volume "Full-0001" on device "FileStorage" (/var/lib/bareos/storage).
25-Sep 12:28 bareos-sd JobId 7: Forward spacing Volume "Full-0001" to file:block 0:219.
25-Sep 12:28 bareos-sd JobId 7: Error: block.c:288 Volume data error at 0:18192512! Wanted ID: "BB02", got "". Buffer discarded.
25-Sep 12:28 bareos-sd JobId 7: Releasing device "FileStorage" (/var/lib/bareos/storage).
25-Sep 12:28 bareos-sd JobId 7: Fatal error: fd_cmds.c:236 Command error with FD, hanging up.
25-Sep 12:28 bareos-dir JobId 7: Error: Bareos bareos-dir 17.2.7 (16Jul18):
  Build OS: x86_64-redhat-linux-gnu redhat CentOS Linux release 7.5.1804 (Core)
  JobId: 7
  Job: RestoreFiles.2018-09-25_12.28.13_41
  Restore Client: bareos-fd
  Start time: 25-Sep-2018 12:28:15
  End time: 25-Sep-2018 12:28:19
  Elapsed time: 4 secs
  Files Expected: 1,001,111
  Files Restored: 74,872
  Bytes Restored: 0
  Rate: 0.0 KB/s
  FD Errors: 0
  FD termination status: OK
  SD termination status: Fatal Error
  Termination: *** Restore Error ***

But the "status client" ouput lists this restore job as OK:

*status client=bareos-fd
Connecting to Client bareos-fd at localhost:9102

vgr-c7bpdev2-test-pgsql-fd Version: 17.2.7 (16 Jul 2018) x86_64-redhat-linux-gnu redhat CentOS Linux release 7.5.1804 (Core)
Daemon started 25-Sep-18 10:25. Jobs: run=7 running=0.
 Heap: heap=135,168 smbytes=113,964 max_bytes=181,086 bufs=89 max_bufs=122
 Sizeof: boffset_t=8 size_t=8 debug=0 trace=0 bwlimit=0kB/s

Running Jobs:
bareos-dir (director) connected at: 25-Sep-18 12:28
No Jobs running.
====

Terminated Jobs:
 JobId Level Files Bytes Status Finished Name
======================================================================
     1 Full 1,001,111 0 OK 25-Sep-18 10:39 Data1
     2 Full 106,836 0 Error 25-Sep-18 10:45 Data1
     3 Full 175,194 0 Error 25-Sep-18 10:46 Data1
     4 1,001,111 0 OK 25-Sep-18 10:50 RestoreFiles
     5 36,734 0 Error 25-Sep-18 10:53 RestoreFiles
     6 49,344 0 Error 25-Sep-18 11:49 RestoreFiles
     7 74,872 0 OK 25-Sep-18 12:28 RestoreFiles
====
Additional InformationThe expected behavior is that whenever you look at the status of a job on any way that it is possible, the status of the job should be the same.

Note that this is not reproducible by simply stopping bareos-sd while running a restore job. I also wasn't able to reproduce this kind of status discrepancy for backup jobs. In both cases the "status client" termination status was correctly showing Error.
TagsNo tags attached.
bareos-master: impact
bareos-master: action
bareos-19.2: impact
bareos-19.2: action
bareos-18.2: impact
bareos-18.2: action
bareos-17.2: impact
bareos-17.2: action
bareos-16.2: impact
bareos-16.2: action
bareos-15.2: impact
bareos-15.2: action
bareos-14.2: impact
bareos-14.2: action
bareos-13.2: impact
bareos-13.2: action
bareos-12.4: impact
bareos-12.4: action

Activities

There are no notes attached to this issue.

Issue History

Date Modified Username Field Change
2018-09-25 17:01 stephand New Issue