View Issue Details

ID: 0001012
Project: bareos-core
Category: director
View Status: public
Last Update: 2023-10-12 09:55
Reporter: stephand
Assigned To: bruno-at-bareos
Priority: normal
Severity: minor
Reproducibility: always
Status: closed
Resolution: fixed
Product Version: 17.2.7
Summary: 0001012: If a Job terminates with error then status client should not report OK for that job
Description: When a fatal SD error occurs while a restore job is running, the "status client=<ClientName>" output nevertheless shows an OK status for that failed restore job.
Steps To Reproduce: While the restore job is running, fill the volume currently being used with zeroes using dd. (Note: never do this on a production system.)

Example:
[root@vgr-c7bpdev2-test-pgsql storage]# ls -lah
total 299M
drwxrwxr-x. 2 bareos bareos 23 Sep 25 10:39 .
drwxrwx---. 3 bareos bareos 210 Sep 25 11:49 ..
-rw-r-----. 1 bareos bareos 299M Sep 25 10:46 Full-0001
[root@vgr-c7bpdev2-test-pgsql storage]# ll /dev/zero
crw-rw-rw-. 1 root root 1, 5 Sep 25 10:24 /dev/zero
[root@vgr-c7bpdev2-test-pgsql storage]# dd if=/dev/zero of=Full-0001 bs=1M count=300
300+0 records in
300+0 records out
314572800 bytes (315 MB) copied, 0.168414 s, 1.9 GB/s

This will cause a fatal SD error:

*restore client=bareos-fd select all done yes
Using Catalog "MyCatalog"
Automatically selected FileSet: Data1Set
+-------+-------+-----------+----------+---------------------+------------+
| jobid | level | jobfiles  | jobbytes | starttime           | volumename |
+-------+-------+-----------+----------+---------------------+------------+
|     1 | F     | 1,001,111 |        0 | 2018-09-25 10:39:01 | Full-0001  |
+-------+-------+-----------+----------+---------------------+------------+
You have selected the following JobId: 1

Building directory tree for JobId(s) 1 ... +++++++++++++++++++++++++++++++++++++++++++++++++
1,000,000 files inserted into the tree and marked for extraction.
Bootstrap records written to /var/lib/bareos/bareos-dir.restore.4.bsr

The job will require the following
   Volume(s)                 Storage(s)                SD Device(s)
===========================================================================
   Full-0001                 File                      FileStorage

Volumes marked with "*" are online.


1,001,111 files selected to be restored.

Using Catalog "MyCatalog"
Job queued. JobId=7
*
You have messages.
*mes
25-Sep 12:28 bareos-dir JobId 7: Start Restore Job RestoreFiles.2018-09-25_12.28.13_41
25-Sep 12:28 bareos-dir JobId 7: Using Device "FileStorage" to read.
25-Sep 12:28 bareos-sd JobId 7: Ready to read from volume "Full-0001" on device "FileStorage" (/var/lib/bareos/storage).
25-Sep 12:28 bareos-sd JobId 7: Forward spacing Volume "Full-0001" to file:block 0:219.
25-Sep 12:28 bareos-sd JobId 7: Error: block.c:288 Volume data error at 0:18192512! Wanted ID: "BB02", got "". Buffer discarded.
25-Sep 12:28 bareos-sd JobId 7: Releasing device "FileStorage" (/var/lib/bareos/storage).
25-Sep 12:28 bareos-sd JobId 7: Fatal error: fd_cmds.c:236 Command error with FD, hanging up.
25-Sep 12:28 bareos-dir JobId 7: Error: Bareos bareos-dir 17.2.7 (16Jul18):
  Build OS: x86_64-redhat-linux-gnu redhat CentOS Linux release 7.5.1804 (Core)
  JobId: 7
  Job: RestoreFiles.2018-09-25_12.28.13_41
  Restore Client: bareos-fd
  Start time: 25-Sep-2018 12:28:15
  End time: 25-Sep-2018 12:28:19
  Elapsed time: 4 secs
  Files Expected: 1,001,111
  Files Restored: 74,872
  Bytes Restored: 0
  Rate: 0.0 KB/s
  FD Errors: 0
  FD termination status: OK
  SD termination status: Fatal Error
  Termination: *** Restore Error ***

But the "status client" ouput lists this restore job as OK:

*status client=bareos-fd
Connecting to Client bareos-fd at localhost:9102

vgr-c7bpdev2-test-pgsql-fd Version: 17.2.7 (16 Jul 2018) x86_64-redhat-linux-gnu redhat CentOS Linux release 7.5.1804 (Core)
Daemon started 25-Sep-18 10:25. Jobs: run=7 running=0.
 Heap: heap=135,168 smbytes=113,964 max_bytes=181,086 bufs=89 max_bufs=122
 Sizeof: boffset_t=8 size_t=8 debug=0 trace=0 bwlimit=0kB/s

Running Jobs:
bareos-dir (director) connected at: 25-Sep-18 12:28
No Jobs running.
====

Terminated Jobs:
 JobId  Level      Files      Bytes  Status  Finished         Name
======================================================================
     1  Full   1,001,111          0  OK      25-Sep-18 10:39  Data1
     2  Full     106,836          0  Error   25-Sep-18 10:45  Data1
     3  Full     175,194          0  Error   25-Sep-18 10:46  Data1
     4         1,001,111          0  OK      25-Sep-18 10:50  RestoreFiles
     5            36,734          0  Error   25-Sep-18 10:53  RestoreFiles
     6            49,344          0  Error   25-Sep-18 11:49  RestoreFiles
     7            74,872          0  OK      25-Sep-18 12:28  RestoreFiles
====
Additional Information: The expected behavior is that a job's status should be the same no matter which way you look at it.

Note that this is not reproducible by simply stopping bareos-sd while running a restore job. I also wasn't able to reproduce this kind of status discrepancy for backup jobs. In both cases the "status client" termination status was correctly showing Error.
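
For cross-checking, the director-side view of the same job can be queried from bconsole (a minimal sketch using standard bconsole commands; jobid 7 is the failed restore from the example above). "llist jobid" prints the full catalog record for the job, including its termination status, and "status dir" lists the jobs the director considers terminated:

*llist jobid=7
*status dir
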
Tags: No tags attached.

Activities

bruno-at-bareos   manager   2023-09-11 17:21   ~0005408

Is this still happening?
stephand   developer   2023-10-11 18:57   ~0005469

The behaviour changed; now I'm getting:

11-Oct 16:47 bareos-sd JobId 12: Error: stored/block.cc:294 Volume data error at 0:0! Wanted ID: "BB02", got "". Buffer discarded.
11-Oct 16:47 bareos-sd JobId 12: Warning: stored/acquire.cc:325 Read acquire: Requested Volume "Full-0001" on "FileStorage" (/var/lib/bareos/storage) is not a Bareos labeled Volume, because: ERR=stored/block.cc:294 Volume data error at 0:0! Wanted ID: "BB02", got "". Buffer discarded.
11-Oct 16:47 bareos-sd JobId 12: Please mount read Volume "Full-0001" for:
    Job: RestoreFiles.2023-10-11_16.47.46_20
    Storage: "FileStorage" (/var/lib/bareos/storage)
    Pool: Incremental
    Media type: File

and the job stays in status "waiting for a mount request".

After cancelling, the job correctly shows status "Cancel" in the status client output.
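
The cancel and re-check can be done from bconsole along these lines (a sketch; JobId 12 is taken from the log above, both are standard bconsole commands):

*cancel jobid=12
*status client=bareos-fd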

Just verified that with packages from https://download.bareos.org/next/, so the version is 23.0.0~pre1020.b1d94178f (09 October 2023).

Can't tell when the behaviour changed, but we can close this, as the reported issue obviously no longer exists.
bruno-at-bareos   manager   2023-10-12 09:55   ~0005470

In the meantime, it has been fixed.

Issue History

Date Modified Username Field Change
2018-09-25 17:01 stephand New Issue
2023-09-11 17:21 bruno-at-bareos Assigned To => bruno-at-bareos
2023-09-11 17:21 bruno-at-bareos Status new => feedback
2023-09-11 17:21 bruno-at-bareos Note Added: 0005408
2023-10-11 18:57 stephand Note Added: 0005469
2023-10-11 18:57 stephand Status feedback => assigned
2023-10-12 09:55 bruno-at-bareos Status assigned => closed
2023-10-12 09:55 bruno-at-bareos Resolution open => fixed
2023-10-12 09:55 bruno-at-bareos Note Added: 0005470