0001012: If a Job terminates with error then status client should not report OK for that job - Bareos Bug Tracker

ID	Project	Category	View Status	Date Submitted	Last Update

0001012	bareos-core	director	public	2018-09-25 17:01	2023-10-12 09:55

Reporter	stephand	Assigned To	bruno-at-bareos
Priority	normal	Severity	minor	Reproducibility	always
Status	closed	Resolution	fixed
Product Version	17.2.7

Summary	0001012: If a Job terminates with error then status client should not report OK for that job
Description	When a fatal SD error occurs while a restore job is running, nevertheless the "status client=<ClientName>" output shows OK status for that failed restore job.
Steps To Reproduce	While the restore job is running, fill the volume currently being used with zeroes using dd. (Note: Never do this on a production system) Example: [root@vgr-c7bpdev2-test-pgsql storage]# ls -lah total 299M drwxrwxr-x. 2 bareos bareos 23 Sep 25 10:39 . drwxrwx---. 3 bareos bareos 210 Sep 25 11:49 .. -rw-r-----. 1 bareos bareos 299M Sep 25 10:46 Full-0001 [root@vgr-c7bpdev2-test-pgsql storage]# ll /dev/zero crw-rw-rw-. 1 root root 1, 5 Sep 25 10:24 /dev/zero [root@vgr-c7bpdev2-test-pgsql storage]# dd if=/dev/zero of=Full-0001 bs=1M count=300 300+0 records in 300+0 records out 314572800 bytes (315 MB) copied, 0.168414 s, 1.9 GB/s This will cause a fatal SD error: restore client=bareos-fd select all done yes Using Catalog "MyCatalog" Automatically selected FileSet: Data1Set +-------+-------+-----------+----------+---------------------+------------+ \| jobid \| level \| jobfiles \| jobbytes \| starttime \| volumename \| +-------+-------+-----------+----------+---------------------+------------+ \| 1 \| F \| 1,001,111 \| 0 \| 2018-09-25 10:39:01 \| Full-0001 \| +-------+-------+-----------+----------+---------------------+------------+ You have selected the following JobId: 1 Building directory tree for JobId(s) 1 ... +++++++++++++++++++++++++++++++++++++++++++++++++ 1,000,000 files inserted into the tree and marked for extraction. Bootstrap records written to /var/lib/bareos/bareos-dir.restore.4.bsr The job will require the following Volume(s) Storage(s) SD Device(s) =========================================================================== Full-0001 File FileStorage Volumes marked with "" are online. 1,001,111 files selected to be restored. Using Catalog "MyCatalog" Job queued. JobId=7 * You have messages. mes 25-Sep 12:28 bareos-dir JobId 7: Start Restore Job RestoreFiles.2018-09-25_12.28.13_41 25-Sep 12:28 bareos-dir JobId 7: Using Device "FileStorage" to read. 25-Sep 12:28 bareos-sd JobId 7: Ready to read from volume "Full-0001" on device "FileStorage" (/var/lib/bareos/storage). 25-Sep 12:28 bareos-sd JobId 7: Forward spacing Volume "Full-0001" to file:block 0:219. 25-Sep 12:28 bareos-sd JobId 7: Error: block.c:288 Volume data error at 0:18192512! Wanted ID: "BB02", got "". Buffer discarded. 25-Sep 12:28 bareos-sd JobId 7: Releasing device "FileStorage" (/var/lib/bareos/storage). 25-Sep 12:28 bareos-sd JobId 7: Fatal error: fd_cmds.c:236 Command error with FD, hanging up. 25-Sep 12:28 bareos-dir JobId 7: Error: Bareos bareos-dir 17.2.7 (16Jul18): Build OS: x86_64-redhat-linux-gnu redhat CentOS Linux release 7.5.1804 (Core) JobId: 7 Job: RestoreFiles.2018-09-25_12.28.13_41 Restore Client: bareos-fd Start time: 25-Sep-2018 12:28:15 End time: 25-Sep-2018 12:28:19 Elapsed time: 4 secs Files Expected: 1,001,111 Files Restored: 74,872 Bytes Restored: 0 Rate: 0.0 KB/s FD Errors: 0 FD termination status: OK SD termination status: Fatal Error Termination: Restore Error * But the "status client" ouput lists this restore job as OK: *status client=bareos-fd Connecting to Client bareos-fd at localhost:9102 vgr-c7bpdev2-test-pgsql-fd Version: 17.2.7 (16 Jul 2018) x86_64-redhat-linux-gnu redhat CentOS Linux release 7.5.1804 (Core) Daemon started 25-Sep-18 10:25. Jobs: run=7 running=0. Heap: heap=135,168 smbytes=113,964 max_bytes=181,086 bufs=89 max_bufs=122 Sizeof: boffset_t=8 size_t=8 debug=0 trace=0 bwlimit=0kB/s Running Jobs: bareos-dir (director) connected at: 25-Sep-18 12:28 No Jobs running. ==== Terminated Jobs: JobId Level Files Bytes Status Finished Name ====================================================================== 1 Full 1,001,111 0 OK 25-Sep-18 10:39 Data1 2 Full 106,836 0 Error 25-Sep-18 10:45 Data1 3 Full 175,194 0 Error 25-Sep-18 10:46 Data1 4 1,001,111 0 OK 25-Sep-18 10:50 RestoreFiles 5 36,734 0 Error 25-Sep-18 10:53 RestoreFiles 6 49,344 0 Error 25-Sep-18 11:49 RestoreFiles 7 74,872 0 OK 25-Sep-18 12:28 RestoreFiles ====
Additional Information	The expected behavior is that whenever you look at the status of a job on any way that it is possible, the status of the job should be the same. Note that this is not reproducible by simply stopping bareos-sd while running a restore job. I also wasn't able to reproduce this kind of status discrepancy for backup jobs. In both cases the "status client" termination status was correctly showing Error.
Tags	No tags attached.

bruno-at-bareos 2023-09-11 17:21 manager ~0005408	Is this still happening ?

stephand 2023-10-11 18:57 developer ~0005469	The behaviour changed, now getting 11-Oct 16:47 bareos-sd JobId 12: Error: stored/block.cc:294 Volume data error at 0:0! Wanted ID: "BB02", got "". Buffer discarded. 11-Oct 16:47 bareos-sd JobId 12: Warning: stored/acquire.cc:325 Read acquire: Requested Volume "Full-0001" on "FileStorage" (/var/lib/bareos/storage) is not a Bareos labeled Volume, because: ERR=stored/block.cc:294 Volume data error at 0:0! Wanted ID: "BB02", got "". Buffer discarded. 11-Oct 16:47 bareos-sd JobId 12: Please mount read Volume "Full-0001" for: Job: RestoreFiles.2023-10-11_16.47.46_20 Storage: "FileStorage" (/var/lib/bareos/storage) Pool: Incremental Media type: File and the job stay in status "waiting for a mount request" After cancelling, the job correctly displays status "Cancel" in status client output. Just veried that with packages from https://download.bareos.org/next/, so version is 23.0.0~pre1020.b1d94178f (09 October 2023) Can't tell since when the behaviour changed, but we can close this as the reported issue is obviously not existing any more.

bruno-at-bareos 2023-10-12 09:55 manager ~0005470	In the meantime, it has be fixed

Date Modified	Username	Field	Change
2018-09-25 17:01	stephand	New Issue
2023-09-11 17:21	bruno-at-bareos	Assigned To	=> bruno-at-bareos
2023-09-11 17:21	bruno-at-bareos	Status	new => feedback
2023-09-11 17:21	bruno-at-bareos	Note Added: 0005408
2023-10-11 18:57	stephand	Note Added: 0005469
2023-10-11 18:57	stephand	Status	feedback => assigned
2023-10-12 09:55	bruno-at-bareos	Status	assigned => closed
2023-10-12 09:55	bruno-at-bareos	Resolution	open => fixed
2023-10-12 09:55	bruno-at-bareos	Note Added: 0005470