View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0001591 | bareos-core | director | public | 2024-01-19 16:17 | 2024-05-17 09:39 |
Reporter | raschu | Assigned To | bruno-at-bareos | ||
Priority | normal | Severity | minor | Reproducibility | random |
Status | assigned | Resolution | open | ||
Platform | Linux | OS | Debian | OS Version | 10 |
Summary | 0001591: Error: stored/device.cc:222 Error getting Volume info | ||||
Description | Hi, we manage over 70 backup jobs. In rare cases we observe the error "Error: stored/device.cc:222 Error getting Volume info" in the backup mails. The job was successful and is marked with a warning. 2 examples: 19-Jan 02:19 bareos-dir JobId 8696: There are no more Jobs associated with Volume "incr7d-1589". Marking it purged. 19-Jan 02:19 bareos-dir JobId 8696: All records pruned from Volume "incr7d-1589"; marking it "Purged" 19-Jan 02:19 bareos-dir JobId 8696: Recycled volume "incr7d-1589" 19-Jan 02:19 bareos-dir JobId 8696: Using Device "bstore02-vd01" to write. 19-Jan 02:19 client01 JobId 8696: Connected Storage daemon at storage02:9103, encryption: TLS_CHACHA20_POLY1305_SHA256 TLSv1.3 19-Jan 02:19 client01 JobId 8696: Encryption: TLS_CHACHA20_POLY1305_SHA256 TLSv1.3 19-Jan 02:19 client01 JobId 8696: Extended attribute support is enabled 19-Jan 02:19 client01 JobId 8696: ACL support is enabled 19-Jan 02:19 bareos-sd JobId 8696: Recycled volume "incr7d-1589" on device "bstore02-vd01" (/bstore02), all previous data lost. 19-Jan 02:19 bareos-dir JobId 8696: Max Volume jobs=1 exceeded. Marking Volume "incr7d-1589" as Used. 19-Jan 02:19 bareos-sd JobId 8696: Error: stored/device.cc:222 Error getting Volume info: 1998 Volume "incr7d-1589" catalog status is Used, but should be Append, Purged or Recycle. 19-Jan 02:19 bareos-sd JobId 8696: Releasing device "bstore02-vd01" (/bstore02). 19-Jan 02:19 bareos-sd JobId 8696: Elapsed time=00:00:02, Transfer rate=7.527 M Bytes/second 19-Jan 01:00 bareos-dir JobId 8693: Sending Accurate information. 19-Jan 02:15 bareos-dir JobId 8693: Created new Volume "aincr30d-bsd02-2821" in catalog. 19-Jan 02:15 bareos-dir JobId 8693: Using Device "bstore02-vd01" to write. 19-Jan 02:17 client02 JobId 8693: Connected Storage daemon at storage02:9103, encryption: TLS_CHACHA20_POLY1305_SHA256 TLSv1.3 19-Jan 02:17 client02 JobId 8693: Encryption: TLS_CHACHA20_POLY1305_SHA256 TLSv1.3 19-Jan 02:17 client02 JobId 8693: Extended attribute support is enabled 19-Jan 02:17 bareos-sd JobId 8693: Labeled new Volume "aincr30d-bsd02-2821" on device "bstore02-vd01" (/bstore02). 19-Jan 02:17 client02 JobId 8693: ACL support is enabled 19-Jan 02:17 bareos-sd JobId 8693: Wrote label to prelabeled Volume "aincr30d-bsd02-2821" on device "bstore02-vd01" (/bstore02) 19-Jan 02:17 bareos-dir JobId 8693: Max Volume jobs=1 exceeded. Marking Volume "aincr30d-bsd02-2821" as Used. 19-Jan 02:18 bareos-sd JobId 8693: Error: stored/device.cc:222 Error getting Volume info: 1998 Volume "aincr30d-bsd02-2821" catalog status is Used, but should be Append, Purged or Recycle. 19-Jan 02:18 bareos-sd JobId 8693: Releasing device "bstore02-vd01" (/bstore02). 19-Jan 02:18 bareos-sd JobId 8693: Elapsed time=00:01:29, Transfer rate=345.0 K Bytes/second I can't understand the error. The volumes were either newly created or recently reset. The director has the version "23.0.1~pre57.8e89bfe0a-40". Thanks Ralf | ||||
Tags | No tags attached. | ||||
Two jobs accessing the same volume at the same time ? | |
Thanks Bruno. No, because I set only 1 jobs per volume: """ Pool { Name = aincr30d-bsd02 # 30 days always incremental Pool Type = Backup Recycle = yes # Bareos can automatically recycle Volumes AutoPrune = yes # Prune expired volumes Volume Retention = 65 days # 30 days, recommended retention buffer >2x Maximum Volume Bytes = 5 g # Limit Volume size to something reasonable Volume Use Duration = 23 hours # defines the time period that the Volume can be written Maximum Volume Jobs = 1 # nur ein Job pro Volume, damit fehlgeschlagene Jobs besser gelöscht werden können Label Format = "aincr30d-bsd02-" # Volumes label Storage = bsd02 Next Pool = aicons30d-bsd02 } """ Bye Ralf |
|
And what about the maximum concurrent job on the device ? | |
Hi Bruno, we use some virtual devices for parallel jobs. The config is: Storage { # define myself Name = bsd02 Description = "storage for internal backup - filter: !hrz in fqdn" Address = storage02 Password = "pass" Device = bstore02 Device = bstore02-vd01 Device = bstore02-vd02 Device = bstore02-vd03 Media Type = File02 Maximum Concurrent Jobs = 20 } Config from the storage daemon: Device { Name = bstore02 Media Type = File02 Device Type = File Archive Device = /bstore02 Maximum Concurrent Jobs = 1 LabelMedia = yes; # lets Bareos label unlabeled media Random Access = yes; AutomaticMount = yes; # when device opened, read it RemovableMedia = no; AlwaysOpen = no; } Device { Name = bstore02-vd01 Media Type = File02 Device Type = File Archive Device = /bstore02 Maximum Concurrent Jobs = 1 LabelMedia = yes; # lets Bareos label unlabeled media Random Access = yes; AutomaticMount = yes; # when device opened, read it RemovableMedia = no; AlwaysOpen = no; Description = "virtual device - linked for more concurrent jobs" } Device { Name = bstore02-vd02 Media Type = File02 Device Type = File Archive Device = /bstore02 Maximum Concurrent Jobs = 1 LabelMedia = yes; # lets Bareos label unlabeled media Random Access = yes; AutomaticMount = yes; # when device opened, read it RemovableMedia = no; AlwaysOpen = no; Description = "virtual device - linked for more concurrent jobs" } Device { Name = bstore02-vd03 Media Type = File02 Device Type = File Archive Device = /bstore02 Maximum Concurrent Jobs = 1 LabelMedia = yes; # lets Bareos label unlabeled media Random Access = yes; AutomaticMount = yes; # when device opened, read it RemovableMedia = no; AlwaysOpen = no; Description = "virtual device - linked for more concurrent jobs" } I took the idea from here: https://svennd.be/concurrent-jobs-in-bareos-with-disk-storage/ Thanks Ralf |
|
Things looks good, would you mind to find back the complete job's log for those having that issue. btw if you want to optimize even more the configuration you can get inspired by future https://github.com/bareos/bareos/pull/1467#issuecomment-1780956976 ;-) |
|
Hi Bruno, thanks. The new option with the autochanger is good :-) Now here more examples. I got the error 4 times (different jobs) last weekend :-/ """ 05-Feb 04:01 bareos-dir JobId 9638: Start Backup JobId 9638, Job=client01.2024-02-05_04.00.01_09 05-Feb 04:01 bareos-dir JobId 9638: Connected Storage daemon at bstore02:9103, encryption: TLS_CHACHA20_POLY1305_SHA256 TLSv1.3 05-Feb 04:01 bareos-dir JobId 9638: Encryption: TLS_CHACHA20_POLY1305_SHA256 TLSv1.3 05-Feb 04:01 bareos-dir JobId 9638: Connected Client: client01 at client01:9102, encryption: TLS_CHACHA20_POLY1305_SHA256 TLSv1.3 05-Feb 04:01 bareos-dir JobId 9638: Handshake: Immediate TLS 05-Feb 04:01 bareos-dir JobId 9638: Encryption: TLS_CHACHA20_POLY1305_SHA256 TLSv1.3 05-Feb 04:01 bareos-dir JobId 9638: Sending Accurate information. 05-Feb 04:29 bareos-dir JobId 9638: Created new Volume "aincr180d-bsd02-3234" in catalog. 05-Feb 04:29 bareos-dir JobId 9638: Using Device "bstore02-vd03" to write. 05-Feb 04:29 client01 JobId 9638: Connected Storage daemon at bstore02:9103, encryption: TLS_CHACHA20_POLY1305_SHA256 TLSv1.3 05-Feb 04:29 client01 JobId 9638: Encryption: TLS_CHACHA20_POLY1305_SHA256 TLSv1.3 05-Feb 04:29 client01 JobId 9638: Extended attribute support is enabled 05-Feb 04:29 bareos-sd JobId 9638: Labeled new Volume "aincr180d-bsd02-3234" on device "bstore02-vd03" (/bstore02). 05-Feb 04:29 client01 JobId 9638: ACL support is enabled 05-Feb 04:29 bareos-sd JobId 9638: Wrote label to prelabeled Volume "aincr180d-bsd02-3234" on device "bstore02-vd03" (/bstore02) 05-Feb 04:29 bareos-dir JobId 9638: Max Volume jobs=1 exceeded. Marking Volume "aincr180d-bsd02-3234" as Used. 05-Feb 04:29 bareos-sd JobId 9638: Error: stored/device.cc:222 Error getting Volume info: 1998 Volume "aincr180d-bsd02-3234" catalog status is Used, but should be Append, Purged or Recycle. 05-Feb 04:29 bareos-sd JobId 9638: Releasing device "bstore02-vd03" (/bstore02). 05-Feb 04:29 bareos-sd JobId 9638: Elapsed time=00:00:01, Transfer rate=0 Bytes/second 05-Feb 04:29 bareos-dir JobId 9638: Bareos bareos-dir 23.0.2~pre6.5fbc90f69 (23Jan24): Build OS: Debian GNU/Linux 11 (bullseye) JobId: 9638 Job: client01.2024-02-05_04.00.01_09 Backup Level: Incremental, since=2024-02-04 04:06:17 Client: "client01" 23.0.2~pre32.0a0e55739 (31Jan24) Debian GNU/Linux 12 (bookworm),debian FileSet: "client01" 2024-01-19 04:00:00 Pool: "aincr180d-bsd02" (From Job resource) Catalog: "MyCatalog" (From Client resource) Storage: "bsd02" (From Pool resource) Scheduled time: 05-Feb-2024 04:00:01 Start time: 05-Feb-2024 04:01:02 End time: 05-Feb-2024 04:29:08 Elapsed time: 28 mins 6 secs Priority: 10 Allow Mixed Priority: no FD Files Written: 0 SD Files Written: 0 FD Bytes Written: 0 (0 B) SD Bytes Written: 0 (0 B) Quota Used: 14,609,097,104 (14.60 GB) Burst Quota: 0 (0 B) Soft Quota: 26,843,545,600 (26.84 GB) Hard Quota: 32,212,254,720 (32.21 GB) Grace Expiry Date: Soft Quota not exceeded Rate: 0.0 KB/s Software Compression: None VSS: no Encryption: yes Accurate: yes Volume name(s): Volume Session Id: 565 Volume Session Time: 1704452670 Last Volume Bytes: 0 (0 B) Non-fatal FD errors: 0 SD Errors: 1 FD termination status: OK SD termination status: OK Bareos binary info: Bareos community build (UNSUPPORTED): Get professional support from https://www.bareos.com Job triggered by: Scheduler Termination: Backup OK -- with warnings """ |
|
So to resume the client is using encryption and you have quotas in place ? Any kind of predictive path to reproduce, so you may be able to run the daemons with a high debug level ? |
|
Hi Bruno, yes we use encryption and quotas. The clients are "internal customer" so it is not so easy to set a higher debug level. But for my opinion - it must be a director or catalog problem!? It is strange because Bareos create new volume and detect it is status 'used'? Thanks Ralf |
|
you can raise a debug level to any client in bconsole, so not a big concern, I would say. The question is why you are able to reproduce that error, while we don't with all the tests ( and real ) runs here. If we can reproduce, we know we can fix it: the problem is how can I reproduce that ;-) |
|
Hi Bruno, okay - thanks :-) Perfect, I set the debug level for one client which often gets the error. *setdebug level=200 trace=1 timestamp=1 client=renamedclient Connecting to Client renamedclient at renamedclient:9102 2000 OK setdebug=200 trace=1 hangup=0 timestamp=1 tracefile=/var/lib/bareos/renamedclient.trace Now I wait for the next job this night. Bye Ralf |
|
but you will need to also trace the sd ... beware that tracefile can grow quick large ... |
|
Thanks, the sd is now also in debug mode :-) | |
Hi Bruno, this night we got also the error and debug was enabled :) I hope it helps to find the problem. I changed the hostnames. I attached the trace logs. The log from the client-fd is very large and does not contain any data that will help us. Thanks Ralf The jobid was 9689. Here the details: 06-Feb 04:00 bareos-dir JobId 9689: Start Backup JobId 9689, Job=client01.renamed.2024-02-06_04.00.01_09 06-Feb 04:00 bareos-dir JobId 9689: Connected Storage daemon at storageserver.renamed:9103, encryption: TLS_CHACHA20_POLY1305_SHA256 TLSv1.3 06-Feb 04:00 bareos-dir JobId 9689: Encryption: TLS_CHACHA20_POLY1305_SHA256 TLSv1.3 06-Feb 04:00 bareos-dir JobId 9689: Probing client protocol... (result will be saved until config reload) 06-Feb 04:00 bareos-dir JobId 9689: Connected Client: client01.renamed at client01.renamed:9102, encryption: PSK-AES256-CBC-SHA TLSv1.2 06-Feb 04:00 bareos-dir JobId 9689: Handshake: Immediate TLS 06-Feb 04:00 bareos-dir JobId 9689: Encryption: PSK-AES256-CBC-SHA TLSv1.2 06-Feb 04:00 bareos-dir JobId 9689: Sending Accurate information. 06-Feb 04:05 bareos-dir JobId 9689: Created new Volume "aincr30d-bsd02-3253" in catalog. 06-Feb 04:05 bareos-dir JobId 9689: Using Device "bstore02-vd03" to write. 06-Feb 04:05 client01.renamed JobId 9689: Connected Storage daemon at storageserver.renamed:9103, encryption: PSK-AES256-CBC-SHA TLSv1.2 06-Feb 04:05 client01.renamed JobId 9689: Encryption: PSK-AES256-CBC-SHA TLSv1.2 06-Feb 04:05 client01.renamed JobId 9689: Extended attribute support is enabled 06-Feb 04:05 bareos-sd JobId 9689: Labeled new Volume "aincr30d-bsd02-3253" on device "bstore02-vd03" (/bstore02). 06-Feb 04:05 client01.renamed JobId 9689: ACL support is enabled 06-Feb 04:05 bareos-sd JobId 9689: Wrote label to prelabeled Volume "aincr30d-bsd02-3253" on device "bstore02-vd03" (/bstore02) 06-Feb 04:05 bareos-dir JobId 9689: Max Volume jobs=1 exceeded. Marking Volume "aincr30d-bsd02-3253" as Used. 06-Feb 04:05 bareos-sd JobId 9689: Error: stored/device.cc:222 Error getting Volume info: 1998 Volume "aincr30d-bsd02-3253" catalog status is Used, but should be Append, Purged or Recycle. 06-Feb 04:30 bareos-sd JobId 9689: Releasing device "bstore02-vd03" (/bstore02). 06-Feb 04:30 bareos-sd JobId 9689: Elapsed time=00:24:26, Transfer rate=39.40 K Bytes/second 06-Feb 04:30 bareos-dir JobId 9689: Insert of attributes batch table with 321 entries start 06-Feb 04:30 bareos-dir JobId 9689: Insert of attributes batch table done 06-Feb 04:30 bareos-dir JobId 9689: Bareos bareos-dir 23.0.2~pre6.5fbc90f69 (23Jan24): Build OS: Debian GNU/Linux 11 (bullseye) JobId: 9689 Job: client01.renamed.2024-02-06_04.00.01_09 Backup Level: Incremental, since=2024-02-05 04:00:18 Client: "client01.renamed" 23.0.2~pre32.0a0e55739 (31Jan24) Red Hat Enterprise Linux Server release 7.9 (Maipo),redhat FileSet: "client01.renamed" 2024-01-20 04:00:00 Pool: "aincr30d-bsd02" (From Job resource) Catalog: "MyCatalog" (From Client resource) Storage: "bsd02" (From Pool resource) Scheduled time: 06-Feb-2024 04:00:01 Start time: 06-Feb-2024 04:00:16 End time: 06-Feb-2024 04:30:13 Elapsed time: 29 mins 57 secs Priority: 10 Allow Mixed Priority: no FD Files Written: 321 SD Files Written: 321 FD Bytes Written: 57,709,242 (57.70 MB) SD Bytes Written: 57,764,329 (57.76 MB) Quota Used: 720,602,415,037 (720.6 GB) Burst Quota: 0 (0 B) Soft Quota: 1,073,741,824,000 (1.073 TB) Hard Quota: 1,288,490,188,800 (1.288 TB) Grace Expiry Date: Soft Quota not exceeded Rate: 32.1 KB/s Software Compression: 46.4 % (lz4) VSS: no Encryption: no Accurate: yes Volume name(s): aincr30d-bsd02-3253 Volume Session Id: 583 Volume Session Time: 1704452670 Last Volume Bytes: 57,784,439 (57.78 MB) Non-fatal FD errors: 0 SD Errors: 1 FD termination status: OK SD termination status: OK Bareos binary info: Bareos community build (UNSUPPORTED): Get professional support from https://www.bareos.com Job triggered by: Scheduler Termination: Backup OK -- with warnings |
|
Thanks could you also add the output of llist volume=aincr180d-bsd02-2334 llist volume=aincr30d-bsd02-3243 llist volume=aincr30d-bsd02-3249 llist volume=incr30d-3250 If you are using any script for secure erase command thanks to also give the maximum of information. |
|
Thanks Bruno. Here are the output: *llist volume=aincr180d-bsd02-2334 No results to list. *llist volume=aincr30d-bsd02-3243 mediaid: 3,243 volumename: aincr30d-bsd02-3243 slot: 0 poolid: 17 pool: aincr30d-bsd02 mediatype: File02 firstwritten: 2024-02-05 22:00:03 lastwritten: 2024-02-06 08:04:56 labeldate: 2024-02-05 22:00:03 voljobs: 1 volfiles: 0 volblocks: 1,092 volmounts: 1 volbytes: 1,145,006,249 volerrors: 0 volwrites: 1,093 volcapacitybytes: 0 volstatus: Used enabled: 1 recycle: 1 volretention: 5,616,000 voluseduration: 82,800 maxvoljobs: 1 maxvolfiles: 0 maxvolbytes: 5,368,709,120 inchanger: 0 endfile: 0 endblock: 1,145,006,248 labeltype: 0 storageid: 3 deviceid: 0 locationid: 0 recyclecount: 0 initialwrite: scratchpoolid: 0 scratchpool: recyclepoolid: 0 recyclepool: comment: storage: bsd02 *llist volume=aincr30d-bsd02-3249 mediaid: 3,249 volumename: aincr30d-bsd02-3249 slot: 0 poolid: 17 pool: aincr30d-bsd02 mediatype: File02 firstwritten: 2024-02-06 04:00:02 lastwritten: 2024-02-06 04:03:18 labeldate: 2024-02-06 04:00:02 voljobs: 1 volfiles: 1 volblocks: 5,119 volmounts: 1 volbytes: 5,367,660,784 volerrors: 0 volwrites: 5,120 volcapacitybytes: 0 volstatus: Full enabled: 1 recycle: 1 volretention: 5,616,000 voluseduration: 82,800 maxvoljobs: 1 maxvolfiles: 0 maxvolbytes: 5,368,709,120 inchanger: 0 endfile: 1 endblock: 1,072,693,487 labeltype: 0 storageid: 3 deviceid: 0 locationid: 0 recyclecount: 0 initialwrite: scratchpoolid: 0 scratchpool: recyclepoolid: 0 recyclepool: comment: storage: bsd02 *llist volume=incr30d-3250 mediaid: 3,250 volumename: incr30d-3250 slot: 0 poolid: 6 pool: incr30d mediatype: File02 firstwritten: 2024-02-06 04:00:03 lastwritten: 2024-02-06 04:41:15 labeldate: 2024-02-06 04:00:03 voljobs: 1 volfiles: 0 volblocks: 447 volmounts: 1 volbytes: 468,619,261 volerrors: 0 volwrites: 448 volcapacitybytes: 0 volstatus: Used enabled: 1 recycle: 1 volretention: 2,592,000 voluseduration: 82,800 maxvoljobs: 1 maxvolfiles: 0 maxvolbytes: 5,368,709,120 inchanger: 0 endfile: 0 endblock: 468,619,260 labeltype: 0 storageid: 3 deviceid: 0 locationid: 0 recyclecount: 0 initialwrite: scratchpoolid: 0 scratchpool: recyclepoolid: 0 recyclepool: comment: storage: bsd02 ###### In time we use no erase script. But if we want to get more disc space we use this select to identify deletion candidates: psql -d bareos -h localhost -U $dbUser -c "SELECT m.VolumeName FROM Media m where m.VolStatus not in ('Append','Purged') and not exists (select 1 from JobMedia jm where jm.MediaId=m.MediaId);" In this case we run the command "delete volume=$volName yes" and delete the volume-files on storage. Bye Ralf |
|
While using sql command I would really advise to use the sqlquery module of bconsole bconsole <<< "sqlquery SELECT m.VolumeName FROM Media m where m.VolStatus not in ('Append','Purged') and not exists (select 1 from JobMedia jm where jm.MediaId=m.MediaId); " So if I understand, you try to "improve" the volume pruning and volume reservation. What happen if you stop doing this? Is the error still reproducible? |
|
Thanks for the answer Bruno. I haven't used this procedure for 3 weeks. In my opinion it has no influence on it. Now with the error, new volumes are created directly and these are then evaluated incorrectly. I currently have this effect with about 3 backups every night. Other systems run without problems. Bye Ralf |
|
Hi Bruno, one update to the issue. I think it is a overload effect on our storage. It happens only in a few cases, for example when updates are distributed across many systems. Possibly there should be slightly more timeout in Bareos for the evaluation, but I don't know the code. Thanks Ralf |
|
Date Modified | Username | Field | Change |
---|---|---|---|
2024-01-19 16:17 | raschu | New Issue | |
2024-01-23 13:48 | bruno-at-bareos | Note Added: 0005712 | |
2024-01-24 18:02 | raschu | Note Added: 0005717 | |
2024-01-25 08:59 | bruno-at-bareos | Note Added: 0005719 | |
2024-01-25 10:01 | raschu | Note Added: 0005721 | |
2024-01-25 10:08 | raschu | Note Edited: 0005721 | |
2024-01-25 13:25 | bruno-at-bareos | Note Added: 0005722 | |
2024-02-05 10:35 | raschu | Note Added: 0005727 | |
2024-02-05 10:50 | bruno-at-bareos | Note Added: 0005728 | |
2024-02-05 11:04 | raschu | Note Added: 0005729 | |
2024-02-05 11:23 | bruno-at-bareos | Note Added: 0005730 | |
2024-02-05 11:40 | raschu | Note Added: 0005731 | |
2024-02-05 13:52 | bruno-at-bareos | Note Added: 0005732 | |
2024-02-05 15:48 | raschu | Note Added: 0005733 | |
2024-02-06 15:39 | raschu | Note Added: 0005739 | |
2024-02-06 15:39 | raschu | File Added: tracefile.zip | |
2024-02-06 17:41 | bruno-at-bareos | Note Added: 0005740 | |
2024-02-07 15:37 | raschu | Note Added: 0005741 | |
2024-02-08 09:19 | bruno-at-bareos | Note Added: 0005745 | |
2024-02-08 10:12 | bruno-at-bareos | Assigned To | => bruno-at-bareos |
2024-02-08 10:12 | bruno-at-bareos | Status | new => assigned |
2024-02-08 17:19 | raschu | Note Added: 0005752 | |
2024-05-17 09:39 | raschu | Note Added: 0005945 |