View Issue Details

ID: 0000977
Project: bareos-core
Category: storage daemon
View Status: public
Last Update: 2023-05-15 15:24
Reporter: stevec
Assigned To: bruno-at-bareos
Priority: normal
Severity: major
Reproducibility: always
Status: closed
Resolution: reopened
Platform: Linux
OS: Ubuntu
OS Version: 16.04
Product Version: 17.2.6
Summary: 0000977: Authorization key rejected by Storage Daemon upon upgrade from 16.2.6 to 17.2.6
Description: Was running 16.2.6 under Ubuntu 16.04.4, self-contained (director, storage daemon and file daemon all on the same system), for several months. Upgraded to 17.2.6 and upgraded the database format (sqlite3). No changes to configuration files or password keys.

When running the first batch of incremental backups, usually one of them runs and writes to tape. All the others fail with "Fatal error: Authorization key rejected by Storage daemon ."

All backups use the same storage device. The one that seems to work is the first backup.
Steps To Reproduce: Run any backup.
Additional Information: The only difference that comes to mind between the working and non-working backups under 17.2.6 is how long they take. The one that works is the shortest (minutes); the others take much longer. Still, they all worked under 16.2.6.
Tags: No tags attached.

Activities

stevec

2018-06-30 12:10

reporter  

bareos-17.2.6.log (10,373 bytes)
stevec

2018-06-30 12:13

reporter   ~0003058

If there are other debug items that I can try to help pinpoint this, please let me know.
stevec

2018-07-01 23:07

reporter   ~0003059

Since I am still using monolithic config files, I moved the bareos-dir.d, bareos-fd.d and bareos-sd.d directories out of the way to force only bareos-dir.conf, bareos-sd.conf and bareos-fd.conf to be read. This had no real effect (besides ruling out the new auto-generated files from 17.2.6 coming into play). In the attached log from run 0000002 (bareos-17.2.6-02.log) you can see that more than one backup completed, but others failed to authenticate to the SD with these errors:

01-Jul 11:55 loki-fd JobId 11413: Fatal error: Failed to authenticate Storage daemon.
01-Jul 11:55 loki-dir JobId 11414: Fatal error: Bad response to Storage command: wanted 2000 OK storage
, got 2902 Bad storage
stevec

2018-07-01 23:08

reporter  

bareos-17.2.6-02.log (25,603 bytes)
stevec

2018-07-01 23:15

reporter  

config.log (510,658 bytes)
stevec

2018-07-01 23:16

reporter  

config.out (3,506 bytes)
stevec

2018-07-03 02:30

reporter   ~0003060

It appears that the name lookup has changed between 16.2.x and 17.2.x. The error:

01-Jul 11:55 loki-fd JobId 11413: Fatal error: Failed to authenticate Storage daemon.
01-Jul 11:55 loki-dir JobId 11414: Fatal error: Bad response to Storage command: wanted 2000 OK storage
, got 2902 Bad storage


This can be avoided by adding the client and director names (on my system: loki-dir, loki-fd, loki-sd, loki-mon) to /etc/hosts. This was not needed from 12.x through 16.2, but now seems to be a requirement in 17.2. Going back to 16.2.6, I can remove the entries from /etc/hosts and it works without the error above.
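
To check whether those names resolve on the backup host before a job runs, something like the following could be used. This is only a minimal sketch: `resolve_names` and its `resolver` parameter are hypothetical helpers made up for illustration, not part of Bareos, and it only shows what the system resolver (which consults /etc/hosts and DNS according to the local configuration) returns. It does not reproduce how the daemons perform their own lookups.

```python
import socket
from typing import Callable, Dict, Iterable, List, Optional


def _system_resolver(name: str) -> List[str]:
    # getaddrinfo uses the system resolver configuration,
    # which typically includes /etc/hosts as well as DNS.
    infos = socket.getaddrinfo(name, None)
    return [info[4][0] for info in infos]


def resolve_names(
    names: Iterable[str],
    resolver: Optional[Callable[[str], List[str]]] = None,
) -> Dict[str, List[str]]:
    """Map each name to the sorted, de-duplicated addresses it resolves to.

    A name that cannot be resolved maps to an empty list.
    The resolver argument is a hypothetical hook so the lookup can be
    replaced in tests; by default the system resolver is used.
    """
    if resolver is None:
        resolver = _system_resolver
    results: Dict[str, List[str]] = {}
    for name in names:
        try:
            results[name] = sorted(set(resolver(name)))
        except OSError:
            # socket.gaierror is a subclass of OSError.
            results[name] = []
    return results
```

Calling `resolve_names(["loki-dir", "loki-fd", "loki-sd", "loki-mon"])` on the affected machine would then show an empty list for any name that is missing from /etc/hosts and DNS.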


Second item on 17.2.6: even with the entries in /etc/hosts to get past the error above, I am seeing at LEAST a 12x decrease in spooling speed (probably much worse, as I had to cancel the jobs since they would never complete in a day). For example, incrementals on 16.2.6 and 16.2.7 take about 2 hours. After swapping over to the 17.2.6 code, more than 16 hours passed and I barely saw the spooling start. bareos-dir takes 100% CPU the entire time. I wonder if this is related to the database changes between 16.2 and 17.2 for sqlite3 databases?

Falling back to 16.2.7, I'm back to normal spooling times.
bruno-at-bareos

2023-03-23 16:43

manager   ~0004949

Is this still reproducible with current code (Bareos >21) and a recent supported OS?
stevec

2023-03-23 16:50

reporter   ~0004953

I could never get 17.2.x to work. I managed to get 19.2.13 to work, which is what I am currently on. I cannot test or use anything newer than 19.2.x, as Bareos dropped sqlite as a compile option, which is *required* for my use case.
bruno-at-bareos

2023-04-06 15:17

manager   ~0004963

Can I ask what kind of use case you have that wouldn't be possible with PostgreSQL?
bruno-at-bareos

2023-05-15 10:34

manager   ~0005054

no answer.
stevec

2023-05-15 14:50

reporter   ~0005058

Sorry, I didn't see the notification for your previous request and only just saw the closed notice.

Just to answer your question (the ticket can be closed, this is just informational): sqlite3 is used because it allows the entire Bareos system to be copied/backed up to an external, encrypted drive on a rotating basis for offsite DR. That way we can very quickly load an OS image, then attach or copy over the small partition to the machine and be up and running 30-45 minutes from the start of a DR test. Otherwise it takes a lot longer and has proven more prone to errors, which cause delays. Since the backup server needs to be online as fast as possible (i.e. its main dependencies are the switch network and the tape inventory, making sure all tapes arrived at DR), any delay holds up every other recovery effort.

Also, in our case we're only talking about 20 million files or so, so sqlite3 is very fast for what we need it for.
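
For illustration, a consistent copy of a single SQLite catalog file can be taken with SQLite's online backup API, which Python exposes as `sqlite3.Connection.backup` (Python 3.7 and later). This is only a sketch of one way to copy such a file: the workflow described above copies the whole partition rather than just the database, and `snapshot_catalog` and any paths passed to it are hypothetical.

```python
import sqlite3


def snapshot_catalog(src_path: str, dest_path: str) -> None:
    """Copy the SQLite database at src_path to dest_path.

    Uses SQLite's online backup API, so the destination is a
    consistent snapshot of the source database.
    Both paths are supplied by the caller; nothing here is Bareos-specific.
    """
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        src.backup(dest)  # copies all pages of the source database
    finally:
        dest.close()
        src.close()
```

The destination file can then be opened like any other SQLite database on the DR machine.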
bruno-at-bareos

2023-05-15 15:24

manager   ~0005059

Thanks a lot for the report; it helps us understand what would not be covered by PostgreSQL.

To be valid, your copy certainly happens with all daemons stopped, which can also be done with PostgreSQL data. You should then be able to use the same process for your DR, even if PostgreSQL takes a few more seconds to start.

Our recommendation would still be to keep the last SQL dump directly at hand, or at least the last BackupCatalog bsr, to quickly extract the dump from media.
(BTW, other tools like pg_basebackup or pgbackrest can also achieve consistent backups and quick restores.)

Exporting the dump and the Bareos configuration regularly should give you an almost ready-to-use DR.
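
Such a regular export could be scripted roughly as follows. This is a sketch under assumptions, not an official procedure: the helper names `build_export_commands` and `run_export` are hypothetical, and the database name, configuration directory and output directory are placeholders chosen by the caller. It relies only on the standard `pg_dump` options `--format=custom` and `--file`, and on `tar`.

```python
import os
import subprocess
from typing import List


def build_export_commands(db_name: str, config_dir: str, out_dir: str) -> List[List[str]]:
    """Build the commands for dumping the catalog and archiving the configuration.

    All three arguments are supplied by the caller (placeholders, not fixed paths).
    """
    dump_file = os.path.join(out_dir, db_name + ".dump")
    config_archive = os.path.join(out_dir, "bareos-config.tar.gz")
    return [
        # Custom-format dump, restorable with pg_restore.
        ["pg_dump", "--format=custom", "--file", dump_file, db_name],
        # Archive the contents of the configuration directory.
        ["tar", "-czf", config_archive, "-C", config_dir, "."],
    ]


def run_export(commands: List[List[str]]) -> None:
    """Run each command in order and stop at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Building the command lists separately from running them makes it easy to review or log exactly what would be executed before scheduling the export.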

Regards

Issue History

Date Modified Username Field Change
2018-06-30 12:10 stevec New Issue
2018-06-30 12:10 stevec File Added: bareos-17.2.6.log
2018-06-30 12:13 stevec Note Added: 0003058
2018-07-01 23:07 stevec Note Added: 0003059
2018-07-01 23:08 stevec File Added: bareos-17.2.6-02.log
2018-07-01 23:15 stevec File Added: config.log
2018-07-01 23:16 stevec File Added: config.out
2018-07-03 02:30 stevec Note Added: 0003060
2023-03-23 16:43 bruno-at-bareos Assigned To => bruno-at-bareos
2023-03-23 16:43 bruno-at-bareos Status new => feedback
2023-03-23 16:43 bruno-at-bareos Note Added: 0004949
2023-03-23 16:50 stevec Note Added: 0004953
2023-03-23 16:50 stevec Status feedback => assigned
2023-04-06 15:17 bruno-at-bareos Note Added: 0004963
2023-05-15 10:34 bruno-at-bareos Status assigned => closed
2023-05-15 10:34 bruno-at-bareos Resolution open => no change required
2023-05-15 10:34 bruno-at-bareos Note Added: 0005054
2023-05-15 14:50 stevec Status closed => new
2023-05-15 14:50 stevec Resolution no change required => reopened
2023-05-15 14:50 stevec Note Added: 0005058
2023-05-15 15:24 bruno-at-bareos Status new => closed
2023-05-15 15:24 bruno-at-bareos Note Added: 0005059