View Issue Details

IDProjectCategoryView StatusLast Update
0001192bareos-core[All Projects] directorpublic2020-02-27 11:36
ReporterhostedpowerAssigned Toarogge 
PrioritynormalSeveritymajorReproducibilityrandom
Status feedbackResolutionopen 
PlatformLinuxOSDebianOS Version9
Product Version19.2.6 
Fixed in Version 
Summary0001192: Authorization key rejected by Storage daemon since upgrading director and storage daemons
DescriptionHello,

Since we upgraded all servers components to 19.2.6 (coming from 18.2.7) we get Authorization key rejected by Storage daemon again for random hosts.

There really isn't any reason since those hosts never changed since very long time. Also they were working for months with 18.2.x and has 0 issues. Very strange to see this error. It's only for about 5-10% of the hosts.

Example:

web.xxxx.com JobId 9: Fatal error: Authorization key rejected by Storage daemon .
Please see http://doc.bareos.org/master/html/bareos-manual-main-reference.html#AuthorizationErrors for help.
2020-02-12 22:27:47 hostedpower-dir JobId 9: TLS negotiation failed (while probing client protocol)
2020-02-12 22:27:47 hostedpower-dir JobId 9: Connected Client: web.xxxx.com at web.xxxx.com:9102, encryption: None

I think it are all some older ones that don't support encryption, but on the other hand we have others not supporting encryption which work 100% fine.
TagsNo tags attached.
bareos-master: impact
bareos-master: action
bareos-19.2: impact
bareos-19.2: action
bareos-18.2: impact
bareos-18.2: action
bareos-17.2: impact
bareos-17.2: action
bareos-16.2: impact
bareos-16.2: action
bareos-15.2: impact
bareos-15.2: action
bareos-14.2: impact
bareos-14.2: action
bareos-13.2: impact
bareos-13.2: action
bareos-12.4: impact
bareos-12.4: action

Relationships

has duplicate 0001201 closedarogge Fallback to unencrypted communication with older bareos client 14.2.1 sometimes fails 

Activities

ojehle

ojehle

2020-02-13 07:15

reporter   ~0003803

Last edited: 2020-02-13 07:29

View 3 revisions

I get the same error without tls.. i have 10 jobs on the same Windows FD, connecting to the Same Linux SD 2 of the jobs are failed. restarting the jobs will work after.. looks a temporary error on SD, i try with delay between jobs, but not working.. i increase now the timeout between the jobs from 30 secs to 60 secs

Upgraded from 18.2.6 -> 19.2.4 (windows FD)
Upgraded Server 18.2.7 -> 19.2.5 (Linux SD/Director)

compiling now 19.2.6 and install it....



2020-02-12 22:31:47 linux-dir JobId 13446: Start Backup JobId 13446, Job=BackupWindows.2020-02-12_22.31.45_43
2020-02-12 22:31:47 linux-dir JobId 13446: Connected Storage daemon at arena8:9103, encryption: TLS_CHACHA20_POLY1305_SHA256
2020-02-12 22:31:47 linux-dir JobId 13446: Using Device "WindowsRemote" to write.
2020-02-12 22:31:47 linux-dir JobId 13446: Connected Client: windows-fd at windows:9102, encryption: None
2020-02-12 22:31:47 linux-dir JobId 13446: Handshake: Cleartext
2020-02-12 22:31:47 linux-dir JobId 13446: Encryption: None
2020-02-12 22:31:46 windows-fd JobId 13446: Fatal error: Authorization key rejected windows-fd.
2020-02-12 22:31:46 windows-fd JobId 13446: Fatal error: Failed to authenticate Storage daemon.

arogge

arogge

2020-02-13 10:04

developer   ~0003806

You can just install 19.2.6 from the repositories. Binaries have been released to the public.
buloosh

buloosh

2020-02-13 11:24

reporter   ~0003807

Last edited: 2020-02-13 11:40

View 2 revisions

Got the same problem on a 19.2.6 as well
running all daemons in debug mode i figured out, that problem relays server side, when director tries to connect storage daemon sometimes it asks for the tls handshake, but there is no one in request, so it fails
my setup:
server centos7 dir\sd(vers 19.2.6)
clients debian8-9\centos7(vers 17.2-19.2.6)

ojehle

ojehle

2020-02-13 11:29

reporter   ~0003808

i complied and installed 19.2.6 and the windows fd was 19.2.4 got the same error.. did not see errors in the SD Version 19.2.6 trace logs.. so i expect it's the windows fd release 19.2.4. will retry the testing with the 18.2.5 windows fd.... there i never got such errors...

will post the results...
ojehle

ojehle

2020-02-13 13:01

reporter   ~0003810

Last edited: 2020-02-13 13:02

View 2 revisions

Tested now:

with the downgraded fd on windows i get no errors anymore.. i started my jobs approx 25 times...

git 19.2.6 version director/storage ubuntu linux 18lts
windows fd 18.2.5 version (exe) from download

18.2.5 (30Jan19) Microsoft Windows Server 2012 Datacenter Edition (build 9200), 64-bit,Cross-compile,Win64


with the 19.2.4 version (exe) from download i got the error sometimes (in the same job sequence , (10 jobs) i got it always once)...

ojehle

ojehle

2020-02-13 13:13

reporter   ~0003811

test running now with windows fd 19.2.6 from download...
ojehle

ojehle

2020-02-13 14:10

reporter   ~0003812

with the windows 19.2.6 i got the same error...... after several backups.. ... the error occured, next backup works again...

will downgrade now again to windows fd to 18.2.5 and run the tests again....
ojehle

ojehle

2020-02-13 14:28

reporter   ~0003813

got it now also with the 18.2.5 fd .....


linux-dir JobId 13553: Error: Bareos linux-dir 19.2.6 (11Feb20):
Build OS: Linux-4.15.0-76-generic ubuntu Ubuntu 18.04.4 LTS
JobId: 13553
Job: BackupWindows.2020-02-13_14.20.05_21
Backup Level: Incremental, since=2020-02-13 14:15:16
Client: "windows-fd" 18.2.5 (30Jan19) Microsoft Windows Server 2012 Datacenter Edition (build 9200), 64-bit,Cross-compile,Win64
jkk

jkk

2020-02-18 14:54

reporter   ~0003826

We have pretty much the same problem after upgrading to Bareos 19.2.6.
Clients are mostly CentOS 7 using bacula-client, so unencrypted fd<->sd connections are used.
The failing clients seem to be random - rerunning the job eventually succeeds, sometimes after 5 tries.
This definitely looks like an issue in bareos-sd 19.2, maybe handshake/authentication-related.
It is pretty annoying, since every day around half of our scheduled backups fail.
arogge

arogge

2020-02-18 15:16

developer   ~0003827

Could somebody enable tracing on the components (at least debuglevel 100), reproduce the issue and attach the tracefiles here?
ojehle

ojehle

2020-02-18 16:51

reporter  

bareos-sd.trace (32,485 bytes)
ojehle

ojehle

2020-02-18 16:51

reporter   ~0003830

Last edited: 2020-02-18 20:57

View 2 revisions

could reproduce it installing a linux vm and windows vm, set a small job and on client

tls enable = no
tls require = no

i could reproduce it ever 5 or 6 run of the backup...



bareos-dir.trace (120,750 bytes)
backup_log.txt (2,461 bytes)
18-Feb 16:44 bareos-dir JobId 92: Start Backup JobId 92, Job=backup-win-fd.2020-02-18_16.44.07_30
18-Feb 16:44 bareos-dir JobId 92: Connected Storage daemon at bareos:9103, encryption: TLS_CHACHA20_POLY1305_SHA256
18-Feb 16:44 bareos-dir JobId 92: Using Device "FileStorage" to write.
18-Feb 16:44 bareos-dir JobId 92: Connected Client: win10_1910-fd at 192.168.56.102:9102, encryption: None
18-Feb 16:44 bareos-dir JobId 92:  Handshake: Cleartext 
18-Feb 16:44 bareos-dir JobId 92:  Encryption: None
18-Feb 16:44 win10_1910-fd JobId 92: Created 27 wildcard excludes from FilesNotToBackup Registry key
18-Feb 16:44 win10_1910-fd JobId 92: Fatal error: Authorization key rejected win10_1910-fd.
18-Feb 16:44 win10_1910-fd JobId 92: Fatal error: Failed to authenticate Storage daemon.
18-Feb 16:44 bareos-dir JobId 92: Fatal error: Bad response to Storage command: wanted 2000 OK storage
, got 2902 Bad storage

18-Feb 16:44 bareos-dir JobId 92: Error: Bareos bareos-dir 19.2.6 (11Feb20):
  Build OS:               Linux-4.15.0-76-generic ubuntu Ubuntu 18.04.4 LTS
  JobId:                  92
  Job:                    backup-win-fd.2020-02-18_16.44.07_30
  Backup Level:           Full (upgraded from Incremental)
  Client:                 "win10_1910-fd" 19.2.6 (11Feb20) Microsoft Windows 8 Professional (build 9200), 64-bit,Cross-compile,Win64
  FileSet:                "Win" 2020-02-18 16:07:08
  Pool:                   "Full" (From Job FullPool override)
  Catalog:                "MyCatalog" (From Client resource)
  Storage:                "File" (From Job resource)
  Scheduled time:         18-Feb-2020 16:44:07
  Start time:             18-Feb-2020 16:44:09
  End time:               18-Feb-2020 16:44:16
  Elapsed time:           7 secs
  Priority:               10
  FD Files Written:       0
  SD Files Written:       0
  FD Bytes Written:       0 (0 B)
  SD Bytes Written:       0 (0 B)
  Rate:                   0.0 KB/s
  Software Compression:   None
  VSS:                    yes
  Encryption:             no
  Accurate:               no
  Volume name(s):         
  Volume Session Id:      12
  Volume Session Time:    1582040123
  Last Volume Bytes:      0 (0 B)
  Non-fatal FD errors:    2
  SD Errors:              0
  FD termination status:  Fatal Error
  SD termination status:  Waiting on FD
  Bareos binary info:     self-compiled: Get official binaries and vendor support on bareos.com
  Termination:            *** Backup Error ***

backup_log.txt (2,461 bytes)
win10_1910-fd.trace (238,161 bytes)
ojehle

ojehle

2020-02-18 17:29

reporter   ~0003831

it looks it's the

tls enable = no
tls require = no

after i removed this from client and director.. i could start 40 backups against the file daemon without any backup error. if i switch back to

tls enable = no
tls require = no

after the 5 backup, the error occurs...
arogge

arogge

2020-02-18 17:36

developer   ~0003832

Just to make sure i can reproduce this correctly: where do you set these? In the Client resource on the director and in the Director resource on the filedaemon?
ojehle

ojehle

2020-02-18 17:42

reporter   ~0003833

yes, i did it in the client resource of the director and on the client in the director and myself.conf
arogge

arogge

2020-02-19 09:19

developer   ~0003836

In my minimal test-case I cannot make it fail (yet). Can you reproduce the problem when dir, fd and sd are on the same machine, or do you need networking in between?
ojehle

ojehle

2020-02-19 09:30

reporter   ~0003837

i had linux vm (DIR,SD) and a windows vm (FD)

yesterday i run my test "simple backup" until error. for approx 200 successfull backups with 19.2.6 on my test setup.

After the successfull test, i switched our productive system again from 18.6 to 19.2.6 without the tls_enable = no everywhere and its working since then without any error.
arogge

arogge

2020-02-19 09:44

developer   ~0003838

It looks like sometimes when a non-TLS client connects to the SD the authentication fails. I'm just not sure how this can fail only sometimes.
ojehle

ojehle

2020-02-19 09:48

reporter   ~0003839

it's really like this.. if you set "tls enable=no" after starting 5-6 times the same job you get the error. The Same 2 VM (virtual box), same resources, only thing running on my laptop.... .not touching anything...

timing issue? if you cannot get it with localhost?
arogge

arogge

2020-02-19 11:41

developer   ~0003841

Seems like a restart of the SD between backups works around the issue. I made the test not do this anymore and now I can reproduce the issue.

Thanks for taking the time to reproduce this. I'll keep you updated as soon as I know anything new.
arogge

arogge

2020-02-21 20:14

developer   ~0003850

When debugging the SD with debuglevel 200 or above, you should see "lib/try_tls_handshake_as_a_server.cc:77-0 Connection to %s will be denied due to configuration mismatch". This message occurs when the connection does not satisfy the TlsPolicy.
The TlsPolicy that is checked should be set by ConfiguredTlsPolicyGetter::GetConfiguredTlsPolicyFromCleartextHello(). Seems like there are cases where it isn't set correctly or maybe the value gets corrupted.
arogge

arogge

2020-02-25 13:38

developer   ~0003855

Seems like the value was never initialized in the first place. We now have a proposed fix for this.
arogge

arogge

2020-02-26 18:23

developer   ~0003866

We have packages with the proposed fix available at https://download.bareos.org/bareos/experimental/CD/PR-431/
Would anyone be willing to test with these? The change is only in the SD, so upgrading that is sufficient.
Thank you!
ojehle

ojehle

2020-02-26 18:48

reporter   ~0003868

will test it tomorrow with the packages. is the fix already in the master branch of git?
ojehle

ojehle

2020-02-27 08:26

reporter   ~0003872

Tests successfull
Tests run successfull


run 80 jobs tls enable = no -> no errros
run 40 jobs tls enabled -> no errors


thanks for fixing....
arogge

arogge

2020-02-27 11:36

developer   ~0003873

Thanks for the feedback. I'll merge the patch into master and backport it to 19.2 and 18.2.

Issue History

Date Modified Username Field Change
2020-02-12 22:45 hostedpower New Issue
2020-02-13 07:15 ojehle Note Added: 0003803
2020-02-13 07:17 ojehle Note Edited: 0003803 View Revisions
2020-02-13 07:29 ojehle Note Edited: 0003803 View Revisions
2020-02-13 10:04 arogge Note Added: 0003806
2020-02-13 11:24 buloosh Note Added: 0003807
2020-02-13 11:29 ojehle Note Added: 0003808
2020-02-13 11:40 buloosh Note Edited: 0003807 View Revisions
2020-02-13 13:01 ojehle Note Added: 0003810
2020-02-13 13:02 ojehle Note Edited: 0003810 View Revisions
2020-02-13 13:13 ojehle Note Added: 0003811
2020-02-13 14:10 ojehle Note Added: 0003812
2020-02-13 14:28 ojehle Note Added: 0003813
2020-02-18 14:54 jkk Note Added: 0003826
2020-02-18 15:16 arogge Note Added: 0003827
2020-02-18 16:51 ojehle File Added: bareos-sd.trace
2020-02-18 16:51 ojehle File Added: bareos-dir.trace
2020-02-18 16:51 ojehle File Added: backup_log.txt
2020-02-18 16:51 ojehle File Added: win10_1910-fd.trace
2020-02-18 16:51 ojehle Note Added: 0003830
2020-02-18 17:29 ojehle Note Added: 0003831
2020-02-18 17:36 arogge Note Added: 0003832
2020-02-18 17:42 ojehle Note Added: 0003833
2020-02-18 20:57 ojehle Note Edited: 0003830 View Revisions
2020-02-19 09:19 arogge Note Added: 0003836
2020-02-19 09:30 ojehle Note Added: 0003837
2020-02-19 09:44 arogge Note Added: 0003838
2020-02-19 09:48 ojehle Note Added: 0003839
2020-02-19 11:41 arogge Note Added: 0003841
2020-02-19 11:41 arogge Assigned To => arogge
2020-02-19 11:41 arogge Status new => confirmed
2020-02-20 14:00 arogge Relationship added has duplicate 0001201
2020-02-21 20:14 arogge Note Added: 0003850
2020-02-25 13:38 arogge Note Added: 0003855
2020-02-26 18:23 arogge Note Added: 0003866
2020-02-26 18:24 arogge Status confirmed => feedback
2020-02-26 18:48 ojehle Note Added: 0003868
2020-02-27 08:26 ojehle Note Added: 0003872
2020-02-27 11:36 arogge Note Added: 0003873