Bareos Bug Tracker

View Issue Details
ID: 0001019
Project: bareos-core
Category: [All Projects] director
View Status: public
Date Submitted: 2018-10-09 15:09
Last Update: 2018-12-11 11:20
Reporter: wizhippo
Assigned To:
Priority: normal
Severity: major
Reproducibility: always
Status: new
Resolution: open
Platform: x86
OS: Ubuntu
OS Version: 18.04
Product Version: 18.2.4-rc1
Target Version:
Fixed in Version:
Summary: 0001019: Director hangs waiting for client if not available PSK
Description: With TLS Psk Require = yes, if a client is offline, the director hangs, waiting with:

delllt.2018-10-07_22.00.00_24 is waiting for Client to connect (Client Initiated Connection)

The log in bareos gui shows:

2018-10-07 22:06:47 kamino-dir JobId 1337:
Try to establish a secure connection by immediate TLS handshake:
2018-10-07 22:06:47 kamino-dir JobId 1337: Fatal error: Failed to connect to client "delllt-fd".
2018-10-07 22:06:35 kamino-dir JobId 1337: Fatal error: lib/bsock_tcp.cc:139 Unable to connect to Client: delllt-fd on delllt:9102. ERR=No route to host
2018-10-07 22:03:39 kamino-dir JobId 1337: Warning: lib/bsock_tcp.cc:133 Could not connect to Client: delllt-fd on delllt:9102. ERR=No route to host
Retrying ...
2018-10-07 22:03:23 kamino-dir JobId 1337: Using Device "FileDevice-1" to write.
2018-10-07 22:03:22 kamino-dir JobId 1337: Start Backup JobId 1337, Job=delllt.2018-10-07_22.00.00_24
2018-10-07 22:03:22 kamino-dir JobId 1337: Secure connection to Storage daemon at kamino:9103 with cipher ECDHE-PSK-CHACHA20-POLY1305 established

Shouldn't there be a timeout on this wait, after which the job simply fails?
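
To illustrate the behaviour being asked for, here is a minimal Python sketch of a connect loop that retries but gives up once an overall deadline has passed. It is not Bareos code and does not reflect how the director is implemented; the function name connect_with_deadline and the parameters deadline_s, attempt_timeout_s and retry_wait_s are hypothetical names chosen for this example.

```python
import socket
import time


def connect_with_deadline(host, port, deadline_s=180.0,
                          attempt_timeout_s=5.0, retry_wait_s=1.0):
    """Retry a TCP connect until deadline_s has elapsed, then raise TimeoutError.

    Hypothetical helper for illustration only; not part of Bareos.
    """
    start = time.monotonic()
    last_error = None
    while True:
        remaining = deadline_s - (time.monotonic() - start)
        if remaining <= 0:
            # The overall deadline has passed: fail instead of waiting forever.
            raise TimeoutError(
                f"gave up connecting to {host}:{port} "
                f"after {deadline_s}s: {last_error}"
            )
        try:
            # Each attempt is bounded by its own timeout and by the time left.
            return socket.create_connection(
                (host, port), timeout=min(attempt_timeout_s, remaining)
            )
        except OSError as exc:
            # Covers "No route to host", refused connections and attempt timeouts.
            last_error = exc
            remaining = deadline_s - (time.monotonic() - start)
            time.sleep(min(retry_wait_s, max(0.0, remaining)))
```

With a loop of this shape, a caller would see either a connected socket or a TimeoutError, so a job against an unreachable client could be marked as failed rather than staying in a waiting state.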
Tags: No tags attached.

Notes
(0003133)
wizhippo (reporter)
2018-10-09 15:12
edited on: 2018-10-09 15:32

When I try to cancel the hung job, even though the director shows it as running, I get:

*status dir

 JobId Level Name Status
======================================================================
  1337 Full delllt.2018-10-07_22.00.00_24 is waiting for Client to connect (Client Initiated Connection)


*can
Select Job:
     1: JobId=1337 Job=delllt.2018-10-07_22.00.00_24
Choose Job to cancel (1-21): 1
3904 Job delllt.2018-10-07_22.00.00_24 not found.


I had to restart the director.

(0003134)
wizhippo (reporter)
2018-10-09 16:08

Just to add: Connection From Client To Director is not set, so I'm not sure why there is a client initiated connection.

(0003148)
wizhippo (reporter)
2018-10-30 15:28

I can reproduce this when running a job with higher priority first against a host that is not online and then running the catalog backup afterwards.

The catalog backup never runs: the first job fails because the host is unavailable, but it remains listed as a running job on the director indefinitely even though it has failed. A restart of the director is required to remove the job, because trying to delete it in the console reports that the job does not exist even though the director shows it as running.

(0003158)
r0mulux (reporter)
2018-12-07 11:41
edited on: 2018-12-11 11:20

Hello, I have the same issue.
Jobs seem to freeze if the machine to back up is not reachable, and the next scheduled jobs are never executed. Frozen jobs cannot be deleted. I need to restart the director each time.


Issue History
Date Modified Username Field Change
2018-10-09 15:09 wizhippo New Issue
2018-10-09 15:12 wizhippo Note Added: 0003133
2018-10-09 15:32 wizhippo Note Edited: 0003133
2018-10-09 16:08 wizhippo Note Added: 0003134
2018-10-30 15:28 wizhippo Note Added: 0003148
2018-12-07 11:41 r0mulux Note Added: 0003158
2018-12-11 11:20 r0mulux Note Edited: 0003158

