View Issue Details

ID: 0001102
Project: bareos-core
Category: file daemon
View Status: public
Last Update: 2023-08-31 09:45
Reporter: twdragon
Assigned To: bruno-at-bareos
Priority: normal
Severity: major
Reproducibility: sometimes
Status: closed
Resolution: unable to reproduce
Platform: x86_64
OS: Ubuntu
OS Version: 18.04 LTS
Product Version: 18.2.6
Summary: 0001102: FD hangs when connected over an OpenVPN channel
Description: This problem was encountered in a backup infrastructure consisting of a VMware-based VDS running Ubuntu 18.04 LTS (kernel 4.18.0-25-generic) and a Debian 9 physical storage server with a RAID6 disk-based DSS. The same Bareos version is installed on both machines from the community builds repository. The machines are interconnected using OpenVPN (necessary because the VDS is located in a DMZ). Backups are collected from the remote VDS to the local storage server.

If the 'Maximum Network Buffer Size' option appears in any config file, the remote File Daemon on the VDS randomly hangs up completely. Strange behaviour then emerges: the File Daemon keeps the running job in its job list indefinitely, regardless of any 'cancel' commands issued by the Director. No information about this situation is written to logs, backtraces, etc., and syslog and the kernel logs contain no error reports either.
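For context, this is roughly how the stuck job can be observed from the Director side in bconsole; the JobId and the client name 'vds-fd' below are placeholders, not values taken from this report. Cancelling the job and then querying the client still shows it among the running jobs of the hung File Daemon:

    *cancel jobid=123
    *status client=vds-fd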

After long experimentation it became clear that removing the 'Maximum Network Buffer Size' option from every config file used by Bareos works around the problem. I think this should be treated as a compatibility issue.
Steps To Reproduce:
- Interconnect the machines using an OpenVPN channel.
- Limit the network addresses of the Bareos daemons to the VPN intranet.
- Set 'Maximum Network Buffer Size' to any value (see the configuration sketch below).
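A minimal sketch of the File Daemon side of such a setup, assuming the current bareos-fd Client resource layout; the client name, file path, and VPN address are placeholders, and the 32768 value is the one mentioned later in this report:

    # /etc/bareos/bareos-fd.d/client/myself.conf (path and names are illustrative)
    Client {
      Name = vds-fd
      # Bind the FD to the VPN intranet address only (placeholder address)
      FD Address = 10.8.0.2
      # Setting this directive triggers the reported hang;
      # removing it is the workaround described above.
      Maximum Network Buffer Size = 32768
    }
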
Tags: config, fd, network, vpn

Activities

arogge   manager   2019-07-22 10:17   ~0003513

I'm curious: why did you configure Maximum Network Buffer Size at all? And to what value did you set it? The default is usually fine.

twdragon   reporter   2019-07-22 10:53   ~0003514
Last edited: 2019-07-22 10:55

@arogge the previous backup system we used was Bacula 7.7.4, and it did not work unless the Maximum Network Buffer Size parameter was set to 32768 on both sides of the VPN channel. Without it, the channel was blocked whenever Bacula ran, until the OpenVPN daemon was restarted. Literally, this error was what forced us to migrate to Bareos (the Bacula vendors removed the old versions from their repository, and from version 9.0 onward Bacula produces the same FD error as the one discussed here). I was curious too, because the FD hang appeared without any updates or observable preconditions. After the migration we saw that the true source of the hang is the File Daemon (Bacula reported Storage Daemon errors, but it was apparent that the remote File Daemon was in fact the one hanging). In prolonged tests, turning parameters off one by one, we identified Maximum Network Buffer Size as the source of the problem.
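For illustration, a minimal sketch of the "both sides" setting mentioned above, here on the storage server's Storage Daemon; the resource name and file path are assumptions, only the 32768 value comes from this note:

    # /etc/bareos/bareos-sd.d/storage/bareos-sd.conf (path and name are illustrative)
    Storage {
      Name = storage-sd
      # Matching buffer size on the storage-server side of the VPN channel
      Maximum Network Buffer Size = 32768
    }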

arogge   manager   2019-07-22 11:00   ~0003515

Thank you very much for the insight.

arogge   manager   2019-07-22 12:59   ~0003517

Do you have any suggestion as to what we can do to improve the situation?
I can only imagine better documentation warning that setting this parameter might break things.

twdragon   reporter   2019-07-26 12:21   ~0003523

I tried to dig into the deeper reasons for this behaviour but have not succeeded so far. An overview of the situation: when `Maximum Network Buffer Size` is set, the connection the Bareos File Daemon establishes over OpenVPN produces jumbo packets of enormous length (e.g. 80.6 MiB is the highest value I observed in tests) at random times. Such a packet is then lost over the VPN connection. The File Daemon does not receive an answer for this packet and then hangs in a state that looks as if it cannot finish reading a random file. The jumbo packet anomaly always appears after a large amount of data (1.2 - 2.6 GiB) has been transferred successfully. I think it may be a compatibility issue between Bacula/Bareos and OpenVPN. For now it would be a good choice to add a statement about this issue to the documentation, I think. But I will keep trying to discover the true reason.

bruno-at-bareos   manager   2023-07-19 10:49   ~0005227

Ping: did you make any progress on discovering the root cause?

bruno-at-bareos   manager   2023-08-31 09:45   ~0005355

No feedback on newer versions; if this is still reproducible, please open a new report.

Issue History

Date Modified Username Field Change
2019-07-20 14:13 twdragon New Issue
2019-07-20 14:13 twdragon Tag Attached: config
2019-07-20 14:13 twdragon Tag Attached: fd
2019-07-20 14:13 twdragon Tag Attached: network
2019-07-20 14:13 twdragon Tag Attached: vpn
2019-07-22 10:17 arogge Status new => feedback
2019-07-22 10:17 arogge Note Added: 0003513
2019-07-22 10:53 twdragon Note Added: 0003514
2019-07-22 10:53 twdragon Status feedback => new
2019-07-22 10:55 twdragon Note Edited: 0003514
2019-07-22 11:00 arogge Note Added: 0003515
2019-07-22 12:59 arogge Status new => feedback
2019-07-22 12:59 arogge Note Added: 0003517
2019-07-26 12:21 twdragon Note Added: 0003523
2019-07-26 12:21 twdragon Status feedback => new
2023-07-19 10:49 bruno-at-bareos Assigned To => bruno-at-bareos
2023-07-19 10:49 bruno-at-bareos Status new => feedback
2023-07-19 10:49 bruno-at-bareos Note Added: 0005227
2023-08-31 09:45 bruno-at-bareos Status feedback => closed
2023-08-31 09:45 bruno-at-bareos Resolution open => unable to reproduce
2023-08-31 09:45 bruno-at-bareos Note Added: 0005355