View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0001102 | bareos-core | file daemon | public | 2019-07-20 14:13 | 2023-08-31 09:45 |
Reporter | twdragon | Assigned To | bruno-at-bareos | ||
Priority | normal | Severity | major | Reproducibility | sometimes |
Status | closed | Resolution | unable to reproduce | ||
Platform | x86_64 | OS | Ubuntu | OS Version | 18.04 LTS |
Product Version | 18.2.6 | ||||
Summary | 0001102: FD hangs up when connected by OpenVPN channel | ||||
Description | This problem occurs in a backup infrastructure consisting of a VMware-based VDS running Ubuntu 18.04 LTS (kernel 4.18.0-25-generic) and a hardware storage server running Debian 9 with a RAID6 disk-based DSS. Both machines run the same Bareos version, installed from the community builds repository. The machines are connected through OpenVPN (this is necessary because the VDS is located in a DMZ). Backups are collected from the remote VDS to the local storage server. If the 'Maximum Network Buffer Size' option appears in any config file, the remote File Daemon on the VDS randomly hangs completely. After that, the File Daemon keeps the running job in its job list indefinitely, regardless of 'cancel' commands issued by the Director. Nothing about this situation is written to the logs, backtraces etc., and syslog and the kernel logs contain no error reports either. After long experiments it became clear that removing the 'Maximum Network Buffer Size' option from every config file used by Bareos works around the problem. I think this problem should be considered a compatibility issue. | ||||
Steps To Reproduce | - Connect the machines through an OpenVPN channel. - Restrict the network addresses of the Bareos daemons to the VPN intranet. - Set 'Maximum Network Buffer Size' to any value. | ||||
Tags | config, fd, network, vpn | ||||
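
The reproduction steps above can be sketched as a File Daemon configuration fragment. This is a minimal sketch and not the reporter's actual configuration: the file path follows the usual Bareos directory layout, and the resource name `vds-fd` and the address `10.8.0.2` are hypothetical placeholders. The value 32768 is the one mentioned later in the thread for the earlier Bacula setup.

```
# /etc/bareos/bareos-fd.d/client/myself.conf (path assumed)
Client {
  Name = vds-fd                         # hypothetical resource name
  FD Address = 10.8.0.2                 # hypothetical VPN-internal address
  Maximum Network Buffer Size = 32768   # per the report, setting this leads to the hang
}
```

According to the report, the workaround is to remove the `Maximum Network Buffer Size` line from every Bareos config file that contains it, not only from the File Daemon side.
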
I'm curious: why did you configure Maximum Network Buffer Size at all? And to what value did you set it? The default is usually fine. |
|
@arogge the previous backup system we used was Bacula 7.7.4, and it did not work unless the Maximum Network Buffer Size parameter was set to 32768 on both sides of the VPN channel. Without it, the channel was blocked while Bacula was running until the OpenVPN daemon was restarted. In fact, this error is what forced us to migrate to Bareos (the Bacula vendors deleted old versions from their repository, and from version 9.0 on Bacula produces the same FD error as the one we are discussing here). I was curious too, because the FD hang appeared without any updates or observable preconditions. After the migration we saw that the true source of the hang is the File Daemon (Bacula reported Storage Daemon errors, but it was visible that in fact the remote File Daemon hangs). In prolonged tests, turning parameters off one by one, we identified Maximum Network Buffer Size as the source of the problem. |
|
Thank you very much for the insight. | |
Do you have any suggestions for what we can do to improve the situation? I can only imagine better documentation warning that setting this parameter might break things. |
|
I tried to dig into the underlying cause of this behaviour but have not succeeded so far. An overview of the situation: when `Maximum Network Buffer Size` is set, the connection that the Bareos File Daemon establishes over OpenVPN produces jumbo packets of enormous length (e.g. 80.6 MiB is the highest value I found in tests) at random times. Such a packet is then lost on the VPN connection. The File Daemon does not receive a reply to this packet and hangs in a state that looks as if it cannot finish reading some random file. The jumbo packet anomaly always appears after a large amount of data (1.2 - 2.6 GiB) has been transferred successfully. I think it may be a compatibility issue involving both Bacula/Bareos and OpenVPN. For now, I think it would be a good choice to add a statement about this issue to the documentation. But I will try to find the true cause. | |
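
The note above does not say how the packet lengths were measured. As one possible way to look for unusually large records, the sketch below scans a classic pcap capture (for example one taken on the VPN interface) and lists every record whose original length exceeds a limit. The function name `find_oversized_records` and the choice of 32768 as the limit are assumptions made for illustration; the nanosecond pcap variant and pcapng are not handled.

```python
import struct

GLOBAL_HEADER_LEN = 24   # size of the classic pcap file header
RECORD_HEADER_LEN = 16   # ts_sec, ts_usec, incl_len, orig_len


def find_oversized_records(pcap_bytes, limit):
    """Return (record_index, original_length) for records larger than limit.

    Only the classic microsecond pcap format is recognised.
    """
    if len(pcap_bytes) < GLOBAL_HEADER_LEN:
        raise ValueError("truncated pcap file header")

    # The magic number reveals the byte order used by the writer.
    magic = struct.unpack("<I", pcap_bytes[:4])[0]
    if magic == 0xA1B2C3D4:
        endian = "<"
    elif magic == 0xD4C3B2A1:
        endian = ">"
    else:
        raise ValueError("not a classic pcap file")

    oversized = []
    offset = GLOBAL_HEADER_LEN
    index = 0
    while offset + RECORD_HEADER_LEN <= len(pcap_bytes):
        _sec, _usec, incl_len, orig_len = struct.unpack(
            endian + "IIII", pcap_bytes[offset:offset + RECORD_HEADER_LEN]
        )
        if orig_len > limit:
            oversized.append((index, orig_len))
        # Skip the record header and the captured bytes that follow it.
        offset += RECORD_HEADER_LEN + incl_len
        index += 1
    return oversized
```

To use it, read a capture file in binary mode and pass its contents together with a limit, e.g. `find_oversized_records(open("capture.pcap", "rb").read(), 32768)`, where `capture.pcap` is a hypothetical file name.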
Ping: did you make any progress on discovering the root cause? | |
No feedback on a newer version; if the problem is still reproducible, please open a new report. | |
Date Modified | Username | Field | Change |
---|---|---|---|
2019-07-20 14:13 | twdragon | New Issue | |
2019-07-20 14:13 | twdragon | Tag Attached: config | |
2019-07-20 14:13 | twdragon | Tag Attached: fd | |
2019-07-20 14:13 | twdragon | Tag Attached: network | |
2019-07-20 14:13 | twdragon | Tag Attached: vpn | |
2019-07-22 10:17 | arogge | Status | new => feedback |
2019-07-22 10:17 | arogge | Note Added: 0003513 | |
2019-07-22 10:53 | twdragon | Note Added: 0003514 | |
2019-07-22 10:53 | twdragon | Status | feedback => new |
2019-07-22 10:55 | twdragon | Note Edited: 0003514 | |
2019-07-22 11:00 | arogge | Note Added: 0003515 | |
2019-07-22 12:59 | arogge | Status | new => feedback |
2019-07-22 12:59 | arogge | Note Added: 0003517 | |
2019-07-26 12:21 | twdragon | Note Added: 0003523 | |
2019-07-26 12:21 | twdragon | Status | feedback => new |
2023-07-19 10:49 | bruno-at-bareos | Assigned To | => bruno-at-bareos |
2023-07-19 10:49 | bruno-at-bareos | Status | new => feedback |
2023-07-19 10:49 | bruno-at-bareos | Note Added: 0005227 | |
2023-08-31 09:45 | bruno-at-bareos | Status | feedback => closed |
2023-08-31 09:45 | bruno-at-bareos | Resolution | open => unable to reproduce |
2023-08-31 09:45 | bruno-at-bareos | Note Added: 0005355 |