View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update
---|---|---|---|---|---
0001092 | bareos-core | file daemon | public | 2019-06-18 21:57 | 2023-08-23 13:55

Field | Value
---|---
Reporter | Cookiefamily
Assigned To | bruno-at-bareos
Priority | normal
Severity | crash
Reproducibility | always
Status | closed
Resolution | no change required
OS | Ubuntu
OS Version | 18.04.2 LTS
Product Version | 18.2.5
Summary | 0001092: Out of memory when recovering with ReaR
Description | When recovering a test VM that I intentionally broke in order to test the recovery process with Bareos and ReaR for the first time, the OOM killer kills the bareos-fd as well as other services, and the restore aborts. See the attachments for a screenshot of the KVM console from when the OOM killer went through. Also attached is the output of the bareos-dir, which naturally reports errors because the connection dies. What might be interesting is the line "Error: lib/bsock_tcp.cc:417 Wrote 35 bytes to File Daemon:IP:9102, but only 0 accepted." It seems the file daemon does not even start copying files and locks up before that. I can restore files just fine when I reinstall the machine from scratch and do the restore that way.
Steps To Reproduce |
On the client:
1. Boot from the ReaR ISO, choose "Recover test01"
2. Log in as root
3. Execute rear recover
4. Enter "yes" for automatic disk layout configuration

When "waiting for job to start" appears, do the following on the bareos-director:
1. restore client=test01.FQDN.com-fd
2. Select the most recent backup
3. mark *

Verify the settings:
JobName: test01.FQDN.com-restore
Bootstrap: /var/lib/bareos/bareos-dir.restore.5.bsr
Where: /
Replace: Always
FileSet: UnixDefault
Backup Client: test01.FQDN.com-fd
Restore Client: test01.FQDN.com-fd
Format: Native
Storage: File
When: 2019-06-09 19:12:24
Catalog: MyCatalog
Priority: 1
Plugin Options: *None*

Wait 1-20 seconds and the OOM killer does its job. This works 100% of the time.
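For reference, the director-side sequence above can also be issued as a single bconsole command. This is a sketch, not part of the original report; it assumes the standard restore-command keywords (select, current, all, done, yes) and leaves Where at its default, which is exactly what triggers the OOM here:

    * restore client=test01.FQDN.com-fd select current all done yes

Here "current" picks the most recent backup, "all" marks every file, "done" ends file selection, and "yes" starts the job without the confirmation prompt.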
Additional Information | The VM is a KVM guest running on a Xeon E3-1270v6 with 1 core, 2 GB of RAM (raised to 4 GB after I first hit the OOM; the issue persists) and a 20 GB virtual disk, managed via Proxmox. The backup size is 2.8 GB compressed. The ReaR team sees the error on the side of the Bareos file daemon: https://github.com/rear/rear/issues/2158
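Not part of the original report, but a quick way to check this failure mode from the ReaR rescue shell: in the rescue system both / and /tmp live on a RAM-backed filesystem, so a 2.8 GB restore cannot fit into 2 GB of RAM. A diagnostic sketch using standard tools available in most rescue environments:

    # run in the ReaR rescue shell while "rear recover" waits for the job
    df -h / /tmp                    # shows which filesystem backs the restore target
    mount | grep -E 'tmpfs|ramfs'   # lists the RAM-backed mounts
    free -m                         # free memory drops as restored files land in RAM

If the restore's Where path points anywhere inside these mounts, the OOM killer is the expected outcome.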
Tags | crash, fd | ||||
bareos-dir-log.txt (3,831 bytes)
09-Jun 19:15 bareos-dir JobId 72: Start Restore Job test01.FQDN.com-restore.2019-06-09_19.15.54_51
09-Jun 19:15 bareos-dir JobId 72: Connected Storage daemon at bareos:9103, encryption: TLS_CHACHA20_POLY1305_SHA256
09-Jun 19:15 bareos-dir JobId 72: Using Device "FileStorage0" to read.
09-Jun 19:15 bareos-dir JobId 72: Connected Client: test01.FQDN.com-fd at IP:9102, encryption: TLS_CHACHA20_POLY1305_SHA256
09-Jun 19:15 bareos-dir JobId 72: Handshake: Immediate TLS
09-Jun 19:15 hetznerstorage01-sd JobId 72: Connected File Daemon at IP:9102, encryption: TLS_CHACHA20_POLY1305_SHA256
09-Jun 19:16 hetznerstorage01-sd JobId 72: Ready to read from volume "test01.FQDN.com-hetznerstorage01-2w-20190603-1500-59" on device "FileStorage0" (/var/lib/bareos/storage).
09-Jun 19:16 hetznerstorage01-sd JobId 72: Forward spacing Volume "test01.FQDN.com-hetznerstorage01-2w-20190603-1500-59" to file:block 0:291.
09-Jun 19:16 test01-fd JobId 72: Error: File /var/lock already exists and could not be replaced. ERR=Is a directory.
09-Jun 19:16 test01-fd JobId 72: Warning: findlib/acl.cc:950 acl_from_text error on file "/var/log/journal/": ERR=Success
09-Jun 19:16 test01-fd JobId 72: Warning: findlib/acl.cc:950 acl_from_text error on file "/var/log/journal/": ERR=Success
09-Jun 19:16 test01-fd JobId 72: Error: File /var/run already exists and could not be replaced. ERR=Is a directory.
09-Jun 19:16 test01-fd JobId 72: Error: findlib/acl.cc:950 acl_from_text error on file "/var/log/journal/": ERR=Success
09-Jun 19:16 bareos-dir JobId 72: Fatal error: Socket error on Store end command: ERR=No data available
09-Jun 19:16 hetznerstorage01-sd JobId 72: Error: lib/bsock_tcp.cc:417 Wrote 35 bytes to File Daemon:IP:9102, but only 0 accepted.
09-Jun 19:16 hetznerstorage01-sd JobId 72: Fatal error: stored/read.cc:147 Error sending to File daemon. ERR=Connection reset by peer
09-Jun 19:16 hetznerstorage01-sd JobId 72: Error: lib/bsock_tcp.cc:457 Socket has errors=1 on call to File Daemon:IP:9102
09-Jun 19:16 hetznerstorage01-sd JobId 72: Releasing device "FileStorage0" (/var/lib/bareos/storage).
09-Jun 19:16 bareos-dir JobId 72: Error: Bareos bareos-dir 18.2.5 (30Jan19):
  Build OS: Linux-4.4.92-6.18-default ubuntu Ubuntu 18.04 LTS
  JobId: 72
  Job: test01.FQDN.com-restore.2019-06-09_19.15.54_51
  Restore Client: test01.FQDN.com-fd
  Start time: 09-Jun-2019 19:15:56
  End time: 09-Jun-2019 19:16:25
  Elapsed time: 29 secs
  Files Expected: 165,117
  Files Restored: 0
  Bytes Restored: 0
  Rate: 0.0 KB/s
  FD Errors: 1
  FD termination status:
  SD termination status: Fatal Error
  Bareos binary info: bareos.org build: Get official binaries and vendor support on bareos.com
  Termination: *** Restore Error ***
09-Jun 19:16 bareos-dir JobId 72: Error: Bareos bareos-dir 18.2.5 (30Jan19):
  Build OS: Linux-4.4.92-6.18-default ubuntu Ubuntu 18.04 LTS
  JobId: 72
  Job: test01.FQDN.com-restore.2019-06-09_19.15.54_51
  Restore Client: test01.FQDN.com-fd
  Start time: 09-Jun-2019 19:15:56
  End time: 09-Jun-2019 19:16:25
  Elapsed time: 29 secs
  Files Expected: 165,117
  Files Restored: 0
  Bytes Restored: 0
  Rate: 0.0 KB/s
  FD Errors: 2
  FD termination status:
  SD termination status: Fatal Error
  Bareos binary info: bareos.org build: Get official binaries and vendor support on bareos.com
  Termination: *** Restore Error ***
09-Jun 19:16 bareos-dir: ERROR in dird/socket_server.cc:88 Connection request from client failed.
Backporting the upstream comment: the Where parameter points to a RAM-backed mount, so the restore simply fills it up quickly. In case someone stumbles upon this: I tried this today and got the same error. It turned out the default restore location (Where) is /tmp/bareos-restores, which on the rear recover console lives in RAM, so you hit the OOM killer very quickly. You need to specify the Where parameter and set it to /mnt/local, which is the restore destination disk. Hope it helps someone. Not a bug.
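A minimal sketch of that workaround on the director side, using the client name from this report (where= is a standard bconsole restore keyword):

    * restore client=test01.FQDN.com-fd where=/mnt/local
    (then select the most recent backup, mark *, done, and confirm the run)

Alternatively, answer mod at the run confirmation prompt and change the Where entry to /mnt/local before starting the job; /mnt/local is where rear recover mounts the recreated destination disks.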
Date Modified | Username | Field | Change
---|---|---|---
2019-06-18 21:57 | Cookiefamily | New Issue |
2019-06-18 21:57 | Cookiefamily | Tag Attached: crash |
2019-06-18 21:57 | Cookiefamily | Tag Attached: fd |
2019-06-18 21:57 | Cookiefamily | File Added: bareos-dir-log.txt |
2019-06-18 21:57 | Cookiefamily | File Added: bareos-fd-oom.png |
2023-08-23 13:55 | bruno-at-bareos | Assigned To | => bruno-at-bareos
2023-08-23 13:55 | bruno-at-bareos | Status | new => closed
2023-08-23 13:55 | bruno-at-bareos | Resolution | open => no change required
2023-08-23 13:55 | bruno-at-bareos | Note Added: 0005347 |