View Issue Details

ID:              0001092
Project:         bareos-core
Category:        file daemon
View Status:     public
Last Update:     2023-08-23 13:55
Reporter:        Cookiefamily
Assigned To:     bruno-at-bareos
Priority:        normal
Severity:        crash
Reproducibility: always
Status:          closed
Resolution:      no change required
OS:              Ubuntu
OS Version:      18.04.2 LTS
Product Version: 18.2.5
Summary:         0001092: Out of memory when recovering with ReaR
Description: When recovering a test VM which I intentionally broke in order to recover it with Bareos and ReaR (first time testing the recovery process), the OOM killer kills the bareos-fd as well as other services, and the restore aborts.
See the attachments for a screenshot of the KVM when the OOM killer went through.
Also attached is the output of the bareos-dir, which obviously errors out because the connection dies.
What might be interesting is the line "Error: lib/bsock_tcp.cc:417 Wrote 35 bytes to File Daemon:IP:9102, but only 0 accepted." It seems like it does not even start copying files and locks up before that.

I can restore files just fine when reinstalling the machine from scratch and doing a restore that way.
Steps To Reproduce:
Client:
1. Boot from ReaR-ISO, choose "Recover test01"
2. login as root
3. execute rear recover
4. enter yes for automatic disk layout configuration

When "waiting for job to start" appears do the following on the bareos-director:
1. restore client=test01.FQDN.com-fd
2. select most recent backup
3. mark *

Verify the settings:
JobName: test01.FQDN.com-restore
Bootstrap: /var/lib/bareos/bareos-dir.restore.5.bsr
Where: /
Replace: Always
FileSet: UnixDefault
Backup Client: test01.FQDN.com-fd
Restore Client: test01.FQDN.com-fd
Format: Native
Storage: File
When: 2019-06-09 19:12:24
Catalog: MyCatalog
Priority: 1
Plugin Options: *None*

Wait 1:20 seconds and the OOM killer does its job. This works 100% of the time.
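
For reference, the director-side steps above correspond roughly to the following bconsole session (a sketch only; the interactive menu and file-selection prompts are abbreviated, and the client name is the one used in this report):

  *restore client=test01.FQDN.com-fd
  (in the selection menu, choose "Select the most recent backup for a client")
  $ mark *
  $ done
  (review the displayed run parameters and confirm with yes)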
Additional Information: The VM is a KVM guest running on a Xeon E3-1270 v6 with 1 core, 2 GB of RAM (upped to 4 GB now because I ran into the OOM; the issue persists) and a 20 GB virtual disk, managed via Proxmox.
The backup size is 2.8 GB compressed.
The ReaR team sees the error in the Bareos file daemon: https://github.com/rear/rear/issues/2158
Tags: crash, fd

Activities

Cookiefamily (reporter)
2019-06-18 21:57

bareos-dir-log.txt (3,831 bytes)
09-Jun 19:15 bareos-dir JobId 72: Start Restore Job test01.FQDN.com-restore.2019-06-09_19.15.54_51
09-Jun 19:15 bareos-dir JobId 72: Connected Storage daemon at bareos:9103, encryption: TLS_CHACHA20_POLY1305_SHA256
09-Jun 19:15 bareos-dir JobId 72: Using Device "FileStorage0" to read.
09-Jun 19:15 bareos-dir JobId 72: Connected Client: test01.FQDN.com-fd at IP:9102, encryption: TLS_CHACHA20_POLY1305_SHA256
09-Jun 19:15 bareos-dir JobId 72:  Handshake: Immediate TLS
09-Jun 19:15 hetznerstorage01-sd JobId 72: Connected File Daemon at IP:9102, encryption: TLS_CHACHA20_POLY1305_SHA256
09-Jun 19:16 hetznerstorage01-sd JobId 72: Ready to read from volume "test01.FQDN.com-hetznerstorage01-2w-20190603-1500-59" on device "FileStorage0" (/var/lib/bareos/storage).
09-Jun 19:16 hetznerstorage01-sd JobId 72: Forward spacing Volume "test01.FQDN.com-hetznerstorage01-2w-20190603-1500-59" to file:block 0:291.
09-Jun 19:16 test01-fd JobId 72: Error: File /var/lock already exists and could not be replaced. ERR=Is a directory.
09-Jun 19:16 test01-fd JobId 72: Warning: findlib/acl.cc:950 acl_from_text error on file "/var/log/journal/": ERR=Success
09-Jun 19:16 test01-fd JobId 72: Warning: findlib/acl.cc:950 acl_from_text error on file "/var/log/journal/": ERR=Success
09-Jun 19:16 test01-fd JobId 72: Error: File /var/run already exists and could not be replaced. ERR=Is a directory.
09-Jun 19:16 test01-fd JobId 72: Error: findlib/acl.cc:950 acl_from_text error on file "/var/log/journal/": ERR=Success
09-Jun 19:16 bareos-dir JobId 72: Fatal error: Socket error on Store end command: ERR=No data available
09-Jun 19:16 hetznerstorage01-sd JobId 72: Error: lib/bsock_tcp.cc:417 Wrote 35 bytes to File Daemon:IP:9102, but only 0 accepted.
09-Jun 19:16 hetznerstorage01-sd JobId 72: Fatal error: stored/read.cc:147 Error sending to File daemon. ERR=Connection reset by peer
09-Jun 19:16 hetznerstorage01-sd JobId 72: Error: lib/bsock_tcp.cc:457 Socket has errors=1 on call to File Daemon:IP:9102
09-Jun 19:16 hetznerstorage01-sd JobId 72: Releasing device "FileStorage0" (/var/lib/bareos/storage).
09-Jun 19:16 bareos-dir JobId 72: Error: Bareos bareos-dir 18.2.5 (30Jan19):
  Build OS:               Linux-4.4.92-6.18-default ubuntu Ubuntu 18.04 LTS
  JobId:                  72
  Job:                    test01.FQDN.com-restore.2019-06-09_19.15.54_51
  Restore Client:         test01.FQDN.com-fd
  Start time:             09-Jun-2019 19:15:56
  End time:               09-Jun-2019 19:16:25
  Elapsed time:           29 secs
  Files Expected:         165,117
  Files Restored:         0
  Bytes Restored:         0
  Rate:                   0.0 KB/s
  FD Errors:              1
  FD termination status:
  SD termination status:  Fatal Error
  Bareos binary info:     bareos.org build: Get official binaries and vendor support on bareos.com
  Termination:            *** Restore Error ***

09-Jun 19:16 bareos-dir JobId 72: Error: Bareos bareos-dir 18.2.5 (30Jan19):
  Build OS:               Linux-4.4.92-6.18-default ubuntu Ubuntu 18.04 LTS
  JobId:                  72
  Job:                    test01.FQDN.com-restore.2019-06-09_19.15.54_51
  Restore Client:         test01.FQDN.com-fd
  Start time:             09-Jun-2019 19:15:56
  End time:               09-Jun-2019 19:16:25
  Elapsed time:           29 secs
  Files Expected:         165,117
  Files Restored:         0
  Bytes Restored:         0
  Rate:                   0.0 KB/s
  FD Errors:              2
  FD termination status:
  SD termination status:  Fatal Error
  Bareos binary info:     bareos.org build: Get official binaries and vendor support on bareos.com
  Termination:            *** Restore Error ***

09-Jun 19:16 bareos-dir: ERROR in dird/socket_server.cc:88 Connection request from client failed.
bareos-fd-oom.png (51,669 bytes)
bruno-at-bareos (manager)
2023-08-23 13:55

~0005347

Backporting the upstream comment: the Where parameter is set to a RAM-backed mount, so the restore simply fills it up quickly.

In case someone stumbles upon this: I tried this today and got the same error. It turned out the default restore location (Where) is /tmp/bareos-restores, which on the rear recover console lives in RAM, so you hit the OOM pretty quickly. You need to specify the Where parameter and set it to /mnt/local, which is the restore destination disk. Hope it helps someone.
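
For illustration, such a restore could be started from bconsole roughly like this (a sketch, not taken from the report; the client name is the one used above and the keywords are standard bconsole restore options):

  *restore client=test01.FQDN.com-fd where=/mnt/local select current all done
  (review the run parameters and confirm with yes; alternatively answer mod at the
  confirmation prompt and change the Where value there)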

not a bug.

Issue History

Date Modified Username Field Change
2019-06-18 21:57 Cookiefamily New Issue
2019-06-18 21:57 Cookiefamily Tag Attached: crash
2019-06-18 21:57 Cookiefamily Tag Attached: fd
2019-06-18 21:57 Cookiefamily File Added: bareos-dir-log.txt
2019-06-18 21:57 Cookiefamily File Added: bareos-fd-oom.png
2023-08-23 13:55 bruno-at-bareos Assigned To => bruno-at-bareos
2023-08-23 13:55 bruno-at-bareos Status new => closed
2023-08-23 13:55 bruno-at-bareos Resolution open => no change required
2023-08-23 13:55 bruno-at-bareos Note Added: 0005347