View Issue Details

ID: 0000527
Project: bareos-core
Category: [All Projects] director
View Status: public
Last Update: 2019-01-16 11:36
Reporter: iloving
Assigned To: arogge
Priority: normal
Severity: major
Reproducibility: always
Status: resolved
Resolution: won't fix
Platform: Linux
OS: CentOS
OS Version: 6
Product Version: 15.2.2
Target Version:
Fixed in Version:
Summary: 0000527: Director crashes (not reproducible), the first time you go to Jobs section in bat
Description: Going to jobs section in bat causes bareos-dir to crash.
Steps To Reproduce:
1. Start bareos-dir if it's not running
2. Launch bat
3. Go to Jobs tab

Bat will appear to hang while populating the jobs list window, but what's actually happened is that bareos-dir has crashed.

If you restart bareos-dir again, without closing bat, bat will reconnect and everything continues as normal. If you quit bat and re-run it, then the problem happens again.
Additional Information: The following is mailed to the root user:
From root@localhost.jonahgroup.com Wed Sep 30 13:49:23 2015
Return-Path: <root@localhost.jonahgroup.com>
X-Original-To: root@localhost
Delivered-To: root@localhost.jonahgroup.com
From: root@localhost.jonahgroup.com
Subject: Bareos GDB traceback of bareos-dir on archive01.jonahgroup.com
Sender: bareos@archive01.jonahgroup.com
To: root@localhost.jonahgroup.com
Date: Wed, 30 Sep 2015 13:49:23 -0400 (EDT)
Status: RO

Created /archivepool/bareos/bareos-dir.core.12195 for doing postmortem debugging
[New Thread 12197]
[New Thread 12200]
[New Thread 12201]
[New Thread 12202]
[New Thread 12234]
[New Thread 12195]
[Thread debugging using libthread_db enabled]
Core was generated by `/usr/sbin/bareos-dir -g bareos -c /etc/bareos/bareos-dir.conf'.
#0 0x00007f9f3a1bdfbd in nanosleep () from /lib64/libpthread.so.0
$1 = 1751347809
$2 = 10276984
$3 = 10277048
/usr/lib/bareos/scripts/btraceback.gdb:4: Error in sourced command file:
No symbol table is loaded. Use the "file" command.
Tags: No tags attached.
bareos-master: impact: yes
bareos-master: action
bareos-18.2: impact
bareos-18.2: action
bareos-17.2: impact
bareos-17.2: action
bareos-16.2: impact
bareos-16.2: action
bareos-15.2: impact: yes
bareos-15.2: action
bareos-14.2: impact: yes
bareos-14.2: action
bareos-13.2: impact
bareos-13.2: action
bareos-12.4: impact
bareos-12.4: action

Activities

pstorz

2015-10-01 10:13

administrator   ~0001855

Hello,

can you please provide the exact version of your OS?

Also, please install the debuginfo package and send the files that are then generated.

It would also be very interesting to know whether you can reproduce the problem on version 15.2.

You can find the repositories here: http://download.bareos.org/bareos/release/15.2/
iloving

2015-10-01 17:20

reporter   ~0001859

I've installed bareos-contrib-debuginfo and made it crash again, but I think something might be missing since the traceback file says 'no symbol table loaded'

Created /archivepool/bareos/bareos-dir.core.28293 for doing postmortem debugging
[New Thread 28294]
[New Thread 28297]
[New Thread 28298]
[New Thread 28299]
[New Thread 28328]
[New Thread 28293]
[Thread debugging using libthread_db enabled]
Core was generated by `/usr/sbin/bareos-dir -g bareos -c /etc/bareos/bareos-dir.conf'.
#0 0x0000003bb720efbd in nanosleep () from /lib64/libpthread.so.0
$1 = 1751347809
$2 = 10899576
$3 = 10899640
/usr/lib/bareos/scripts/btraceback.gdb:4: Error in sourced command file:
No symbol table is loaded. Use the "file" command.
iloving

2015-10-01 17:23

reporter   ~0001860

OK, this is weird... I left bat open while I was collecting the info you wanted, and even though bareos-dir crashed, the jobs panel (which I assume timed out waiting for the director) displayed a partial data set, stopping at one specific job that happens to be a Copy job.

Dunno if this helps at all, but the job it seems to have died on contains the following:
# Fake fileset for copy jobs
Fileset {
  Name = None_Full
  Include {
    Options {
      signature = MD5
    }
  }
}


# Fake client for copy jobs
Client {
  Name = None_Full
  Address = localhost
  Password = "NoNe"
  Catalog = MyCatalog
}

Job {
  Name = "Copy-Full"
  Type = Copy
  Client = None_Full
  Fileset = None_Full
  Level = Full
  Messages = Standard
  Pool = Full
  Selection Type = PoolUncopiedJobs
  Priority = 100
}
pstorz

2015-10-02 10:11

administrator   ~0001861

Hello,

after this commit:
https://github.com/bareos/bareos/commit/858a8a642b3d9e78ce7431be5f59e36498d057af

you should not be obligated to define a fileset for a copy job anymore.

Please check if the problem persists in 15.2.

Thanks for your time and help

Philipp
iloving

2015-10-21 19:32

reporter   ~0001885

I have verified that the problem continues in 15.2 when using the same config files.

I've tried removing the Client and Fileset lines from the jobs in question, and the issue persists.

Here are the last lines generated by bat before everything dies:

bat: console/console.cpp:438-0 job_defaults: key=job, value=Copy-Full
bat: console/console.cpp:438-0 job_defaults: key=pool, value=Full
bat: console/console.cpp:438-0 job_defaults: key=messages, value=Standard
bat: console/console.cpp:438-0 job_defaults: key=client, value=*None*
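To see at a glance which defaults bat had received before the director went away, the debug lines above can be collected into a dictionary. This is only a minimal sketch: it assumes the `job_defaults: key=..., value=...` layout shown in this note, and the function name `collect_job_defaults` is made up for illustration.

```python
import re
from typing import Dict, Iterable

# Matches bat debug lines such as:
#   bat: console/console.cpp:438-0 job_defaults: key=pool, value=Full
JOB_DEFAULT = re.compile(r"job_defaults: key=(?P<key>[^,]+), value=(?P<value>.*)$")


def collect_job_defaults(lines: Iterable[str]) -> Dict[str, str]:
    """Collect key/value pairs from job_defaults lines, in the order seen."""
    defaults: Dict[str, str] = {}
    for line in lines:
        match = JOB_DEFAULT.search(line.rstrip("\n"))
        if match:
            defaults[match.group("key")] = match.group("value")
    return defaults
```

The last key in the returned dictionary is the last default bat logged, which gives a rough idea of how far the `.defaults` reply got before the connection died.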

The Job definition now looks like this:
Job {
  Name = "Copy-Full"
  Type = Copy
  Level = Full
  Messages = Standard
  Pool = Full
  Selection Type = PoolUncopiedJobs
  Priority = 100
}
iloving

2015-10-21 19:41

reporter  

bareos.12556.traceback (5,807 bytes)
iloving

2015-10-21 19:42

reporter   ~0001886

I've uploaded the backtrace. The core file is 27 MB when gzipped, so I can't upload it.
joergs

2015-11-16 19:03

administrator   ~0001968

I installed a CentOS 6 test system with Bareos 15.2.1-rc2 and ran bat. However, I was not able to reproduce this problem. Of course, I only have a small test setup here.

Please enable "Debug Comm" in bat (Settings -> Preferences -> Debug -> Debug Comm) and run it again.
When the director crashes again, please check the last "send" line from the stdout output of bat.
It should look similar to:

bat: bcomm/dircomm.cpp:276-0 conn 0 send: .defaults job="RestoreFiles"

The text after "send:" is the last command sent to the director.

Restart the director and run this command in bconsole, to see if the director also crashes without using bat.
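The procedure above could be scripted roughly as follows. This is a sketch, not a tested tool: it assumes bat's stdout was saved to a file, that the "send" lines have the layout shown in the example above, and that bconsole is on the PATH and accepts commands on standard input. The function names are hypothetical.

```python
import re
import subprocess
from typing import Iterable, Optional

# Matches bat "Debug Comm" lines such as:
#   bat: bcomm/dircomm.cpp:276-0 conn 0 send: .defaults job="RestoreFiles"
SEND_LINE = re.compile(r"^bat: \S+ conn \d+ send: (?P<command>.*)$")


def last_sent_command(lines: Iterable[str]) -> Optional[str]:
    """Return the text after "send:" on the last matching line, or None."""
    command = None
    for line in lines:
        match = SEND_LINE.match(line.rstrip("\n"))
        if match:
            command = match.group("command")
    return command


def replay_in_bconsole(command: str) -> subprocess.CompletedProcess:
    """Feed one command to bconsole on stdin.

    Assumption: bconsole is installed, on PATH, and configured to reach
    the (restarted) director.
    """
    return subprocess.run(
        ["bconsole"], input=command + "\n", text=True, capture_output=True
    )
```

For example, `last_sent_command(open("bat-stdout.log"))` (the file name is an assumption) would return the command to try, and passing that string to `replay_in_bconsole` shows whether the director also crashes without bat involved.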
arogge

2019-01-16 11:36

developer   ~0003185

bat was replaced by bareos-webui, so problems with bat won't be handled anymore.

Issue History

Date Modified Username Field Change
2015-09-30 19:52 iloving New Issue
2015-10-01 10:13 pstorz Note Added: 0001855
2015-10-01 10:13 pstorz Assigned To => pstorz
2015-10-01 10:13 pstorz Status new => feedback
2015-10-01 17:20 iloving Note Added: 0001859
2015-10-01 17:20 iloving Status feedback => assigned
2015-10-01 17:23 iloving Note Added: 0001860
2015-10-02 10:11 pstorz Note Added: 0001861
2015-10-21 19:32 iloving Note Added: 0001885
2015-10-21 19:41 iloving File Added: bareos.12556.traceback
2015-10-21 19:42 iloving Note Added: 0001886
2015-11-06 18:30 maik Relationship added child of 0000554
2015-11-16 18:21 joergs Assigned To pstorz => joergs
2015-11-16 19:03 joergs Note Added: 0001968
2015-11-16 19:04 joergs bareos-master: impact => yes
2015-11-16 19:04 joergs bareos-15.2: impact => yes
2015-11-16 19:04 joergs bareos-14.2: impact => yes
2015-11-16 19:04 joergs Status assigned => feedback
2015-11-20 13:30 maik Priority high => normal
2015-11-20 13:30 maik Severity block => major
2015-11-20 13:30 maik Summary Director crashes, the first time you go to Jobs section in bat => Director crashes (not reproducible), the first time you go to Jobs section in bat
2015-12-11 09:48 joergs Assigned To joergs =>
2015-12-11 09:48 joergs Relationship deleted child of 0000554
2015-12-11 09:49 joergs Product Version 14.2.2 => 15.2.2
2019-01-16 11:36 arogge Note Added: 0003185
2019-01-16 11:36 arogge Status feedback => resolved
2019-01-16 11:36 arogge Resolution open => won't fix
2019-01-16 11:36 arogge Assigned To => arogge