View Issue Details

ID: 0000695
Project: bareos-core
Category: director
View Status: public
Last Update: 2019-01-31 10:14
Reporter: pascal
Assigned To:
Priority: normal
Severity: major
Reproducibility: sometimes
Status: closed
Resolution: fixed
Platform: Linux
OS: Debian
OS Version: 8
Product Version: 15.2.2
Fixed in Version: 17.2.6
Summary: 0000695: Crash in director statistics thread
Description: My Bareos director keeps crashing about three times a week. It looks to me like the process is killed with signal 7 (SIGBUS) while doing some statistics work.
Steps To Reproduce: I can't trigger the issue on demand, but it keeps crashing fairly regularly, roughly every 2 days.
Additional Information: I have the debug symbols installed and have many tracebacks, backtraces, and core dumps available if you need them. I will attach one traceback.
Tags: No tags attached.

Relationships

child of 0000903 (closed): director crashes some time after a reload if Collect Statistic is enabled
child of 0000916 (closed, joergs): Release bareos-17.2.6

Activities

pascal

2016-09-12 10:17

reporter  

bareos.28190.traceback (8,287 bytes)
tigerfoot

2016-09-13 08:55

developer   ~0002352

Hi Pascal, to have a chance of reproducing it, or at least of getting as close as possible to your environment, could you describe your setup a bit more:

- Source of the packages (where did you install them from?)
- Database used
- Filesystem used
- System messages around the time of the crash
- Whether any logrotate or cron job runs at the time of the crash
- Normal status of the daemons (systemctl status bareos-fd bareos-sd bareos-dir)

You could also check whether your assumption that the crash is related to "doing stats" holds by running the following in bconsole:

update stats days=3
prune stats yes

pascal

2016-09-13 09:45

reporter   ~0002353

I will do my best.

The Debian packages come from http://download.bareos.org/bareos/release/15.2/Debian_8.0. The database is PostgreSQL 9.4 running on a separate server. The filesystem on the director is ext4.

Because the director crashes at different times, I can't find any correlation with cron jobs or log rotation.

In syslog I can see that the process is interrupted by signal 7. Apart from that, there are no log entries concerning the Bareos director.

Example:
Sep 11 13:01:15 mgmt01 bareos-dir: BAREOS interrupted by signal 7: BUS error

root@mgmt01:~# systemctl status bareos-dir
● bareos-director.service - Bareos Director Daemon service
   Loaded: loaded (/lib/systemd/system/bareos-director.service; enabled)
   Active: active (running) since Mon 2016-09-12 09:56:54 CEST; 23h ago
     Docs: man:bareos-dir(8)
  Process: 10225 ExecStart=/usr/sbin/bareos-dir -c /etc/bareos/bareos-dir.conf (code=exited, status=0/SUCCESS)
  Process: 10218 ExecStartPre=/usr/sbin/bareos-dir -c /etc/bareos/bareos-dir.conf -t -f (code=exited, status=0/SUCCESS)
 Main PID: 10231 (bareos-dir)
   CGroup: /system.slice/bareos-director.service
           └─10231 /usr/sbin/bareos-dir -c /etc/bareos/bareos-dir.conf

"update stats days=3" and "prune stats yes" both work. However, I have since changed the director configuration and set "Collect Statistics = no" in the Storage section, in the hope that the system runs more stably. So far the director hasn't crashed, but it has only been one day.

regards,
Pascal
tigerfoot

2016-10-11 21:27

developer   ~0002380

Ping Pascal
pascal

2016-10-12 14:55

reporter   ~0002382

Sorry for not getting back to you.

I set 'Collect Statistics = no' on all Storage resources in the director configuration and restarted all my Storage Daemons. That was about 4 weeks ago, and I haven't had a single crash since.

regards,
Pascal
franku

2018-05-07 17:58

administrator   ~0003000

Fix committed to bareos dev branch with changesetid 8009.
pstorz

2018-05-21 10:06

administrator   ~0003007

Fix committed to bareos bareos-17.2 branch with changesetid 8592.

Related Changesets

bareos: dev 8f42a3a6

2018-05-07 19:26

franku

Ported: N/A

dird: statistic thread crash fixed

- stop statistics thread before reload config and restart afterwards
- added debug message when old resources table is destroyed within callback
- cleanup variable names and removed obvious comments

Fixes 0000695: director crashes some time after a reload if Collect Statistic is enabled
Affected Issues
0000695
mod - core/src/dird/dird.cc
mod - core/src/dird/stats.cc
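The changeset description ("stop statistics thread before reload config and restart afterwards") suggests the statistics thread could still be reading director resources while a reload replaced them. The C++ sketch below illustrates that ordering under stated assumptions: `ResourceTable`, `StatisticsThread`, `ReloadConfig`, and the storage names are hypothetical stand-ins, not the actual code in dird.cc or stats.cc. The point is that the worker is joined before the old table is freed, so no thread can touch freed memory.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the director's resource table (not Bareos code).
struct ResourceTable {
  std::vector<std::string> storages;
};

// Hypothetical statistics worker; class and method names are illustrative.
class StatisticsThread {
 public:
  ~StatisticsThread() { Stop(); }

  // Start a worker that periodically reads `table`. Must not be called
  // while a previous worker is still running.
  void Start(const ResourceTable* table) {
    table_ = table;
    {
      std::lock_guard<std::mutex> lock(mu_);
      quit_ = false;
    }
    worker_ = std::thread([this] { Run(); });
  }

  // Ask the worker to quit and wait until it can no longer touch table_.
  void Stop() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      quit_ = true;
    }
    cv_.notify_all();
    if (worker_.joinable()) worker_.join();
    table_ = nullptr;
  }

  int passes() const { return passes_.load(); }

 private:
  void Run() {
    std::unique_lock<std::mutex> lock(mu_);
    do {
      // Stand-in for collecting statistics from each configured storage.
      for (const std::string& name : table_->storages) (void)name;
      ++passes_;
      // Sleep up to 10 ms; wait_for returns true as soon as quit_ is set.
    } while (!cv_.wait_for(lock, std::chrono::milliseconds(10),
                           [this] { return quit_; }));
  }

  const ResourceTable* table_ = nullptr;
  std::thread worker_;
  std::mutex mu_;
  std::condition_variable cv_;
  bool quit_ = false;
  std::atomic<int> passes_{0};
};

// Reload ordering: stop the reader, free the old table, restart on the new one.
std::unique_ptr<ResourceTable> ReloadConfig(
    StatisticsThread& stats, std::unique_ptr<ResourceTable> old_table,
    std::unique_ptr<ResourceTable> new_table) {
  stats.Stop();       // after join(), no worker can still read old_table
  old_table.reset();  // freeing it now cannot race with the worker
  stats.Start(new_table.get());
  return new_table;
}
```

Reversing the first two statements of `ReloadConfig` (freeing `old_table` before `Stop()`) would let the worker dereference a deleted table, which is the kind of use-after-free the changeset appears to guard against.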

bareos: bareos-17.2 b97f07a5

2018-05-09 12:40

franku


Committer: pstorz

Ported: N/A

backport: dird: statistic thread crash fixed

- stop statistics thread before reload config and restart afterwards
- added debug message when old resources table is destroyed within callback
- cleanup variable names and removed obvious comments

Fixes 0000695: director crashes some time after a reload if Collect Statistic is enabled
Affected Issues
0000695
mod - src/dird/dird.c
mod - src/dird/stats.c

Issue History

Date Modified | Username | Field | Change
2016-09-12 10:17 | pascal | New Issue |
2016-09-12 10:17 | pascal | File Added: bareos.28190.traceback |
2016-09-13 08:55 | tigerfoot | Note Added: 0002352 |
2016-09-13 09:45 | pascal | Note Added: 0002353 |
2016-10-11 21:27 | tigerfoot | Note Added: 0002380 |
2016-10-12 14:55 | pascal | Note Added: 0002382 |
2018-02-02 14:42 | joergs | Relationship added | child of 0000903
2018-05-07 17:58 | franku | Changeset attached | => bareos dev 8f42a3a6
2018-05-07 17:58 | franku | Note Added: 0003000 |
2018-05-07 17:58 | franku | Status | new => resolved
2018-05-07 17:58 | franku | Resolution | open => fixed
2018-05-21 10:06 | pstorz | Changeset attached | => bareos bareos-17.2 b97f07a5
2018-05-21 10:06 | pstorz | Note Added: 0003007 |
2018-06-08 13:48 | joergs | Relationship added | child of 0000916
2018-06-08 13:49 | joergs | Fixed in Version | => 17.2.6
2018-06-22 17:14 | joergs | Status | resolved => closed
2019-01-31 10:13 | arogge_adm | Relationship added | child of 0001040
2019-01-31 10:14 | arogge_adm | Relationship deleted | child of 0001040