View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0000695 | bareos-core | director | public | 2016-09-12 10:17 | 2019-01-31 10:14 |
Reporter | pascal | Assigned To | |||
Priority | normal | Severity | major | Reproducibility | sometimes |
Status | closed | Resolution | fixed | ||
Platform | Linux | OS | Debian | OS Version | 8 |
Product Version | 15.2.2 | ||||
Fixed in Version | 17.2.6 | ||||
Summary | 0000695: Crash in director statistics thread | ||||
Description | My Bareos director keeps crashing about 3 times a week. To me it looks like the process gets killed with signal 7 (SIGBUS) while doing some statistics work. | ||||
Steps To Reproduce | I can't force the issue, but it keeps crashing pretty regularly, every 2 days. | ||||
Additional Information | I've got the debug symbols installed and have a lot of tracebacks / backtraces / core dumps available if you need them. I will attach one traceback. | ||||
Tags | No tags attached. | ||||
Hi Pascal, to have a chance to reproduce it, or at least get as close as possible to your environment, could you describe it a bit more:
- Source of the packages (where did you install them from)
- Database used
- Filesystem used
- System messages around the crash time
- Check whether any logrotate or cron job runs during the crash
- Normal status of the daemons ( systemctl status bareos-fd bareos-sd bareos-dir )
You can perhaps also check whether your assumption of "doing stats" is related, by running in bconsole:
update stats days=3
prune stats yes |
|
I will do my best. The Debian packages come from http://download.bareos.org/bareos/release/15.2/Debian_8.0. For the database we are using PostgreSQL 9.4 on a different server. The filesystem on the director is ext4.
As the director keeps crashing at different times, I can't find any relation to cron jobs or log rotation. In syslog I can see that the process gets interrupted by signal 7. Otherwise there are no log entries concerning the bareos-director. Example:
Sep 11 13:01:15 mgmt01 bareos-dir: BAREOS interrupted by signal 7: BUS error
root@mgmt01:~# systemctl status bareos-dir
● bareos-director.service - Bareos Director Daemon service
   Loaded: loaded (/lib/systemd/system/bareos-director.service; enabled)
   Active: active (running) since Mon 2016-09-12 09:56:54 CEST; 23h ago
     Docs: man:bareos-dir(8)
  Process: 10225 ExecStart=/usr/sbin/bareos-dir -c /etc/bareos/bareos-dir.conf (code=exited, status=0/SUCCESS)
  Process: 10218 ExecStartPre=/usr/sbin/bareos-dir -c /etc/bareos/bareos-dir.conf -t -f (code=exited, status=0/SUCCESS)
 Main PID: 10231 (bareos-dir)
   CGroup: /system.slice/bareos-director.service
           └─10231 /usr/sbin/bareos-dir -c /etc/bareos/bareos-dir.conf
"update stats days=3" and "prune stats yes" are working. But I've since changed the configuration of the director and set "Collect Statistics = no" in the Storage section of the director, in the hope that the system runs more stably now. So far the director hasn't crashed, but it's been only 1 day.
regards, Pascal |
|
Ping Pascal | |
Sorry for not getting back to you. I've set 'Collect Statistics = no' on all the Storage resources in the Director and restarted all my Storage Daemons. This was about 4 weeks ago. I haven't had a single crash since. regards, Pascal |
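For readers looking for the workaround described in this note, a director Storage resource with statistics collection switched off might look roughly like the fragment below. This is only a sketch: the resource name is a placeholder, and the remaining directives are whatever the existing resource already contains.

```
Storage {
  Name = File                  # placeholder name, not taken from this report
  # ... keep the existing Address, Password, Device and Media Type lines ...
  Collect Statistics = no      # the setting the reporter changed
}
```

The reporter applied this to every Storage resource and then restarted the daemons.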
|
Fix committed to bareos dev branch with changesetid 8009. | |
Fix committed to bareos bareos-17.2 branch with changesetid 8592. | |
bareos: dev 8f42a3a6 2018-05-07 19:26 Ported: N/A |
dird: statistic thread crash fixed
- stop statistics thread before reload config and restart afterwards
- added debug message when old resources table is destroyed within callback
- cleanup variable names and removed obvious comments
Fixes 0000695: director crashes some time after a reload if Collect Statistic is enabled |
Affected Issues 0000695 |
|
mod - core/src/dird/dird.cc |
mod - core/src/dird/stats.cc |
bareos: bareos-17.2 b97f07a5 2018-05-09 12:40 Committer: pstorz Ported: N/A |
backport: dird: statistic thread crash fixed
- stop statistics thread before reload config and restart afterwards
- added debug message when old resources table is destroyed within callback
- cleanup variable names and removed obvious comments
Fixes 0000695: director crashes some time after a reload if Collect Statistic is enabled |
Affected Issues 0000695 |
|
mod - src/dird/dird.c |
mod - src/dird/stats.c |
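
The changesets above describe the fix as stopping the statistics thread before a configuration reload and restarting it afterwards. The following is a minimal, self-contained sketch of that ordering, not the actual Bareos code: `Resources`, `Director`, `StartStatistics`, `StopStatistics` and `Reload` are hypothetical names chosen for illustration, and the "statistics" are reduced to a counter.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the director's parsed resource table.
struct Resources {
  std::vector<std::string> storages;
};

// Hypothetical director that owns the resources and a statistics worker.
class Director {
 public:
  explicit Director(Resources initial)
      : resources_(std::make_unique<Resources>(std::move(initial))) {}
  ~Director() { StopStatistics(); }

  void StartStatistics() {
    std::lock_guard<std::mutex> guard(mutex_);
    if (worker_.joinable()) return;  // already running
    quit_ = false;
    worker_ = std::thread([this] { Run(); });
  }

  void StopStatistics() {
    {
      std::lock_guard<std::mutex> guard(mutex_);
      quit_ = true;
    }
    wakeup_.notify_all();
    if (worker_.joinable()) worker_.join();  // worker is gone after this
  }

  // Reload in the order the commit message describes: stop the worker,
  // replace the resource table, then start the worker again. The old
  // table is destroyed only while no statistics thread can touch it.
  void Reload(Resources fresh) {
    StopStatistics();
    resources_ = std::make_unique<Resources>(std::move(fresh));
    StartStatistics();
  }

  std::size_t StorageCount() const { return resources_->storages.size(); }
  std::size_t Samples() const { return samples_.load(); }
  bool StatisticsRunning() const { return worker_.joinable(); }

 private:
  void Run() {
    std::unique_lock<std::mutex> lock(mutex_);
    while (!quit_) {
      // Stand-in for collecting statistics from every storage resource.
      samples_ += resources_->storages.size();
      wakeup_.wait_for(lock, std::chrono::milliseconds(5),
                       [this] { return quit_; });
    }
  }

  std::unique_ptr<Resources> resources_;
  std::mutex mutex_;
  std::condition_variable wakeup_;
  bool quit_ = false;  // guarded by mutex_
  std::atomic<std::size_t> samples_{0};
  std::thread worker_;
};
```

The point of the design is that `Reload` joins the worker before the old table is released, so the worker can never dereference a freed resource. Without that ordering, a reload could free memory the statistics thread is still reading, which would be consistent with the crash after reload described in the commit message.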
Date Modified | Username | Field | Change |
---|---|---|---|
2016-09-12 10:17 | pascal | New Issue | |
2016-09-12 10:17 | pascal | File Added: bareos.28190.traceback | |
2016-09-13 08:55 | tigerfoot | Note Added: 0002352 | |
2016-09-13 09:45 | pascal | Note Added: 0002353 | |
2016-10-11 21:27 | tigerfoot | Note Added: 0002380 | |
2016-10-12 14:55 | pascal | Note Added: 0002382 | |
2018-02-02 14:42 | joergs | Relationship added | child of 0000903 |
2018-05-07 17:58 | franku | Changeset attached | => bareos dev 8f42a3a6 |
2018-05-07 17:58 | franku | Note Added: 0003000 | |
2018-05-07 17:58 | franku | Status | new => resolved |
2018-05-07 17:58 | franku | Resolution | open => fixed |
2018-05-21 10:06 | pstorz | Changeset attached | => bareos bareos-17.2 b97f07a5 |
2018-05-21 10:06 | pstorz | Note Added: 0003007 | |
2018-06-08 13:48 | joergs | Relationship added | child of 0000916 |
2018-06-08 13:49 | joergs | Fixed in Version | => 17.2.6 |
2018-06-22 17:14 | joergs | Status | resolved => closed |
2019-01-31 10:13 | arogge_adm | Relationship added | child of 0001040 |
2019-01-31 10:14 | arogge_adm | Relationship deleted | child of 0001040 |