View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0000578 | bareos-core | director | public | 2015-12-04 10:10 | 2019-07-04 15:54 |
Reporter | s2Xk4G | Assigned To | arogge | ||
Priority | high | Severity | crash | Reproducibility | always |
Status | closed | Resolution | suspended | ||
Platform | Linux | OS | Debian | OS Version | 8 |
Product Version | 15.2.2 | ||||
Summary | 0000578: Director becomes unresponsible when doing big backups | ||||
Description | Hello, the situation is as follows: bareos-director/sd 15.2.2-35.1 (Debian jessie) are installed on the backup server. PostgreSQL is the director's backend DB. About 110 jobs per day (mixed Full / Incremental) run fine. The clients have mixed filedaemon versions: 15.2.2-XX (Debian jessie) and 14.X (Debian wheezy). Absolutely no issues. Job sizes are up to 800 GB and run fine. Now I've added a job with 0000003:0000006.5 TB. The client's filedaemon is also 15.2.2-35.1, identical to the director and storage daemon. What happens now: the job is started, runs well for some hours (5 to 20; the 20 hours are from the latest attempt, it had never run that long before), and then, at some point in time X, the director becomes unresponsive. Unresponsive in a weird way: I cannot connect using bconsole anymore, but the log keeps being written. All jobs that start AFTER time X fail (does the scheduler fail?). Long-running jobs that were started BEFORE time X run properly (job 2997 in my "jobs.list" listing). According to the director logs, the big job is still running. I also see that the filedaemon on the client is consuming CPU. Here is an example from the logs that I'm attaching: job 2949 was started at 11:07 am. It should back up these 6.5 TB. It ran until the next morning, when I killed the director at about 9:10. Another job, job 2997, which was started at 22:43 and ran until 8:31 the next morning, completed successfully (see log). | ||||
Steps To Reproduce | launch this job and wait some hours. | ||||
Tags | No tags attached. | ||||
Job size is "6.5TB". Mantis autocorrection has done something weird with my text. | |
And, of course, the Subject should be "Director becomes unresponsive when doing big backups" and not "Director becomes unresponsible when doing big backups" |
|
The logs don't show anything special. You probably want to create a debug log by running bareos-dir -f -d 100 (or -d 200), but for this particular problem the log is going to be so big that we cannot really analyze it for a simple unsupported environment. So either make your job smaller, e.g. by splitting it, or analyze the debug log yourself and narrow it down to a more concrete problem that might be solvable. |
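For anyone analyzing such a large debug log themselves, one way to narrow things down is to look for the longest silent periods in the log, since a stall in the director may show up as a gap between consecutive entries. The following is only a minimal sketch, not part of Bareos: it assumes that each relevant line starts with a timestamp such as `04-Dec-2015 11:07:00` (the `fmt` default is an assumption, adjust it to whatever your log actually contains), and the function name `largest_gaps` is hypothetical. It streams the input line by line, so the whole log never has to fit in memory.

```python
import heapq
from datetime import datetime


def largest_gaps(lines, fmt="%d-%b-%Y %H:%M:%S", top=3):
    """Return the `top` largest gaps as (seconds, line_before, line_after).

    `fmt` is an ASSUMED timestamp layout at the start of each line;
    lines that do not start with such a timestamp are skipped.
    """
    # Width of a formatted timestamp (only valid for fixed-width formats).
    width = len(datetime(2000, 1, 1).strftime(fmt))
    heap = []  # min-heap holding at most `top` entries
    prev_ts = prev_line = None
    for idx, raw in enumerate(lines):
        line = raw.rstrip("\n")
        try:
            ts = datetime.strptime(line[:width], fmt)
        except ValueError:
            continue  # continuation line or unexpected layout
        if prev_ts is not None:
            # idx is unique, so tuple comparison never reaches the strings.
            item = ((ts - prev_ts).total_seconds(), idx, prev_line, line)
            if len(heap) < top:
                heapq.heappush(heap, item)
            else:
                heapq.heappushpop(heap, item)
        prev_ts, prev_line = ts, line
    ordered = sorted(heap, reverse=True)
    return [(gap, before, after) for gap, _, before, after in ordered]


# Usage sketch (the file name is hypothetical):
# with open("bareos-dir.debug", errors="replace") as fh:
#     for gap, before, after in largest_gaps(fh):
#         print(f"{gap:.0f}s of silence after: {before}")
```

The lines surrounding the largest gap are a reasonable starting point for reading the log in detail, although a gap alone does not prove that the director was blocked at that point.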
|
Can you still reproduce this issue with 17.2 or 18.2.4rc2? | |
Date Modified | Username | Field | Change |
---|---|---|---|
2015-12-04 10:10 | s2Xk4G | New Issue | |
2015-12-04 10:10 | s2Xk4G | File Added: jobs_listing_and_director_log.tgz | |
2015-12-04 10:12 | s2Xk4G | Note Added: 0002027 | |
2015-12-04 10:14 | s2Xk4G | Note Added: 0002028 | |
2016-01-11 15:48 | mvwieringen | Note Added: 0002088 | |
2016-01-11 15:48 | mvwieringen | Status | new => feedback |
2019-01-16 13:13 | arogge | Note Added: 0003188 | |
2019-07-04 15:54 | arogge | Assigned To | => arogge |
2019-07-04 15:54 | arogge | Status | feedback => closed |
2019-07-04 15:54 | arogge | Resolution | open => suspended |