View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0001064 | bareos-core | General | public | 2019-02-26 18:02 | 2019-10-28 11:33 |
Reporter | j.friedrich | Assigned To | franku | ||
Priority | normal | Severity | major | Reproducibility | always |
Status | closed | Resolution | fixed | ||
Platform | Linux | OS | CentOS | OS Version | 7 |
Product Version | 18.2.5 | ||||
Summary | 0001064: bconsole crashes after 5 minutes in restore operation due to forgotten watchdog | ||||
Description | Debug trace: bconsole (800): lib/watchdog.cc:198-0 Registered watchdog 21c3898, interval 300 one shot bconsole (1900): lib/watchdog.cc:310-0 pthread_cond_timedwait 60 bconsole (900): lib/btimers.cc:188-0 Start bsock timer 21c3828 tid=0x2c00007f6d858b78 for 300 secs at 1551200139 bconsole (800): lib/watchdog.cc:198-0 Registered watchdog 21c1928, interval 600 one shot bconsole (1900): lib/watchdog.cc:310-0 pthread_cond_timedwait 60 bconsole (900): lib/btimers.cc:188-0 Start bsock timer 21c0fb8 tid=0x5800007f6d858b78 for 600 secs at 1551200139 bconsole (900): lib/btimers.cc:206-0 Stop bsock timer 21c0fb8 tid=0x0000007f6d858b78 at 1551200139. bconsole (800): lib/watchdog.cc:218-0 Unregistered watchdog 21c1928 However, watchdog 21c3898 is still active and will fire after 5 minutes. bconsole (1900): lib/watchdog.cc:310-0 pthread_cond_timedwait 60 bconsole (1900): lib/watchdog.cc:310-0 pthread_cond_timedwait 60 bconsole (1900): lib/watchdog.cc:310-0 pthread_cond_timedwait 60 bconsole (1900): lib/watchdog.cc:310-0 pthread_cond_timedwait 60 bconsole (3400): lib/watchdog.cc:281-0 Watchdog callback p=0x21c3898 fire=1551200439 bconsole (900): lib/btimers.cc:244-0 thread timer 21c3898 kill bsock tid=7f6d7aadace0 at 1551200439. bconsole (1900): lib/watchdog.cc:310-0 pthread_cond_timedwait 60 bconsole (900): lib/signal.cc:137-0 sig=12 User-defined signal 2 | ||||
Steps To Reproduce | 1. start bconsole 2. run restore immediately 3. select a restore with a larger directory which takes longer than 5 minutes to complete 4. after 5 minutes you will be returned to Linux prompt Workaround: wait for > 5 minutes before starting the restore operation | ||||
Additional Information | The bug is in core/src/lib/bsock.cc: bool BareosSocket::ConsoleAuthenticateWithDirector(JobControlRecord* jcr, const char* identity, s_password& password, TlsResource* tls_resource, BStringList& response_args, uint32_t& response_id) { char bashed_name[MAX_NAME_LENGTH]; BareosSocket* dir = this; /* for readability */ bstrncpy(bashed_name, identity, sizeof(bashed_name)); BashSpaces(bashed_name); dir->StartTimer(60 * 5); /* 5 minutes */ This timer will not be stopped on success! | ||||
Tags | No tags attached. | ||||
I haven't done any debug tracing but I have seen it twice now, during labeling a similar issue happens that the console will in the middle of labeling will exit out. No core dumps or errors on the system. Labeling completes in the background. I suspect it is this issue as well but will need to get another batch of tapes in to label before I can test for sure. |
|
> 3. select a restore with a larger directory which takes longer than 5 minutes to complete Does this mean I should pick an option where generating the file list takes more than five minutes? Could you please describe how to reproduce this in more detail? |
|
arogge: I can't confirm the restore side of this, but I can confirm with debug data that I'm seeing the exact same issues (timer is created, timer never gets destroyed, terminal closes) during tape labeling. Do you want to run the label scenario under this ticket since on first glance they seem to share the same root cause or would you like me to separate it out as a new ticket? (key snippets for the label test, can provide the full log for you if wanted under this ticket or a new ticket) bconsole (800): lib/watchdog.cc:198-0 Registered watchdog 55b0a735d628, interval 300 one shot bconsole (1900): lib/watchdog.cc:310-0 pthread_cond_timedwait 60 bconsole (900): lib/btimers.cc:188-0 Start bsock timer 55b0a734b068 tid=0xd000007f48009217 for 300 secs at 1558759034 label barcode slot=1,2,3,14,15 encrypt bconsole (900): lib/btimers.cc:201-0 StopBsockTimer called with NULL btimer_id bconsole (3400): lib/watchdog.cc:281-0 Watchdog callback p=0x55b0a735d628 fire=1558759334 bconsole (900): lib/btimers.cc:244-0 thread timer 55b0a735d628 kill bsock tid=7f47fdd19e20 at 1558759334. |
|
In my case when it takes longer than 5 minutes to build the file list for a restore (note -u does not seem to help) (in my case it took about 2.5 seconds to enter restore 6 5, then 5 minutes according to wall time to be dropped back into a prompt) time bconsole -u 999 Connecting to Director localhost:9101 Encryption: None 1000 OK: bareos-dir Version: 19.1.2 (01 February 2019) self-compiled binary self-compiled binaries are UNSUPPORTED by bareos.com. Get official binaries and vendor support on https://www.bareos.com You are connected using the default console Enter a period to cancel a command. *restore Automatically selected Catalog: MyCatalog Using Catalog "MyCatalog" First you select one or more JobIds that contain files to be restored. You will be presented several methods of specifying the JobIds. Then you will be allowed to select which files from those JobIds are to be restored. To select the JobIds, you have the following choices: 1: List last 20 Jobs run 2: List Jobs where a given File is saved 3: Enter list of comma separated JobIds to select 4: Enter SQL list command 5: Select the most recent backup for a client 6: Select backup for a client before a specified time 7: Enter a list of files to restore 8: Enter a list of files to restore before a specified time 9: Find the JobIds of the most recent backup for a client 10: Find the JobIds for a backup for a client before a specified time 11: Enter a list of directories to restore for found JobIds 12: Select full restore to a specified Job date 13: Cancel Select item: (1-13): 5 Defined Clients: 1: bareos-fd 2: bpp-fd 3: brslave-fd 4: c2-fd 5: cubietruck-fd 6: eb-fd 7: ebv7-fd 8: fw-fd 9: hal-fd 10: hc2-fd 11: jet-fd 12: lime2-fd 13: liva-fd 14: mycroft-fd 15: mythdora-fd 16: nano-fd 17: p3-fd 18: pine64-fd 19: raspi-fd 20: revofe-fd 21: rp3-fd 22: rp3p-fd 23: wbq-fd Select the Client (1-23): 6 Automatically selected FileSet: Linux All +-------+-------+-----------+----------------+---------------------+-------------------+ | JobId | Level | JobFiles | JobBytes | StartTime | VolumeName | +-------+-------+-----------+----------------+---------------------+-------------------+ | 2,390 | F | 1,740,307 | 85,957,529,345 | 2019-05-04 02:00:45 | Full-0005 | | 2,821 | D | 555,113 | 27,404,175,565 | 2019-05-26 09:53:44 | Differential-0632 | | 2,837 | I | 84 | 92,990,165 | 2019-05-27 20:03:39 | Incremental-0153 | +-------+-------+-----------+----------------+---------------------+-------------------+ You have selected the following JobIds: 2390,2821,2837 Building directory tree for JobId(s) 2390,2821,2837 ... real 5m2.561s user 0m0.023s sys 0m0.119s |
|
Fix committed to bareos bareos-18.2 branch with changesetid 11967. | |
bareos: master 4c32d3cc 2019-10-25 15:13 Ported: N/A Details Diff |
console: stop bsock timer before leaving scope of function Fixes 0001064: bconsole crashes after 5 minutes in restore operation due to forgotten watchdog |
Affected Issues 0001064 |
|
mod - core/src/lib/bsock.cc | Diff File | ||
bareos: bareos-18.2 72920245 2019-10-25 15:13 Ported: N/A Details Diff |
console: stop bsock timer before leaving scope of function Fixes 0001064: bconsole crashes after 5 minutes in restore operation due to forgotten watchdog |
Affected Issues 0001064 |
|
mod - core/src/lib/bsock.cc | Diff File |
Date Modified | Username | Field | Change |
---|---|---|---|
2019-02-26 18:02 | j.friedrich | New Issue | |
2019-05-14 01:23 | cmlara | Note Added: 0003365 | |
2019-05-14 09:44 | arogge | Assigned To | => arogge |
2019-05-14 09:44 | arogge | Status | new => feedback |
2019-05-14 09:44 | arogge | Note Added: 0003368 | |
2019-05-25 07:10 | cmlara | Note Added: 0003382 | |
2019-05-29 19:44 | TEKrantz | Note Added: 0003384 | |
2019-07-04 15:58 | arogge | Assigned To | arogge => |
2019-07-04 15:58 | arogge | Status | feedback => acknowledged |
2019-10-25 13:31 | franku | Assigned To | => franku |
2019-10-25 13:31 | franku | Status | acknowledged => assigned |
2019-10-28 11:22 | franku | Changeset attached | => bareos master 4c32d3cc |
2019-10-28 11:22 | franku | Changeset attached | => bareos bareos-18.2 72920245 |
2019-10-28 11:22 | franku | Note Added: 0003610 | |
2019-10-28 11:22 | franku | Status | assigned => resolved |
2019-10-28 11:22 | franku | Resolution | open => fixed |
2019-10-28 11:33 | franku | Status | resolved => new |
2019-10-28 11:33 | franku | Resolution | fixed => reopened |
2019-10-28 11:33 | franku | Status | new => closed |
2019-10-28 11:33 | franku | Resolution | reopened => fixed |