View Issue Details

IDProjectCategoryView StatusLast Update
0001064bareos-coreGeneralpublic2019-10-28 11:33
Reporterj.friedrich Assigned Tofranku  
PrioritynormalSeveritymajorReproducibilityalways
Status closedResolutionfixed 
PlatformLinuxOSCentOSOS Version7
Product Version18.2.5 
Summary0001064: bconsole crashes after 5 minutes in restore operation due to forgotten watchdog
DescriptionDebug trace:

bconsole (800): lib/watchdog.cc:198-0 Registered watchdog 21c3898, interval 300 one shot
bconsole (1900): lib/watchdog.cc:310-0 pthread_cond_timedwait 60
bconsole (900): lib/btimers.cc:188-0 Start bsock timer 21c3828 tid=0x2c00007f6d858b78 for 300 secs at 1551200139

bconsole (800): lib/watchdog.cc:198-0 Registered watchdog 21c1928, interval 600 one shot
bconsole (1900): lib/watchdog.cc:310-0 pthread_cond_timedwait 60
bconsole (900): lib/btimers.cc:188-0 Start bsock timer 21c0fb8 tid=0x5800007f6d858b78 for 600 secs at 1551200139

bconsole (900): lib/btimers.cc:206-0 Stop bsock timer 21c0fb8 tid=0x0000007f6d858b78 at 1551200139.
bconsole (800): lib/watchdog.cc:218-0 Unregistered watchdog 21c1928

However, watchdog 21c3898 is still active and will fire after 5 minutes.

bconsole (1900): lib/watchdog.cc:310-0 pthread_cond_timedwait 60
bconsole (1900): lib/watchdog.cc:310-0 pthread_cond_timedwait 60
bconsole (1900): lib/watchdog.cc:310-0 pthread_cond_timedwait 60
bconsole (1900): lib/watchdog.cc:310-0 pthread_cond_timedwait 60
bconsole (3400): lib/watchdog.cc:281-0 Watchdog callback p=0x21c3898 fire=1551200439
bconsole (900): lib/btimers.cc:244-0 thread timer 21c3898 kill bsock tid=7f6d7aadace0 at 1551200439.
bconsole (1900): lib/watchdog.cc:310-0 pthread_cond_timedwait 60
bconsole (900): lib/signal.cc:137-0 sig=12 User-defined signal 2
Steps To Reproduce1. start bconsole
2. run restore immediately
3. select a restore with a larger directory which takes longer than 5 minutes to complete
4. after 5 minutes you will be returned to Linux prompt

Workaround:
wait for > 5 minutes before starting the restore operation
Additional InformationThe bug is in core/src/lib/bsock.cc:

bool BareosSocket::ConsoleAuthenticateWithDirector(JobControlRecord* jcr,
                                                   const char* identity,
                                                   s_password& password,
                                                   TlsResource* tls_resource,
                                                   BStringList& response_args,
                                                   uint32_t& response_id)
{
  char bashed_name[MAX_NAME_LENGTH];
  BareosSocket* dir = this; /* for readability */

  bstrncpy(bashed_name, identity, sizeof(bashed_name));
  BashSpaces(bashed_name);

dir->StartTimer(60 * 5); /* 5 minutes */

This timer will not be stopped on success!

TagsNo tags attached.

Activities

cmlara

cmlara

2019-05-14 01:23

reporter   ~0003365

I haven't done any debug tracing but I have seen it twice now, during labeling a similar issue happens that the console will in the middle of labeling will exit out. No core dumps or errors on the system. Labeling completes in the background.
I suspect it is this issue as well but will need to get another batch of tapes in to label before I can test for sure.
arogge

arogge

2019-05-14 09:44

manager   ~0003368

> 3. select a restore with a larger directory which takes longer than 5 minutes to complete
Does this mean I should pick an option where generating the file list takes more than five minutes?

Could you please describe how to reproduce this in more detail?
cmlara

cmlara

2019-05-25 07:10

reporter   ~0003382

arogge: I can't confirm the restore side of this, but I can confirm with debug data that I'm seeing the exact same issues (timer is created, timer never gets destroyed, terminal closes) during tape labeling.

Do you want to run the label scenario under this ticket since on first glance they seem to share the same root cause or would you like me to separate it out as a new ticket?

(key snippets for the label test, can provide the full log for you if wanted under this ticket or a new ticket)

bconsole (800): lib/watchdog.cc:198-0 Registered watchdog 55b0a735d628, interval 300 one shot
bconsole (1900): lib/watchdog.cc:310-0 pthread_cond_timedwait 60
bconsole (900): lib/btimers.cc:188-0 Start bsock timer 55b0a734b068 tid=0xd000007f48009217 for 300 secs at 1558759034

label barcode slot=1,2,3,14,15 encrypt
bconsole (900): lib/btimers.cc:201-0 StopBsockTimer called with NULL btimer_id

bconsole (3400): lib/watchdog.cc:281-0 Watchdog callback p=0x55b0a735d628 fire=1558759334
bconsole (900): lib/btimers.cc:244-0 thread timer 55b0a735d628 kill bsock tid=7f47fdd19e20 at 1558759334.
TEKrantz

TEKrantz

2019-05-29 19:44

reporter   ~0003384

In my case when it takes longer than 5 minutes to build the file list for a restore (note -u does not seem to help) (in my case it took about 2.5 seconds to enter restore 6 5, then 5 minutes according to wall time to be dropped back into a prompt)

time bconsole -u 999
Connecting to Director localhost:9101
 Encryption: None
1000 OK: bareos-dir Version: 19.1.2 (01 February 2019)
self-compiled binary
self-compiled binaries are UNSUPPORTED by bareos.com.
Get official binaries and vendor support on https://www.bareos.com
You are connected using the default console

Enter a period to cancel a command.
*restore
Automatically selected Catalog: MyCatalog
Using Catalog "MyCatalog"

First you select one or more JobIds that contain files
to be restored. You will be presented several methods
of specifying the JobIds. Then you will be allowed to
select which files from those JobIds are to be restored.

To select the JobIds, you have the following choices:
     1: List last 20 Jobs run
     2: List Jobs where a given File is saved
     3: Enter list of comma separated JobIds to select
     4: Enter SQL list command
     5: Select the most recent backup for a client
     6: Select backup for a client before a specified time
     7: Enter a list of files to restore
     8: Enter a list of files to restore before a specified time
     9: Find the JobIds of the most recent backup for a client
    10: Find the JobIds for a backup for a client before a specified time
    11: Enter a list of directories to restore for found JobIds
    12: Select full restore to a specified Job date
    13: Cancel
Select item: (1-13): 5
Defined Clients:
     1: bareos-fd
     2: bpp-fd
     3: brslave-fd
     4: c2-fd
     5: cubietruck-fd
     6: eb-fd
     7: ebv7-fd
     8: fw-fd
     9: hal-fd
    10: hc2-fd
    11: jet-fd
    12: lime2-fd
    13: liva-fd
    14: mycroft-fd
    15: mythdora-fd
    16: nano-fd
    17: p3-fd
    18: pine64-fd
    19: raspi-fd
    20: revofe-fd
    21: rp3-fd
    22: rp3p-fd
    23: wbq-fd
Select the Client (1-23): 6
Automatically selected FileSet: Linux All
+-------+-------+-----------+----------------+---------------------+-------------------+
| JobId | Level | JobFiles | JobBytes | StartTime | VolumeName |
+-------+-------+-----------+----------------+---------------------+-------------------+
| 2,390 | F | 1,740,307 | 85,957,529,345 | 2019-05-04 02:00:45 | Full-0005 |
| 2,821 | D | 555,113 | 27,404,175,565 | 2019-05-26 09:53:44 | Differential-0632 |
| 2,837 | I | 84 | 92,990,165 | 2019-05-27 20:03:39 | Incremental-0153 |
+-------+-------+-----------+----------------+---------------------+-------------------+
You have selected the following JobIds: 2390,2821,2837

Building directory tree for JobId(s) 2390,2821,2837 ...
real 5m2.561s
user 0m0.023s
sys 0m0.119s
franku

franku

2019-10-28 11:22

administrator   ~0003610

Fix committed to bareos bareos-18.2 branch with changesetid 11967.

Related Changesets

bareos: master 4c32d3cc

2019-10-25 15:13

franku

Ported: N/A

Details Diff
console: stop bsock timer before leaving scope of function

Fixes 0001064: bconsole crashes after 5 minutes in restore operation due to forgotten watchdog
Affected Issues
0001064
mod - core/src/lib/bsock.cc Diff File

bareos: bareos-18.2 72920245

2019-10-25 15:13

franku

Ported: N/A

Details Diff
console: stop bsock timer before leaving scope of function

Fixes 0001064: bconsole crashes after 5 minutes in restore operation due to forgotten watchdog
Affected Issues
0001064
mod - core/src/lib/bsock.cc Diff File

Issue History

Date Modified Username Field Change
2019-02-26 18:02 j.friedrich New Issue
2019-05-14 01:23 cmlara Note Added: 0003365
2019-05-14 09:44 arogge Assigned To => arogge
2019-05-14 09:44 arogge Status new => feedback
2019-05-14 09:44 arogge Note Added: 0003368
2019-05-25 07:10 cmlara Note Added: 0003382
2019-05-29 19:44 TEKrantz Note Added: 0003384
2019-07-04 15:58 arogge Assigned To arogge =>
2019-07-04 15:58 arogge Status feedback => acknowledged
2019-10-25 13:31 franku Assigned To => franku
2019-10-25 13:31 franku Status acknowledged => assigned
2019-10-28 11:22 franku Changeset attached => bareos master 4c32d3cc
2019-10-28 11:22 franku Changeset attached => bareos bareos-18.2 72920245
2019-10-28 11:22 franku Note Added: 0003610
2019-10-28 11:22 franku Status assigned => resolved
2019-10-28 11:22 franku Resolution open => fixed
2019-10-28 11:33 franku Status resolved => new
2019-10-28 11:33 franku Resolution fixed => reopened
2019-10-28 11:33 franku Status new => closed
2019-10-28 11:33 franku Resolution reopened => fixed