View Issue Details

IDProjectCategoryView StatusLast Update
0001064bareos-core[All Projects] Generalpublic2019-05-25 07:10
Reporterj.friedrichAssigned Toarogge 
PrioritynormalSeveritymajorReproducibilityalways
Status feedbackResolutionopen 
PlatformLinuxOSCentOSOS Version7
Product Version18.2.5 
Target VersionFixed in Version 
Summary0001064: bconsole crashes after 5 minutes in restore operation due to forgotten watchdog
DescriptionDebug trace:

bconsole (800): lib/watchdog.cc:198-0 Registered watchdog 21c3898, interval 300 one shot
bconsole (1900): lib/watchdog.cc:310-0 pthread_cond_timedwait 60
bconsole (900): lib/btimers.cc:188-0 Start bsock timer 21c3828 tid=0x2c00007f6d858b78 for 300 secs at 1551200139

bconsole (800): lib/watchdog.cc:198-0 Registered watchdog 21c1928, interval 600 one shot
bconsole (1900): lib/watchdog.cc:310-0 pthread_cond_timedwait 60
bconsole (900): lib/btimers.cc:188-0 Start bsock timer 21c0fb8 tid=0x5800007f6d858b78 for 600 secs at 1551200139

bconsole (900): lib/btimers.cc:206-0 Stop bsock timer 21c0fb8 tid=0x0000007f6d858b78 at 1551200139.
bconsole (800): lib/watchdog.cc:218-0 Unregistered watchdog 21c1928

However, watchdog 21c3898 is still active and will fire after 5 minutes.

bconsole (1900): lib/watchdog.cc:310-0 pthread_cond_timedwait 60
bconsole (1900): lib/watchdog.cc:310-0 pthread_cond_timedwait 60
bconsole (1900): lib/watchdog.cc:310-0 pthread_cond_timedwait 60
bconsole (1900): lib/watchdog.cc:310-0 pthread_cond_timedwait 60
bconsole (3400): lib/watchdog.cc:281-0 Watchdog callback p=0x21c3898 fire=1551200439
bconsole (900): lib/btimers.cc:244-0 thread timer 21c3898 kill bsock tid=7f6d7aadace0 at 1551200439.
bconsole (1900): lib/watchdog.cc:310-0 pthread_cond_timedwait 60
bconsole (900): lib/signal.cc:137-0 sig=12 User-defined signal 2
Steps To Reproduce1. start bconsole
2. run restore immediately
3. select a restore with a larger directory which takes longer than 5 minutes to complete
4. after 5 minutes you will be returned to Linux prompt

Workaround:
wait for > 5 minutes before starting the restore operation
Additional InformationThe bug is in core/src/lib/bsock.cc:

bool BareosSocket::ConsoleAuthenticateWithDirector(JobControlRecord* jcr,
                                                   const char* identity,
                                                   s_password& password,
                                                   TlsResource* tls_resource,
                                                   BStringList& response_args,
                                                   uint32_t& response_id)
{
  char bashed_name[MAX_NAME_LENGTH];
  BareosSocket* dir = this; /* for readability */

  bstrncpy(bashed_name, identity, sizeof(bashed_name));
  BashSpaces(bashed_name);

dir->StartTimer(60 * 5); /* 5 minutes */

This timer will not be stopped on success!

TagsNo tags attached.
bareos-master: impact
bareos-master: action
bareos-18.2: impact
bareos-18.2: action
bareos-17.2: impact
bareos-17.2: action
bareos-16.2: impact
bareos-16.2: action
bareos-15.2: impact
bareos-15.2: action
bareos-14.2: impact
bareos-14.2: action
bareos-13.2: impact
bareos-13.2: action
bareos-12.4: impact
bareos-12.4: action

Activities

cmlara

cmlara

2019-05-14 01:23

reporter   ~0003365

I haven't done any debug tracing but I have seen it twice now, during labeling a similar issue happens that the console will in the middle of labeling will exit out. No core dumps or errors on the system. Labeling completes in the background.
I suspect it is this issue as well but will need to get another batch of tapes in to label before I can test for sure.
arogge

arogge

2019-05-14 09:44

developer   ~0003368

> 3. select a restore with a larger directory which takes longer than 5 minutes to complete
Does this mean I should pick an option where generating the file list takes more than five minutes?

Could you please describe how to reproduce this in more detail?
cmlara

cmlara

2019-05-25 07:10

reporter   ~0003382

arogge: I can't confirm the restore side of this, but I can confirm with debug data that I'm seeing the exact same issues (timer is created, timer never gets destroyed, terminal closes) during tape labeling.

Do you want to run the label scenario under this ticket since on first glance they seem to share the same root cause or would you like me to separate it out as a new ticket?

(key snippets for the label test, can provide the full log for you if wanted under this ticket or a new ticket)

bconsole (800): lib/watchdog.cc:198-0 Registered watchdog 55b0a735d628, interval 300 one shot
bconsole (1900): lib/watchdog.cc:310-0 pthread_cond_timedwait 60
bconsole (900): lib/btimers.cc:188-0 Start bsock timer 55b0a734b068 tid=0xd000007f48009217 for 300 secs at 1558759034

label barcode slot=1,2,3,14,15 encrypt
bconsole (900): lib/btimers.cc:201-0 StopBsockTimer called with NULL btimer_id

bconsole (3400): lib/watchdog.cc:281-0 Watchdog callback p=0x55b0a735d628 fire=1558759334
bconsole (900): lib/btimers.cc:244-0 thread timer 55b0a735d628 kill bsock tid=7f47fdd19e20 at 1558759334.

Issue History

Date Modified Username Field Change
2019-02-26 18:02 j.friedrich New Issue
2019-05-14 01:23 cmlara Note Added: 0003365
2019-05-14 09:44 arogge Assigned To => arogge
2019-05-14 09:44 arogge Status new => feedback
2019-05-14 09:44 arogge Note Added: 0003368
2019-05-25 07:10 cmlara Note Added: 0003382