View Issue Details

ID: 0000987
Project: bareos-core
Category: [All Projects] director
View Status: public
Last Update: 2019-09-15 16:17
Reporter: franku
Assigned To:
Priority: normal
Severity: major
Reproducibility: random
Status: new
Resolution: open
Platform: Linux
OS: Debian
OS Version: 9
Product Version: 17.2.6
Fixed in Version:
Summary: 0000987: Canceling a job leads to a director crash (TT4200333)
Description: When canceling a job using bconsole, the director can occasionally crash with a coredump.

The crash is likely caused by a race condition when a signal is sent to a job thread: the thread_id used for the signal is a member of the JobControlRecord class, and the memory of that object may already have been freed by the time the signal is sent.
Steps To Reproduce: The issue occurs very rarely. There is no known way to reproduce it reliably yet.
Additional Information: Excerpt from the coredump:

#1 0x00007f4107389c14 in signal_handler (sig=11) at signal.c:240
#2 <signal handler called>
#3 __pthread_kill (threadid=139913685083904, signo=signo@entry=12) at ../sysdeps/unix/sysv/linux/pthread_kill.c:40
#4 0x00007f4107377434 in JCR::my_thread_send_signal (this=this@entry=0x558a5e446318, sig=sig@entry=12) at jcr.c:682
#5 0x0000558a5b0983ec in cancel_file_daemon_job (ua=ua@entry=0x7f3f0c00ed28, jcr=jcr@entry=0x558a5e446318) at fd_cmds.c:1080
Tags: No tags attached.

Relationships

child of 0000984 (assigned, joergs): Release bareos-17.2.8

Activities

franku

2018-07-18 15:13

developer   ~0003074

Current solution: Refactor the function that frees JobControlRecord (JCR) memory in order to lock the JCR mutex consecutively.
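
To illustrate the idea behind this note, here is a minimal sketch of signaling a job thread only while its record is provably alive. It is not the Bareos code: `JobRecord`, `job_registry`, `job_registry_mutex`, `StartJob`, `SendSignalToJob` and `FreeJob` are hypothetical names, and using one registry-wide mutex instead of a per-JCR mutex is an assumption made to keep the example short. The point it tries to show is that the lookup, the read of `thread_id` and the `pthread_kill` call happen under the same lock that the freeing path must take before it unregisters the record.

```cpp
#include <pthread.h>
#include <signal.h>

#include <atomic>
#include <cassert>
#include <chrono>
#include <map>
#include <memory>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the JobControlRecord; not the Bareos class.
struct JobRecord {
  pthread_t thread_id{};
  std::atomic<bool> stop{false};
};

// Assumption: one mutex guards the registry and the lifetime of its records.
static std::mutex job_registry_mutex;
static std::map<int, std::unique_ptr<JobRecord>> job_registry;

// Job thread body: runs until the owner asks it to stop.
static void* JobThread(void* arg) {
  auto* job = static_cast<JobRecord*>(arg);
  while (!job->stop.load()) {
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
  }
  return nullptr;
}

// Creates a record, starts its thread and registers it under the lock.
bool StartJob(int job_id) {
  std::lock_guard<std::mutex> guard(job_registry_mutex);
  if (job_registry.count(job_id) != 0) return false;
  std::unique_ptr<JobRecord> job(new JobRecord);
  if (pthread_create(&job->thread_id, nullptr, JobThread, job.get()) != 0) {
    return false;
  }
  job_registry[job_id] = std::move(job);
  return true;
}

// Signals the job thread only while the record is still registered.
// The lock is held across lookup and pthread_kill, so the record cannot be
// freed (and the thread cannot be joined) in between.
bool SendSignalToJob(int job_id, int sig) {
  std::lock_guard<std::mutex> guard(job_registry_mutex);
  auto it = job_registry.find(job_id);
  if (it == job_registry.end()) return false;  // already freed: nothing to do
  return pthread_kill(it->second->thread_id, sig) == 0;
}

// Unregisters the record under the lock, then stops and joins the thread
// outside the lock. After the erase no canceller can reach the record.
bool FreeJob(int job_id) {
  std::unique_ptr<JobRecord> job;
  {
    std::lock_guard<std::mutex> guard(job_registry_mutex);
    auto it = job_registry.find(job_id);
    if (it == job_registry.end()) return false;
    job = std::move(it->second);
    job_registry.erase(it);
  }
  job->stop.store(true);
  pthread_join(job->thread_id, nullptr);
  return true;  // the record is destroyed when `job` goes out of scope
}
```

In this sketch a cancel request that arrives after `FreeJob` simply gets `false` back instead of dereferencing freed memory. Passing `0` as `sig` only performs the error checking of `pthread_kill`; sending a real signal such as the signal 12 seen in the backtrace would additionally require a handler to be installed, which is left out here.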
therm

2019-09-15 16:17

reporter   ~0003572

This issue affects me as well. We run a lot of copy and migrate jobs, and canceling them crashes the director about 50% of the time. If I can provide anything to help get this fixed, please let me know.
Regards,
Dennis

Issue History

Date Modified Username Field Change
2018-07-18 10:42 franku New Issue
2018-07-18 10:42 franku Status new => assigned
2018-07-18 10:42 franku Assigned To => franku
2018-07-18 15:13 franku Note Added: 0003074
2018-07-18 15:13 franku Description Updated
2018-07-18 15:13 franku Steps to Reproduce Updated
2019-01-23 11:30 stephand Relationship added child of 0000984
2019-07-15 15:51 franku Status assigned => new
2019-07-15 15:52 franku Assigned To franku =>
2019-09-15 16:17 therm Note Added: 0003572