View Issue Details

IDProjectCategoryView StatusLast Update
0001073bareos-core[All Projects] directorpublic2019-08-01 10:14
ReporterbratkartoffelAssigned Toarogge 
PrioritynormalSeveritymajorReproducibilityalways
Status resolvedResolutionfixed 
PlatformLinuxOSAlpineOS Version3.8+
Product Version17.2.7 
Target VersionFixed in Version19.2.1 
Summary0001073: bsmtp and bareos-dir crashing with segfaults due to undefined behaviour in pthread usage
DescriptionUsing Alpine 3.8+ (with musl libc) bsmtp crashes when invoked and bareos-dir crashes when running a job or getting a connection with bconsole.
Bug at alpine: https://bugs.alpinelinux.org/issues/10156

Problem may still be there in latest bareos release.
Proposed patches are attached. I'm not that deep into c programming and I'm absolutely not sure if my checks are correct. Please adjust as needed.

Thanks,
Simon
Steps To ReproduceSimple reproduction of bsmtp crash:
---
$> docker run -it alpine:3.9 /bin/ash
/ # apk add bareos
[...]
/ # bsmtp
Segmentation fault (core dumped)
--
Additional Informationbsmtp:
There should be a thread key set called jcr_key, but its initial value is undefined. This is passed into tss_get which is a C11 wrapper for pthread_getspecific.
http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_getspecific.html
-> The effect of calling pthread_getspecific() or pthread_setspecific() with a key value not obtained from pthread_key_create() or after key has been deleted with pthread_key_delete() is undefined.
---
0x00007ffff7fbe2c3 in tss_get () from /lib/ld-musl-x86_64.so.1
(gdb) bt
#0 0x00007ffff7fbe2c3 in tss_get () from /lib/ld-musl-x86_64.so.1
0000001 0x00007ffff7f2d94f in get_jobid_from_tsd () at jcr.c:739
0000002 0x00007ffff7f34279 in p_msg (file=0x555555558006 "bsmtp.c", line=364,
    level=0, fmt=0x555555558568 "Fatal error: no recipient given.\n")
    at message.c:1343
0000003 0x00005555555564d5 in main (argc=0, argv=<optimized out>) at bsmtp.c:364
---

bareos-dir (when bconsole connects):
The issue is that the code tries to do a pthread_detach() on an already detached thread. This leads to undefined behaviour and a crash.
http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_detach.html
-> The behavior is undefined if the value specified by the thread argument to pthread_detach() does not refer to a joinable thread.
---
(gdb) bt
#0 __pthread_timedjoin_np (t=0x7f9c93e91b10, res=res@entry=0x0, at=at@entry=0x0) at src/thread/pthread_join.c:11
0000001 0x00007f9c95899383 in __pthread_join (t=<optimized out>, res=res@entry=0x0) at src/thread/pthread_join.c:24
0000002 0x00007f9c9589918c in __pthread_detach (t=<optimized out>) at src/thread/pthread_detach.c:9
0000003 __pthread_detach (t=<optimized out>) at src/thread/pthread_detach.c:4
0000004 0x00005559ea34ebd7 in handle_UA_client_request (user=user@entry=0x5559ea99d448) at ua_server.c:76
0000005 0x00005559ea329325 in handle_connection_request (arg=0x5559ea99d448) at socket_server.c:98
0000006 0x00007f9c957d5bc6 in workq_server (arg=arg@entry=0x5559ea38eae0 <socket_workq>) at workq.c:336
0000007 0x00007f9c957c0116 in lmgr_thread_launcher (x=0x5559ea99d588) at lockmgr.c:928
0000008 0x00007f9c95898c4c in start (p=<optimized out>) at src/thread/pthread_create.c:144
0000009 0x00007f9c9589acf2 in __clone () at src/thread/x86_64/clone.s:22
Backtrace stopped: frame did not save the PC
---

bareos-dir (when job starts):
The issue is that the code tries to do a pthread_detach() on an already detached thread. This leads to undefined behaviour and a crash.
http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_detach.html
-> The behavior is undefined if the value specified by the thread argument to pthread_detach() does not refer to a joinable thread.
---
(gdb) bt
#0 __pthread_timedjoin_np (t=0x7f7daf1ffb10, res=res@entry=0x0, at=at@entry=0x0) at src/thread/pthread_join.c:11
0000001 0x00007f7db0c2a383 in __pthread_join (t=<optimized out>, res=res@entry=0x0) at src/thread/pthread_join.c:24
0000002 0x00007f7db0c2a18c in __pthread_detach (t=<optimized out>) at src/thread/pthread_detach.c:9
0000003 __pthread_detach (t=<optimized out>) at src/thread/pthread_detach.c:4
0000004 0x000055e353dce058 in job_thread (arg=0x55e35419f3e8) at job.c:423
0000005 0x000055e353dd2f2b in jobq_server (arg=arg@entry=0x55e353e3f880 <job_queue>) at jobq.c:485
0000006 0x00007f7db0b51116 in lmgr_thread_launcher (x=0x55e3541a1248) at lockmgr.c:928
0000007 0x00007f7db0c29c4c in start (p=<optimized out>) at src/thread/pthread_create.c:144
0000008 0x00007f7db0c2bcf2 in __clone () at src/thread/x86_64/clone.s:22
Backtrace stopped: frame did not save the PC
---
TagsNo tags attached.
bareos-master: impactyes
bareos-master: actionfixed
bareos-18.2: impactyes
bareos-18.2: actionnone
bareos-17.2: impactyes
bareos-17.2: actionnone
bareos-16.2: impactyes
bareos-16.2: actionnone
bareos-15.2: impact
bareos-15.2: action
bareos-14.2: impact
bareos-14.2: action
bareos-13.2: impact
bareos-13.2: action
bareos-12.4: impact
bareos-12.4: action

Activities

bratkartoffel

bratkartoffel

2019-04-05 12:29

reporter  

fix-bsmtp-segfault.patch (1,449 bytes)
Patch from Michael Cassaniti, posted here:
https://bugs.alpinelinux.org/issues/10156#note-5
Until this issue is fixed upstream this patch is needed.

diff --git a/src/lib/jcr.c b/src/lib/jcr.c
index 00bfe8c87..338f90e59 100644
--- a/src/lib/jcr.c
+++ b/src/lib/jcr.c
@@ -77,6 +77,7 @@ static pthread_mutex_t jcr_lock = PTHREAD_MUTEX_INITIALIZER;
 static pthread_mutex_t job_start_mutex = PTHREAD_MUTEX_INITIALIZER;
 static pthread_mutex_t last_jobs_mutex = PTHREAD_MUTEX_INITIALIZER;

+static bool jcr_initialized = false;
 #ifdef HAVE_WIN32
 static bool tsd_initialized = false;
 static pthread_key_t jcr_key;         /* Pointer to jcr for each thread */
@@ -351,6 +352,8 @@ static void create_jcr_key()
       berrno be;
       Jmsg1(NULL, M_ABORT, 0, _("pthread key create failed: ERR=%s\n"),
             be.bstrerror(status));
+   } else {
+     jcr_initialized = true;
    }
 }

@@ -719,7 +722,10 @@ void set_jcr_in_tsd(JCR *jcr)
  */
 JCR *get_jcr_from_tsd()
 {
-   JCR *jcr = (JCR *)pthread_getspecific(jcr_key);
+   JCR *jcr = (JCR *)INVALID_JCR;
+   if (jcr_initialized){
+     jcr = (JCR *)pthread_getspecific(jcr_key);
+   }

    /*
     * Set any INVALID_JCR to NULL which the rest of BAREOS understands
@@ -736,7 +742,7 @@ JCR *get_jcr_from_tsd()
  */
 uint32_t get_jobid_from_tsd()
 {
-   JCR *jcr = (JCR *)pthread_getspecific(jcr_key);
+   JCR *jcr = get_jcr_from_tsd();
    uint32_t JobId = 0;

    if (jcr && jcr != INVALID_JCR) {
fix-bsmtp-segfault.patch (1,449 bytes)
pthread-double-detach.patch (1,238 bytes)
This patch fixes a double pthread_detach(), which is undefined behaviour
and leads to a segfault when running with musl libc.
Until this issue is fixed upstream this patch is needed.

--- a/src/dird/ua_server.c
+++ b/src/dird/ua_server.c
@@ -73,7 +73,15 @@
    UAContext *ua;
    JCR *jcr;
 
-   pthread_detach(pthread_self());
+   /* only detach if not yet detached */
+   int _detachstate;
+   pthread_attr_t _gattr;
+   pthread_getattr_np(pthread_self(), &_gattr);
+   pthread_attr_getdetachstate(&_gattr, &_detachstate);
+   pthread_attr_destroy(&_gattr);
+   if(_detachstate != PTHREAD_CREATE_DETACHED) {
+      pthread_detach(pthread_self());
+   }
 
    jcr = new_control_jcr("-Console-", JT_CONSOLE);
 
--- a/src/dird/job.c
+++ b/src/dird/job.c
@@ -420,7 +420,16 @@
 {
    JCR *jcr = (JCR *)arg;
 
-   pthread_detach(pthread_self());
+   /* only detach if not yet detached */
+   int _detachstate;
+   pthread_attr_t _gattr;
+   pthread_getattr_np(pthread_self(), &_gattr);
+   pthread_attr_getdetachstate(&_gattr, &_detachstate);
+   pthread_attr_destroy(&_gattr);
+   if(_detachstate != PTHREAD_CREATE_DETACHED) {
+      pthread_detach(pthread_self());
+   }
+
    Dsm_check(100);
 
    Dmsg0(200, "=====Start Job=========\n");
arogge

arogge

2019-04-11 09:42

developer   ~0003329

First of all: thank you very much for putting this amount of effort into this.

To get your patches into Bareos easily, you should clone the git repository, apply your changes against master and then either create a github pull-request (this is the preferred way) or git format-patch the commits and send them to the bareos-devel mailing-list.
see also: https://docs.bareos.org/DeveloperGuide/generaldevel.html#contributions

The director-patch will need work. It is the exact same code twice, so it should be moved into its own function.
bratkartoffel

bratkartoffel

2019-04-11 18:18

reporter   ~0003333

Thank you for taking a look at the patches. Yes, I'll create a PR at github, I just wanted to make sure the patches / approach is correct.
bratkartoffel

bratkartoffel

2019-04-18 15:39

reporter   ~0003342

PR open: https://github.com/bareos/bareos/pull/169
arogge

arogge

2019-07-10 17:46

developer   ~0003436

Can you please take a look at my branch (as I already suggested in your PR). I guess we can merge the change, but I cannot test on Alpine.
bratkartoffel

bratkartoffel

2019-07-11 10:36

reporter   ~0003439

Thanks for you change, I've seen the PR at github.
I'll compile and install it on saturday and test it next week. As far as I see you haven't changed that much, so I expect it to work without problems.
arogge

arogge

2019-07-11 14:08

developer   ~0003441

The new PR is https://github.com/bareos/bareos/pull/220
Most changes concern CMake.
bratkartoffel

bratkartoffel

2019-07-15 09:35

reporter   ~0003461

The first batch of backups ran without a problem yesterday, I'll play with bconsole the next days and give you feedback here.
But as far as I see It looks very good

Related Changesets

bareos: master ad273196

2019-04-18 15:30:10

Simon Frankenberger


Committer: arogge

Ported: N/A

Details Diff
dird: Fix pthread issues

Fixes two double pthread_detach() calls and one pthread_getspecific()
without initialization, all leading to a SEGFAULT with musl libc and
undefined behaviour on other libc implementations.

This closes Bug 0001073
Affected Issues
0001073
mod - core/src/dird/dird.cc Diff File
mod - core/src/dird/dird.h Diff File
mod - core/src/dird/job.cc Diff File
mod - core/src/dird/ua_server.cc Diff File
mod - core/src/lib/jcr.cc Diff File

Issue History

Date Modified Username Field Change
2019-04-05 12:29 bratkartoffel New Issue
2019-04-05 12:29 bratkartoffel File Added: fix-bsmtp-segfault.patch
2019-04-05 12:29 bratkartoffel File Added: pthread-double-detach.patch
2019-04-11 09:29 arogge Assigned To => arogge
2019-04-11 09:29 arogge Status new => assigned
2019-04-11 09:42 arogge Status assigned => feedback
2019-04-11 09:42 arogge Note Added: 0003329
2019-04-11 18:18 bratkartoffel Note Added: 0003333
2019-04-11 18:18 bratkartoffel Status feedback => assigned
2019-04-18 15:39 bratkartoffel Note Added: 0003342
2019-07-10 17:46 arogge Status assigned => feedback
2019-07-10 17:46 arogge Note Added: 0003436
2019-07-11 10:36 bratkartoffel Note Added: 0003439
2019-07-11 10:36 bratkartoffel Status feedback => assigned
2019-07-11 14:08 arogge Note Added: 0003441
2019-07-15 09:27 arogge Assigned To arogge =>
2019-07-15 09:27 arogge Status assigned => feedback
2019-07-15 09:35 bratkartoffel Note Added: 0003461
2019-07-15 09:35 bratkartoffel Status feedback => new
2019-07-16 10:22 arogge Changeset attached => bareos master ad273196
2019-08-01 10:14 arogge Assigned To => arogge
2019-08-01 10:14 arogge Status new => resolved
2019-08-01 10:14 arogge Resolution open => fixed
2019-08-01 10:14 arogge Fixed in Version => 19.2.1
2019-08-01 10:14 arogge bareos-master: impact => yes
2019-08-01 10:14 arogge bareos-master: action => fixed
2019-08-01 10:14 arogge bareos-18.2: impact => yes
2019-08-01 10:14 arogge bareos-18.2: action => none
2019-08-01 10:14 arogge bareos-17.2: impact => yes
2019-08-01 10:14 arogge bareos-17.2: action => none
2019-08-01 10:14 arogge bareos-16.2: impact => yes
2019-08-01 10:14 arogge bareos-16.2: action => none