shibd crashes with pthread_mutex assertion under high load
Description
Environment
Ubuntu 12.04.3 LTS
Linux HOST 3.2.0-53-generic #81-Ubuntu SMP Thu Aug 22 21:01:03 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
Attachments
Activity
Scott Cantor February 18, 2015 at 8:07 PM
Just noting, I looked at the stack trace and once again there is no question that any such bug is a bug in the driver or ODBC library.
I leave these issues open just to track any information people find about which drivers or libraries actually work.
Scott Cantor August 15, 2014 at 2:45 PM
shibd works fine threaded, so I'm skeptical that any deadlock is anything other than, once again, a bug in ODBC or the driver. But I haven't even looked at anything you found yet, so anything is possible. A default of 1 CPU is definitely not needed in the normal case. If it solves a problem with ODBC, that's certainly useful information.

Martin Hitschel August 15, 2014 at 2:40 PM (edited)
We have now found a workaround. The problem seems to be related to multi-CPU hosts. If shibd on each cluster member is pinned to a single CPU (using the Linux command 'taskset'), then both the RH5 fix for unixODBC and the Ubuntu-distributed unixODBC behave well in combination with shibd. This works for us, has been tested under high load, and is independent of the number of cluster members. We assume it is acceptable performance-wise to let shibd run on just one CPU, so if a general solution that copes with more than one CPU is not easy to provide, it might be sensible to ship shibd and its start/stop script in a state that uses only one CPU out of the box.
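For anyone who wants to try the same workaround, here is a minimal sketch of pinning a process to one CPU with taskset. The wrapper name run_pinned, the CPU index 0, the path /usr/sbin/shibd and the SHIBD_PID variable are assumptions for illustration only and are not taken from the report above; adjust them to your installation.

```shell
#!/bin/sh
# Hypothetical helper: run a command restricted to a single CPU.
# Usage: run_pinned CPU_INDEX COMMAND [ARGS...]
run_pinned() {
    cpu="$1"
    shift
    # -c takes a CPU list (here a single index) instead of a bitmask.
    taskset -c "$cpu" "$@"
}

# Start the daemon pinned to CPU 0 (path is an assumption):
#   run_pinned 0 /usr/sbin/shibd
#
# Pin an already running daemon instead. -a applies the setting to all
# of its threads, which matters for a multithreaded process
# (SHIBD_PID is a placeholder for the real process ID):
#   taskset -a -c -p 0 "$SHIBD_PID"
```

The affinity is inherited by child processes and threads, so wrapping the daemon's start command in an init script should be enough. The result can be checked with `taskset -c -p PID`, which prints the current affinity list.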

Martin Hitschel August 15, 2014 at 8:45 AM (edited)
The bug has been reported at https://bugs.launchpad.net/ubuntu/+source/unixodbc/+bug/1322263, but there has been no reaction so far.
Per the recommendation on users@shibboleth.net (2014-01-21), we compiled the ODBC library from RH5 sources: unixODBC64-2.2.14-3.el5.src.rpm and mysql-connector-odbc64-5.1.8-1.el5.src.rpm. Due to a MySQL client library API change in Ubuntu 12, MySQL Connector 5.1.8 was replaced by a version as close to the RH5 version as possible, r955 (= version 5.1.9).
As a result, the segfault in the ODBC libraries seems to have disappeared, leaving a deadlock in the shibd code proper. Please find the corresponding stack trace attached (file 2014-08-15-RH5-unixodbc.log). The overall symptoms are that, instead of segfaulting instantly, shibd does not respond to client requests for about 30 seconds and then terminates.
Following regular updates, we are now at SP 2.5.3 BTW (SWITCH-provided Ubuntu repository).
Scott Cantor October 30, 2013 at 8:49 PM
Also note that you had best not be using prefork with Apache and ending up with massive numbers of worker threads. If you are, that's a self-inflicted problem you can fix by switching to worker mode. It could be a resource exhaustion issue related to that mistake.
shibd crashes with pthread_mutex assertion under high load.
Session database is using ODBC/MySQL connection.
Output:
shibd: pthread_mutex_lock.c:62: __pthread_mutex_lock: Assertion `mutex->__data.__owner == 0' failed.
Aborted
Stack trace:
#0 0x00007ffff6b31425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007ffff6b34b8b in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007ffff6b2a0ee in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#3 0x00007ffff6b2a192 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
#4 0x00007ffff6ec3efb in pthread_mutex_lock ()
from /lib/x86_64-linux-gnu/libpthread.so.0
#5 0x00007fffef30bac6 in SQLDriverConnect ()
from /usr/lib/x86_64-linux-gnu/libodbc.so.1
#6 0x00007fffef565109 in ?? () from /usr/lib/shibboleth/odbc-store.so
#7 0x00007fffef567743 in ?? () from /usr/lib/shibboleth/odbc-store.so
#8 0x00007fffef568840 in ?? () from /usr/lib/shibboleth/odbc-store.so
#9 0x00007ffff7abcfba in ?? () from /usr/lib/libshibsp.so.6
#10 0x00007ffff7af1af8 in shibsp::ListenerService::receive(shibsp::DDF&, std::ostream&) () from /usr/lib/libshibsp.so.6
#11 0x00007ffff7af4450 in shibsp::ServerThread::job() ()
from /usr/lib/libshibsp.so.6
#12 0x00007ffff7af5948 in shibsp::ServerThread::run() ()
from /usr/lib/libshibsp.so.6
#13 0x00007ffff7af5a01 in server_thread_fn(void*) ()
from /usr/lib/libshibsp.so.6
#14 0x00007ffff6ec1e9a in start_thread ()
from /lib/x86_64-linux-gnu/libpthread.so.0
#15 0x00007ffff6beeccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#16 0x0000000000000000 in ?? ()