On 07/26/2008 10:23 AM, Sotiris Tsimbonis wrote:
>> For those who are willing to help test beta but don't want to see this
>> crash: if you can live with the possibly decreased performance without
>> threads, please rebuild named without threads. This is most likely an
>> inter-thread race, and shouldn't happen without threads.

>
> We don't want to disable threads (yet), so we applied the above patch.
> We also applied the one mentioned at
> http://marc.info/?l=bind-users&m=121694051627133&w=2 and have increased
> ISC_SOCKET_MAXEVENTS to 128 because of the problem mentioned in
> http://marc.info/?l=bind-users&m=121619301702838&w=2
>
> After 12 hours of running, no problems yet.


Unfortunately, it crashed again.

27-Jul-2008 03:50:17.336 general: resolver.c:5494: REQUIRE((((query) !=
0) && (((const isc__magic_t *)(query))->magic == ((('Q') << 24 | ('!')
<< 16 | ('!') << 8 | ('!')))))) failed
27-Jul-2008 03:50:17.336 general: exiting (due to assertion failure)
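
For anyone reading along: the REQUIRE that failed is a magic-number
validity check on the query object (non-NULL pointer, and a magic field
equal to 'Q','!','!','!' packed into 32 bits). Below is a minimal,
self-contained sketch of that kind of check. All names in it are
hypothetical and it is not BIND's actual source; it only illustrates how
such a check can start failing once an object has been torn down while
something else still holds a pointer to it, which would be consistent
with the inter-thread race suggested above (an assumption, not a
diagnosis).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; a sketch, not BIND's source. */
typedef struct {
    uint32_t magic;
} sketch_magic_t;

/* Pack four characters into one 32-bit tag, as in the logged expression. */
#define SKETCH_MAGIC(a, b, c, d) \
    ((uint32_t)(a) << 24 | (uint32_t)(b) << 16 | (uint32_t)(c) << 8 | (uint32_t)(d))
#define SKETCH_QUERY_MAGIC SKETCH_MAGIC('Q', '!', '!', '!')

typedef struct {
    sketch_magic_t hdr; /* first member, so a query pointer can be read as a tag */
    int id;
} sketch_query_t;

/* Returns 1 when p is non-NULL and carries the query tag, else 0. */
static int sketch_valid_query(const void *p) {
    return p != NULL &&
           ((const sketch_magic_t *)p)->magic == SKETCH_QUERY_MAGIC;
}

static void sketch_query_init(sketch_query_t *q, int id) {
    q->hdr.magic = SKETCH_QUERY_MAGIC;
    q->id = id;
}

/* Clearing the tag on teardown makes later use of a stale pointer detectable. */
static void sketch_query_destroy(sketch_query_t *q) {
    q->hdr.magic = 0;
}

/* Aborts via assert() if handed an invalid or already-destroyed query. */
static int sketch_query_id(const sketch_query_t *q) {
    assert(sketch_valid_query(q));
    return q->id;
}
```

In this sketch, calling sketch_query_id() on a query after
sketch_query_destroy() trips the assert and aborts, which is the same
general shape as the "exiting (due to assertion failure)" line above.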

There was a core this time; here is the gdb output:

root@athns02 354>/usr/local/bin/gdb /opt/bind/sbin/named
/var/core/core_athns02_named_102_100_1217119817_28943
GNU gdb 6.6
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "sparc-sun-solaris2.10"...
(no debugging symbols found)
Reading symbols from /lib/libnsl.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/libsocket.so.1...(no debugging symbols
found)...done.
Loaded symbols for /lib/libsocket.so.1
Reading symbols from /lib/libscf.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libscf.so.1
Reading symbols from /lib/libpthread.so.1...
warning: Lowest section in /lib/libpthread.so.1 is .dynamic at 00000074

(no debugging symbols found)...done.
Loaded symbols for /lib/libpthread.so.1
Reading symbols from /lib/libthread.so.1...
warning: Lowest section in /lib/libthread.so.1 is .dynamic at 00000074
(no debugging symbols found)...done.
Loaded symbols for /lib/libthread.so.1
Reading symbols from /lib/libc.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.1
Reading symbols from /lib/libdoor.so.1...
(no debugging symbols found)...done.
Loaded symbols for /lib/libdoor.so.1
Reading symbols from /lib/libuutil.so.1...(no debugging symbols
found)...done.
Loaded symbols for /lib/libuutil.so.1
Reading symbols from
/platform/SUNW,Sun-Fire-V240/lib/libc_psr.so.1...(no debugging symbols
found)...done.
Loaded symbols for /platform/SUNW,Sun-Fire-V240/lib/libc_psr.so.1
Reading symbols from /lib/nss_files.so.1...
(no debugging symbols found)...done.
Loaded symbols for /lib/nss_files.so.1
Reading symbols from /lib/ld.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/ld.so.1

Core was generated by `/opt/bind/sbin/named -u named -t /opt/bind -c
/etc/named.conf'.
Program terminated with signal 6, Aborted.
#0 0xff1c16e8 in _lwp_kill () from /lib/libc.so.1
(gdb) thread apply all bt full

Thread 5 (process 94479 ):
#0 0xff1c097c in ___sigtimedwait () from /lib/libc.so.1
No symbol table info available.
#1 0xff1b45fc in __sigtimedwait () from /lib/libc.so.1
No symbol table info available.
#2 0xff1ac964 in __posix_sigwait () from /lib/libc.so.1
No symbol table info available.
#3 0x001cf408 in isc_app_run ()
No symbol table info available.
#4 0x0003da94 in main ()
No symbol table info available.

Thread 4 (process 356623 ):
#0 0xff1c1058 in ioctl () from /lib/libc.so.1
No symbol table info available.
#1 0x001d9d48 in ?? ()
No symbol table info available.
#2 0x001d9d48 in ?? ()
No symbol table info available.
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 3 (process 291087 ):
#0 0xff1c0598 in __lwp_park () from /lib/libc.so.1
No symbol table info available.
#1 0xff1ba514 in cond_sleep_queue () from /lib/libc.so.1
No symbol table info available.
#2 0xff1ba630 in cond_wait_queue () from /lib/libc.so.1
No symbol table info available.
#3 0xff1baaa8 in cond_wait_common () from /lib/libc.so.1
No symbol table info available.
#4 0xff1bac40 in _cond_timedwait () from /lib/libc.so.1
No symbol table info available.
#5 0xff1bad34 in cond_timedwait () from /lib/libc.so.1
No symbol table info available.
#6 0xff1bad74 in pthread_cond_timedwait () from /lib/libc.so.1
No symbol table info available.
#7 0x001e070c in isc_condition_waituntil ()
No symbol table info available.
#8 0x001ce480 in ?? ()
No symbol table info available.
#9 0x001ce480 in ?? ()
No symbol table info available.
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 2 (process 225551 ):
#0 0x001123e8 in ?? ()
No symbol table info available.
#1 0x000a598c in dns_message_rendersection ()
No symbol table info available.
#2 0x000339e4 in ns_client_send ()
No symbol table info available.
#3 0x00047420 in ?? ()
No symbol table info available.
#4 0x00047420 in ?? ()
No symbol table info available.
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 1 (process 160015 ):
#0 0xff1c16e8 in _lwp_kill () from /lib/libc.so.1
No symbol table info available.
#1 0xff15ff40 in raise () from /lib/libc.so.1
No symbol table info available.
#2 0xff140160 in abort () from /lib/libc.so.1
No symbol table info available.
#3 0x0003cfe0 in ?? ()
No symbol table info available.
#4 0x0003cfe0 in ?? ()
No symbol table info available.
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb)


Sot.