On Tue, 16 Oct 2007, Victor M. Blood wrote:

>> Do you have the debugging options (INVARIANTS and INVARIANT_SUPPORT)
>> enabled? They are now disabled in RELENG_7. If they are not enabled, you
>> will just get a deadlock when you are unlucky.

> Yes, all debug options are left at their defaults.
> = KERN
> ...
> # Debugging for use in -current
> options KDB # Enable kernel debugger support.
> options DDB # Support DDB.
> options GDB # Support remote GDB.
> options INVARIANTS # Enable calls of extra sanity checking
> options INVARIANT_SUPPORT # Extra sanity checks of internal structures, required by INVARIANTS
> options WITNESS # Enable checks to detect deadlocks and cycles
> options WITNESS_SKIPSPIN # Don't run witness on spinlocks for speed
> ...
> I updated ipfilter to 4.1.27. Until then I had sources from a cvsup with
> tag=RELENG_7 dated 10.10.2007, and that system panicked in _sx_sleep() and
> _rw_sleep() with ipf 4.1.23. After some tests (ping, smbfs, telnet, etc.)
> the ipfilter update was finished at 17:00 GMT+3; now:
> # uptime
> 23:51 up 7:12, 2 users, load averages: 1,32 1,23 1,09
> I have now updated the CVS tree and built world for RELENG_7. With ipfilter
> 4.1.23 the system stayed alive for only 1-2 minutes under network load, and
> I was forced to disable ipfilter (ipf -D) to use the network. So far there
> have been no failures, and everything works normally.

The bug in ipfilter has to do with using a sleepable lock class in an
interrupt or software interrupt thread. This can lead to deadlock; although
that is relatively unlikely in practice, it violates the assumptions under
which kernel synchronization is designed to work, so invariants testing
reports it as a fatal condition. The panic won't turn up without invariants
enabled.
Switching to a non-sleepable lock class doesn't provide an instant solution,
because the non-sleepable lock would then be held over a potentially sleeping
path used to manage the firewall from user space: a copyin/copyout can take a
page fault that sleeps waiting on disk I/O or, worse, on network I/O from
network-backed swap, which could lead directly to the deadlock. Chances are
this is relatively easy to fix, but someone needs to do that -- ideally
someone very familiar with ipfilter. :-)
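
As a rough illustration of the two rules involved, here is a minimal
user-space sketch in C. It is only a model: every name in it (model_thread,
model_lock, model_lock_acquire, model_may_sleep) is hypothetical and is not a
kernel or ipfilter interface, and where an invariants-enabled kernel would
panic, the model just reports the violation and returns false.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
enum lock_class { LOCK_NONSLEEPABLE, LOCK_SLEEPABLE };
enum thread_kind { THREAD_ORDINARY, THREAD_INTERRUPT };

struct model_thread {
    enum thread_kind kind;
    int nonsleepable_held;      /* non-sleepable locks currently held */
};

struct model_lock {
    const char *name;
    enum lock_class cls;
};

/* A thread may sleep only if it is not an interrupt thread and holds no
 * non-sleepable lock. A copyin/copyout that page-faults needs this. */
bool model_may_sleep(const struct model_thread *td)
{
    return td->kind != THREAD_INTERRUPT && td->nonsleepable_held == 0;
}

/* Returns true if the acquisition is allowed under the model's rules.
 * Acquiring a sleepable lock counts as a potential sleep. */
bool model_lock_acquire(struct model_thread *td, const struct model_lock *lk)
{
    if (lk->cls == LOCK_SLEEPABLE) {
        if (!model_may_sleep(td)) {
            fprintf(stderr, "violation: sleepable lock %s taken in a "
                "context that must not sleep\n", lk->name);
            return false;
        }
        return true;
    }
    td->nonsleepable_held++;
    return true;
}

/* Releasing a non-sleepable lock makes sleeping possible again. */
void model_lock_release(struct model_thread *td, const struct model_lock *lk)
{
    if (lk->cls == LOCK_NONSLEEPABLE && td->nonsleepable_held > 0)
        td->nonsleepable_held--;
}
```

In this model an interrupt thread is refused a sleepable lock, which
corresponds to the reported panic. An ordinary thread that takes a
non-sleepable lock makes model_may_sleep() return false until it releases the
lock, which corresponds to the copyin/copyout problem with the obvious fix.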

In practice, I wouldn't expect the deadlock to occur much, if at all, FWIW.
Users with common configurations are unlikely to run into a problem, so with
invariants disabled you may well be fine.

