Driver: weird locking/scheduling issue - Linux

  1. Driver: weird locking/scheduling issue

    Hello guys,

    I'm writing a driver and it fails unpredictably. This has happened since
    I switched to kernel 2.6.18 and never with an older kernel (tested with
    2.6.17). The driver works for a few hours, then the system freezes with a
    stack trace. The reasons below make me think it might be a locking issue (or
    a misused function):
    1) I saw the function try_to_wake_up() in the stack trace
    2) I once got this message on my terminal: "BUG: scheduling while atomic",
    coming from my userland process that uses the driver

    The driver code is split into two parts:
    1) a handler executed on my device's hardware interrupt
    2) a user-land application doing read() operations on my driver

    For clarity, I have pasted only the code that involves locking:

    syscall_read()
    {
        if (wait_event_interruptible(my_queue, condition))
            return -ERESTARTSYS;
        spin_lock_interrupt(&my_lock, flags);
        // read the linked-list (very fast)
        condition = 0;
        spin_unlock_interrupt(&my_lock, flags);
        copy_to_user( result_of_processing );
    }

    interrupt_handler()
    {
        preempt_disable(); // necessary ?

        spin_lock_irqsave(&my_lock, flags);
        // fill the linked-list (very fast)
        condition = 1;
        spin_unlock_irqrestore(&my_lock, flags);
        wake_up_interruptible(&my_queue);

        preempt_enable();
    }

    Another point: the code is running on an SMP system (2 processors) and the
    kernel is preemptible (CONFIG_PREEMPT). I also tested with a debug kernel
    (debug stack, memory allocation, locks, ...) but it fails after a few minutes
    with "stack-overflow: " and no other information.

    Do you see a major mistake?

    Thanks a lot for your help.

    --
    F.J.

  2. Re: Driver: weird locking/scheduling issue

    In article <4575d9f4$0$2196$426a74cc@news.free.fr>,
    F.Julien wrote:

    >interrupt_handler()
    >{
    > preempt_disable(); // necessary ?


    While this probably doesn't actually hurt anything I don't
    think you need to worry about being preempted while servicing
    an interrupt.

    >Another point, the code is running on a SMP system (2 processors) and the
    >kernel is preemptible (CONFIG_PREEMPT).


    So have you tried it with one cpu? Have you tried without CONFIG_PREEMPT?

    --
    http://www.spinics.net/lists/kernel/

  3. Re: Driver: weird locking/scheduling issue

    >> interrupt_handler()
    >> {
    >> preempt_disable(); // necessary ?

    >
    > While this probably doesn't actually hurt anything I don't
    > think you need to worry about being preempted while servicing
    > an interrupt.


    OK.
    I dumped the value of preempt_count() and it was > 65000. Does that make sense
    to you? If I understand correctly, preempt_disable() and preempt_enable()
    increment and decrement a reference counter. The value returned by
    preempt_count() seems to be very high.

    >
    >> Another point, the code is running on a SMP system (2 processors) and the
    >> kernel is preemptible (CONFIG_PREEMPT).

    >
    > So have you tried it with one cpu? Have you tried without CONFIG_PREEMPT?


    Yes I tried and it froze too.

    I switched to a lock-free algorithm (possible because I have one reader and
    one writer) and I no longer have any lock in my interrupt handler.

    But it did not fix my problem. Now I have only one lock (a semaphore), in the
    ioctl() syscall:

    syscall_ioctl()
    {
        switch( cmd )
        {
        case CMD1:
            if ( down_interruptible(&lock) )
                return -ERESTARTSYS;
            // do my stuff
            up(&lock);
            break;
        case CMD2:
            if ( down_interruptible(&lock) )
                return -ERESTARTSYS;
            // do my stuff
            up(&lock);
            break;
        }
    }

    It is the only lock I have in the driver and I only use it in the
    syscall_ioctl() function. I need to keep that lock because I access my
    hardware, which has to be protected against concurrent access.

    To freeze my system (and sometimes get a stack trace), I wrote a tiny
    application which calls open()/ioctl()/close() on my driver. After an hour
    running in a loop, the machine locks up. It happens faster if I get an
    interrupt from my hardware (obviously, I do not use the lock in the interrupt
    handler).

    I have absolutely no idea where the problem is coming from.
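
The post above does not show the lock-free algorithm it mentions, so the following is only a userland sketch of one common single-reader, single-writer design: a ring buffer in which the writer alone advances head and the reader alone advances tail. The names spsc_ring, ring_init(), ring_push() and ring_pop() and the capacity RING_SLOTS are all hypothetical, and C11 atomics stand in for the barrier primitives a kernel driver would use instead.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define RING_SLOTS 8u /* hypothetical capacity; must be a power of two */

struct spsc_ring {
    int slots[RING_SLOTS];
    atomic_uint head; /* next slot to write; stored only by the writer */
    atomic_uint tail; /* next slot to read; stored only by the reader */
};

static void ring_init(struct spsc_ring *r)
{
    atomic_init(&r->head, 0u);
    atomic_init(&r->tail, 0u);
}

/* Writer side (the interrupt handler in the thread's scenario). */
static bool ring_push(struct spsc_ring *r, int value)
{
    unsigned head = atomic_load_explicit(&r->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head - tail == RING_SLOTS)
        return false; /* full: the reader has not caught up yet */

    r->slots[head % RING_SLOTS] = value;
    /* Publish the slot only after its contents have been written. */
    atomic_store_explicit(&r->head, head + 1u, memory_order_release);
    return true;
}

/* Reader side (the read() path in the thread's scenario). */
static bool ring_pop(struct spsc_ring *r, int *value)
{
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (head == tail)
        return false; /* empty */

    *value = r->slots[tail % RING_SLOTS];
    /* Hand the slot back to the writer only after it has been read. */
    atomic_store_explicit(&r->tail, tail + 1u, memory_order_release);
    return true;
}
```

Because each index has exactly one writer, no lock is needed as long as there really is only one producer and one consumer; a second reader or writer would break that assumption.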

  4. Re: Driver: weird locking/scheduling issue

    In article <457af5cf$0$19731$426a74cc@news.free.fr>,
    F.Julien wrote:

    >I dumped the value of preempt_count() and it was > 65000. Does that make sense
    >to you ?


    Could it be a negative 16 bit number?

    >To freeze my system (and sometimes get a stack trace), I wrote a tiny
    >application which calls open()/ioctl()/close() on my driver. After an hour
    >running in a loop, the machine locks up. It happens faster if I get an
    >interrupt from my hardware (obviously, I do not use the lock in the interrupt
    >handler).
    >
    >I have absolutely no idea where the problem is coming from.


    It might be in code you haven't posted here.
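
Two readings of a value just above 65000 can be checked with a small userland sketch. The first is the one suggested above: the low 16 bits viewed as a signed number. The second assumes the counter layout of 2.6-era kernels, where the low 8 bits hold the preemption depth, the next 8 bits the softirq count, and the hardirq count starts at bit 16, so a value read inside an interrupt handler would already be at least 65536. The helper names and the assumed layout below are illustrative; the thread gives neither the exact number nor the place where it was read.

```c
/* Assumed field layout of preempt_count() (2.6-era kernels). */
#define PREEMPT_BITS  8
#define SOFTIRQ_BITS  8
#define SOFTIRQ_SHIFT PREEMPT_BITS
#define HARDIRQ_SHIFT (PREEMPT_BITS + SOFTIRQ_BITS)

struct count_fields {
    unsigned preempt; /* nesting depth of preempt_disable() */
    unsigned softirq; /* softirq nesting count */
    unsigned hardirq; /* hardirq nesting count */
};

/* Split a raw counter value into its three assumed fields. */
static struct count_fields decode_preempt_count(unsigned int count)
{
    struct count_fields f;

    f.preempt = count & ((1u << PREEMPT_BITS) - 1u);
    f.softirq = (count >> SOFTIRQ_SHIFT) & ((1u << SOFTIRQ_BITS) - 1u);
    f.hardirq = count >> HARDIRQ_SHIFT;
    return f;
}

/* Interpret the low 16 bits as a two's-complement signed number. */
static int low16_as_signed(unsigned int count)
{
    int low = (int)(count & 0xffffu);

    return (low & 0x8000) ? low - 0x10000 : low;
}
```

Under these assumptions, a hypothetical reading of 65537 (0x10001) decodes to one level of hardirq nesting plus a preemption depth of 1, while a hypothetical reading of 65535 would be -1 as a signed 16-bit number, which would point to an unbalanced preempt_enable().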


  5. Re: Driver: weird locking/scheduling issue

    >
    > syscall_read()
    > {
    >     if (wait_event_interruptible(my_queue, condition))
    >         return -ERESTARTSYS;
    >     spin_lock_interrupt(&my_lock, flags);
    >     // read the linked-list (very fast)
    >     condition = 0;
    >     spin_unlock_interrupt(&my_lock, flags);
    >     copy_to_user( result_of_processing );
    > }
    >
    > interrupt_handler()
    > {
    >     preempt_disable(); // necessary ?
    >
    >     spin_lock_irqsave(&my_lock, flags);
    >     // fill the linked-list (very fast)
    >     condition = 1;
    >     spin_unlock_irqrestore(&my_lock, flags);
    >     wake_up_interruptible(&my_queue);
    >
    >     preempt_enable();
    > }
    >

    I imagine you should use spin_lock_irqsave()/spin_unlock_irqrestore() in both
    places.

    Max


  6. Re: Driver: weird locking/scheduling issue

    In article <1166205182.219039.270530@t46g2000cwa.googlegroups.com>,
    Max Y wrote:

    >> syscall_read()
    >> {
    >>     if (wait_event_interruptible(my_queue, condition))
    >>         return -ERESTARTSYS;
    >>     spin_lock_interrupt(&my_lock, flags);
    >>     // read the linked-list (very fast)
    >>     condition = 0;
    >>     spin_unlock_interrupt(&my_lock, flags);
    >>     copy_to_user( result_of_processing );
    >> }
    >>
    >> interrupt_handler()
    >> {
    >>     preempt_disable(); // necessary ?
    >>
    >>     spin_lock_irqsave(&my_lock, flags);
    >>     // fill the linked-list (very fast)
    >>     condition = 1;
    >>     spin_unlock_irqrestore(&my_lock, flags);
    >>     wake_up_interruptible(&my_queue);
    >>
    >>     preempt_enable();
    >> }


    >I imagine you should use spin_lock_irqsave()/spin_unlock_irqrestore() in both
    >places.


    I suspect you are right. Grep'ing the entire drivers tree in a recent
    kernel I find zero spin_lock_interrupt calls.

    If the syscall_read() holds the spin_lock when the interrupt handler
    is invoked, the handler will spin forever trying to get the lock that'll
    never be released.

    --
    http://www.spinics.net/lists/kernel/

  7. Re: Driver: weird locking/scheduling issue

    >> I imagine you should use spin_lock_irqsave()/spin_unlock_irqrestore() in both
    >> places.

    >
    > I suspect you are right. Grep'ing the entire drivers tree in a recent
    > kernel I find zero spin_lock_interrupt calls.
    >
    > If the syscall_read() holds the spin_lock when the interrupt handler
    > is invoked, the handler will spin forever trying to get the lock that'll
    > never be released.


    Correct me if I'm wrong, spin_lock_irqsave() disables interrupts thus the
    interrupt handler would never be invoked.

    I finally found the issue: it's inside my syscall_ioctl(). The stack usage in
    that call was very high (~6500 bytes). I reduced it (replaced some on-stack
    arrays with kmalloc() allocations) and it now works fine with both the release
    and debug versions of the kernel.

    Thank you both for your help.
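
The fix described above, moving large buffers off the stack, can be sketched in userland C with malloc()/free() standing in for kmalloc()/kfree(). Kernel stacks of that era were typically only 4 KB or 8 KB per task on x86, so a single frame of about 6500 bytes leaves little or no headroom. The function name process_command(), the buffer size and the checksum work are hypothetical stand-ins for the ioctl code that was never posted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SCRATCH_BYTES 6500 /* roughly the frame size reported in the thread */

/*
 * Before the fix, the buffer would have been a local array:
 *     unsigned char scratch[SCRATCH_BYTES];
 * which puts all 6500 bytes on the caller's stack.
 *
 * Returns 0 on success, -1 if the allocation fails
 * (a driver would return -ENOMEM instead).
 */
static int process_command(unsigned char seed, unsigned long *checksum)
{
    unsigned char *scratch;
    size_t i;

    assert(checksum != NULL);

    /* In a driver: scratch = kmalloc(SCRATCH_BYTES, GFP_KERNEL); */
    scratch = malloc(SCRATCH_BYTES);
    if (!scratch)
        return -1;

    /* Placeholder work that needs the whole buffer. */
    memset(scratch, seed, SCRATCH_BYTES);
    *checksum = 0;
    for (i = 0; i < SCRATCH_BYTES; i++)
        *checksum += scratch[i];

    /* In a driver: kfree(scratch); */
    free(scratch);
    return 0;
}
```

Only a pointer and a loop index remain on the stack; the price is an allocation that can fail and must be freed on every exit path.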

  8. Re: Driver: weird locking/scheduling issue

    In article <45832157$0$9735$426a34cc@news.free.fr>,
    F.Julien wrote:

    >Correct me if I'm wrong, spin_lock_irqsave() disables interrupts thus the
    >interrupt handler would never be invoked.


    1. Your syscall_read() uses spin_lock_interrupt, not spin_lock_irqsave.

    2. spin_lock_irqsave() only disables interrupts on the current CPU (which is
    usually OK).


  9. Re: Driver: weird locking/scheduling issue


    F.Julien wrote:
    > >> I imagine you should use spin_lock_irqsave()/spin_unlock_irqrestore() in both
    > >> places.

    > >
    > > I suspect you are right. Grep'ing the entire drivers tree in a recent
    > > kernel I find zero spin_lock_interrupt calls.
    > >
    > > If the syscall_read() holds the spin_lock when the interrupt handler
    > > is invoked, the handler will spin forever trying to get the lock that'll
    > > never be released.

    >
    > Correct me if I'm wrong, spin_lock_irqsave() disables interrupts thus the
    > interrupt handler would never be invoked.


    It disables interrupts and saves the interrupt flags; the matching
    spin_unlock_irqrestore() restores the flags (re-enabling interrupts if they
    were enabled before). This introduces interrupt latency, since your ISR only
    runs after the lock is released, but it does not block forever. That's how you
    use a spinlock to protect a resource shared between an ISR and other parties.
    The other variant, spin_lock_irq() (I think this may be your
    spin_lock_interrupt?), does pretty much the same thing without saving the
    flags, so it should only be used when you are sure you are the sole party
    manipulating the interrupt state in question.

    Max
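
The difference between the two lock variants discussed in this thread can be illustrated with a toy userland model. It is not kernel code: a single boolean stands in for the CPU's interrupt-enable state, and the model_* functions are hypothetical stand-ins that copy only the flag handling of spin_lock_irqsave()/spin_unlock_irqrestore() and spin_lock_irq()/spin_unlock_irq(), not the locking itself.

```c
#include <stdbool.h>

/* Toy model: one flag stands in for the CPU's interrupt-enable bit. */
static bool irqs_enabled = true;

/* Models spin_lock_irqsave(): remember the previous state, then disable. */
static bool model_lock_irqsave(void)
{
    bool saved = irqs_enabled;

    irqs_enabled = false;
    return saved;
}

/* Models spin_unlock_irqrestore(): put back whatever state was saved. */
static void model_unlock_irqrestore(bool saved)
{
    irqs_enabled = saved;
}

/* Models spin_lock_irq(): disable without remembering anything. */
static void model_lock_irq(void)
{
    irqs_enabled = false;
}

/* Models spin_unlock_irq(): enable unconditionally. */
static void model_unlock_irq(void)
{
    irqs_enabled = true;
}

/* Runs a critical section entered with interrupts already off and
 * reports whether they are still off afterwards (save/restore pair). */
static bool still_disabled_after_irqsave_pair(void)
{
    bool flags;

    irqs_enabled = false; /* the caller had already disabled interrupts */
    flags = model_lock_irqsave();
    model_unlock_irqrestore(flags);
    return !irqs_enabled;
}

/* Same scenario with the unconditional disable/enable pair. */
static bool still_disabled_after_irq_pair(void)
{
    irqs_enabled = false; /* the caller had already disabled interrupts */
    model_lock_irq();
    model_unlock_irq();
    return !irqs_enabled;
}
```

In this model the save/restore pair leaves the caller's state untouched, while the unconditional pair turns interrupts back on behind the caller's back, which is why the irqsave variant is the safer default when the calling context is not known.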

