Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes - Kernel

This is a discussion on Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes - Kernel ; On Fri, 19 Oct 2007, Jarek Poplawski wrote: > But then... your patch seems to make it possible, because it enables > irq to the initial state of the counter. Of course, this could happen > on closing only. That's ...

+ Reply to Thread
Page 2 of 2 FirstFirst 1 2
Results 21 to 28 of 28

Thread: Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

  1. Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

    On Fri, 19 Oct 2007, Jarek Poplawski wrote:

    > But then... your patch seems to make it possible, because it enables
    > irq to the initial state of the counter. Of course, this could happen
    > on closing only.


    That's because free_irq() does not disable the interrupt in the correct
    manner. The scenario is more or less like this:

    phy_interrupt() [depth == 0]
    disable_irq()
    depth++; status |= IRQ_DISABLED;
    ....
    free_irq() [depth == 1]
    status |= IRQ_DISABLED;
    ....
    phy_change() [depth == 1]
    enable_irq()
    depth--; status &= ~IRQ_DISABLED;
    oops!

    Now if free_irq() correctly incremented the depth counter, then the last
    enable_irq() would still decrement it, but with its initial value of 2 it
    would not change the status to reenable the line.

    Maciej
    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

    On Fri, Oct 19, 2007 at 12:38:29PM +0100, Maciej W. Rozycki wrote:
    > On Thu, 18 Oct 2007, Maciej W. Rozycki wrote:
    >
    > > > 1) phy_change() checks PHY_HALTED flag without lock; I think it's
    > > > racy: eg. if it's done during phy_stop() it can check just before
    > > > the flag is set and reenable interrupts just after phy_stop() ends.

    > >
    > > I remember having a look into it, but it was long ago and I cannot
    > > immediately recall the conclusion. Which means it is either broken or
    > > deserves a comment as non-obvious. I will have a look into it again, but
    > > I am resource-starved a little at the moment, sorry.

    >
    > Well, I have now recalled what the issue is -- we just plainly and simply
    > want to avoid a hardirq spinlock for the very reason we do not do all the
    > processing in the hardirq handler. The thing is we make accesses to the
    > MDIO bus with the phydev lock held and it may take ages until these
    > accesses will have completed. And we cannot afford keeping interrupts
    > disabled for so long.
    >
    > So the only way is to make the check for the HALTED state lockless and
    > make sure any race condition is handled gracefully and does not lead to
    > inconsistent behaviour. Which I think as of what we have in the
    > net-2.6.24 tree is the case, but there are never too many eyes to look at
    > a piece of code, so if anybody feels like proving me wrong, then just go
    > ahead!


    Actually I'm not convinced with this explanation. It seems to me that
    since there are such serious locking problems (especially with rntl),
    there could be once more considered a private workqueue. You've
    written earlier about being a lonely user of this code. But, since
    Benjamin offered his help with changing to mutexes, which looks like
    very reasonable idea to me (probably I miss most of the points...),
    maybe it's very good opportunity to both: make this code better and
    double the user base! I'm interested in looking for such solution
    if Benjamin thinks there could be too few problems for him... So,
    let somebody tell us what could be wrong with this idea?

    Cheers (till Monday),
    Jarek P.
    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  3. Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

    On Fri, 19 Oct 2007, Jarek Poplawski wrote:

    > Actually I'm not convinced with this explanation. It seems to me that
    > since there are such serious locking problems (especially with rntl),
    > there could be once more considered a private workqueue. You've
    > written earlier about being a lonely user of this code. But, since


    Well, this will change eventually when I submit the patch for Broadcom
    SiByte platforms so that sb1250-mac.c will be able to request an interrupt
    for the PHYs. All the infrastructure is in place except from platform
    code to configure the SOC's GPIO line used for the interrupt input (when
    applicable) and let the driver know what the line actually is.

    > Benjamin offered his help with changing to mutexes, which looks like
    > very reasonable idea to me (probably I miss most of the points...),
    > maybe it's very good opportunity to both: make this code better and
    > double the user base! I'm interested in looking for such solution
    > if Benjamin thinks there could be too few problems for him... So,
    > let somebody tell us what could be wrong with this idea?


    I do not object and can certainly cooperate, but cannot make it a high
    priority work for me at the moment -- sorry.

    Maciej
    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  4. Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes


    > Actually I'm not convinced with this explanation. It seems to me that
    > since there are such serious locking problems (especially with rntl),
    > there could be once more considered a private workqueue. You've
    > written earlier about being a lonely user of this code. But, since
    > Benjamin offered his help with changing to mutexes, which looks like
    > very reasonable idea to me (probably I miss most of the points...),
    > maybe it's very good opportunity to both: make this code better and
    > double the user base! I'm interested in looking for such solution
    > if Benjamin thinks there could be too few problems for him... So,
    > let somebody tell us what could be wrong with this idea?


    My main problem is time :-) But I need to do the change to mutex if I am
    to use phylib for emac. I can't tell when I'll have time to do it tho.

    Ben.

    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  5. Re: [PATCH] flush_work_sync vs. flush_scheduled_work Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

    On Fri, Oct 19, 2007 at 09:50:14AM +0200, Jarek Poplawski wrote:
    > On Thu, Oct 18, 2007 at 07:48:19PM +0400, Oleg Nesterov wrote:
    > > On 10/18, Jarek Poplawski wrote:
    > > >
    > > > +/**
    > > > + * flush_work_sync - block until a work_struct's callback has terminated

    > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^
    > > Hmm...
    > >
    > > > + * Similar to cancel_work_sync() but will only busy wait (without cancel)
    > > > + * if the work is queued.

    > >
    > > Yes, it won't block, but will spin in busy-wait loop until all other works
    > > scheduled before this work are finished. Not good. After that it really
    > > blocks waiting for this work to complete.
    > >
    > > And I am a bit confused. We can't use flush_workqueue() because some of the
    > > queued work_structs may take rtnl_lock, yes? But in that case we can't use
    > > the new flush_work_sync() helper as well, no?


    OK, I know I'm dumber and dumber everyday, but it seems in a hurry I
    got it wrong again or miss something (as usual): these all flushes are
    rtnl lockup vulnerable wrt. other work functions, but cancel_work_sync
    looks perfectly fine... (Or am I wrong because: ...?)

    Then, if by any chance I'm right, something like flush_work_sync
    (or changed flush_scheduled_work, if there is no problem with such
    a change of implementation) could be safely (if it's called without
    locks used by flushed work only) done cancel_work_sync() way, by
    running a work function after try_to_grab_pending() returns 1 (after
    list_del_init - of course without respecting a queue order).

    Regards,
    Jarek P.
    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  6. Re: [PATCH] flush_work_sync vs. flush_scheduled_work Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

    On 10/22, Jarek Poplawski wrote:
    >
    > On Fri, Oct 19, 2007 at 09:50:14AM +0200, Jarek Poplawski wrote:
    > > On Thu, Oct 18, 2007 at 07:48:19PM +0400, Oleg Nesterov wrote:
    > > > On 10/18, Jarek Poplawski wrote:
    > > > >
    > > > > +/**
    > > > > + * flush_work_sync - block until a work_struct's callback has terminated
    > > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^
    > > > Hmm...
    > > >
    > > > > + * Similar to cancel_work_sync() but will only busy wait (without cancel)
    > > > > + * if the work is queued.
    > > >
    > > > Yes, it won't block, but will spin in busy-wait loop until all other works
    > > > scheduled before this work are finished. Not good. After that it really
    > > > blocks waiting for this work to complete.
    > > >
    > > > And I am a bit confused. We can't use flush_workqueue() because some of the
    > > > queued work_structs may take rtnl_lock, yes? But in that case we can't use
    > > > the new flush_work_sync() helper as well, no?

    >
    > OK, I know I'm dumber and dumber everyday,


    You are not alone. I have the same feeling about myself!

    > these all flushes are
    > rtnl lockup vulnerable wrt. other work functions, but cancel_work_sync
    > looks perfectly fine


    Yes,

    > Then, if by any chance I'm right, something like flush_work_sync
    > (or changed flush_scheduled_work, if there is no problem with such
    > a change of implementation) could be safely (if it's called without
    > locks used by flushed work only) done cancel_work_sync() way, by
    > running a work function after try_to_grab_pending() returns 1


    If this work doesn't rearm itself - yes. (otherwise, the same ->func
    can run twice _at the same time_)

    But again, in this case wait_on_work() after try_to_grab_pending() == 1
    doesn't block, so we can just do

    if (cancel_work_sync(w))
    w->func();

    Oleg.

    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  7. Re: [PATCH] flush_work_sync vs. flush_scheduled_work Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

    On Mon, Oct 22, 2007 at 10:02:59PM +0400, Oleg Nesterov wrote:
    > On 10/22, Jarek Poplawski wrote:

    ....
    > > OK, I know I'm dumber and dumber everyday,

    >
    > You are not alone. I have the same feeling about myself!


    Feeling is not the same, only true knowledge counts!

    >
    > > these all flushes are
    > > rtnl lockup vulnerable wrt. other work functions, but cancel_work_sync
    > > looks perfectly fine

    >
    > Yes,
    >
    > > Then, if by any chance I'm right, something like flush_work_sync
    > > (or changed flush_scheduled_work, if there is no problem with such
    > > a change of implementation) could be safely (if it's called without
    > > locks used by flushed work only) done cancel_work_sync() way, by
    > > running a work function after try_to_grab_pending() returns 1

    >
    > If this work doesn't rearm itself - yes. (otherwise, the same ->func
    > can run twice _at the same time_)
    >
    > But again, in this case wait_on_work() after try_to_grab_pending() == 1
    > doesn't block, so we can just do
    >
    > if (cancel_work_sync(w))
    > w->func();


    Of course, I should have written it much shorter by saying
    flush_scheduled_work could be done internally just like you suggested!

    My point is to make this all safer and simpler, so we don't have to
    remember each time about these differences wrt. locking. Then checking
    of such patches could be much easier. Unless this current behavior
    of flush_scheduled_work has any real advantages, worth of this
    potential unsafety. (Btw. is there any reason to use this with
    rearming works, anyway?)

    Thanks,
    Jarek P.
    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  8. Re: [PATCH] flush_work_sync vs. flush_scheduled_work Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

    On Mon, Oct 22, 2007 at 10:02:59PM +0400, Oleg Nesterov wrote:
    ....
    > If this work doesn't rearm itself - yes. (otherwise, the same ->func
    > can run twice _at the same time_)
    >
    > But again, in this case wait_on_work() after try_to_grab_pending() == 1
    > doesn't block, so we can just do
    >
    > if (cancel_work_sync(w))
    > w->func();
    >


    ....but, if it were run just before work_clear_pending()?

    Jarek P.
    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

+ Reply to Thread
Page 2 of 2 FirstFirst 1 2