Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes - Kernel

This is a discussion on Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes - Kernel ; On 19-09-2007 16:38, Maciej W. Rozycki wrote: .... > @@ -661,13 +664,22 @@ int phy_stop_interrupts(struct phy_devic > if (err) > phy_error(phydev); > > + free_irq(phydev->irq, phydev); > + > /* > - * Finish any pending work; we might have ...

+ Reply to Thread
Page 1 of 2 1 2 LastLast
Results 1 to 20 of 28

Thread: Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

  1. Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

    On 19-09-2007 16:38, Maciej W. Rozycki wrote:
    ....
    > @@ -661,13 +664,22 @@ int phy_stop_interrupts(struct phy_devic
    > if (err)
    > phy_error(phydev);
    >
    > + free_irq(phydev->irq, phydev);
    > +
    > /*
    > - * Finish any pending work; we might have been scheduled to be called
    > - * from keventd ourselves, but cancel_work_sync() handles that.
    > + * Cannot call flush_scheduled_work() here as desired because
    > + * of rtnl_lock(), but we do not really care about what would
    > + * be done, except from enable_irq(), so cancel any work
    > + * possibly pending and take care of the matter below.
    > */
    > cancel_work_sync(&phydev->phy_queue);


    Hi,

    Could you explain why cancel_work_sync() is better here than
    flush_scheduled_work() wrt. rtnl_lock()?

    Regards,
    Jarek P.
    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

    On Mon, 15 Oct 2007, Jarek Poplawski wrote:

    > Could you explain why cancel_work_sync() is better here than
    > flush_scheduled_work() wrt. rtnl_lock()?


    Well, this is actually the bit that made cancel_work_sync() be written in
    the first place. The short story is the netlink lock is most probably
    held at this point (depending on the usage of phy_disconnect()) and there
    is also an event waiting in the queue that requires the lock, so if
    flush_scheduled_work() is called here a deadlock will happen.

    Let me find a reference for a longer story...:

    http://www.linux-mips.org/cgi-bin/me...k.ds.pg.gda.pl

    and then discussed again:

    http://www.uwsg.indiana.edu/hypermai...12.0/0593.html

    Maciej
    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  3. Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

    On Mon, Oct 15, 2007 at 06:03:20PM +0100, Maciej W. Rozycki wrote:
    > On Mon, 15 Oct 2007, Jarek Poplawski wrote:
    >
    > > Could you explain why cancel_work_sync() is better here than
    > > flush_scheduled_work() wrt. rtnl_lock()?

    >
    > Well, this is actually the bit that made cancel_work_sync() be written in
    > the first place. The short story is the netlink lock is most probably
    > held at this point (depending on the usage of phy_disconnect()) and there
    > is also an event waiting in the queue that requires the lock, so if
    > flush_scheduled_work() is called here a deadlock will happen.
    >
    > Let me find a reference for a longer story...:
    >
    > http://www.linux-mips.org/cgi-bin/me...k.ds.pg.gda.pl
    >
    > and then discussed again:
    >
    > http://www.uwsg.indiana.edu/hypermai...12.0/0593.html
    >


    Yes, it's all right here. Sorry for bothering - I should've found this
    by myself.

    I've still some doubts about this possible enable_irq() after
    free_irq(). If it's the only handler the status would be changed again
    and at least some of this code in check_irq_resend() would be run, but
    I can miss something again or/and this doesn't matter, as well.

    Thanks,
    Jarek P.
    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  4. Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

    On Tue, 16 Oct 2007, Jarek Poplawski wrote:

    > Yes, it's all right here. Sorry for bothering - I should've found this
    > by myself.


    Ah, no problem -- even with the right keys you may sometimes get lost in
    the results.

    > I've still some doubts about this possible enable_irq() after
    > free_irq(). If it's the only handler the status would be changed again
    > and at least some of this code in check_irq_resend() would be run, but
    > I can miss something again or/and this doesn't matter, as well.


    Well, enable_irq() and disable_irq() themselves are nesting, so they are
    not a problem. OTOH, free_irq() does not seem to maintain the depth count
    correctly, which looks like a bug to me and which could trigger regardless
    of whether flush_scheduled_work() or cancel_work_sync() was called.

    The reason is CONFIG_DEBUG_SHIRQ which makes a simulated interrupt event
    be sent at the end of free_irq(). It looks like a problem that is
    complementary to one I signalled here:

    http://lkml.org/lkml/2007/9/12/82

    with respect to request_irq(), where, similarly, such an interrupt event
    is sent at the beginning. It looks like nobody was concerned back then,
    but perhaps it is time to do a better investigation now and propose a
    solution.

    I'll think about it and thanks for your inquisitiveness that has led to
    these conclusions.

    Maciej
    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  5. Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

    On Tue, Oct 16, 2007 at 06:19:32PM +0100, Maciej W. Rozycki wrote:
    ....
    > Well, enable_irq() and disable_irq() themselves are nesting, so they are
    > not a problem. OTOH, free_irq() does not seem to maintain the depth count
    > correctly, which looks like a bug to me and which could trigger regardless
    > of whether flush_scheduled_work() or cancel_work_sync() was called.


    I'm not sure free_irq() should maintain the depth count - rather warn
    on not zero. But, IMHO, any activity on freed irq seems suspicious to
    me (and doesn't look like very common), even if it's safe with current
    implementation.

    > The reason is CONFIG_DEBUG_SHIRQ which makes a simulated interrupt event
    > be sent at the end of free_irq(). It looks like a problem that is
    > complementary to one I signalled here:
    >
    > http://lkml.org/lkml/2007/9/12/82
    >
    > with respect to request_irq(), where, similarly, such an interrupt event
    > is sent at the beginning. It looks like nobody was concerned back then,
    > but perhaps it is time to do a better investigation now and propose a
    > solution.


    Yes, these DEBUG_SHIRQ checks are suspicious to me too, but they seem
    to be reasonable only in the case of possible resent irqs (so not for
    all irqs). On the other hand, it seems, proper irq handler with proper
    hardware shouldn't have any problems with such a check.

    > I'll think about it and thanks for your inquisitiveness that has led to
    > these conclusions.


    Btw., since, because of this patch, I've had a one more look at phy.c
    and there are a few more doubts, so this time I'll try to bother you
    for real:

    1) phy_change() checks PHY_HALTED flag without lock; I think it's
    racy: eg. if it's done during phy_stop() it can check just before
    the flag is set and reenable interrupts just after phy_stop() ends.

    2) phy_change() doesn't reenable irq line after it sees returns
    with errors; IMHO it should at least write some warning, but maybe
    try some safety plan, so enable_irq() and try to disable interrupts
    and free_irq() on the next call (if it happens). (But, I can be very
    wrong with this - maybe it's OK and official way.)

    3) phy_interrupt() checks PHY_HALTED flag without lock too, but I'm
    not sure now if it could be dangerous after fixing #1; on the other
    hand even if we know it's not our regular interrupt, with current
    DEBUG_SHIRQ it could be easier to call schedule_work() anyway since
    we are sure it's before/in free_irq() yet.

    4) phy_interrupt() should check return value from schedule_work() and
    enable irq on 0.

    5) phy_stop_interrupts(): maybe I miss something, but it seems
    phy_stop() is required before this, so maybe there should be a
    comment on this?

    6) phy_stop_interrupts(): if I'm not wrong with #3 calling
    phy_disable_interrupts() looks like we are not sure this phy_stop()
    really works; than maybe a WARN_ON?

    7) phy_stop_interrupts(): after above mentioned changes in
    phy_interrupt(), and phy_changes() (always enable_irq()) I can't see
    any reason why there should be more than one skipped enable_irq(),
    so checking return from cancel_work_sync() shouldn't be enough
    instead of this atomic counter.

    8) phy_stop_interrupts(): I'm not sure this additional call from
    DEBUG_SHIRQ should be so dangerous, eg.:

    /*
    * status == PHY_HALTED &&
    * interrupts are stopped after phy_stop()
    */
    if (cancel_work_sync(...))
    enable_irq();

    free_irq(...);
    /*
    * possible schedule_work() from DEBUG_SHIRQ only,
    * but proper check for PHY_HALTED is done;
    * so, let's flush after this too:
    */
    cancel_work_sync();


    Of course, I don't know phy.c enough, so most of this can be terribly
    wrong, then feel free to forget about this - I don't expect you should
    waste any time for explaining me these things - after all they are
    doubts only.

    Regards,
    Jarek P.
    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  6. Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes


    > Btw., since, because of this patch, I've had a one more look at phy.c
    > and there are a few more doubts, so this time I'll try to bother you
    > for real:


    .../...

    While there... is somebody interested in making the whole PHY lib
    operate a task level and use mutexes instead of spinlock ? I need that
    for drivers like EMAC (who use their own PHY layer for now), and I might
    even give a go at adapting phylib myself, but I'd like to take the
    temperature about it first.

    Basically, there is nothing in phylib that is performance critical or
    such that requires it to run at irq time and/or use locks. On the other
    hand, it complicates things in various areas. The most obvious one being
    that it prevents the network driver mii access callbacks from sleeping
    which can be annoying as MDIO can be slow, and some drivers have fancy
    muxes in there that are better off mutexed than spinlocked.

    Ben.


    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  7. Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

    On Wed, Oct 17, 2007 at 10:58:09AM +0200, Jarek Poplawski wrote:
    ....
    > 5) phy_stop_interrupts(): maybe I miss something, but it seems
    > phy_stop() is required before this, so maybe there should be a
    > comment on this?
    >
    > 6) phy_stop_interrupts(): if I'm not wrong with #3 calling


    Should be:
    6) phy_stop_interrupts(): if I'm not wrong with #5 calling

    Jarek P.
    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  8. Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

    On Wed, Oct 17, 2007 at 10:58:09AM +0200, Jarek Poplawski wrote:
    ....
    > 8) phy_stop_interrupts(): I'm not sure this additional call from
    > DEBUG_SHIRQ should be so dangerous, eg.:
    >
    > /*
    > * status == PHY_HALTED &&
    > * interrupts are stopped after phy_stop()
    > */
    > if (cancel_work_sync(...))
    > enable_irq();
    >
    > free_irq(...);
    > /*
    > * possible schedule_work() from DEBUG_SHIRQ only,
    > * but proper check for PHY_HALTED is done;
    > * so, let's flush after this too:
    > */
    > cancel_work_sync();


    After rethinking, it looks like this last cancel should be useless.
    So, if phy_interrupt() schedules only if !PHY_HALTED and phy_change()
    does enable_irq() with no exeptions, it seems phy_interrupt() even
    without lock must see PHY_HALTED state before this free_irq() with
    possible DEBUG_SHIRQ call, then maybe only this safety:

    WARN_ON(work_pending(&phydev->phy_queue));


    Btw, I've read this was considered and not liked, but IMHO, if this
    really has to be like this, creating phy's own workqueue seems to be
    resonable, at the very least to reduce latencies to other users of
    this irq.

    Jarek P.
    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  9. [PATCH] flush_work_sync vs. flush_scheduled_work Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

    After reading this and earlier threads about phylib's way of using
    workqueue I think such a lighter and safer wrt. locking alternative
    for flush_scheduled_work should be useful, but maybe it's only my
    imagination.

    So, let's ask Oleg Nesterov, whose solutions are here only
    copy-cut-pasted & possibly abused by myslef.

    --------->
    Subject: flush_work_sync as an alternative for flush_scheduled_work

    Similar to cancel_work_sync() but will only busy wait & block
    (without cancel).

    Signed-off-by: Jarek Poplawski

    ---

    include/linux/workqueue.h | 1 +
    kernel/workqueue.c | 24 ++++++++++++++++++++++++
    2 files changed, 25 insertions(+)

    diff -Nurp 2.6.23-mm1-/include/linux/workqueue.h 2.6.23-mm1/include/linux/workqueue.h
    --- 2.6.23-mm1-/include/linux/workqueue.h 2007-10-12 23:45:24.000000000 +0200
    +++ 2.6.23-mm1/include/linux/workqueue.h 2007-10-17 20:55:26.000000000 +0200
    @@ -192,6 +192,7 @@ extern void init_workqueues(void);
    int execute_in_process_context(work_func_t fn, struct execute_work *);

    extern int cancel_work_sync(struct work_struct *work);
    +extern void flush_work_sync(struct work_struct *work);

    /*
    * Kill off a pending schedule_delayed_work(). Note that the work callback
    diff -Nurp 2.6.23-mm1-/kernel/workqueue.c 2.6.23-mm1/kernel/workqueue.c
    --- 2.6.23-mm1-/kernel/workqueue.c 2007-10-12 23:45:25.000000000 +0200
    +++ 2.6.23-mm1/kernel/workqueue.c 2007-10-17 20:54:03.000000000 +0200
    @@ -539,6 +539,30 @@ int cancel_delayed_work_sync(struct dela
    }
    EXPORT_SYMBOL(cancel_delayed_work_sync);

    +/**
    + * flush_work_sync - block until a work_struct's callback has terminated
    + * @work: the work which is to be flushed
    + *
    + * Similar to cancel_work_sync() but will only busy wait (without cancel)
    + * if the work is queued. If the work's callback appears to be running,
    + * flush_work_sync() will block until it has completed (but doesn't block
    + * while other callbacks are running, like flush_scheduled_work() does).
    + *
    + * It is not allowed to use this function if the work re-queues itself.
    + */
    +void flush_work_sync(struct work_struct *work)
    +{
    + int ret;
    +
    + do {
    + ret = work_pending(work);
    + wait_on_work(work);
    + if (ret)
    + cpu_relax();
    + } while (ret);
    +}
    +EXPORT_SYMBOL(flush_work_sync);
    +
    static struct workqueue_struct *keventd_wq __read_mostly;

    /**
    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  10. Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

    On Thu, 18 Oct 2007, Jarek Poplawski wrote:

    > After rethinking, it looks like this last cancel should be useless.
    > So, if phy_interrupt() schedules only if !PHY_HALTED and phy_change()
    > does enable_irq() with no exeptions, it seems phy_interrupt() even
    > without lock must see PHY_HALTED state before this free_irq() with
    > possible DEBUG_SHIRQ call, then maybe only this safety:
    >
    > WARN_ON(work_pending(&phydev->phy_queue));


    Good point.

    Maciej
    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  11. Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

    On Wed, 17 Oct 2007, Jarek Poplawski wrote:

    > I'm not sure free_irq() should maintain the depth count - rather warn
    > on not zero. But, IMHO, any activity on freed irq seems suspicious to
    > me (and doesn't look like very common), even if it's safe with current
    > implementation.


    No way to avoid it with DEBUG_SHIRQ.

    > Yes, these DEBUG_SHIRQ checks are suspicious to me too, but they seem
    > to be reasonable only in the case of possible resent irqs (so not for
    > all irqs). On the other hand, it seems, proper irq handler with proper
    > hardware shouldn't have any problems with such a check.


    What do you mean by "proper irq handler with proper hardware"? Using
    softirqs (they used to be called bottom-halves) is actually a natural way
    of handling any interrupt which requires extensive processing.

    > 1) phy_change() checks PHY_HALTED flag without lock; I think it's
    > racy: eg. if it's done during phy_stop() it can check just before
    > the flag is set and reenable interrupts just after phy_stop() ends.


    I remember having a look into it, but it was long ago and I cannot
    immediately recall the conclusion. Which means it is either broken or
    deserves a comment as non-obvious. I will have a look into it again, but
    I am resource-starved a little at the moment, sorry.

    > 2) phy_change() doesn't reenable irq line after it sees returns
    > with errors; IMHO it should at least write some warning, but maybe
    > try some safety plan, so enable_irq() and try to disable interrupts
    > and free_irq() on the next call (if it happens). (But, I can be very
    > wrong with this - maybe it's OK and official way.)


    No way to do this safely -- at this point the device probably still has
    its interrupt output asserted and the register to clear it is
    inaccessible, so enabling the line will enter an infinite loop. At this
    point the system is no longer stable, so it is better to keep at least
    some functionality, so that it may be attempted to be shut down cleanly,
    rather than make it completely irresponsive. The alternative is panic().

    > 3) phy_interrupt() checks PHY_HALTED flag without lock too, but I'm
    > not sure now if it could be dangerous after fixing #1; on the other
    > hand even if we know it's not our regular interrupt, with current
    > DEBUG_SHIRQ it could be easier to call schedule_work() anyway since
    > we are sure it's before/in free_irq() yet.


    See #1 above.

    > 4) phy_interrupt() should check return value from schedule_work() and
    > enable irq on 0.


    No -- the work already pending will do that.

    > 5) phy_stop_interrupts(): maybe I miss something, but it seems
    > phy_stop() is required before this, so maybe there should be a
    > comment on this?


    The API is documented in: Documentation/networking/phy.txt -- you are
    welcome to improve. If you do not want to get into the gory details, just
    use the cooked interface and phy_disconnect() will do the dirty work for
    you.

    > 6) phy_stop_interrupts(): if I'm not wrong with #3 calling
    > phy_disable_interrupts() looks like we are not sure this phy_stop()
    > really works; than maybe a WARN_ON?


    WARN_ON what?

    > 7) phy_stop_interrupts(): after above mentioned changes in
    > phy_interrupt(), and phy_changes() (always enable_irq()) I can't see
    > any reason why there should be more than one skipped enable_irq(),
    > so checking return from cancel_work_sync() shouldn't be enough
    > instead of this atomic counter.


    CONFIG_DEBUG_SHIRQ. Barring this option, cancel_work_sync() could have
    been moved to the front of free_irq(). I think I have seen this in
    reality, with the interrupt line left stuck disabled afterwards, but I
    will double check when I have an opportunity. The approach implemented
    with this patch does work, which is the important bit, and if
    simplification is possible, then it may be applied later.

    > 8) phy_stop_interrupts(): I'm not sure this additional call from
    > DEBUG_SHIRQ should be so dangerous, eg.:
    >
    > /*
    > * status == PHY_HALTED &&
    > * interrupts are stopped after phy_stop()
    > */
    > if (cancel_work_sync(...))
    > enable_irq();
    >
    > free_irq(...);
    > /*
    > * possible schedule_work() from DEBUG_SHIRQ only,
    > * but proper check for PHY_HALTED is done;
    > * so, let's flush after this too:
    > */
    > cancel_work_sync();


    Well, if there is another handler registered on this line, you'll get
    your interrupt line stuck disabled.

    > Of course, I don't know phy.c enough, so most of this can be terribly
    > wrong, then feel free to forget about this - I don't expect you should
    > waste any time for explaining me these things - after all they are
    > doubts only.


    I am not sure I know phy.c well enough either, ;-) and your concerns are
    appreciated as interesting conclusions may develop. If someone disagrees
    with what I have written here, they are welcome to speak out too.

    Maciej
    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  12. Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

    On Thu, Oct 18, 2007 at 12:30:35PM +0100, Maciej W. Rozycki wrote:
    > On Wed, 17 Oct 2007, Jarek Poplawski wrote:
    >
    > > I'm not sure free_irq() should maintain the depth count - rather warn
    > > on not zero. But, IMHO, any activity on freed irq seems suspicious to
    > > me (and doesn't look like very common), even if it's safe with current
    > > implementation.

    >
    > No way to avoid it with DEBUG_SHIRQ.
    >
    > > Yes, these DEBUG_SHIRQ checks are suspicious to me too, but they seem
    > > to be reasonable only in the case of possible resent irqs (so not for
    > > all irqs). On the other hand, it seems, proper irq handler with proper
    > > hardware shouldn't have any problems with such a check.

    >
    > What do you mean by "proper irq handler with proper hardware"? Using
    > softirqs (they used to be called bottom-halves) is actually a natural way
    > of handling any interrupt which requires extensive processing.


    Technically until free_irq returns a handler should respond to calls
    and with proper hardware it should have no problem with checking if
    it's its interrupt, even after disabling this hardware, because of
    possible races.

    But with a scenario like this:

    - disable_irq()
    - disable_my_hadrware_irq()
    - clear_my_hardware_irq()
    - free_irq()
    - enable_irq()

    it seems the handler should respond even after free_irq because there
    could be still interrupts to resend, originally generated by its
    hardware, so such behavior looks very suspicious, at least with some
    type of interrupts.

    So, I think, the idea of DEBUG_SHIRQ is generally right and very
    useful - but, of course, there could be exceptions, which btw. could
    try some hacks under DEBUG_SHIRQ too. And my opinion about
    'properness' was very general (not about phy) too, just like my
    'knowledge' of drivers.

    >
    > > 1) phy_change() checks PHY_HALTED flag without lock; I think it's
    > > racy: eg. if it's done during phy_stop() it can check just before
    > > the flag is set and reenable interrupts just after phy_stop() ends.

    >
    > I remember having a look into it, but it was long ago and I cannot
    > immediately recall the conclusion. Which means it is either broken or
    > deserves a comment as non-obvious. I will have a look into it again, but
    > I am resource-starved a little at the moment, sorry.
    >
    > > 2) phy_change() doesn't reenable irq line after it sees returns
    > > with errors; IMHO it should at least write some warning, but maybe
    > > try some safety plan, so enable_irq() and try to disable interrupts
    > > and free_irq() on the next call (if it happens). (But, I can be very
    > > wrong with this - maybe it's OK and official way.)

    >
    > No way to do this safely -- at this point the device probably still has
    > its interrupt output asserted and the register to clear it is
    > inaccessible, so enabling the line will enter an infinite loop. At this
    > point the system is no longer stable, so it is better to keep at least
    > some functionality, so that it may be attempted to be shut down cleanly,
    > rather than make it completely irresponsive. The alternative is panic().


    Right! But then some warning could be useful, I presume.

    >
    > > 3) phy_interrupt() checks PHY_HALTED flag without lock too, but I'm
    > > not sure now if it could be dangerous after fixing #1; on the other
    > > hand even if we know it's not our regular interrupt, with current
    > > DEBUG_SHIRQ it could be easier to call schedule_work() anyway since
    > > we are sure it's before/in free_irq() yet.

    >
    > See #1 above.
    >
    > > 4) phy_interrupt() should check return value from schedule_work() and
    > > enable irq on 0.

    >
    > No -- the work already pending will do that.


    How? If schedule_work won't succeed because there is a pending one,
    we did disable_irq_nosync 2x, so we would stay at least 1x too deep!

    >
    > > 5) phy_stop_interrupts(): maybe I miss something, but it seems
    > > phy_stop() is required before this, so maybe there should be a
    > > comment on this?

    >
    > The API is documented in: Documentation/networking/phy.txt -- you are
    > welcome to improve. If you do not want to get into the gory details, just
    > use the cooked interface and phy_disconnect() will do the dirty work for
    > you.
    >
    > > 6) phy_stop_interrupts(): if I'm not wrong with #3 calling
    > > phy_disable_interrupts() looks like we are not sure this phy_stop()
    > > really works; than maybe a WARN_ON?

    >
    > WARN_ON what?


    I've thought that phy_stop() could be needed before
    phy_stop_interrupt() to set PHY_HALTED, but since it disables and
    clears interrupts too, then there should be no need to repeat this,
    maybe only check it's done. But if there is no such dependency, then
    no point at all.

    >
    > > 7) phy_stop_interrupts(): after above mentioned changes in
    > > phy_interrupt(), and phy_changes() (always enable_irq()) I can't see
    > > any reason why there should be more than one skipped enable_irq(),
    > > so checking return from cancel_work_sync() shouldn't be enough
    > > instead of this atomic counter.

    >
    > CONFIG_DEBUG_SHIRQ. Barring this option, cancel_work_sync() could have
    > been moved to the front of free_irq(). I think I have seen this in
    > reality, with the interrupt line left stuck disabled afterwards, but I
    > will double check when I have an opportunity. The approach implemented
    > with this patch does work, which is the important bit, and if
    > simplification is possible, then it may be applied later.
    >
    > > 8) phy_stop_interrupts(): I'm not sure this additional call from
    > > DEBUG_SHIRQ should be so dangerous, eg.:
    > >
    > > /*
    > > * status == PHY_HALTED &&
    > > * interrupts are stopped after phy_stop()
    > > */
    > > if (cancel_work_sync(...))
    > > enable_irq();
    > >
    > > free_irq(...);
    > > /*
    > > * possible schedule_work() from DEBUG_SHIRQ only,
    > > * but proper check for PHY_HALTED is done;
    > > * so, let's flush after this too:
    > > */
    > > cancel_work_sync();

    >
    > Well, if there is another handler registered on this line, you'll get
    > your interrupt line stuck disabled.


    I'll rethink this yet...

    >
    > > Of course, I don't know phy.c enough, so most of this can be terribly
    > > wrong, then feel free to forget about this - I don't expect you should
    > > waste any time for explaining me these things - after all they are
    > > doubts only.

    >
    > I am not sure I know phy.c well enough either, ;-) and your concerns are
    > appreciated as interesting conclusions may develop. If someone disagrees
    > with what I have written here, they are welcome to speak out too.


    But, I've enough of other concerns too, so nothing urgent here...

    Many thanks,
    Jarek P.
    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  13. Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

    On Thu, 18 Oct 2007, Jarek Poplawski wrote:

    > Technically until free_irq returns a handler should respond to calls
    > and with proper hardware it should have no problem with checking if
    > it's its interrupt, even after disabling this hardware, because of
    > possible races.


    Well, the hardirq handler can check it, no problem -- it is just it is so
    slow, the latency would be unacceptable. The problem with the softirq
    handler is we do really not want it to be called after the driver has
    already been shut down and its structures freed. It used to happen before
    this flush/cancel call was added with the usual effect (oops) as by then
    they may well have been stamped on already.

    > But with a scenario like this:
    >
    > - disable_irq()
    > - disable_my_hadrware_irq()
    > - clear_my_hardware_irq()
    > - free_irq()
    > - enable_irq()
    >
    > it seems the handler should respond even after free_irq because there
    > could be still interrupts to resend, originally generated by its
    > hardware, so such behavior looks very suspicious, at least with some
    > type of interrupts.


    These are softirqs, not hardware interrupts, so they are as such not
    related to *_irq() infrastructure. The flaw is the depth count of IRQ
    lines is not maintained consistently. This is, according to comments
    around the code in question, to cover up bugs elsewhere. Not a brillant
    idea, I am afraid -- there should be no need to reset the depth upon
    request_irq() and likewise with free_irq(), but there you go. I would be
    happy to investigate a possible solution and rewrite the necessary bits,
    but right now I am committed to other stuff, overdue already, sorry.

    The view could change if we supported hot-pluggable interrupt
    controllers, but it is not the case at the moment right now, so the
    interrupt lines are there to stay for the duration of the system lifespan
    and could be manipulated as necessary.

    > So, I think, the idea of DEBUG_SHIRQ is generally right and very
    > useful - but, of course, there could be exceptions, which btw. could
    > try some hacks under DEBUG_SHIRQ too. And my opinion about
    > 'properness' was very general (not about phy) too, just like my
    > 'knowledge' of drivers.


    The idea is right, no question, but I am not quite sure it has been
    properly architected into our current design. Actually I am almost sure
    of the reverse. This is why I was (and still am) interested in feedback
    on it.

    > Right! But then some warning could be useful, I presume.


    Of course; adding one to phy_error() should be straightforward.

    > > > 4) phy_interrupt() should check return value from schedule_work() and
    > > > enable irq on 0.

    > >
    > > No -- the work already pending will do that.

    >
    > How? If schedule_work won't succeed because there is a pending one,
    > we did disable_irq_nosync 2x, so we would stay at least 1x too deep!


    Correct and my note is misleading, sorry.

    The thing is we shouldn't have come here the second time in the first
    place (which is I think why the check is not there) as handlers for the
    same line are not allowed to run in parallel (cf. irq_desc->lock and
    IRQ_INPROGRESS). Perhaps BUG_ON(!schedule_work()) would be appropriate,
    though I am not sure if we should handle every "impossible" condition we
    can imagine. Quite a lot of hardirq handlers assume two instances will
    not run in parallel, so if it ever happened, then a serious breakage would
    unleash.

    > But, I've enough of other concerns too, so nothing urgent here...


    The problem is at the moment I am still probably the only user of this
    code ;-) -- `grep' through the sources to see how many drivers request an
    IRQ for their PHY through phylib; unless something has changed very
    recently.

    Maciej
    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  14. Re: [PATCH] flush_work_sync vs. flush_scheduled_work Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

    On 10/18, Jarek Poplawski wrote:
    >
    > +/**
    > + * flush_work_sync - block until a work_struct's callback has terminated

    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^
    Hmm...

    > + * Similar to cancel_work_sync() but will only busy wait (without cancel)
    > + * if the work is queued.


    Yes, it won't block, but will spin in busy-wait loop until all other works
    scheduled before this work are finished. Not good. After that it really
    blocks waiting for this work to complete.

    And I am a bit confused. We can't use flush_workqueue() because some of the
    queued work_structs may take rtnl_lock, yes? But in that case we can't use
    the new flush_work_sync() helper as well, no?

    If we can't just cancel the work, can't we do something like

    if (cancel_work_sync(w))
    w->func(w);

    instead?

    > +void flush_work_sync(struct work_struct *work)
    > +{
    > + int ret;
    > +
    > + do {
    > + ret = work_pending(work);
    > + wait_on_work(work);
    > + if (ret)
    > + cpu_relax();
    > + } while (ret);
    > +}


    If we really the new helper, perhaps we can make it a bit better?

    1. Modify insert_work() to take the "struct list_head *at" parameter instead
    of "int tail". I think this patch will also cleanup the code a bit, and
    shrink a couple of bytes from .text

    2. flush_work_sync() inserts a barrier right after this work and blocks.
    We still need some retry logic to handle the queueing is in progress
    of course, but we won't spin waiting for the other works.

    What do you think?

    Oleg.

    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  15. Re: [PATCH] flush_work_sync vs. flush_scheduled_work Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

    On Thu, 18 Oct 2007, Oleg Nesterov wrote:

    > If we can't just cancel the work, can't we do something like
    >
    > if (cancel_work_sync(w))
    > w->func(w);
    >
    > instead?


    We do an equivalent of this -- all that we care about that w->func(w)
    would do is enable_irq() and the rest we want to skip at this point. We
    probably do not need the counter in the end though.

    Maciej
    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  16. Re: [PATCH] flush_work_sync vs. flush_scheduled_work Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

    On Thu, Oct 18, 2007 at 07:48:19PM +0400, Oleg Nesterov wrote:
    > On 10/18, Jarek Poplawski wrote:
    > >
    > > +/**
    > > + * flush_work_sync - block until a work_struct's callback has terminated

    > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^
    > Hmm...
    >
    > > + * Similar to cancel_work_sync() but will only busy wait (without cancel)
    > > + * if the work is queued.

    >
    > Yes, it won't block, but will spin in busy-wait loop until all other works
    > scheduled before this work are finished. Not good. After that it really
    > blocks waiting for this work to complete.
    >
    > And I am a bit confused. We can't use flush_workqueue() because some of the
    > queued work_structs may take rtnl_lock, yes? But in that case we can't use
    > the new flush_work_sync() helper as well, no?


    OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO OOOOOOOOOOOOOOOOOPS!

    Of course, we can't!!! I remembered there was this issue long time
    ago, but then I've had some break in tracking net & workqueue. So,
    while reading this patch I was alarmed at first, and self-misled
    later. I think, there is definitely needed some warning about
    locking (or unlocking) during these flush_ & cancel_ functions.
    (Btw, I've very much wondered now, why I didn't notice at that 'old'
    time, that you added such a great feature (wrt. locking) and I even
    didn't notice this...).

    So, Maciej (and other readers of this thread) - I withdraw my false
    opinion from my second message here: it's very wrong to call this
    sched_work_sync() with rtnl_lock(). It's only less probable to lockup
    with this than with flush_schedule_work().

    >
    > If we can't just cancel the work, can't we do something like
    >
    > if (cancel_work_sync(w))
    > w->func(w);
    >
    > instead?
    >
    > > +void flush_work_sync(struct work_struct *work)
    > > +{
    > > + int ret;
    > > +
    > > + do {
    > > + ret = work_pending(work);
    > > + wait_on_work(work);
    > > + if (ret)
    > > + cpu_relax();
    > > + } while (ret);
    > > +}

    >
    > If we really the new helper, perhaps we can make it a bit better?
    >
    > 1. Modify insert_work() to take the "struct list_head *at" parameter instead
    > of "int tail". I think this patch will also cleanup the code a bit, and
    > shrink a couple of bytes from .text


    Looks like a very good idea, but I need more time to rethink this.
    Probably some code example should be helpful.

    >
    > 2. flush_work_sync() inserts a barrier right after this work and blocks.
    > We still need some retry logic to handle the queueing is in progress
    > of course, but we won't spin waiting for the other works.


    Until monday I should have an opinion on that (today a bit under
    fire...).

    >
    > What do you think?


    Since there is no gain wrt. locking with my current proposal, I
    withdraw this patch of course.

    It looks like my wrong patch was great idea because we got this very
    precious Oleg's opinion! (I know I'm a genius sometimes...)

    Thanks very much,
    Jarek P.
    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  17. Re: [PATCH] flush_work_sync vs. flush_scheduled_work Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

    On Fri, Oct 19, 2007 at 09:50:14AM +0200, Jarek Poplawski wrote:
    ....
    > sched_work_sync() with rtnl_lock(). It's only less probable to lockup
    > with this than with flush_schedule_work().


    ....But, not much less...

    Jarek P.
    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  18. Re: [PATCH] flush_work_sync vs. flush_scheduled_work Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

    On Thu, 2007-10-18 at 19:48 +0400, Oleg Nesterov wrote:

    > > +void flush_work_sync(struct work_struct *work)


    > If we really the new helper, perhaps we can make it a bit better?
    >
    > 1. Modify insert_work() to take the "struct list_head *at" parameter instead
    > of "int tail". I think this patch will also cleanup the code a bit, and
    > shrink a couple of bytes from .text
    >
    > 2. flush_work_sync() inserts a barrier right after this work and blocks.
    > We still need some retry logic to handle the queueing is in progress
    > of course, but we won't spin waiting for the other works.


    3. Add lockdep annotation like the other API. Andrew just sent my
    patch (used to be two patches by somebody's request but that's fine)
    titled "workqueue: debug flushing deadlocks with lockdep" to Linus.

    johannes

    -----BEGIN PGP SIGNATURE-----
    Comment: Johannes Berg (powerbook)

    iQIVAwUARxhkLKVg1VMiehFYAQINUQ/+LEr012kcOTShkwe6EJAGnDYrs7AHjZwx
    PmL2FwQztWUWkInCrXAf5s8DQkD1oswmveiE5ZiqOPdp9Eg6qq v3112AX88LynPd
    ZKLILC702lia6PIItDyeMaeEmfkgortobM6YlJAsk9E7z/IFzVep94eFWB5HD2rT
    +zTOqvG5LI9hkIvtObqAYZ5b4MHSNzkN65NouEDc8Hwr/AV/uitFgc79IdLzXkwS
    lvpVT9gw8/UuZQ2mqhMPBCMToAznsEiqH2xDrw5I6BUr9duGLZfyY3Kmqd3L CHTh
    JKtFsRl19FlPYczclNSwc5q+1FX+IYYuZIuXAO8A6ijaHceuxu d7XKif4vciu+wT
    JJB1x69e5GgIl+JlDHmBdXp6BjoN21hTvW35uOd0wb86kB0Fjr UOX7183ReQppC/
    qTtKjXtMhOpcTz61z8wHAvxNSosfGmMsoAxUM1JVsQ6tQwLvOe OdovMo2vfmCa/o
    GQhx6Fd7DpGKlBeadhd1gI3Hd2bu94Gvffsg1a+6BdjXuBfja5 GNQiEDeSgTYf6c
    3j11BaC0m0lYqMXLRdljyVU1cAAxeXHWGArXCuzp4BiWkhQ+nF X+/mbBZBwh9EEn
    DRqEtYUQgfaBGnxJrHPYoBlXXYdX3/+pzz4hmsFNkmxAlTHbeWT9mSSIE3EmLxHU
    AX7y8E8ba/0=
    =M8Ey
    -----END PGP SIGNATURE-----


  19. Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

    On Thu, Oct 18, 2007 at 12:30:35PM +0100, Maciej W. Rozycki wrote:
    > On Wed, 17 Oct 2007, Jarek Poplawski wrote:

    ....
    > > 2) phy_change() doesn't reenable irq line after it sees returns
    > > with errors; IMHO it should at least write some warning, but maybe
    > > try some safety plan, so enable_irq() and try to disable interrupts
    > > and free_irq() on the next call (if it happens). (But, I can be very
    > > wrong with this - maybe it's OK and official way.)

    >
    > No way to do this safely -- at this point the device probably still has
    > its interrupt output asserted and the register to clear it is
    > inaccessible, so enabling the line will enter an infinite loop. At this
    > point the system is no longer stable, so it is better to keep at least
    > some functionality, so that it may be attempted to be shut down cleanly,
    > rather than make it completely irresponsive. The alternative is panic().


    But then... your patch seems to make it possible, because it enables
    irq to the initial state of the counter. Of course, this could happen
    on closing only.

    Jarek P.
    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  20. Re: [PATCH] PHYLIB: IRQ event workqueue handling fixes

    On Thu, 18 Oct 2007, Maciej W. Rozycki wrote:

    > > 1) phy_change() checks PHY_HALTED flag without lock; I think it's
    > > racy: eg. if it's done during phy_stop() it can check just before
    > > the flag is set and reenable interrupts just after phy_stop() ends.

    >
    > I remember having a look into it, but it was long ago and I cannot
    > immediately recall the conclusion. Which means it is either broken or
    > deserves a comment as non-obvious. I will have a look into it again, but
    > I am resource-starved a little at the moment, sorry.


    Well, I have now recalled what the issue is -- we just plainly and simply
    want to avoid a hardirq spinlock for the very reason we do not do all the
    processing in the hardirq handler. The thing is we make accesses to the
    MDIO bus with the phydev lock held and it may take ages until these
    accesses will have completed. And we cannot afford keeping interrupts
    disabled for so long.

    So the only way is to make the check for the HALTED state lockless and
    make sure any race condition is handled gracefully and does not lead to
    inconsistent behaviour. Which I think as of what we have in the
    net-2.6.24 tree is the case, but there are never too many eyes to look at
    a piece of code, so if anybody feels like proving me wrong, then just go
    ahead!

    Maciej
    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

+ Reply to Thread
Page 1 of 2 1 2 LastLast