2.6.25-rc7: Ugh. - Kernel

This is a discussion on 2.6.25-rc7: Ugh. - Kernel ; On Wednesday 02 April 2008, Alan Stern wrote: > On Wed, 2 Apr 2008, David Brownell wrote: > > > Those hcd->state tests have been getting more and more > > dodgey as time goes by. At this point I ...

+ Reply to Thread
Page 5 of 5 FirstFirst ... 3 4 5
Results 81 to 84 of 84

Thread: 2.6.25-rc7: Ugh.

  1. Re: [PATCH] usb ehci_iaa_watchdog fix

    On Wednesday 02 April 2008, Alan Stern wrote:
    > On Wed, 2 Apr 2008, David Brownell wrote:
    >
    > > Those hcd->state tests have been getting more and more
    > > dodgey as time goes by. At this point I hardly trust
    > > any of them. There *IS* no clear state machine which
    > > governs the usbcore/HCD interaction.

    >
    > ISTR trying to make that same point a few times over the past two or
    > three years... :-)


    Was there any argument about that fact? The question was
    more "shouldn't there actually *be* such a state machine"
    than "do we have one yet".


    > It would make more sense for each HCD to keep its own private "state"
    > variable. The interaction with usbcore can be broken down into a few
    > simple tests, such as:
    >
    > Is the HC dead?
    > Is the HC (i.e., the PCI or platform device) suspended?
    > Is the HC running?
    > Is it okay to submit an URB?
    >
    > There's plenty of redundancy in this list, and some of the information
    > may already be available in hcd->self.root_hub->state. In many or most
    > cases, these questions can't be answered in a race-free manner anyhow,
    > which limits their usefulness.


    At any given instant, all of them have valid answers.
    The only one that's not resolvable by mutual exclusion
    on the HCD's spinlock is "is it dead", since the HC
    could crap out at any time. (Nowadays that's fortunately
    rare ... we've been whomping on such bugs for long enough
    by now!)

    - Dave


    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: 2.6.25-rc7: Ugh.

    David Miller wrote:
    > From: rct@frus.com (Bob Tracy)
    > Date: Thu, 27 Mar 2008 23:42:58 -0500 (CDT)
    >
    >> I don't understand this claim. On what incredibly fast piece of iron
    >> can you possibly do between 8 and 16 kernel builds (the range I've
    >> encountered when I do the "git bisect" dance), boot each one, and
    >> evaluate the results in so small a period of time?

    >
    > Builds are just over a minute on my box. I can do a full
    > kernel build plus reboot cycle in under 2 minutes.
    >

    Most of us got over "mine is bigger than yours" bragging at about age
    13. He said "laptop," that means slow builds, slow installs, slow boots,
    and unless he's on a fat net pipe to a faster machine, build elsewhere
    isn't much of a saving.

    --
    Bill Davidsen
    "We have more to fear from the bungling of the incompetent than from
    the machinations of the wicked." - from Slashdot

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  3. Re: [PATCH] usb ehci_iaa_watchdog fix

    On Wednesday 02 April 2008, Alan Stern wrote:
    > On Wed, 2 Apr 2008, David Brownell wrote:
    >
    > > > It fixes the cause rather than the symptom.

    > >
    > > I'm not sure I'd agree with that.

    >
    > Really? The logic seemed clear enough to me.
    >
    > 1. Evidently the ehci_iaa_watchdog routine was getting called at a
    > time when the host controller wasn't running -- which had to
    > have been after it was suspended.


    The change to remove the HC_IS_RUNNING test meant that routine
    would then work when hcd->state is HC_STATE_SUSPENDED or possibly
    HC_STATE_RESUMING. (Since those are the two states for which
    HC_IS_RUNNING fails, other than HC_STATE_HALTED which shouldn't
    be happening here...)

    Now, prior to the now-merged patch that was skipping some work,
    and that seemed to cause resume problems. Making it not skip
    the work made the resume behave (as did the now-merged patch).


    > 2. But ehci_bus_suspend() calls end_unlink_async(), which turns
    > off the IAA watchdog timer.


    Not exactly. First, ehci_bus_suspend() force the timer off all
    by itself ... way too early, while things can still retrigger it.
    That's a buggy idiom that should be fixed. (As you've agreed,
    elsewhere.)

    Second, it doesn't always call end_unlink_async() ... only when
    one or more async unlink is still pending.

    Third, if it *does* call end_unlink_async(), it can retriger
    the timer when it needs to do another unlink. Only when the
    HC is halted (HC_STATE_HALT) is that logic bypassed, scrubbing
    several endpoints at once. (And a minor fourth point, direct
    calls to end_unlink_async leaves the IAA IRQ machinery active.)

    I think your fix handles the "one pending unlink" case, but
    not the more general "N pending unlinks" ...


    > 3. Hence the timer must have been restarted later on while
    > ehci_bus_suspend() was still running. The call to ehci_work()
    > appeared to be the only place that could have happened.


    Removing the state check in the watchdog changed behavior, so the
    HC must have been in one of the two states I listed above when
    that watchdog fired (SUSPENDED or RESUMING). And it had work to
    do, which (because of the state check) it didn't do.


    > 4. Thus moving the end_unlink_async() call to after ehci_work()
    > (or just to be doubly safe, after ehci_halt() and the change
    > to HC_STATE_SUSPENDED) should take care of all pending QH
    > unlinks and leave none of them unfinished.


    It takes care of *ONE* pending QH unlink. If there's more than
    one, it retriggers the timer, so this problem will reappear...


    > Strictly speaking, it would be best to move the del_timer_sync() calls
    > to after ehci_lock is released. But it doesn't really matter if the
    > timer routines get invoked after the controller is suspended.


    When the timer would be retriggered, to finish additional unlinks,
    it does matter. A complete fix for this would (a) disable the IAA
    watchdog timer later, once it can never be retriggered again, and
    (b) make end_unlink_async handle the multiple-unlinks case when the
    HC is suspended, including not retriggering the watchdog.

    - Dave


    >
    > Alan Stern
    >



    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  4. Re: [PATCH] usb ehci_iaa_watchdog fix

    On Mon, 7 Apr 2008, David Brownell wrote:

    > Third, if it *does* call end_unlink_async(), it can retriger
    > the timer when it needs to do another unlink. Only when the
    > HC is halted (HC_STATE_HALT) is that logic bypassed, scrubbing
    > several endpoints at once. (And a minor fourth point, direct
    > calls to end_unlink_async leaves the IAA IRQ machinery active.)
    >
    > I think your fix handles the "one pending unlink" case, but
    > not the more general "N pending unlinks" ...


    Sounds right. That test at the end of start_unlink_async() should be
    changed to !RUNNING instead of HALT.

    To help prevent unwanted recursion, I wrote a patch some time ago that
    would batch up multiple unlinks. (The idea was to keep track of which
    entries were added to the reclaim queue before the current IAA cycle
    got started, and handle them all at one stroke when the current cycle
    ends.) That patch is a bit out-of-date now, but it shouldn't be too
    hard to bring it up to speed.

    > When the timer would be retriggered, to finish additional unlinks,
    > it does matter. A complete fix for this would (a) disable the IAA
    > watchdog timer later, once it can never be retriggered again, and
    > (b) make end_unlink_async handle the multiple-unlinks case when the
    > HC is suspended, including not retriggering the watchdog.


    Agreed. If the code does (a) after (b) then end_unlink_async doesn't
    need to avoid retriggering the watchdog.

    Alan Stern

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

+ Reply to Thread
Page 5 of 5 FirstFirst ... 3 4 5