[PATCH 0/79] smpboot integration - Kernel




  1. Re: [PATCH 0/79] smpboot integration

    On Fri, Mar 21, 2008 at 2:41 PM, Yinghai Lu wrote:
    >
    > On Fri, Mar 21, 2008 at 1:03 PM, Yinghai Lu wrote:
    > >
    > > On Fri, Mar 21, 2008 at 12:55 PM, Ingo Molnar wrote:
    > > >
    > > > * Yinghai Lu wrote:
    > > >
    > > >
    > > > > How do I bisect the x86.git tree? It always says it cannot bisect a
    > > > > seeked tree.
    > > >
    > > > ah. Does this work:
    > > >
    > > > git-checkout -b my-tree x86/latest
    > > >
    > > > git-bisect start
    > > > git-bisect good x86/base
    > > > git-bisect bad x86/latest
    > > >
    > > > x86/base is the Linus tree that x86.git is based against. You can see
    > > > all the x86.git/latest changes by doing:
    > > >
    > > > git-log x86/base..x86/latest
    > > >
    > > > Ingo
    > > >

    > >
    > > works. will let you know the result.
    > >

    >
    > yhlu@mpk:~/xx/xx/kernel/x86/linux-2.6> git-bisect bad
    > d1c707188ad646c8094cac9afb1738e7d0196ff2 is first bad commit
    > commit d1c707188ad646c8094cac9afb1738e7d0196ff2
    > Author: Glauber de Oliveira Costa
    > Date: Wed Mar 19 14:25:53 2008 -0300
    >
    > x86: include mach_apic.h in smpboot_64.c and smpboot.c
    >
    > After the inclusion, a lot of files need fixing for conflicts,
    > some of them in the headers themselves, to accommodate both
    > i386 and x86_64 versions.
    >
    > [ mingo@elte.hu: build fix ]
    >
    > Signed-off-by: Glauber Costa
    > Signed-off-by: Ingo Molnar
    >
    > :040000 040000 19f574e64bb8003bbe984f3a8c1315db969dfdcd 6ffe96588c77bc936705599fa110107856201115 M arch
    > :040000 040000 61269347ad4f384ed85cc87c4f2d004ed94492ac 8f5c713da25579a3cdf63db3d4c2f795261d0521 M include
    > yhlu@mpk:~/xx/xx/kernel/x86/linux-2.6>
    >


    The attached patch fixes that.

    YH
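The bisection Ingo suggests is, in essence, a binary search over the commits between a known-good and a known-bad revision, which is why a 79-patch series can be narrowed down to one commit in about seven build-and-boot cycles. A minimal C model of that search on a linear history follows. It assumes (hypothetically) that the test result for every commit is already known as the array `is_bad[]`, with index 0 playing the role of `x86/base` and index `n - 1` that of `x86/latest`; real `git bisect` also copes with merges and skipped commits, which this sketch ignores.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of `git bisect` on a linear history.
 * is_bad[i] is what building and testing commit i would report; only
 * about log2(n) entries are ever consulted.  Assumed preconditions:
 * n >= 2, is_bad[0] is false (good base), is_bad[n - 1] is true (bad tip),
 * and once a commit is bad, every later commit is bad too.
 * Returns the index of the first bad commit; *steps counts the tests run.
 */
size_t bisect_first_bad(const bool *is_bad, size_t n, unsigned *steps)
{
    size_t good = 0;      /* newest commit known to be good */
    size_t bad = n - 1;   /* oldest commit known to be bad */

    *steps = 0;
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;

        (*steps)++;
        if (is_bad[mid])
            bad = mid;    /* the equivalent of `git-bisect bad` */
        else
            good = mid;   /* the equivalent of `git-bisect good` */
    }
    return bad;
}
```

Each iteration halves the suspect range, so the number of tests is at most ceil(log2(n - 1)).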


  2. Re: [PATCH 58/79] [PATCH] include mach_apic.h in smpboot_64.c and smpboot.c

    On Thu, Mar 20, 2008 at 7:25 AM, Glauber Costa wrote:
    >
    > Yinghai Lu wrote:
    > > On Wed, Mar 19, 2008 at 10:25 AM, Glauber de Oliveira Costa
    > > wrote:
    > >> From: Glauber Costa
    > >>
    > >> After the inclusion, a lot of files need fixing for conflicts,
    > >> some of them in the headers themselves, to accommodate both
    > >> i386 and x86_64 versions.
    > >>
    > >> Signed-off-by: Glauber Costa
    > >> ---
    > >> arch/x86/kernel/acpi/boot.c | 2 ++
    > >> arch/x86/kernel/mpparse_64.c | 2 ++
    > >> arch/x86/kernel/smpboot.c | 2 ++
    > >> arch/x86/kernel/smpboot_64.c | 1 +
    > >> arch/x86/vdso/Makefile | 2 +-
    > >> include/asm-x86/apic.h | 1 -
    > >> include/asm-x86/apicdef.h | 6 ------
    > >> include/asm-x86/mach-default/mach_apic.h | 11 +++++++++++
    > >> include/asm-x86/mach-default/mach_apicdef.h | 5 +++++
    > >> include/asm-x86/smp_64.h | 9 +--------
    > >> 10 files changed, 25 insertions(+), 16 deletions(-)

    > >
    > > please don't.
    > >
    > > before this patch
    > > include/asm-x86/mach_apic.h is for x86_64 only
    > > include/asm-x86/mach-default/mach_apic.h is for i386 only.
    > >
    > > and both have __ASM_MACH_APIC_H defined.
    > >
    > > may need another name?
    > >
    > > YH

    > Another name is possible, but I'd prefer to get rid of the
    > asm-x86/mach_apic.h. The goal here is to have things integrated, so
    > unless really necessary, this is preferred.


    Anyway, before that, I hope you can rename include/asm-x86/mach_apic.h
    to include/asm-x86/apic_ext.h.

    You may move bits from apic_ext.h to mach-default/mach_apic.h later.

    YH
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/
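Yinghai's objection comes down to how include guards work: the preprocessor skips the body of any header whose guard macro is already defined, so when both headers carry `__ASM_MACH_APIC_H`, whichever one is included second silently contributes nothing. A self-contained illustration with the two headers inlined into one file follows; the `MACH_APIC_*_BITS` macros and the helper functions are hypothetical placeholders, not kernel code.

```c
/*
 * Two "headers" sharing one include guard, inlined for illustration.
 * Stand-in for include/asm-x86/mach_apic.h (x86_64 only before the patch):
 */
#ifndef __ASM_MACH_APIC_H
#define __ASM_MACH_APIC_H
#define MACH_APIC_X86_64_BITS 1
#endif

/* Stand-in for include/asm-x86/mach-default/mach_apic.h (i386 only): */
#ifndef __ASM_MACH_APIC_H
#define __ASM_MACH_APIC_H
#define MACH_APIC_I386_BITS 1   /* never reached: the guard is already set */
#endif

/* Report which of the two "headers" actually contributed definitions. */
int x86_64_header_seen(void)
{
#ifdef MACH_APIC_X86_64_BITS
    return 1;
#else
    return 0;
#endif
}

int i386_header_seen(void)
{
#ifdef MACH_APIC_I386_BITS
    return 1;
#else
    return 0;
#endif
}
```

Renaming one of the files, as with the proposed `apic_ext.h`, removes the clash only if its guard macro is renamed along with it.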

  3. Re: [PATCH 45/79] [PATCH] fix apic acking of irqs

    Maciej W. Rozycki wrote:
    > On Thu, 20 Mar 2008, Glauber Costa wrote:
    >
    >>> Are you sure this actually triggers for APIC chips affected by the erratum
    >>> in question? And please note that for them the effect of two consecutive
    >>> writes will be much more disastrous than setting a bit in the ESR register.

    >> I'm not _sure_, but I can't find anything in the errata list that states
    >> otherwise. It would be great if someone who has such a system could test
    >> it. But as things stand, it breaks the bootup code. In case it really is
    >> a problem, we'd need to make a special case for that.

    >
    > I have dug out the relevant erratum -- it is the 11AP one as referred to
    > from arch/x86/kernel/smp_32.c and the text even mentions the EOI register
    > explicitly:
    >
    > "This problem affects systems that use HOLD/HLDA or BOFF# and enable the
    > local APIC of the CPU. If the second APIC write cycle is an EOI (End of
    > Interrupt) cycle, the CPU will stop servicing subsequent interrupts of
    > equal or less priority. This may cause the system to hang. If the second
    > APIC write cycle is not an EOI, the failure mode would depend on the
    > particular APIC register that is not updated correctly."
    >
    > But on this occasion I took the opportunity to refresh my memory on the
    > ESR register and there is apparently no bit there, at least up to
    > Pentium4, that would signify an error resulting from an incorrect access
    > type -- only accesses to invalid register indices are marked as errors.
    >
    > Which bit of the ESR can you see set as a result of using an RMW cycle to
    > the EOI register and with what kind of CPU/APIC? And why wouldn't it have
    > affected older kernels? -- the error interrupt has been kept enabled by
    > Linux for ages and writes to the EOI register are frequent enough it would
    > be hard to miss the resulting flood of errors. Hmm...
    >

    I see bit 7 - Illegal Register Address being set.
    I believe the reason we never saw it is that the ESR register is not
    checked that often while interrupts are enabled. In the new bootup state
    machine, inherited from x86_64, we call do_boot_cpu with irqs
    clearly enabled, and check the ESR in the process.

    But I can understand from the spec you posted that this is clearly an
    error. So I'd better come up with a new solution for this.
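For reference, the Intel SDM defines the local APIC Error Status Register bits roughly as follows: bits 0-3 are APIC-bus checksum and accept errors (used only on Pentium and P6-family processors), bits 5 and 6 flag illegal vectors sent or received, and bit 7 is the Illegal Register Address error Glauber reports; other bits are left out here. A small decoder sketch follows; the macro and function names are made up for this example and are not the kernel's.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Local APIC ESR bits (descriptions per the Intel SDM; macro names are ours). */
#define ESR_SEND_CS_ERR      (1u << 0)  /* APIC bus: send checksum error */
#define ESR_RECV_CS_ERR      (1u << 1)  /* APIC bus: receive checksum error */
#define ESR_SEND_ACCEPT_ERR  (1u << 2)  /* APIC bus: send accept error */
#define ESR_RECV_ACCEPT_ERR  (1u << 3)  /* APIC bus: receive accept error */
#define ESR_SEND_ILL_VECTOR  (1u << 5)
#define ESR_RECV_ILL_VECTOR  (1u << 6)
#define ESR_ILL_REG_ADDRESS  (1u << 7)

static const struct {
    uint32_t mask;
    const char *name;
} esr_bits[] = {
    { ESR_SEND_CS_ERR,     "Send Checksum Error" },
    { ESR_RECV_CS_ERR,     "Receive Checksum Error" },
    { ESR_SEND_ACCEPT_ERR, "Send Accept Error" },
    { ESR_RECV_ACCEPT_ERR, "Receive Accept Error" },
    { ESR_SEND_ILL_VECTOR, "Send Illegal Vector" },
    { ESR_RECV_ILL_VECTOR, "Received Illegal Vector" },
    { ESR_ILL_REG_ADDRESS, "Illegal Register Address" },
};

/* Nonzero if any of bits 3:0 (the APIC-bus error bits) is set. */
int esr_bus_error(uint32_t esr)
{
    return (esr & 0xFu) != 0;
}

/* Write a comma-separated list of the set bits into buf (len must be > 0). */
void esr_describe(uint32_t esr, char *buf, size_t len)
{
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof(esr_bits) / sizeof(esr_bits[0]); i++) {
        size_t used = strlen(buf);
        int n;

        if (!(esr & esr_bits[i].mask))
            continue;
        n = snprintf(buf + used, len - used, "%s%s",
                     used ? ", " : "", esr_bits[i].name);
        if (n < 0 || (size_t)n >= len - used)
            return;   /* output truncated; buf stays NUL-terminated */
    }
}
```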


  4. Re: [PATCH 0/79] smpboot integration

    Ingo Molnar wrote:
    > * Yinghai Lu wrote:
    >
    >> attached patch fix that.

    >
    > thanks Yinghai, applied.

    Thanks for tracking this down and patching it, Yinghai. The granularity
    proved quite useful once more ;-)


  5. Re: [PATCH 45/79] [PATCH] fix apic acking of irqs

    On Mon, 24 Mar 2008, Glauber Costa wrote:

    > I see bit 7 - Illegal Register Address being set.


    Hmm, it looks like the only one to be reasonably set under these
    conditions, but I have never seen it reported for a read cycle to the EOI
    register anyway, so I suppose it has to be a relatively recent addition.

    > I believe the reason we never saw it, is that the ESR register is not checked
    > that often when interrupts are enabled. In the new bootup state machine, that


    Well, as I wrote, the error interrupt handler is always enabled,
    reporting the state recorded in the ESR register as soon as an error
    condition triggers and if an RMW cycle was a problem before, we would have
    seen a flood of reports from people -- like we indeed have many times for
    inter-APIC bus data corruption that triggers the same event (using bits
    3:0 in the ESR as relevant).

    > is inherited from x86_64, we call do_boot_cpu with irqs clearly enabled, and
    > check esr in the process.


    Please note that ESR may hold some leftover state from whatever happened
    before Linux has taken control, so it is reasonable and I think actually
    recommended by Intel (FWIW) to clear the register before enabling the
    error interrupt. For how to clear the ESR properly, please see
    setup_local_APIC() -- subtle differences and errata in various APIC
    implementations have made it more complicated than necessary, sigh...

    > But I can understand from the spec you posted that this is clearly an error.
    > So I'd have better come up with a new solution from this


    Well, with CONFIG_X86_GOOD_APIC set there is no RMW access to the ESR as
    apic_write_around() expands to apic_write(). And the option is meant to
    be clear only for the original integrated APIC as included in the Pentium
    processor ("Pentium-Classic" in the Kconfig nomenclature). I have no
    means to test such a system, but I still have a working dual-Pentium-MMX
    machine, which features local APICs that should be the same modulo errata.
    I may check and see whether a RMW cycle to the ESR triggers any problems
    with this computer, but the box is currently at the other end of the
    continent, so it will take a while.
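To make the distinction concrete: in the apic headers of this era, as far as I can tell, `apic_write_around()` is a plain `apic_write()` when CONFIG_X86_GOOD_APIC is set and a locked read-modify-write (an `xchg`) otherwise, and `ack_APIC_irq()` acknowledges through it with a write of 0 to APIC_EOI. The sketch below models that choice against a hypothetical strict APIC that, like the emulated one Glauber describes, records bit 7 of the ESR when the write-only EOI register is read. The `mock_*` names and the strictness rule are assumptions for illustration; the register offset (EOI at 0xB0) is the architectural one.

```c
#include <stdint.h>

#define APIC_EOI            0xB0u      /* End Of Interrupt, write-only */
#define ESR_ILL_REG_ADDRESS (1u << 7)

/* Hypothetical strict APIC: just enough state to observe the difference. */
struct mock_apic {
    uint32_t esr;          /* accumulated error bits */
    unsigned eoi_writes;   /* interrupts acknowledged */
};

uint32_t mock_apic_read(struct mock_apic *a, uint32_t reg)
{
    if (reg == APIC_EOI)
        a->esr |= ESR_ILL_REG_ADDRESS;  /* assumed: reading EOI is illegal */
    return 0;
}

void mock_apic_write(struct mock_apic *a, uint32_t reg, uint32_t val)
{
    (void)val;
    if (reg == APIC_EOI)
        a->eoi_writes++;
}

/* Plain write: what apic_write_around() becomes with a "good" APIC. */
void mock_write_plain(struct mock_apic *a, uint32_t reg, uint32_t val)
{
    mock_apic_write(a, reg, val);
}

/* Read-modify-write: the erratum workaround, modelled as a read followed
 * by a write (the kernel does it as a single locked xchg). */
void mock_write_rmw(struct mock_apic *a, uint32_t reg, uint32_t val)
{
    (void)mock_apic_read(a, reg);
    mock_apic_write(a, reg, val);
}

#ifdef CONFIG_X86_GOOD_APIC
#define mock_write_around mock_write_plain
#else
#define mock_write_around mock_write_rmw
#endif

/* Acknowledge an interrupt the way ack_APIC_irq() does: write 0 to EOI. */
void mock_ack_irq(struct mock_apic *a)
{
    mock_write_around(a, APIC_EOI, 0);
}
```

Building the sketch with and without `-DCONFIG_X86_GOOD_APIC` flips `mock_ack_irq()` between the two paths; only the RMW path ever reads EOI.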

    I have asked this question already: what kind of CPU are you running on?
    Do you really need to have CONFIG_X86_GOOD_APIC clear with it?

    Maciej

  6. Re: [PATCH 45/79] [PATCH] fix apic acking of irqs

    "Maciej W. Rozycki" writes:
    >
    > Please note that ESR may hold some leftover state from whatever happened
    > before Linux has taken control, so it is reasonable and I think actually
    > recommended by Intel (FWIW) to clear the register before enabling the
    > error interrupt. For how to clear the ESR properly, please see
    > setup_local_APIC() -- subtle differences and errata in various APIC
    > implementations have made it more complicated than necessary, sigh...


    IIRC a lot of the ESR weirdness was on NUMAQ only, which is down to one
    or two last machines running Linux, which will hopefully die soon ...

    -Andi


  7. Re: [PATCH 45/79] [PATCH] fix apic acking of irqs

    Maciej W. Rozycki wrote:
    >> is inherited from x86_64, we call do_boot_cpu with irqs clearly enabled, and
    >> check esr in the process.

    >
    > Please note that ESR may hold some leftover state from whatever happened
    > before Linux has taken control, so it is reasonable and I think actually
    > recommended by Intel (FWIW) to clear the register before enabling the
    > error interrupt. For how to clear the ESR properly, please see
    > setup_local_APIC() -- subtle differences and errata in various APIC
    > implementations have made it more complicated than necessary, sigh...

    Which excerpt specifically are you talking about?
    The only ESR mention I see in setup_local_APIC() is this:

        /* Pound the ESR really hard over the head with a big hammer - mbligh */
        if (esr_disable) {
                apic_write(APIC_ESR, 0);
                apic_write(APIC_ESR, 0);
                apic_write(APIC_ESR, 0);
                apic_write(APIC_ESR, 0);
        }

    which seems more like a disablement.

    The bootup code does clean it, though, by writing and reading the ESR.

    >> But I can understand from the spec you posted that this is clearly an error.
    >> So I'd have better come up with a new solution from this

    >
    > Well, with CONFIG_X86_GOOD_APIC set there is no RMW access to the ESR as
    > apic_write_around() expands to apic_write(). And the option is meant to
    > be clear only for the original integrated APIC as included in the Pentium
    > processor ("Pentium-Classic" in the Kconfig nomenclature). I have no
    > means to test such a system, but I still have a working dual-Pentium-MMX
    > machine, which features local APICs that should be the same modulo errata.
    > I may check and see whether a RMW cycle to the ESR triggers any problems
    > with this computer, but the box is currently at the other end of the
    > continent, so it will take a while.
    >
    > I have asked this question already: what kind of CPU are you running on?
    > Do you really need to have CONFIG_X86_GOOD_APIC clear with it?
    >

    The tests that triggered it were run under qemu with randconfigs.
    It probably emulates a good APIC, but it is good that it triggered anyway;
    otherwise I'd never have seen it.


  8. Re: [PATCH 45/79] [PATCH] fix apic acking of irqs

    On Tue, 25 Mar 2008, Glauber Costa wrote:

    > the only ESR mention I see in setup_local_APIC() is this:
    >
    > /* Pound the ESR really hard over the head with a big hammer - mbligh */
    > if (esr_disable) {
    > apic_write(APIC_ESR, 0);
    > apic_write(APIC_ESR, 0);
    > apic_write(APIC_ESR, 0);
    > apic_write(APIC_ESR, 0);
    > }
    > which seems more like a disablement.


    There is more later on...

    > the bootup code does clean it, though, by writing and reading the ESR.


    ... basically for the original Pentium and Pentium/MMX APIC you only had
    to read the ESR to get at the bits. The read would clear them as well as
    a side-effect. Although at that stage already it was mentioned in the
    spec that for future compatibility a write of zero beforehand (ignored as
    the register was r/o) should be performed. Which indeed became a
    requirement from PentiumPro onwards as with these processors it was the
    write that copied the internal error latches into the visible ESR
    register. Except that some Pentium APICs had an erratum, where ESR was
    indeed r/w and the leading write of zero would actually clear the register
    losing the recorded state, so it had to be avoided despite the
    recommendation. Hence the code you can see within:

    if (integrated && !esr_disable) {
    }

    I suppose other APIC implementers were not that keen on keeping bug
    compatibility, so chances are other APIC cores work just fine as specified
    by the architecture (for whatever the meaning of "fine" is).

    Note the usual APIC error interrupt handler is smp_error_interrupt().
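The ordering rules described here (write 0 then read on PentiumPro and later, read only on the original Pentium, and never a leading write on Pentium parts whose ESR is erroneously writable) can be captured in a small simulation. Everything below is a hypothetical model of the three behaviours, not kernel code. In the kernel the leading write is, as far as I can tell, skipped for Pentium-class APICs by checking the number of LVT entries; this sketch reduces that decision to a `write_first` flag.

```c
#include <stdbool.h>
#include <stdint.h>

enum esr_model {
    ESR_P6,          /* PentiumPro on: a write copies latched errors into ESR */
    ESR_PENTIUM,     /* Pentium: ESR read-only; a read returns and clears it */
    ESR_PENTIUM_RW   /* Pentium erratum: ESR writable, a write of 0 wipes it */
};

struct esr_sim {
    enum esr_model model;
    uint32_t latch;    /* P6 only: errors recorded but not yet visible */
    uint32_t visible;  /* value a read of the ESR returns */
};

/* Hardware side: the APIC detects an error condition. */
void esr_sim_raise(struct esr_sim *s, uint32_t bits)
{
    if (s->model == ESR_P6)
        s->latch |= bits;
    else
        s->visible |= bits;
}

void esr_sim_write(struct esr_sim *s, uint32_t val)
{
    switch (s->model) {
    case ESR_P6:
        s->visible = s->latch;   /* the value written is ignored */
        s->latch = 0;
        break;
    case ESR_PENTIUM:
        break;                   /* read-only register: write ignored */
    case ESR_PENTIUM_RW:
        s->visible = val;        /* writing 0 loses the recorded errors */
        break;
    }
}

uint32_t esr_sim_read(struct esr_sim *s)
{
    uint32_t v = s->visible;

    if (s->model != ESR_P6)
        s->visible = 0;          /* Pentium: the read clears as a side effect */
    return v;
}

/* Driver side: the "write 0, then read" sequence, with the leading write
 * made optional as it must be on Pentium-class APICs. */
uint32_t esr_fetch(struct esr_sim *s, bool write_first)
{
    if (write_first)
        esr_sim_write(s, 0);
    return esr_sim_read(s);
}
```

Since the two Pentium variants cannot be told apart cheaply, skipping the write on all of them is the safe choice: it costs nothing on the well-behaved parts and avoids losing the recorded errors on the buggy ones.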

    > > I have asked this question already: what kind of CPU are you running on?
    > > Do you really need to have CONFIG_X86_GOOD_APIC clear with it?
    > >

    > My testings that triggered that were with qemu, with randconfigs. Probably it
    > has a good apic, but it is good that it triggered anyway. Otherwise I'd never
    > see it.


    Ah, I see -- it may be worth checking what actual hardware does and
    fixing QEMU if necessary for it to match reality then. ;-)

    OTOH, if actual modern hardware triggered such an error, then for the
    sake of a generic "runs everywhere" kernel either ack_APIC_irq() or even
    apic_write_around() could be modified to perform a run-time check if
    configured with !CONFIG_X86_GOOD_APIC and avoid the read if unnecessary;
    it's an erratum workaround after all, and SMP Pentium systems suffering
    from this bug (UP Pentium systems were not affected, nor did they normally
    have a way to enable the local APIC) are probably an insignificant
    minority, if any are left at all these days. Therefore it should be a
    negligible sacrifice of performance.
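The run-time check suggested here could look roughly like this: the workaround read is performed only when a flag, set once during boot for affected SMP Pentium systems, says it is needed, so every other CPU takes the plain-write path at the cost of a single branch. This is a hypothetical sketch; `apic_needs_read_around`, `set_apic_erratum_workaround()` and the fake MMIO accessors are invented names, and a real version would use the kernel's own accessors and CPU identification.

```c
#include <stdbool.h>
#include <stdint.h>

#define APIC_EOI 0xB0u   /* End Of Interrupt register offset */

/* Hypothetical flag: true only on CPUs that need the erratum workaround. */
static bool apic_needs_read_around;

/* Fake MMIO accessors that just count accesses (hypothetical). */
static unsigned mmio_reads, mmio_writes;

static uint32_t fake_apic_read(uint32_t reg)
{
    (void)reg;
    mmio_reads++;
    return 0;
}

static void fake_apic_write(uint32_t reg, uint32_t val)
{
    (void)reg;
    (void)val;
    mmio_writes++;
}

/* Called once at boot, e.g. after identifying an affected SMP Pentium. */
void set_apic_erratum_workaround(bool affected)
{
    apic_needs_read_around = affected;
}

/* EOI with the workaround applied only where it is needed. */
void ack_apic_irq_checked(void)
{
    if (apic_needs_read_around)
        (void)fake_apic_read(APIC_EOI);  /* dummy read ahead of the write */
    fake_apic_write(APIC_EOI, 0);        /* write 0, as the docs recommend */
}
```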

    Maciej

  9. Re: [PATCH 45/79] [PATCH] fix apic acking of irqs

    Maciej W. Rozycki wrote:
    >
    > Ah, I see -- it may be worth checking what actual hardware does and
    > fixing QEMU if necessary for it to match reality then. ;-)
    >
    > OTOH, if actual modern hardware triggered such an error, then for the
    > sake of a generic "runs everywhere" kernel either ack_APIC_irq() or even
    > apic_write_around() could be modified to perform a run-time check if
    > configured with !CONFIG_X86_GOOD_APIC and avoid the read if unnecessary;
    > it's an erratum workaround after all and SMP Pentium systems suffering
    > from this bug (UP Pentium systems did not nor had a way to enable the
    > local APIC normally) are probably an insignificant minority if any at all
    > left these days. Therefore it should be a negligible sacrifice of
    > performance.

    I just tested on some real i386 machines I have around here, and reading
    the EOI register does not seem to be illegal (at least on those I've tested).

    OTOH, ignoring the read in qemu makes the tree boot just fine. So I
    agree with you now: we might as well fix qemu and revert this patch.

    Ingo, any word on that?


