AGP and PAT (induced?) problem (on AMD family 6) - Kernel


Thread: AGP and PAT (induced?) problem (on AMD family 6)

  1. AGP and PAT (induced?) problem (on AMD family 6)

    Hi Dave.

    A while ago I sent a message about long AGP delays upon starting and
    exiting X:

    http://marc.info/?l=linux-kernel&m=121647129632110&w=2

    There was no reply (if that was due to the linux.ie address, could you
    perhaps update it in MAINTAINERS?) but today Shaohua Li posted a patch
    that made me wonder about PAT in this context:

    http://marc.info/?l=linux-kernel&m=121783222306075&w=2
    http://marc.info/?l=linux-kernel&m=121783222406078&w=2
    http://marc.info/?l=linux-kernel&m=121783222406081&w=2

    His patch does not solve anything appreciable for me -- the delays are
    still as described in that previous post, with an exception for (with
    Option "AGPSize" "64") delays upon exiting X that are now sometimes as
    bad as a full 12 seconds.

    What _does_ solve this though is booting with the "nopat" command line
    parameter. I'm on 2.6.26.1 and have enabled PAT for my AMD Duron myself.
    With "nopat", there's no problem to be seen anymore -- exiting X
    specifically is instantaneous.

    With or without PAT, my /proc/mtrr is always:

    reg00: base=0x00000000 ( 0MB), size= 512MB: write-back, count=1
    reg01: base=0x20000000 ( 512MB), size= 256MB: write-back, count=1
    reg02: base=0xe8000000 (3712MB), size= 64MB: write-combining, count=1

    under X joined by:

    reg03: base=0xe4000000 (3648MB), size= 32MB: write-combining, count=2

    This is a machine with 768M, the AGP aperture set to 64MB and a 32MB
    Matrox Millennium G550 AGP card. More detail in previous post.

    Is this something inherent to PAT? Inherent to PAT on AMD family 6?
    Inherent to DRM/AGP with PAT? On AMD family 6?

    This is probably fairly important to get sorted because although I don't
    know what's where at the moment, last I saw was a patch in x86/tip that
    enabled PAT on many more models including all of AMD.

    For reference, /proc/cpuinfo:

    processor : 0
    vendor_id : AuthenticAMD
    cpu family : 6
    model : 7
    model name : AMD Duron(tm) Processor
    stepping : 1
    cpu MHz : 1313.094
    cache size : 64 KB
    fdiv_bug : no
    hlt_bug : no
    f00f_bug : no
    coma_bug : no
    fpu : yes
    fpu_exception : yes
    cpuid level : 1
    wp : yes
    flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
    pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow
    bogomips : 2628.89
    clflush size : 32
    power management: ts

    and the PAT enabler patch that I apply locally to 2.6.26:

    diff --git a/arch/x86/kernel/cpu/addon_cpuid_features.c b/arch/x86/kernel/cpu/addon_cpuid_features.c
    index c2e1ce3..8992282 100644
    --- a/arch/x86/kernel/cpu/addon_cpuid_features.c
    +++ b/arch/x86/kernel/cpu/addon_cpuid_features.c
    @@ -55,7 +55,7 @@ void __cpuinit validate_pat_support(struct cpuinfo_x86 *c)
     {
     	switch (c->x86_vendor) {
     	case X86_VENDOR_AMD:
    -		if (c->x86 >= 0xf && c->x86 <= 0x11)
    +		if (c->x86 == 6 || c->x86 >= 0xf)
     			return;
     		break;
     	case X86_VENDOR_INTEL:

    Rene.
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: AGP and PAT (induced?) problem (on AMD family 6)

    On Mon, Aug 04, 2008 at 06:30:32PM +0200, Rene Herman wrote:
    > What _does_ solve this though is booting with the "nopat" command line
    > parameter. I'm on 2.6.26.1 and have enabled PAT for my AMD Duron myself.
    > With "nopat", there's no problem to be seen anymore -- exiting X
    > specifically is instantaneous.
    >
    > With or without PAT, my /proc/mtrr is always:
    >
    > reg00: base=0x00000000 ( 0MB), size= 512MB: write-back, count=1
    > reg01: base=0x20000000 ( 512MB), size= 256MB: write-back, count=1
    > reg02: base=0xe8000000 (3712MB), size= 64MB: write-combining, count=1
    >
    > under X joined by:
    >
    > reg03: base=0xe4000000 (3648MB), size= 32MB: write-combining, count=2


    To get some more debug data, can you please retest with latest kernel
    (2.6.27-rc2) using "debugpat" kernel option and provide dmesg
    output plus contents of x86/pat_memtype_list in debugfs?


    Thanks,

    Andreas



  3. Re: AGP and PAT (induced?) problem (on AMD family 6)

    On 06-08-08 15:51, Andreas Herrmann wrote:

    > On Mon, Aug 04, 2008 at 06:30:32PM +0200, Rene Herman wrote:
    >> What _does_ solve this though is booting with the "nopat" command line
    >> parameter. I'm on 2.6.26.1 and have enabled PAT for my AMD Duron myself.
    >> With "nopat", there's no problem to be seen anymore -- exiting X
    >> specifically is instantaneous.
    >>
    >> With or without PAT, my /proc/mtrr is always:
    >>
    >> reg00: base=0x00000000 ( 0MB), size= 512MB: write-back, count=1
    >> reg01: base=0x20000000 ( 512MB), size= 256MB: write-back, count=1
    >> reg02: base=0xe8000000 (3712MB), size= 64MB: write-combining, count=1
    >>
    >> under X joined by:
    >>
    >> reg03: base=0xe4000000 (3648MB), size= 32MB: write-combining, count=2

    >
    > To get some more debug data, can you please retest with latest kernel
    > (2.6.27-rc2)


    Problem present on vanilla -rc2.

    > using "debugpat" kernel option and provide dmesg output


    No... my kernel message buffer isn't large enough for that :-(

    Right, I guess I now know where the delay is coming from. I suppose this
    is not expected. dmesg as captured after starting X and without
    "debugpat" at:

    http://members.home.nl/rene.herman/pat/dmesg.x

    Truncated dmesg with "debugpat":

    http://members.home.nl/rene.herman/pat/dmesg.x.debugpat

    > plus contents of x86/pat_memtype_list in debugfs?


    Before starting X (1K):

    http://members.home.nl/rene.herman/p...nsole.debugpat

    After starting X (625K):

    http://members.home.nl/rene.herman/p...ist.x.debugpat

    (This is with 64MB AGP memory)

    More data:

    http://members.home.nl/rene.herman/p...27-rc2-current
    http://members.home.nl/rene.herman/pat/xorg.conf
    http://members.home.nl/rene.herman/pat/Xorg.0.log

    Thanks,
    Rene

  4. Re: AGP and PAT (induced?) problem (on AMD family 6)



    Please note this is a 2.6.27 problem (given that PAT isn't enabled by
    default, not a _pure_ regression I guess, but still).

    I also still don't know if you (Andreas), Dave or Yinghai should be the
    To: on this but given that you've been the only one to react at all...

    On 06-08-08 22:57, Rene Herman wrote:

    > On 06-08-08 15:51, Andreas Herrmann wrote:
    >
    >> On Mon, Aug 04, 2008 at 06:30:32PM +0200, Rene Herman wrote:
    >>> What _does_ solve this though is booting with the "nopat" command
    >>> line parameter. I'm on 2.6.26.1 and have enabled PAT for my AMD Duron
    >>> myself. With "nopat", there's no problem to be seen anymore --
    >>> exiting X specifically is instantaneous.
    >>>
    >>> With or without PAT, my /proc/mtrr is always:
    >>>
    >>> reg00: base=0x00000000 ( 0MB), size= 512MB: write-back, count=1
    >>> reg01: base=0x20000000 ( 512MB), size= 256MB: write-back, count=1
    >>> reg02: base=0xe8000000 (3712MB), size= 64MB: write-combining, count=1
    >>>
    >>> under X joined by:
    >>>
    >>> reg03: base=0xe4000000 (3648MB), size= 32MB: write-combining, count=2

    >>
    >> To get some more debug data, can you please retest with latest kernel
    >> (2.6.27-rc2)

    >
    > Problem present on vanilla -rc2.
    >
    >> using "debugpat" kernel option and provide dmesg output

    >
    > No... my kernel message buffer isn't large enough for that :-(
    >
    > Right, I guess I now know where the delay is coming from. I suppose this
    > is not expected. dmesg as captured after starting X and without
    > "debugpat" at:
    >
    > http://members.home.nl/rene.herman/pat/dmesg.x
    >
    > Truncated dmesg with "debugpat":
    >
    > http://members.home.nl/rene.herman/pat/dmesg.x.debugpat
    >
    >> plus contents of x86/pat_memtype_list in debugfs?

    >
    > Before starting X (1K):
    >
    > http://members.home.nl/rene.herman/p...nsole.debugpat
    >
    > After starting X (625K):
    >
    > http://members.home.nl/rene.herman/p...ist.x.debugpat
    >
    > (This is with 64MB AGP memory)
    >
    > More data:
    >
    > http://members.home.nl/rene.herman/p...27-rc2-current
    > http://members.home.nl/rene.herman/pat/xorg.conf
    > http://members.home.nl/rene.herman/pat/Xorg.0.log


  5. Re: AGP and PAT (induced?) problem (on AMD family 6)


    (more people Cc:-ed)

    * Rene Herman wrote:

    > Hi Dave.
    >
    > A while ago I sent a message about long AGP delays upon starting and
    > exiting X:
    >
    > http://marc.info/?l=linux-kernel&m=121647129632110&w=2
    >
    > There was no reply (if that was due to the linux.ie address, could you
    > perhaps update it in MAINTAINERS?) but today Shaohua Li posted a patch
    > that made me wonder about PAT in this context:
    >
    > http://marc.info/?l=linux-kernel&m=121783222306075&w=2
    > http://marc.info/?l=linux-kernel&m=121783222406078&w=2
    > http://marc.info/?l=linux-kernel&m=121783222406081&w=2
    >
    > His patch does not solve anything appreciable for me -- the delays are
    > still as described in that previous post, with an exception for (with
    > Option "AGPSize" "64") delays upon exiting X that are now sometimes as
    > bad as a full 12 seconds.
    >
    > What _does_ solve this though is booting with the "nopat" command line
    > parameter. I'm on 2.6.26.1 and have enabled PAT for my AMD Duron myself.
    > With "nopat", there's no problem to be seen anymore -- exiting X
    > specifically is instantaneous.
    >
    > With or without PAT, my /proc/mtrr is always:
    >
    > reg00: base=0x00000000 ( 0MB), size= 512MB: write-back, count=1
    > reg01: base=0x20000000 ( 512MB), size= 256MB: write-back, count=1
    > reg02: base=0xe8000000 (3712MB), size= 64MB: write-combining, count=1
    >
    > under X joined by:
    >
    > reg03: base=0xe4000000 (3648MB), size= 32MB: write-combining, count=2
    >
    > This is a machine with 768M, the AGP aperture set to 64MB and a 32MB
    > Matrox Millennium G550 AGP card. More detail in previous post.
    >
    > Is this something inherent to PAT? Inherent to PAT on AMD family 6?
    > Inherent to DRM/AGP with PAT? On AMD family 6?
    >
    > This is probably fairly important to get sorted because although I don't
    > know what's where at the moment, last I saw was a patch in x86/tip that
    > enabled PAT on many more models including all of AMD.
    >
    > For reference, /proc/cpuinfo:
    >
    > processor : 0
    > vendor_id : AuthenticAMD
    > cpu family : 6
    > model : 7
    > model name : AMD Duron(tm) Processor
    > stepping : 1
    > cpu MHz : 1313.094
    > cache size : 64 KB
    > fdiv_bug : no
    > hlt_bug : no
    > f00f_bug : no
    > coma_bug : no
    > fpu : yes
    > fpu_exception : yes
    > cpuid level : 1
    > wp : yes
    > flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
    > pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow
    > bogomips : 2628.89
    > clflush size : 32
    > power management: ts
    >
    > and the PAT enabler patch that I apply locally to 2.6.26:
    >
    > diff --git a/arch/x86/kernel/cpu/addon_cpuid_features.c
    > b/arch/x86/kernel/cpu/addon_cpuid_features.c
    > index c2e1ce3..8992282 100644
    > --- a/arch/x86/kernel/cpu/addon_cpuid_features.c
    > +++ b/arch/x86/kernel/cpu/addon_cpuid_features.c
    > @@ -55,7 +55,7 @@ void __cpuinit validate_pat_support(struct cpuinfo_x86 *c)
    > {
    > switch (c->x86_vendor) {
    > case X86_VENDOR_AMD:
    > - if (c->x86 >= 0xf && c->x86 <= 0x11)
    > + if (c->x86 == 6 || c->x86 >= 0xf)
    > return;
    > break;
    > case X86_VENDOR_INTEL:


    agreed - a +12 second wait suggests some rather fundamental breakage. Did
    we go back to uncached for some critical display area that makes X start
    up (shut down) that slowly? Did we mark the BIOS uncacheable perhaps,
    causing X to execute BIOS code very slowly?

    Ingo

  6. Re: AGP and PAT (induced?) problem (on AMD family 6)

    On 15-08-08 16:22, Ingo Molnar wrote:

    > (more people Cc:-ed)


    Thank you. Additional information at http://lkml.org/lkml/2008/8/6/449

    > agreed - a +12 second wait suggests some rather fundamental breakage.
    > Did we go back to uncached for some critical display area that makes
    > X start up (shut down) that slowly? Did we mark the BIOS uncacheable
    > perhaps, causing X to execute BIOS code very slowly?


    Quite a lot "uncached-minus" in those lists. I am desperately trying to
    avoid a clue about mostly anything graphics related so, "I dunno".

    I haven't just disabled PAT yet (although I was about to just do so) and
    am available for testing.

    Rene.

  7. Re: AGP and PAT (induced?) problem (on AMD family 6)

    On 15-08-08 17:24, Rene Herman wrote:

    > On 15-08-08 16:22, Ingo Molnar wrote:
    >
    >> (more people Cc:-ed)

    >
    > Thank you. Additional information at http://lkml.org/lkml/2008/8/6/449
    >
    >> agreed - a +12 second wait suggests some rather fundamental breakage.
    >> Did we go back to uncached for some critical display area that makes
    >> X start up (shut down) that slowly? Did we mark the BIOS uncacheable
    >> perhaps, causing X to execute BIOS code very slowly?

    >
    > Quite a lot "uncached-minus" in those lists. I am desperately trying to
    > avoid a clue about mostly anything graphics related so, "I dunno".
    >
    > I haven't just disabled PAT yet (although I was about to just do so) and
    > am available for testing.




    Additional observation with respect to first and subsequent shutdowns:

    As calibration times: with Option "AGPSize" "64", and booted with
    "nopat", X startup (from startx to functional desktop) takes
    approximately 5 seconds and shutdown 1 second.

    Booted without "nopat", X startup seems to alternate between 10+ and 16+
    seconds. As for shutdown, the first one after boot takes some 14 seconds
    total; subsequent shutdowns settle at around 5 seconds.

    Rene.

  8. Re: AGP and PAT (induced?) problem (on AMD family 6)


    * Rene Herman wrote:

    > On 15-08-08 17:24, Rene Herman wrote:
    >
    >> On 15-08-08 16:22, Ingo Molnar wrote:
    >>
    >>> (more people Cc:-ed)

    >>
    >> Thank you. Additional information at http://lkml.org/lkml/2008/8/6/449
    >>
    >>> agreed - a +12 second wait suggests some rather fundamental breakage.
    >>> Did we go back to uncached for some critical display area that makes
    >>> X start up (shut down) that slowly? Did we mark the BIOS uncacheable
    >>> perhaps, causing X to execute BIOS code very slowly?

    >>
    >> Quite a lot "uncached-minus" in those lists. I am desperately trying to
    >> avoid a clue about mostly anything graphics related so, "I dunno".
    >>
    >> I haven't just disabled PAT yet (although I was about to just do so)
    >> and am available for testing.

    >
    >
    >
    > Additional observation with respect to first,next shutdown:
    >
    > With Option "AGPSize" "64", and booted with "nopat", X startup (from
    > startx to functional desktop) is approximately 5 seconds,
    > shutdown is 1 second as calibration times.
    >
    > Booted without "nopat", X startup seems to alternate between 10+ and
    > 16+ seconds and for shutdown -- the first shutdown after boot takes
    > some 14 seconds total, subsequent shutdowns settle at around 5
    > seconds.


    would it be possible to start up and shut down X in the slow case via
    strace, by doing something like this:

    strace -f -ttt -TTT -o trace.log startx

    and see which system calls (or other activities) took suspiciously long?

    Ingo

  9. Re: AGP and PAT (induced?) problem (on AMD family 6)

    On 19-08-08 12:26, Ingo Molnar wrote:

    > * Rene Herman wrote:
    >
    >> On 15-08-08 17:24, Rene Herman wrote:
    >>
    >>> On 15-08-08 16:22, Ingo Molnar wrote:
    >>>
    >>>> (more people Cc:-ed)
    >>> Thank you. Additional information at http://lkml.org/lkml/2008/8/6/449
    >>>
    >>>> agreed - a +12 second wait suggests some rather fundamental breakage.
    >>>> Did we go back to uncached for some critical display area that makes
    >>>> X start up (shut down) that slowly? Did we mark the BIOS uncacheable
    >>>> perhaps, causing X to execute BIOS code very slowly?
    >>> Quite a lot "uncached-minus" in those lists. I am desperately trying to
    >>> avoid a clue about mostly anything graphics related so, "I dunno".
    >>>
    >>> I haven't just disabled PAT yet (although I was about to just do so)
    >>> and am available for testing.

    >>
    >>
    >> Additional observation with respect to first,next shutdown:
    >>
    >> With Option "AGPSize" "64", and booted with "nopat", X startup (from
    >> startx to functional desktop) is approximately 5 seconds,
    >> shutdown is 1 second as calibration times.
    >>
    >> Booted without "nopat", X startup seems to alternate between 10+ and
    >> 16+ seconds and for shutdown -- the first shutdown after boot takes
    >> some 14 seconds total, subsequent shutdowns settle at around 5
    >> seconds.

    >
    > would it be possible to start up and shut down X in the slow case via
    > strace, by doing something like this:
    >
    > strace -f -ttt -TTT -o trace.log startx
    >
    > and see which system calls (or other activities) took suspiciously long?


    It wouldn't, it seems. Running X as root (needed for the strace) works
    fine, but started this way it hangs indefinitely.

    I believe the drop from 14 seconds for the first shutdown to 5 for later
    ones might be telling. Sounds like something might have fixed up
    uncached entries.

    I'd really like a reply from the AGP or PAT side right about now.

    Rene.

  10. Re: AGP and PAT (induced?) problem (on AMD family 6)

    On Tue, Aug 19, 2008 at 07:19:44AM -0700, Rene Herman wrote:
    > On 19-08-08 12:26, Ingo Molnar wrote:
    >
    > > * Rene Herman wrote:
    > >
    > >> On 15-08-08 17:24, Rene Herman wrote:
    > >>
    > >>> On 15-08-08 16:22, Ingo Molnar wrote:
    > >>>
    > >>>> (more people Cc:-ed)
    > >>> Thank you. Additional information at http://lkml.org/lkml/2008/8/6/449
    > >>>
    > >>>> agreed - a +12 second wait suggests some rather fundamental breakage.
    > >>>> Did we go back to uncached for some critical display area that makes
    > >>>> X start up (shut down) that slowly? Did we mark the BIOS uncacheable
    > >>>> perhaps, causing X to execute BIOS code very slowly?
    > >>> Quite a lot "uncached-minus" in those lists. I am desperately trying to
    > >>> avoid a clue about mostly anything graphics related so, "I dunno".
    > >>>
    > >>> I haven't just disabled PAT yet (although I was about to just do so)
    > >>> and am available for testing.
    > >>
    > >>
    > >> Additional observation with respect to first,next shutdown:
    > >>
    > >> With Option "AGPSize" "64", and booted with "nopat", X startup (from
    > >> startx to functional desktop) is approximately 5 seconds,
    > >> shutdown is 1 second as calibration times.
    > >>
    > >> Booted without "nopat", X startup seems to alternate between 10+ and
    > >> 16+ seconds and for shutdown -- the first shutdown after boot takes
    > >> some 14 seconds total, subsequent shutdowns settle at around 5
    > >> seconds.

    > >
    > > would it be possible to start up and shut down X in the slow case via
    > > strace, by doing something like this:
    > >
    > > strace -f -ttt -TTT -o trace.log startx
    > >
    > > and see which system calls (or other activities) took suspiciously long?

    >
    > It wouldn't it seems. Root X (needed for the strace) works fine but
    > started this way hangs indefinitely.
    >
    > I believe the 14 seconds for first shutdown to 5 later might be telling.
    > Sounds like something might have fixed up uncached entries.
    >
    > I'd really like a reply from the AGP or PAT side right about now.
    >


    Hmm. Looks like there are more than 16000 entries in the PAT list!

    This delay may be due to the overhead of parsing this linked list every
    time for a new entry, rather than any problem with cache setting itself.

    I am working on a patch to optimize this pat list parsing for the simple case.
    Should be able to send it out later today, for testing.
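The overhead described above can be illustrated with a small model. This is only a sketch, not the kernel code: `struct range_node`, `insert_from_head` and `count_head_scan_steps` are hypothetical names, the list is singly linked for brevity, and the assumption that pages are reserved one at a time in ascending address order is made purely to get a simple count.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for one PAT list entry: a reserved range [start, end). */
struct range_node {
	uint64_t start, end;
	struct range_node *next;
};

/*
 * Insert [start, end) into a list kept sorted by start, always scanning
 * from the head. Returns how many existing nodes were stepped over,
 * or -1 if the allocation failed.
 */
long insert_from_head(struct range_node **head, uint64_t start, uint64_t end)
{
	struct range_node **link = head;
	struct range_node *node;
	long visited = 0;

	assert(start < end);
	while (*link && (*link)->start < start) {
		link = &(*link)->next;
		visited++;
	}

	node = malloc(sizeof(*node));
	if (!node)
		return -1;
	node->start = start;
	node->end = end;
	node->next = *link;
	*link = node;
	return visited;
}

/*
 * Reserve n consecutive 4K pages one at a time, lowest address first,
 * and return the total number of nodes stepped over (-1 on failure).
 */
long count_head_scan_steps(long n)
{
	struct range_node *head = NULL, *next;
	long total = 0;

	for (long i = 0; i < n && total >= 0; i++) {
		long v = insert_from_head(&head, (uint64_t)i * 4096,
					  (uint64_t)(i + 1) * 4096);
		total = v < 0 ? -1 : total + v;
	}
	for (; head; head = next) {
		next = head->next;
		free(head);
	}
	return total;
}
```

Under these assumptions n insertions step over n(n-1)/2 nodes in total, which for the 16384 entries mentioned above is roughly 134 million list steps performed under the lock.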

    Thanks,
    Venki


  11. Re: AGP and PAT (induced?) problem (on AMD family 6)

    On 19-08-08 21:07, Venki Pallipadi wrote:

    > On Tue, Aug 19, 2008 at 07:19:44AM -0700, Rene Herman wrote:


    >> I believe the 14 seconds for first shutdown to 5 later might be
    >> telling. Sounds like something might have fixed up uncached
    >> entries.
    >>
    >> I'd really like a reply from the AGP or PAT side right about now.

    >
    > Hmm. Looks like there are more than 16000 entries in the PAT list!
    >
    > This delay may be due to the overhead of parsing this linked list
    > every time for a new entry, rather than any problem with cache setting
    > itself.
    >
    > I am working on a patch to optimize this pat list parsing for the
    > simple case. Should be able to send it out later today, for testing.


    Thanks for the reply. It's with 64MB of AGP memory which I guess is at
    the low end these days. Would your reply mean that basically everyone on
    2.6.27 should now be experiencing this?

    I noticed it was PAT related due to Shaohua Li's:

    http://marc.info/?l=linux-kernel&m=121783222306075&w=2

    which lists very different times (patch there did not help any).

    As another by the way, probably not surprising but I earlier also tried
    both unmounting and completely compiling out debugfs just in case I
    was seeing a debugging-related symptom. No help either.

    It's evening here so I'll probably not be able to test until tomorrow.

    Rene.

  12. Re: AGP and PAT (induced?) problem (on AMD family 6)

    On Tue, Aug 19, 2008 at 12:22:10PM -0700, Rene Herman wrote:
    > On 19-08-08 21:07, Venki Pallipadi wrote:
    >
    > > On Tue, Aug 19, 2008 at 07:19:44AM -0700, Rene Herman wrote:

    >
    > >> I believe the 14 seconds for first shutdown to 5 later might be
    > >> telling. Sounds like something might have fixed up uncached
    > >> entries.
    > >>
    > >> I'd really like a reply from the AGP or PAT side right about now.

    > >
    > > Hmm. Looks like there are more than 16000 entries in the PAT list!
    > >
    > > This delay may be due to the overhead of parsing this linked list
    > > every time for a new entry, rather than any problem with cache setting
    > > itself.
    > >
    > > I am working on a patch to optimize this pat list parsing for the
    > > simple case. Should be able to send it out later today, for testing.

    >
    > Thanks for the reply. It's with 64MB of AGP memory which I guess is at
    > the low end these days. Would your reply mean that basically everyone on
    > 2.6.27 should now be experiencing this?
    >
    > I noticed it was PAT related due to Shaohua Li's:
    >
    > http://marc.info/?l=linux-kernel&m=121783222306075&w=2
    >
    > which lists very different times (patch there did not help any).
    >
    > As another by the way, probably not surprising but I earlier also tried
    > both unmounting and completely compiling out debugfs just in case I
    > was seeing a debugging-related symptom. No help either.
    >
    > It's evening here so I'll probably not be able to test until tomorrow.
    >


    Below is the patch I am testing. Let me know if this patch helps.

    Thanks,
    Venki


    Test patch. Adds cached_entry to list add routine, in order to speed up the
    lookup for sequential reserve_memtype calls.

    Signed-off-by: Venkatesh Pallipadi

    ---
    arch/x86/mm/pat.c | 33 +++++++++++++++++++++++++++++++--
    1 file changed, 31 insertions(+), 2 deletions(-)

    Index: linux-2.6/arch/x86/mm/pat.c
    ===================================================================
    --- linux-2.6.orig/arch/x86/mm/pat.c 2008-08-19 15:21:07.000000000 -0700
    +++ linux-2.6/arch/x86/mm/pat.c 2008-08-19 16:00:52.000000000 -0700
    @@ -207,6 +207,9 @@ static int chk_conflict(struct memtype *
     	return -EBUSY;
     }

    +static struct memtype *cached_entry;
    +static u64 cached_start;
    +
     /*
      * req_type typically has one of the:
      * - _PAGE_CACHE_WB
    @@ -280,11 +283,17 @@ int reserve_memtype(u64 start, u64 end,

     	spin_lock(&memtype_lock);

    +	if (cached_entry && start >= cached_start)
    +		entry = cached_entry;
    +	else
    +		entry = list_entry(&memtype_list, struct memtype, nd);
    +
     	/* Search for existing mapping that overlaps the current range */
     	where = NULL;
    -	list_for_each_entry(entry, &memtype_list, nd) {
    +	list_for_each_entry_continue(entry, &memtype_list, nd) {
     		if (end <= entry->start) {
     			where = entry->nd.prev;
    +			cached_entry = list_entry(where, struct memtype, nd);
     			break;
     		} else if (start <= entry->start) { /* end > entry->start */
     			err = chk_conflict(new, entry, new_type);
    @@ -292,6 +301,8 @@ int reserve_memtype(u64 start, u64 end,
     				dprintk("Overlap at 0x%Lx-0x%Lx\n",
     					entry->start, entry->end);
     				where = entry->nd.prev;
    +				cached_entry = list_entry(where,
    +							struct memtype, nd);
     			}
     			break;
     		} else if (start < entry->end) { /* start > entry->start */
    @@ -299,7 +310,20 @@ int reserve_memtype(u64 start, u64 end,
     			if (!err) {
     				dprintk("Overlap at 0x%Lx-0x%Lx\n",
     					entry->start, entry->end);
    -				where = &entry->nd;
    +				cached_entry = list_entry(entry->nd.prev,
    +							struct memtype, nd);
    +
    +				/*
    +				 * Move to right position in the linked
    +				 * list to add this new entry
    +				 */
    +				list_for_each_entry_continue(entry,
    +							&memtype_list, nd) {
    +					if (start <= entry->start) {
    +						where = entry->nd.prev;
    +						break;
    +					}
    +				}
     			}
     			break;
     		}
    @@ -314,6 +338,8 @@ int reserve_memtype(u64 start, u64 end,
     		return err;
     	}

    +	cached_start = start;
    +
     	if (where)
     		list_add(&new->nd, where);
     	else
    @@ -343,6 +369,9 @@ int free_memtype(u64 start, u64 end)
     	spin_lock(&memtype_lock);
     	list_for_each_entry(entry, &memtype_list, nd) {
     		if (entry->start == start && entry->end == end) {
    +			if (cached_entry == entry || cached_start == start)
    +				cached_entry = NULL;
    +
     			list_del(&entry->nd);
     			kfree(entry);
     			err = 0;
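The idea behind the test patch above, remembering where the previous insertion ended up so that a sequential follow-up call need not rescan from the head, can be sketched in isolation. This is a simplified user-space model and not the patch itself: `struct hinted_list`, `insert_with_hint` and `count_hinted_steps` are hypothetical names, the list is singly linked, overlap checking is left out, and there is no removal path (the patch additionally resets its cached entry in free_memtype()).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for one PAT list entry: a reserved range [start, end). */
struct range_node {
	uint64_t start, end;
	struct range_node *next;
};

/* Sorted list plus a remembered position, loosely analogous to cached_entry. */
struct hinted_list {
	struct range_node *head;
	struct range_node *hint;	/* most recently inserted node, or NULL */
};

/*
 * Insert [start, end), keeping the list sorted by start. Returns how many
 * existing nodes were stepped over, or -1 if the allocation failed.
 */
long insert_with_hint(struct hinted_list *list, uint64_t start, uint64_t end)
{
	struct range_node **link = &list->head;
	struct range_node *node;
	long visited = 0;

	assert(start < end);
	/*
	 * The list is sorted, so every node up to and including the hint
	 * starts at or below hint->start. If the new range starts at or
	 * beyond that, the scan can resume right after the hint.
	 */
	if (list->hint && start >= list->hint->start)
		link = &list->hint->next;

	while (*link && (*link)->start < start) {
		link = &(*link)->next;
		visited++;
	}

	node = malloc(sizeof(*node));
	if (!node)
		return -1;
	node->start = start;
	node->end = end;
	node->next = *link;
	*link = node;
	list->hint = node;
	return visited;
}

/* Reserve n consecutive 4K pages in ascending order; return total steps. */
long count_hinted_steps(long n)
{
	struct hinted_list list = { NULL, NULL };
	struct range_node *next;
	long total = 0;

	for (long i = 0; i < n && total >= 0; i++) {
		long v = insert_with_hint(&list, (uint64_t)i * 4096,
					  (uint64_t)(i + 1) * 4096);
		total = v < 0 ? -1 : total + v;
	}
	for (; list.head; list.head = next) {
		next = list.head->next;
		free(list.head);
	}
	return total;
}
```

In this model a run of ascending insertions steps over no nodes at all, while an insertion below the remembered position falls back to a scan from the head.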

  13. Re: AGP and PAT (induced?) problem (on AMD family 6)


    * Venki Pallipadi wrote:

    > > I'd really like a reply from the AGP or PAT side right about now.

    >
    > Hmm. Looks like there are more than 16000 entries in the PAT list!


    hm, btw., why is that?

    Ingo

  14. Re: AGP and PAT (induced?) problem (on AMD family 6)


    * Venki Pallipadi wrote:

    > Below is the patch I am testing. Let me know if this patch helps.


    i've queued this fix up in tip/x86/urgent for more testing - as ~10
    seconds delays are serious enough to warrant a quick fix.

    Rene, you might want to try tip/master, which has this integrated as
    well:

    http://people.redhat.com/mingo/tip.git/README

    Ingo

  15. Re: AGP and PAT (induced?) problem (on AMD family 6)

    On 20-08-08 12:04, Ingo Molnar wrote:

    > * Venki Pallipadi wrote:
    >
    >>> I'd really like a reply from the AGP or PAT side right about now.

    >> Hmm. Looks like there are more than 16000 entries in the PAT list!

    >
    > hm, btw., why is that?


    Because 64M of AGP memory divided by 4K pages is 16K. That is, the
    underlying problem seems to be AGP drivers using order 0 allocations.
    I'm looking.

    Do note also that this means that Venki's change would not constitute a
    correct/final fix. Sure, caching the last entry speeds up traversing a
    16K entry list but the issue is that there shouldn't be a 16K entry
    list. That should be fixed through AGP, or maybe even by coalescing
    entries in the PAT list if that's at all possible (I guess it's not
    really).

    Even if such a more fundamental fix isn't (easily) available, the PAT
    code already comments that the list, which is sorted by ->start value,
    is expected to be short, and should be turned into an rbtree if it isn't
    which might be slightly less of a bandaid.
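
The difference between scanning from the head and starting at a cached entry can be shown with a toy model. This is not the kernel's PAT code or Venki's patch: the function name `entries_visited` is hypothetical, and the model assumes the pages arrive in ascending address order and that the cache simply remembers where the previous entry was inserted.

```python
def entries_visited(starts, use_cache):
    """Insert each start address into a sorted list.

    Returns how many existing entries were stepped over in total.
    """
    sorted_starts = []
    cached = 0      # index of the most recently inserted entry
    visited = 0
    for start in starts:
        i = 0
        # Begin at the cached position when the new entry sorts at or after it.
        if use_cache and sorted_starts and sorted_starts[cached] <= start:
            i = cached
        while i < len(sorted_starts) and sorted_starts[i] < start:
            i += 1
            visited += 1
        sorted_starts.insert(i, start)
        cached = i
    return visited


# 2048 consecutive 4K pages (kept small so the demo runs quickly).
pages = range(0, 2048 * 4096, 4096)
print("scan from head:", entries_visited(pages, use_cache=False))
print("cached entry:  ", entries_visited(pages, use_cache=True))
```

Under these assumptions the uncached walk steps over n(n-1)/2 entries for n pages, which is 134,209,536 for 16384 pages, while the cached walk steps over n-1. The list itself is equally long in both cases, which is the point made above.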

    Dave Airlie (as the MAINTAINERS entry) can't be arsed to answer email it
    seems so I've added Dave Jones for a possible comment from the AGP side.
    If I'm reading this right up to now, many AGP drivers (among which
    my amd-k7-agp) are still affected.

    In the short run and if I'm not just mistaken, the best fix might be to
    make PAT dependent on not having a dumb AGP driver (but as said, still
    looking).

    Note that my chipset is capable of a 2G AGP aperture. That's 512K pages
    if fully used, 256K for 1G, 128K for 512M, ...
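
The page counts quoted above follow from dividing the aperture size by the 4K page size. A minimal check, where the helper name `order0_pages` is made up for illustration:

```python
PAGE_SIZE = 4 * 1024    # 4K pages, as in the message above
MIB = 1024 * 1024


def order0_pages(aperture_bytes, page_size=PAGE_SIZE):
    """Number of single-page (order 0) allocations needed to fill the aperture."""
    return aperture_bytes // page_size


for size_mib in (64, 512, 1024, 2048):
    pages = order0_pages(size_mib * MIB)
    print(f"{size_mib:5d}M aperture -> {pages // 1024}K pages")
```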

    Rene.

  16. Re: AGP and PAT (induced?) problem (on AMD family 6)

    On 20-08-08 12:50, Rene Herman wrote:

    > On 20-08-08 12:04, Ingo Molnar wrote:
    >
    >> * Venki Pallipadi wrote:
    >>
    >>>> I'd really like a reply from the AGP or PAT side right about now.
    >>> Hmm. Looks like there are more than 16000 entries in the PAT list!

    >>
    >> hm, btw., why is that?

    >
    > Because 64M of AGP memory divided by 4K pages is 16K. That is, the
    > underlying problem seems to be AGP drivers using order 0 allocations.
    > I'm looking.
    >
    > Do note also that this means that Venki's change would not constitute a
    > correct/final fix. Sure, caching the last entry speeds up traversing a
    > 16K entry list but the issue is that there shouldn't be a 16K entry
    > list. Through AGP, or maybe even by coalescing entries in the PAT list
    > if that's at all possible (I guess it's not really).
    >
    > Even if such a more fundamental fix isn't (easily) available, the PAT
    > code already comments that the list, which is sorted by ->start value,
    > is expected to be short, and should be turned into an rbtree if it isn't
    > which might be slightly less of a bandaid.
    >
    > Dave Airlie (as the MAINTAINERS entry) can't be arsed to answer email it
    > seems so I've added Dave Jones for a possible comment from the AGP side.
    > If I'm reading this right up to now, many AGP drivers (among which
    > my amd-k7-agp) are still affected.


    This was based on a wrong reading; I was looking at the GATT allocation.

    I'm giving up looking until someone can tell me whether or not those 16K
    entries are expected though. I have just one AGP card in a PAT capable
    machine.

    How many entries in /debug/x86/pat_memtype_list are there on other AGP
    systems with Option "AGPSize" "64" in their xorg.conf:"Device" section
    (and their AGP aperture set to 64M or bigger in the BIOS)?

    Rene.

  17. Re: AGP and PAT (induced?) problem (on AMD family 6)

    On Wed, Aug 20, 2008 at 07:27:22AM -0700, Rene Herman wrote:
    > On 20-08-08 12:50, Rene Herman wrote:
    >
    > > On 20-08-08 12:04, Ingo Molnar wrote:
    > >
    > >> * Venki Pallipadi wrote:
    > >>
    > >>>> I'd really like a reply from the AGP or PAT side right about now.
    > >>> Hmm. Looks like there are more than 16000 entries in the PAT list!
    > >>
    > >> hm, btw., why is that?

    > >
    > > Because 64M of AGP memory divided by 4K pages is 16K. That is, the
    > > underlying problem seems to be AGP drivers using order 0 allocations.
    > > I'm looking.
    > >
    > > Do note also that this means that Venki's change would not constitute a
    > > correct/final fix. Sure, caching the last entry speeds up traversing a
    > > 16K entry list but the issue is that there shouldn't be a 16K entry
    > > list. Through AGP, or maybe even by coalescing entries in the PAT list
    > > if that's at all possible (I guess it's not really).
    > >
    > > Even if such a more fundamental fix isn't (easily) available, the PAT
    > > code already comments that the list, which is sorted by ->start value,
    > > is expected to be short, and should be turned into an rbtree if it isn't
    > > which might be slightly less of a bandaid.
    > >
    > > Dave Airlie (as the MAINTAINERS entry) can't be arsed to answer email it
    > > seems so I've added Dave Jones for a possible comment from the AGP side.
    > > If I'm reading this right up to now, many AGP drivers (among which
    > > my amd-k7-agp) are still affected.

    >
    > This was based on a wrong reading; I was looking at the GATT allocation.
    >
    > I'm giving up looking until someone can tell me whether or not those 16K
    > entries are expected though. I have just one AGP card in a PAT capable
    > machine.
    >


    OK. I have reproduced this list size issue locally and this order 1
    allocation and set_memory_uc on that allocation is actually coming from
    agp_allocate_memory() -> agp_generic_alloc_page() -> map_page_into_agp().
    agp_allocate_memory breaks higher order page requests into order 1 allocs.

    On my system I see multiple agp_allocate_memory requests for nrpages
    8841, 1020, 16, 2160, 2160, 8192. Together they end up resulting in
    more than 22K entries in PAT pages.
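
The total can be checked by summing the request sizes, assuming (as described above) that every page in a request ends up as its own entry:

```python
# nrpages values of the agp_allocate_memory requests quoted above
requests = [8841, 1020, 16, 2160, 2160, 8192]

total_entries = sum(requests)   # one entry per page under the stated assumption
print(total_entries)            # 22389
```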

    Thanks,
    Venki


  18. Re: AGP and PAT (induced?) problem (on AMD family 6)

    On Wed, Aug 20, 2008 at 8:50 PM, Rene Herman wrote:
    > On 20-08-08 12:04, Ingo Molnar wrote:
    >
    >> * Venki Pallipadi wrote:
    >>
    >>>> I'd really like a reply from the AGP or PAT side right about now.
    >>>
    >>> Hmm. Looks like there are more than 16000 entries in the PAT list!

    >>
    >> hm, btw., why is that?

    >
    > Because 64M of AGP memory divided by 4K pages is 16K. That is, the
    > underlying problem seems to be AGP drivers using order 0 allocations. I'm
    > looking.
    >
    > Do note also that this means that Venki's change would not constitute a
    > correct/final fix. Sure, caching the last entry speeds up traversing a 16K
    > entry list but the issue is that there shouldn't be a 16K entry list.
    > Through AGP, or maybe even by coalescing entries in the PAT list if that's
    > at all possible (I guess it's not really).
    >
    > Even if such a more fundamental fix isn't (easily) available, the PAT code
    > already comments that the list, which is sorted by ->start value, is
    > expected to be short, and should be turned into an rbtree if it isn't which
    > might be slightly less of a bandaid.
    >
    > Dave Airlie (as the MAINTAINERS entry) can't be arsed to answer email it
    > seems so I've added Dave Jones for a possible comment from the AGP side.
    > If I'm reading this right up to now, many AGP drivers (among which my
    > amd-k7-agp) are still affected.


    I haven't anything to add, I'm the maintainer not the author, all the
    people who wrote the offending code were already involved.

    Dave.
    >
    > In the short run and if I'm not just mistaken, the best fix might be to make
    > PAT dependent on not having a dumb AGP driver (but as said, still looking).
    >
    > Note that my chipset is capable of a 2G AGP aperture. That's 512K pages if
    > fully used, 256K for 1G, 128K for 512M, ...
    >
    > Rene.
    >


  19. Re: AGP and PAT (induced?) problem (on AMD family 6)

    On 20-08-08 23:02, Dave Airlie wrote:

    > On Wed, Aug 20, 2008 at 8:50 PM, Rene Herman wrote:
    >> On 20-08-08 12:04, Ingo Molnar wrote:
    >>
    >>> * Venki Pallipadi wrote:
    >>>
    >>>>> I'd really like a reply from the AGP or PAT side right about
    >>>>> now.
    >>>> Hmm. Looks like there are more than 16000 entries in the PAT
    >>>> list!
    >>> hm, btw., why is that?

    >> Because 64M of AGP memory divided by 4K pages is 16K. That is, the
    >> underlying problem seems to be AGP drivers using order 0
    >> allocations. I'm looking.


    [ ... ]

    > I haven't anything to add, I'm the maintainer not the author, all the
    > people who wrote the offending code were already involved.


    The underlying problem is the order 0 allocations (agp_allocate_memory
    --> agp_generic_alloc_page) where each single page is set uncached
    individually, creating a PAT entry.

    Non order 0 allocations generally would of course help. That's very much
    AGP internal -- do you feel that's the way to go?

    All the current AGP drivers except sgi-agp use agp_generic_alloc_page().

    I'm doing a quick local hack to collect pages in agp_allocate_memory() into
    regions and set the regions (generally 1) UC in one fell swoop, but I
    don't know if that's safe (and it feels like a rather poor hack anyway).
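
The grouping step of such a hack can be sketched in user space. This is only an illustration of collecting pages into regions, not the actual change: the name `coalesce` and the use of plain page frame numbers are assumptions, and the sketch says nothing about whether setting a whole region UC at once is safe in the kernel.

```python
def coalesce(pfns):
    """Group page frame numbers into (first_pfn, page_count) runs of adjacent pages."""
    runs = []
    for pfn in sorted(set(pfns)):
        if runs and pfn == runs[-1][0] + runs[-1][1]:
            # Page directly follows the current run: extend it.
            first, count = runs[-1]
            runs[-1] = (first, count + 1)
        else:
            # Gap before this page: start a new run.
            runs.append((pfn, 1))
    return runs


# Six pages allocated one at a time, not necessarily in address order.
print(coalesce([100, 101, 102, 200, 201, 103]))
```

In this example six single pages collapse into two regions, so a per-region attribute change would create two entries instead of six.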

    (not to mention that it's time for bed again).

    Rene.

  20. Re: AGP and PAT (induced?) problem (on AMD family 6)

    On Thu, Aug 21, 2008 at 7:40 AM, Rene Herman wrote:
    > On 20-08-08 21:41, Venki Pallipadi wrote:
    >
    >> OK. I have reproduced this list size issue locally and this order 1
    >> allocation and set_memory_uc on that allocation is actually coming
    >> from agp_allocate_memory() -> agp_generic_alloc_page() ->
    >> map_page_into_agp(). agp_allocate_memory breaks higher order page
    >> requests into order 1 allocs.
    >>
    >> On my system I see multiple agp_allocate_memory requests for nrpages 8841,
    >> 1020, 16, 2160, 2160, 8192. Together they end up resulting in more than 22K
    >> entries in PAT pages.

    >
    > Okay, thanks for the confirmation.
    >
    > Now, how to fix...
    >
    > Firstly, it seems we can conclude that any expectancy of a short PAT list is
    > simply destroyed by AGP. I believe the best thing might be to look into
    > "fixing" AGP rather than PAT for now?
    >
    > In a sense the entire purpose of the AGP GART is collecting non contiguous
    > pages but given that in practice it's generally still just one or at most a
    > few regions, going to multi-page allocs sounds most appetising to me.
    >
    > All in tree AGP drivers except sgi-agp use agp_generic_alloc_page(), ali via
    > m1541_alloc_page and i460 via i460_alloc_page.


    In the future we will be getting more smaller AGP allocs, so the other
    problem needs a fix as well.

    http://git.kernel.org/?p=linux/kerne...=agp-pageattr2

    contains some code I started on before that moves the interfaces
    around. Shaohua has been looking at it as it needs the changes to the
    set_pages interface as well, which is where I ran out of time/steam
    last time.

    However with alloc/free pages we could change to a higher order
    allocation function as long as it fell back to lower orders internally.
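
A higher order allocation with internal fallback could look roughly like the following user-space sketch. Everything here is hypothetical: `plan_allocation`, the `try_alloc` callback standing in for the page allocator, and the maximum order of 10 used in the example are assumptions for illustration, not an existing AGP interface.

```python
def plan_allocation(nr_pages, max_order, try_alloc):
    """Cover nr_pages with blocks of 2**order pages, falling back to smaller orders.

    try_alloc(order) returns True when a block of that order could be obtained.
    Returns the list of orders that were allocated.
    """
    blocks = []
    remaining = nr_pages
    order = max_order
    while remaining > 0:
        # Never ask for a block larger than what is still needed.
        while order > 0 and (1 << order) > remaining:
            order -= 1
        if try_alloc(order):
            blocks.append(order)
            remaining -= 1 << order
        elif order > 0:
            order -= 1      # fall back to a lower order
        else:
            raise MemoryError("order 0 allocation failed")
    return blocks


# 64M worth of 4K pages with every request succeeding: 16 blocks, not 16384.
print(len(plan_allocation(16384, 10, lambda order: True)))

# Fragmented memory where nothing above order 2 is available.
print(plan_allocation(11, 4, lambda order: order <= 2))
```

Each block would then need only one attribute change, so the number of entries tracks the number of blocks rather than the number of pages.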

    Dave.

    >
    > Rene.
    >

