Page swap allocation failure 2.6.25 - Kernel


  1. Page swap allocation failure 2.6.25

    Hi


    For a while now I have been receiving page swap allocation failures.


    These are similar to http://lkml.org/lkml/2008/6/10/3 and
    http://lkml.org/lkml/2008/2/19/298, and I have filed a bug with Debian
    (Bug#486300).


    It seems like any time I put the system under load, transferring large
    files across the network (1G NICs: an r8168, a forcedeth and a
    Broadcom), I keep getting these errors.


    This was while writing to an XFS partition:
    Jul 13 13:28:30 nas kernel: [ 648.120681] md2_raid5: page allocation failure. order:2, mode:0x20
    Jul 13 13:28:30 nas kernel: [ 648.120688] Pid: 1042, comm: md2_raid5 Not tainted 2.6.25-2-amd64 #1
    Jul 13 13:28:30 nas kernel: [ 648.120689]
    Jul 13 13:28:30 nas kernel: [ 648.120690] Call Trace:
    Jul 13 13:28:30 nas kernel: [ 648.120694] [] __alloc_pages+0x2f8/0x312
    Jul 13 13:28:30 nas kernel: [ 648.120729] [] kmem_getpages+0xc5/0x193
    Jul 13 13:28:30 nas kernel: [ 648.120733] [] fallback_alloc+0x147/0x1c0
    Jul 13 13:28:30 nas kernel: [ 648.120740] [] kmem_cache_alloc_node+0x105/0x138
    Jul 13 13:28:30 nas kernel: [ 648.120746] [] __alloc_skb+0x64/0x12d
    Jul 13 13:28:30 nas kernel: [ 648.120756] [] :r8168:rtl8168_rx_fill+0x64/0x106
    Jul 13 13:28:30 nas kernel: [ 648.120765] [] :r8168:rtl8168_rx_interrupt+0x324/0x392
    Jul 13 13:28:30 nas kernel: [ 648.120774] [] :r8168:rtl8168_poll+0x36/0x184
    Jul 13 13:28:30 nas kernel: [ 648.120783] [] net_rx_action+0xab/0x18c
    Jul 13 13:28:30 nas kernel: [ 648.120790] [] __do_softirq+0x5c/0xd1
    Jul 13 13:28:30 nas kernel: [ 648.120793] [] ack_apic_level+0x38/0xd8
    Jul 13 13:28:30 nas kernel: [ 648.120799] [] call_softirq+0x1c/0x28
    Jul 13 13:28:30 nas kernel: [ 648.120804] [] do_softirq+0x3c/0x81
    Jul 13 13:28:30 nas kernel: [ 648.120806] [] irq_exit+0x3f/0x83
    Jul 13 13:28:30 nas kernel: [ 648.120809] [] do_IRQ+0xb9/0xd9
    Jul 13 13:28:30 nas kernel: [ 648.120813] [] ret_from_intr+0x0/0x19
    Jul 13 13:28:30 nas kernel: [ 648.120815] [] :async_memcpy:async_memcpy+0x98/0xe0
    Jul 13 13:28:30 nas kernel: [ 648.120833] [] :raid456:async_copy_data+0xf0/0x128
    Jul 13 13:28:30 nas kernel: [ 648.120844] [] :raid456:raid5_run_ops+0x1f6/0x4d4
    Jul 13 13:28:30 nas kernel: [ 648.120854] [] :raid456:handle_stripe5+0xaae/0xac5
    Jul 13 13:28:30 nas kernel: [ 648.120866] [] :raid456:handle_stripe+0xd1a/0xd96
    Jul 13 13:28:30 nas kernel: [ 648.120871] [] __wake_up_common+0x41/0x74
    Jul 13 13:28:30 nas kernel: [ 648.120878] [] __wake_up+0x38/0x4e
    Jul 13 13:28:30 nas kernel: [ 648.120890] [] :raid456:raid5d+0x306/0x316
    Jul 13 13:28:30 nas kernel: [ 648.120908] [] :md_mod:md_thread+0xd7/0xed
    Jul 13 13:28:30 nas kernel: [ 648.120913] [] autoremove_wake_function+0x0/0x2e
    Jul 13 13:28:30 nas kernel: [ 648.120923] [] :md_mod:md_thread+0x0/0xed
    Jul 13 13:28:30 nas kernel: [ 648.120925] [] kthread+0x47/0x74
    Jul 13 13:28:30 nas kernel: [ 648.120928] [] schedule_tail+0x27/0x5c
    Jul 13 13:28:30 nas kernel: [ 648.120931] [] child_rip+0xa/0x12
    Jul 13 13:28:30 nas kernel: [ 648.120941] [] kthread+0x0/0x74
    Jul 13 13:28:30 nas kernel: [ 648.120943] [] child_rip+0x0/0x12
    Jul 13 13:28:30 nas kernel: [ 648.120946]
    Jul 13 13:28:30 nas kernel: [ 648.120947] Mem-info:
    Jul 13 13:28:30 nas kernel: [ 648.120949] Node 0 DMA per-cpu:
    Jul 13 13:28:30 nas kernel: [ 648.120951] CPU 0: hi: 0, btch: 1 usd: 0
    Jul 13 13:28:30 nas kernel: [ 648.120952] CPU 1: hi: 0, btch: 1 usd: 0
    Jul 13 13:28:30 nas kernel: [ 648.120954] Node 0 DMA32 per-cpu:
    Jul 13 13:28:30 nas kernel: [ 648.120955] CPU 0: hi: 186, btch: 31 usd: 127
    Jul 13 13:28:30 nas kernel: [ 648.120957] CPU 1: hi: 186, btch: 31 usd: 191
    Jul 13 13:28:30 nas kernel: [ 648.120958] Node 0 Normal per-cpu:
    Jul 13 13:28:30 nas kernel: [ 648.120960] CPU 0: hi: 186, btch: 31 usd: 89
    Jul 13 13:28:30 nas kernel: [ 648.120962] CPU 1: hi: 186, btch: 31 usd: 183
    Jul 13 13:28:30 nas kernel: [ 648.120964] Active:10959 inactive:917990 dirty:65346 writeback:3076 unstable:0
    Jul 13 13:28:30 nas kernel: [ 648.120965] free:5002 slab:37834 mapped:2069 pagetables:539 bounce:0
    Jul 13 13:28:30 nas kernel: [ 648.120967] Node 0 DMA free:11944kB min:20kB low:24kB high:28kB active:0kB inactive:0kB present:11384kB pages_scanned:0 all_unreclaimable? yes
    Jul 13 13:28:30 nas kernel: [ 648.120971] lowmem_reserve[]: 0 3254 3885 3885
    Jul 13 13:28:30 nas kernel: [ 648.120973] Node 0 DMA32 free:7400kB min:6668kB low:8332kB high:10000kB active:8468kB inactive:3134788kB present:3332192kB pages_scanned:0 all_unreclaimable? no
    Jul 13 13:28:30 nas kernel: [ 648.120977] lowmem_reserve[]: 0 0 631 631
    Jul 13 13:28:30 nas kernel: [ 648.120979] Node 0 Normal free:664kB min:1292kB low:1612kB high:1936kB active:35368kB inactive:537172kB present:646400kB pages_scanned:0 all_unreclaimable? no
    Jul 13 13:28:30 nas kernel: [ 648.120983] lowmem_reserve[]: 0 0 0 0
    Jul 13 13:28:30 nas kernel: [ 648.120985] Node 0 DMA: 4*4kB 5*8kB 3*16kB 4*32kB 3*64kB 2*128kB 2*256kB 1*512kB 2*1024kB 0*2048kB 2*4096kB = 11944kB
    Jul 13 13:28:30 nas kernel: [ 648.120992] Node 0 DMA32: 1151*4kB 273*8kB 1*16kB 0*32kB 0*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 7316kB
    Jul 13 13:28:30 nas kernel: [ 648.120997] Node 0 Normal: 120*4kB 1*8kB 0*16kB 0*32kB 0*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 616kB
    Jul 13 13:28:30 nas kernel: [ 648.121004] 921869 total pagecache pages
    Jul 13 13:28:30 nas kernel: [ 648.121005] Swap cache: add 31, delete 0, find 0/0
    Jul 13 13:28:30 nas kernel: [ 648.121007] Free swap = 4194172kB
    Jul 13 13:28:30 nas kernel: [ 648.121008] Total swap = 4194296kB
    Jul 13 13:28:30 nas kernel: [ 648.121009] Free swap: 4194172kB
    Jul 13 13:28:30 nas kernel: [ 648.124409] 1015808 pages of RAM
    Jul 13 13:28:30 nas kernel: [ 648.124409] 32483 reserved pages
    Jul 13 13:28:30 nas kernel: [ 648.124409] 907806 pages shared
    Jul 13 13:28:30 nas kernel: [ 648.124409] 31 pages swap cached
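The Mem-info dump shows why an order-2 request can fail even with megabytes free: in DMA32 and Normal most free memory sits in order-0 and order-1 blocks, and DMA is fenced off by lowmem_reserve. The sketch below replays the zone and buddy numbers from the dump through a simplified watermark check. It is modeled on my reading of zone_watermark_ok() in mm/page_alloc.c of 2.6.2x kernels; the ALLOC_HIGH/ALLOC_HARDER adjustments and the choice of ZONE_NORMAL as the class zone are assumptions, not something stated in this thread.

```python
import re

PAGE_KB = 4          # x86_64 page size in kB
CLASSZONE_IDX = 2    # assumption: request eligible for ZONE_NORMAL (DMA=0, DMA32=1, Normal=2)

# Buddy free lists copied from the "Node 0 <zone>: n*size ..." lines of the dump.
BUDDY = {
    "DMA": "4*4kB 5*8kB 3*16kB 4*32kB 3*64kB 2*128kB 2*256kB 1*512kB 2*1024kB 0*2048kB 2*4096kB",
    "DMA32": "1151*4kB 273*8kB 1*16kB 0*32kB 0*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB",
    "Normal": "120*4kB 1*8kB 0*16kB 0*32kB 0*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB",
}

# (free kB, min kB, lowmem_reserve[] in pages) from the per-zone lines of the dump.
ZONES = {
    "DMA": (11944, 20, [0, 3254, 3885, 3885]),
    "DMA32": (7400, 6668, [0, 0, 631, 631]),
    "Normal": (664, 1292, [0, 0, 0, 0]),
}


def parse_buddy(line):
    """'1151*4kB 273*8kB ...' -> [1151, 273, ...]: free block count per order."""
    return [int(count) for count, _size_kb in re.findall(r"(\d+)\*(\d+)kB", line)]


def watermark_ok(free_pages, nr_free, order, min_pages, reserve,
                 alloc_high=True, alloc_harder=True):
    """Simplified watermark check; the flag handling is an assumption."""
    if alloc_high:        # __GFP_HIGH caller may use half of the min reserve
        min_pages -= min_pages // 2
    if alloc_harder:      # non-sleeping caller may dig a little deeper
        min_pages -= min_pages // 4
    free = free_pages - (1 << order) + 1
    if free <= min_pages + reserve:
        return False
    for o in range(order):
        free -= nr_free[o] << o   # blocks of order o are too small for this request
        min_pages >>= 1           # fewer pages are required at each higher order
        if free <= min_pages:
            return False
    return True


results = {}
for order in (0, 2):
    for name, (free_kb, min_kb, reserve) in ZONES.items():
        ok = watermark_ok(free_kb // PAGE_KB, parse_buddy(BUDDY[name]), order,
                          min_kb // PAGE_KB, reserve[CLASSZONE_IDX])
        results[(name, order)] = ok
        print(f"order {order} from {name:6}: {'ok' if ok else 'refused'}")
```

With these inputs the model refuses the order-2 request in all three zones, while an order-0 request would still be granted from DMA32 or Normal, which is consistent with the failure being specific to the higher-order request.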

    nfsd causes the same problems too, and so does sshd when I use scp.


    I have pages and pages of these dumps.
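With pages of these dumps, a quick tally of which process hit which failure can show whether the nfsd and sshd reports share the same order and mode as md2_raid5. A minimal sketch, assuming each failure header keeps the `comm: page allocation failure. order:N, mode:0xM` shape seen above; the script name and log path in the usage comment are hypothetical.

```python
import re
import sys
from collections import Counter

# Header line the kernel prints for each failure, e.g.
# "md2_raid5: page allocation failure. order:2, mode:0x20"
FAILURE_RE = re.compile(
    r"(\S+): page allocation failure\. order:(\d+), mode:(0x[0-9a-fA-F]+)")


def summarize(lines):
    """Count failures per (process name, order, gfp mode) in kernel log lines."""
    counts = Counter()
    for line in lines:
        m = FAILURE_RE.search(line)
        if m:
            counts[(m.group(1), int(m.group(2)), m.group(3))] += 1
    return counts


def main(path):
    with open(path, errors="replace") as f:
        for (comm, order, mode), n in summarize(f).most_common():
            print(f"{n:6d}  {comm:16s} order:{order} mode:{mode}")


# Hypothetical usage: python3 alloc_tally.py /var/log/kern.log
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Only the header line of each dump is counted, so the call trace and Mem-info lines that follow it are ignored.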

    Help

    alex





  2. Re: Page swap allocation failure 2.6.25

    Alex Samad :
    [...]
    > For a while now I have been receiving page swap allocation failures
    >
    >
    > Similar to http://lkml.org/lkml/2008/6/10/3 and


    Order 0 failure. Yours is an order 2 one.

    > http://lkml.org/lkml/2008/2/19/298


    Order 3 failure which was fixed with the e1000e driver.
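The order in these messages is the base-2 logarithm of the number of physically contiguous pages requested: order 0 is one 4 kB page, while the order-2 failure above needs four adjacent pages (16 kB). The mode is the GFP mask; 0x20 on its own is __GFP_HIGH, which is what GFP_ATOMIC expands to in kernels of this vintage, so the allocator may not sleep or reclaim page cache to satisfy the request. The decoder below is a sketch: the bit values are a partial table as I recall them from include/linux/gfp.h for 2.6.2x and should be checked against the actual source tree.

```python
PAGE_SIZE = 4096  # x86_64

# Partial __GFP_* table (assumed values for 2.6.2x; verify in include/linux/gfp.h).
GFP_BITS = {
    0x01: "__GFP_DMA",
    0x02: "__GFP_HIGHMEM",
    0x04: "__GFP_DMA32",
    0x10: "__GFP_WAIT",
    0x20: "__GFP_HIGH",
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
}
GFP_WAIT = 0x10


def describe(order, mode):
    """Return (bytes requested, named gfp flags, may_sleep) for order:N, mode:0xM."""
    size = PAGE_SIZE << order  # 2**order contiguous pages
    flags = [name for bit, name in GFP_BITS.items() if mode & bit]
    unknown = mode & ~sum(GFP_BITS)
    if unknown:
        flags.append(hex(unknown))  # bits not covered by this partial table
    return size, flags, bool(mode & GFP_WAIT)


for order, mode in ((0, 0x20), (2, 0x20), (3, 0x20)):
    print(f"order:{order}, mode:{mode:#x} ->", describe(order, mode))
```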

    > and I have filed a bug with debian (Bug#486300)
    >
    >
    > It seems like any time I put the system under load, transferring large
    > files across the network (1G NICs: an r8168, a forcedeth and a
    > Broadcom), I keep getting these errors.


    May I assume that you are working with an MTU greater than 1500 bytes on
    each interface? If so, please add netdev@vger.kernel.org to the Cc: and
    remove linux-kernel@ from the Cc:.
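The MTU question matters because a larger frame needs a larger receive buffer, and once a buffer no longer fits in one page the kernel has to find higher-order contiguous blocks. A rough sketch, assuming the driver allocates one linear buffer per frame of MTU plus the 14-byte Ethernet header and 4-byte FCS; the `overhead` value is a hypothetical placeholder for alignment and skb bookkeeping, and real drivers size their buffers in their own ways.

```python
PAGE_SIZE = 4096
ETH_HLEN = 14      # Ethernet header
ETH_FCS_LEN = 4    # frame check sequence


def get_order(size, page_size=PAGE_SIZE):
    """Smallest order such that (page_size << order) >= size."""
    order = 0
    while (page_size << order) < size:
        order += 1
    return order


def rx_buffer_order(mtu, overhead=256):
    """Order of one linear receive buffer for a given MTU.

    'overhead' is a hypothetical placeholder, not a value taken from any driver.
    """
    return get_order(mtu + ETH_HLEN + ETH_FCS_LEN + overhead)


for mtu in (1500, 4000, 9000, 9100):
    print(f"MTU {mtu}: order {rx_buffer_order(mtu)}")
```

Under these assumptions a 1500-byte MTU stays at order 0, while jumbo MTUs around 9000 bytes land at order 2, the same order as the failure in the log.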

    [...]
    > Jul 13 13:28:30 nas kernel: [ 648.120756] []
    > :r8168:rtl8168_rx_fill+0x64/0x106


    It looks more like Realtek's out-of-tree driver than the in-kernel
    one. Is it a customised kernel?

    [...]
    > Help


    Don't panic.

    --
    Ueimor

  3. Re: Page swap allocation failure 2.6.25

    On Sun, Jul 13, 2008 at 01:02:22PM +0200, Francois Romieu wrote:
    > Alex Samad :
    > [...]
    > > For a while now I have been receiving page swap allocation failures
    > >
    > >
    > > Similar to http://lkml.org/lkml/2008/6/10/3 and

    >
    > Order 0 failure. Yours is an order 2 one.
    >
    > > http://lkml.org/lkml/2008/2/19/298

    >
    > Order 3 failure which was fixed with the e1000e driver.



    Not sure about these; I will take your word for it.

    >
    > > and I have filed a bug with debian (Bug#486300)
    > >
    > >
    > > It seems like any time I put the system under load, transferring large
    > > files across the network (1G NICs: an r8168, a forcedeth and a
    > > Broadcom), I keep getting these errors.

    >
    > May I assume that you are working with an MTU greater than 1500 bytes on
    > each interface? If so, please add netdev@vger.kernel.org to the Cc: and
    > remove linux-kernel@ from the Cc:.


    I have 3 boxes: 2 are set up with an MTU > 1500 and 1 isn't (the one with
    the r8168 driver). I have tested with MTU > 1500 and with MTU = 1500, with
    the same result.

    >
    > [...]
    > > Jul 13 13:28:30 nas kernel: [ 648.120756] []
    > > :r8168:rtl8168_rx_fill+0x64/0x106

    >
    > It looks more like Realtek's out-of-tree driver than the in-kernel
    > one. Is it a customised kernel?

    The kernel is a stock Debian amd64 kernel, not customised by me.

    I did build the r8168 driver from the Realtek site.

    A bit more info on the setup:

    I have 2 laptops (both HPs): A runs Vista and B runs Debian lenny/sid
    (2.6.25). I have three servers: 2 Shuttles with forcedeth NICs
    (multimedia and hufpuf) and 1 Gigabyte with a Realtek NIC (nas).

    The nas box is the one whose syslog I copied the error from. It is
    primarily an NFS NAS. Hufpuf is the Samba box; it used to be the NAS
    box, and it currently mounts a few (large) shares from nas. Multimedia
    is a backup server.

    A, B and nas have a 1500 MTU.

    multimedia and hufpuf can run with a 9100 MTU.

    I have tried:
    i) copying files from A to hufpuf (SMB), which then sends them on to
       nas via NFS
    ii) copying files from B to nas (NFS)
    iii) scp from B to hufpuf and then on to nas via NFS
    iv) scp from B to nas
    v) scp from hufpuf to nas
    vi) scp from hufpuf to multimedia
    vii) scp from multimedia to nas
    viii) NFS from hufpuf to nas
    ix) NFS from multimedia to nas

    All of these have caused these errors.

    When I was testing again today, copying from A to hufpuf and then on to
    nas, I noticed that smaller files (say < 200M) would go okay; with
    anything larger (or if the total size of the files was larger) I would
    start to get the errors.


    >
    > [...]
    > > Help

    >
    > Don't panic.

    Not panicking yet, but I am a bit concerned. The data seems to be okay
    even after these errors.




  4. Re: Page swap allocation failure 2.6.25

    On Sun, Jul 13, 2008 at 09:49:44PM +1000, Alex Samad wrote:
    > [...]


    I have done some more testing. I found that I had this line in my
    sysctl.conf (a holdover from long ago):

    net.ipv4.tcp_rmem = 4096 87380 2097152

    This was on my 2 servers, multimedia and hufpuf (forcedeth). I have
    removed it and gone back to the defaults.
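net.ipv4.tcp_rmem takes three byte values: the minimum, default and maximum receive buffer size of a TCP socket (see tcp(7)), so the old line set the maximum to 2097152 bytes (2 MiB). A small sketch for comparing a sysctl.conf line against the live value; the helper names are hypothetical, and the code only parses the three fields, it does not change anything.

```python
from pathlib import Path
from typing import NamedTuple


class TcpRmem(NamedTuple):
    min: int      # bytes
    default: int  # bytes
    max: int      # bytes


def parse_tcp_rmem(text):
    """Accept '4096 87380 2097152' or 'net.ipv4.tcp_rmem = 4096 87380 2097152'."""
    if "=" in text:  # sysctl.conf style: keep only the right-hand side
        text = text.split("=", 1)[1]
    lo, default, hi = (int(field) for field in text.split())
    return TcpRmem(lo, default, hi)


def live_tcp_rmem(path="/proc/sys/net/ipv4/tcp_rmem"):
    """Read the running kernel's value (Linux only)."""
    return parse_tcp_rmem(Path(path).read_text())


old = parse_tcp_rmem("net.ipv4.tcp_rmem = 4096 87380 2097152")
print(f"old sysctl.conf: min={old.min} default={old.default} max={old.max}")
```

On the box itself, `live_tcp_rmem()` returns the values currently in /proc for a side-by-side comparison with the old setting.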

    Running a quick test, scp'ing from the nas box to multimedia and to
    hufpuf doesn't cause any page allocation failures, but scp to the nas
    box still causes them. I tried scp'ing between multimedia and hufpuf with
    jumbo frames and that went okay.

    So it looks like it might be the r8168 driver; that being the case, I
    will Cc: netdev@vger.kernel.org. I will leave linux-kernel on the Cc:
    for now.

    thanks



    > [...]
    > >
    > > Don't panic.

    > Not panicking yet, but I am a bit concerned. The data seems to be okay
    > even after these errors.


    thanks


