NFS regression? Odd delays and lockups accessing an NFS export. - Kernel

This is a discussion on NFS regression? Odd delays and lockups accessing an NFS export. - Kernel ; On Sun, 2008-08-31 at 14:51 -0500, Tom Tucker wrote: > > Looks like you ran this on the client. Sorry, Ian, I should have been > more specific. You need to modify the rpc_debug file on the server. I tried ...

+ Reply to Thread
Page 3 of 3 FirstFirst 1 2 3
Results 41 to 59 of 59

Thread: NFS regression? Odd delays and lockups accessing an NFS export.

  1. Re: NFS regression? Odd delays and lockups accessing an NFS export.

    On Sun, 2008-08-31 at 14:51 -0500, Tom Tucker wrote:
    >
    > Looks like you ran this on the client. Sorry, Ian, I should have been
    > more specific. You need to modify the rpc_debug file on the server.


    I tried this on the server but it's pretty crippling (the server is
    quite weedy, 300MHz K3 or something).

    I'll leave it logging overnight since things should be pretty quiet (and
    that's often when the problem occurs) but if there's a less aggressive
    setting than 256 but which would still be useful I could leave it on
    permanently until the problem happens.

    Ian.
    --
    Ian Campbell

    Marriages are made in heaven and consummated on earth.
    -- John Lyly

    -----BEGIN PGP SIGNATURE-----
    Version: GnuPG v1.4.9 (GNU/Linux)

    iEYEABECAAYFAki7CqEACgkQM0+0qS9rzVlzCwCgmRve7F25dz DC+NeywP0hF4gr
    F+cAoLhNPR1btyuZkr31AfLYHwg6pQAZ
    =9h3S
    -----END PGP SIGNATURE-----


  2. Re: NFS regression? Odd delays and lockups accessing an NFS export.

    Ian Campbell wrote:
    > On Sun, 2008-08-31 at 14:51 -0500, Tom Tucker wrote:
    >
    >> Looks like you ran this on the client. Sorry, Ian, I should have been
    >> more specific. You need to modify the rpc_debug file on the server.
    >>

    >
    > I tried this on the server but it's pretty crippling (the server is
    > quite weedy, 300MHz K3 or something).
    >
    > I'll leave it logging overnight since things should be pretty quiet (and
    > that's often when the problem occurs) but if there's a less aggressive
    > setting than 256 but which would still be useful I could leave it on
    > permanently until the problem happens.
    >
    >

    Thanks Ian. Unfortunately, that's as fine grained as it gets. 256
    (0x100) is the bit for transport logging.
    > Ian.
    >


    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  3. Re: NFS regression? Odd delays and lockups accessing an NFS export.

    On Mon, 2008-09-01 at 12:20 -0500, Tom Tucker wrote:
    > Thanks Ian. Unfortunately, that's as fine grained as it gets. 256
    > (0x100) is the bit for transport logging.


    Doh, of course, I was thinking of 255...

    No repro today. I'll let you know when I see something.

    Ian.

    --
    Ian Campbell

    History is nothing but a collection of fables and useless trifles,
    cluttered up with a mass of unnecessary figures and proper names.
    -- Leo Tolstoy

    -----BEGIN PGP SIGNATURE-----
    Version: GnuPG v1.4.9 (GNU/Linux)

    iEYEABECAAYFAki8KlsACgkQM0+0qS9rzVkPHQCgsAOGAHSt3e YeEJudAZWeR5KQ
    ma4AoNoZlWra64hyjHMdTY5jc3cYWrtI
    =pHPO
    -----END PGP SIGNATURE-----


  4. Re: NFS regression? Odd delays and lockups accessing an NFS export.

    On Mon, 2008-09-01 at 12:20 -0500, Tom Tucker wrote:
    > Ian Campbell wrote:
    > > On Sun, 2008-08-31 at 14:51 -0500, Tom Tucker wrote:
    > >
    > >> Looks like you ran this on the client. Sorry, Ian, I should have been
    > >> more specific. You need to modify the rpc_debug file on the server.
    > >>

    > >
    > > I tried this on the server but it's pretty crippling (the server is
    > > quite weedy, 300MHz K3 or something).
    > >
    > > I'll leave it logging overnight since things should be pretty quiet (and
    > > that's often when the problem occurs) but if there's a less aggressive
    > > setting than 256 but which would still be useful I could leave it on
    > > permanently until the problem happens.
    > >
    > >

    > Thanks Ian. Unfortunately, that's as fine grained as it gets. 256
    > (0x100) is the bit for transport logging.


    Still no repro with the debugging enabled I'm afraid.

    I'm wondering if the performance hit makes it unlikely to trigger since
    everything slows to a crawl. I haven't been able to leave it on 24/7
    since it makes all the NFS clients nearly unusable but I've had it on
    overnight and when there's no one around etc, which is when the hang
    generally seems to occur.

    I forgot to re-enable it one evening and, perhaps inevitably, it crashed
    overnight. I turned the debugging on and captured the following. It's
    not much since syslog had given up for some reason, probably when the
    disk filled up with logs. 192.168.1.5 is a non-problematic client so I
    don't think the log will be useful since the problematic client
    (192.168.1.4) isn't mentioned..

    Ian.

    vc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    [4009900.094587] svc: TCP record, 140 bytes
    [4009900.098604] svc: socket ee5fd600 recvfrom(e996b08c, 3956) = 140
    [4009900.102587] svc: TCP complete record (140 bytes)
    [4009900.106592] svc: transport ee5fd600 put into queue
    [4009900.110613] svc: got len=140
    [4009900.114645] svc: transport ee5fd600 busy, not enqueued
    [4009900.118752] svc: socket ee5fd600 sendto([dfbb0000 232... ], 232) = 232 (addr 192.168.1.5, port=910)
    [4009900.122636] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    [4009900.126588] svc: transport ee5fd600 busy, not enqueued
    [4009900.130601] svc: transport ee5fd600 busy, not enqueued
    [4009900.134593] svc: server eefc4000 waiting for data (to = 900000)
    [4009900.162711] svc: transport ee5fd600 dequeued, inuse=1
    [4009900.166598] svc: server eef4a000, pool 0, transport ee5fd600, inuse=2
    [4009900.170594] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    [4009900.174611] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    [4009900.178592] svc: TCP record, 136 bytes
    [4009900.182606] svc: socket ee5fd600 recvfrom(e975a088, 3960) = 136
    [4009900.186592] svc: TCP complete record (136 bytes)
    [4009900.190599] svc: transport ee5fd600 put into queue
    [4009900.194605] svc: got len=136
    [4009900.198632] svc: transport ee5fd600 busy, not enqueued
    [4009900.202717] svc: socket ee5fd600 sendto([d6464000 124... ], 124) = 124 (addr 192.168.1.5, port=910)
    [4009900.206664] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    [4009900.210594] svc: transport ee5fd600 busy, not enqueued
    [4009900.214611] svc: transport ee5fd600 busy, not enqueued
    [4009900.218599] svc: server eef4a000 waiting for data (to = 900000)
    [4009900.254640] svc: transport ee5fd600 dequeued, inuse=1
    [4009900.258603] svc: server ee978000, pool 0, transport ee5fd600, inuse=2
    [4009900.262601] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    [4009900.266625] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    [4009900.270598] svc: TCP record, 128 bytes
    [4009900.274618] svc: socket ee5fd600 recvfrom(dd54d080, 3968) = 128
    [4009900.278598] svc: TCP complete record (128 bytes)
    [4009900.282605] svc: transport ee5fd600 put into queue
    [4009900.286611] svc: got len=128
    [4009900.290642] svc: transport ee5fd600 busy, not enqueued
    [4009900.294788] svc: socket ee5fd600 sendto([efb5c000 116... ], 116) = 116 (addr 192.168.1.5, port=910)
    [4009900.298707] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    [4009900.302600] svc: transport ee5fd600 busy, not enqueued
    [4009900.306622] svc: transport ee5fd600 busy, not enqueued
    [4009900.310604] svc: server ee978000 waiting for data (to = 900000)
    [4009900.346682] svc: transport ee5fd600 dequeued, inuse=1
    [4009900.350609] svc: server ee8fe000, pool 0, transport ee5fd600, inuse=2
    [4009900.354606] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    [4009900.358626] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    [4009900.362603] svc: TCP record, 148 bytes
    [4009900.366622] svc: socket ee5fd600 recvfrom(eec92094, 3948) = 148
    [4009900.370604] svc: TCP complete record (148 bytes)
    [4009900.374612] svc: transport ee5fd600 put into queue
    [4009900.378610] svc: got len=148
    [4009900.382652] svc: transport ee5fd600 busy, not enqueued
    [4009900.386829] svc: socket ee5fd600 sendto([e98bf000 232... ], 232) = 232 (addr 192.168.1.5, port=910)
    [4009900.390688] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    [4009900.394606] svc: transport ee5fd600 busy, not enqueued
    [4009900.398621] svc: transport ee5fd600 busy, not enqueued
    [4009900.402611] svc: server ee8fe000 waiting for data (to = 900000)
    [4009900.438683] svc: transport ee5fd600 dequeued, inuse=1
    [4009900.442614] svc: server eef12000, pool 0, transport ee5fd600, inuse=2
    [4009900.446612] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    [4009900.450634] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    [4009900.454609] svc: TCP record, 156 bytes
    [4009900.458631] svc: socket ee5fd600 recvfrom(dd78509c, 3940) = 156
    [4009900.462692] svc: TCP complete record (156 bytes)
    [4009900.466693] svc: transport ee5fd600 put into queue
    [4009900.470610] svc: got len=156
    [4009900.474657] svc: transport ee5fd600 busy, not enqueued
    [4009900.478801] svc: socket ee5fd600 sendto([ee92b000 232... ], 232) = 232 (addr 192.168.1.5, port=910)
    [4009900.482696] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    [4009900.486611] svc: transport ee5fd600 busy, not enqueued
    [4009900.490630] svc: transport ee5fd600 busy, not enqueued
    [4009900.494617] svc: server eef12000 waiting for data (to = 900000)
    [4009900.522817] svc: transport ee5fd600 dequeued, inuse=1
    [4009900.526622] svc: server ee9f2000, pool 0, transport ee5fd600, inuse=2
    [4009900.530617] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    [4009900.534639] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    [4009900.538615] svc: TCP record, 124 bytes
    [4009900.542632] svc: socket ee5fd600 recvfrom(e977007c, 3972) = 124
    [4009900.546615] svc: TCP complete record (124 bytes)
    [4009900.550621] svc: transport ee5fd600 put into queue
    [4009900.554616] svc: got len=124
    [4009900.558656] svc: transport ee5fd600 busy, not enqueued
    [4009900.562766] svc: socket ee5fd600 sendto([dd5e1000 124... ], 124) = 124 (addr 192.168.1.5, port=910)
    [4009900.566724] svc: transport ee5fd600 busy, not enqueued
    [4009900.570622] svc: server ee9f2000 waiting for data (to = 900000)
    [4009900.607202] svc: transport ee5fd600 dequeued, inuse=1
    [4009900.610629] svc: server eeeb7000, pool 0, transport ee5fd600, inuse=2
    [4009900.614623] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    [4009900.618647] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    [4009900.622619] svc: TCP record, 124 bytes
    [4009900.626641] svc: socket ee5fd600 recvfrom(e971607c, 3972) = 124
    [4009900.630620] svc: TCP complete record (124 bytes)
    [4009900.634630] svc: transport ee5fd600 put into queue
    [4009900.638621] svc: got len=124
    [4009900.642661] svc: transport ee5fd600 busy, not enqueued
    [4009900.646760] svc: socket ee5fd600 sendto([e95ee000 124... ], 124) = 124 (addr 192.168.1.5, port=910)
    [4009900.650734] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    [4009900.654622] svc: transport ee5fd600 busy, not enqueued
    [4009900.658640] svc: transport ee5fd600 busy, not enqueued
    [4009900.662626] svc: server eeeb7000 waiting for data (to = 900000)
    [4009900.691707] svc: transport ee5fd600 dequeued, inuse=1
    [4009900.694637] svc: server eef31000, pool 0, transport ee5fd600, inuse=2
    [4009900.698628] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    [4009900.702646] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    [4009900.706625] svc: TCP record, 156 bytes
    [4009900.710654] svc: socket ee5fd600 recvfrom(e99b009c, 3940) = 156
    [4009900.714626] svc: TCP complete record (156 bytes)
    [4009900.718632] svc: transport ee5fd600 put into queue
    [4009900.722626] svc: got len=156
    [4009900.725731] svc: transport ee5fd600 busy, not enqueued
    [4009900.726797] svc: socket ee5fd600 sendto([e9bee000 232... ], 232) = 232 (addr 192.168.1.5, port=910)
    [4009900.730717] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    [4009900.734627] svc: transport ee5fd600 busy, not enqueued
    [4009900.738646] svc: transport ee5fd600 busy, not enqueued
    [4009900.742631] svc: server eef31000 waiting for data (to = 900000)
    [4009900.778691] svc: transport ee5fd600 dequeued, inuse=1
    [4009900.782635] svc: server eefc4000, pool 0, transport ee5fd600, inuse=2
    [4009900.786633] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    [4009900.790649] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    [4009900.794631] svc: TCP record, 144 bytes
    [4009900.798663] svc: socket ee5fd600 recvfrom(e996b090, 3952) = 144
    [4009900.802632] svc: TCP complete record (144 bytes)
    [4009900.806638] svc: transport ee5fd600 put into queue
    [4009900.810632] svc: got len=144
    [4009900.814675] svc: transport ee5fd600 busy, not enqueued
    [4009900.818858] svc: socket ee5fd600 sendto([dfbb0000 232... ], 232) = 232 (addr 192.168.1.5, port=910)
    [4009900.822707] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    [4009900.826632] svc: transport ee5fd600 busy, not enqueued
    [4009900.830647] svc: transport ee5fd600 busy, not enqueued
    [4009900.834638] svc: server eefc4000 waiting for data (to = 900000)
    [4009900.862781] svc: transport ee5fd600 dequeued, inuse=1
    [4009900.866647] svc: server eef4a000, pool 0, transport ee5fd600, inuse=2
    [4009900.870644] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    [4009900.874651] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    [4009900.878636] svc: TCP record, 136 bytes
    [4009900.882649] svc: socket ee5fd600 recvfrom(e975a088, 3960) = 136
    [4009900.886636] svc: TCP complete record (136 bytes)
    [4009900.890642] svc: transport ee5fd600 put into queue
    [4009900.894654] svc: got len=136
    [4009900.898678] svc: transport ee5fd600 busy, not enqueued
    [4009900.902761] svc: socket ee5fd600 sendto([d6464000 124... ], 124) = 124 (addr 192.168.1.5, port=910)
    [4009900.906714] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    [4009900.910638] svc: transport ee5fd600 busy, not enqueued
    [4009900.914654] svc: transport ee5fd600 busy, not enqueued
    [4009900.918643] svc: server eef4a000 waiting for data (to = 900000)
    [4009900.954826] svc: transport ee5fd600 dequeued, inuse=1
    [4009900.958648] svc: server ee978000, pool 0, transport ee5fd600, inuse=2
    [4009900.962645] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    [4009900.966667] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    [4009900.970642] svc: TCP record, 140 bytes
    [4009900.974659] svc: socket ee5fd600 recvfrom(dd54d08c, 3956) = 140
    [4009900.978652] svc: TCP complete record (140 bytes)
    [4009900.982649] svc: transport ee5fd600 put into queue
    [4009900.986642] svc: got len=140
    [4009900.990686] svc: transport ee5fd600 busy, not enqueued
    [4009900.994848] svc: socket ee5fd600 sendto([efb5c000 232... ], 232) = 232 (addr 192.168.1.5, port=910)
    [4009900.998725] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    [4009901.002644] svc: transport ee5fd600 busy, not enqueued
    [4009901.006662] svc: transport ee5fd600 busy, not enqueued
    [4009901.010648] svc: server ee978000 waiting for data (to = 900000)
    [4009901.046694] svc: transport ee5fd600 dequeued, inuse=1
    [4009901.050652] svc: server ee8fe000, pool 0, transport ee5fd600, inuse=2
    [4009901.054650] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    [4009901.058666] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    [4009901.062647] svc: TCP record, 156 bytes
    [4009901.066669] svc: socket ee5fd600 recvfrom(eec9209c, 3940) = 156
    [4009901.070648] svc: TCP complete record (156 bytes)
    [4009901.074655] svc: transport ee5fd600 put into queue
    [4009901.078648] svc: got len=156
    [4009901.082688] svc: transport ee5fd600 busy, not enqueued
    [4009901.086844] svc: socket ee5fd600 sendto([e98bf000 232... ], 232) = 232 (addr 192.168.1.5, port=910)
    [4009901.090750] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    [4009901.094650] svc: transport ee5fd600 busy, not enqueued
    [4009901.098671] svc: transport ee5fd600 busy, not enqueued
    [4009901.102655] svc: server ee8fe000 waiting for data (to = 900000)
    [4009901.138700] svc: transport ee5fd600 dequeued, inuse=1
    [4009901.142658] svc: server ee9f2000, pool 0, transport ee5fd600, inuse=2
    [4009901.146656] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    [4009901.150678] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    [4009901.154654] svc: TCP record, 124 bytes
    [4009901.158671] svc: socket ee5fd600 recvfrom(e977007c, 3972) = 124
    [4009901.162654] svc: TCP complete record (124 bytes)
    [4009901.166659] svc: transport ee5fd600 put into queue
    [4009901.170654] svc: got len=124
    [4009901.174694] svc: transport ee5fd600 busy, not enqueued
    [4009901.178811] svc: socket ee5fd600 sendto([dd67d000 124... ], 124) = 124 (addr 192.168.1.5, port=910)
    [4009901.182741] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    [4009901.186655] svc: transport ee5fd600 busy, not enqueued
    [4009901.190673] svc: transport ee5fd600 busy, not enqueued
    [4009901.194661] svc: server ee9f2000 waiting for data (to = 900000)
    [4009901.222747] svc: transport ee5fd600 dequeued, inuse=1
    [4009901.226690] svc: server eef12000, pool 0, transport ee5fd600, inuse=2
    [4009901.230661] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    [4009901.234677] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    [4009901.238658] svc: TCP record, 152 bytes
    [4009901.242673] svc: socket ee5fd600 recvfrom(dd785098, 3944) = 152
    [4009901.246659] svc: TCP complete record (152 bytes)
    [4009901.250664] svc: transport ee5fd600 put into queue
    [4009901.254659] svc: got len=152
    [4009901.258700] svc: transport ee5fd600 busy, not enqueued
    [4009901.262809] svc: socket ee5fd600 sendto([ee92b000 232... ], 232) = 232 (addr 192.168.1.5, port=910)
    [4009901.266734] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    [4009901.270660] svc: transport ee5fd600 busy, not enqueued
    [4009901.274677] svc: transport ee5fd600 busy, not enqueued
    [4009901.278665] svc: server eef12000 waiting for data (to = 900000)
    [4009901.314692] svc: transport ee5fd600 dequeued, inuse=1
    [4009901.318668] svc: server eeeb7000, pool 0, transport ee5fd600, inuse=2
    [4009901.322667] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    [4009901.326682] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    [4009901.330664] svc: TCP record, 156 bytes
    [4009901.334681] svc: socket ee5fd600 recvfrom(e971609c, 3940) = 156
    [4009901.338665] svc: TCP complete record (156 bytes)
    [4009901.342672] svc: transport ee5fd600 put into queue
    [4009901.346665] svc: got len=156
    [4009901.350707] svc: transport ee5fd600 busy, not enqueued
    [4009901.354849] svc: socket ee5fd600 sendto([e95ee000 232... ], 232) = 232 (addr 192.168.1.5, port=910)
    [4009901.358747] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    [4009901.362666] svc: transport ee5fd600 busy, not enqueued
    [4009901.366683] svc: transport ee5fd600 busy, not enqueued
    [4009901.370672] svc: server eeeb7000 waiting for data (to = 900000)
    [4009901.406725] svc: transport ee5fd600 dequeued, inuse=1
    [4009901.410674] svc: server eef31000, pool 0, transport ee5fd600, inuse=2
    [4009901.414672] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    [4009901.418691] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    [4009901.422670] svc: TCP record, 136 bytes
    [4009901.426690] svc: socket ee5fd600 recvfrom(e99b0088, 3960) = 136
    [4009901.430671] svc: TCP complete record (136 bytes)
    [4009901.434677] svc: transport ee5fd600 put into queue
    [4009901.438684] svc: got len=136
    [4009901.442710] svc: transport ee5fd600 busy, not enqueued
    [4009901.446878] svc: socket ee5fd600 sendto([e9bee000 124... ], 124) = 124 (addr 192.168.1.5, port=910)
    [4009901.450767] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    [4009901.454672] svc: transport ee5fd600 busy, not enqueued
    [4009901.458754] svc: transport ee5fd600 busy, not enqueued
    [4009901.462690] svc: server eef31000 waiting for data (to = 900000)
    [4009901.499151] svc: transport ee5fd600 dequeued, inuse=1
    [4009901.502687] svc: server eefc4000, pool 0, transport ee5fd600, inuse=2
    [4009901.506679] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    [4009901.510698] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    [4009901.514675] svc: TCP record, 140 bytes
    [4009901.518694] svc: socket ee5fd600 recvfrom(e996b08c, 3956) = 140
    [4009901.522676] svc: TCP complete record (140 bytes)
    [4009901.526683] svc: transport ee5fd600 put into queue
    [4009901.530676] svc: got len=140
    [4009901.534716] svc: transport ee5fd600 busy, not enqueued
    [4009901.542824] svc: socket ee5fd600 sendto([dfbb0000 232... ], 232) = 232 (addr 192.168.1.5, port=910)
    [4009901.546806] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    [4009901.550678] svc: transport ee5fd600 busy, not enqueued
    [4009901.554700] svc: transport ee5fd600 busy, not enqueued
    [4009901.558682] svc: server eefc4000 waiting for data (to = 900000)


    --
    Ian Campbell

    GIVE: Support the helpless victims of computer error.

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  5. Re: NFS regression? Odd delays and lockups accessing an NFS export.

    On Wed, Sep 10, 2008 at 09:40:15AM +0100, Ian Campbell wrote:
    > On Mon, 2008-09-01 at 12:20 -0500, Tom Tucker wrote:
    > > Ian Campbell wrote:
    > > > On Sun, 2008-08-31 at 14:51 -0500, Tom Tucker wrote:
    > > >
    > > >> Looks like you ran this on the client. Sorry, Ian, I should have been
    > > >> more specific. You need to modify the rpc_debug file on the server.
    > > >>
    > > >
    > > > I tried this on the server but it's pretty crippling (the server is
    > > > quite weedy, 300MHz K3 or something).
    > > >
    > > > I'll leave it logging overnight since things should be pretty quiet (and
    > > > that's often when the problem occurs) but if there's a less aggressive
    > > > setting than 256 but which would still be useful I could leave it on
    > > > permanently until the problem happens.
    > > >
    > > >

    > > Thanks Ian. Unfortunately, that's as fine grained as it gets. 256
    > > (0x100) is the bit for transport logging.

    >
    > Still no repro with the debugging enabled I'm afraid.
    >
    > I'm wondering if the performance hit makes it unlikely to trigger since
    > everything slows to a crawl. I haven't been able to leave it on 24/7
    > since it makes all the NFS clients nearly unusable but I've had it on
    > overnight and when there's no one around etc, which is when the hang
    > generally seems to occur.


    OK, thanks for the attempt. We should figure out how to tone down those
    dprintk's.

    I'll try to take a closer look at this, but I'm travelling starting
    tomorrow, and realistically may not find time to look at this for a
    couple weeks. If someone else gets to it before me, great....

    --b.

    >
    > I forgot to re-enable it one evening and, perhaps inevitably, it crashed
    > overnight. I turned the debugging on and captured the following. It's
    > not much since syslog had given up for some reason, probably when the
    > disk filled up with logs. 192.168.1.5 is a non-problematic client so I
    > don't think the log will be useful since the problematic client
    > (192.168.1.4) isn't mentioned..
    >
    > Ian.
    >
    > vc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    > [4009900.094587] svc: TCP record, 140 bytes
    > [4009900.098604] svc: socket ee5fd600 recvfrom(e996b08c, 3956) = 140
    > [4009900.102587] svc: TCP complete record (140 bytes)
    > [4009900.106592] svc: transport ee5fd600 put into queue
    > [4009900.110613] svc: got len=140
    > [4009900.114645] svc: transport ee5fd600 busy, not enqueued
    > [4009900.118752] svc: socket ee5fd600 sendto([dfbb0000 232... ], 232) = 232 (addr 192.168.1.5, port=910)
    > [4009900.122636] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    > [4009900.126588] svc: transport ee5fd600 busy, not enqueued
    > [4009900.130601] svc: transport ee5fd600 busy, not enqueued
    > [4009900.134593] svc: server eefc4000 waiting for data (to = 900000)
    > [4009900.162711] svc: transport ee5fd600 dequeued, inuse=1
    > [4009900.166598] svc: server eef4a000, pool 0, transport ee5fd600, inuse=2
    > [4009900.170594] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    > [4009900.174611] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    > [4009900.178592] svc: TCP record, 136 bytes
    > [4009900.182606] svc: socket ee5fd600 recvfrom(e975a088, 3960) = 136
    > [4009900.186592] svc: TCP complete record (136 bytes)
    > [4009900.190599] svc: transport ee5fd600 put into queue
    > [4009900.194605] svc: got len=136
    > [4009900.198632] svc: transport ee5fd600 busy, not enqueued
    > [4009900.202717] svc: socket ee5fd600 sendto([d6464000 124... ], 124) = 124 (addr 192.168.1.5, port=910)
    > [4009900.206664] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    > [4009900.210594] svc: transport ee5fd600 busy, not enqueued
    > [4009900.214611] svc: transport ee5fd600 busy, not enqueued
    > [4009900.218599] svc: server eef4a000 waiting for data (to = 900000)
    > [4009900.254640] svc: transport ee5fd600 dequeued, inuse=1
    > [4009900.258603] svc: server ee978000, pool 0, transport ee5fd600, inuse=2
    > [4009900.262601] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    > [4009900.266625] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    > [4009900.270598] svc: TCP record, 128 bytes
    > [4009900.274618] svc: socket ee5fd600 recvfrom(dd54d080, 3968) = 128
    > [4009900.278598] svc: TCP complete record (128 bytes)
    > [4009900.282605] svc: transport ee5fd600 put into queue
    > [4009900.286611] svc: got len=128
    > [4009900.290642] svc: transport ee5fd600 busy, not enqueued
    > [4009900.294788] svc: socket ee5fd600 sendto([efb5c000 116... ], 116) = 116 (addr 192.168.1.5, port=910)
    > [4009900.298707] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    > [4009900.302600] svc: transport ee5fd600 busy, not enqueued
    > [4009900.306622] svc: transport ee5fd600 busy, not enqueued
    > [4009900.310604] svc: server ee978000 waiting for data (to = 900000)
    > [4009900.346682] svc: transport ee5fd600 dequeued, inuse=1
    > [4009900.350609] svc: server ee8fe000, pool 0, transport ee5fd600, inuse=2
    > [4009900.354606] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    > [4009900.358626] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    > [4009900.362603] svc: TCP record, 148 bytes
    > [4009900.366622] svc: socket ee5fd600 recvfrom(eec92094, 3948) = 148
    > [4009900.370604] svc: TCP complete record (148 bytes)
    > [4009900.374612] svc: transport ee5fd600 put into queue
    > [4009900.378610] svc: got len=148
    > [4009900.382652] svc: transport ee5fd600 busy, not enqueued
    > [4009900.386829] svc: socket ee5fd600 sendto([e98bf000 232... ], 232) = 232 (addr 192.168.1.5, port=910)
    > [4009900.390688] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    > [4009900.394606] svc: transport ee5fd600 busy, not enqueued
    > [4009900.398621] svc: transport ee5fd600 busy, not enqueued
    > [4009900.402611] svc: server ee8fe000 waiting for data (to = 900000)
    > [4009900.438683] svc: transport ee5fd600 dequeued, inuse=1
    > [4009900.442614] svc: server eef12000, pool 0, transport ee5fd600, inuse=2
    > [4009900.446612] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    > [4009900.450634] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    > [4009900.454609] svc: TCP record, 156 bytes
    > [4009900.458631] svc: socket ee5fd600 recvfrom(dd78509c, 3940) = 156
    > [4009900.462692] svc: TCP complete record (156 bytes)
    > [4009900.466693] svc: transport ee5fd600 put into queue
    > [4009900.470610] svc: got len=156
    > [4009900.474657] svc: transport ee5fd600 busy, not enqueued
    > [4009900.478801] svc: socket ee5fd600 sendto([ee92b000 232... ], 232) = 232 (addr 192.168.1.5, port=910)
    > [4009900.482696] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    > [4009900.486611] svc: transport ee5fd600 busy, not enqueued
    > [4009900.490630] svc: transport ee5fd600 busy, not enqueued
    > [4009900.494617] svc: server eef12000 waiting for data (to = 900000)
    > [4009900.522817] svc: transport ee5fd600 dequeued, inuse=1
    > [4009900.526622] svc: server ee9f2000, pool 0, transport ee5fd600, inuse=2
    > [4009900.530617] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    > [4009900.534639] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    > [4009900.538615] svc: TCP record, 124 bytes
    > [4009900.542632] svc: socket ee5fd600 recvfrom(e977007c, 3972) = 124
    > [4009900.546615] svc: TCP complete record (124 bytes)
    > [4009900.550621] svc: transport ee5fd600 put into queue
    > [4009900.554616] svc: got len=124
    > [4009900.558656] svc: transport ee5fd600 busy, not enqueued
    > [4009900.562766] svc: socket ee5fd600 sendto([dd5e1000 124... ], 124) = 124 (addr 192.168.1.5, port=910)
    > [4009900.566724] svc: transport ee5fd600 busy, not enqueued
    > [4009900.570622] svc: server ee9f2000 waiting for data (to = 900000)
    > [4009900.607202] svc: transport ee5fd600 dequeued, inuse=1
    > [4009900.610629] svc: server eeeb7000, pool 0, transport ee5fd600, inuse=2
    > [4009900.614623] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    > [4009900.618647] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    > [4009900.622619] svc: TCP record, 124 bytes
    > [4009900.626641] svc: socket ee5fd600 recvfrom(e971607c, 3972) = 124
    > [4009900.630620] svc: TCP complete record (124 bytes)
    > [4009900.634630] svc: transport ee5fd600 put into queue
    > [4009900.638621] svc: got len=124
    > [4009900.642661] svc: transport ee5fd600 busy, not enqueued
    > [4009900.646760] svc: socket ee5fd600 sendto([e95ee000 124... ], 124) = 124 (addr 192.168.1.5, port=910)
    > [4009900.650734] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    > [4009900.654622] svc: transport ee5fd600 busy, not enqueued
    > [4009900.658640] svc: transport ee5fd600 busy, not enqueued
    > [4009900.662626] svc: server eeeb7000 waiting for data (to = 900000)
    > [4009900.691707] svc: transport ee5fd600 dequeued, inuse=1
    > [4009900.694637] svc: server eef31000, pool 0, transport ee5fd600, inuse=2
    > [4009900.698628] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    > [4009900.702646] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    > [4009900.706625] svc: TCP record, 156 bytes
    > [4009900.710654] svc: socket ee5fd600 recvfrom(e99b009c, 3940) = 156
    > [4009900.714626] svc: TCP complete record (156 bytes)
    > [4009900.718632] svc: transport ee5fd600 put into queue
    > [4009900.722626] svc: got len=156
    > [4009900.725731] svc: transport ee5fd600 busy, not enqueued
    > [4009900.726797] svc: socket ee5fd600 sendto([e9bee000 232... ], 232) = 232 (addr 192.168.1.5, port=910)
    > [4009900.730717] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    > [4009900.734627] svc: transport ee5fd600 busy, not enqueued
    > [4009900.738646] svc: transport ee5fd600 busy, not enqueued
    > [4009900.742631] svc: server eef31000 waiting for data (to = 900000)
    > [4009900.778691] svc: transport ee5fd600 dequeued, inuse=1
    > [4009900.782635] svc: server eefc4000, pool 0, transport ee5fd600, inuse=2
    > [4009900.786633] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    > [4009900.790649] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    > [4009900.794631] svc: TCP record, 144 bytes
    > [4009900.798663] svc: socket ee5fd600 recvfrom(e996b090, 3952) = 144
    > [4009900.802632] svc: TCP complete record (144 bytes)
    > [4009900.806638] svc: transport ee5fd600 put into queue
    > [4009900.810632] svc: got len=144
    > [4009900.814675] svc: transport ee5fd600 busy, not enqueued
    > [4009900.818858] svc: socket ee5fd600 sendto([dfbb0000 232... ], 232) = 232 (addr 192.168.1.5, port=910)
    > [4009900.822707] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    > [4009900.826632] svc: transport ee5fd600 busy, not enqueued
    > [4009900.830647] svc: transport ee5fd600 busy, not enqueued
    > [4009900.834638] svc: server eefc4000 waiting for data (to = 900000)
    > [4009900.862781] svc: transport ee5fd600 dequeued, inuse=1
    > [4009900.866647] svc: server eef4a000, pool 0, transport ee5fd600, inuse=2
    > [4009900.870644] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    > [4009900.874651] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    > [4009900.878636] svc: TCP record, 136 bytes
    > [4009900.882649] svc: socket ee5fd600 recvfrom(e975a088, 3960) = 136
    > [4009900.886636] svc: TCP complete record (136 bytes)
    > [4009900.890642] svc: transport ee5fd600 put into queue
    > [4009900.894654] svc: got len=136
    > [4009900.898678] svc: transport ee5fd600 busy, not enqueued
    > [4009900.902761] svc: socket ee5fd600 sendto([d6464000 124... ], 124) = 124 (addr 192.168.1.5, port=910)
    > [4009900.906714] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    > [4009900.910638] svc: transport ee5fd600 busy, not enqueued
    > [4009900.914654] svc: transport ee5fd600 busy, not enqueued
    > [4009900.918643] svc: server eef4a000 waiting for data (to = 900000)
    > [4009900.954826] svc: transport ee5fd600 dequeued, inuse=1
    > [4009900.958648] svc: server ee978000, pool 0, transport ee5fd600, inuse=2
    > [4009900.962645] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    > [4009900.966667] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    > [4009900.970642] svc: TCP record, 140 bytes
    > [4009900.974659] svc: socket ee5fd600 recvfrom(dd54d08c, 3956) = 140
    > [4009900.978652] svc: TCP complete record (140 bytes)
    > [4009900.982649] svc: transport ee5fd600 put into queue
    > [4009900.986642] svc: got len=140
    > [4009900.990686] svc: transport ee5fd600 busy, not enqueued
    > [4009900.994848] svc: socket ee5fd600 sendto([efb5c000 232... ], 232) = 232 (addr 192.168.1.5, port=910)
    > [4009900.998725] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    > [4009901.002644] svc: transport ee5fd600 busy, not enqueued
    > [4009901.006662] svc: transport ee5fd600 busy, not enqueued
    > [4009901.010648] svc: server ee978000 waiting for data (to = 900000)
    > [4009901.046694] svc: transport ee5fd600 dequeued, inuse=1
    > [4009901.050652] svc: server ee8fe000, pool 0, transport ee5fd600, inuse=2
    > [4009901.054650] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    > [4009901.058666] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    > [4009901.062647] svc: TCP record, 156 bytes
    > [4009901.066669] svc: socket ee5fd600 recvfrom(eec9209c, 3940) = 156
    > [4009901.070648] svc: TCP complete record (156 bytes)
    > [4009901.074655] svc: transport ee5fd600 put into queue
    > [4009901.078648] svc: got len=156
    > [4009901.082688] svc: transport ee5fd600 busy, not enqueued
    > [4009901.086844] svc: socket ee5fd600 sendto([e98bf000 232... ], 232) = 232 (addr 192.168.1.5, port=910)
    > [4009901.090750] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    > [4009901.094650] svc: transport ee5fd600 busy, not enqueued
    > [4009901.098671] svc: transport ee5fd600 busy, not enqueued
    > [4009901.102655] svc: server ee8fe000 waiting for data (to = 900000)
    > [4009901.138700] svc: transport ee5fd600 dequeued, inuse=1
    > [4009901.142658] svc: server ee9f2000, pool 0, transport ee5fd600, inuse=2
    > [4009901.146656] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    > [4009901.150678] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    > [4009901.154654] svc: TCP record, 124 bytes
    > [4009901.158671] svc: socket ee5fd600 recvfrom(e977007c, 3972) = 124
    > [4009901.162654] svc: TCP complete record (124 bytes)
    > [4009901.166659] svc: transport ee5fd600 put into queue
    > [4009901.170654] svc: got len=124
    > [4009901.174694] svc: transport ee5fd600 busy, not enqueued
    > [4009901.178811] svc: socket ee5fd600 sendto([dd67d000 124... ], 124) = 124 (addr 192.168.1.5, port=910)
    > [4009901.182741] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    > [4009901.186655] svc: transport ee5fd600 busy, not enqueued
    > [4009901.190673] svc: transport ee5fd600 busy, not enqueued
    > [4009901.194661] svc: server ee9f2000 waiting for data (to = 900000)
    > [4009901.222747] svc: transport ee5fd600 dequeued, inuse=1
    > [4009901.226690] svc: server eef12000, pool 0, transport ee5fd600, inuse=2
    > [4009901.230661] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    > [4009901.234677] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    > [4009901.238658] svc: TCP record, 152 bytes
    > [4009901.242673] svc: socket ee5fd600 recvfrom(dd785098, 3944) = 152
    > [4009901.246659] svc: TCP complete record (152 bytes)
    > [4009901.250664] svc: transport ee5fd600 put into queue
    > [4009901.254659] svc: got len=152
    > [4009901.258700] svc: transport ee5fd600 busy, not enqueued
    > [4009901.262809] svc: socket ee5fd600 sendto([ee92b000 232... ], 232) = 232 (addr 192.168.1.5, port=910)
    > [4009901.266734] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    > [4009901.270660] svc: transport ee5fd600 busy, not enqueued
    > [4009901.274677] svc: transport ee5fd600 busy, not enqueued
    > [4009901.278665] svc: server eef12000 waiting for data (to = 900000)
    > [4009901.314692] svc: transport ee5fd600 dequeued, inuse=1
    > [4009901.318668] svc: server eeeb7000, pool 0, transport ee5fd600, inuse=2
    > [4009901.322667] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    > [4009901.326682] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    > [4009901.330664] svc: TCP record, 156 bytes
    > [4009901.334681] svc: socket ee5fd600 recvfrom(e971609c, 3940) = 156
    > [4009901.338665] svc: TCP complete record (156 bytes)
    > [4009901.342672] svc: transport ee5fd600 put into queue
    > [4009901.346665] svc: got len=156
    > [4009901.350707] svc: transport ee5fd600 busy, not enqueued
    > [4009901.354849] svc: socket ee5fd600 sendto([e95ee000 232... ], 232) = 232 (addr 192.168.1.5, port=910)
    > [4009901.358747] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    > [4009901.362666] svc: transport ee5fd600 busy, not enqueued
    > [4009901.366683] svc: transport ee5fd600 busy, not enqueued
    > [4009901.370672] svc: server eeeb7000 waiting for data (to = 900000)
    > [4009901.406725] svc: transport ee5fd600 dequeued, inuse=1
    > [4009901.410674] svc: server eef31000, pool 0, transport ee5fd600, inuse=2
    > [4009901.414672] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    > [4009901.418691] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    > [4009901.422670] svc: TCP record, 136 bytes
    > [4009901.426690] svc: socket ee5fd600 recvfrom(e99b0088, 3960) = 136
    > [4009901.430671] svc: TCP complete record (136 bytes)
    > [4009901.434677] svc: transport ee5fd600 put into queue
    > [4009901.438684] svc: got len=136
    > [4009901.442710] svc: transport ee5fd600 busy, not enqueued
    > [4009901.446878] svc: socket ee5fd600 sendto([e9bee000 124... ], 124) = 124 (addr 192.168.1.5, port=910)
    > [4009901.450767] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    > [4009901.454672] svc: transport ee5fd600 busy, not enqueued
    > [4009901.458754] svc: transport ee5fd600 busy, not enqueued
    > [4009901.462690] svc: server eef31000 waiting for data (to = 900000)
    > [4009901.499151] svc: transport ee5fd600 dequeued, inuse=1
    > [4009901.502687] svc: server eefc4000, pool 0, transport ee5fd600, inuse=2
    > [4009901.506679] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    > [4009901.510698] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    > [4009901.514675] svc: TCP record, 140 bytes
    > [4009901.518694] svc: socket ee5fd600 recvfrom(e996b08c, 3956) = 140
    > [4009901.522676] svc: TCP complete record (140 bytes)
    > [4009901.526683] svc: transport ee5fd600 put into queue
    > [4009901.530676] svc: got len=140
    > [4009901.534716] svc: transport ee5fd600 busy, not enqueued
    > [4009901.542824] svc: socket ee5fd600 sendto([dfbb0000 232... ], 232) = 232 (addr 192.168.1.5, port=910)
    > [4009901.546806] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    > [4009901.550678] svc: transport ee5fd600 busy, not enqueued
    > [4009901.554700] svc: transport ee5fd600 busy, not enqueued
    > [4009901.558682] svc: server eefc4000 waiting for data (to = 900000)
    >
    >
    > --
    > Ian Campbell
    >
    > GIVE: Support the helpless victims of computer error.
    >

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  6. Re: NFS regression? Odd delays and lockups accessing an NFS export.

    J. Bruce Fields wrote:
    > On Wed, Sep 10, 2008 at 09:40:15AM +0100, Ian Campbell wrote:
    >
    >> On Mon, 2008-09-01 at 12:20 -0500, Tom Tucker wrote:
    >>
    >>> Ian Campbell wrote:
    >>>
    >>>> On Sun, 2008-08-31 at 14:51 -0500, Tom Tucker wrote:
    >>>>
    >>>>
    >>>>> Looks like you ran this on the client. Sorry, Ian, I should have been
    >>>>> more specific. You need to modify the rpc_debug file on the server.
    >>>>>
    >>>>>
    >>>> I tried this on the server but it's pretty crippling (the server is
    >>>> quite weedy, 300MHz K3 or something).
    >>>>
    >>>> I'll leave it logging overnight since things should be pretty quiet (and
    >>>> that's often when the problem occurs) but if there's a less aggressive
    >>>> setting than 256 but which would still be useful I could leave it on
    >>>> permanently until the problem happens.
    >>>>
    >>>>
    >>>>
    >>> Thanks Ian. Unfortunately, that's as fine grained as it gets. 256
    >>> (0x100) is the bit for transport logging.
    >>>

    >> Still no repro with the debugging enabled I'm afraid.
    >>
    >> I'm wondering if the performance hit makes it unlikely to trigger since
    >> everything slows to a crawl. I haven't been able to leave it on 24/7
    >> since it makes all the NFS clients nearly unusable but I've had it on
    >> overnight and when there's no one around etc, which is when the hang
    >> generally seems to occur.
    >>

    >
    > OK, thanks for the attempt. We should figure out how to tone down those
    > dprintk's.
    >
    > I'll try to take a closer look at this, but I'm travelling starting
    > tomorrow, and realistically may not find time to look at this for a
    > couple weeks. If someone else gets to it before me, great....
    >
    > --b.
    >
    >
    >> I forgot to re-enable it one evening and, perhaps inevitably, it crashed
    >> overnight. I turned the debugging on and captured the following. It's
    >> not much since syslog had given up for some reason, probably when the
    >> disk filled up with logs. 192.168.1.5 is a non-problematic client so I
    >> don't think the log will be useful since the problematic client
    >> (192.168.1.4) isn't mentioned..
    >>
    >> Ian.
    >>
    >>


    Iain sadly yes. There's a thread stuck holding the BUSY bit or a thread
    failed to clear the bit properly
    (maybe an error path). Data continues to arrives, but the transport
    never gets put back on the queue
    because it's BUSY bit is set. In other words, this is a different error
    than the one we've been chasing.

    If I sent you a patch, could you rebuild the kernel and give it a whirl?
    Also, can you give me a
    kernel.org relative commit-id or tag for the kernel that you're using?

    Thanks,
    Tom

    >> vc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    >> [4009900.094587] svc: TCP record, 140 bytes
    >> [4009900.098604] svc: socket ee5fd600 recvfrom(e996b08c, 3956) = 140
    >> [4009900.102587] svc: TCP complete record (140 bytes)
    >> [4009900.106592] svc: transport ee5fd600 put into queue
    >> [4009900.110613] svc: got len=140
    >> [4009900.114645] svc: transport ee5fd600 busy, not enqueued
    >> [4009900.118752] svc: socket ee5fd600 sendto([dfbb0000 232... ], 232) = 232 (addr 192.168.1.5, port=910)
    >> [4009900.122636] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    >> [4009900.126588] svc: transport ee5fd600 busy, not enqueued
    >> [4009900.130601] svc: transport ee5fd600 busy, not enqueued
    >> [4009900.134593] svc: server eefc4000 waiting for data (to = 900000)
    >> [4009900.162711] svc: transport ee5fd600 dequeued, inuse=1
    >> [4009900.166598] svc: server eef4a000, pool 0, transport ee5fd600, inuse=2
    >> [4009900.170594] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    >> [4009900.174611] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    >> [4009900.178592] svc: TCP record, 136 bytes
    >> [4009900.182606] svc: socket ee5fd600 recvfrom(e975a088, 3960) = 136
    >> [4009900.186592] svc: TCP complete record (136 bytes)
    >> [4009900.190599] svc: transport ee5fd600 put into queue
    >> [4009900.194605] svc: got len=136
    >> [4009900.198632] svc: transport ee5fd600 busy, not enqueued
    >> [4009900.202717] svc: socket ee5fd600 sendto([d6464000 124... ], 124) = 124 (addr 192.168.1.5, port=910)
    >> [4009900.206664] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    >> [4009900.210594] svc: transport ee5fd600 busy, not enqueued
    >> [4009900.214611] svc: transport ee5fd600 busy, not enqueued
    >> [4009900.218599] svc: server eef4a000 waiting for data (to = 900000)
    >> [4009900.254640] svc: transport ee5fd600 dequeued, inuse=1
    >> [4009900.258603] svc: server ee978000, pool 0, transport ee5fd600, inuse=2
    >> [4009900.262601] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    >> [4009900.266625] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    >> [4009900.270598] svc: TCP record, 128 bytes
    >> [4009900.274618] svc: socket ee5fd600 recvfrom(dd54d080, 3968) = 128
    >> [4009900.278598] svc: TCP complete record (128 bytes)
    >> [4009900.282605] svc: transport ee5fd600 put into queue
    >> [4009900.286611] svc: got len=128
    >> [4009900.290642] svc: transport ee5fd600 busy, not enqueued
    >> [4009900.294788] svc: socket ee5fd600 sendto([efb5c000 116... ], 116) = 116 (addr 192.168.1.5, port=910)
    >> [4009900.298707] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    >> [4009900.302600] svc: transport ee5fd600 busy, not enqueued
    >> [4009900.306622] svc: transport ee5fd600 busy, not enqueued
    >> [4009900.310604] svc: server ee978000 waiting for data (to = 900000)
    >> [4009900.346682] svc: transport ee5fd600 dequeued, inuse=1
    >> [4009900.350609] svc: server ee8fe000, pool 0, transport ee5fd600, inuse=2
    >> [4009900.354606] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    >> [4009900.358626] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    >> [4009900.362603] svc: TCP record, 148 bytes
    >> [4009900.366622] svc: socket ee5fd600 recvfrom(eec92094, 3948) = 148
    >> [4009900.370604] svc: TCP complete record (148 bytes)
    >> [4009900.374612] svc: transport ee5fd600 put into queue
    >> [4009900.378610] svc: got len=148
    >> [4009900.382652] svc: transport ee5fd600 busy, not enqueued
    >> [4009900.386829] svc: socket ee5fd600 sendto([e98bf000 232... ], 232) = 232 (addr 192.168.1.5, port=910)
    >> [4009900.390688] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    >> [4009900.394606] svc: transport ee5fd600 busy, not enqueued
    >> [4009900.398621] svc: transport ee5fd600 busy, not enqueued
    >> [4009900.402611] svc: server ee8fe000 waiting for data (to = 900000)
    >> [4009900.438683] svc: transport ee5fd600 dequeued, inuse=1
    >> [4009900.442614] svc: server eef12000, pool 0, transport ee5fd600, inuse=2
    >> [4009900.446612] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    >> [4009900.450634] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    >> [4009900.454609] svc: TCP record, 156 bytes
    >> [4009900.458631] svc: socket ee5fd600 recvfrom(dd78509c, 3940) = 156
    >> [4009900.462692] svc: TCP complete record (156 bytes)
    >> [4009900.466693] svc: transport ee5fd600 put into queue
    >> [4009900.470610] svc: got len=156
    >> [4009900.474657] svc: transport ee5fd600 busy, not enqueued
    >> [4009900.478801] svc: socket ee5fd600 sendto([ee92b000 232... ], 232) = 232 (addr 192.168.1.5, port=910)
    >> [4009900.482696] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    >> [4009900.486611] svc: transport ee5fd600 busy, not enqueued
    >> [4009900.490630] svc: transport ee5fd600 busy, not enqueued
    >> [4009900.494617] svc: server eef12000 waiting for data (to = 900000)
    >> [4009900.522817] svc: transport ee5fd600 dequeued, inuse=1
    >> [4009900.526622] svc: server ee9f2000, pool 0, transport ee5fd600, inuse=2
    >> [4009900.530617] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    >> [4009900.534639] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    >> [4009900.538615] svc: TCP record, 124 bytes
    >> [4009900.542632] svc: socket ee5fd600 recvfrom(e977007c, 3972) = 124
    >> [4009900.546615] svc: TCP complete record (124 bytes)
    >> [4009900.550621] svc: transport ee5fd600 put into queue
    >> [4009900.554616] svc: got len=124
    >> [4009900.558656] svc: transport ee5fd600 busy, not enqueued
    >> [4009900.562766] svc: socket ee5fd600 sendto([dd5e1000 124... ], 124) = 124 (addr 192.168.1.5, port=910)
    >> [4009900.566724] svc: transport ee5fd600 busy, not enqueued
    >> [4009900.570622] svc: server ee9f2000 waiting for data (to = 900000)
    >> [4009900.607202] svc: transport ee5fd600 dequeued, inuse=1
    >> [4009900.610629] svc: server eeeb7000, pool 0, transport ee5fd600, inuse=2
    >> [4009900.614623] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    >> [4009900.618647] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    >> [4009900.622619] svc: TCP record, 124 bytes
    >> [4009900.626641] svc: socket ee5fd600 recvfrom(e971607c, 3972) = 124
    >> [4009900.630620] svc: TCP complete record (124 bytes)
    >> [4009900.634630] svc: transport ee5fd600 put into queue
    >> [4009900.638621] svc: got len=124
    >> [4009900.642661] svc: transport ee5fd600 busy, not enqueued
    >> [4009900.646760] svc: socket ee5fd600 sendto([e95ee000 124... ], 124) = 124 (addr 192.168.1.5, port=910)
    >> [4009900.650734] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    >> [4009900.654622] svc: transport ee5fd600 busy, not enqueued
    >> [4009900.658640] svc: transport ee5fd600 busy, not enqueued
    >> [4009900.662626] svc: server eeeb7000 waiting for data (to = 900000)
    >> [4009900.691707] svc: transport ee5fd600 dequeued, inuse=1
    >> [4009900.694637] svc: server eef31000, pool 0, transport ee5fd600, inuse=2
    >> [4009900.698628] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    >> [4009900.702646] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    >> [4009900.706625] svc: TCP record, 156 bytes
    >> [4009900.710654] svc: socket ee5fd600 recvfrom(e99b009c, 3940) = 156
    >> [4009900.714626] svc: TCP complete record (156 bytes)
    >> [4009900.718632] svc: transport ee5fd600 put into queue
    >> [4009900.722626] svc: got len=156
    >> [4009900.725731] svc: transport ee5fd600 busy, not enqueued
    >> [4009900.726797] svc: socket ee5fd600 sendto([e9bee000 232... ], 232) = 232 (addr 192.168.1.5, port=910)
    >> [4009900.730717] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    >> [4009900.734627] svc: transport ee5fd600 busy, not enqueued
    >> [4009900.738646] svc: transport ee5fd600 busy, not enqueued
    >> [4009900.742631] svc: server eef31000 waiting for data (to = 900000)
    >> [4009900.778691] svc: transport ee5fd600 dequeued, inuse=1
    >> [4009900.782635] svc: server eefc4000, pool 0, transport ee5fd600, inuse=2
    >> [4009900.786633] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    >> [4009900.790649] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    >> [4009900.794631] svc: TCP record, 144 bytes
    >> [4009900.798663] svc: socket ee5fd600 recvfrom(e996b090, 3952) = 144
    >> [4009900.802632] svc: TCP complete record (144 bytes)
    >> [4009900.806638] svc: transport ee5fd600 put into queue
    >> [4009900.810632] svc: got len=144
    >> [4009900.814675] svc: transport ee5fd600 busy, not enqueued
    >> [4009900.818858] svc: socket ee5fd600 sendto([dfbb0000 232... ], 232) = 232 (addr 192.168.1.5, port=910)
    >> [4009900.822707] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    >> [4009900.826632] svc: transport ee5fd600 busy, not enqueued
    >> [4009900.830647] svc: transport ee5fd600 busy, not enqueued
    >> [4009900.834638] svc: server eefc4000 waiting for data (to = 900000)
    >> [4009900.862781] svc: transport ee5fd600 dequeued, inuse=1
    >> [4009900.866647] svc: server eef4a000, pool 0, transport ee5fd600, inuse=2
    >> [4009900.870644] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    >> [4009900.874651] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    >> [4009900.878636] svc: TCP record, 136 bytes
    >> [4009900.882649] svc: socket ee5fd600 recvfrom(e975a088, 3960) = 136
    >> [4009900.886636] svc: TCP complete record (136 bytes)
    >> [4009900.890642] svc: transport ee5fd600 put into queue
    >> [4009900.894654] svc: got len=136
    >> [4009900.898678] svc: transport ee5fd600 busy, not enqueued
    >> [4009900.902761] svc: socket ee5fd600 sendto([d6464000 124... ], 124) = 124 (addr 192.168.1.5, port=910)
    >> [4009900.906714] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    >> [4009900.910638] svc: transport ee5fd600 busy, not enqueued
    >> [4009900.914654] svc: transport ee5fd600 busy, not enqueued
    >> [4009900.918643] svc: server eef4a000 waiting for data (to = 900000)
    >> [4009900.954826] svc: transport ee5fd600 dequeued, inuse=1
    >> [4009900.958648] svc: server ee978000, pool 0, transport ee5fd600, inuse=2
    >> [4009900.962645] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    >> [4009900.966667] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    >> [4009900.970642] svc: TCP record, 140 bytes
    >> [4009900.974659] svc: socket ee5fd600 recvfrom(dd54d08c, 3956) = 140
    >> [4009900.978652] svc: TCP complete record (140 bytes)
    >> [4009900.982649] svc: transport ee5fd600 put into queue
    >> [4009900.986642] svc: got len=140
    >> [4009900.990686] svc: transport ee5fd600 busy, not enqueued
    >> [4009900.994848] svc: socket ee5fd600 sendto([efb5c000 232... ], 232) = 232 (addr 192.168.1.5, port=910)
    >> [4009900.998725] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    >> [4009901.002644] svc: transport ee5fd600 busy, not enqueued
    >> [4009901.006662] svc: transport ee5fd600 busy, not enqueued
    >> [4009901.010648] svc: server ee978000 waiting for data (to = 900000)
    >> [4009901.046694] svc: transport ee5fd600 dequeued, inuse=1
    >> [4009901.050652] svc: server ee8fe000, pool 0, transport ee5fd600, inuse=2
    >> [4009901.054650] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    >> [4009901.058666] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    >> [4009901.062647] svc: TCP record, 156 bytes
    >> [4009901.066669] svc: socket ee5fd600 recvfrom(eec9209c, 3940) = 156
    >> [4009901.070648] svc: TCP complete record (156 bytes)
    >> [4009901.074655] svc: transport ee5fd600 put into queue
    >> [4009901.078648] svc: got len=156
    >> [4009901.082688] svc: transport ee5fd600 busy, not enqueued
    >> [4009901.086844] svc: socket ee5fd600 sendto([e98bf000 232... ], 232) = 232 (addr 192.168.1.5, port=910)
    >> [4009901.090750] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    >> [4009901.094650] svc: transport ee5fd600 busy, not enqueued
    >> [4009901.098671] svc: transport ee5fd600 busy, not enqueued
    >> [4009901.102655] svc: server ee8fe000 waiting for data (to = 900000)
    >> [4009901.138700] svc: transport ee5fd600 dequeued, inuse=1
    >> [4009901.142658] svc: server ee9f2000, pool 0, transport ee5fd600, inuse=2
    >> [4009901.146656] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    >> [4009901.150678] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    >> [4009901.154654] svc: TCP record, 124 bytes
    >> [4009901.158671] svc: socket ee5fd600 recvfrom(e977007c, 3972) = 124
    >> [4009901.162654] svc: TCP complete record (124 bytes)
    >> [4009901.166659] svc: transport ee5fd600 put into queue
    >> [4009901.170654] svc: got len=124
    >> [4009901.174694] svc: transport ee5fd600 busy, not enqueued
    >> [4009901.178811] svc: socket ee5fd600 sendto([dd67d000 124... ], 124) = 124 (addr 192.168.1.5, port=910)
    >> [4009901.182741] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    >> [4009901.186655] svc: transport ee5fd600 busy, not enqueued
    >> [4009901.190673] svc: transport ee5fd600 busy, not enqueued
    >> [4009901.194661] svc: server ee9f2000 waiting for data (to = 900000)
    >> [4009901.222747] svc: transport ee5fd600 dequeued, inuse=1
    >> [4009901.226690] svc: server eef12000, pool 0, transport ee5fd600, inuse=2
    >> [4009901.230661] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    >> [4009901.234677] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    >> [4009901.238658] svc: TCP record, 152 bytes
    >> [4009901.242673] svc: socket ee5fd600 recvfrom(dd785098, 3944) = 152
    >> [4009901.246659] svc: TCP complete record (152 bytes)
    >> [4009901.250664] svc: transport ee5fd600 put into queue
    >> [4009901.254659] svc: got len=152
    >> [4009901.258700] svc: transport ee5fd600 busy, not enqueued
    >> [4009901.262809] svc: socket ee5fd600 sendto([ee92b000 232... ], 232) = 232 (addr 192.168.1.5, port=910)
    >> [4009901.266734] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    >> [4009901.270660] svc: transport ee5fd600 busy, not enqueued
    >> [4009901.274677] svc: transport ee5fd600 busy, not enqueued
    >> [4009901.278665] svc: server eef12000 waiting for data (to = 900000)
    >> [4009901.314692] svc: transport ee5fd600 dequeued, inuse=1
    >> [4009901.318668] svc: server eeeb7000, pool 0, transport ee5fd600, inuse=2
    >> [4009901.322667] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    >> [4009901.326682] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    >> [4009901.330664] svc: TCP record, 156 bytes
    >> [4009901.334681] svc: socket ee5fd600 recvfrom(e971609c, 3940) = 156
    >> [4009901.338665] svc: TCP complete record (156 bytes)
    >> [4009901.342672] svc: transport ee5fd600 put into queue
    >> [4009901.346665] svc: got len=156
    >> [4009901.350707] svc: transport ee5fd600 busy, not enqueued
    >> [4009901.354849] svc: socket ee5fd600 sendto([e95ee000 232... ], 232) = 232 (addr 192.168.1.5, port=910)
    >> [4009901.358747] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    >> [4009901.362666] svc: transport ee5fd600 busy, not enqueued
    >> [4009901.366683] svc: transport ee5fd600 busy, not enqueued
    >> [4009901.370672] svc: server eeeb7000 waiting for data (to = 900000)
    >> [4009901.406725] svc: transport ee5fd600 dequeued, inuse=1
    >> [4009901.410674] svc: server eef31000, pool 0, transport ee5fd600, inuse=2
    >> [4009901.414672] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    >> [4009901.418691] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    >> [4009901.422670] svc: TCP record, 136 bytes
    >> [4009901.426690] svc: socket ee5fd600 recvfrom(e99b0088, 3960) = 136
    >> [4009901.430671] svc: TCP complete record (136 bytes)
    >> [4009901.434677] svc: transport ee5fd600 put into queue
    >> [4009901.438684] svc: got len=136
    >> [4009901.442710] svc: transport ee5fd600 busy, not enqueued
    >> [4009901.446878] svc: socket ee5fd600 sendto([e9bee000 124... ], 124) = 124 (addr 192.168.1.5, port=910)
    >> [4009901.450767] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    >> [4009901.454672] svc: transport ee5fd600 busy, not enqueued
    >> [4009901.458754] svc: transport ee5fd600 busy, not enqueued
    >> [4009901.462690] svc: server eef31000 waiting for data (to = 900000)
    >> [4009901.499151] svc: transport ee5fd600 dequeued, inuse=1
    >> [4009901.502687] svc: server eefc4000, pool 0, transport ee5fd600, inuse=2
    >> [4009901.506679] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    >> [4009901.510698] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    >> [4009901.514675] svc: TCP record, 140 bytes
    >> [4009901.518694] svc: socket ee5fd600 recvfrom(e996b08c, 3956) = 140
    >> [4009901.522676] svc: TCP complete record (140 bytes)
    >> [4009901.526683] svc: transport ee5fd600 put into queue
    >> [4009901.530676] svc: got len=140
    >> [4009901.534716] svc: transport ee5fd600 busy, not enqueued
    >> [4009901.542824] svc: socket ee5fd600 sendto([dfbb0000 232... ], 232) = 232 (addr 192.168.1.5, port=910)
    >> [4009901.546806] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    >> [4009901.550678] svc: transport ee5fd600 busy, not enqueued
    >> [4009901.554700] svc: transport ee5fd600 busy, not enqueued
    >> [4009901.558682] svc: server eefc4000 waiting for data (to = 900000)
    >>
    >>
    >> --
    >> Ian Campbell
    >>
    >> GIVE: Support the helpless victims of computer error.
    >>
    >>


    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  7. Re: NFS regression? Odd delays and lockups accessing an NFS export.

    On Fri, 2008-09-12 at 18:15 -0500, Tom Tucker wrote:
    > Iain sadly yes. There's a thread stuck holding the BUSY bit or a thread
    > failed to clear the bit properly
    > (maybe an error path). Data continues to arrives, but the transport
    > never gets put back on the queue
    > because it's BUSY bit is set. In other words, this is a different error
    > than the one we've been chasing.
    >
    > If I sent you a patch, could you rebuild the kernel and give it a whirl?
    > Also, can you give me a
    > kernel.org relative commit-id or tag for the kernel that you're using?


    I sure could. I'm using the Debian kernel which is currently at 2.6.26.3
    (pkg version 2.6.26-4) although I have an update to 2.6.26.4 (via pkg
    2.6.26-5) pending.

    If I'm going to build my own I'll start with current git
    (a551b98d5f6fce5897d497abd8bfb262efb33d2a) and repro there before trying
    your patch.

    Ian.

    >
    > Thanks,
    > Tom
    >
    > >> vc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    > >> [4009900.094587] svc: TCP record, 140 bytes
    > >> [4009900.098604] svc: socket ee5fd600 recvfrom(e996b08c, 3956) = 140
    > >> [4009900.102587] svc: TCP complete record (140 bytes)
    > >> [4009900.106592] svc: transport ee5fd600 put into queue
    > >> [4009900.110613] svc: got len=140
    > >> [4009900.114645] svc: transport ee5fd600 busy, not enqueued
    > >> [4009900.118752] svc: socket ee5fd600 sendto([dfbb0000 232... ], 232) = 232 (addr 192.168.1.5, port=910)
    > >> [4009900.122636] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    > >> [4009900.126588] svc: transport ee5fd600 busy, not enqueued
    > >> [4009900.130601] svc: transport ee5fd600 busy, not enqueued
    > >> [4009900.134593] svc: server eefc4000 waiting for data (to = 900000)
    > >> [4009900.162711] svc: transport ee5fd600 dequeued, inuse=1
    > >> [4009900.166598] svc: server eef4a000, pool 0, transport ee5fd600, inuse=2
    > >> [4009900.170594] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    > >> [4009900.174611] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    > >> [4009900.178592] svc: TCP record, 136 bytes
    > >> [4009900.182606] svc: socket ee5fd600 recvfrom(e975a088, 3960) = 136
    > >> [4009900.186592] svc: TCP complete record (136 bytes)
    > >> [4009900.190599] svc: transport ee5fd600 put into queue
    > >> [4009900.194605] svc: got len=136
    > >> [4009900.198632] svc: transport ee5fd600 busy, not enqueued
    > >> [4009900.202717] svc: socket ee5fd600 sendto([d6464000 124... ], 124) = 124 (addr 192.168.1.5, port=910)
    > >> [4009900.206664] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    > >> [4009900.210594] svc: transport ee5fd600 busy, not enqueued
    > >> [4009900.214611] svc: transport ee5fd600 busy, not enqueued
    > >> [4009900.218599] svc: server eef4a000 waiting for data (to = 900000)
    > >> [4009900.254640] svc: transport ee5fd600 dequeued, inuse=1
    > >> [4009900.258603] svc: server ee978000, pool 0, transport ee5fd600, inuse=2
    > >> [4009900.262601] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    > >> [4009900.266625] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    > >> [4009900.270598] svc: TCP record, 128 bytes
    > >> [4009900.274618] svc: socket ee5fd600 recvfrom(dd54d080, 3968) = 128
    > >> [4009900.278598] svc: TCP complete record (128 bytes)
    > >> [4009900.282605] svc: transport ee5fd600 put into queue
    > >> [4009900.286611] svc: got len=128
    > >> [4009900.290642] svc: transport ee5fd600 busy, not enqueued
    > >> [4009900.294788] svc: socket ee5fd600 sendto([efb5c000 116... ], 116) = 116 (addr 192.168.1.5, port=910)
    > >> [4009900.298707] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    > >> [4009900.302600] svc: transport ee5fd600 busy, not enqueued
    > >> [4009900.306622] svc: transport ee5fd600 busy, not enqueued
    > >> [4009900.310604] svc: server ee978000 waiting for data (to = 900000)
    > >> [4009900.346682] svc: transport ee5fd600 dequeued, inuse=1
    > >> [4009900.350609] svc: server ee8fe000, pool 0, transport ee5fd600, inuse=2
    > >> [4009900.354606] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    > >> [4009900.358626] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    > >> [4009900.362603] svc: TCP record, 148 bytes
    > >> [4009900.366622] svc: socket ee5fd600 recvfrom(eec92094, 3948) = 148
    > >> [4009900.370604] svc: TCP complete record (148 bytes)
    > >> [4009900.374612] svc: transport ee5fd600 put into queue
    > >> [4009900.378610] svc: got len=148
    > >> [4009900.382652] svc: transport ee5fd600 busy, not enqueued
    > >> [4009900.386829] svc: socket ee5fd600 sendto([e98bf000 232... ], 232) = 232 (addr 192.168.1.5, port=910)
    > >> [4009900.390688] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    > >> [4009900.394606] svc: transport ee5fd600 busy, not enqueued
    > >> [4009900.398621] svc: transport ee5fd600 busy, not enqueued
    > >> [4009900.402611] svc: server ee8fe000 waiting for data (to = 900000)
    > >> [4009900.438683] svc: transport ee5fd600 dequeued, inuse=1
    > >> [4009900.442614] svc: server eef12000, pool 0, transport ee5fd600, inuse=2
    > >> [4009900.446612] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    > >> [4009900.450634] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    > >> [4009900.454609] svc: TCP record, 156 bytes
    > >> [4009900.458631] svc: socket ee5fd600 recvfrom(dd78509c, 3940) = 156
    > >> [4009900.462692] svc: TCP complete record (156 bytes)
    > >> [4009900.466693] svc: transport ee5fd600 put into queue
    > >> [4009900.470610] svc: got len=156
    > >> [4009900.474657] svc: transport ee5fd600 busy, not enqueued
    > >> [4009900.478801] svc: socket ee5fd600 sendto([ee92b000 232... ], 232) = 232 (addr 192.168.1.5, port=910)
    > >> [4009900.482696] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    > >> [4009900.486611] svc: transport ee5fd600 busy, not enqueued
    > >> [4009900.490630] svc: transport ee5fd600 busy, not enqueued
    > >> [4009900.494617] svc: server eef12000 waiting for data (to = 900000)
    > >> [4009900.522817] svc: transport ee5fd600 dequeued, inuse=1
    > >> [4009900.526622] svc: server ee9f2000, pool 0, transport ee5fd600, inuse=2
    > >> [4009900.530617] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    > >> [4009900.534639] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    > >> [4009900.538615] svc: TCP record, 124 bytes
    > >> [4009900.542632] svc: socket ee5fd600 recvfrom(e977007c, 3972) = 124
    > >> [4009900.546615] svc: TCP complete record (124 bytes)
    > >> [4009900.550621] svc: transport ee5fd600 put into queue
    > >> [4009900.554616] svc: got len=124
    > >> [4009900.558656] svc: transport ee5fd600 busy, not enqueued
    > >> [4009900.562766] svc: socket ee5fd600 sendto([dd5e1000 124... ], 124) = 124 (addr 192.168.1.5, port=910)
    > >> [4009900.566724] svc: transport ee5fd600 busy, not enqueued
    > >> [4009900.570622] svc: server ee9f2000 waiting for data (to = 900000)
    > >> [4009900.607202] svc: transport ee5fd600 dequeued, inuse=1
    > >> [4009900.610629] svc: server eeeb7000, pool 0, transport ee5fd600, inuse=2
    > >> [4009900.614623] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    > >> [4009900.618647] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    > >> [4009900.622619] svc: TCP record, 124 bytes
    > >> [4009900.626641] svc: socket ee5fd600 recvfrom(e971607c, 3972) = 124
    > >> [4009900.630620] svc: TCP complete record (124 bytes)
    > >> [4009900.634630] svc: transport ee5fd600 put into queue
    > >> [4009900.638621] svc: got len=124
    > >> [4009900.642661] svc: transport ee5fd600 busy, not enqueued
    > >> [4009900.646760] svc: socket ee5fd600 sendto([e95ee000 124... ], 124) = 124 (addr 192.168.1.5, port=910)
    > >> [4009900.650734] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    > >> [4009900.654622] svc: transport ee5fd600 busy, not enqueued
    > >> [4009900.658640] svc: transport ee5fd600 busy, not enqueued
    > >> [4009900.662626] svc: server eeeb7000 waiting for data (to = 900000)
    > >> [4009900.691707] svc: transport ee5fd600 dequeued, inuse=1
    > >> [4009900.694637] svc: server eef31000, pool 0, transport ee5fd600, inuse=2
    > >> [4009900.698628] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    > >> [4009900.702646] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    > >> [4009900.706625] svc: TCP record, 156 bytes
    > >> [4009900.710654] svc: socket ee5fd600 recvfrom(e99b009c, 3940) = 156
    > >> [4009900.714626] svc: TCP complete record (156 bytes)
    > >> [4009900.718632] svc: transport ee5fd600 put into queue
    > >> [4009900.722626] svc: got len=156
    > >> [4009900.725731] svc: transport ee5fd600 busy, not enqueued
    > >> [4009900.726797] svc: socket ee5fd600 sendto([e9bee000 232... ], 232) = 232 (addr 192.168.1.5, port=910)
    > >> [4009900.730717] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    > >> [4009900.734627] svc: transport ee5fd600 busy, not enqueued
    > >> [4009900.738646] svc: transport ee5fd600 busy, not enqueued
    > >> [4009900.742631] svc: server eef31000 waiting for data (to = 900000)
    > >> [4009900.778691] svc: transport ee5fd600 dequeued, inuse=1
    > >> [4009900.782635] svc: server eefc4000, pool 0, transport ee5fd600, inuse=2
    > >> [4009900.786633] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    > >> [4009900.790649] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    > >> [4009900.794631] svc: TCP record, 144 bytes
    > >> [4009900.798663] svc: socket ee5fd600 recvfrom(e996b090, 3952) = 144
    > >> [4009900.802632] svc: TCP complete record (144 bytes)
    > >> [4009900.806638] svc: transport ee5fd600 put into queue
    > >> [4009900.810632] svc: got len=144
    > >> [4009900.814675] svc: transport ee5fd600 busy, not enqueued
    > >> [4009900.818858] svc: socket ee5fd600 sendto([dfbb0000 232... ], 232) = 232 (addr 192.168.1.5, port=910)
    > >> [4009900.822707] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    > >> [4009900.826632] svc: transport ee5fd600 busy, not enqueued
    > >> [4009900.830647] svc: transport ee5fd600 busy, not enqueued
    > >> [4009900.834638] svc: server eefc4000 waiting for data (to = 900000)
    > >> [4009900.862781] svc: transport ee5fd600 dequeued, inuse=1
    > >> [4009900.866647] svc: server eef4a000, pool 0, transport ee5fd600, inuse=2
    > >> [4009900.870644] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    > >> [4009900.874651] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    > >> [4009900.878636] svc: TCP record, 136 bytes
    > >> [4009900.882649] svc: socket ee5fd600 recvfrom(e975a088, 3960) = 136
    > >> [4009900.886636] svc: TCP complete record (136 bytes)
    > >> [4009900.890642] svc: transport ee5fd600 put into queue
    > >> [4009900.894654] svc: got len=136
    > >> [4009900.898678] svc: transport ee5fd600 busy, not enqueued
    > >> [4009900.902761] svc: socket ee5fd600 sendto([d6464000 124... ], 124) = 124 (addr 192.168.1.5, port=910)
    > >> [4009900.906714] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    > >> [4009900.910638] svc: transport ee5fd600 busy, not enqueued
    > >> [4009900.914654] svc: transport ee5fd600 busy, not enqueued
    > >> [4009900.918643] svc: server eef4a000 waiting for data (to = 900000)
    > >> [4009900.954826] svc: transport ee5fd600 dequeued, inuse=1
    > >> [4009900.958648] svc: server ee978000, pool 0, transport ee5fd600, inuse=2
    > >> [4009900.962645] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    > >> [4009900.966667] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    > >> [4009900.970642] svc: TCP record, 140 bytes
    > >> [4009900.974659] svc: socket ee5fd600 recvfrom(dd54d08c, 3956) = 140
    > >> [4009900.978652] svc: TCP complete record (140 bytes)
    > >> [4009900.982649] svc: transport ee5fd600 put into queue
    > >> [4009900.986642] svc: got len=140
    > >> [4009900.990686] svc: transport ee5fd600 busy, not enqueued
    > >> [4009900.994848] svc: socket ee5fd600 sendto([efb5c000 232... ], 232) = 232 (addr 192.168.1.5, port=910)
    > >> [4009900.998725] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    > >> [4009901.002644] svc: transport ee5fd600 busy, not enqueued
    > >> [4009901.006662] svc: transport ee5fd600 busy, not enqueued
    > >> [4009901.010648] svc: server ee978000 waiting for data (to = 900000)
    > >> [4009901.046694] svc: transport ee5fd600 dequeued, inuse=1
    > >> [4009901.050652] svc: server ee8fe000, pool 0, transport ee5fd600, inuse=2
    > >> [4009901.054650] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    > >> [4009901.058666] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    > >> [4009901.062647] svc: TCP record, 156 bytes
    > >> [4009901.066669] svc: socket ee5fd600 recvfrom(eec9209c, 3940) = 156
    > >> [4009901.070648] svc: TCP complete record (156 bytes)
    > >> [4009901.074655] svc: transport ee5fd600 put into queue
    > >> [4009901.078648] svc: got len=156
    > >> [4009901.082688] svc: transport ee5fd600 busy, not enqueued
    > >> [4009901.086844] svc: socket ee5fd600 sendto([e98bf000 232... ], 232) = 232 (addr 192.168.1.5, port=910)
    > >> [4009901.090750] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    > >> [4009901.094650] svc: transport ee5fd600 busy, not enqueued
    > >> [4009901.098671] svc: transport ee5fd600 busy, not enqueued
    > >> [4009901.102655] svc: server ee8fe000 waiting for data (to = 900000)
    > >> [4009901.138700] svc: transport ee5fd600 dequeued, inuse=1
    > >> [4009901.142658] svc: server ee9f2000, pool 0, transport ee5fd600, inuse=2
    > >> [4009901.146656] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    > >> [4009901.150678] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    > >> [4009901.154654] svc: TCP record, 124 bytes
    > >> [4009901.158671] svc: socket ee5fd600 recvfrom(e977007c, 3972) = 124
    > >> [4009901.162654] svc: TCP complete record (124 bytes)
    > >> [4009901.166659] svc: transport ee5fd600 put into queue
    > >> [4009901.170654] svc: got len=124
    > >> [4009901.174694] svc: transport ee5fd600 busy, not enqueued
    > >> [4009901.178811] svc: socket ee5fd600 sendto([dd67d000 124... ], 124) = 124 (addr 192.168.1.5, port=910)
    > >> [4009901.182741] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    > >> [4009901.186655] svc: transport ee5fd600 busy, not enqueued
    > >> [4009901.190673] svc: transport ee5fd600 busy, not enqueued
    > >> [4009901.194661] svc: server ee9f2000 waiting for data (to = 900000)
    > >> [4009901.222747] svc: transport ee5fd600 dequeued, inuse=1
    > >> [4009901.226690] svc: server eef12000, pool 0, transport ee5fd600, inuse=2
    > >> [4009901.230661] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    > >> [4009901.234677] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    > >> [4009901.238658] svc: TCP record, 152 bytes
    > >> [4009901.242673] svc: socket ee5fd600 recvfrom(dd785098, 3944) = 152
    > >> [4009901.246659] svc: TCP complete record (152 bytes)
    > >> [4009901.250664] svc: transport ee5fd600 put into queue
    > >> [4009901.254659] svc: got len=152
    > >> [4009901.258700] svc: transport ee5fd600 busy, not enqueued
    > >> [4009901.262809] svc: socket ee5fd600 sendto([ee92b000 232... ], 232) = 232 (addr 192.168.1.5, port=910)
    > >> [4009901.266734] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    > >> [4009901.270660] svc: transport ee5fd600 busy, not enqueued
    > >> [4009901.274677] svc: transport ee5fd600 busy, not enqueued
    > >> [4009901.278665] svc: server eef12000 waiting for data (to = 900000)
    > >> [4009901.314692] svc: transport ee5fd600 dequeued, inuse=1
    > >> [4009901.318668] svc: server eeeb7000, pool 0, transport ee5fd600, inuse=2
    > >> [4009901.322667] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    > >> [4009901.326682] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    > >> [4009901.330664] svc: TCP record, 156 bytes
    > >> [4009901.334681] svc: socket ee5fd600 recvfrom(e971609c, 3940) = 156
    > >> [4009901.338665] svc: TCP complete record (156 bytes)
    > >> [4009901.342672] svc: transport ee5fd600 put into queue
    > >> [4009901.346665] svc: got len=156
    > >> [4009901.350707] svc: transport ee5fd600 busy, not enqueued
    > >> [4009901.354849] svc: socket ee5fd600 sendto([e95ee000 232... ], 232) = 232 (addr 192.168.1.5, port=910)
    > >> [4009901.358747] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    > >> [4009901.362666] svc: transport ee5fd600 busy, not enqueued
    > >> [4009901.366683] svc: transport ee5fd600 busy, not enqueued
    > >> [4009901.370672] svc: server eeeb7000 waiting for data (to = 900000)
    > >> [4009901.406725] svc: transport ee5fd600 dequeued, inuse=1
    > >> [4009901.410674] svc: server eef31000, pool 0, transport ee5fd600, inuse=2
    > >> [4009901.414672] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    > >> [4009901.418691] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    > >> [4009901.422670] svc: TCP record, 136 bytes
    > >> [4009901.426690] svc: socket ee5fd600 recvfrom(e99b0088, 3960) = 136
    > >> [4009901.430671] svc: TCP complete record (136 bytes)
    > >> [4009901.434677] svc: transport ee5fd600 put into queue
    > >> [4009901.438684] svc: got len=136
    > >> [4009901.442710] svc: transport ee5fd600 busy, not enqueued
    > >> [4009901.446878] svc: socket ee5fd600 sendto([e9bee000 124... ], 124) = 124 (addr 192.168.1.5, port=910)
    > >> [4009901.450767] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    > >> [4009901.454672] svc: transport ee5fd600 busy, not enqueued
    > >> [4009901.458754] svc: transport ee5fd600 busy, not enqueued
    > >> [4009901.462690] svc: server eef31000 waiting for data (to = 900000)
    > >> [4009901.499151] svc: transport ee5fd600 dequeued, inuse=1
    > >> [4009901.502687] svc: server eefc4000, pool 0, transport ee5fd600, inuse=2
    > >> [4009901.506679] svc: tcp_recv ee5fd600 data 1 conn 0 close 0
    > >> [4009901.510698] svc: socket ee5fd600 recvfrom(ee5fd764, 0) = 4
    > >> [4009901.514675] svc: TCP record, 140 bytes
    > >> [4009901.518694] svc: socket ee5fd600 recvfrom(e996b08c, 3956) = 140
    > >> [4009901.522676] svc: TCP complete record (140 bytes)
    > >> [4009901.526683] svc: transport ee5fd600 put into queue
    > >> [4009901.530676] svc: got len=140
    > >> [4009901.534716] svc: transport ee5fd600 busy, not enqueued
    > >> [4009901.542824] svc: socket ee5fd600 sendto([dfbb0000 232... ], 232) = 232 (addr 192.168.1.5, port=910)
    > >> [4009901.546806] svc: socket c1888960 TCP data ready (svsk ee5fd600)
    > >> [4009901.550678] svc: transport ee5fd600 busy, not enqueued
    > >> [4009901.554700] svc: transport ee5fd600 busy, not enqueued
    > >> [4009901.558682] svc: server eefc4000 waiting for data (to = 900000)
    > >>
    > >>
    > >> --
    > >> Ian Campbell
    > >>
    > >> GIVE: Support the helpless victims of computer error.
    > >>
    > >>

    >
    >

    --
    Ian Campbell

    No rock so hard but that a little wave
    May beat admission in a thousand years.
    -- Tennyson

    -----BEGIN PGP SIGNATURE-----
    Version: GnuPG v1.4.9 (GNU/Linux)

    iEYEABECAAYFAkjLgG8ACgkQM0+0qS9rzVkLugCbBRkuGGnPnj 7X8BP0znpos7je
    jRoAn2CgEbd1wdwnMMAXWTN++yuLZvab
    =pRez
    -----END PGP SIGNATURE-----


  8. Re: NFS regression? Odd delays and lockups accessing an NFS export.

    (dropping the e1000 guys, it seems unnecessary to keep spamming them
    with this issue when it's unlikely to be anything to do with them. I've
    left their list on for now so they know. I'd suggest dropping it from
    any replies.)

    On Sat, 2008-09-13 at 09:57 +0100, Ian Campbell wrote:
    > On Fri, 2008-09-12 at 18:15 -0500, Tom Tucker wrote:
    > > Iain sadly yes. There's a thread stuck holding the BUSY bit or a thread
    > > failed to clear the bit properly
    > > (maybe an error path). Data continues to arrives, but the transport
    > > never gets put back on the queue
    > > because it's BUSY bit is set. In other words, this is a different error
    > > than the one we've been chasing.
    > >
    > > If I sent you a patch, could you rebuild the kernel and give it a whirl?
    > > Also, can you give me a
    > > kernel.org relative commit-id or tag for the kernel that you're using?

    >
    > I sure could. I'm using the Debian kernel which is currently at 2.6.26.3
    > (pkg version 2.6.26-4) although I have an update to 2.6.26.4 (via pkg
    > 2.6.26-5) pending.
    >
    > If I'm going to build my own I'll start with current git
    > (a551b98d5f6fce5897d497abd8bfb262efb33d2a) and repro there before trying
    > your patch.


    FYI I've repro'd with
    commit a551b98d5f6fce5897d497abd8bfb262efb33d2a
    Merge: d1c6d2e... 50bed2e...
    Author: Linus Torvalds
    Date: Thu Sep 11 11:50:15 2008 -0700

    Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block

    * 'for-linus' of git://git.kernel.dk/linux-2.6-block:
    sg: disable interrupts inside sg_copy_buffer
    which was the latest git a few days back.

    I'm going to start bisecting between v2.6.25 and v2.6.26. There's 173
    commits in fs/nfs* net/sunrpc in that interval so with a day per test I
    should have something next week...

    Ian.
    --
    Ian Campbell

    Vini, vidi, Linux!
    -- Unknown source

    -----BEGIN PGP SIGNATURE-----
    Version: GnuPG v1.4.9 (GNU/Linux)

    iEYEABECAAYFAkjPSMsACgkQM0+0qS9rzVnbYQCgzMmAzAO18O 2pPMwETHdOi3KZ
    PmAAoNAafL5cZZ86s6F3zXeQNVmBMsZH
    =1Uzq
    -----END PGP SIGNATURE-----


  9. Re: NFS regression? Odd delays and lockups accessing an NFS export.

    Ian Campbell wrote:
    > (dropping the e1000 guys, it seems unnecessary to keep spamming them
    > with this issue when it's unlikely to be anything to do with them. I've
    > left their list on for now so they know. I'd suggest dropping it from
    > any replies.)
    >
    > On Sat, 2008-09-13 at 09:57 +0100, Ian Campbell wrote:
    >
    >> On Fri, 2008-09-12 at 18:15 -0500, Tom Tucker wrote:
    >>
    >>> Iain sadly yes. There's a thread stuck holding the BUSY bit or a thread
    >>> failed to clear the bit properly
    >>> (maybe an error path). Data continues to arrives, but the transport
    >>> never gets put back on the queue
    >>> because it's BUSY bit is set. In other words, this is a different error
    >>> than the one we've been chasing.
    >>>
    >>> If I sent you a patch, could you rebuild the kernel and give it a whirl?
    >>> Also, can you give me a
    >>> kernel.org relative commit-id or tag for the kernel that you're using?
    >>>

    >> I sure could. I'm using the Debian kernel which is currently at 2.6.26.3
    >> (pkg version 2.6.26-4) although I have an update to 2.6.26.4 (via pkg
    >> 2.6.26-5) pending.
    >>
    >> If I'm going to build my own I'll start with current git
    >> (a551b98d5f6fce5897d497abd8bfb262efb33d2a) and repro there before trying
    >> your patch.
    >>

    >
    > FYI I've repro'd with
    > commit a551b98d5f6fce5897d497abd8bfb262efb33d2a
    > Merge: d1c6d2e... 50bed2e...
    > Author: Linus Torvalds
    > Date: Thu Sep 11 11:50:15 2008 -0700
    >
    > Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
    >
    > * 'for-linus' of git://git.kernel.dk/linux-2.6-block:
    > sg: disable interrupts inside sg_copy_buffer
    > which was the latest git a few days back.
    >
    > I'm going to start bisecting between v2.6.25 and v2.6.26. There's 173
    > commits in fs/nfs* net/sunrpc in that interval so with a day per test I
    > should have something next week...
    >
    >

    Iain:

    I'm assuming you'll do this in advance of any patch from me? I was
    simply going to add printk to the various shutdown paths and see if we
    could get some finer grained debug output since the generic transport
    debug output was too verbose.

    Let me know and I'll help if I can,
    Tom

    > Ian.
    >


    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  10. Re: NFS regression? Odd delays and lockups accessing an NFS export.

    On Tue, 2008-09-16 at 06:38 -0500, Tom Tucker wrote:
    > Iain:


    I'm actually Ian. (I don't actually mind that much, the other way round
    is probably much more annoying for people called Iain).

    > I'm assuming you'll do this in advance of any patch from me?


    I was going to grind through it unless/until you sent a patch.

    > I was simply going to add printk to the various shutdown paths and see if we
    > could get some finer grained debug output since the generic transport
    > debug output was too verbose.
    >
    > Let me know and I'll help if I can,


    You mean logging on the client or server side? I'm currently bisecting
    client. I'm happy to add some extra prints there if you point me in the
    direction of the code paths of interest.

    Ian.
    --
    Ian Campbell
    Current Noise: Anathema - Leave No Trace

    I haven't lost my mind; I know exactly where I left it.

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  11. Re: NFS regression? Odd delays and lockups accessing an NFS export.

    Ian Campbell wrote:
    > On Tue, 2008-09-16 at 06:38 -0500, Tom Tucker wrote:
    >> Iain:

    >
    > I'm actually Ian. (I don't actually mind that much, the other way round
    > is probably much more annoying for people called Iain).
    >


    Oops, sorry Ian.

    >> I'm assuming you'll do this in advance of any patch from me?

    >
    > I was going to grind through it unless/until you sent a patch.
    >
    >> I was simply going to add printk to the various shutdown paths and see if we
    >> could get some finer grained debug output since the generic transport
    >> debug output was too verbose.
    >>
    >> Let me know and I'll help if I can,

    >
    > You mean logging on the client or server side? I'm currently bisecting
    > client. I'm happy to add some extra prints there if you point me in the
    > direction of the code paths of interest.
    >


    Ah. Ok. I'll send you a patch for the server.

    > Ian.


    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  12. Re: NFS regression? Odd delays and lockups accessing an NFS export.

    On Tue, 2008-09-16 at 10:58 -0500, Tom Tucker wrote:
    > I'll send you a patch for the server.


    Thanks. It's running 2.6.25.7 although I can massage a more recent patch
    if needed. (I'm a bit less keen to run a variety of random kernels on
    that machine).

    Ian.

    --
    Ian Campbell
    Current Noise: The Ocean - Protoerozoic - Rhyacian - Untimely Meditations

    We may not return the affection of those who like us, but we always respect
    their good judgement.

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  13. Re: NFS regression? Odd delays and lockups accessing an NFS export.

    I've found that the problem was backported into the stable stream since
    I cannot reproduce the issue with 2.6.26 but I can with 2.6.26.5. This
    is quite useful since there are only 3 relevant looking changesets in
    that range. I will bisect between these before confirming the culprit on
    mainline.

    I've had a private reply from someone who is seeing the same issue with
    2.6.26.5 against a server running 2.6.18 (Debian Etch) which I think
    largely discounts the server from being at fault.

    Ian.
    --
    Ian Campbell

    Everything ends badly. Otherwise it wouldn't end.

    -----BEGIN PGP SIGNATURE-----
    Version: GnuPG v1.4.9 (GNU/Linux)

    iEYEABECAAYFAkjYoeIACgkQM0+0qS9rzVn6gQCfZ4oGa0GgBJ nIR9iIh3FV7ePm
    NvQAoMoyKeeA5Rot5o+w+2uZHmdKfnxe
    =INM6
    -----END PGP SIGNATURE-----


  14. Re: NFS regression? Odd delays and lockups accessing an NFS export.

    On Tue, 2008-09-23 at 08:59 +0100, Ian Campbell wrote:
    > I've found that the problem was backported into the stable stream since
    > I cannot reproduce the issue with 2.6.26 but I can with 2.6.26.5. This
    > is quite useful since there are only 3 relevant looking changesets in
    > that range. I will bisect between these before confirming the culprit on
    > mainline.


    It reports:

    daedfbe2a67628a40076a6c75fb945c60f608a2e is first bad commit
    commit daedfbe2a67628a40076a6c75fb945c60f608a2e
    Author: Trond Myklebust
    Date: Wed Jun 11 17:39:04 2008 -0400

    NFS: Ensure we zap only the access and acl caches when setting new acls

    commit f41f741838480aeaa3a189cff6e210503cf9c42d upstream

    ...and ensure that we obey the NFS_INO_INVALID_ACL flag when retrieving the
    acls.

    Signed-off-by: Trond Myklebust
    Signed-off-by: Greg Kroah-Hartman

    I'm just about to build f41f741838480aeaa3a189cff6e210503cf9c42d and the
    one before and try those.

    I'm not using ACLs as far as I am aware.

    Ian.

    --
    Ian Campbell

    If we were meant to get up early, God would have created us with alarm clocks.

    -----BEGIN PGP SIGNATURE-----
    Version: GnuPG v1.4.9 (GNU/Linux)

    iEYEABECAAYFAkjY0/UACgkQM0+0qS9rzVnQMQCfStP9cFVR2Zrx4l/lxXusHXDO
    zewAoI3TkgRnaeh7ceb/fbYOgrd08zcA
    =zBYW
    -----END PGP SIGNATURE-----


  15. Re: NFS regression? Odd delays and lockups accessing an NFS export.

    On Tue, Sep 23, 2008 at 12:33:09PM +0100, Ian Campbell wrote:
    > On Tue, 2008-09-23 at 08:59 +0100, Ian Campbell wrote:
    > > I've found that the problem was backported into the stable stream since
    > > I cannot reproduce the issue with 2.6.26 but I can with 2.6.26.5. This
    > > is quite useful since there are only 3 relevant looking changesets in
    > > that range. I will bisect between these before confirming the culprit on
    > > mainline.


    Could you double-check that this is reproduceable with this commit
    applied, and not reproduceable when it's not?

    I suppose it's not impossible that this could be triggering the problem
    in some very roundabout way, but it seems a bit out of left field--so I
    wonder whether one of the bisection points could have gotten marked good
    when it should have been bad, or vice-versa.

    > It reports:
    >
    > daedfbe2a67628a40076a6c75fb945c60f608a2e is first bad commit
    > commit daedfbe2a67628a40076a6c75fb945c60f608a2e
    > Author: Trond Myklebust
    > Date: Wed Jun 11 17:39:04 2008 -0400
    >
    > NFS: Ensure we zap only the access and acl caches when setting new acls
    >
    > commit f41f741838480aeaa3a189cff6e210503cf9c42d upstream
    >
    > ...and ensure that we obey the NFS_INO_INVALID_ACL flag when retrieving the
    > acls.
    >
    > Signed-off-by: Trond Myklebust
    > Signed-off-by: Greg Kroah-Hartman
    >
    > I'm just about to build f41f741838480aeaa3a189cff6e210503cf9c42d and the
    > one before and try those.
    >
    > I'm not using ACLs as far as I am aware.


    I think commands like "ls" try to get posix acls these days, so it's
    possible that the nfs3_proc_getacl code at least might be getting
    called. Why that would matter I can't see.

    --b.
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  16. Re: NFS regression? Odd delays and lockups accessing an NFS export.

    On Tue, 2008-09-23 at 13:03 -0400, J. Bruce Fields wrote:
    > On Tue, Sep 23, 2008 at 12:33:09PM +0100, Ian Campbell wrote:
    > > On Tue, 2008-09-23 at 08:59 +0100, Ian Campbell wrote:
    > > > I've found that the problem was backported into the stable stream since
    > > > I cannot reproduce the issue with 2.6.26 but I can with 2.6.26.5. This
    > > > is quite useful since there are only 3 relevant looking changesets in
    > > > that range. I will bisect between these before confirming the culpriton
    > > > mainline.

    >
    > Could you double-check that this is reproduceable with this commit
    > applied, and not reproduceable when it's not?


    I've reproduced with exactly commit
    f41f741838480aeaa3a189cff6e210503cf9c42d on trunk and am now running
    2e96d2867245668dbdb973729288cf69b9fafa66 which is the changeset
    immediately before.

    > I suppose it's not impossible that this could be triggering the problem
    > in some very roundabout way, but it seems a bit out of left field--so I
    > wonder whether one of the bisection points could have gotten marked good
    > when it should have been bad, or vice-versa.


    It's possible, the good case is naturally quite hard to establish with
    100% certainty. I declared v2.6.26 OK after an uptime of 4 days and 19
    hours, compared with failure normally within 1-2 days. It's possible I
    was premature in doing so. I'll run 2e96d2867 for at least a full week
    before reporting back.

    Ian.
    --
    Ian Campbell

    Unix is mature OS, windows is still in diapers and they smell badly.
    -- Rafael Skodlar

    -----BEGIN PGP SIGNATURE-----
    Version: GnuPG v1.4.9 (GNU/Linux)

    iEYEABECAAYFAkjdAaIACgkQM0+0qS9rzVlnhwCghXf5JG5La4 zjqAI59GFAu8nX
    1A4An0i4FpPd/3R4mGWQrDZNobtD/kBB
    =VM2U
    -----END PGP SIGNATURE-----


  17. Re: NFS regression? Odd delays and lockups accessing an NFS export.

    On Fri, 2008-09-26 at 16:37 +0100, Ian Campbell wrote:
    > It's possible, the good case is naturally quite hard to establish with
    > 100% certainty. I declared v2.6.26 OK after an uptime of 4 days and 19
    > hours, compared with failure normally within 1-2 days. It's possible I
    > was premature in doing so. I'll run 2e96d2867 for at least a full week
    > before reporting back.


    Or it'll reproduce after just a few hours :-/

    Looks like a false negative on v2.6.26 -- I'll retest from there.

    Ian.

    --
    Ian Campbell

    That secret you've been guarding, isn't.

    -----BEGIN PGP SIGNATURE-----
    Version: GnuPG v1.4.9 (GNU/Linux)

    iEYEABECAAYFAkjdJzwACgkQM0+0qS9rzVlhXgCfVfi+FgdOHW 3/5wkWUmhU4dqS
    Z9QAniJhdkWjeeg1WjgEWTK2hN7Yot/J
    =xvRm
    -----END PGP SIGNATURE-----


  18. Re: NFS regression? Odd delays and lockups accessing an NFS export.

    On Fri, Sep 26, 2008 at 07:17:33PM +0100, Ian Campbell wrote:
    > On Fri, 2008-09-26 at 16:37 +0100, Ian Campbell wrote:
    > > It's possible, the good case is naturally quite hard to establish with
    > > 100% certainty. I declared v2.6.26 OK after an uptime of 4 days and 19
    > > hours, compared with failure normally within 1-2 days. It's possible I
    > > was premature in doing so. I'll run 2e96d2867 for at least a full week
    > > before reporting back.

    >
    > Or it'll reproduce after just a few hours :-/
    >
    > Looks like a false negative on v2.6.26 -- I'll retest from there.


    OK, so apologies, but this has been a long thread, and maybe we could
    use a summary of the symptoms and the results so far. I think you said
    2.6.24 or .25 was the last you're *positive* was good?

    --b.
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  19. Re: NFS regression? Odd delays and lockups accessing an NFS export.

    On Fri, 2008-09-26 at 23:54 -0400, J. Bruce Fields wrote:
    > OK, so apologies, but this has been a long thread, and maybe we could
    > use a summary of the symptoms and the results so far. I think you said
    > 2.6.24 or .25 was the last you're *positive* was good?


    2.6.24.x was good 2.6.25.y was not. The Debian packages don't include
    the stable release numbers in their version so I'm unsure of x and y,
    Judging from the changelog entries I think 2.6.24.3 and 2.6.25.10.

    Since I have been building my own kernels I have seen repros with:

    a551b98d5f6fce5897d497abd8bfb262efb33d2a Merge branch 'for-linus' of
    git://git.kernel.dk/linux-2.6-block (was tip of git at the time I tested
    it a while ago)

    c02a79dedc7f3c3d4fdbb5eb2000cacea5df4cde v2.6.26.5

    daedfbe2a67628a40076a6c75fb945c60f608a2e NFS: Ensure we zap only the
    access and acl caches when setting new acls (backport of
    f41f741838480aeaa3a189cff6e210503cf9c42d to 2.6.26.y stream, identified
    by bisect, possibly/probably incorrectly)

    f41f741838480aeaa3a189cff6e210503cf9c42d NFS: Ensure we zap only the
    access and acl caches when setting new acls (original trunk commit of
    above)

    2e96d2867245668dbdb973729288cf69b9fafa66 NFS: Fix a warning in
    nfs4_async_handle_error (commit before
    f41f741838480aeaa3a189cff6e210503cf9c42d)

    I did not see the problem with bce7f793daec3e65ec5c5705d2457b81fe7b5725
    (v2.6.26) in 4 days 19 hours. Due to the apparent mis-bisection leading
    to daedfbe2a67628a40076a6c75fb945c60f608a2e this result is in doubt, I'm
    currently retesting this kernel.

    The bisect was restricted to paths under fs/nfs* and net/sunrpc. I've
    gone over the range v2.6.26..daedfbe2a67628a40076a6c75fb945c60f608a2e
    and nothing jumps out at me. If my retest of 2.6.26 shows it to be OK
    again I'll bisect over the non-NFS changesets in that range.

    The symptoms are a hang of any access to an NFS mounted path, although
    the rest of the system continues to work.

    Other potentially interesting info cribbed from earlier postings:

    While the hang is occurring rpc_debug on the server gives nothing, on
    the client gives:
    [144741.637997] -pid- proc flgs status -client- -prog- --rqstp- -timeout -rpcwait -action- ---ops--
    [144741.637997] 3439 0004 0080 -11 f3f48200 100003 f7770000 0 xprt_sending fa0ae88e fa0bddf4
    [144741.637997] 3438 0001 00a0 0 f77f2a00 100003 f77700d0 5000 xprt_pending fa0ae88e fa0bddf4

    The mount points are
    hopkins:/storage/music /storage/music nfs rw,nosuid,vers=3,rsize=32768,wsize=32768,namlen=25 5,hard,intr,proto=tcp,timeo=600,retrans=2,sec=sys, mountproto=udp,addr=192.168.1.6 0 0
    hopkins:/storage/mythtv/recordings /var/lib/mythtv/recordings nfs rw,nosuid,vers=3,rsize=32768,wsize=32768,namlen=25 5,acregmin=0,acregmax=0,acdirmin=0,acdirmax=0,hard ,intr,proto=tcp,timeo=600,retrans=2,sec=sys,mountp roto=udp,addr=192.168.1.6 0 0
    hopkins:/var/lib/mythvideo /var/lib/mythvideo nfs rw,nosuid,vers=3,rsize=32768,wsize=32768,namlen=25 5,acregmin=0,acregmax=0,acdirmin=0,acdirmax=0,hard ,intr,proto=tcp,timeo=600,retrans=2,sec=sys,mountp roto=udp,addr=192.168.1.6 0 0
    hopkins:/storage/home/ijc /home/ijc nfs rw,vers=3,rsize=131072,wsize=131072,namlen=255,har d,nointr,proto=tcp,timeo=600,retrans=2,sec=sys,mou ntproto=udp,addr=192.168.1.6 0 0
    all of them seem to be effected simultaneously (but I'll check to make double sure next time it occurs).

    I've seen hung task backtraces from effected processes, e.g.
    Aug 4 06:27:28 iranon kernel: [137969.382277] INFO: task mythbackend:3161 blocked for more than 120 seconds.
    Aug 4 06:27:28 iranon kernel: [137969.382287] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    Aug 4 06:27:28 iranon kernel: [137969.382291] mythbackend D 00005dfc 0 3161 1
    Aug 4 06:27:28 iranon kernel: [137969.382295] f2006c70 00000082 4b05cc7e 00005dfc f2006df0 c1c0e920 00000000 f7273390
    Aug 4 06:27:28 iranon kernel: [137969.382301] 0000c8b4 00000000 00000001 f7273398 00000282 00000202 f71b0ab0 f200ddf0
    Aug 4 06:27:28 iranon kernel: [137969.382306] c1c012bc fa18f0a6 fa18f0bf c02b45a7 f71b0ab0 00000000 f200de0c fa18f0a6
    Aug 4 06:27:28 iranon kernel: [137969.382311] Call Trace:
    Aug 4 06:27:28 iranon kernel: [137969.382347] [] nfs_wait_schedule+0x0/0x1e [nfs]
    Aug 4 06:27:28 iranon kernel: [137969.382384] [] nfs_wait_schedule+0x19/0x1e [nfs]
    Aug 4 06:27:28 iranon kernel: [137969.382399] [] __wait_on_bit_lock+0x2a/0x52
    Aug 4 06:27:28 iranon kernel: [137969.382407] [] nfs_wait_schedule+0x0/0x1e [nfs]
    Aug 4 06:27:28 iranon kernel: [137969.382421] [] out_of_line_wait_on_bit_lock+0x5f/0x67
    Aug 4 06:27:28 iranon kernel: [137969.382429] [] wake_bit_function+0x0/0x3c
    Aug 4 06:27:28 iranon kernel: [137969.382441] [] __nfs_revalidate_inode+0xaa/0x211 [nfs]
    Aug 4 06:27:28 iranon kernel: [137969.382458] [] do_lookup+0x53/0x145
    Aug 4 06:27:28 iranon kernel: [137969.382466] [] mntput_no_expire+0x11/0x64
    Aug 4 06:27:28 iranon kernel: [137969.382472] [] __link_path_walk+0xa71/0xb65
    Aug 4 06:27:28 iranon kernel: [137969.382477] [] do_lookup+0x53/0x145
    Aug 4 06:27:28 iranon kernel: [137969.382483] [] mntput_no_expire+0x11/0x64
    Aug 4 06:27:28 iranon kernel: [137969.382492] [] mntput_no_expire+0x11/0x64
    Aug 4 06:27:28 iranon kernel: [137969.382496] [] path_walk+0x90/0x98
    Aug 4 06:27:28 iranon kernel: [137969.382502] [] nfs_getattr+0x8f/0xbe [nfs]
    Aug 4 06:27:28 iranon kernel: [137969.382520] [] nfs_getattr+0x0/0xbe [nfs]
    Aug 4 06:27:28 iranon kernel: [137969.382536] [] vfs_getattr+0x36/0x4d
    Aug 4 06:27:28 iranon kernel: [137969.382545] [] vfs_lstat_fd+0x27/0x39
    Aug 4 06:27:28 iranon kernel: [137969.382550] [] nfs_permission+0x0/0x129 [nfs]
    Aug 4 06:27:28 iranon kernel: [137969.382567] [] mntput_no_expire+0x11/0x64
    Aug 4 06:27:28 iranon kernel: [137969.382572] [] sys_faccessat+0x11e/0x15d
    Aug 4 06:27:28 iranon kernel: [137969.382582] [] sys_lstat64+0xf/0x23
    Aug 4 06:27:28 iranon kernel: [137969.382588] [] vfs_read+0xe3/0x11e
    Aug 4 06:27:28 iranon kernel: [137969.382598] [] sys_access+0xf/0x13
    Aug 4 06:27:28 iranon kernel: [137969.382603] [] sysenter_past_esp+0x6d/0xa5
    Aug 4 06:27:28 iranon kernel: [137969.382617] =======================

    Ian.
    --
    Ian Campbell

    Contemptuous lights flashed across the computer's console.
    -- Hitchhiker's Guide to the Galaxy

    -----BEGIN PGP SIGNATURE-----
    Version: GnuPG v1.4.9 (GNU/Linux)

    iEYEABECAAYFAkjeB/kACgkQM0+0qS9rzVknGwCeJXScpG/NonH+OrTWEHGJylGX
    04AAnj0KqzvL9vBzPVsMRhiLK2eqXCIb
    =oE/w
    -----END PGP SIGNATURE-----


+ Reply to Thread
Page 3 of 3 FirstFirst 1 2 3