NFS regression? Odd delays and lockups accessing an NFS export.

  1. NFS regression? Odd delays and lockups accessing an NFS export.

    Hi there,

    I've been using NFS here for years, lately there's something odd going on
    since about a month or so. Previously reported last month:
    http://www.gossamer-threads.com/list...1419?page=last

    Now with 2.6.27-rc3 on one of the client boxes I get a complete stall
    at odd times when accessing the server's exported directory, cannot
    see a pattern to it. Eventually recovers after a Ctrl-C. Nothing in
    the server or client log files. Not easy to reproduce either.

    The server runs 2.6.26.2 at the moment.

    Server config, etc: http://bugsplatter.id.au/kernel/boxen/deltree/
    Client config, etc: http://bugsplatter.id.au/kernel/boxen/pooh/

    Grant.
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: NFS regression? Odd delays and lockups accessing an NFS export.

    On Mon, Aug 18, 2008 at 12:02:20PM +1000, Grant Coady wrote:
    > I've been using NFS here for years, lately there's something odd going on
    > since about a month or so. Previously reported last month:
    > http://www.gossamer-threads.com/list...1419?page=last
    >
    > Now with 2.6.27-rc3 on one of the client boxes I get a complete stall
    > at odd times when accessing the server's exported directory, cannot
    > see a pattern to it. Eventually recovers after a Ctrl-C. Nothing in
    > the server or client log files. Not easy to reproduce either.


    I wonder if this is what I've been seeing. I've been otherwise too
    busy to properly report it, thinking that *someone* else must also be
    seeing it and it's being worked on, else it's a subtle configuration
    problem my end.

    I first started seeing this with 2.6.26 on the client end, 2.6.25.10 on
    the server end. Things last worked fine with that same server version
    and 2.6.25.10 on the client end. The affected export is:

    /home/users 192.168.1.161(rw,no_root_squash,sync,no_subtree_check)

    using the in-kernel NFS server, not the userspace one.

    With the mount on the client being:

    192.168.1.162:/home/users /home/users nfs defaults,rw,nfsvers=3,rsize=8192,wsize=8192,nosuid,nodev,soft,intr 0 0

    It used to have a 'nolock' in it, but I took that out and saw no
    difference. The one possible solid clue as to what's happening is this,
    after I turned on all the lock etc checking in a 2.6.26.2 kernel:

    Aug 12 16:10:28 emelia kernel: [ 361.851316] INFO: task firefox-bin:5716 blocked for more than 120 seconds.
    Aug 12 16:10:28 emelia kernel: [ 361.851326] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    Aug 12 16:10:28 emelia kernel: [ 361.851332] firefox-bin D f7e0a018 5556 5716 5696
    Aug 12 16:10:28 emelia kernel: [ 361.851348] e24f3ddc 00000046 c2cc2e40 f7e0a018 c0640030 c0642e40 c0642e40 c0642e40
    Aug 12 16:10:28 emelia kernel: [ 361.851382] e2ec1018 e2ec1270 c2cc2e40 00000001 c2cc2e40 e24f3db8 00000046 c2c0b680
    Aug 12 16:10:28 emelia kernel: [ 361.851406] e2e40da0 c2c0b680 e2ec1270 fffef1e4 00000001 00000046 c2c0b670 c2c0b670
    Aug 12 16:10:28 emelia kernel: [ 361.851435] Call Trace:
    Aug 12 16:10:28 emelia kernel: [ 361.851449] [] nfs_wait_bit_killable+0x2a/0x2e
    Aug 12 16:10:28 emelia kernel: [ 361.851462] [] __wait_on_bit+0x36/0x5d
    Aug 12 16:10:28 emelia kernel: [ 361.851475] [] ? nfs_wait_bit_killable+0x0/0x2e
    Aug 12 16:10:28 emelia kernel: [ 361.851586] [] out_of_line_wait_on_bit+0xac/0xb4
    Aug 12 16:10:28 emelia kernel: [ 361.851599] [] ? nfs_wait_bit_killable+0x0/0x2e
    Aug 12 16:10:28 emelia kernel: [ 361.851613] [] ? wake_bit_function+0x0/0x43
    Aug 12 16:10:28 emelia kernel: [ 361.851632] [] nfs_wait_on_request+0x1f/0x26
    Aug 12 16:10:28 emelia kernel: [ 361.851642] [] nfs_sync_mapping_wait+0xde/0x27b
    Aug 12 16:10:28 emelia kernel: [ 361.851653] [] ? nfs_pageio_complete+0x8/0xa
    Aug 12 16:10:28 emelia kernel: [ 361.851673] [] __nfs_write_mapping+0x27/0x45
    Aug 12 16:10:28 emelia kernel: [ 361.851688] [] nfs_write_mapping+0x39/0x57
    Aug 12 16:10:28 emelia kernel: [ 361.851701] [] nfs_wb_all+0x10/0x12
    Aug 12 16:10:28 emelia kernel: [ 361.851713] [] nfs_file_flush+0x8b/0xc8
    Aug 12 16:10:28 emelia kernel: [ 361.851723] [] filp_close+0x31/0x5a
    Aug 12 16:10:28 emelia kernel: [ 361.851735] [] put_files_struct+0x68/0xaa
    Aug 12 16:10:28 emelia kernel: [ 361.851747] [] exit_files+0x37/0x3c
    Aug 12 16:10:28 emelia kernel: [ 361.851759] [] do_exit+0x21b/0x632
    Aug 12 16:10:28 emelia kernel: [ 361.851771] [] do_group_exit+0x5e/0x85
    Aug 12 16:10:28 emelia kernel: [ 361.851786] [] sys_exit_group+0x13/0x15
    Aug 12 16:10:28 emelia kernel: [ 361.851799] [] sysenter_past_esp+0x6a/0xb1
    Aug 12 16:10:28 emelia kernel: [ 361.851816] =======================
    Aug 12 16:10:28 emelia kernel: [ 361.851821] INFO: lockdep is turned off.
    Aug 12 16:10:30 emelia kernel: [ 363.999492] INFO: task famd:2760 blocked for more than 120 seconds.
    Aug 12 16:10:30 emelia kernel: [ 363.999501] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    Aug 12 16:10:30 emelia kernel: [ 363.999507] famd D c093dd28 5684 2760 1
    Aug 12 16:10:30 emelia kernel: [ 363.999521] f570dbe8 00000046 00000000 c093dd28 c0640030 c0642e40 c0642e40 c0642e40
    Aug 12 16:10:30 emelia kernel: [ 363.999697] f6dcc018 f6dcc270 c2cc2e40 00000001 c2cc2e40 f570dbc4 00000046 c2c027b0
    Aug 12 16:10:30 emelia kernel: [ 363.999767] f6516da0 c2c027b0 f6dcc270 fffefe66 00000001 00000046 c2c027a0 c2c027a0
    Aug 12 16:10:30 emelia kernel: [ 363.999783] Call Trace:
    Aug 12 16:10:30 emelia kernel: [ 363.999796] [] rpc_wait_bit_killable+0x2a/0x2e
    Aug 12 16:10:30 emelia kernel: [ 363.999808] [] __wait_on_bit+0x36/0x5d
    Aug 12 16:10:30 emelia kernel: [ 363.999817] [] ? rpc_wait_bit_killable+0x0/0x2e
    Aug 12 16:10:30 emelia kernel: [ 363.999828] [] out_of_line_wait_on_bit+0xac/0xb4
    Aug 12 16:10:30 emelia kernel: [ 363.999836] [] ? rpc_wait_bit_killable+0x0/0x2e
    Aug 12 16:10:30 emelia kernel: [ 363.999846] [] ? wake_bit_function+0x0/0x43
    Aug 12 16:10:30 emelia kernel: [ 363.999857] [] __rpc_execute+0xcb/0x1e1
    Aug 12 16:10:30 emelia kernel: [ 363.999865] [] rpc_execute+0x1b/0x1e
    Aug 12 16:10:30 emelia kernel: [ 363.999872] [] rpc_run_task+0x43/0x49
    Aug 12 16:10:30 emelia kernel: [ 363.999881] [] rpc_call_sync+0x44/0x5f
    Aug 12 16:10:30 emelia kernel: [ 363.999891] [] nfs3_rpc_wrapper+0x17/0x4d
    Aug 12 16:10:30 emelia kernel: [ 363.999900] [] nfs3_proc_access+0xc9/0x121
    Aug 12 16:10:30 emelia kernel: [ 363.999919] [] nfs_do_access+0x129/0x27c
    Aug 12 16:10:30 emelia kernel: [ 363.999931] [] nfs_permission+0xdb/0x13f
    Aug 12 16:10:30 emelia kernel: [ 363.999938] [] ? nfs_permission+0x0/0x13f
    Aug 12 16:10:30 emelia kernel: [ 363.999947] [] permission+0x91/0xd0
    Aug 12 16:10:30 emelia kernel: [ 363.999956] [] vfs_permission+0x10/0x12
    Aug 12 16:10:30 emelia kernel: [ 363.999963] [] __link_path_walk+0x106/0xc1d
    Aug 12 16:10:30 emelia kernel: [ 363.999972] [] ? kernel_map_pages+0xff/0x116
    Aug 12 16:10:30 emelia kernel: [ 363.999986] [] path_walk+0x4c/0x9b
    Aug 12 16:10:30 emelia kernel: [ 363.999994] [] do_path_lookup+0x198/0x1e1
    Aug 12 16:10:30 emelia kernel: [ 364.000003] [] __user_walk_fd+0x2f/0x43
    Aug 12 16:10:30 emelia kernel: [ 364.000012] [] vfs_lstat_fd+0x16/0x3d
    Aug 12 16:10:30 emelia kernel: [ 364.000022] [] ? groups_free+0x34/0x38
    Aug 12 16:10:30 emelia kernel: [ 364.000034] [] vfs_lstat+0x11/0x13
    Aug 12 16:10:30 emelia kernel: [ 364.000040] [] sys_lstat64+0x14/0x28
    Aug 12 16:10:30 emelia kernel: [ 364.000047] [] ? groups_free+0x34/0x38
    Aug 12 16:10:30 emelia kernel: [ 364.000056] [] ? set_current_groups+0x13f/0x149
    Aug 12 16:10:30 emelia kernel: [ 364.000067] [] ? sys_setresuid+0x153/0x165
    Aug 12 16:10:30 emelia kernel: [ 364.000077] [] ? sysenter_past_esp+0xa5/0xb1
    Aug 12 16:10:30 emelia kernel: [ 364.000086] [] sysenter_past_esp+0x6a/0xb1
    Aug 12 16:10:30 emelia kernel: [ 364.000097] =======================
    Aug 12 16:10:30 emelia kernel: [ 364.000100] INFO: lockdep is turned off.
    Aug 12 16:10:33 emelia kernel: [ 366.146658] INFO: task bash:5640 blocked for more than 120 seconds.
    Aug 12 16:10:33 emelia kernel: [ 366.146667] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    Aug 12 16:10:33 emelia kernel: [ 366.146672] bash D 00000000 6472 5640 5627
    Aug 12 16:10:33 emelia kernel: [ 366.146682] e2d4bad4 00000046 00000000 00000000 c0640030 c0642e40 c0642e40 c0642e40
    Aug 12 16:10:33 emelia kernel: [ 366.146699] e2cee018 e2cee270 c2c70e40 00000000 c2c70e40 28d8cec5 00000037 c2c02b70
    Aug 12 16:10:33 emelia kernel: [ 366.146713] e2e43da0 c2c02b70 e2cee270 00000046 00000001 00000046 c2c02b60 c2c02b60
    Aug 12 16:10:33 emelia kernel: [ 366.146728] Call Trace:
    Aug 12 16:10:33 emelia kernel: [ 366.146739] [] ? _spin_unlock_irqrestore+0x42/0x58
    Aug 12 16:10:33 emelia kernel: [ 366.146992] [] rpc_wait_bit_killable+0x2a/0x2e
    Aug 12 16:10:33 emelia kernel: [ 366.147001] [] __wait_on_bit+0x36/0x5d
    Aug 12 16:10:33 emelia kernel: [ 366.147010] [] ? rpc_wait_bit_killable+0x0/0x2e
    Aug 12 16:10:33 emelia kernel: [ 366.147020] [] out_of_line_wait_on_bit+0xac/0xb4
    Aug 12 16:10:33 emelia kernel: [ 366.147028] [] ? rpc_wait_bit_killable+0x0/0x2e
    Aug 12 16:10:33 emelia kernel: [ 366.147038] [] ? wake_bit_function+0x0/0x43
    Aug 12 16:10:33 emelia kernel: [ 366.147050] [] __rpc_execute+0xcb/0x1e1
    Aug 12 16:10:33 emelia kernel: [ 366.147058] [] rpc_execute+0x1b/0x1e
    Aug 12 16:10:33 emelia kernel: [ 366.147065] [] rpc_run_task+0x43/0x49
    Aug 12 16:10:33 emelia kernel: [ 366.147074] [] rpc_call_sync+0x44/0x5f
    Aug 12 16:10:33 emelia kernel: [ 366.147083] [] nfs3_rpc_wrapper+0x17/0x4d
    Aug 12 16:10:33 emelia kernel: [ 366.147093] [] nfs3_proc_getattr+0x58/0x7a
    Aug 12 16:10:33 emelia kernel: [ 366.147102] [] __nfs_revalidate_inode+0x122/0x26d
    Aug 12 16:10:33 emelia kernel: [ 366.147116] [] ? nfs_attribute_timeout+0x0/0xae
    Aug 12 16:10:33 emelia kernel: [ 366.147126] [] ? nfs_attribute_timeout+0x61/0xae
    Aug 12 16:10:33 emelia kernel: [ 366.147137] [] nfs_lookup_revalidate+0x245/0x471
    Aug 12 16:10:33 emelia kernel: [ 366.147160] [] ? __d_lookup+0x12a/0x146
    Aug 12 16:10:33 emelia kernel: [ 366.147171] [] do_lookup+0x115/0x154
    Aug 12 16:10:33 emelia kernel: [ 366.147180] [] __link_path_walk+0x816/0xc1d
    Aug 12 16:10:33 emelia kernel: [ 366.147190] [] ? kernel_map_pages+0xff/0x116
    Aug 12 16:10:33 emelia kernel: [ 366.147203] [] path_walk+0x4c/0x9b
    Aug 12 16:10:33 emelia kernel: [ 366.147212] [] do_path_lookup+0x198/0x1e1
    Aug 12 16:10:33 emelia kernel: [ 366.147220] [] __path_lookup_intent_open+0x45/0x76
    Aug 12 16:10:33 emelia kernel: [ 366.147230] [] path_lookup_open+0x10/0x12
    Aug 12 16:10:33 emelia kernel: [ 366.147238] [] do_filp_open+0x9e/0x6cc
    Aug 12 16:10:33 emelia kernel: [ 366.147252] [] ? get_unused_fd_flags+0xc3/0xcd
    Aug 12 16:10:33 emelia kernel: [ 366.147263] [] do_sys_open+0x40/0xb6
    Aug 12 16:10:33 emelia kernel: [ 366.147271] [] sys_open+0x1e/0x26
    Aug 12 16:10:33 emelia kernel: [ 366.147279] [] sysenter_past_esp+0x6a/0xb1
    Aug 12 16:10:33 emelia kernel: [ 366.147290] =======================
    Aug 12 16:10:33 emelia kernel: [ 366.147294] INFO: lockdep is turned off.
    Aug 12 16:10:41 emelia kernel: [ 374.736634] INFO: task gconfd-2:5492 blocked for more than 120 seconds.
    Aug 12 16:10:41 emelia kernel: [ 374.736644] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    Aug 12 16:10:41 emelia kernel: [ 374.736648] gconfd-2 D e31a5dec 5132 5492 1
    Aug 12 16:10:41 emelia kernel: [ 374.736658] e31a5e48 00000046 f62d3b28 e31a5dec c0640030 c0642e40 c0642e40 c0642e40
    Aug 12 16:10:41 emelia kernel: [ 374.736674] e44e9018 e44e9270 c2c70e40 00000000 c2c70e40 e31a5e24 00000046 c2c05280
    Aug 12 16:10:41 emelia kernel: [ 374.736689] f40a9da0 c2c05280 e44e9270 ffff24ad 00000001 00000046 c2c05270 c2c05270
    Aug 12 16:10:41 emelia kernel: [ 374.736704] Call Trace:
    Aug 12 16:10:41 emelia kernel: [ 374.736716] [] nfs_wait_bit_killable+0x2a/0x2e
    Aug 12 16:10:41 emelia kernel: [ 374.736728] [] __wait_on_bit+0x36/0x5d
    Aug 12 16:10:41 emelia kernel: [ 374.736737] [] ? nfs_wait_bit_killable+0x0/0x2e
    Aug 12 16:10:41 emelia kernel: [ 374.736747] [] out_of_line_wait_on_bit+0xac/0xb4
    Aug 12 16:10:41 emelia kernel: [ 374.736755] [] ? nfs_wait_bit_killable+0x0/0x2e
    Aug 12 16:10:41 emelia kernel: [ 374.736765] [] ? wake_bit_function+0x0/0x43
    Aug 12 16:10:41 emelia kernel: [ 374.736777] [] nfs_wait_on_request+0x1f/0x26
    Aug 12 16:10:41 emelia kernel: [ 374.736784] [] nfs_sync_mapping_wait+0xde/0x27b
    Aug 12 16:10:41 emelia kernel: [ 374.736791] [] ? nfs_pageio_complete+0x8/0xa
    Aug 12 16:10:41 emelia kernel: [ 374.736805] [] __nfs_write_mapping+0x27/0x45
    Aug 12 16:10:41 emelia kernel: [ 374.736813] [] nfs_write_mapping+0x39/0x57
    Aug 12 16:10:41 emelia kernel: [ 374.736910] [] nfs_wb_all+0x10/0x12
    Aug 12 16:10:41 emelia kernel: [ 374.736917] [] nfs_file_flush+0x8b/0xc8
    Aug 12 16:10:41 emelia kernel: [ 374.736926] [] filp_close+0x31/0x5a
    Aug 12 16:10:41 emelia kernel: [ 374.736934] [] sys_close+0x6a/0xa4
    Aug 12 16:10:41 emelia kernel: [ 374.736942] [] sysenter_past_esp+0x6a/0xb1
    Aug 12 16:10:41 emelia kernel: [ 374.736954] =======================
    Aug 12 16:10:41 emelia kernel: [ 374.736958] INFO: lockdep is turned off.
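
When comparing several hung-task reports like the ones above, it can help to reduce each report to the task name, its pid, and the first reliable stack frame. The following is a minimal sketch, not part of the original report. The function name `summarize_hung_tasks` and the regular expressions are assumptions made for illustration, and the frame pattern accepts both a full `[<address>]` prefix and the empty `[]` form seen in this listing.

```python
import re

# Report line printed by the kernel's hung-task detector, e.g.
# "INFO: task firefox-bin:5716 blocked for more than 120 seconds."
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds\."
)

# Stack frame such as "[<c01234ab>] nfs_wait_bit_killable+0x2a/0x2e".
# The address may be absent ("[]"), and a leading "?" marks an unreliable frame.
FRAME_RE = re.compile(
    r"\[(?:<[0-9a-f]+>)?\]\s+(?P<unreliable>\?\s+)?(?P<symbol>[\w.]+)\+0x"
)


def summarize_hung_tasks(log_text):
    """Return one dict per hung-task report: comm, pid, secs, blocked_in."""
    reports = []
    current = None
    for line in log_text.splitlines():
        task = HUNG_TASK_RE.search(line)
        if task:
            current = {
                "comm": task.group("comm"),
                "pid": int(task.group("pid")),
                "secs": int(task.group("secs")),
                "blocked_in": None,  # filled in by the first reliable frame
            }
            reports.append(current)
            continue
        frame = FRAME_RE.search(line)
        if (
            frame
            and current is not None
            and current["blocked_in"] is None
            and not frame.group("unreliable")
        ):
            current["blocked_in"] = frame.group("symbol")
    return reports
```

Fed the log above, this sketch would attribute firefox-bin and gconfd-2 to `nfs_wait_bit_killable`, and famd and bash to `rpc_wait_bit_killable`, which makes the split between the two wait paths easier to see at a glance.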

    Server and client kernel configs attached.
    --
    - Athanasius = Athanasius(at)miggy.org / http://www.miggy.org/
    Finger athan(at)fysh.org for PGP key
    "And it's me who is my enemy. Me who beats me up.
    Me who makes the monsters. Me who strips my confidence." Paula Cole - ME

    -----BEGIN PGP SIGNATURE-----
    Version: GnuPG v1.4.6 (GNU/Linux)

    iD8DBQFIqcSISEDmQuIYzh0RAiUgAJ4mdMtzAU89AJ9ttqqycDSGBj8KvACfb+2b
    1Roro7Wej1ajbXRF7kycoz0=
    =U/TY
    -----END PGP SIGNATURE-----


  3. Re: NFS regression? Odd delays and lockups accessing an NFS export.

    On Mon, 2008-08-18 at 12:02 +1000, Grant Coady wrote:
    > Hi there,
    >
    > I've been using NFS here for years, lately there's something odd going on
    > since about a month or so. Previously reported last month:
    > http://www.gossamer-threads.com/list...1419?page=last
    >
    > Now with 2.6.27-rc3 on one of the client boxes I get a complete stall
    > at odd times when accessing the server's exported directory, cannot
    > see a pattern to it. Eventually recovers after a Ctrl-C. Nothing in
    > the server or client log files. Not easy to reproduce either.
    >
    > The server runs 2.6.26.2 at the moment.
    >
    > Server config, etc: http://bugsplatter.id.au/kernel/boxen/deltree/
    > Client config, etc: http://bugsplatter.id.au/kernel/boxen/pooh/


    Please try to reproduce the hang, then do

    echo 0 >/proc/sys/sunrpc/rpc_debug

    and send the output from 'dmesg'...

    Cheers
    Trond


  4. Re: NFS regression? Odd delays and lockups accessing an NFS export.

    On Mon, 2008-08-18 at 19:50 +0100, Athanasius wrote:
    > On Mon, Aug 18, 2008 at 12:02:20PM +1000, Grant Coady wrote:
    > > I've been using NFS here for years, lately there's something odd going on
    > > since about a month or so. Previously reported last month:
    > > http://www.gossamer-threads.com/list...1419?page=last
    > >
    > > Now with 2.6.27-rc3 on one of the client boxes I get a complete stall
    > > at odd times when accessing the server's exported directory, cannot
    > > see a pattern to it. Eventually recovers after a Ctrl-C. Nothing in
    > > the server or client log files. Not easy to reproduce either.

    >
    > I wonder if this is what I've been seeing. I've been otherwise too
    > busy to properly report it, thinking that *someone* else must also be
    > seeing it and it's being worked on, else it's a subtle configuration
    > problem my end.


    Your lockdep trace basically shows that the rpc layer is blocking for
    some reason. Could you please try to reproduce the problem, and then do

    echo 0 >/proc/sys/sunrpc/rpc_debug

    ...and see what the output from 'dmesg' shows?

    Cheers
    Trond


  5. Re: NFS regression? Odd delays and lockups accessing an NFS export.

    Athanasius wrote:
    > On Mon, Aug 18, 2008 at 12:02:20PM +1000, Grant Coady wrote:
    >> I've been using NFS here for years, lately there's something odd going on
    >> since about a month or so. Previously reported last month:
    >> http://www.gossamer-threads.com/list...1419?page=last
    >>
    >> Now with 2.6.27-rc3 on one of the client boxes I get a complete stall
    >> at odd times when accessing the server's exported directory, cannot
    >> see a pattern to it. Eventually recovers after a Ctrl-C. Nothing in
    >> the server or client log files. Not easy to reproduce either.

    >
    > I wonder if this is what I've been seeing. I've been otherwise too
    > busy to properly report it, thinking that *someone* else must also be
    > seeing it and it's being worked on, else it's a subtle configuration
    > problem my end.
    >
    > I first started seeing this with 2.6.26 on the client end,


    I see you're running the atl1 driver on your client. I'm chasing an
    intermittent bug in that driver that seems to be affected by certain
    offload parameters. Try disabling TSO and see if things improve.

    ethtool -K eth0 tso off

    Let me know if it helps.

    Jay

  6. Re: NFS regression? Odd delays and lockups accessing an NFS export.

    On Mon, Aug 18, 2008 at 02:37:45PM -0500, J. K. Cliburn wrote:
    > Athanasius wrote:
    > > I wonder if this is what I've been seeing. I've been otherwise too
    > >busy to properly report it, thinking that *someone* else must also be
    > >seeing it and it's being worked on, else it's a subtle configuration
    > >problem my end.
    > >
    > > I first started seeing this with 2.6.26 on the client end,

    >
    > I see you're running the atl1 driver on your client. I'm chasing an
    > intermittent bug in that driver that seems to be affected by certain
    > offload parameters. Try disabling TSO and see if things improve.
    >
    > ethtool -K eth0 tso off
    >
    > Let me know if it helps.


    Indeed it does, and indeed tso was on by default. I booted up once
    after first trying this, thinking I had things set to apply it
    automatically, but hadn't and ran into the exact same problem. This 2nd
    time around with the setting definitely applied things appear to be
    working normally. At least I can actually start Firefox up and have it
    work (lots of /home/users/... access over NFS to the server).
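
Since the workaround only helps once it has actually been applied, it may be worth checking the offload state after boot. The sketch below is an illustration, not something posted in the thread. It assumes the feature listing comes from `ethtool -k <iface>` (lowercase `-k` lists offload settings, uppercase `-K` changes them) and that the TSO line reads either `tcp segmentation offload: on` or `tcp-segmentation-offload: on`, depending on the ethtool version. The function names are hypothetical.

```python
import re
import subprocess

# Accept both the space-separated and the hyphenated spelling of the feature name.
TSO_RE = re.compile(r"tcp[- ]segmentation[- ]offload:\s*(on|off)")


def tso_enabled(ethtool_k_output):
    """Return True/False for the reported TSO state, or None if no TSO line is found."""
    match = TSO_RE.search(ethtool_k_output)
    if match is None:
        return None
    return match.group(1) == "on"


def query_tso(iface="eth0"):
    """Run 'ethtool -k <iface>' and report the TSO state (needs ethtool installed)."""
    result = subprocess.run(
        ["ethtool", "-k", iface], capture_output=True, text=True, check=True
    )
    return tso_enabled(result.stdout)
```

Calling `query_tso("eth0")` from a boot script and logging the result would show whether `ethtool -K eth0 tso off` really took effect on that boot.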

    thanks,

    -Ath
    --
    - Athanasius = Athanasius(at)miggy.org / http://www.miggy.org/
    Finger athan(at)fysh.org for PGP key
    "And it's me who is my enemy. Me who beats me up.
    Me who makes the monsters. Me who strips my confidence." Paula Cole - ME

    -----BEGIN PGP SIGNATURE-----
    Version: GnuPG v1.4.6 (GNU/Linux)

    iD8DBQFIqgIkSEDmQuIYzh0RApx2AJ4m2Sh3BWGlE9PUygG4bOQ2Cs8kJwCfdr3J
    bOtWRC2AUJ8WOH5q4SDrkHU=
    =aw7P
    -----END PGP SIGNATURE-----


  7. Re: NFS regression? Odd delays and lockups accessing an NFS export.

    Hi Trond,

    On Mon, 18 Aug 2008 15:20:58 -0400, Trond Myklebust wrote:

    >On Mon, 2008-08-18 at 12:02 +1000, Grant Coady wrote:
    >> Hi there,
    >>
    >> I've been using NFS here for years, lately there's something odd going on
    >> since about a month or so. Previously reported last month:
    >> http://www.gossamer-threads.com/list...1419?page=last
    >>
    >> Now with 2.6.27-rc3 on one of the client boxes I get a complete stall
    >> at odd times when accessing the server's exported directory, cannot
    >> see a pattern to it. Eventually recovers after a Ctrl-C. Nothing in
    >> the server or client log files. Not easy to reproduce either.
    >>
    >> The server runs 2.6.26.2 at the moment.
    >>
    >> Server config, etc: http://bugsplatter.id.au/kernel/boxen/deltree/
    >> Client config, etc: http://bugsplatter.id.au/kernel/boxen/pooh/

    >
    >Please try to reproduce the hang, then do
    >
    > echo 0 >/proc/sys/sunrpc/rpc_debug
    >
    >and send the output from 'dmesg'...


    dmesg and logs from client machine showed nothing, dmesg from server:

    -pid- proc flgs status -client- -prog- --rqstp- -timeout -rpcwait -action- ---ops--

    (nothing, nada, zilch after what looks like the header line)

    Situation, first use of 'df' after reboot, client machine hung for many
    seconds before the /home/common export showed up, this is not repeatable.
    Mounting another export and doing 'df' again had no delays.

    Client is running 2.6.27-rc3-git5 (x86_64) at the moment, configs, etc
    at: http://bugsplatter.id.au/kernel/boxen/pooh64/

    Server as above. Am I supposed to have some debugging turned on in the
    .configs?

    Grant.

  8. Re: NFS regression? Odd delays and lockups accessing an NFS export.

    On Mon, 18 Aug 2008 15:20:58 -0400, Trond Myklebust wrote:

    >On Mon, 2008-08-18 at 12:02 +1000, Grant Coady wrote:
    >> Hi there,
    >>
    >> I've been using NFS here for years, lately there's something odd going on
    >> since about a month or so. Previously reported last month:
    >> http://www.gossamer-threads.com/list...1419?page=last
    >>
    >> Now with 2.6.27-rc3 on one of the client boxes I get a complete stall
    >> at odd times when accessing the server's exported directory, cannot
    >> see a pattern to it. Eventually recovers after a Ctrl-C. Nothing in
    >> the server or client log files. Not easy to reproduce either.
    >>
    >> The server runs 2.6.26.2 at the moment.
    >>
    >> Server config, etc: http://bugsplatter.id.au/kernel/boxen/deltree/
    >> Client config, etc: http://bugsplatter.id.au/kernel/boxen/pooh/

    >
    >Please try to reproduce the hang, then do
    >
    > echo 0 >/proc/sys/sunrpc/rpc_debug
    >
    >and send the output from 'dmesg'...


    It's not NFS, I just had a WinXP box stall completely (no KB NumLock
    response) on a PuTTY (ssh) terminal to a linux box running 2.6.26.3
    for about five or ten seconds, when it came back to operation the
    PuTTY toolbar icon was flashing. Dunno which terminal as I had four
    open at the time, but they all go to a server box running 2.6.26.3 with
    multiple Intel pro/100 (e100) NICs.

    I think this may be to do with the e100 network driver instead, as a
    few weeks ago I started compiling it in rather than as a module like
    I've been doing for years (in response to the firmware (microcode)
    being dropped on the floor last month issue). This box is the server
    (deltree) that all linux boxes mount a /home/common NFS export from.

    All machines have the Intel pro/100 NIC, going to recompile the lot
    back to modules, see if that fixes the problem.

    Grant.

  9. Re: NFS regression? Odd delays and lockups accessing an NFS export.

    On Mon, 2008-08-18 at 15:20 -0400, Trond Myklebust wrote:
    > Please try to reproduce the hang, then do
    >
    > echo 0 >/proc/sys/sunrpc/rpc_debug
    >
    > and send the output from 'dmesg'...


    I've also been seeing some NFS related lockups, although I'm not sure if
    they are the same as the one in this thread or not. Client is 2.6.26
    (Debian's kernel) and Server is 2.6.25 (also a Debian kernel, but from
    backports.org).

    rpc_debug on the server gives nothing, on the client gives:
    [144741.637997] -pid- proc flgs status -client- -prog- --rqstp- -timeout -rpcwait -action- ---ops--
    [144741.637997] 3439 0004 0080 -11 f3f48200 100003 f7770000 0 xprt_sending fa0ae88e fa0bddf4
    [144741.637997] 3438 0001 00a0 0 f77f2a00 100003 f77700d0 15000 xprt_pending fa0ae88e fa0bddf4

    There are no processes running with pid 3439 3438 (I don't think it's
    that sort of pid though).

    mount points are:
    hopkins:/storage/music /storage/music nfs rw,nosuid,vers=3,rsize=32768,wsize=32768,namlen=255,hard,intr,proto=tcp,timeo=600,retrans=2,sec=sys,mountproto=udp,addr=192.168.1.6 0 0
    hopkins:/storage/mythtv/recordings /var/lib/mythtv/recordings nfs rw,nosuid,vers=3,rsize=32768,wsize=32768,namlen=255,acregmin=0,acregmax=0,acdirmin=0,acdirmax=0,hard,intr,proto=tcp,timeo=600,retrans=2,sec=sys,mountproto=udp,addr=192.168.1.6 0 0
    hopkins:/var/lib/mythvideo /var/lib/mythvideo nfs rw,nosuid,vers=3,rsize=32768,wsize=32768,namlen=255,acregmin=0,acregmax=0,acdirmin=0,acdirmax=0,hard,intr,proto=tcp,timeo=600,retrans=2,sec=sys,mountproto=udp,addr=192.168.1.6 0 0
    hopkins:/storage/home/ijc /home/ijc nfs rw,vers=3,rsize=131072,wsize=131072,namlen=255,hard,nointr,proto=tcp,timeo=600,retrans=2,sec=sys,mountproto=udp,addr=192.168.1.6 0 0
    and all seem to be affected.

    It hasn't happened this time (yet) but usually I get a hung task
    backtrace like this:
    Aug 4 06:27:28 iranon kernel: [137969.382277] INFO: task mythbackend:3161 blocked for more than 120 seconds.
    Aug 4 06:27:28 iranon kernel: [137969.382287] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    Aug 4 06:27:28 iranon kernel: [137969.382291] mythbackend D 00005dfc 0 3161 1
    Aug 4 06:27:28 iranon kernel: [137969.382295] f2006c70 00000082 4b05cc7e 00005dfc f2006df0 c1c0e920 00000000 f7273390
    Aug 4 06:27:28 iranon kernel: [137969.382301] 0000c8b4 00000000 00000001 f7273398 00000282 00000202 f71b0ab0 f200ddf0
    Aug 4 06:27:28 iranon kernel: [137969.382306] c1c012bc fa18f0a6 fa18f0bf c02b45a7 f71b0ab0 00000000 f200de0c fa18f0a6
    Aug 4 06:27:28 iranon kernel: [137969.382311] Call Trace:
    Aug 4 06:27:28 iranon kernel: [137969.382347] [] nfs_wait_schedule+0x0/0x1e [nfs]
    Aug 4 06:27:28 iranon kernel: [137969.382384] [] nfs_wait_schedule+0x19/0x1e [nfs]
    Aug 4 06:27:28 iranon kernel: [137969.382399] [] __wait_on_bit_lock+0x2a/0x52
    Aug 4 06:27:28 iranon kernel: [137969.382407] [] nfs_wait_schedule+0x0/0x1e [nfs]
    Aug 4 06:27:28 iranon kernel: [137969.382421] [] out_of_line_wait_on_bit_lock+0x5f/0x67
    Aug 4 06:27:28 iranon kernel: [137969.382429] [] wake_bit_function+0x0/0x3c
    Aug 4 06:27:28 iranon kernel: [137969.382441] [] __nfs_revalidate_inode+0xaa/0x211 [nfs]
    Aug 4 06:27:28 iranon kernel: [137969.382458] [] do_lookup+0x53/0x145
    Aug 4 06:27:28 iranon kernel: [137969.382466] [] mntput_no_expire+0x11/0x64
    Aug 4 06:27:28 iranon kernel: [137969.382472] [] __link_path_walk+0xa71/0xb65
    Aug 4 06:27:28 iranon kernel: [137969.382477] [] do_lookup+0x53/0x145
    Aug 4 06:27:28 iranon kernel: [137969.382483] [] mntput_no_expire+0x11/0x64
    Aug 4 06:27:28 iranon kernel: [137969.382492] [] mntput_no_expire+0x11/0x64
    Aug 4 06:27:28 iranon kernel: [137969.382496] [] path_walk+0x90/0x98
    Aug 4 06:27:28 iranon kernel: [137969.382502] [] nfs_getattr+0x8f/0xbe [nfs]
    Aug 4 06:27:28 iranon kernel: [137969.382520] [] nfs_getattr+0x0/0xbe [nfs]
    Aug 4 06:27:28 iranon kernel: [137969.382536] [] vfs_getattr+0x36/0x4d
    Aug 4 06:27:28 iranon kernel: [137969.382545] [] vfs_lstat_fd+0x27/0x39
    Aug 4 06:27:28 iranon kernel: [137969.382550] [] nfs_permission+0x0/0x129 [nfs]
    Aug 4 06:27:28 iranon kernel: [137969.382567] [] mntput_no_expire+0x11/0x64
    Aug 4 06:27:28 iranon kernel: [137969.382572] [] sys_faccessat+0x11e/0x15d
    Aug 4 06:27:28 iranon kernel: [137969.382582] [] sys_lstat64+0xf/0x23
    Aug 4 06:27:28 iranon kernel: [137969.382588] [] vfs_read+0xe3/0x11e
    Aug 4 06:27:28 iranon kernel: [137969.382598] [] sys_access+0xf/0x13
    Aug 4 06:27:28 iranon kernel: [137969.382603] [] sysenter_past_esp+0x6d/0xa5
    Aug 4 06:27:28 iranon kernel: [137969.382617] =======================

    Sysrq-T from dmesg (therefore truncated a bit) is attached.

    Ian.

    --
    Ian Campbell

    There are no answers, only cross-references.
    -- Weiner


  10. Re: NFS regression? Odd delays and lockups accessing an NFS export.

    On Fri, 2008-08-22 at 11:23 +0100, Ian Campbell wrote:
    > On Mon, 2008-08-18 at 15:20 -0400, Trond Myklebust wrote:
    > > Please try to reproduce the hang, then do
    > >
    > > echo 0 >/proc/sys/sunrpc/rpc_debug
    > >
    > > and send the output from 'dmesg'...

    >
    > I've also been seeing some NFS related lockups, although I'm not sure if
    > they are the same as the one in this thread or not. Client is 2.6.26
    > (Debian's kernel) and Server is 2.6.25 (also a Debian kernel, but from
    > backports.org).
    >
    > rpc_debug on the server gives nothing, on the client gives:
    > [144741.637997] -pid- proc flgs status -client- -prog- --rqstp- -timeout -rpcwait -action- ---ops--
    > [144741.637997] 3439 0004 0080 -11 f3f48200 100003 f7770000 0 xprt_sending fa0ae88e fa0bddf4
    > [144741.637997] 3438 0001 00a0 0 f77f2a00 100003 f77700d0 15000 xprt_pending fa0ae88e fa0bddf4


    That's probably also a networking device driver issue candidate: your
    RPC task is queued up waiting to be sent.

    What networking card+device driver are you using here?

    > There are no processes running with pid 3439 3438 (I don't think it's
    > that sort of pid though).


    The 'pid' is an internal RPC cookie that just serves to identify and
    track specific RPC requests.
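
For longer dumps, the task table printed after writing to rpc_debug can be split into named columns so that tasks can be grouped by the queue they are waiting on. This is a rough sketch under the assumption that every row has the eleven whitespace-separated columns shown in the header quoted above. The field names are taken from that header, while `parse_rpc_tasks` itself is a hypothetical helper.

```python
import re

# Column names, following the header line of the dump.
FIELDS = (
    "pid", "proc", "flgs", "status", "client", "prog",
    "rqstp", "timeout", "rpcwait", "action", "ops",
)

# Optional dmesg timestamp prefix such as "[144741.637997] ".
TIMESTAMP_RE = re.compile(r"^\s*\[\s*\d+\.\d+\]\s*")


def parse_rpc_tasks(dmesg_text):
    """Return a list of dicts, one per RPC task row following a '-pid-' header."""
    tasks = []
    in_table = False
    for line in dmesg_text.splitlines():
        parts = TIMESTAMP_RE.sub("", line).split()
        if parts and parts[0] == "-pid-":
            in_table = True  # rows are only trusted directly after a header
            continue
        if in_table and len(parts) == len(FIELDS) and parts[0].isdigit():
            row = dict(zip(FIELDS, parts))
            for key in ("pid", "status", "timeout"):
                row[key] = int(row[key])
            tasks.append(row)
        else:
            in_table = False
    return tasks
```

With the two rows quoted above, the result has task 3439 with `rpcwait` set to `xprt_sending` and task 3438 with `xprt_pending`. Here `pid` is the RPC cookie, not a process id.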

    Cheers
    Trond


  11. Re: NFS regression? Odd delays and lockups accessing an NFS export.

    On Fri, 2008-08-22 at 11:08 -0700, Trond Myklebust wrote:
    > On Fri, 2008-08-22 at 11:23 +0100, Ian Campbell wrote:
    > > On Mon, 2008-08-18 at 15:20 -0400, Trond Myklebust wrote:
    > > > Please try to reproduce the hang, then do
    > > >
    > > > echo 0 >/proc/sys/sunrpc/rpc_debug
    > > >
    > > > and send the output from 'dmesg'...

    > >
    > > I've also been seeing some NFS related lockups, although I'm not sure if
    > > they are the same as the one in this thread or not. Client is 2.6.26
    > > (Debian's kernel) and Server is 2.6.25 (also a Debian kernel, but from
    > > backports.org).
    > >
    > > rpc_debug on the server gives nothing, on the client gives:
    > > [144741.637997] -pid- proc flgs status -client- -prog- --rqstp- -timeout -rpcwait     -action- ---ops--
    > > [144741.637997]  3439 0004 0080    -11 f3f48200 100003 f7770000        0 xprt_sending fa0ae88e fa0bddf4
    > > [144741.637997]  3438 0001 00a0      0 f77f2a00 100003 f77700d0    15000 xprt_pending fa0ae88e fa0bddf4

    >
    > That's probably also a networking device driver issue candidate: your
    > RPC task is queued up waiting to be sent.
    >
    > What networking card+device driver are you using here?


    # ethtool -i eth0
    driver: e1000
    version: 7.3.20-k2-NAPI
    firmware-version: N/A
    bus-info: 0000:01:0a.0

    Adding CC's to peeps listed in MAINTAINERS

    I have to reboot the system now (or else there will be no TV to watch
    during my tea ;-). It will probably repro again quite soon though.

    > > There are no processes running with pid 3439 3438 (I don't think it's
    > > that sort of pid though).

    >
    > The 'pid' is an internal RPC cookie that just serves to identify and
    > track specific RPC requests.


    Right, thanks.

    Ian.

    --
    Ian Campbell

    Colors may fade.



  12. Re: NFS regression? Odd delays and lockups accessing an NFS export.

    On Fri, Aug 22, 2008 at 11:13 AM, Ian Campbell wrote:

    >> That's probably also a networking device driver issue candidate: your
    >> RPC task is queued up waiting to be sent.
    >>
    >> What networking card+device driver are you using here?

    >
    > # ethtool -i eth0
    > driver: e1000
    > version: 7.3.20-k2-NAPI
    > firmware-version: N/A
    > bus-info: 0000:01:0a.0

    There is nothing indicating that the NIC/driver is causing any sort of
    problem here, at least not with what has been presented so far. When
    the NFS mount isn't working is the networking still active and
    working?

    --
    Cheers,
    John

  13. Re: NFS regression? Odd delays and lockups accessing an NFS export.

    On Fri, 2008-08-22 at 12:33 -0700, John Ronciak wrote:
    > On Fri, Aug 22, 2008 at 11:13 AM, Ian Campbell wrote:
    >
    > >> That's probably also a networking device driver issue candidate: your
    > >> RPC task is queued up waiting to be sent.
    > >>
    > >> What networking card+device driver are you using here?

    > >
    > > # ethtool -i eth0
    > > driver: e1000
    > > version: 7.3.20-k2-NAPI
    > > firmware-version: N/A
    > > bus-info: 0000:01:0a.0

    > There is nothing indicating that the NIC/driver is causing any sort of
    > problem here, at least not with what has been presented so far. When
    > the NFS mount isn't working is the networking still active and
    > working?


    So far as I can tell, yes. I can login via ssh so long as the user
    doesn't have NFS $HOME, I haven't tried much else and the box isn't
    locked up at the moment, I'd bet it's fine though.

    Ian.

    --
    Ian Campbell

    "On a normal ascii line, the only safe condition to detect is a 'BREAK'
    - everything else having been assigned functions by Gnu EMACS."
    (By Tarl Neustaedter)



  14. Re: NFS regression? Odd delays and lockups accessing an NFS export.

    On Fri, Aug 22, 2008 at 1:00 PM, Ian Campbell wrote:

    > So far as I can tell, yes. I can login via ssh so long as the user
    > doesn't have NFS $HOME, I haven't tried much else and the box isn't
    > locked up at the moment, I'd bet it's fine though.

    Then it's highly unlikely that the NIC/driver is causing your problem.
    We'll continue to monitor this thread waiting for any info that this
    is somehow NIC/driver related.

    Good luck.



    --
    Cheers,
    John

  15. Re: NFS regression? Odd delays and lockups accessing an NFS export.

    On Fri, 2008-08-22 at 21:00 +0100, Ian Campbell wrote:
    > On Fri, 2008-08-22 at 12:33 -0700, John Ronciak wrote:
    > > On Fri, Aug 22, 2008 at 11:13 AM, Ian Campbell wrote:
    > >
    > > >> That's probably also a networking device driver issue candidate: your
    > > >> RPC task is queued up waiting to be sent.
    > > >>
    > > >> What networking card+device driver are you using here?
    > > >
    > > > # ethtool -i eth0
    > > > driver: e1000
    > > > version: 7.3.20-k2-NAPI
    > > > firmware-version: N/A
    > > > bus-info: 0000:01:0a.0

    > > There is nothing indicating that the NIC/driver is causing any sort of
    > > problem here, at least not with what has been presented so far. When
    > > the NFS mount isn't working is the networking still active and
    > > working?

    >
    > So far as I can tell, yes. I can login via ssh so long as the user
    > doesn't have NFS $HOME, I haven't tried much else and the box isn't
    > locked up at the moment, I'd bet it's fine though.


    ...and the server? Something is preventing that RPC payload from being
    delivered...

    Trond


  16. Re: NFS regression? Odd delays and lockups accessing an NFS export.

    On Fri, 2008-08-22 at 14:23 -0700, Trond Myklebust wrote:
    > On Fri, 2008-08-22 at 21:00 +0100, Ian Campbell wrote:
    > > On Fri, 2008-08-22 at 12:33 -0700, John Ronciak wrote:
    > > > On Fri, Aug 22, 2008 at 11:13 AM, Ian Campbell wrote:
    > > >
    > > > >> That's probably also a networking device driver issue candidate: your
    > > > >> RPC task is queued up waiting to be sent.
    > > > >>
    > > > >> What networking card+device driver are you using here?
    > > > >
    > > > > # ethtool -i eth0
    > > > > driver: e1000
    > > > > version: 7.3.20-k2-NAPI
    > > > > firmware-version: N/A
    > > > > bus-info: 0000:01:0a.0
    > > > There is nothing indicating that the NIC/driver is causing any sort of
    > > > problem here, at least not with what has been presented so far. When
    > > > the NFS mount isn't working is the networking still active and
    > > > working?

    > >
    > > So far as I can tell, yes. I can login via ssh so long as the user
    > > doesn't have NFS $HOME, I haven't tried much else and the box isn't
    > > locked up at the moment, I'd bet it's fine though.

    >
    > ...and the server? Something is preventing that RPC payload from being
    > delivered...


    I can ssh to the server fine. The same server also serves my NFS home
    directory to the box I'm writing this from and I've not seen any trouble
    with this box at all, it's a 2.6.18-xen box.

    If I downgrade the problematic box to 2.6.24 then the hangs do not
    occur. They do occur with 2.6.25. (Sorry, that's a critical bit of
    information which I stupidly forgot to mention until now.)

    Ian.

    --
    Ian Campbell

    It is much easier to be critical than to be correct.
    -- Benjamin Disraeli



  17. Re: NFS regression? Odd delays and lockups accessing an NFS export.

    On Fri, 2008-08-22 at 22:37 +0100, Ian Campbell wrote:
    > I can ssh to the server fine. The same server also serves my NFS home
    > directory to the box I'm writing this from and I've not seen any trouble
    > with this box at all, it's a 2.6.18-xen box.


    OK... Are you able to reproduce the problem reliably?

    If so, can you provide me with a binary tcpdump or wireshark dump? If
    using tcpdump, then please use something like

    tcpdump -w /tmp/dump.out -s 90000 host myserver.foo.bar and port 2049

    Please also try to provide a netstat dump of the current TCP connections
    as soon as the hang occurs:

    netstat -t

    Cheers
    Trond


  18. Re: NFS regression? Odd delays and lockups accessing an NFS export.

    On Fri, 2008-08-22 at 14:56 -0700, Trond Myklebust wrote:
    > On Fri, 2008-08-22 at 22:37 +0100, Ian Campbell wrote:
    > > I can ssh to the server fine. The same server also serves my NFS home
    > > directory to the box I'm writing this from and I've not seen any trouble
    > > with this box at all, it's a 2.6.18-xen box.

    >
    > OK... Are you able to reproduce the problem reliably?


    It usually happens within about a day, but I can't make it happen at will
    so that I can arrange to be present at the time. It has usually locked
    up overnight in the past.

    > If so, can you provide me with a binary tcpdump or wireshark dump? If
    > using tcpdump, then please use something like
    >
    > tcpdump -w /tmp/dump.out -s 90000 host myserver.foo.bar and port 2049


    I'll try leaving this going overnight but using -C and -W to limit the
    size to the disk space available.
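
A rough sketch of what such a size-limited capture command could look like, built as an argument list. The `-C` (rotate after about this many million bytes) and `-W` (maximum number of files) options are standard tcpdump flags; the values 500 and 4, the helper name `build_tcpdump_argv`, and the placeholder hostname are assumptions for illustration and are not the settings actually used in this thread.

```python
import shlex


def build_tcpdump_argv(server, prefix="/tmp/dump.out",
                       file_size_mb=500, file_count=4, snaplen=90000):
    """Build a tcpdump command that captures NFS traffic into a ring of files.

    file_size_mb and file_count are illustrative defaults (assumptions);
    choose them so that file_size_mb * file_count fits the free disk space.
    """
    return [
        "tcpdump",
        "-w", prefix,             # write raw packets to files named from this prefix
        "-s", str(snaplen),       # snapshot length, as in the command quoted above
        "-C", str(file_size_mb),  # start a new file after roughly this many million bytes
        "-W", str(file_count),    # keep at most this many files, overwriting the oldest
        "host", server, "and", "port", "2049",
    ]


argv = build_tcpdump_argv("myserver.foo.bar")
print(shlex.join(argv))
```

With `-C` and `-W` combined, tcpdump appends a number to the `-w` name for each file in the ring, so the capture survives an overnight run without filling the disk.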

    > Please also try to provide a netstat dump of the current TCP connections
    > as soon as the hang occurs:
    >
    > netstat -t


    Will do it ASAP after it happens.

    Ian.
    --
    Ian Campbell

    revision 1.17.2.7
    date: 2001/05/31 21:32:44; author: branden; state: Exp; lines: +1 -1
    ARRRRGH!! GOT THE G** D*** SENSE OF A F******* TEST BACKWARDS!



  19. Re: NFS regression? Odd delays and lockups accessing an NFS export.

    On Sun, 2008-08-24 at 19:53 +0100, Ian Campbell wrote:
    > On Fri, 2008-08-22 at 14:56 -0700, Trond Myklebust wrote:
    > > On Fri, 2008-08-22 at 22:37 +0100, Ian Campbell wrote:
    > > > I can ssh to the server fine. The same server also serves my NFS home
    > > > directory to the box I'm writing this from and I've not seen any trouble
    > > > with this box at all, it's a 2.6.18-xen box.

    > >
    > > OK... Are you able to reproduce the problem reliably?
    > >
    > > If so, can you provide me with a binary tcpdump or wireshark dump? If
    > > using tcpdump, then please use something like
    > >
    > > tcpdump -w /tmp/dump.out -s 90000 host myserver.foo.bar and port 2049
    > >
    > > Please also try to provide a netstat dump of the current TCP connections
    > > as soon as the hang occurs:
    > >
    > > netstat -t

    >
    > Aug 24 18:08:59 iranon kernel: [168839.556017] nfs: server hopkins not responding, still trying
    > but I wasn't around until 19:38 to spot it.
    >
    > netstat when I got to it was:
    >
    > Proto Recv-Q Send-Q Local Address           Foreign Address         State
    > tcp        0      0 localhost.localdo:50891 localhost.localdom:6543 ESTABLISHED
    > tcp        1      0 iranon.hellion.org.:ssh azathoth.hellion.:52682 CLOSE_WAIT
    > tcp        0      0 localhost.localdom:6543 localhost.localdo:50893 ESTABLISHED
    > tcp        0      0 iranon.hellion.org.:837 hopkins.hellion.org:nfs FIN_WAIT2
    > tcp        0      0 localhost.localdom:6543 localhost.localdo:41831 ESTABLISHED
    > tcp        0      0 localhost.localdo:13666 localhost.localdo:59482 ESTABLISHED
    > tcp        0      0 localhost.localdo:34288 localhost.localdom:6545 ESTABLISHED
    > tcp        0      0 iranon.hellion.org.:ssh azathoth.hellion.:48977 ESTABLISHED
    > tcp        0      0 iranon.hellion.org.:ssh azathoth.hellion.:52683 ESTABLISHED
    > tcp        0      0 localhost.localdom:6545 localhost.localdo:34288 ESTABLISHED
    > tcp        0      0 localhost.localdom:6543 localhost.localdo:50891 ESTABLISHED
    > tcp        0      0 localhost.localdo:50893 localhost.localdom:6543 ESTABLISHED
    > tcp        0      0 localhost.localdo:41831 localhost.localdom:6543 ESTABLISHED
    > tcp        0     87 localhost.localdo:59482 localhost.localdo:13666 ESTABLISHED
    > tcp        1      0 localhost.localdom:6543 localhost.localdo:41830 CLOSE_WAIT
    >
    > (iranon is the problematic host .4, azathoth is my desktop machine .5, hopkins is the NFS server .6)
    >
    > tcpdumps are pretty big. I've attached the last 100 packets captured. If
    > you need more I can put the full file up somewhere.
    >
    > -rw-r--r-- 1 root root 1.3G Aug 24 17:57 dump.out0
    > -rw-r--r-- 1 root root 536M Aug 24 19:38 dump.out1
    >
    > Ian.


    From the tcpdump, it looks as if the NFS server is failing to close the
    socket, when the client closes its side. You therefore end up getting
    stuck in the FIN_WAIT2 state (as netstat clearly shows above).

    Is the server keeping the client in this state for a very long period?

    Cheers
    Trond
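
A small sketch of how the `netstat -t` output above could be scanned for this condition automatically. It assumes the six-column layout shown in the post and that the NFS connection is recognisable by a foreign address ending in `:nfs` or `:2049`; the function name `find_stuck_nfs_sockets` is hypothetical.

```python
def find_stuck_nfs_sockets(netstat_text, nfs_suffixes=(":nfs", ":2049")):
    """Return (local, foreign, state) for NFS connections that are not ESTABLISHED."""
    stuck = []
    for line in netstat_text.splitlines():
        # Tolerate mail quoting ("> ") in front of each row.
        fields = line.lstrip("> ").split()
        # Expect: Proto Recv-Q Send-Q Local Foreign State; skip headers and other rows.
        if len(fields) != 6 or fields[0] != "tcp":
            continue
        _proto, _recv_q, _send_q, local, foreign, state = fields
        if foreign.endswith(nfs_suffixes) and state != "ESTABLISHED":
            stuck.append((local, foreign, state))
    return stuck


# Rows copied from the netstat output quoted above.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 1 0 iranon.hellion.org.:ssh azathoth.hellion.:52682 CLOSE_WAIT
tcp 0 0 iranon.hellion.org.:837 hopkins.hellion.org:nfs FIN_WAIT2
tcp 0 0 iranon.hellion.org.:ssh azathoth.hellion.:48977 ESTABLISHED
"""

stuck = find_stuck_nfs_sockets(SAMPLE)
for local, foreign, state in stuck:
    print(local, "->", foreign, "is in", state)
```

Running this from cron or a loop while the capture is active would record when the socket first enters FIN_WAIT2, which helps answer how long the server keeps the client in that state.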


  20. Re: NFS regression? Odd delays and lockups accessing an NFS export.

    On Sun, 2008-08-24 at 15:17 -0400, Trond Myklebust wrote:
    > From the tcpdump, it looks as if the NFS server is failing to close the
    > socket, when the client closes its side. You therefore end up getting
    > stuck in the FIN_WAIT2 state (as netstat clearly shows above).


    BTW: the RPC client is closing the socket because it detected no NFS
    activity for 5 minutes. Did you expect any NFS activity during this
    time?

    Trond
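
To illustrate why an idle-timeout close on the client can leave the socket in FIN_WAIT2, here is a deliberately simplified model of the standard TCP state transitions for the side that closes first. It ignores simultaneous close, retransmission, and timers, and the event names (`close`, `recv_ack`, `recv_fin`) are made up for this sketch.

```python
# (state, event) -> next state for the side that calls close() first.
ACTIVE_CLOSE = {
    ("ESTABLISHED", "close"): "FIN_WAIT1",   # we send our FIN
    ("FIN_WAIT1", "recv_ack"): "FIN_WAIT2",  # peer acknowledged our FIN
    ("FIN_WAIT2", "recv_fin"): "TIME_WAIT",  # peer closed its side as well
}


def run(events, state="ESTABLISHED"):
    """Apply events in order; unknown (state, event) pairs leave the state unchanged."""
    for event in events:
        state = ACTIVE_CLOSE.get((state, event), state)
    return state


# The client closes after the idle timeout and the server ACKs but never sends its own FIN.
print(run(["close", "recv_ack"]))
# A well-behaved peer closes its side too.
print(run(["close", "recv_ack", "recv_fin"]))
```

In this model the connection only leaves FIN_WAIT2 once the peer sends its own FIN, which matches the diagnosis that the server is not closing its end.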

