Stale NFS mounts and UDP vs. TCP - NFS

  1. Stale NFS mounts and UDP vs. TCP

    Novell Linux: Linux version 2.6.5-7.201-bigsmp (geeko@buildhost) (gcc
    version 3.3.3 (SuSE Linux)) #2 SMP Thu May 18 16:06:47 MDT 2006



    Hello!

    I have a problem that I hope you can help me understand.

    We have a computer environment that uses NIS and AMD. We had a
    long-standing problem with stale NFS mounts and, as it turns out, we had
    the following parameter set in our amd.conf file: nfs_proto = udp. We
    thought that switching from UDP to TCP would help get rid of the stale NFS
    mounts. The problem now is that when the Linux machine is rebooted, it
    fails to automount a critical machine (boise1). After the reboot, in order
    to mount boise1, we have to run the following command:
    amq -uf /net/boise1. After running amq, life is good. Here's what I
    observed. Is there a reason why boise1 is not being properly automounted
    after reboot?

    Any help with this would be greatly appreciated!

    From dmesg:

    RPC: Can't bind to reserved port (98).
    RPC: can't bind to reserved port.
    RPC: error 5 connecting to server boise1
    nfs_get_root: getattr error = 5
    nfs_read_super: get root inode failed
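
For readers decoding the dmesg lines above: a minimal sketch, assuming the numbers 98 and 5 are Linux errno values (the thread does not say so explicitly). The small table below is hard-coded from the Linux errno numbering and is illustrative, not a complete list; the helper name `describe` is hypothetical.

```python
# Linux errno numbers for the two codes seen in the dmesg output.
# Assumption: the kernel messages report plain Linux errno values.
# Other operating systems number these errors differently.
LINUX_ERRNO = {
    5: ("EIO", "Input/output error"),
    98: ("EADDRINUSE", "Address already in use"),
}


def describe(code):
    """Return a readable description of a (possibly negated) errno value."""
    number = abs(code)
    name, text = LINUX_ERRNO.get(number, ("UNKNOWN", "not in this small table"))
    return f"{number} = {name} ({text})"


if __name__ == "__main__":
    for code in (98, 5):
        print(describe(code))
```

Under that assumption, the first message would mean that the reserved port the RPC client tried to bind was already in use, and the later error 5 lines would be the generic I/O failure that followed.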



    After the amq -uf /net/boise1 command, I can run mount | grep boise1, and I
    get this:

    boise1:/ on /.automount/net/boise1 type nfs (nounmount,vers=3,proto=tcp)
    boise1:/local on /.automount/net/boise1/local type nfs
    (nounmount,vers=3,proto=tcp)

    Then I am able to cd into /net/boise1.

    Here's the output from the rpcinfo:

    program vers proto   port
    100000     2   tcp    111  portmapper
    100000     2   udp    111  portmapper
    100007     2   udp    894  ypbind
    100007     1   udp    894  ypbind
    100007     2   tcp    897  ypbind
    100007     1   tcp    897  ypbind
    300019     1   tcp    859  amd
    300019     1   udp    860  amd
    100024     1   udp  32770  status
    100021     1   udp  32770  nlockmgr
    100021     3   udp  32770  nlockmgr
    100021     4   udp  32770  nlockmgr
    100024     1   tcp  32768  status
    100021     1   tcp  32768  nlockmgr
    100021     3   tcp  32768  nlockmgr
    100021     4   tcp  32768  nlockmgr
    100003     2   udp   2049  nfs
    100003     3   udp   2049  nfs
    100227     3   udp   2049  nfs_acl
    100003     2   tcp   2049  nfs
    100003     3   tcp   2049  nfs
    100227     3   tcp   2049  nfs_acl
    100005     1   udp    991  mountd
    100005     1   tcp   1006  mountd
    100005     2   udp    991  mountd
    100005     2   tcp   1006  mountd
    100005     3   udp    991  mountd
    100005     3   tcp   1006  mountd
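
A listing like the one above can be checked mechanically for the services a TCP mount needs. The following is a rough sketch, not part of the original post: it parses text in the same column layout and reports whether NFS v3 and mountd v3 are registered over TCP. The function names `parse_rpcinfo` and `has_service` are made up for illustration, and the sample is a subset of the rows shown above.

```python
# A subset of the rpcinfo listing from the post, used as sample input.
SAMPLE = """\
program vers proto port
100000 2 tcp 111 portmapper
100003 3 udp 2049 nfs
100003 3 tcp 2049 nfs
100005 3 udp 991 mountd
100005 3 tcp 1006 mountd
"""


def parse_rpcinfo(text):
    """Parse rpcinfo-style rows into (program, vers, proto, port, name) tuples."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Skip the header, blank lines, and anything that is not a data row.
        if len(parts) < 4 or not parts[0].isdigit():
            continue
        program, vers, proto, port = parts[:4]
        name = parts[4] if len(parts) > 4 else ""
        rows.append((int(program), int(vers), proto, int(port), name))
    return rows


def has_service(rows, name, vers, proto):
    """True if the named service is registered with this version and transport."""
    return any(r[4] == name and r[1] == vers and r[2] == proto for r in rows)


if __name__ == "__main__":
    rows = parse_rpcinfo(SAMPLE)
    for service in ("nfs", "mountd"):
        print(service, "v3 over tcp:", has_service(rows, service, 3, "tcp"))
```

In the listing from the post, both nfs and mountd are registered for version 3 over TCP on whichever host produced it, so a missing registration does not look like the cause.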





    --

    Rick King



  2. Re: Stale NFS mounts and UDP vs. TCP


    In article <44d76821$1@usenet01.boi.hp.com>, "Rick King" writes:
    |> Novell Linux: Linux version 2.6.5-7.201-bigsmp (geeko@buildhost) (gcc
    |> version 3.3.3 (SuSE Linux)) #2 SMP Thu May 18 16:06:47 MDT 2006

    Both client and server?

    |> We have a computer environment that consists of using NIS and AMD. We were
    |> having a long standing issue of stale NFS mounts, and as it turns out, we
    |> had the following parameter set in our amd.conf file: nfs_proto = udp. We
    |> were thinking that if we switch from UDP to TCP, this would help get rid of
    |> the stale NFS mounts. The problem we are having is when the Linux machine is
    |> rebooted, it fails to automount a critical machine (boise1). After the
    |> reboot, in order to mount the machine boise1, we have to run the following
    |> command: amq -uf /net/boise1. After running amq, life is good. Here's what I
    |> observed. Is there a reason why boise1 is not being properly automounted
    |> after reboot?

    Those aren't the problems I am familiar with, but there are several others
    that have similar effects. I doubt that using TCP will help. None of
    the problems I hit ever got serious enough to force me into learning the
    details of NFS and tracking down the causes. I did, however, discover that
    real NFS experts are like hen's teeth - exactly like real TCP/IP ones :-(

    This is almost certainly part of the reason that all NFS implementations
    are unreliable and inefficient, and the Linux server is not one of the
    best ones. Its client is OK, nowadays, but wasn't 5 years back and still
    has its problems.


    Regards,
    Nick Maclaren.

  3. Re: Stale NFS mounts and UDP vs. TCP

    At first, my suspicion was that your client couldn't reach boise1
    immediately after reboot. I've seen this before, caused by spanning tree
    computation and the like. But the error about nfs_read_super seems to
    imply that the RPC MOUNT call succeeded, which invalidates my
    theory.
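
If you still want to rule the reachability theory in or out, something like the following could be run on the client right after boot. It is only a sketch under assumptions: it measures how long it takes before a TCP connection to a given host and port succeeds, and it says nothing about whether the NFS or MOUNT protocol itself works. The function name `wait_for_tcp` and its parameters are hypothetical.

```python
import socket
import time


def wait_for_tcp(host, port, deadline=60.0, interval=1.0):
    """Retry a TCP connect until it succeeds or the deadline passes.

    Returns the elapsed seconds at the first successful connect,
    or None if no connect succeeded within the deadline.
    """
    start = time.monotonic()
    while True:
        try:
            # create_connection raises OSError on refusal, timeout, or
            # name lookup failure; any of those counts as "not reachable yet".
            with socket.create_connection((host, port), timeout=interval):
                return time.monotonic() - start
        except OSError:
            pass
        if time.monotonic() - start >= deadline:
            return None
        time.sleep(interval)


# Example use on the client (2049 is the nfs port in the rpcinfo listing):
#   elapsed = wait_for_tcp("boise1", 2049, deadline=120.0)
#   print("unreachable" if elapsed is None else f"reachable after {elapsed:.1f}s")
```

A delay of many seconds before the first successful connect would point back at the network (for example spanning tree convergence); an immediate success would point elsewhere.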

    The best way to find the problem is to take a packet trace on
    boise1 while your client is rebooting. If boise1 is a Solaris machine
    (as it seems), use `snoop'. From the trace file, filter only the traffic
    from the client. It's usually easy to spot any protocol-level errors.

    Btw, TCP probably won't solve your problem. But you should use TCP
    instead of UDP anyway.

    Cheers,
    bc