Cluster Aliases goes away - VMS


Thread: Cluster Aliases goes away

  1. Cluster Aliases goes away

    Had an interesting problem pop up today. I restarted the Multinet
    server and the cluster aliases stopped working.

    Process Software MultiNet V5.2 Rev A-X, AlphaServer DS10L 466 MHz,
    OpenVMS AXP V7.3-2

    All the patches up to around 31-MAR, the last reboot.

    The OPCOM error message I received when trying to do a restart was:

    %%%%%%%%%%% OPCOM 6-AUG-2008 14:00:44.94 %%%%%%%%%%%
    Message from user MARTY on BLUE
    MultiNet Server: Cluster Alias: Failed to register alias 172.17.17.238
    status 49

    Error 49? An odd error number? I tried to set the terminal width to
    132, thinking a digit or two got chomped. Nope.

    Eventually I rebooted (GASP) and it works fine now.

    So what is an error 49?

  2. Re: Cluster Aliases goes away


    look at errno.h (I don't recall where the bloody thing
    is stored anymore)

    EADDRNOTAVAIL - can't assign requested address.

    Patrick
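
As a rough illustration of the errno.h lookup suggested above, here is a minimal sketch. It assumes MultiNet uses BSD-style errno numbering, which is consistent with 49 being reported as EADDRNOTAVAIL in this thread; Linux numbers these codes differently. The table is hand-copied for a few neighbouring codes, and `describe_status` is a hypothetical helper name, not part of any MultiNet tool.

```python
# A few BSD-style errno values (assumed numbering; Linux uses different values).
BSD_ERRNO = {
    48: ("EADDRINUSE", "Address already in use"),
    49: ("EADDRNOTAVAIL", "Can't assign requested address"),
    50: ("ENETDOWN", "Network is down"),
    51: ("ENETUNREACH", "Network is unreachable"),
}


def describe_status(status: int) -> str:
    """Return 'status: NAME - text' for a known code, or a fallback entry."""
    name, text = BSD_ERRNO.get(status, ("UNKNOWN", "not in this table"))
    return f"{status}: {name} - {text}"


if __name__ == "__main__":
    # The status value from the OPCOM message in the first post.
    print(describe_status(49))
```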



  3. Re: Cluster Aliases goes away

    Patrick Mahan wrote:
    >
    > look at errno.h (I don't recall where the bloody thing
    > is stored anymore)
    >
    > EADDRNOTAVAIL - can't assign requested address.
    >
    > Patrick
    >


    OK, but what does _that_ mean? All of a sudden .238 stopped responding
    which made the name servers, web servers, time servers, and anyone else
    using that alias, stop responding. Anything that was looking to .238 to
    respond started chirping "host down".

    Can't assign address, why?

    And just an FYI, I restarted the Multinet server to roll the smtp.log
    file (to do some smtp debugging). Seems you have to rename the log file
    and then restart the main MU server to get it to stop writing the old
    log file.

  4. Re: Cluster Aliases goes away



    Marty Kuhrt presented these words - circa 8/7/08 10:15 PM->
    >
    > OK, but what does _that_ mean? All of a sudden .238 stopped responding
    > which made the name servers, web servers, time servers, and anyone else
    > using that alias, stop responding. Anything that was looking to .238 to
    > respond started chirping "host down".
    >
    > Can't assign address, why?
    >
    > And just an FYI, I restarted the Multinet server to roll the smtp.log
    > file (to do some smtp debugging). Seems you have to rename the log file
    > and then restart the main MU server to get it to stop writing the old
    > log file.
    >
    >


    It could be for any of several reasons: 1) it failed to get the cluster
    lock set up; 2) the address is invalid for your configuration (I'm not
    sure what your subnet mask, node IP, etc. are set to); 3) the address
    might already be in use on another node; 4) the ARP cache was corrupted,
    causing the IP address to fail validation.

    I have not played on a MultiNet system for close to 10 years now, so I
    apologize for not being more helpful (but I did help develop and maintain
    it for close to 8 years).

    Good luck,

    Patrick Mahan
    nee Window Washer

  5. RE: Cluster Aliases goes away

    49 is EADDRNOTAVAIL, which means that it could not assign the requested
    address. MultiNet 5.n uses the routing tables to determine which
    interface should get the cluster alias assigned to it; if it can not
    find an interface that has the proper routing for the desired address,
    then it will not assign it to an interface.
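
The interface selection described above can be sketched roughly as follows. This is not MultiNet's code: the interface names, the /24 and /8 prefixes, and the helpers `pick_interface` and `register_alias` are all hypothetical, and the real routing lookup is certainly more involved. The sketch only shows the idea that an alias with no matching directly connected route ends in status 49.

```python
import ipaddress
from typing import Dict, Optional

EADDRNOTAVAIL = 49  # status value reported in the OPCOM message

# Hypothetical interface table: interface name -> directly connected network.
# The prefixes are assumptions; the thread never states the subnet mask.
INTERFACES: Dict[str, ipaddress.IPv4Network] = {
    "se0": ipaddress.ip_network("172.17.17.0/24"),
    "se1": ipaddress.ip_network("10.0.0.0/8"),
}


def pick_interface(alias: str, interfaces=INTERFACES) -> Optional[str]:
    """Return the interface whose network contains the alias, or None."""
    addr = ipaddress.ip_address(alias)
    # Collect every matching network, then prefer the longest prefix.
    matches = [
        (net.prefixlen, name)
        for name, net in interfaces.items()
        if addr in net
    ]
    return max(matches)[1] if matches else None


def register_alias(alias: str, interfaces=INTERFACES) -> int:
    """Return 0 on success, or EADDRNOTAVAIL if no interface routes the alias."""
    return 0 if pick_interface(alias, interfaces) else EADDRNOTAVAIL


if __name__ == "__main__":
    print(register_alias("172.17.17.238"))  # alias from the thread
    print(register_alias("192.168.1.1"))    # no matching interface
```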


  6. Re: Cluster Aliases goes away

    Richard Whalen wrote:
    > 49 is EADDRNOTAVAIL, which means that it could not assign the requested
    > address. MultiNet 5.n uses the routing tables to determine which
    > interface should get the cluster alias assigned to it; if it can not
    > find an interface that has the proper routing for the desired address,
    > then it will not assign it to an interface.


    Why would restarting the MU server to roll the SMTP log file on a
    single-node (for now) cluster cause this failure? How can it be
    prevented in the future?



  7. RE: Cluster Aliases goes away

    The MultiNet master server maintains the lock that says which system in
    the cluster is currently offering the alias address. The command file
    that is used to restart the master server releases ownership of the
    alias and lock, which allows another system to obtain it. When the
    master server has restarted it should be queued for ownership of the
    lock. It is necessary to tell the master server of the system that
    currently maintains the cluster alias to release it so that the cluster
    alias can roll over to another system.

    Was there another cluster member that could have possibly taken over the
    alias? The cluster alias was originally designed for UDP traffic.
    Changes were made such that most TCP traffic can now work on it as
    customers insisted on using it for TCP traffic even when told that it
    wasn't designed for it and that either DNS load balancing or round-robin
    DNS would be a better choice. BIND 9 has caused a number of
    difficulties for customers that are using DNS load balancing and work is
    being done to address these issues.




  8. Re: Cluster Aliases goes away

    Richard Whalen wrote:
    > The MultiNet master server maintains the lock that says which system in
    > the cluster is currently offering the alias address. The command file
    > that is used to restart the master server releases ownership of the
    > alias and lock, which allows another system to obtain it. When the
    > master server has restarted it should be queued for ownership of the
    > lock. It is necessary to tell the master server of the system that
    > currently maintains the cluster alias to release it so that the cluster
    > alias can roll over to another system.
    >
    > Was there another cluster member that could have possibly taken over the
    > alias?


    It was a single node cluster.

    > The cluster alias was originally designed for UDP traffic.
    > Changes were made such that most TCP traffic can now work on it as
    > customers insisted on using it for TCP traffic even when told that it
    > wasn't designed for it and that either DNS load balancing or round-robin
    > DNS would be a better choice. BIND 9 has caused a number of
    > difficulties for customers that are using DNS load balancing and work is
    > being done to address these issues.


    Are there any handy instructions on how to get MU DNS configured to do
    either?

    Thanks,
    Marty

  9. RE: Cluster Aliases goes away

    If you are running MASTER_SERVER-020_A052 or MASTER_SERVER-050_A051 (or
    later), then the cluster alias shutdown should work correctly in a
    single node cluster. Earlier patches will probably have a problem.

    DNS load balancing and round-robin DNS don't create an additional
    address, so you would have to create a PD interface for the address.

    http://www.process.com/tcpip/mndocs5...h10.htm#E55E67
    describes DNS load balancing.

    The name server will automatically do round-robin when you have multiple
    A records for a name. The O'Reilly DNS and BIND book gives the following
    example:

        Foo.bar.biz 60 IN A 192.1.1.1
        Foo.bar.biz 60 IN A 192.1.1.2
        Foo.bar.biz 60 IN A 192.1.1.3
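
To make the round-robin behaviour concrete, here is a small sketch of one simple rotation policy applied to the three A records above. It is only a model: the class name `RoundRobinRRset` is hypothetical, and the order a real name server returns depends on its configuration.

```python
from collections import deque


class RoundRobinRRset:
    """Toy model of a record set whose answer order rotates per query."""

    def __init__(self, addresses):
        self._addrs = deque(addresses)

    def answer(self):
        """Return the current ordering, then rotate for the next query."""
        current = list(self._addrs)
        self._addrs.rotate(-1)  # move the first address to the end
        return current


if __name__ == "__main__":
    rrset = RoundRobinRRset(["192.1.1.1", "192.1.1.2", "192.1.1.3"])
    for _ in range(3):
        # Clients that use the first address end up spread across the hosts.
        print(rrset.answer())
```

Because most clients try the first address returned, rotating the order spreads new connections across the hosts without creating any additional address.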


  10. Re: Cluster Aliases goes away

    Richard Whalen wrote:
    > If you are running MASTER_SERVER-020_A052 or MASTER_SERVER-050_A051 (or
    > later), then the cluster alias shutdown should work correctly in a
    > single node cluster. Earlier patches will probably have a problem.
    >


    I'm running...

    Process Software MultiNet V5.2 Rev A-X, AlphaServer DS10L 466 MHz,
    OpenVMS AXP V7.3-2

    with the MASTER_SERVER-040_A052 patch (among others) so I should be OK,
    there.

    > DNS load balancing and round-robin DNS don't create an additional
    > address, so you would have to create a PD interface for the address.
    >
    > http://www.process.com/tcpip/mndocs5...h10.htm#E55E67
    > describes DNS load balancing.
    >
    > The name server will automatically do round-robin when you have multiple
    > A records for a name, the O'Reilly DNS and BIND book gives the following
    > example:
    > Foo.bar.biz 60 IN A 192.1.1.1
    > Foo.bar.biz 60 IN A 192.1.1.2
    > Foo.bar.biz 60 IN A 192.1.1.3
    >


    Thanks for the info, I'll look into it.

    I was previously under the impression that stateless TCP stuff could use
    the cluster alias address, while stateful stuff could not. I have my
    outside world MX record pointing to a NAT address that translates to the
    real machine that does mail. About the only other stuff this "cluster"
    does, accessible to the outside world, is HTTP. I have that outside
    world record pointing to a NAT address that translates to the cluster
    alias. That way, I thought, if I ran multiple machines with a shared
    Apache configuration, "the magic" would route it to whoever was available.

    Looks like I may need a re-think.

    Thanks, again.

    Any helpful hints and doc pointers appreciated.

    Marty



  11. RE: Cluster Aliases goes away

    From the point of view of the network, TCP is always stateful, though
    some TCP connections last longer than others.

    The current (for over 10 years) implementation of HTTP keeps the
    connection to an IP address open after transferring the HTML file,
    because the HTML file often references other files that need to be
    transferred from the same node. (I do not know what the criteria are
    for closing the connection.)

    I'm guessing, but I would say that the following scenario might have
    caused the problem.
    A connection was still open when the attempt to delete the address was
    made. This prevented the address from actually being deleted, but the
    lock was still released and the master server restarted. When the
    master server restarted it found that it could obtain the lock, so it
    tried to add the address to the list of those on the appropriate
    interface, but it found the address was already present and hence the
    unavailable error was returned.
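
The guessed sequence above can be walked through with a toy model. Everything here is hypothetical (the `Interface` class, its methods, and `restart_master_server`); it does not reflect MultiNet internals and only mirrors the steps of the guess: the delete is blocked by an open connection, the lock is released anyway, and the re-add then fails with status 49.

```python
EADDRNOTAVAIL = 49  # status value from the OPCOM message


class Interface:
    """Toy interface holding alias addresses and per-address connection counts."""

    def __init__(self):
        self.addresses = set()
        self.open_connections = {}  # address -> number of open connections

    def add_address(self, addr: str) -> int:
        # In the guessed scenario, adding an address that is still present fails.
        if addr in self.addresses:
            return EADDRNOTAVAIL
        self.addresses.add(addr)
        return 0

    def delete_address(self, addr: str) -> bool:
        # An open connection prevents the address from actually being deleted.
        if self.open_connections.get(addr, 0) > 0:
            return False
        self.addresses.discard(addr)
        return True


def restart_master_server(iface: Interface, alias: str) -> int:
    """Release the alias, then try to register it again after the restart."""
    iface.delete_address(alias)  # result ignored: the lock is released regardless
    return iface.add_address(alias)


if __name__ == "__main__":
    iface = Interface()
    iface.add_address("172.17.17.238")
    iface.open_connections["172.17.17.238"] = 1  # e.g. a lingering HTTP connection
    print(restart_master_server(iface, "172.17.17.238"))
```

In this model a restart with no open connections returns 0, which matches the observation that a reboot (which drops every connection) cleared the problem.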
