What to do about DNS lookups when a site fails and there is a failover site - TCP-IP

  1. What to do about DNS lookups when a site fails and there is a failover site

    Suppose I have many servers at a data centre and many clients
    connecting to those servers via FQDNs. If the entire site goes down
    then DNS will dish out IP addresses for those servers that no longer
    work. I have another site with machines available to spin up and act
    as alternative servers. These alternative servers are not running by
    default, i.e. the failover is not a hot standby, it is a cold standby.
    How can I make it so that clients get directed to the servers at the
    failover site? It has been suggested that new DNS records could be
    dynamically added. I am really not sure about this and would welcome
    any guidance I can get. Is this even the right NG to discuss such
    things?

    For my particular problem the failover has to be done at this sort of
    level - it cannot be done by altering source code to make the clients
    and servers aware of a failover site. We are talking about large sites
    and large bodies of software. This is what makes me think some sort of
    routing approach is needed.

    Regards,

    Andrew Marlow

  2. Re: What to do about DNS lookups when a site fails and there is a failover site

    wrote in message
    news:f4b9c14f-aee4-489d-849e-d2adc6e098f4@a1g2000hsb.googlegroups.com...
    > Suppose I have many servers at a data centre and many clients
    > connecting to those servers via FQDNs. If the entire site goes down
    > then DNS will dish out IP addresses for those servers that no longer
    > work. I have another site with machines available to spin up and act
    > as alternative servers. These alternative servers are not running by
    > default, i.e. the failover is not a hot standby, it is a cold standby.
    > How can I make it so that clients get directed to the servers at the
    > failover site?


    classic way is to balance at the DNS level, but use DNS that can test the
    target machines.

    I haven't seen that used in anger for a while (but I don't get involved in this
    much now), so not sure that is as useful now.

    you should be able to point the DNS at a traffic load balancer, and let that
    handle redirection.

    the more clever load balancers will test the target machine set (or you run
    a client there) and only send load to those servers that show they are up,
    running and can take some load.

    finally you can spread a pair of resilient load balancers across your sites
    as well - but depending on the kit you might need some dedicated comms links
    between them.
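Both the health-checking DNS and the load balancer depend on some test of whether a target is up. A minimal Python sketch, assuming a plain TCP connect to the service port is an acceptable "up" test (the names `tcp_alive` and `live_members` are hypothetical; real products also offer application-level checks):

```python
import socket
from typing import Callable, Iterable


def tcp_alive(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, host unreachable, DNS failure or timeout:
        # treat the target as down.
        return False


def live_members(
    members: Iterable[tuple[str, int]],
    probe: Callable[[str, int], bool] = tcp_alive,
) -> list[tuple[str, int]]:
    """Keep only the members that pass the probe, in their configured order."""
    return [(host, port) for host, port in members if probe(host, port)]
```

A health-checking DNS server built this way would hand out only the addresses returned by `live_members(...)`, ideally with a short TTL.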

    > It has been suggested that new DNS records could be
    > dynamically added. I am really not sure about this and would welcome
    > any guidance I can get. Is this even the right NG to discuss such
    > things?


    no idea - but on Usenet you get suggestions for giving out interesting
    problems.....
    >
    > For my particular problem the failover has to be done at this sort of
    > level - it cannot be done by altering source code to make the clients
    > and servers aware of a failover site. We are talking about large sites
    > and large bodies of software. This is what makes me think some sort of
    > routing approach is needed.
    >
    > Regards,
    >
    > Andrew Marlow

    --
    Regards

    stephen_hope@xyzworld.com - replace xyz with ntl



  3. Re: What to do about DNS lookups when a site fails and there is a failover site

    On 2008-04-08 06:27:31 -0400, marlow.andrew@googlemail.com said:

    > Suppose I have many servers at a data centre and many clients
    > connecting to those servers via FQDNs. If the entire site goes down
    > then DNS will dish out IP addresses for those servers that no longer


    Classic data-center resiliency problem, for which there are fortunately
    commercial solutions (but they are expensive). The
    "chain-of-information" that one winds up needing to protect / engineer
    to keep a server "up" to the rest of the world kind of goes like this:

    - The IP address the FQDN resolves to.
    - The SOA / originator / DNS server that "owns" the FQDN.

    Load-balancing will protect you at a single data center site, but not
    against site failures, and architecting globally resilient IP addresses
    has been a difficult task for many firms (from personal consultative
    experience) and winds up not being worth the effort sometimes.

    Round-robin DNS often comes up as a solution, but then every other
    client will experience a failure, etc., etc.
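To see why, here is a small Python simulation, assuming the resolver rotates the answer order on each query and each client connects only to the first address it gets (real clients vary; some do retry the other addresses). The addresses are RFC 5737 documentation addresses, not real servers:

```python
from itertools import cycle


def simulate_round_robin(addresses: list[str], dead: set[str], clients: int) -> int:
    """Count clients that are handed a dead address first.

    Assumes the resolver rotates the answer list on every query and that each
    client only tries the first address it receives.
    """
    rotation = cycle(range(len(addresses)))
    failures = 0
    for _ in range(clients):
        start = next(rotation)
        # Rotate the records so a different one comes first on each query.
        answer = addresses[start:] + addresses[:start]
        if answer[0] in dead:
            failures += 1
    return failures


# Two sites, site 1 (192.0.2.10) down: half of ten clients get the dead address.
print(simulate_round_robin(["192.0.2.10", "198.51.100.10"], {"192.0.2.10"}, 10))  # 5
```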

    Some folks choose to rely on a fail-over time linked to how fast they
    can make a DNS entry change, but then you're relying on the cached DNS
    entry expiring on every DNS server globally, and there's no guarantee
    of that. Many sites set a TTL of around 15 minutes on their DNS
    entries, but this breaks the RFC standard, and for operations I run, I
    filter DNS entries and rewrite the TTL on them if it is under 3600
    seconds (1h) to stop some forms of DNS / host hacking.
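The worst case can be estimated with simple arithmetic: time to detect the failure, plus time to publish the change, plus the longest time any resolver may still serve the old record. A Python sketch, where the detection and update times are hypothetical inputs and `resolver_min_ttl_s` models a resolver that raises short TTLs to a floor, as described above:

```python
def worst_case_failover(
    detect_s: int, update_s: int, ttl_s: int, resolver_min_ttl_s: int = 0
) -> int:
    """Worst-case seconds until a client behind a caching resolver sees the change.

    A resolver that cached the old record just before the update keeps serving
    it for a full TTL; one that raises short TTLs to a floor keeps it longer.
    """
    effective_ttl = max(ttl_s, resolver_min_ttl_s)
    return detect_s + update_s + effective_ttl


print(worst_case_failover(60, 60, 900))        # 1020 (17 minutes)
print(worst_case_failover(60, 60, 900, 3600))  # 3720 (62 minutes)
```

Resolvers or clients that ignore the TTL altogether are not covered by this estimate.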

    If you're trying to protect a corporate service against failure, say
    for foo-corporation.com, like a VPN, you could set up:

    vpn.foo-corporation.com (round-robin DNS)
    vpn-a.foo-corporation.com (real IP, data center 1)
    vpn-b.foo-corporation.com (real IP, data center 2)

    And just have the VPN software scroll through all of those to find a
    connection, or tell users to use the "-a" or "-b" forms of the hostname.
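The "scroll through the names" approach can be sketched on the client side in Python. The order of names is an assumption, `connect_first_available` is a hypothetical helper, and the `connect` parameter exists only so the logic can be tested without a network:

```python
import socket
from typing import Callable

# Names from the example above, in the order the client tries them.
HOSTNAMES = [
    "vpn.foo-corporation.com",    # round-robin name
    "vpn-a.foo-corporation.com",  # data center 1
    "vpn-b.foo-corporation.com",  # data center 2
]


def connect_first_available(
    hostnames: list[str],
    port: int,
    timeout: float = 5.0,
    connect: Callable[..., socket.socket] = socket.create_connection,
) -> tuple[str, socket.socket]:
    """Try each hostname in order and return (name, socket) for the first that connects."""
    last_error = None
    for name in hostnames:
        try:
            return name, connect((name, port), timeout=timeout)
        except OSError as exc:  # DNS failure, refused or timed out: try the next name
            last_error = exc
    raise ConnectionError(f"no VPN endpoint reachable (last error: {last_error})")
```

A client might call `connect_first_available(HOSTNAMES, 443)`; port 443 is only an assumption. Note that `socket.create_connection` already tries every address a single name resolves to, so the round-robin name gets per-address fallback on its own; the `-a`/`-b` names are a further fallback.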

    You might want to ask yourself what kind of data you're trying to keep
    available and how users use it, and "scroll down" a bit below the IP
    layer, as it's not the only place you can find resiliency. If we're
    talking about web data, global solutions like Akamai and others can
    give you location-sensitive content delivery, but this requires the
    co-operation of your web developers.

    When I've done data center designs before, I "scroll up and down" a bit
    to think about what I need to protect how. All too often, the network
    is seen as the right answer to a resiliency problem; often, it's not. I
    would really reject the notion that the software "can't be changed".
    Resiliency to a good level is built in to all parts of the system, not
    just one. The network can only do so much.

    /dmfh

    --

    __| |_ __ / _| |_ 01100100 01101101
    / _` | ' \| _| ' \ 01100110 01101000
    \__,_|_|_|_|_| |_||_| dmfh(-2)dmfh.cx


  4. Re: What to do about DNS lookups when a site fails and there is a failover site

    On 8 Apr, 20:52, "stephen" wrote:
    > wrote in message


    > > Suppose I have many servers at a data centre and many clients
    > > connecting to those servers via FQDNs. If the entire site goes down
    > > then DNS will dish out IP addresses for those servers that no longer
    > > work. I have another site with machines available to spin up and act
    > > as alternative servers. These alternative servers are not running by
    > > default, i.e. the failover is not a hot standby, it is a cold standby.
    > > How can I make it so that clients get directed to the servers at the
    > > failover site?

    >
    > classic way is to balance at the DNS level, but use DNS that can test the
    > target machines.


    > you should be able to point the DNS at a traffic load balancer, and let that
    > handle redirection.


    Many thanks for trying to help but unfortunately this will not work.
    The alternative servers are not running by default, i.e. the failover
    is not a hot standby, it is a cold standby. This means that you must
    not route to them until the first data centre has failed. Then when it
    has failed you must no longer route to it, you must route completely
    to the failover data centre instead.

    -Andrew Marlow


  5. Re: What to do about DNS lookups when a site fails and there is a failover site

    wrote in message
    news:3c592c00-e68a-44f2-bafa-26732ad71e4f@59g2000hsb.googlegroups.com...
    > On 8 Apr, 20:52, "stephen" wrote:
    > > wrote in message

    >
    > > > Suppose I have many servers at a data centre and many clients
    > > > connecting to those servers via FQDNs. If the entire site goes down
    > > > then DNS will dish out IP addresses for those servers that no longer
    > > > work. I have another site with machines available to spin up and act
    > > > as alternative servers. These alternative servers are not running by
    > > > default, i.e. the failover is not a hot standby, it is a cold standby.
    > > > How can I make it so that clients get directed to the servers at the
    > > > failover site?

    > >
    > > classic way is to balance at the DNS level, but use DNS that can test the
    > > target machines.

    >
    > > you should be able to point the DNS at a traffic load balancer, and let that
    > > handle redirection.

    >
    > Many thanks for trying to help but unfortunately this will not work.
    > The alternative servers are not running by default, i.e. the failover
    > is not a hot standby, it is a cold standby. This means that you must
    > not route to them until the first data centre has failed. Then when it
    > has failed you must no longer route to it, you must route completely
    > to the failover data centre instead.


    not quite.

    set the load balancer up as if the servers were up all the time.

    while they are off, they show up as dead.

    power them up for any reason such as a peak in use or a D/R test, or even a
    real fault and they begin to take part of the load.

    and if the primary servers go away you have all services at your D/R site.
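The point is that the backup servers sit in the pool all the time and the health checks decide whether they get traffic. A toy Python illustration with hypothetical server names, where a dictionary stands in for the health-check results:

```python
from typing import Callable

# Hypothetical pool: primary and D/R servers are all configured up front.
POOL = ["primary-1", "primary-2", "dr-1", "dr-2"]


def eligible(pool: list[str], is_up: Callable[[str], bool]) -> list[str]:
    """Servers passing their health check right now; only these get new sessions."""
    return [server for server in pool if is_up(server)]


# Normal running: D/R servers are powered off, fail the check, get no traffic.
normal = {"primary-1": True, "primary-2": True, "dr-1": False, "dr-2": False}
# D/R test or peak load: D/R servers powered up and share the load.
dr_test = dict.fromkeys(POOL, True)
# Primary site lost, D/R servers powered up: all traffic goes to the D/R site.
disaster = {"primary-1": False, "primary-2": False, "dr-1": True, "dr-2": True}

for label, state in [("normal", normal), ("dr_test", dr_test), ("disaster", disaster)]:
    print(label, eligible(POOL, lambda name: state[name]))
```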

    >
    > -Andrew Marlow




  6. Re: What to do about DNS lookups when a site fails and there is a failover site

    marlow.andrew@googlemail.com writes:

    > On 9 Apr, 03:18, Digital Mercenary For Honor
    > wrote:
    >> On 2008-04-08 06:27:31 -0400, marlow.and...@googlemail.com said:
    >>
    >> > Suppose I have many servers at a data centre and many clients
    >> > connecting to those servers via FQDNs. If the entire site goes down
    >> > then DNS will dish out IP addresses for those servers that no longer


    One suggestion I hadn't seen come up is using a routing protocol to
    solve this. With BGP, for example, you could use the same IP
    addresses on the primary and backup servers, and when the primary
    servers go down start announcing a route to them from the backup
    data center. You would need an ISP that was willing to cooperate with
    this if your announcement is small (smaller than a /20). Your ISP might
    also have other options using whatever internal routing protocol they
    use.
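A simplified Python model of why this works: traffic follows whichever site is announcing the prefix, so withdrawing the route at the primary and announcing it from the backup moves all traffic. It models only two tie-breakers (longest matching prefix, then shortest AS path); real BGP best-path selection has many more, and the prefix and site names are hypothetical. For scale, a /20 is 2^(32-20) = 4096 addresses.

```python
import ipaddress
from typing import Optional

# (site, announced prefix, AS-path length) as seen by a remote router.
Announcement = tuple[str, str, int]


def best_site(dest: str, announcements: list[Announcement]) -> Optional[str]:
    """Return the site whose announcement a remote router would use for dest.

    Simplified: longest matching prefix wins, then the shortest AS path.
    """
    addr = ipaddress.ip_address(dest)
    matches = [
        (site, ipaddress.ip_network(prefix), path_len)
        for site, prefix, path_len in announcements
        if addr in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None  # no route: traffic is dropped
    site, _, _ = max(matches, key=lambda m: (m[1].prefixlen, -m[2]))
    return site


SERVER = "203.0.113.10"  # same address used at both data centers
print(best_site(SERVER, [("primary", "203.0.113.0/24", 1)]))  # primary
# Primary withdraws its route, backup starts announcing the same prefix:
print(best_site(SERVER, [("backup", "203.0.113.0/24", 1)]))   # backup
# Alternative: backup always announces, but with a prepended (longer) AS path:
print(best_site(SERVER, [("primary", "203.0.113.0/24", 1),
                         ("backup", "203.0.113.0/24", 3)]))   # primary
```

The last case models a standing backup announcement with AS-path prepending; in practice other networks' local policy can override AS-path length, so it is not a guarantee.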

    Hope this helps,

    ----Scott.

  7. Re: What to do about DNS lookups when a site fails and there is a failover site

    On 2008-04-10 15:40:35 -0400, Scott Gifford said:

    > One suggestion I hadn't seen come up is using a routing protocol to
    > solve this. With BGP, for example, you could use the same IP


    Scott - cool suggestion, I didn't remember this one. My "personal
    record" for getting BGPv4 failover to function, across an internal
    firewall structure, is seven seconds - cool solution, almost a decade
    old now for a financial client. Basically, the BGP conversation
    end-points were the HSRP addresses of the inside and outside firewall
    routers, etc.

    /dmfh



  8. Re: What to do about DNS lookups when a site fails and there is a failover site

    On 10 Apr, 13:30, "stephen" wrote:
    > > > > How can I make it so that clients get directed to the servers at the
    > > > > failover site?


    > > > classic way is to balance at the DNS level, but use DNS that can test
    > > > the target machines.
    > > > you should be able to point the DNS at a traffic load balancer, and let
    > > > that handle redirection.


    > > Many thanks for trying to help but unfortunately this will not work.
    > > The alternative servers are not running by default, i.e. the failover
    > > is not a hot standby, it is a cold standby.


    > not quite.
    >
    > set the load balancer up as if the servers were up all the time.


    Hmm, I am not sure what you mean by the load balancer. What process/
    facility would that be? Sounds a bit like dynamic DNS to me. Is that
    what you had in mind?

    > while they are off, they show up as dead.


    I am not sure how this is accomplished.
    Can you explain some more please?

  9. Re: What to do about DNS lookups when a site fails and there is a failover site

    wrote in message
    news:fe3318ba-78b0-4fe9-a6b9-dd5a949032b6@s50g2000hsb.googlegroups.com...
    > On 10 Apr, 13:30, "stephen" wrote:
    > > > > > How can I make it so that clients get directed to the servers at the
    > > > > > failover site?

    >
    > > > > classic way is to balance at the DNS level, but use DNS that can test
    > > > > the target machines.
    > > > > you should be able to point the DNS at a traffic load balancer, and let
    > > > > that handle redirection.

    >
    > > > Many thanks for trying to help but unfortunately this will not work.
    > > > The alternative servers are not running by default, i.e. the failover
    > > > is not a hot standby, it is a cold standby.

    >
    > > not quite.
    > >
    > > set the load balancer up as if the servers were up all the time.

    >
    > Hmm, I am not sure what you mean by the load balancer. What process/
    > facility would that be? Sounds a bit like dynamic DNS to me. Is that
    > what you had in mind?


    long time since I did this, and all the various suppliers have different
    trade-offs, so no concrete examples for you.

    use a resilient pair of load balancers that are clever enough to monitor the
    servers they are sending traffic to.

    These are basically NAT-style gateways, but with some flavours you can run a
    hot standby pair and, with a WAN link between the two, spread them between
    the main and backup sites.

    you propagate IP routes from them into your internet provider (or your own
    plumbing or whatever is on the WAN), so you have resilience against site and
    balancer failure.

    The balancers are then set to test the devices they are sending traffic to -
    various things from ping to dummy application transactions.

    The balancers then send individual sessions to devices that can accept them
    (i.e. up, lowest current load, etc.) - although getting too clever here tends to
    be self-defeating.....

    So - if you have a few servers switched off, the balancers notice they don't
    respond and don't send any traffic to them.

    the better ones have an interface so you can take a box out of the balance
    set before you shut it down, so you can do graceful changeovers, remove
    servers from the set easily without disruption and so on.

    net result is that your backup servers can get sent traffic once they are
    up, but you don't black-hole traffic when they are powered off.
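The balancer behaviour described above (health checks, a least-loaded choice, and draining a box before shutdown) can be sketched in Python. The `Member` fields and the least-sessions rule are assumptions for illustration, not any vendor's algorithm:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Member:
    name: str
    healthy: bool = False  # result of the balancer's latest health check
    drained: bool = False  # operator removed it from the set before shutdown
    sessions: int = 0      # sessions currently assigned to it


def pick_member(pool: list[Member]) -> Optional[Member]:
    """Give a new session to the healthy, non-drained member with fewest sessions."""
    candidates = [m for m in pool if m.healthy and not m.drained]
    if not candidates:
        return None  # nothing can take the session
    chosen = min(candidates, key=lambda m: m.sessions)  # ties go to the earlier member
    chosen.sessions += 1
    return chosen


pool = [
    Member("main-1", healthy=True, sessions=5),
    Member("main-2", healthy=True, drained=True),  # being taken out for maintenance
    Member("backup-1"),                            # powered off, fails its checks
]
print(pick_member(pool).name)  # main-1
pool[2].healthy = True          # backup powered up and now passes its checks
print(pick_member(pool).name)  # backup-1
```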
    >
    > > while they are off, they show up as dead.

    >
    > I am not sure how this is accomplished.
    > Can you explain some more please?




  10. Re: What to do about DNS lookups when a site fails and there is a failover site

    In article ,
    Barry Margolin wrote:
    >In article
    ><07d9d877-d92d-4d36-ad5a-0c696cb713d2@m36g2000hse.googlegroups.com>,
    > marlow.andrew@googlemail.com wrote:
    >>
    >> > The industry leader here is F5 Networks with their
    >> > Global Traffic Manager...
    >> >
    >> > http://www.f5.com/products/big-ip/pr...l-traffic-mana...

    >>
    >> Hmm. Looks interesting but their web pages indicate it is based on
    >> load balancing.
    >> Doesn't that mean both data centres would have to be hot?
    >> My situation is that the backup data centre is a cold standby.

    >
    >I think it works by testing both datacenters, and load balancing among
    >the live ones. So if you shut down the servers in the backup
    >datacenter, it won't send any traffic to them. When the primary
    >datacenter goes down, and you power up the backup servers, the F5 will
    >start sending traffic there.


    Precisely. Don't confuse the LTM (load balancer) with the
    GTM (intelligent DNS), though they can work together.

    Basically the GTM does a behind-the-scenes test for
    availability based on a whole slew of configurable criteria
    and redirects accordingly. There are also several Linux apps
    that can do likewise.
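For a cold standby, the policy wanted is "answer with the primary site while it passes its checks, otherwise the backup". The sketch below is a generic Python illustration of that priority-ordered choice, not F5's configuration or API; the site names and addresses are hypothetical (RFC 5737 documentation addresses):

```python
from typing import Callable, Optional

# Hypothetical sites in order of preference, with the address each one serves.
DATACENTERS = [
    ("primary", "192.0.2.10"),
    ("backup", "198.51.100.10"),
]


def dns_answer(
    datacenters: list[tuple[str, str]], available: Callable[[str], bool]
) -> Optional[str]:
    """Return the address of the first site, in priority order, that passes its checks."""
    for site, address in datacenters:
        if available(site):
            return address
    return None  # no site is up


print(dns_answer(DATACENTERS, lambda site: True))              # 192.0.2.10
print(dns_answer(DATACENTERS, lambda site: site == "backup"))  # 198.51.100.10
```

Because the backup is only used when the primary fails its checks, a powered-off backup never receives traffic, and once the primary fails, new lookups point at the backup as soon as its servers come up and pass their checks (subject to the TTL caching discussed earlier in the thread).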

    The GTM uses port hex F5 (245 decimal, of course) to send availability data
    between redundant units. It can also load balance based on locale
    (not relevant for you).

    Most larger financial centers I've seen use these.

    alan
