What to do about DNS lookups when a site fails and there is a failover site - TCP-IP
-
What to do about DNS lookups when a site fails and there is a failover site
Suppose I have many servers at a data centre and many clients
connecting to those servers via FQDNs. If the entire site goes down
then DNS will dish out IP addresses for those servers that no longer
work. I have another site with machines available to spin up and act
as alternative servers. These alternative servers are not running by
default, i.e. the failover is not a hot standby, it is a cold standby.
How can I make it so that clients get directed to the servers at the
failover site? It has been suggested that new DNS records could be
dynamically added. I am really not sure about this and would welcome
any guidance I can get. Is this even the right NG to discuss such
things?
For my particular problem the failover has to be done at this sort of
level - it cannot be done by altering source code to make the clients
and servers aware of a failover site. We are talking about large sites
and large bodies of software. This is what makes me think some sort of
routing approach is needed.
Regards,
Andrew Marlow
-
Re: What to do about DNS lookups when a site fails and there is a failover site
wrote in message
news:f4b9c14f-aee4-489d-849e-d2adc6e098f4@a1g2000hsb.googlegroups.com...
> Suppose I have many servers at a data centre and many clients
> connecting to those servers via FQDNs. If the entire site goes down
> then DNS will dish out IP addresses for those servers that no longer
> work. I have another site with machines available to spin up and act
> as alternative servers. These alternative servers are not running by
> default, i.e the failover is not a hot standby, it is a cold standby.
> How can I make it so that clients get directed to the servers at the
> failover site?
The classic way is to balance at the DNS level, but use a DNS server that can
test the target machines.
I haven't seen that used in anger for a while (but I don't get involved in this
much now), so I'm not sure it is as useful now.
You should be able to point the DNS at a traffic load balancer, and let that
handle redirection.
The cleverer load balancers will test the target machine set (or you run
a client there) and only send load to those servers that show they are up,
running and can take some load.
Finally, you can spread a pair of resilient load balancers across your sites
as well, but depending on the kit you might need some dedicated comms links
between them.
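A minimal sketch of the "test the target machine set" idea, assuming a plain TCP connect is a good enough "is it up" probe. The function names are hypothetical, and real balancers use richer checks than this.

```python
import socket

def is_up(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, timed out or unresolvable: treat as down.
        return False

def live_targets(targets, probe=is_up):
    """Keep only the (host, port) pairs that currently pass the probe."""
    return [t for t in targets if probe(*t)]
```

The probe is passed in as a parameter so that a deeper check (an HTTP request, a dummy transaction) can be swapped in without touching the selection logic.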
> It has been suggested that new DNS records could be
> dynamically added. I am really not sure about this and would welcome
> any guidance I can get. Is this even the right NG to discuss such
> things?
No idea, but on Usenet you get suggestions in return for posting interesting
problems.
>
> For my particular problem the failover has to be done at this sort of
> level - it cannot be done by altering source code to make the clients
> and servers aware of a failover site. We are talking about large sites
> and large bodies of software. This is what makes me think some sort of
> routing approach is needed.
>
> Regards,
>
> Andrew Marlow
--
Regards
stephen_hope@xyzworld.com - replace xyz with ntl
-
Re: What to do about DNS lookups when a site fails and there is a failover site
On 2008-04-08 06:27:31 -0400, marlow.andrew@googlemail.com said:
> Suppose I have many servers at a data centre and many clients
> connecting to those servers via FQDNs. If the entire site goes down
> then DNS will dish out IP addresses for those servers that no longer
Classic data-center resiliency problem, for which there are fortunately
commercial solutions (but they are expensive). The
"chain-of-information" that one winds up needing to protect / engineer
to keep a server "up" to the rest of the world kind of goes like this:
- The IP address the FQDN resolves to.
- The SOA / originator / DNS server that "owns" the FQDN.
Load-balancing will protect you at a single data center site, but not
against site failures, and architecting globally resilient IP addresses
has been a difficult task for many firms (from personal consultative
experience) and winds up not being worth the effort sometimes.
Round-robin DNS often comes up as a solution, but then, while one site is
down, roughly every other client will experience a failure, etc., etc.
Some folks choose to rely on a fail-over time linked to how fast they
can make a DNS entry change, but then you're relying on the cached DNS
entry expiring on everyone's DNS server globally, and there is no guarantee of that.
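A back-of-the-envelope sketch of why a DNS change alone gives no firm fail-over time. It assumes the worst case is detection time plus update time plus however long a resolver keeps the old answer, and that some resolvers enforce their own minimum TTL. All the numbers and names are hypothetical.

```python
def effective_ttl(record_ttl, resolver_floor=0):
    """TTL a caching resolver actually honours if it enforces a minimum."""
    return max(record_ttl, resolver_floor)

def worst_case_failover(detect_s, update_s, record_ttl, resolver_floor=0):
    """Seconds until the last cached copy of the old address has expired."""
    return detect_s + update_s + effective_ttl(record_ttl, resolver_floor)

# Assumed: 60 s to notice the outage, 120 s to publish the new record.
print(worst_case_failover(60, 120, 900))        # 900 s TTL honoured as published: 1080
print(worst_case_failover(60, 120, 900, 3600))  # resolver holds answers for an hour: 3780
```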
Many sites set a TTL on their DNS entries of around 15 minutes, but
this breaks the RFC standard, and for operations I run, I filter DNS
entries and re-write the TTL on them if they are under 3600 seconds
(1h) to stop some forms of DNS / host hacking.
If you're trying to protect a corporate service against failure, say
for foo-corporation.com, like vpn, you could set up:
vpn.foo-corporation.com (round-robin DNS)
vpn-a.foo-corporation.com (real IP, data center 1)
vpn-b.foo-corporation.com (real IP, data center 2)
And just have the VPN software scroll through all of those to find a
connection, or tell users to use the "-a" or "-b" forms of the hostname.
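A minimal sketch of a client "scrolling through" the hostnames above, assuming a TCP service and that a failed connect is the signal to move on. The function name and the port are hypothetical, and this does require a small client-side change.

```python
import socket

def connect_first(hosts, port, timeout=3.0, connect=socket.create_connection):
    """Try each hostname in order; return (host, socket) for the first success."""
    last_error = None
    for host in hosts:
        try:
            return host, connect((host, port), timeout=timeout)
        except OSError as exc:
            # DNS failure, refusal or timeout: remember it and try the next name.
            last_error = exc
    raise ConnectionError(f"no host reachable: {hosts}") from last_error

HOSTS = [
    "vpn.foo-corporation.com",    # round-robin name first
    "vpn-a.foo-corporation.com",  # data center 1
    "vpn-b.foo-corporation.com",  # data center 2
]
# host, sock = connect_first(HOSTS, 443)   # 443 is an assumed port
```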
You might want to ask yourself what kind of data you're trying to keep
available and how users use it, and "scroll down" a bit below the IP
layer, as it's not the only place you can find resiliency. If we're
talking about web data, global solutions like Akamai and others can
give you location-sensitive content delivery, but this requires the
co-operation of your web developers.
When I've done data center designs before, I "scroll up and down" a bit
to think about what I need to protect and how. All too often, the network
is seen as the right answer to a resiliency problem; often, it's not. I
would really reject the notion that the software "can't be changed".
Resiliency to a good level is built in to all parts of the system, not
just one. The network can only do so much.
/dmfh
--
__| |_ __ / _| |_ 01100100 01101101
/ _` | ' \| _| ' \ 01100110 01101000
\__,_|_|_|_|_| |_||_| dmfh(-2)dmfh.cx
-
Re: What to do about DNS lookups when a site fails and there is a failover site
On 8 Apr, 20:52, "stephen" wrote:
> wrote in message
> > Suppose I have many servers at a data centre and many clients
> > connecting to those servers via FQDNs. If the entire site goes down
> > then DNS will dish out IP addresses for those servers that no longer
> > work. I have another site with machines available to spin up and act
> > as alternative servers. These alternative servers are not running by
> > default, i.e the failover is not a hot standby, it is a cold standby.
> > How can I make it so that clients get directed to the servers at the
> > failover site?
>
> classic way is to balance at the DNS level, but use DNS that can test the
> target machines.
> you should be able to point the DNS at a traffic load balancer, and let that
> handle redirection.
Many thanks for trying to help but unfortunately this will not work.
The alternative servers are not running by default, i.e. the failover
is not a hot standby, it is a cold standby. This means that you must
not route to them until the first data centre has failed. Then when it
has failed you must no longer route to it, you must route completely
to the failover data centre instead.
-Andrew Marlow
-
Re: What to do about DNS lookups when a site fails and there is a failover site
wrote in message
news:3c592c00-e68a-44f2-bafa-26732ad71e4f@59g2000hsb.googlegroups.com...
> On 8 Apr, 20:52, "stephen" wrote:
> > wrote in message
>
> > > Suppose I have many servers at a data centre and many clients
> > > connecting to those servers via FQDNs. If the entire site goes down
> > > then DNS will dish out IP addresses for those servers that no longer
> > > work. I have another site with machines available to spin up and act
> > > as alternative servers. These alternative servers are not running by
> > > default, i.e the failover is not a hot standby, it is a cold standby.
> > > How can I make it so that clients get directed to the servers at the
> > > failover site?
> >
> > classic way is to balance at the DNS level, but use DNS that can test the
> > target machines.
>
> > you should be able to point the DNS at a traffic load balancer, and let that
> > handle redirection.
>
> Many thanks for trying to help but unfortunately this will not work.
> The alternative servers are not running by default, i.e the failover
> is not a hot standby, it is a cold standby. This means that you must
> not route to them until the first data centre has failed. Then when it
> has failed you must no longer route to it, you must route completely
> to the failover data centre instead.
Not quite.
Set the load balancer up as if the servers were up all the time.
While they are off, they show up as dead.
Power them up for any reason, such as a peak in use, a D/R test, or even a
real fault, and they begin to take part of the load.
And if the primary servers go away, you have all services at your D/R site.
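A small sketch of how a cold standby looks to such a balancer, assuming every server is configured in the pool permanently and only health decides who gets traffic. The server names and the three health snapshots are hypothetical.

```python
def eligible(pool, health):
    """Servers from the configured pool that currently pass their health check."""
    return [s for s in pool if health.get(s, False)]

# All four are configured all the time, including the powered-off D/R boxes.
POOL = ["primary-1", "primary-2", "dr-1", "dr-2"]

normal = {"primary-1": True, "primary-2": True}                 # D/R powered off
dr_test = {"primary-1": True, "primary-2": True, "dr-1": True}  # one D/R box up
disaster = {"dr-1": True, "dr-2": True}                         # primary site gone

for label, health in [("normal", normal), ("dr_test", dr_test), ("disaster", disaster)]:
    print(label, eligible(POOL, health))
```

A powered-off server simply never reports healthy, so nothing has to be reconfigured at fail-over time.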
>
> -Andrew Marlow
--
Regards
stephen_hope@xyzworld.com - replace xyz with ntl
-
Re: What to do about DNS lookups when a site fails and there is a failover site
marlow.andrew@googlemail.com writes:
> On 9 Apr, 03:18, Digital Mercenary For Honor
> wrote:
>> On 2008-04-08 06:27:31 -0400, marlow.and...@googlemail.com said:
>>
>> > Suppose I have many servers at a data centre and many clients
>> > connecting to those servers via FQDNs. If the entire site goes down
>> > then DNS will dish out IP addresses for those servers that no longer
One suggestion I hadn't seen come up is using a routing protocol to
solve this. With BGP, for example, you could use the same IP
addresses on the primary and backup servers, and when the primary
servers go down start announcing a route to them from the backup
data center. You would need an ISP that was willing to cooperate with
this if your announcement is small (less than /20). Your ISP might
also have other options using whatever internal routing protocol they
use.
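A rough sketch of the decision logic at the backup site, assuming some external health check says whether the primary is up and some BGP speaker accepts text commands. The command strings are written in the style of ExaBGP's text API, but treat the exact syntax as an assumption to check against whatever speaker you run. The prefix is a documentation range and the function name is hypothetical.

```python
def bgp_action(primary_up, announcing, prefix="192.0.2.0/24"):
    """Return (command or None, new announcing state) for the backup site."""
    if not primary_up and not announcing:
        # Primary has gone away: start attracting traffic to the backup site.
        return f"announce route {prefix} next-hop self", True
    if primary_up and announcing:
        # Primary is back: stop announcing so traffic returns to it.
        return f"withdraw route {prefix} next-hop self", False
    # No state change, so nothing to send.
    return None, announcing
```

Acting only on transitions avoids re-sending the same announcement on every health-check cycle.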
Hope this helps,
----Scott.
-
Re: What to do about DNS lookups when a site fails and there is a failover site
On 2008-04-10 15:40:35 -0400, Scott Gifford said:
> One suggestion I hadn't seen come up is using a routing protocol to
> solve this. With BGP, for example, you could use the same IP
Scott - cool suggestion, I didn't remember this one. My "personal
record" for getting BGPv4 failover to function, across an internal
firewall structure, is seven seconds - cool solution, almost a decade
old now for a financial client. Basically, the BGP conversation
end-points were the HSRP addresses of the inside and outside firewall
routers, etc.
/dmfh
--
__| |_ __ / _| |_ 01100100 01101101
/ _` | ' \| _| ' \ 01100110 01101000
\__,_|_|_|_|_| |_||_| dmfh(-2)dmfh.cx
-
Re: What to do about DNS lookups when a site fails and there is a failover site
On 10 Apr, 13:30, "stephen" wrote:
> > > > How can I make it so that clients get directed to the servers at the
> > > > failover site?
> > > classic way is to balance at the DNS level, but use DNS that can test
> > > the target machines.
> > > you should be able to point the DNS at a traffic load balancer, and let
> > > that handle redirection.
> > Many thanks for trying to help but unfortunately this will not work.
> > The alternative servers are not running by default, i.e the failover
> > is not a hot standby, it is a cold standby.
> not quite.
>
> set the load balancer up as if the servers were up all the time.
Hmm, I am not sure what you mean by the load balancer. What process/facility
would that be? Sounds a bit like dynamic DNS to me. Is that what you had in
mind?
> while they are off, they show up as dead.
I am not sure how this is accomplished.
Can you explain some more please?
-
Re: What to do about DNS lookups when a site fails and there is a failover site
wrote in message
news:fe3318ba-78b0-4fe9-a6b9-dd5a949032b6@s50g2000hsb.googlegroups.com...
> On 10 Apr, 13:30, "stephen" wrote:
> > > > > How can I make it so that clients get directed to the servers at the
> > > > > failover site?
>
> > > > classic way is to balance at the DNS level, but use DNS that can test
> > > > the target machines.
> > > > you should be able to point the DNS at a traffic load balancer, and let
> > > > that handle redirection.
>
> > > Many thanks for trying to help but unfortunately this will not work.
> > > The alternative servers are not running by default, i.e the failover
> > > is not a hot standby, it is a cold standby.
>
> > not quite.
> >
> > set the load balancer up as if the servers were up all the time.
>
> Hmm, I am not sure what you mean by the load balancer. What process/facility
> would that be? Sounds a bit like dynamic DNS to me. Is that what you had in
> mind?
It's a long time since I did this, and all the various suppliers have different
trade-offs, so no concrete examples for you.
Use a resilient pair of load balancers that are clever enough to monitor the
servers they are sending traffic to.
These are basically NAT-style gateways, but with some flavours you can run a
hot standby pair and, with a WAN link between the two of them, spread them
between the main and backup sites.
You propagate IP routes from them into your internet provider (or your own
plumbing or whatever is on the WAN), so you have resilience against site and
balancer failure.
The balancers are then set to test the devices they are sending traffic to,
using various things from ping to dummy application transactions.
The balancers then send individual sessions to devices that can accept them
(i.e. up, lowest current load etc.), although getting too clever here tends to
be self-defeating.
So if you have a few servers switched off, the balancers notice they don't
respond and don't send any traffic to them.
The better ones have an interface so you can take a box out of the balance
set before you shut it down, so you can do graceful changeovers, remove
servers from the set easily without disruption and so on.
The net result is that your backup servers can get sent traffic once they are
up, but you don't black-hole traffic when they are powered off.
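A toy sketch of the session-level behaviour described above: health state per server, "lowest current load" selection, and a drain call to take a box out of the balance set before shutting it down. The class and method names are hypothetical and no particular vendor's interface is implied.

```python
class Balancer:
    def __init__(self, servers):
        self.sessions = {s: 0 for s in servers}  # current session count per server
        self.up = set()                          # servers passing health checks
        self.draining = set()                    # servers being taken out gracefully

    def report(self, server, healthy):
        """Record the latest health-check result for a server."""
        if healthy:
            self.up.add(server)
        else:
            self.up.discard(server)

    def drain(self, server):
        """Stop sending new sessions to a server ahead of a planned shutdown."""
        self.draining.add(server)

    def pick(self):
        """Choose the healthy, non-draining server with the fewest sessions."""
        candidates = [s for s in self.sessions
                      if s in self.up and s not in self.draining]
        if not candidates:
            raise RuntimeError("no server available")
        chosen = min(candidates, key=lambda s: self.sessions[s])
        self.sessions[chosen] += 1
        return chosen
```

A server that is powered off never gets reported healthy, so it never receives a session, which is the point about not black-holing traffic.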
>
> > while they are off, they show up as dead.
>
> I am not sure how this is accomplished.
> Can you explain some more please?
--
Regards
stephen_hope@xyzworld.com - replace xyz with ntl
-
Re: What to do about DNS lookups when a site fails and there is a failover site
In article ,
Barry Margolin wrote:
>In article
><07d9d877-d92d-4d36-ad5a-0c696cb713d2@m36g2000hse.googlegroups.com>,
> marlow.andrew@googlemail.com wrote:
>>
>> > The industry leader here is F5 Networks with their
>> > Global Traffic Manager...
>> >
>> > http://www.f5.com/products/big-ip/pr...l-traffic-mana...
>>
>> Hmm. Looks interesting but their web pages indicate it is based on
>> load balancing.
>> Doesn't that mean both data centres would have to be hot?
>> My situation is that the backup data centre is a cold standby.
>
>I think it works by testing both datacenters, and load balancing among
>the live ones. So if you shut down the servers in the backup
>datacenter, it won't send any traffic to them. When the primary
>datacenter goes down, and you power up the backup servers, the F5 will
>start sending traffic there.
Precisely. Don't confuse the LTM (load-balancer) with the
GTM (Intelligent DNS) though they can work together.
Basically the GTM does a behind-the-scenes test for
availability based on a whole slew of configurable criteria
and redirects accordingly. There are also several Linux apps
that can do likewise.
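A minimal sketch of the "intelligent DNS" idea for a cold standby, assuming sites are listed in strict priority order and an availability test has already filled in a health table. This is not how the GTM is implemented. The hostname, addresses (documentation ranges) and the 30-second TTL are all hypothetical.

```python
SITES = {
    "app.example.com": [
        {"dc": "primary", "ip": "192.0.2.10"},
        {"dc": "backup", "ip": "198.51.100.10"},
    ],
}

def answer(name, sites, health, ttl=30):
    """Return (ip, ttl) for the highest-priority healthy site, or None."""
    for site in sites[name]:  # sites are listed in priority order
        if health.get(site["dc"], False):
            return site["ip"], ttl
    return None

print(answer("app.example.com", SITES, {"primary": True}))   # primary while it is up
print(answer("app.example.com", SITES, {"backup": True}))    # backup once primary is dead
```

Because selection is by priority rather than by load, the backup address is only ever handed out after the primary has failed its test, which matches the cold standby requirement.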
The GTM uses port hex F5 (of course) to send availability data
between redundant units. It can also load balance based on locale
(not relevant for you).
Most larger financial centers I've seen use these.
alan