Link failover with ping

  #1  
07-12-2008, 10:32 AM
Link failover with ping

Hello all!

I'm looking for docs/examples on this subject. After some googling,
I only found basic scripts. In my experience, these simple scripts
only work with excellent (dedicated) links.

The load balancing part of the problem was already solved with the
LARTC HOWTO. The DGD (dead gateway detection) patch for Linux was not
a good experience because it only detects first-hop failures, and 90%
of these failures happen after the first hop (e.g. a local router as
gateway connected to a remote ISP).

I searched for a Linux distro with this feature (good failover)
built in, but didn't find anything interesting.

I would appreciate pointers to docs, distros (to use as a starting
point), examples or even academic research on the topic.

Thank you,
Tom Lobato
  #2  
07-18-2008, 10:14 AM
Re: Link failover with ping

On 2008-07-12 10:32:43 -0400, Tom Lobato said:

> I searched for a Linux distro with this feature (good failover)
> built in, but didn't find anything interesting.



What failure cases, specifically, are you looking to address? I'm
curious, and based on that, I might be able to toss in my $0.02 on
where to rummage.

/dmfh

--
_ __ _
__| |_ __ / _| |_ 01100100 01101101
/ _` | ' \| _| ' \ 01100110 01101000
\__,_|_|_|_|_| |_||_| dmfh(-2)dmfh.cx

  #3  
07-19-2008, 09:47 AM
Re: Link failover with ping

On Jul 18, 11:14 am, Digital Mercenary For Honor wrote:
> On 2008-07-12 10:32:43 -0400, Tom Lobato said:
>
> > I searched for a Linux distro with this feature (good failover)
> > built in, but didn't find anything interesting.
>
> What failure cases, specifically, are you looking to address? I'm
> curious, and based on that, I might be able to toss in my $0.02 on
> where to rummage.
>
> /dmfh


Hello dmfh!
I'm doing multihoming with Linux. The LARTC HOWTO showed me how to do
load balancing, with

default proto static
    nexthop via X.X.X.X dev eth1 weight 1
    nexthop via Y.Y.Y.Y dev eth2 weight 1

but a good feature would be failover, when one of the links goes down,
the system detects it and removes it from the routes.
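
A minimal sketch of the "remove it from the routes" step, assuming the tester already knows which links are alive. It only prints the `ip route` command so it can be reviewed before being run as root. The function name, the `device=gateway` argument format and the documentation addresses are illustrative assumptions, not part of the original setup.

```shell
#!/bin/sh
# Sketch: rebuild the multipath default route from the links still considered up.
# Arguments are hypothetical "device=gateway" pairs, e.g. eth1=192.0.2.1.

build_default_route() {
    # With no usable link, print nothing and fail so the caller can
    # leave the current routing table alone.
    [ "$#" -gt 0 ] || return 1
    cmd="ip route replace default proto static"
    for link in "$@"; do
        # ${link#*=} is the gateway, ${link%%=*} is the device.
        cmd="$cmd nexthop via ${link#*=} dev ${link%%=*} weight 1"
    done
    printf '%s\n' "$cmd"
}

# Example: eth2 was judged dead, so only eth1 stays in the default route.
build_default_route eth1=192.0.2.1
```

When the dead link recovers, calling the function again with both pairs restores the balanced route.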

I'm doing failover with ping. I ping external hosts periodically,
and if the ping response does not come back, the tester judges the link
to be bad (actually, it makes some retries, taking about 4-5 minutes
before it removes the bad route).

The problem:
When network load is too high (upload and/or download), ping responses
take 6, 9, 12 seconds to come back, and sometimes never come back. So the
tester erroneously thinks the link is bad and removes the route to the ISP.

I'm looking for best practices or case studies to improve the tester
so it can make more reliable tests.
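
One way such a tester could be made less trigger-happy, as a sketch only: probe several external hosts per round, count a round as failed only if none of them answers, and declare the link down only after several failed rounds in a row. The targets, thresholds and function names are assumptions for illustration. The `-c`, `-W` and `-I` options are those of the Linux iputils `ping`.

```shell
#!/bin/sh
# Sketch: declare a link down only after several consecutive rounds in which
# every target failed. All values below are placeholders.

TARGETS="192.0.2.10 198.51.100.10 203.0.113.10"   # hypothetical external hosts
MAX_FAILED_ROUNDS=3
PING_TIMEOUT=15   # seconds; generous because replies took 6-12 s under load

# One probe: a single echo request sent from interface $1 to host $2.
probe() {
    ping -c 1 -W "$PING_TIMEOUT" -I "$1" "$2" >/dev/null 2>&1
}

# A round succeeds as soon as any one target answers.
round_ok() {
    for target in $TARGETS; do
        probe "$1" "$target" && return 0
    done
    return 1
}

# Print the new failure count: $1 = interface, $2 = failed rounds so far.
next_count() {
    if round_ok "$1"; then echo 0; else echo $(( $2 + 1 )); fi
}

# True when the failure count $1 has reached the threshold.
link_is_down() {
    [ "$1" -ge "$MAX_FAILED_ROUNDS" ]
}

# Example loop for one link; it only runs when called as: script --run eth1
if [ "${1:-}" = "--run" ]; then
    iface=$2
    failed=0
    while :; do
        failed=$(next_count "$iface" "$failed")
        if link_is_down "$failed"; then echo "$iface looks down"; fi
        sleep 30
    done
fi
```

Requiring every target to fail guards against a single remote host being down or rate-limiting ICMP, and the long timeout keeps slow replies under load from counting as losses.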

Well, I'm doing my homework, trying to use shaping and policy to
prioritize ICMP traffic so it can "own the queue"
(http://lartc.org/howto/lartc.qdisc.html#LARTC.QDISC.EXPLAIN).
But it would be nice to hear experiences and best practices.
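
A rough sketch of the ICMP prioritization idea, assuming a plain `prio` qdisc on the uplink device. It only prints the `tc` commands. Note the assumptions: installing a root qdisc replaces whatever shaping is already on that device, it affects outgoing traffic only, and it helps only if the queue actually builds up on this host rather than in the modem. The function and device names are illustrative.

```shell
#!/bin/sh
# Sketch: print tc commands that steer ICMP into the highest-priority band
# of a prio qdisc. The device name is a placeholder.

icmp_priority_cmds() {   # $1 = device
    # A prio qdisc with handle 1: creates the bands 1:1, 1:2 and 1:3.
    echo "tc qdisc add dev $1 root handle 1: prio"
    # IP protocol number 1 is ICMP; flowid 1:1 is the first band,
    # which is always dequeued before the others.
    echo "tc filter add dev $1 parent 1: protocol ip prio 1 u32 match ip protocol 1 0xff flowid 1:1"
}

icmp_priority_cmds eth1
```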


Tom Lobato
  #4  
08-01-2008, 11:39 PM
Re: Link failover with ping

On 2008-07-19 09:47:26 -0400, Tom Lobato said:

> but a good feature would be failover, when one of the links goes down,
> the system detects it and removes it from the routes.


Believe it or not, link failure is the least likely failure type you're
going to encounter. More often you'll experience the failure of the
upstream ISP, or "something wrong in the cloud" over this or that link.

> 6, 9, 12 seconds to come back, and sometimes never come back. So the
> tester erroneously thinks the link is bad and removes the route to the ISP.


Instead of using ICMP, consider using TCP (which has re-transmission,
etc.), which could be something simple, like using the common wget
utility to get a small webpage from some site, and seeing what the
shell error return code is from the fetch or otherwise trapping errors
from that attempt and making a decision on them. This way, you're also
basing decisions on a real, practical test of a simulated user
experience. If you want to delve into coding, you could probably whip
something up with Perl.
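
As a sketch of the wget idea, assuming GNU wget and a placeholder test URL: fetch a small page, throw the body away, and decide from the exit status. According to the GNU wget manual (version 1.12 and later), 0 means success, 4 a network failure and 8 an error response from the server. Older versions may report only a generic failure code. The function names are illustrative.

```shell
#!/bin/sh
# Sketch: use a small HTTP fetch instead of ICMP and decide from wget's exit status.
# TEST_URL is a placeholder, not a host named in this thread.

TEST_URL="http://example.com/"

# Map a wget exit status to a verdict about the path.
classify_wget_status() {
    case "$1" in
        0|8) echo up ;;     # even an HTTP error page made it across the link
        *)   echo down ;;   # timeouts, DNS or network failures, and anything else
    esac
}

# Fetch the page quietly, discard the body, and print "up" or "down".
check_path() {
    status=0
    wget -q -O /dev/null --tries=2 --timeout=20 "$TEST_URL" || status=$?
    classify_wget_status "$status"
}
```

Treating status 8 as "up" is a design choice: a server that answers with an error still proves the path works, which is what the failover decision cares about.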

Also consider using the monit utility with this - monit can test ping
hosts, servers, do sample DNS queries, sample http browses, mysql
server fetches, and based on the outcome, fire off a script (which
could change your routing table).

Off the top of my head, I can't remember for every case how to force an
application to route over a particular link - there's LSRR (loose
source and record route), but for security reasons this is usually
disabled at the host and support for it at the network level isn't
there either. From your gateway host though, you might be able to tell
wget or your Perl application to bind to interface(x) to test and go
from there - a tcpdump against that interface can confirm your test is
working through that "outbound" interface.
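
A small sketch of the bind-and-verify approach with wget, under stated assumptions: `--bind-address` sets the local source address of the connection, and the packets leave through the matching uplink only if a source-based routing rule exists (for example the LARTC split-access style `ip rule add from <address> table <table>`). The address, URL and function name are placeholders.

```shell
#!/bin/sh
# Sketch: fetch a test page using the source address of one uplink.
# SRC_ETH1 and TEST_URL are hypothetical values.

SRC_ETH1="192.0.2.2"            # local address configured on eth1
TEST_URL="http://example.com/"

# $1 = local source address, $2 = URL; the exit status is wget's.
probe_from() {
    wget -q -O /dev/null --tries=1 --timeout=20 --bind-address="$1" "$2"
}

# To confirm the probe really leaves through eth1, watch it in another shell:
#   tcpdump -n -i eth1 tcp port 80

# Runs the probe only when explicitly asked to, e.g. RUN_PROBE=1 sh script.sh
if [ "${RUN_PROBE:-0}" = 1 ]; then
    if probe_from "$SRC_ETH1" "$TEST_URL"; then
        echo "path via $SRC_ETH1 ok"
    else
        echo "path via $SRC_ETH1 failed"
    fi
fi
```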

Lastly, if these Linux routers are also firewalls, consider the issues
with NAT (if in use) and the firewall states that may need to change or
be flushed after you change your routing table - you may have recovered
Internet connectivity, but how useful would that be to you if all the
TCP connections from users needed to be re-established? For generic
Internet surfing and connectivity, fine - for real-time and other
applications this needs to be thought through.

Hope this helps a little.

/dmfh

  #5  
08-02-2008, 10:01 AM
Re: Link failover with ping


hi /dmfh!
thank you for the book mail


On Aug 2, 12:39 am, Digital Mercenary For Honor wrote:
> On 2008-07-19 09:47:26 -0400, Tom Lobato said:
>
> > but a good feature would be failover, when one of the links goes down,
> > the system detects it and removes it from the routes.
>
> Believe it or not, link failure is the least likely failure type you're
> going to encounter. More often you'll experience the failure of the
> upstream ISP, or "something wrong in the cloud" over this or that link.


Well, but won't this "failure of the upstream ISP" result in
unpingable external hosts, or TCP tests without a response?
Or is the failure you mean a bad (say, slow) link that still
returns pings fine?


> > 6, 9, 12 seconds to come back, and sometimes never come back. So the
> > tester erroneously thinks the link is bad and removes the route to the ISP.
>
> Instead of using ICMP, consider using TCP (which has re-transmission,
> etc.), which could be something simple, like using the common wget
> utility to get a small webpage from some site, and seeing what the
> shell error return code is from the fetch or otherwise trapping errors
> from that attempt and making a decision on them. This way, you're also
> basing decisions on a real, practical test of a simulated user
> experience. If you want to delve into coding, you could probably whip
> something up with Perl.


Great, I'll try TCP.
And yes, Perl is excellent for this. But all my link failover scripts
so far were written in bash, with only the network statistics done
with perl/gnuplot.
Anyway, I could use Perl scripts (if needed) called by the existing
bash scripts.


> Also consider using the monit utility with this - monit can test ping
> hosts, servers, do sample DNS queries, sample http browses, mysql
> server fetches, and based on the outcome, fire off a script (which
> could change your routing table).


Good, I'll try monit.


> Off the top of my head, I can't remember for every case how to force an
> application to route over a particular link - there's LSRR (loose
> source and record route), but for security reasons this is usually
> disabled at the host and support for it at the network level isn't
> there either. From your gateway host though, you might be able to tell
> wget or your Perl application to bind to interface(x) to test and go
> from there - a tcpdump against that interface can confirm your test is
> working through that "outbound" interface.


Actually, I'm using Linux MARKs/CONNMARKs with tc/iptables, based
on the LARTC HOWTO.


> Lastly, if these Linux routers are also firewalls, consider the issues
> with NAT (if in use) and the firewall states that may need to change or
> be flushed after you change your routing table - you may have recovered
> Internet connectivity, but how useful would that be to you if all the
> TCP connections from users needed to be re-established? For generic
> Internet surfing and connectivity, fine - for real-time and other
> applications this needs to be thought through.


But with netfilter/iptables on Linux, does flushing and reloading all
the rules drop existing connections? I don't think so, but I may be wrong.

I will pack up all my scripts and will soon post a link here.



Thank you!
Tom
All times are GMT -5.