#1
Hello all! I'm looking for docs/examples on this subject (link failover on a multihomed Linux router). After some googling, I found only basic scripts. In my experience, the simple scripts available only work with excellent (dedicated) links. The load-balancing part of the problem is already solved by the LARTC HOWTO. The DGD (dead gateway detection) patch for Linux was not a good experience, because it only detects a failure at the first hop, and 90% of these failures happen after the first hop (e.g. when the gateway is a local router connected to the remote ISP). I searched for some Linux distro with this feature (good failover) built in, but didn't find anything interesting. I would appreciate pointers to docs, distros (to use as a starting point), examples, or even academic research on the topic.

Thank you,
Tom Lobato
#2
On 2008-07-12 10:32:43 -0400, Tom Lobato wrote:
> I searched for some Linux distro with this feature (good failover)
> built in, but didn't find anything interesting.

What failure cases, specifically, are you looking to address? I'm curious, and based on that, I might be able to toss in my $0.02 on where to rummage.

/dmfh
--
01100100 01101101 01100110 01101000
dmfh(-2)dmfh.cx
#3
On Jul 18, 11:14 am, Digital Mercenary For Honor wrote:
> On 2008-07-12 10:32:43 -0400, Tom Lobato wrote:
> > I searched for some Linux distro with this feature (good failover)
> > built in, but didn't find anything interesting.
>
> What failure cases, specifically, are you looking to address? I'm
> curious, and based on that, I might be able to toss in my $0.02 on
> where to rummage.
>
> /dmfh

Hello dmfh!

I'm doing multihoming with Linux. LARTC showed me how to do load balancing, with

    default proto static nexthop via X.X.X.X dev eth1 weight 1 nexthop via Y.Y.Y.Y dev eth2 weight 1

but a good feature would be failover: when one of the links goes down, the system detects it and removes it from the routes.

I'm doing failover with ping. I ping external hosts periodically, and if no ping reply comes back, the tester judges the link to be bad (actually, it makes some retries, taking about 4-5 minutes before it removes the bad route).

The problem: when network load is too high (upload and/or download), ping replies take 6, 9, 12 seconds to come back, and sometimes never come back. So the tester erroneously thinks the link is bad and removes the route to the ISP. I'm looking for best practices or case studies to improve the tester so it makes more reliable tests.

Well, I'm doing my homework, trying to use shaping and policy to prioritize ICMP traffic so it "owns the queue" (http://lartc.org/howto/lartc.qdisc.html#LARTC.QDISC.EXPLAIN). But it would be nice to hear about experiences and best practices.

Tom Lobato
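A minimal sketch, in Python, of one way to make a tester like this less jumpy (an illustration, not Tom's actual script): count consecutive probe results and only change the link state after several failures in a row, with a separate threshold for bringing the link back. The class name `LinkMonitor` and the parameters `fail_threshold` and `recover_threshold` are hypothetical. The probe itself (ping, TCP, or HTTP) is left to the caller; its timeout should presumably be longer than the 6 to 12 second delays seen under load, so that a slow reply is not counted as a lost one.

```python
from dataclasses import dataclass, field


@dataclass
class LinkMonitor:
    """Hysteresis for a single uplink: one slow or lost probe never flips state.

    fail_threshold and recover_threshold are hypothetical tuning knobs,
    not values taken from the thread.
    """

    fail_threshold: int = 3     # consecutive failed probes before declaring the link down
    recover_threshold: int = 2  # consecutive good probes before declaring it up again
    up: bool = True
    _fails: int = field(default=0, init=False, repr=False)
    _oks: int = field(default=0, init=False, repr=False)

    def record(self, probe_ok: bool) -> bool:
        """Feed one probe result; return True only when the link state changes."""
        if probe_ok:
            self._fails = 0          # any success breaks a failure streak
            self._oks += 1
            if not self.up and self._oks >= self.recover_threshold:
                self.up = True
                return True
        else:
            self._oks = 0            # any failure breaks a success streak
            self._fails += 1
            if self.up and self._fails >= self.fail_threshold:
                self.up = False
                return True
        return False
```

When `record()` returns True, the caller would remove (or restore) that ISP's nexthop in the multipath default route. Requiring a streak of good probes before restoring the route is meant to stop a flapping link from being added and removed on every probe.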
#4
On 2008-07-19 09:47:26 -0400, Tom Lobato wrote:
> but a good feature would be failover: when one of the links goes down,
> the system detects it and removes it from the routes.

Believe it or not, link failure is the least likely failure type you're going to encounter. More often you'll experience the failure of the upstream ISP, or "something wrong in the cloud" over this or that link.

> 6, 9, 12 seconds to come back, and sometimes never come back. So the
> tester erroneously thinks the link is bad and removes the route to the ISP.

Instead of using ICMP, consider using TCP (which has retransmission, etc.). That could be something simple, like using the common wget utility to get a small web page from some site and checking the shell exit code of the fetch, or otherwise trapping errors from that attempt and making a decision based on them. This way, you're also basing decisions on a real, practical test of a simulated user experience. If you want to delve into coding, you could probably whip something up with Perl.

Also consider using the monit utility with this: monit can ping hosts, test servers, do sample DNS queries, sample HTTP requests, and MySQL server fetches, and based on the outcome, fire off a script (which could change your routing table).

Off the top of my head, I can't remember for every case how to force an application to route over a particular link. There's LSRR, loose source routing, but for security reasons this is usually disabled on the host, and support for it at the network level isn't there either. From your gateway host, though, you might be able to tell wget or your Perl application to bind to interface(x) to test and go from there; a tcpdump against that interface can confirm your test is going out through that "outbound" interface.
Lastly, if these Linux routers are also firewalls, consider the issues with NAT (if in use) and with firewall state that may need to change or be flushed after you change your routing table. You may have recovered Internet connectivity, but how useful would that be to you if all the TCP connections from users needed to be re-established? For generic Internet surfing and connectivity, fine; for real-time and other applications, this needs to be thought through.

Hope this helps a little.

/dmfh
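A rough Python counterpart to the wget test suggested in this post, assuming the standard library's `http.client` in place of wget: fetch a small page and treat any 2xx/3xx status as success, and anything else (timeout, refused or reset connection, error status) as failure. The function name `http_probe` and its parameters are hypothetical. Binding the connection to a local source address is an assumption about how to pin the probe to one uplink: it only has that effect if the router has source-based policy rules (LARTC-style `ip rule add from <address> table <table>`), and a tcpdump on the outbound interface is still the way to confirm where the probe actually goes.

```python
import http.client
from typing import Optional


def http_probe(host: str, port: int = 80, path: str = "/",
               source_ip: Optional[str] = None, timeout: float = 15.0) -> bool:
    """Return True if a small GET over TCP completes with a 2xx/3xx status.

    source_ip (hypothetical parameter) binds the local end of the TCP
    connection; whether that selects an uplink depends on policy routing.
    """
    src = (source_ip, 0) if source_ip else None  # port 0: let the kernel pick
    conn = http.client.HTTPConnection(host, port, timeout=timeout,
                                      source_address=src)
    try:
        conn.request("GET", path, headers={"Connection": "close"})
        resp = conn.getresponse()
        resp.read()  # drain the body so the whole fetch really completed
        return 200 <= resp.status < 400
    except (OSError, http.client.HTTPException):
        # timeout, connection refused/reset, unbindable source, bad status line
        return False
    finally:
        conn.close()
```

The generous default `timeout` (15 s, an arbitrary choice) is meant to let TCP retransmissions absorb the queueing delay that made ping replies look lost under load.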
#5
Hi /dmfh! Thank you for the book-length mail!

On Aug 2, 12:39 am, Digital Mercenary For Honor wrote:
> On 2008-07-19 09:47:26 -0400, Tom Lobato wrote:
> > but a good feature would be failover: when one of the links goes down,
> > the system detects it and removes it from the routes.
>
> Believe it or not, link failure is the least likely failure type you're
> going to encounter. More often you'll experience the failure of the
> upstream ISP, or "something wrong in the cloud" over this or that link.

Well, but won't this "failure of the upstream ISP" result in unpingable external hosts, or TCP tests without a response? Or is the failure you mean a bad (say, slow) link that still returns pings normally?

> > 6, 9, 12 seconds to come back, and sometimes never come back. So the
> > tester erroneously thinks the link is bad and removes the route to the ISP.
>
> Instead of using ICMP, consider using TCP (which has retransmission,
> etc.). That could be something simple, like using the common wget
> utility to get a small web page from some site and checking the shell
> exit code of the fetch, or otherwise trapping errors from that attempt
> and making a decision based on them. This way, you're also basing
> decisions on a real, practical test of a simulated user experience. If
> you want to delve into coding, you could probably whip something up
> with Perl.

Great, I'll try TCP. And yes, Perl is excellent for this. But all my link-failover scripts so far are written in bash; only the network statistics are done with Perl/gnuplot. Anyway, I could call Perl scripts (if needed) from the existing bash scripts.

> Also consider using the monit utility with this: monit can ping hosts,
> test servers, do sample DNS queries, sample HTTP requests, and MySQL
> server fetches, and based on the outcome, fire off a script (which
> could change your routing table).

Good, I'll try monit.
> Off the top of my head, I can't remember for every case how to force an
> application to route over a particular link. There's LSRR, loose source
> routing, but for security reasons this is usually disabled on the host,
> and support for it at the network level isn't there either. From your
> gateway host, though, you might be able to tell wget or your Perl
> application to bind to interface(x) to test and go from there; a
> tcpdump against that interface can confirm your test is going out
> through that "outbound" interface.

Actually, I'm using Linux MARKs/CONNMARKs with tc/iptables, based on the LARTC HOWTO.

> Lastly, if these Linux routers are also firewalls, consider the issues
> with NAT (if in use) and with firewall state that may need to change or
> be flushed after you change your routing table. You may have recovered
> Internet connectivity, but how useful would that be to you if all the
> TCP connections from users needed to be re-established? For generic
> Internet surfing and connectivity, fine; for real-time and other
> applications, this needs to be thought through.

But with netfilter/iptables on Linux, does flushing and reloading all the rules drop existing connections? I don't think so, but I may be wrong.

I will package all my scripts and post a link here soon.

Thank you!
Tom
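Tom's question in this post (won't an upstream failure simply make every external host unreachable?) suggests testing several targets rather than one: if only one target stops answering, the problem may be that host or the path to it rather than the uplink. A minimal Python sketch under that assumption, written to fit his plan of calling helper scripts from bash: `tcp_reachable` attempts a TCP handshake, optionally from a given source address, and `link_exit_status` returns a shell-style status (0 = link OK, 1 = link bad) when at least `quorum` targets answer. Both function names, the quorum idea, and the use of Python instead of Perl are illustrative assumptions, not something taken from the thread.

```python
import socket
from typing import Iterable, Optional, Tuple


def tcp_reachable(host: str, port: int, source_ip: Optional[str] = None,
                  timeout: float = 10.0) -> bool:
    """True if a TCP handshake to host:port completes within `timeout` seconds."""
    src = (source_ip, 0) if source_ip else None  # bind local end if requested
    try:
        with socket.create_connection((host, port), timeout=timeout,
                                      source_address=src):
            return True  # handshake done; the socket is closed on leaving `with`
    except OSError:
        # refused, unreachable, timed out, or source address not bindable
        return False


def link_exit_status(targets: Iterable[Tuple[str, int]], quorum: int,
                     source_ip: Optional[str] = None,
                     timeout: float = 10.0) -> int:
    """Shell-style status: 0 if at least `quorum` targets answer, else 1."""
    answered = sum(tcp_reachable(h, p, source_ip, timeout) for h, p in targets)
    return 0 if answered >= quorum else 1
```

A thin wrapper ending in `sys.exit(link_exit_status(...))` could then be called from the existing bash scripts with a plain `if`, since bash treats exit status 0 as success.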