Link failover with ping - Connectivity



Thread: Link failover with ping

  1. Link failover with ping

    Hello all!

    I'm looking for docs and examples on this subject. After some
    googling, I only found basic scripts, and in my experience these
    simple scripts only work well on excellent (dedicated) links.

    The load-balancing part of the problem was already solved with the
    LARTC howto. The DGD (dead gateway detection) patch for Linux was
    not a good experience, because it only detects a first-hop failure,
    and 90% of failures happen after the first hop (e.g. a local router
    used as gateway to a remote ISP).

    I searched for a Linux distro with this feature (good failover)
    built-in, but didn't find anything interesting.

    I would appreciate pointers to docs, distros (to use as a starting
    point), examples, or even academic research on the topic.

    Thank you,
    Tom Lobato

  2. Re: Link failover with ping

    On 2008-07-12 10:32:43 -0400, Tom Lobato said:

    > I searched for a Linux distro with this feature (good failover)
    > built-in, but didn't find anything interesting.



    What failure cases, specifically, are you looking to address? I'm
    curious, and based on that, I might be able to toss in my $0.02 on
    where to rummage.

    /dmfh

    --
    _ __ _
    __| |_ __ / _| |_ 01100100 01101101
    / _` | ' \| _| ' \ 01100110 01101000
    \__,_|_|_|_|_| |_||_| dmfh(-2)dmfh.cx


  3. Re: Link failover with ping

    On Jul 18, 11:14 am, Digital Mercenary For Honor wrote:
    > On 2008-07-12 10:32:43 -0400, Tom Lobato said:
    >
    > > I searched for a Linux distro with this feature (good failover)
    > > built-in, but didn't find anything interesting.

    >
    > What failure cases, specifically, are you looking to address? I'm
    > curious, and based on that, I might be able to toss in my $0.02 on
    > where to rummage.
    >
    > /dmfh


    hello /dmfh!
    I'm doing multihoming with Linux. The LARTC howto gave me load
    balancing, with

    ip route replace default proto static \
        nexthop via X.X.X.X dev eth1 weight 1 \
        nexthop via Y.Y.Y.Y dev eth2 weight 1

    but a good extra feature would be failover: when one of the links
    goes down, the system detects it and removes it from the routes.

    I'm doing failover with ping: I ping external hosts periodically,
    and if the replies stop coming back, the tester judges the link bad
    (actually it retries a few times, taking about 4-5 minutes before it
    removes the bad route).
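    A minimal sketch of such a ping-based tester in shell (all
    addresses, device names and thresholds below are placeholders, not
    my actual values):

    ```shell
    #!/bin/sh
    # Sketch of a ping-based failover tester; one instance per link.
    # All addresses, devices and thresholds are placeholders.
    TARGET=198.51.100.1      # external host probed through this link
    DEV=eth1                 # interface of the link under test
    GW_OTHER=10.0.1.1        # gateway of the surviving link
    DEV_OTHER=eth2
    RETRIES=5                # consecutive failures before acting
    INTERVAL=60              # seconds between probes

    fails=0
    while :; do
        if ping -c 1 -W 5 -I "$DEV" "$TARGET" >/dev/null 2>&1; then
            fails=0                   # any reply resets the counter
        else
            fails=$((fails + 1))
        fi
        if [ "$fails" -ge "$RETRIES" ]; then
            # Withdraw the multipath default; fall back to the good link.
            ip route replace default via "$GW_OTHER" dev "$DEV_OTHER"
            break
        fi
        sleep "$INTERVAL"
    done
    ```

    Requiring several consecutive failures before acting is what keeps a
    single lost reply from tearing down a healthy route.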

    The problem:
    when network load is high (upload and/or download), ping replies
    take 6, 9, even 12 seconds to come back, and sometimes never arrive.
    So the tester erroneously decides the link is bad and removes the
    route to that ISP.

    I'm looking for best practices or case studies to make the tester's
    checks more reliable.

    Well, I'm doing my homework, trying to use shaping and policy
    routing to prioritize ICMP traffic so it can "own the queue"
    (http://lartc.org/howto/lartc.qdisc.html#LARTC.QDISC.EXPLAIN).
    But it would be nice to hear about experiences and best practices.
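    For what it's worth, a sketch of that kind of ICMP prioritization
    with a prio qdisc and a u32 filter (the device name is an
    assumption, and note this only helps outgoing echo requests, not the
    replies arriving on ingress):

    ```shell
    #!/bin/sh
    # Sketch: put ICMP in the highest-priority band of a prio qdisc so
    # probe packets are not stuck behind bulk upload traffic.
    # The device name is an assumption.
    DEV=eth1

    # Three-band prio qdisc; band 1:1 is dequeued first.
    tc qdisc add dev "$DEV" root handle 1: prio bands 3
    # IP protocol 1 is ICMP; send it to the highest-priority band.
    tc filter add dev "$DEV" parent 1: protocol ip prio 1 \
        u32 match ip protocol 1 0xff flowid 1:1
    ```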


    Tom Lobato

  4. Re: Link failover with ping

    On 2008-07-19 09:47:26 -0400, Tom Lobato said:

    > but a good feature would be failover, when one of the links goes down,
    > the system detects it and removes it from the routes.


    Believe it or not, link failure is the least likely failure type you're
    going to encounter. More often you'll experience the failure of the
    upstream ISP, or "something wrong in the cloud" over this or that link.

    > 6, 9, 12 seconds to come back, and sometimes never comes back. So the
    > tester thinks erroneously the link is bad and removes route to ISP.


    Instead of using ICMP, consider using TCP (which has retransmission,
    etc.). It could be something simple, like using the common wget
    utility to fetch a small web page from some site and checking the
    shell's exit code from the fetch, or otherwise trapping errors from
    that attempt and deciding based on them. This way, you're also
    basing decisions on a real, practical test of a simulated user
    experience. If you want to delve into coding, you could probably
    whip something up in Perl.
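    A sketch of that wget-based check (the URL, timeout and retry count
    are arbitrary placeholder choices, nothing specified above):

    ```shell
    #!/bin/sh
    # Sketch: judge the link by a real TCP fetch instead of ICMP.
    # URL, timeout and retry count are arbitrary placeholders.
    URL=http://www.example.com/

    # wget exits 0 only if the fetch succeeded; --tries gives us the
    # retransmission-style retry behaviour for free.
    if wget -q -O /dev/null --timeout=10 --tries=2 "$URL"; then
        status=up
    else
        status=down
    fi
    echo "link is $status"
    ```

    The failover script would then act on `$status` instead of raw ping
    timings.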

    Also consider using the monit utility here - monit can ping hosts,
    test servers, do sample DNS queries, sample HTTP fetches and MySQL
    server queries, and based on the outcome, fire off a script (which
    could change your routing table).
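    A monit stanza for that kind of check might look like this (the host
    name, address and script path are made up, and the exact syntax
    varies between monit versions - check the manual for yours):

    ```
    check host isp1 with address 198.51.100.1
        if failed icmp type echo count 3 with timeout 5 seconds
            then exec "/usr/local/bin/drop-isp1-route.sh"
    ```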

    Off the top of my head, I can't remember for every case how to force
    an application to route over a particular link - there's LSRR, loose
    source routing, but for security reasons it is usually disabled on
    the host, and support for it at the network level isn't there
    either. From your gateway host, though, you might be able to tell
    wget or your Perl application to bind to a given interface's address
    to test and go from there - a tcpdump against that interface can
    confirm your test is going out the intended "outbound" interface.
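    wget has no interface option as such, but it does have
    --bind-address, which combined with source-based ip rules is usually
    enough to steer the probe out a given link; a sketch, with made-up
    addresses:

    ```shell
    #!/bin/sh
    # Sketch: source the probe from the address assigned to the link
    # under test, then verify with tcpdump that it leaves the intended
    # interface. Addresses, device and URL are placeholders.
    SRC=203.0.113.10        # local address on the eth1 link
    URL=http://www.example.com/

    # In another terminal, confirm the probe traffic on the interface:
    #   tcpdump -ni eth1 port 80
    wget -q -O /dev/null --timeout=10 --bind-address="$SRC" "$URL"
    echo "probe exit code: $?"
    ```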

    Lastly, if these Linux routers are also firewalls, consider the
    issues with NAT (if in use) and with firewall state that may need to
    change or be flushed after you change your routing table - you may
    have recovered Internet connectivity, but how useful is that if all
    the users' TCP connections need to be re-established? For generic
    Internet surfing and connectivity, fine - for real-time and other
    applications this needs to be thought through.
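    On the state-flushing point: with the conntrack utility (from
    conntrack-tools) installed, the sequence can be sketched like this
    (gateway and device are placeholders; note the flush drops all
    tracked connections, which is exactly the trade-off described
    above):

    ```shell
    #!/bin/sh
    # Sketch: after swapping the default route, flush stale
    # conntrack/NAT state so new flows are NATed out the surviving
    # link. Gateway/device are placeholders; requires conntrack-tools.
    ip route replace default via 10.0.1.1 dev eth2
    conntrack -F
    ```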

    Hope this helps a little.

    /dmfh



  5. Re: Link failover with ping


    hi /dmfh!
    thank you for the book-sized mail!


    On Aug 2, 12:39 am, Digital Mercenary For Honor wrote:
    > On 2008-07-19 09:47:26 -0400, Tom Lobato said:
    >
    > > but a good feature would be failover, when one of the links goes down,
    > > the system detects it and removes it from the routes.

    >
    > Believe it or not, link failure is the least likely failure type you're
    > going to encounter. More often you'll experience the failure of the
    > upstream ISP, or "something wrong in the cloud" over this or that link.


    Well, but won't this "failure of the upstream ISP" also result in
    unpingable external hosts, or TCP tests with no response? Or is the
    failure you mean a bad (say, slow) link with pings still returning
    fine?


    > > 6, 9, 12 seconds to come back, and sometimes never comes back. So the
    > > tester thinks erroneously the link is bad and removes route to ISP.

    >
    > Instead of using ICMP, consider using TCP (which has re-transmission,
    > etc.), which could be something simple, like using the common wget
    > utility to get a small webpage from some site, and seeing what the
    > shell error return code is from the fetch or otherwise trapping errors
    > from that attempt and making a decision on them. This way, you're also
    > basing decisions on a real, practical test of a simulated user
    > experience. If you want to delve into coding, you could probably whip
    > something up with PERL.


    Great, I'll try TCP.
    And yes, Perl is excellent for this. But all my link-failover
    scripts so far were written in bash, with only the network
    statistics done with perl/gnuplot. Anyway, I could call perl scripts
    (if needed) from the existing bash scripts.


    > Also consider using the monit utility with this - monit can test ping
    > hosts, servers, do sample DNS queries, sample http browses, mysql
    > server fetches, and based on the outcome, fire off a script (which
    > could change your routing table).


    Good, I'll try monit.


    > Off the top of my head, I can't remember for every case how to force an
    > application to route over a particular link - there's LSRR, loose
    > source routing, but for security reasons this is usually disabled at
    > the host and support for it at the network level isn't there either.
    > From your gateway host though, you might be able to tell wget or your
    > PERL application to bind to interface(x) to test and go from there - a
    > tcpdump against that interface can confirm your test is working through
    > that "outbound" interface.


    Actually, I'm using Linux marks (MARK/CONNMARK) with tc/iptables,
    based on the LARTC howto.
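    That LARTC-style mark setup looks roughly like this (the mark
    values, table numbers, subnets and gateways here are placeholders):

    ```shell
    #!/bin/sh
    # Sketch of CONNMARK-based multipath routing (LARTC-style).
    # All marks, tables, subnets and gateways are placeholders.

    # Restore any saved connection mark before the routing decision,
    # so established flows keep using their original link.
    iptables -t mangle -A PREROUTING -j CONNMARK --restore-mark
    # Unmarked (new) connections from this subnet go out link 1.
    iptables -t mangle -A PREROUTING -m mark --mark 0 \
        -s 192.168.1.0/24 -j MARK --set-mark 1
    # Save the mark onto the conntrack entry.
    iptables -t mangle -A PREROUTING -j CONNMARK --save-mark

    # Marked packets consult a per-ISP routing table.
    ip rule add fwmark 1 table 101
    ip route add default via 203.0.113.1 dev eth1 table 101
    ```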


    > Lastly, if these Linux routers are also firewalls, consider the issues
    > with NAT, if in use and firewall states that may need to change or be
    > flushed after you change your routing table - you may have recovered
    > Internet connectivity, but how useful would that be to you if all the
    > TCP connections from users needed to be re-established? For generic
    > Internet surfing and connectivity, fine - for real-time and other
    > applications this needs to be thought through.


    But with netfilter/iptables on Linux, does flushing and reloading
    all the rules drop existing connections? I don't think so, but I may
    be wrong.

    I will pack up all my scripts and soon post a link here.



    Thank you!
    Tom
