Inexplicable sudden halt - TCP-IP


  1. Inexplicable sudden halt

    A strange thing happened on my home LAN the other day, and fortunately I
    was capturing the traffic with Ethereal at the time so I thought I'd be
    able to diagnose the problem.

    My LAN is connected to the Internet via an ADSL router (a Draytek Vigor)
    and the connection is distributed to some of the clients by a wi-fi
    bridge (an Apple AirPort Express) which is itself connected to one of
    the wired ports of the Draytek router. A Windows XP host on the AirPort
    wi-fi side was file-sharing on the Gnutella network when, all of a
    sudden, incoming traffic just ceased. All that Ethereal picked up after
    the sudden halt was:

    * initially several TCP retransmissions by the WinXP host

    * then lots of attempted SYN packets sent out by the WinXP host
    to Gnutella peers but all of which went unanswered. This behaviour
    lasted for at least an hour, at which point I stopped capturing.

    * some DNS and NBNS name query packets (UDP) which also went unanswered

    * a few ARP packets (as normal) going back and forth between the
    WinXP host and the Draytek router (via the AirPort bridge)

    * a few incoming TCP RST packets, the first of which was about
    10 mins after the sudden halt of all other incoming IP traffic
    and the last of which was nearly 50 mins after.

    Now, just to add extra confusion to the mix, I have a Mac connected to a
    separate wi-fi network which the Draytek itself provides, and I could
    access the Internet over the same ADSL connection just fine from the Mac
    while the WinXP machine was having the problems described above. Here is
    a rough sketch of the topology:


                Mac wi-fi                       Windows wi-fi

                   \|/                               \|/
                    |                                 |
               +---------+        +-----+        +---------+
    ADSL-------| Draytek |--------| Hub |--------| AirPort |
               +---------+        +-----+        +---------+
               NAT router,           |           bridge only
               DHCP server,          |
               & ADSL modem          |
                                  Network
                                 monitoring
                                   point


    Ideas I had for what could cause the sudden halt were that:

    1) the ISP was blocking Gnutella traffic

    2) the router had a NAT table malfunction

    3) the router had a "switching" table malfunction

    However, everything is back to normal today and I have not even touched
    the router, let alone reset it. So this poses a number of questions:

    1) If the ISP was filtering:
    a) why did the RST packets get through?
    b) how can it block at the TCP layer without first seeing
    the Gnutella application payload?
    c) why would the ISP only block for one day?

    2) If it was router NAT or switching malfunction:
    a) why didn't new connections repopulate the relevant tables
    and everything quickly recover?
    b) why did everything work fine from the Mac?
    c) why is it now working without needing a reset?

    Could there be any other explanation for the sudden halt of incoming
    traffic that I witnessed? I'm baffled as to what would cause this and
    would greatly appreciate any fresh ideas.

    I could also provide samples of the captured traffic if anyone thinks it
    would help.

    --
    James Taylor

  2. Re: Inexplicable sudden halt

    On 2006-08-05 04:45:51 -0400, usenet@oakseed.demon.co.uk.invalid (James
    Taylor) said:

    > A strange thing happened on my home LAN the other day, and fortunately I
    > was capturing the traffic with Ethereal at the time so I thought I'd be
    > able to diagnose the problem.
    >


    I'll take a guess here, but my guess is based on the principle that
    your ISP is performing some kind of bridging, perhaps ATM bridging
    to your Draytek router, or something else where the connection
    between yourself and the ISP is not a true L3 routed hop, requiring
    some kind of L2 knowledge of your local equipment.

    It does sound like you lost connectivity - if you did, the TCP RST's
    are not so surprising, as the applications you were connected to might
    have had their TCP/IP stacks start sending RST's after you became
    non-responsive - if you match the before / after packet traces you have
    of the event, if the ports and TCP sequence numbers match, that sort of
    mates up.

    If you connected your Mac after the outage began, this could have had
    the effect of creating a new ARP entry or otherwise entering new
    information into an L2 table somewhere that enabled traffic to flow.
    Your post seemed to indicate that the Macintosh WiFi was on a different
    port or the Draytek router provides the WiFi itself - if so, it could
    be you locked up on the hub portion of the Draytek and the rest of the
    unit kept functioning - so-so likelihood.

    Your Draytek router might have just locked up in microcode and freed
    itself after a while. I think that's unlikely, since in my experience
    devices with microcode issues just stall / stop and require manual
    intervention (reboot / hammer / expletives, etc.)

    If you have a hub connected to the Draytek router, is it a shared hub?
    Could you have perhaps experienced a storm of traffic that created
    enough collisions to overload either unit (the hub or the Draytek) for
    a while?

    Are the IP addresses that appear publicly for the Mac WiFi and the
    Windows WiFi different, or do they NAT to the same address?

    Lots of guesswork here, but I'm never one to turn down a troubleshoot.

    /dmfh

    ----
    __| |_ __ / _| |_ ____ __
    dmfh @ / _` | ' \| _| ' \ _ / _\ \ /
    \__,_|_|_|_|_| |_||_| (_) \__/_\_\
    ----



  3. Re: Inexplicable sudden halt

    DMFH wrote:

    > James Taylor wrote:
    >
    > > A strange thing happened on my home LAN the other day, and fortunately
    > > I was capturing the traffic with Ethereal at the time so I thought
    > > I'd be able to diagnose the problem.

    >
    > I'll take a guess here, but my guess is based on the principle that
    > your ISP is performing some kind of bridging, perhaps ATM bridging
    > to your Draytek router, or something else where the connection
    > between yourself and the ISP is not a true L3 routed hop, requiring
    > some kind of L2 knowledge of your local equipment.


    First of all, I thank you for following up my post. This problem is
    truly baffling to me, and I'm glad to see I'm not alone. I was wondering
    if *anyone* was going to be brave enough to throw some ideas into the
    pot, so I'm grateful for your interest. Thanks.

    I'm not sure I understand what you mean by "true L3 routed hop" in this
    context. Surely, whatever the L2 protocols and medium may be, or
    whatever higher level tunnelling may be in place, the IP layer remains,
    as always, a routed protocol. Right?

    > It does sound like you lost connectivity - if you did, the TCP RST's
    > are not so surprising, as the applications you were connected to might
    > have had their TCP/IP stacks start sending RST's after you became
    > non-responsive - if you match the before / after packet traces you have
    > of the event, if the ports and TCP sequence numbers match, that sort of
    > mates up.


    Yes, I will do a bit more analysis along those lines to check this, but
    I suspect you're correct in thinking the RST's are quite normal, given
    that none of the outgoing packets from the WinXP host were sent out. I
    still don't understand why the problem only affected the Draytek's wired
    ports and not its wi-fi side.

    > If you connected your Mac after the outage began, this could have had
    > the effect of creating a new ARP entry or otherwise entering new
    > information into an L2 table somewhere that enabled traffic to flow.


    No, the Mac was permanently connected and operating normally on the
    Internet.

    > Your post seemed to indicate that the Macintosh WiFi was on a different
    > port or the Draytek router provides the WiFi itself


    Yes, the Mac wi-fi is provided by the Draytek itself.

    > - if so, it could
    > be you locked up on the hub portion of the Draytek and the rest of the
    > unit kept functioning - so-so likelihood.


    This is my suspicion too, but I find the idea that something so low
    level could crash *and* then recover itself rather far fetched.

    > Your Draytek router might have just locked up in microcode and freed
    > itself after a while. I think that's unlikely, since in my experience
    > devices with microcode issues just stall / stop and require manual
    > intervention (reboot / hammer / expletives, etc.)


    Yes, quite. ;-)

    By the way, is "microcode" the right term? I think of microcode as being
    the firmware built into a CISC CPU to help it decode its instructions,
    but maybe I'm out of date.

    > If you have a hub connected to the Draytek router, is it a shared hub?
    > Could you have perhaps experienced a storm of traffic that created
    > enough collisions to overload either unit (the hub or the Draytek) for
    > a while?


    No, the hub is just for monitoring the connection between the AirPort and
    the Draytek. I can't monitor on the Draytek itself because it is a switch,
    not a hub, and so I wouldn't get to see all the traffic.

    Of course, I suppose the ISP might have had some kind of DDoS attack for
    just a few hours, but then I don't see why the Mac had no trouble
    connecting out while the WinXP machine did.

    > Are the IP addresses that appear publicly for the Mac WiFi and the
    > Windows WiFi different, or do they NAT to the same address?


    Yes, the Draytek is the DHCP server and NAT device for the whole
    network, while the Airport merely bridges between the wired and wireless
    networks, so there is just one pool of internal private addresses for
    hosts on my network (regardless of which wi-fi access point they're
    connected to) and they all share the same public IP via NAT.

    > Lots of guesswork here, but I'm never one to turn down a troubleshoot.



    That's the spirit! Do you think you'd be able to learn more from the
    capture files? I'd be happy to make them available for download,
    analysis and discussion, especially if people here are interested in
    seeing if they can deduce the cause of the problem.

    --
    James Taylor

  4. Re: Inexplicable sudden halt

    Hi James,

    1. Blocking Gnutella is not difficult. ISPs can just close UDP and TCP
    6346, 6347.

    2. However, as you correctly pointed out, it is unlikely they blocked
    Gnutella because your connection opened up shortly afterwards and the
    RSTs got through.

    3. My guess is you experienced a transient problem with the NAT tables
    on the Draytek. The NAT table could have filled up due to tons of
    outgoing connections from Gnutella (quite a common occurrence with some
    game servers). When the NAT table on the Draytek fills up, what
    happens next is implementation dependent.

    When the NAT table fills up, in most cases the router stops accepting
    new requests but allows existing connections to continue. In
    some cases, the router will stop passing traffic altogether and will
    wait for the NAT table entries themselves to time out. As and when space
    is freed by entries timing out, new connections may be accepted.

    The NAT entry timeouts are quite large, I think maybe even a few
    hours, so it is almost sure to work fine the next day (after 12 hours,
    maybe?)
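
    To make the "table full" behaviour concrete, here is a minimal Python
    sketch of a fixed-size NAT table with an idle timeout. It models only the
    first behaviour described above (new flows refused, existing flows kept).
    The class name, capacity, timeout, and addresses are illustrative
    assumptions, not the Draytek's actual implementation or values.

```python
from dataclasses import dataclass, field


@dataclass
class NatTable:
    """Toy model of a fixed-size NAT table; not any real router's logic."""
    capacity: int          # assumed maximum number of entries
    idle_timeout: float    # assumed seconds before an idle entry expires
    entries: dict = field(default_factory=dict)  # flow tuple -> last-seen time

    def expire(self, now: float) -> None:
        # Drop every entry that has been idle for idle_timeout or longer.
        stale = [f for f, seen in self.entries.items()
                 if now - seen >= self.idle_timeout]
        for flow in stale:
            del self.entries[flow]

    def outbound(self, flow: tuple, now: float) -> bool:
        """Return True if the packet is forwarded, False if it is dropped."""
        self.expire(now)
        if flow in self.entries:
            self.entries[flow] = now      # existing flow: refresh and forward
            return True
        if len(self.entries) >= self.capacity:
            return False                  # table full: new flow is refused
        self.entries[flow] = now          # room available: create the entry
        return True


table = NatTable(capacity=2, idle_timeout=60.0)
a = ("192.168.1.10", 1025, "203.0.113.5", 6346)
b = ("192.168.1.10", 1026, "203.0.113.6", 6346)
c = ("192.168.1.10", 1027, "203.0.113.7", 6346)

results = [
    table.outbound(a, now=0.0),    # new flow, accepted
    table.outbound(b, now=1.0),    # new flow, accepted (table now full)
    table.outbound(c, now=2.0),    # new flow, refused
    table.outbound(a, now=30.0),   # existing flow still passes
    table.outbound(c, now=61.0),   # b has idled out, so c is accepted
]
print(results)
```

    In this model a host that keeps opening new flows faster than old ones
    idle out will see most of its SYNs dropped, while flows that already
    hold an entry keep working.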

    Now, the question is why did the NAT work from the Mac? That may be an
    implementation issue. Like you said, the firmware may treat the wired
    side and the wireless side separately.

    How big are the capture files? We love to do packet digging. Two
    things to look out for in the capture: (1) what happens to individual
    TCP connections and (2) what is the "NAT load" placed by the XP box?
    You can send the files to me if they are less than 10M.

    Best Regards,
    Vivek Rajan
    Unleash Networks
    http://www.unleashnetworks.com







  5. Re: Inexplicable sudden halt

    VivekRajan wrote:

    > Hi James,


    Hi, and thanks for contributing to my conundrum. I'm very grateful for
    your insights.

    > 1. Blocking Gnutella is not difficult. ISPs can just close UDP and TCP
    > 6346, 6347.


    That's useful to know, thanks. I might be able to use this to set up my
    router to block Gnutella. Ideally I'd like to be able to allow download
    but block upload of files. Do you happen to know whether that's easy to
    do?

    > 2. However, as you correctly pointed out, it is unlikely they blocked
    > Gnutella because your connection opened up shortly afterwards and the
    > RSTs got through.


    So, based on the symptoms I've described, you seem confident that the
    ISP was *not* responsible for blocking the traffic, at least not
    deliberately. However, is it conceivable that a temporary configuration
    mistake at the ISP could have caused this?

    > 3. My guess is you experienced a transient problem with the NAT tables
    > on the Draytek. The NAT table could have filled up due to tons of
    > outgoing connections from Gnutella (quite a common occurrence with some
    > game servers). When the NAT table on the Draytek fills up, what
    > happens next is implementation dependent.



    So it's probably time to contact the manufacturer, Draytek, to discover
    whether the observed behaviour is expected.

    > When the NAT table fills up, in most cases the router stops accepting
    > new requests, but will allow existing connections [to] continue.


    Yes, that seems sensible. However, in my situation the WinXP host's
    *existing* connections also stopped suddenly, and I'm therefore not sure
    whether a full NAT table can be the whole answer.

    > In some cases, the router will stop passing traffic altogether and will
    > wait for the NAT table entries themselves to timeout.


    Ug! That's just nasty. Can you name any NAT devices that behave that
    way? (So I can avoid purchasing them in future.)

    Do you think my Draytek could be one of those that behave as badly as
    this? If so, then surely it would have either continued receiving
    packets on existing connections or not received the RST packets. It
    seems unlikely that this is what happened given that the Draytek
    received the RST packets but did not receive normal packets on those
    same existing connections. Is there any way to explain this?

    > The NAT entry timeouts are quite large, I think it may even be a few
    > hours so it is almost sure to work fine the next day (after 12 hours
    > maybe ?)


    That certainly fits the observed behaviour.

    > Now, the question is why did the NAT work from the Mac? That may be an
    > implementation issue. Like you said, the firmware may treat the wired
    > side and the wireless side separately.


    Maybe it would be possible to have limits on the number of NAT entries
    that each local host can fill up, but I don't see why anyone would think
    it desirable to do this given that it would be hard to predict how many
    entries to allow each host when you can't know in advance how many hosts
    are on the local network. Maybe it's just crudely divided into wired and
    wireless sides and each side gets half the NAT table. Is that likely, do
    you think?

    > How big are the capture files? We love to do packet digging.


    Oh great! I'd be very grateful if you'd take a look at the capture file.
    I'll try to limit it to the most relevant 10MB or less, upload it
    somewhere, and then post or email a link to it.

    I notice that your company produces a packet analysis tool called
    Unsniff. Is it any better than Ethereal/WireShark at this sort of
    diagnosis? Is there a version for the Mac or Linux? Is there a command
    line version for calling it from scripts?

    > Two things to look out for in the capture (1) what happens to individual
    > TCP connections and (2) what is the "NAT load" placed by the XP box ?


    I'm in the process of writing some scripts to parse tethereal output
    (the command line version of Ethereal) so that I can analyse these
    things. Can I assume that, by NAT load, you mean maximum simultaneous
    TCP connections? Should I include UDP too? What should the timeout be?
    Do RSTs and FINs allow me to regard the connection closed and the NAT
    entry freed immediately, or do they hang around in that situation too?

    Thanks again for your help.

    --
    James Taylor

  6. Re: Inexplicable sudden halt

    James Taylor wrote:

    > A Windows XP host [...] was file-sharing on the Gnutella
    > network when, all of a sudden, incoming traffic just ceased.
    > All that Ethereal picked up after the sudden halt was:
    >
    > * initially several TCP retransmissions by the WinXP host
    >
    > * then lots of attempted SYN packets sent out by the WinXP host
    > to Gnutella peers but all of which went unanswered. This behaviour
    > lasted for at least an hour, at which point I stopped capturing.
    >
    > * some DNS and NBNS name query packets (UDP) which also went unanswered
    >
    > * a few ARP packets (as normal) going back and forth between the
    > WinXP host and the Draytek router (via the AirPort bridge)
    >
    > * a few incoming TCP RST packets, the first of which was about
    > 10 mins after the sudden halt of all other incoming IP traffic
    > and the last of which was nearly 50 mins after.


    I've just established that these RST packets were all related to SYNs
    sent from the WinXP host during the period after the network "sudden
    halt" had commenced and before I stopped capturing. This is curious
    because 2346 SYNs were sent out during this period and only 18 RSTs came
    back. I cannot tell whether this lends credence to the theory that this
    problem was a NAT table exhaustion. It seems to me that if the NAT table
    were full, then SYNs would only get through the NAT to the outside world
    if a NAT entry became available to record it, in which case the return
    SYN+ACK would have got back through the NAT and the connection would
    continue as normal, whereas in actual fact the SYN+ACK never came, just
    the RST. That makes me suspect that the RSTs may have been spoofed by
    something rather than being genuine replies from remote hosts. What do
    you guys think?
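
    The matching described above can be sketched in Python as follows. This
    assumes the capture has already been reduced to simple records; the
    `Packet` record, its field names, the addresses, and the sample data are
    hypothetical and are not tethereal's output format.

```python
from typing import NamedTuple


class Packet(NamedTuple):
    """Hypothetical pre-parsed record; not an Ethereal/tethereal format."""
    time: float
    src: str
    sport: int
    dst: str
    dport: int
    flags: str  # e.g. "SYN", "RST", "RST,ACK"


def classify_rsts(packets, local_ip, halt_time):
    """Split incoming RSTs by whether they answer a SYN sent after halt_time."""
    post_halt_syns = set()
    answered, other = [], []
    for p in sorted(packets, key=lambda p: p.time):
        if p.src == local_ip and p.flags == "SYN" and p.time >= halt_time:
            # Store the flow as it will appear in the remote side's reply.
            post_halt_syns.add((p.dst, p.dport, p.src, p.sport))
        elif p.dst == local_ip and "RST" in p.flags:
            key = (p.src, p.sport, p.dst, p.dport)
            (answered if key in post_halt_syns else other).append(p)
    # The first value counts distinct flows, so retransmitted SYNs collapse.
    return len(post_halt_syns), answered, other


XP = "192.168.1.10"  # assumed private address of the WinXP host
sample = [
    Packet(100.0, XP, 1030, "203.0.113.5", 6346, "SYN"),
    Packet(101.0, XP, 1031, "203.0.113.6", 6346, "SYN"),
    Packet(700.0, "203.0.113.5", 6346, XP, 1030, "RST,ACK"),
    Packet(710.0, "203.0.113.9", 6346, XP, 1020, "RST"),  # older connection
]
syn_count, answered, other = classify_rsts(sample, XP, halt_time=90.0)
print(syn_count, len(answered), len(other))
```

    With the figures from the capture, 18 RSTs for 2346 SYNs is a reply rate
    of under 1%.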

    --
    James Taylor

  7. Re: Inexplicable sudden halt

    Hi James,

    > I've just established that these RST packets were all related to SYNs
    > sent from the WinXP host during the period after the network "sudden
    > halt" had commenced and before I stopped capturing. This is curious
    > because 2346 SYNs were sent out during this period and only 18 RSTs came


    This reinforces our suspicion that NAT overflow is the culprit. That is
    quite a lot of SYNs.

    I looked around in Google for NAT related problems with software like
    Limewire or BitTorrent. Check out
    http://community.boredofstudies.org/...-overtime.html


    > back. I cannot tell whether this lends credence to the theory that this
    > problem was a NAT table exhaustion. It seems to me that if the NAT table
    > were full, then SYNs would only get through the NAT to the outside world
    > if a NAT entry became available to record it, in which case the return
    > SYN+ACK would have got back through the NAT and the connection would
    > continue as normal, whereas in actual fact the SYN+ACK never came, just
    > the RST. That makes me suspect that the RSTs may have been spoofed by
    > something rather than being genuine replies from remote hosts. What do
    > you guys think?


    A firewall may send RSTs. I doubt if a NAT router will take that route
    especially under low resource conditions.

    Regards,
    Vivek Rajan
    Unleash Networks


  8. Re: Inexplicable sudden halt

    > > 1. Blocking Gnutella is not difficult. ISPs can just close UDP and TCP
    > > 6346, 6347.

    >
    > That's useful to know, thanks. I might be able to use this to set up my
    > router to block Gnutella. Ideally I'd like to be able to allow download
    > but block upload of files. Do you happen to know whether that's easy to
    > do?
    >


    That could be tough. I have no idea how to set up a firewall to block
    Gnutella like that. Perhaps you could try their mailing lists?

    > > 2. However, as you correctly pointed out, it is unlikely they blocked
    > > Gnutella because your connection opened up shortly afterwards and the
    > > RSTs got through.

    >
    > So, based on the symptoms I've described, you seem confident that the
    > ISP was *not* responsible for blocking the traffic, at least not
    > deliberately. However, is it conceivable that a temporary configuration
    > mistake at the ISP could have caused this?
    >


    It is unlikely the ISP had a configuration problem because your Mac
    worked fine all along. So from the viewpoint of the connection between
    your premises and the ISP, everything is fine.

    So the next question is whether the ISP deliberately blocked Gnutella
    traffic for a few hours. That is a possibility, and it could be the root
    cause that led to your NAT tables overflowing. If your ISP has
    bandwidth problems they might resort to this trick!

    One way to test whether they block:

    1. The next time you observe this NAT table behavior, just close
    LimeWire and switch off the router.
    2. Turn the router back on and restart LimeWire.
    3. If the same situation repeats, you can be sure they block it.




    > > When the NAT table fills up, in most cases the router stops accepting
    > > new requests, but will allow existing connections [to] continue.

    >
    > Yes, that seems sensible. However, in my situation the WinXP host's
    > *existing* connections also stopped suddenly, and I'm therefore not sure
    > whether a full NAT table can be the whole answer.
    >
    > > In some cases, the router will stop passing traffic altogether and will
    > > wait for the NAT table entries themselves to timeout.

    >
    > Ug! That's just nasty. Can you name any NAT devices that behave that
    > way? (So I can avoid purchasing them in future.)
    >


    Worse! Some routers may even lock up entirely, requiring a reset. I
    posted a URL in my other reply where someone was complaining about
    this.

    > Do you think my Draytek could be one of those that behave as badly as
    > this? If so, then surely it would have either continued receiving
    > packets on existing connections or not received the RST packets. It
    > seems unlikely that this is what happened given that the Draytek
    > received the RST packets but did not receive normal packets on those
    > same existing connections. Is there any way to explain this?
    >


    I am no expert, I just remember reading long ago that at least some
    implementations of NAT stopped processing and waited for the overload
    to clear. Today, I tried to google for it and came up empty-handed.
    Perhaps I was wrong.


    On second thought, I think it is unlikely a good router like Draytek
    will stop passing good traffic altogether. The fact that you are not
    seeing traffic on already established connections, may be a further
    indication that the ISP blocked traffic. This of course led to the NAT
    table overflow.



    > > How big are the capture files? We love to do packet digging.

    >
    > Oh great! I'd be very grateful if you'd take a look at the capture file.
    > I'll try to limit it to the most relevant 10MB or less, upload it
    > somewhere, and then post or email a link to it.
    >
    > I notice that your company produces a packet analysis tool called
    > Unsniff. Is it any better than Ethereal/WireShark at this sort of
    > diagnosis? Is there a version for the Mac or Linux? Is there a command
    > line version for calling it from scripts?
    >


    You are welcome to download it and try it. It is a new product released
    in May. Unsniff Network Analyzer is designed to analyze things other
    than just link layer packets. This allows you to see entire reassembled
    PDUs, and entire TCP streams, as first class entities. It is fully
    scriptable using Ruby or VBScript, so we could write detailed offline
    analysis tools. Please visit
    http://www.unleashnetworks.com/scripting.html for more.

    Yes, it will help in this type of analysis, because here you are not
    really interested in looking at the bitwise contents of each link
    layer packet. If you wanted to do that, Ethereal/Wireshark is the
    current king. You are really interested in the fate of entire TCP
    connections that went "dead". Unsniff will let you look at entire TCP
    connections in real time, so you can pick any one which went "dead" and
    draw a ladder diagram or open up individual segments of that
    connection.

    If you wanted to probe deeper, you could write an analysis script using
    Ruby that would plot (1) TCP NAT load (outstanding tcp connections) (2)
    UDP NAT load (3) active connections (4) silent connections and so
    forth. That would give you the ultimate view of what happened.

    Unfortunately, there is no version for the Mac or Linux.

    > > Two things to look out for in the capture (1) what happens to individual
    > > TCP connections and (2) what is the "NAT load" placed by the XP box ?

    >
    > I'm in the process of writing some scripts to parse tethereal output
    > (the command line version of Ethereal) so that I can analyse these
    > things. Can I assume that, by NAT load, you mean maximum simultaneous
    > TCP connections? Should I include UDP too? What should the timeout be?
    > Do RSTs and FINs allow me to regard the connection closed and the NAT
    > entry freed immediately, or do they hang around in that situation too?
    >


    You could try to plot:

    * TCP NAT load placed by the XP box = number of open connections
    * UDP NAT load = number of sessions (i.e. unique UDP tuples)

    You could also try to plot the number of active connections vs. the
    number of silent connections. The tricky part is getting the timeout
    values, so you can either use the ones from Cisco's website or just
    model them as another variable. Maybe you can observe how the
    behavior varies with different timeouts.

    For Cisco's default timeouts visit:
    http://www.ciscopress.com/articles/a...&seqNum=5&rl=1

    Another source -
    http://www.dslreports.com/faq/9454
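
    A rough Python sketch of the NAT load estimate, with the timeout treated
    as a variable as suggested above. The event format, flow keys, and timeout
    values are placeholders chosen for illustration; they are not Cisco's
    defaults. Freeing an entry immediately on FIN or RST is a simplifying
    assumption that can be switched off.

```python
def nat_load(events, sample_times, timeout, close_on_fin_rst=True):
    """Estimate open NAT entries at each sample time.

    events: iterable of (time, flow, flags) tuples, where flow is any
    hashable key such as (proto, src, sport, dst, dport). Toy model: an
    entry lives until it has been idle for `timeout` seconds or, optionally,
    until a FIN or RST is seen on the flow.
    """
    events = sorted(events, key=lambda e: e[0])
    last_seen = {}
    loads = []
    i = 0
    for t in sorted(sample_times):
        # Apply every event up to and including time t.
        while i < len(events) and events[i][0] <= t:
            when, flow, flags = events[i]
            if close_on_fin_rst and ("FIN" in flags or "RST" in flags):
                last_seen.pop(flow, None)
            else:
                last_seen[flow] = when
            i += 1
        # Count entries that have not yet idled out at time t.
        loads.append(sum(1 for seen in last_seen.values() if t - seen < timeout))
    return loads


f1 = ("tcp", "192.168.1.10", 1030, "203.0.113.5", 6346)
f2 = ("tcp", "192.168.1.10", 1031, "203.0.113.6", 6346)
f3 = ("udp", "192.168.1.10", 1032, "203.0.113.7", 6346)
events = [
    (0.0, f1, "SYN"),
    (1.0, f2, "SYN"),
    (2.0, f3, ""),
    (5.0, f1, "RST"),
]
for timeout in (10.0, 300.0):
    print(timeout, nat_load(events, [3.0, 6.0, 20.0], timeout))
```

    Running it with several timeouts shows how sensitive the estimated load
    is to that one unknown: unanswered SYNs occupy entries for as long as
    the timeout allows.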


    Best Regards,
    Vivek Rajan
    Unleash Networks

    > Thanks again for your help.
    >
    > --
    > James Taylor



  9. Re: Inexplicable sudden halt

    VivekRajan wrote:

    > It is unlikely the ISP had a configuration problem because your Mac
    > worked fine all along. So from the viewpoint of the connection between
    > your premises and the ISP, everything is fine.


    Yes, but the Mac doesn't have LimeWire installed and so I didn't test
    Gnutella from the Mac, I only tested basic web and email access. It may
    still be the case that the ISP identified Gnutella hosts from the
    traffic that was swamping their network, and then blocked them en masse
    by IP address and port. They certainly weren't blocking at the
    application layer because they didn't wait until they saw a packet with
    Gnutella payload before blocking, the SYNs were "nipped in the bud".

    > So the next question is whether the ISP deliberately blocked Gnutella
    > traffic for a few hours. That is a possibility, the root cause which
    > might have led to your NAT tables overflowing. If your ISP has
    > bandwidth problems they might resort to this trick !


    Yes, I really think this is the most likely explanation I've heard so
    far: ISP blocking of Gnutella caused the router's NAT table to
    overflow, which in turn resulted in several hours' denial of service
    for all IP protocols. The situation only affected the WinXP machine
    because the Draytek router reserved NAT entries separately for its
    wi-fi side where the Mac was (although I've yet to check that the
    Draytek actually does this).

    > One way to test whether they block:
    >
    > 1. The next time you observe this NAT table behavior, just close
    > limewire and switch off the router.
    > 2. Turn the router back on and restart limewire
    > 3. If the same situation repeats, you can be sure they block it.


    Would it be sufficient just to reset the router? I think LimeWire would
    recover once the connection was re-established and new attempts to
    connect to Gnutella peers ought to be successful until the NAT table
    fills up again. If, however, the ISP is blocking then it would be
    immediately obvious from the fact that no Gnutella traffic could get
    through at all.
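
    To put a number on "immediately obvious", a capture taken just after
    the reset could be summarised as the fraction of outgoing connection
    attempts that received a SYN-ACK. This is only a minimal Python
    sketch: the Packet record and the syn_answer_rate function are
    hypothetical, and it assumes the capture has already been parsed so
    that flags is a string such as "S", "SA" or "R".

```python
from collections import namedtuple

# Hypothetical record for one parsed TCP packet.
Packet = namedtuple("Packet", "time src sport dst dport flags")


def syn_answer_rate(packets, local_ip):
    """Fraction of outgoing connection attempts that got a SYN-ACK back.

    Retransmitted SYNs for the same tuple count as a single attempt.
    Returns None if the capture holds no outgoing SYNs at all.
    """
    attempted = set()
    answered = set()
    for p in sorted(packets, key=lambda pkt: pkt.time):
        is_syn = "S" in p.flags
        is_ack = "A" in p.flags
        if p.src == local_ip and is_syn and not is_ack:
            attempted.add((p.sport, p.dst, p.dport))
        elif p.dst == local_ip and is_syn and is_ack:
            conn = (p.dport, p.src, p.sport)
            if conn in attempted:
                answered.add(conn)
    if not attempted:
        return None
    return len(answered) / len(attempted)
```

    A rate near zero straight after a reset, while the NAT table is
    still empty, would point away from the router and towards blocking
    further upstream. A healthy rate that only collapses later would fit
    the NAT overflow theory better.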

    > On second thought, I think it is unlikely a good router like Draytek
    > will stop passing good traffic altogether. The fact that you are not
    > seeing traffic on already established connections, may be a further
    > indication that the ISP blocked traffic. This of course led to the NAT
    > table overflow.


    Agreed.

    > > I notice that your company produces a packet analysis tool called
    > > Unsniff. Is it any better than Ethereal/WireShark at this sort of
    > > diagnosis? Is there a version for the Mac or Linux? Is there a command
    > > line version for calling it from scripts?

    >
    > You are welcome to download it and try it.

    [snip]
    > Unfortunately, there is no version for the Mac or Linux.


    Thanks, it looks good, but I'd only be interested in running it on the
    Mac or Linux, or *anything* other than Windows in fact. Conducting any
    kind of network security monitoring from a Windows machine is a bit like
    going to war in a tank made of wet cardboard. Or to put it another way,
    if you're going to defend the lambs from the wolf, you don't appoint one
    of the lambs to do the job.

    > You could try to plot TCP NAT load placed by the XP box = Num open
    > connections
    > UDP NAT = num sessions (ie. unique UDP tuples ) You could also try to
    > plot number of active connections Vs number of silent connections.
    > The tricky part is getting these timeout values, so you can either
    > use one from Cisco's website or just model them as another variable.
    > Maybe you can observe how the behavior varies with different timeouts.


    Phew, that's a fair amount of work. It's an interesting project, but I
    think I'll have to make a judgement about how much time to spend on
    this. I need to focus on what's important, and the main thing is to
    understand whether the problem could happen again, and how to cure it
    when it does, or how to avoid it in the first place. I therefore need to
    establish whether the ISP was culpable, and to what degree the router
    failed in its duty.

    --
    James Taylor

  10. Re: Inexplicable sudden halt

    VivekRajan wrote:

    > James Taylor wrote:
    >
    > > I'm suspicious that the RSTs may have been spoofed by something
    > > rather than being genuine replies from remote hosts.

    >
    > A firewall may send RSTs. I doubt if a NAT router will take that route
    > especially under low resource conditions.


    I wasn't thinking of the router, I was thinking of the ISP. In fact I'm
    becoming increasingly suspicious of the ISP the more I think about this
    problem.

    --
    James Taylor
