Linux ppp + MegaPOP dialup change = mrru related LCP timeout - PPP



  1. Linux ppp + MegaPOP dialup change = mrru related LCP timeout




    I have been using Newsguy's dialup ISP service for some time now and
    have been happy with it, until last month when my Linux box could no
    longer establish a ppp connection. The symptom was that pppd errored
    out with "LCP: timeout sending Config-Requests". I had been running
    pppd version 2.4.1; upgrading to 2.4.3 did not help.

    The strange thing is that I can connect to the Atlanta numbers
    (e.g., 678-538-1522) without issue, but the problem does show up on
    the Augusta, Georgia line (706-849-0578). The Augusta line is on the
    MegaPOP network which is owned by Starnet (www.starnetinc.com).

    So, I enabled my pppd's debug option to see what was going on. With
    the Atlanta number, all is well:


    rcvd [LCP ConfReq id=0x1
    ]
    sent [LCP ConfAck id=0x1
    ]


    But, there is a problem with the Augusta number:


    sent [LCP ConfReq id=0x1 ]
    rcvd [LCP ConfReq id=0x1

    ]
    sent [LCP ConfRej id=0x1 ]


    After which point the host (the peer, from my machine's perspective) seems to
    ignore my machine's ConfRej message and simply reissues its previous
    ConfReq options until my machine times out. I am not sure if the ppp
    software these ISPs use is even capable of full negotiation (that
    might be too much to ask). However, unless they do negotiate, they
    should not default to multilink operation (the mrru option above), which
    is for use with multiple modems (so as to get a ppp multiline
    connection faster than 56Kbps).

    The behavior of the Augusta ppp host seems to me to be a violation
    of ppp standards. Specifically, section 5.1.1 of RFC1990
    (http://www.ietf.org/rfc/rfc1990.txt) states:


    The presence of this [mrru] LCP option indicates that the system
    sending it implements the PPP Multilink Protocol. If not rejected,
    the system will construe all packets received on this link as being
    able to be processed by a common protocol machine with any other
    packets received from the same peer on any other link on which
    this option has been accepted.

  2. Re: Linux ppp + MegaPOP dialup change = mrru related LCP timeout

    In comp.protocols.ppp Michael Shell wrote:

    ....

    > sent [LCP ConfReq id=0x1 ]
    > rcvd [LCP ConfReq id=0x1
    >
    > ]
    > sent [LCP ConfRej id=0x1 ]


    > After which point the host (peer from my machine's perspective) seems to
    > ignore my machine's ConfRej message and simply reissues its previous
    > ConfReq options until my machine times-out.


    ....

    > 4. To see if there are any workarounds (which may involve pppd code hacks).


    > For the record, my /etc/ppp/options contains the lines:


    > nodetach
    > modem
    > crtscts
    > defaultroute
    > asyncmap 0
    > mtu 1500
    > mru 1500
    > noipdefault
    > lock
    > noauth
    > usepeerdns
    > noccp
    > lcp-echo-interval 30
    > lcp-echo-failure 4
    > noipx



    > I am running Linux kernel 2.6.8.1 with the options:


    > <*> PPP (point-to-point protocol) support
    > [ ] PPP multilink support (EXPERIMENTAL)
    > [*] PPP filtering
    > <*> PPP support for async serial ports
    > < > PPP support for sync tty ports
    > < > PPP Deflate compression
    > < > PPP BSD-Compress compression
    > < > PPP over Ethernet (EXPERIMENTAL)


    Try compiling multilink support above and then use the multilink
    option; it *might* be a workaround since my ISP, also via a regular
    landline, will negotiate MP and be happy with just one MP connection.

    I won't try to give detailed answers to your other questions. But the
    ISP's PPP implementation is simply broken in my eyes. My ISP will
    also accept a Configure-Reject of mrru and complete PPP negotiations.
    In addition it will complete negotiations when using the nomultilink
    option. I believe this is as it should be and that, generally, all
    your conclusions are correct.

    --
    Clifford Kite Email: "echo xvgr_yvahk-ccc@ri1.arg|rot13"
    PPP-Q&A links, downloads: http://ckite.no-ip.net/
    /* The generation of random numbers is too important to be left
    to chance. */

  3. Re: Linux ppp + MegaPOP dialup change = mrru related LCP timeout

    In article news:20050106004854.282a69ec@bashir, Michael Shell wrote:
    [...]
    > sent [LCP ConfReq id=0x1
    > ]
    > rcvd [LCP ConfReq id=0x1
    >
    > ]
    > sent [LCP ConfRej id=0x1 ]
    >
    >
    > After which point the host (peer from my machine's perspective) seems
    > to ignore my machine's ConfRej message and simply reissues its
    > previous ConfReq options until my machine times-out. I am not sure if

    [...]
    Isn't there a possibility that a zero ACCM can't be used here? This POP
    asks for 0x0A0000 whereas the other one asks for zero. It could be that
    your Config-Reject is getting lost because of ACCM problems and that is
    why the peer appears to be ignoring it. Does the peer respond to your
    Config-Request?


    [...]
    > One final rant is the irony of knowing that a lot of these ISP's were
    > built using Linux - at least they should return the favor by testing
    > their systems with Linux clients and not relying on the fact that
    > MS Windows clients can login as proof that everything is OK.
    >

    Won't it be funny if it turns out to be an ACCM issue?
    --
    Alan J. McFarlane
    http://www.alanjmcf.me.uk/
    Please follow-up in the newsgroup for the benefit of all.



  4. Re: Linux ppp + MegaPOP dialup change = mrru related LCP timeout

    In comp.protocols.ppp Alan McFarlane wrote:
    > In article news:20050106004854.282a69ec@bashir, Michael Shell wrote:
    > [...]
    >> sent [LCP ConfReq id=0x1
    >> ]
    >> rcvd [LCP ConfReq id=0x1
    >>
    >> ]
    >> sent [LCP ConfRej id=0x1 ]
    >>
    >>
    >> After which point the host (peer from my machine's perspective) seems
    >> to ignore my machine's ConfRej message and simply reissues its
    >> previous ConfReq options until my machine times-out. I am not sure if

    > [...]
    > Isn't there a possibility that a zero ACCM can't be used here. This POP
    > asks for 0x0A0000 whereas the other one asks for zero. It could be that
    > your Config-Reject is getting lost because of ACCM problems and that is
    > why the peer appears to be ignoring it. Does the peer respond to your
    > Config-Request?


    It's not an ACCM problem. At this point ACCM has not been negotiated
    and all Control Characters are escaped.
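
    For anyone who wants to see what "escaped" means on the wire, here is a
    rough Python sketch of the octet stuffing used by PPP async framing
    (RFC 1662). The function names are made up for illustration, and the
    sketch leaves out the flag octets and the FCS. It also shows which
    control characters an ACCM of 0xa0000 actually asks for.

```python
FLAG = 0x7E      # frame delimiter
ESCAPE = 0x7D    # control-escape octet
DEFAULT_ACCM = 0xFFFFFFFF  # before negotiation: escape every octet below 0x20


def escape_payload(data: bytes, accm: int = DEFAULT_ACCM) -> bytes:
    """Octet-stuff frame contents for async PPP framing (no flags, no FCS)."""
    out = bytearray()
    for octet in data:
        in_accm = octet < 0x20 and (accm >> octet) & 1
        if in_accm or octet in (FLAG, ESCAPE):
            # Escaped octets are sent as 0x7D followed by the octet XOR 0x20.
            out += bytes([ESCAPE, octet ^ 0x20])
        else:
            out.append(octet)
    return bytes(out)


def escaped_octets(accm: int) -> list:
    """List the control characters that a given ACCM asks to have escaped."""
    return [n for n in range(32) if (accm >> n) & 1]


# Address, control and protocol octets at the start of an LCP frame.
header = bytes([0xFF, 0x03, 0xC0, 0x21])
print(escape_payload(header).hex())           # default map: 0x03 becomes 7d 23
print(escape_payload(header, accm=0).hex())   # zero map: sent unchanged
print([hex(n) for n in escaped_octets(0x000A0000)])  # 0x11 and 0x13 (XON/XOFF)
```

    So 0xa0000 only asks for XON and XOFF to be escaped, which fits with
    the asyncmap setting making no difference to the failure.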

    --
    Clifford Kite Email: "echo xvgr_yvahk-ccc@ri1.arg|rot13"
    PPP-Q&A links, downloads: http://ckite.no-ip.net/
    /* 97.3% of all statistics are made up. */

  5. Re: Linux ppp + MegaPOP dialup change = mrru related LCP timeout

    In article news:nc8krc.218.ln@corncob.inetport.com, Clifford Kite wrote:
    > In comp.protocols.ppp Alan McFarlane
    > wrote:
    >> In article news:20050106004854.282a69ec@bashir, Michael Shell wrote:
    >> [...]

    [...]
    > It's not an ACCM problem. At this point ACCM has not been negotiated
    > and all Control Characters are escaped.
    >

    Ahh yes, apologies.
    --
    Alan J. McFarlane
    http://www.alanjmcf.me.uk/
    Please follow-up in the newsgroup for the benefit of all.



  6. Re: Linux ppp + MegaPOP dialup change = mrru related LCP timeout

    On Thu, 6 Jan 2005 11:43:07 -0600
    Clifford Kite wrote:


    > Try compiling multilink support above and then use the multilink
    > option; it *might* be a workaround since my ISP, also via a regular
    > landline, will negotiate MP and be happy with just one MP connection.



    Thanks for the help, trying this, I get:


    CONNECT 45333/ARQ/V90/LAPM/V42BIS

    Connected!
    Serial connection established.
    using channel 1
    Starting negotiation on /dev/modem
    sent [LCP ConfReq id=0x1 ]
    rcvd [LCP ConfReq id=0x1 ]
    sent [LCP ConfAck id=0x1 ]
    sent [LCP ConfReq id=0x1 ]
    sent [LCP ConfReq id=0x1 ]
    sent [LCP ConfReq id=0x1 ]
    sent [LCP ConfReq id=0x1 ]
    sent [LCP ConfReq id=0x1 ]
    rcvd [LCP ConfReq id=0x2 ]
    sent [LCP ConfAck id=0x2 ]
    sent [LCP ConfReq id=0x1 ]
    sent [LCP ConfReq id=0x1 ]
    sent [LCP ConfReq id=0x1 ]
    sent [LCP ConfReq id=0x1 ]
    LCP: timeout sending Config-Requests
    Connection terminated.



    What gets me is that the host never seems to alter its behavior
    regardless of what my machine sends - ConfRej or ConfAck. It is as
    if it never sees any of the data sent from my machine. I tried the
    asyncmap 0xa0000 option just for the heck of it, but it did not help.

    I note that the ppp standard has a lot of conditions for silently
    dropping packets. Could it be that they have a buggy ppp host that
    sees all Linux pppd generated LCP packets as being invalid? I have
    no idea how robust LCP packets are (7 bit, etc.). If so, I wonder
    how MS Windows does it differently.

    Another possibility is a modem firmware problem. That is, *after* my
    particular modem connects, their end never sees any of the data sent
    from my modem. I am using a TI chipset based 56Kbps hardware modem
    which I have never had a problem with. I can connect to my backup ISP
    without trouble, but with that Augusta number I have not been able to
    connect for a month (I would think that I would eventually get a "good"
    modem after dozens of tries). I tried connecting at 14.4Kbps, but this
    did not change anything. I even had the gall to bring my now ancient
    Hayes 2400 external modem out of mothball, but I could hear from the
    tones that modern modems have long since forgotten about the pre-14.4K
    days. (IMHO, 9600bps was the last time everything worked as it should.)
    Of course, I would be able to check this with minicom if they still
    offered a "login: " prompt (which they don't).

    Yet another possibility is something related to this bogus "high speed
    dialup" (aka the AOL runner) feature everyone is offering. I sure hope
    that they do not require special bits for this to be sent during ppp
    negotiation.

    Now I am beginning to wonder if what they told me about MS Windows XP
    clients being able to connect is really true. Maybe that line is
    totally hosed and they are covering it up.


    Mike Shell

  7. Re: Linux ppp + MegaPOP dialup change = mrru related LCP timeout

    In comp.protocols.ppp Michael Shell wrote:
    > On Thu, 6 Jan 2005 11:43:07 -0600
    > Clifford Kite wrote:


    >> Try compiling multilink support above and then use the multilink
    >> option; it *might* be a workaround since my ISP, also via a regular
    >> landline, will negotiate MP and be happy with just one MP connection.


    ....

    > sent [LCP ConfReq id=0x1 ]
    > sent [LCP ConfReq id=0x1 ]
    > sent [LCP ConfReq id=0x1 ]
    > sent [LCP ConfReq id=0x1 ]
    > LCP: timeout sending Config-Requests
    > Connection terminated.


    > What gets me is that the host never seems to alter its behavior
    > regardless of what my machine sends - ConfRej or ConfAck. It is as
    > if it never sees any of the data sent from my machine. I tried the
    > asyncmap 0xa0000 option just for the heck of it, but it did not help.


    Okay, I focused on MP because it appears you use the same host and
    device file for each connection. There is only one other thing I know
    about that can cause the peer not to "hear" any of your LCP requests,
    given that a good serial connection is established and knowing that
    pppd is sending and receiving valid LCP requests.

    If the type of UART configured for the device file differs from the
    actual UART type then that would cause the problem. I still don't
    see how it's possible in this case since you can connect to the other
    POP, and seem to have no problem connecting to both until recently.
    But that's all I have left to suggest.

    (An FYI - the most common UART is a 16550A and configuring the device
    file for a 16550 won't work even though the package/manual for the
    serial device may say 16550. The UART type can be changed using the
    setserial program.)

    > I note that the ppp standard has a lot of conditions for silently
    > dropping packets. Could it be that they have a buggy ppp host that
    > sees all Linux pppd generated LCP packets as being invalid? I have
    > no idea how robust LCP packets are (7 bit, etc.). If so, I wonder
    > how MS Windows does it differently.


    I'm no longer sure it's a buggy peer. PPP is a standard and though
    PPP implementations vary they should be compatible enough to provide
    a connection (cell-phones excepted).

    > Another possibility is a modem firmware problem. That is, *after* my
    > particular modem connects, their end never sees any of the data sent
    > from my modem.


    Since it was able to connect to the troublesome POP previously, I don't
    see how firmware could be the problem unless something broke. I'd expect
    the other POP connection would also fail if that happened.

    ....

    > Yet another possibility is something related to this bogus "high speed
    > dialup" (aka the AOL runner) feature everyone is offering. I sure hope
    > that they do not require special bits for this to be sent during ppp
    > negotiation.


    I *think* that is accomplished with a server at the ISP that caches
    web pages and client software provided by the ISP to MS clients.

    --
    Clifford Kite Email: "echo xvgr_yvahk-ccc@ri1.arg|rot13"
    PPP-Q&A links, downloads: http://ckite.no-ip.net/

  8. Re: Linux ppp + MegaPOP dialup change = mrru related LCP timeout



    OK, I decided to boot with MS Windows 2000 (same machine) and see if I
    could connect with that. Indeed, I could - the byte-level details of the
    log file are at the end of this post.

    Manually decoding the bytes in the MS log to a pppd-like format, I came up
    with this:


    sent [LCP ConfReq id=0x00 len=0x32 ]
    sent [LCP ConfReq id=0x01 len=0x32 ]
    rcvd [LCP ConfReq id=0x01 len=0x2c ]
    sent [LCP ConfAck id=0x01 len=0x2c ]
    rcvd [LCP ConfRej id=0x01 len=0x07 ]
    sent [LCP ConfReq id=0x02 len=0x2f ]
    rcvd [LCP ConfAck id=0x02 len=0x2f ]


    What the heck is going on?! This is the exact same hardware, so now I
    don't think it is a modem firmware issue. The 0D 03 06 LCP option
    from Windows 2000 is strange. My, possibly incorrect, interpretation of
    this is that it is the callback option (0x0d=13) of Section 2.3 of RFC1570.
    However, the operation code of 6 is strange in that RFC1570 only lists
    up to number 4. Furthermore, why in the heck would MS Windows be requesting
    a callback anyway?! The host does wake up to it and reject it, after which
    all is well. I have no idea if pppd can be configured to issue this
    strange option - I would try it if I could.
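
    For anyone who wants to repeat the manual decoding, something like the
    following Python sketch will do it. It is only an illustration: the
    function name and the option-name table are my own, only the option
    types seen in this log are named, and values are left as raw hex rather
    than pretty-printed the way pppd does. The input bytes are copied from
    the Windows log at the end of this post.

```python
import struct

# Only the option types that appear in this thread's log; names are informal.
OPTION_NAMES = {
    0x01: "mru",
    0x02: "asyncmap",
    0x03: "auth",
    0x05: "magic",
    0x07: "pcomp",
    0x08: "accomp",
    0x0D: "callback",
    0x11: "mrru",
    0x13: "endpoint",
}
CODE_NAMES = {1: "ConfReq", 2: "ConfAck", 3: "ConfNak", 4: "ConfRej"}


def decode_lcp(frame: bytes) -> str:
    """Render an LCP Configure-* packet (starting at protocol C0 21) pppd-style."""
    if frame[:2] != b"\xc0\x21":
        raise ValueError("not an LCP packet")
    code, ident, length = struct.unpack("!BBH", frame[2:6])
    # The length field counts code, id and length, but not the protocol octets,
    # so anything past 2 + length is padding from the log dump.
    options = frame[6:2 + length]
    parts = []
    pos = 0
    while pos + 2 <= len(options):
        opt_type, opt_len = options[pos], options[pos + 1]
        if opt_len < 2:
            break  # malformed option; stop rather than loop forever
        value = options[pos + 2:pos + opt_len]
        name = OPTION_NAMES.get(opt_type, "opt-0x%02x" % opt_type)
        parts.append("<%s %s>" % (name, value.hex()) if value else "<%s>" % name)
        pos += opt_len
    return "[LCP %s id=0x%x %s]" % (CODE_NAMES.get(code, code), ident, " ".join(parts))


# The Configure-Request received from the Augusta host.
received = bytes.fromhex(
    "C0 21 01 01 00 2C 01 04 05 DD 02 06 00 0A 00 00 "
    "03 04 C0 23 05 06 48 BB E1 42 07 02 08 02 11 04 "
    "05 F4 13 0C 01 77 64 63 34 2D 6C 6E 73 31 00 00 "
)
# The Configure-Reject the host sent back for the callback option.
rejected = bytes.fromhex("C0 21 04 01 00 07 0D 03 06")

print(decode_lcp(received))
print(decode_lcp(rejected))
```

    The mrru value 05f4 is 1524 decimal and the mru value 05dd is 1501,
    which matches the "LCP Remote Options" summary in the log.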

    The $10,000 question is why does the host seem to see the Windows 2000
    generated LCP packets, but not those from Linux's pppd? Remember, I can
    connect to other numbers just fine under Linux with the same setup,
    options and dialscripts, so the serial line/modem cannot be broken.


    I tried using a pppd option:

    endpoint local:1c.79.3b.b1.2d.8c.47.d0.9b.fc.a8.ca.50.78.98.e9.00.00.00.00

    so as to more closely mimic MS Windows, but the host didn't respond any
    differently to it. Ditto for resetting the modem to factory defaults and
    trying the same mrru (1614) as MS Windows.

    I only see two possibilities:

    1. Something is going wrong at the byte level that causes the host
    to silently drop pppd's ConfAcks and ConfRejs. I am assuming that my
    pppd would put something in the debug output if it received and
    dropped something improper from the host. I want to look at the
    byte-level conversation between pppd and the host to see if anything
    differs from the LCP bytes MS Windows sends. What is the best way to
    eavesdrop on the conversation that flows through /dev/modem?

    2. That callback 0x06 invokes some special MS witchcraft.



    What a creepy situation!


    Mike




    Windows 2000 ppp log file details are as follows:
    -----
    ..
    ..
    [1072] 20:13:57:356: [1072] 20:13:57:356: [1072] 20:13:57:356: [1072] 20:13:57:356: <15 8E 07 02 08 02 0D 03 06 11 04 06 4E 13 17 01 |............N...|
    [1072] 20:13:57:356: <1C 79 3B B1 2D 8C 47 D0 9B FC A8 CA 50 78 98 E9 |.y;.-.G.....Px..|
    [1072] 20:13:57:356: <00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
    [1072] 20:13:57:356:
    [1072] 20:13:57:356: InsertInTimerQ called portid=0,Id=0,Protocol=c021,EventType=0,fAuth=0
    [1072] 20:13:57:356: InsertInTimerQ called portid=0,Id=0,Protocol=0,EventType=3,fAuth=0
    [1072] 20:13:59:359: Recv timeout event received for portid=0,Id=0,Protocol=c021,fAuth=0
    [1072] 20:13:59:359: NotifyCaller(hPort=5, dwMsgId=9)
    [1072] 20:13:59:359: [1072] 20:13:59:359: [1072] 20:13:59:359: [1072] 20:13:59:359: <15 8E 07 02 08 02 0D 03 06 11 04 06 4E 13 17 01 |............N...|
    [1072] 20:13:59:359: <1C 79 3B B1 2D 8C 47 D0 9B FC A8 CA 50 78 98 E9 |.y;.-.G.....Px..|
    [1072] 20:13:59:359: <00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
    [1072] 20:13:59:359:
    [1072] 20:13:59:359: InsertInTimerQ called portid=0,Id=1,Protocol=c021,EventType=0,fAuth=0
    [1016] 20:13:59:509: Packet received (46 bytes) for hPort 5
    [1072] 20:13:59:509: >PPP packet received at 01/08/2005 01:13:59:509
    [1072] 20:13:59:509: >Protocol = LCP, Type = Configure-Req, Length = 0x2e, Id = 0x1, Port = 5
    [1072] 20:13:59:509: >C0 21 01 01 00 2C 01 04 05 DD 02 06 00 0A 00 00 |.!...,..........|
    [1016] 20:13:59:519: Packet received (9 bytes) for hPort 5
    [1072] 20:13:59:509: >03 04 C0 23 05 06 48 BB E1 42 07 02 08 02 11 04 |...#..H..B......|
    [1072] 20:13:59:509: >05 F4 13 0C 01 77 64 63 34 2D 6C 6E 73 31 00 00 |.....wdc4-lns1..|
    [1072] 20:13:59:519:
    [1072] 20:13:59:519: [1072] 20:13:59:519: [1072] 20:13:59:519: [1072] 20:13:59:519: <03 04 C0 23 05 06 48 BB E1 42 07 02 08 02 11 04 |...#..H..B......|
    [1072] 20:13:59:519: <05 F4 13 0C 01 77 64 63 34 2D 6C 6E 73 31 00 00 |.....wdc4-lns1..|
    [1072] 20:13:59:519:
    [1072] 20:13:59:519: >PPP packet received at 01/08/2005 01:13:59:519
    [1072] 20:13:59:519: >Protocol = LCP, Type = Configure-Reject, Length = 0x9, Id = 0x1, Port = 5
    [1072] 20:13:59:519: >C0 21 04 01 00 07 0D 03 06 00 00 00 00 00 00 00 |.!..............|
    [1072] 20:13:59:519:
    [1072] 20:13:59:519: RemoveFromTimerQ called portid=0,Id=1,Protocol=c021,EventType=0,fAuth=0
    [1072] 20:13:59:519: [1072] 20:13:59:519: [1072] 20:13:59:519: [1072] 20:13:59:519: <15 8E 07 02 08 02 11 04 06 4E 13 17 01 1C 79 3B |.........N....y;|
    [1072] 20:13:59:519: [1072] 20:13:59:519: <00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
    [1072] 20:13:59:519:
    [1072] 20:13:59:519: InsertInTimerQ called portid=0,Id=2,Protocol=c021,EventType=0,fAuth=0
    [1016] 20:13:59:700: Packet received (49 bytes) for hPort 5
    [1072] 20:13:59:700: >PPP packet received at 01/08/2005 01:13:59:700
    [1072] 20:13:59:700: >Protocol = LCP, Type = Configure-Ack, Length = 0x31, Id = 0x2, Port = 5
    [1072] 20:13:59:700: >C0 21 02 02 00 2F 02 06 00 00 00 00 05 06 3A 0B |.!.../........:.|
    [1072] 20:13:59:700: >15 8E 07 02 08 02 11 04 06 4E 13 17 01 1C 79 3B |.........N....y;|
    [1072] 20:13:59:700: >B1 2D 8C 47 D0 9B FC A8 CA 50 78 98 E9 00 00 00 |.-.G.....Px.....|
    [1072] 20:13:59:700: >00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
    [1072] 20:13:59:700:
    [1072] 20:13:59:700: RemoveFromTimerQ called portid=0,Id=2,Protocol=c021,EventType=0,fAuth=0
    [1072] 20:13:59:700: FsmThisLayerUp called for protocol = c021, port = 5
    [1072] 20:13:59:700: LCP Local Options-------------
    [1072] 20:13:59:700: MRU=1500,ACCM=0,Auth=0,MagicNumber=973804942,PFC=ON,ACFC=ON
    [1072] 20:13:59:700: Recv Framing = PPP Multilink,SSHF=OFF,MRRU=1614,LinkDiscrim=0,BAP=OFF
    [1072] 20:13:59:700: ED Class = 1, ED Value = 1c793bb12d8c47d09bfca8ca507898e900000000
    [1072] 20:13:59:700: LCP Remote Options-------------
    [1072] 20:13:59:700: MRU=1501,ACCM=655360,Auth=c023,MagicNumber=1220272450,PFC=ON,ACFC=ON
    [1072] 20:13:59:700: Send Framing = PPP Multilink,SSHF=OFF,MRRU=1524,LinkDiscrim=0
    [1072] 20:13:59:700: ED Class = 1, ED Value = 776463342d6c6e73310000000000000000000000
    [1072] 20:13:59:700: LCP Configured successfully
    ..
    ..
    ---
    EOM

  9. Re: Linux ppp + MegaPOP dialup change = mrru related LCP timeout

    In comp.protocols.ppp Michael Shell wrote:

    > OK, I decided to boot with MS Windows 2000 (same machine) and see if I
    > could connect with that. Indeed, I could - the byte-level details of the
    > log file are at the end of this post.


    > Manually decoding the bytes in the MS log to a pppd-like format, I came up
    > with this:



    > sent [LCP ConfReq id=0x00 len=0x32 ]
    > sent [LCP ConfReq id=0x01 len=0x32 ]
    > rcvd [LCP ConfReq id=0x01 len=0x2c ]
    > sent [LCP ConfAck id=0x01 len=0x2c ]
    > rcvd [LCP ConfRej id=0x01 len=0x07 ]
    > sent [LCP ConfReq id=0x02 len=0x2f
    > rcvd [LCP ConfAck id=0x02 len=0x2f



    > What the heck is going on?! This is the exact same hardware, so now I
    > don't think it is a modem firmware issue. The 0D 03 06 LCP option
    > from Windows 2000 is strange. My, possibly incorrect, interpretation of
    > this is that it is the callback option (0x0d=13) of Section 2.3 of RFC1570.


    You are correct, it is the call-back option.

    > However, the operation code of 6 is strange in that RFC1570 only lists
    > up to number 4. Furthermore, why in the heck would MS Windows be requesting
    > a callback anyway?! The host does wakeup to it and reject it after which
    > all is well. I have no idea if pppd can be configured to issue this
    > strange option - I would try it if I could.


    It's a MS thing (used google to find this):

    http://www.microsoft.com/resources/d...dura_tools.asp

    Search the page for CBCP, read about the 6, then search for CBCP again
    for a section titled CBCP. You may be able to make more sense of it
    than I could. My read of it is that the 6 tells the peer that the
    request is to use MS CBCP to negotiate call-back after authentication.

    Pppd has an undocumented option named `callback' that might generate
    what the MS side of your host generated. But you'll likely have to
    edit pppd/Makefile in the pppd source, uncomment the second line below:

    # Enable Microsoft proprietary Callback Control Protocol
    #CBCP=y

    and recompile. I don't have it compiled into pppd here and so can't
    readily test the option. It may or may not take a call-back number
    as a value (callback ) - I'm not a PPP implementor and my C
    reading skill is low.

    > The $10,000 question is why does the host seem to see the Windows 2000
    > generated LCP packets, but not those from Linux's pppd? Remember, I can
    > connect to other numbers just fine under Linux with the same setup,
    > options and dialscripts, so the serial line/modem cannot be broken.


    I've been hoping that James Carlson would participate but his last
    post here was on Monday. He's a regular poster here and the one most
    likely to come up with an answer. It is indeed a "creepy" problem;
    I've never seen anything quite like it.

    BTW, your tenacity and your manual translation of raw hex to Linux
    PPP-ese are admirable. Also impressive is knowing what information
    could be useful in finding a solution to the problem and trying to
    imitate the MS requests before posting - sometimes that works.

    --
    Clifford Kite Email: "echo xvgr_yvahk-ccc@ri1.arg|rot13"
    PPP-Q&A links, downloads: http://ckite.no-ip.net/

  10. Re: Linux ppp + MegaPOP dialup change = mrru related LCP timeout




    Well, I finally solved the mystery - and it took some doing to uncover
    it. I used serial port sniffers under both Linux (slsnif) and
    Windows to see exactly what each was sending over the link.

    Everything looked great with each PPP frame, which just deepened the
    riddle.

    I even made sure that Linux was using the exact same modem reset and
    initialization strings that MS Windows was using - to no avail. BTW,
    for future reference, Clifford's advice on the pppd code did indeed
    allow me to enable the callback option under Linux, but
    unfortunately this did not change anything either.

    I then noticed that the bad host was not even sending me a TermAck to my
    TermReq when I used control-c to prematurely shut down a connect attempt.
    LCP Termination Requests are very simple packets - there isn't much that
    can go wrong with them.

    So I decided to take another look at my connect script. The final part
    of my chat script went like this:



    TIMEOUT 50 \
    SAY "\nWaiting for Connection..." \
    ECHO ON \
    "ONNEC" "\c" \
    "\n" "\r\n" \
    SAY "\nConnected!\n"



    What could possibly go wrong here you might ask? Plenty, if the PPP
    host has a framing parser so fragile that it cannot withstand a leading
    carriage return and/or line feed before the PPP negotiation sequence
    begins!

    That's right folks, the initial \r\n permanently broke the host's PPP
    frame receiver code! After that happens, you can send all the properly
    formed LCP packets you want and the host will never see any of them!
    But, it will continue to send out its own ConfReq's. The person who
    wrote that crappy PPP code ought to be run out of town.

    The reason I even had this initial newline in there is that some
    time in the past, some ISP's PPP or login code would not "wake up"
    until it received a CR or LF after connect.

    For the record, you can't do any of these:


    "\n" "\r\c" \
    "\n" "\n\c" \
    "\n" "\r\n\c" \


    However, you can do these:


    "\n" "\c" \
    "\n" "\N\c" \
    "\n" "\s\c" \


    So, nulls and spaces don't hang the receiver, but CR and/or LF do.

    I decided to put in a little delay for good measure, so:


    "\n" "\p\c" \


    is what I use now and all is well.
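
    To make it clearer what each of those send strings puts on the wire,
    here is a small Python sketch that approximates chat's handling of a
    reply string. It is a simplified model, not chat's actual code: the
    function name is made up and it only covers the escapes used in this
    post. The one thing to remember is that chat appends a carriage return
    to every reply unless the string ends with \c.

```python
def chat_send_bytes(reply: str) -> bytes:
    r"""Approximate the octets chat(8) transmits for one send string.

    Models only the escapes used in this thread: \r \n \N \s \p \c.
    """
    out = bytearray()
    add_cr = True          # chat appends a carriage return by default
    i = 0
    while i < len(reply):
        ch = reply[i]
        if ch == "\\" and i + 1 < len(reply):
            esc = reply[i + 1]
            i += 2
            if esc == "r":
                out.append(0x0D)   # carriage return
            elif esc == "n":
                out.append(0x0A)   # line feed
            elif esc == "N":
                out.append(0x00)   # null
            elif esc == "s":
                out.append(0x20)   # space
            elif esc == "p":
                pass               # a short pause; nothing goes on the wire
            elif esc == "c":
                add_cr = False     # suppress the trailing carriage return
            else:
                out += esc.encode("latin-1")  # escape not modelled here: keep the character
        else:
            out += ch.encode("latin-1")
            i += 1
    if add_cr:
        out.append(0x0D)
    return bytes(out)


for reply in (r"\r\n", r"\r\c", r"\n\c", r"\c", r"\N\c", r"\s\c", r"\p\c"):
    print("%-8s -> %r" % (reply, chat_send_bytes(reply)))
```

    Under this model the original "\r\n" reply sends CR LF CR, while "\c"
    and "\p\c" send nothing at all before pppd takes over the line.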


    I am pretty sure this will help somebody out in the future. Can you
    imagine what those poor souls with modems that happen to output a
    spurious new line just after initial connect will go through?!

    I'd sure like to know the name and version of this fragile PPP
    software so that people can be warned about it. Geeezzzz.



    Thanks for all your help and advice,

    Mike Shell


  11. Re: Linux ppp + MegaPOP dialup change = mrru related LCP timeout

    Michael Shell writes:
    > What could possibly go wrong here you might ask? Plenty, if the PPP
    > host has a framing parser so fragile that it cannot withstand a leading
    > carriage return and/or line feed before the PPP negotiation sequence
    > begins!


    Typically, dial-in servers attempt to detect what protocol the peer is
    using automatically. If the server sees a carriage return, then it
    assumes that it's a human at a regular tty, not a machine using PPP.

    It's obviously not the best way to do things. A better way than this
    is to spit out a text message welcoming the user (which will just be
    discarded by any PPP-speaking peer), but _continuously_ look for PPP
    data on input and switch modes when appropriate, rather than switching
    on the first one or two characters. Doing that right takes a little
    more than a minute's thought, though, so it's often not done.
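
    As a rough illustration of the difference, here is a Python sketch of
    the two approaches. Everything in it is hypothetical: the names are
    made up, and it is not taken from any real dial-in server. It only
    relies on the fact that an async PPP LCP frame starts with the flag
    0x7E and address 0xFF, followed by the control octet 0x03 either
    escaped (7D 23) or sent raw.

```python
# Start of an async-HDLC LCP frame: flag, address 0xFF, then control 0x03
# either escaped (7D 23) or sent raw.
PPP_SIGNATURES = (b"\x7e\xff\x7d\x23", b"\x7e\xff\x03")


def detect_first_octet(data: bytes) -> str:
    """Fragile scheme: decide once, based on the first octet received."""
    if not data:
        return "unknown"
    return "ppp" if data[0] == 0x7E else "tty"


class ContinuousDetector:
    """Keep watching the input and switch to PPP whenever a frame start shows up."""

    def __init__(self) -> None:
        self.mode = "tty"
        self._tail = b""   # last few octets, so a signature split across reads is found

    def feed(self, chunk: bytes) -> str:
        if self.mode != "ppp":
            window = self._tail + chunk
            if any(sig in window for sig in PPP_SIGNATURES):
                self.mode = "ppp"
            # The longest signature is 4 octets, so 3 octets of history suffice.
            self._tail = window[-3:]
        return self.mode


lcp_start = bytes.fromhex("7e ff 7d 23 c0 21 7d 21")
stream = b"\r\n" + lcp_start        # stray CR LF, then a Configure-Request begins

print(detect_first_octet(stream))          # the leading CR decides it: tty

detector = ContinuousDetector()
print(detector.feed(b"\r\n\x7e\xff"))      # nothing conclusive yet
print(detector.feed(b"\x7d\x23\xc0\x21"))  # signature completed across two reads
```

    The first scheme is stuck in tty mode after a single stray CR, while
    the second recovers as soon as a real frame arrives.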

    Plus, there's the Windows-effect to consider: most ISP equipment these
    days is designed for the least-common-denominator. If it works with
    Windows DUN, then that's "good enough." It doesn't have to work well
    anywhere else.

    (The same is unfortunately true of a lot of consumer gear these days.)

    > That's right folks, the initial \r\n permanently broke the host's PPP
    > frame receiver code! After that happens, you can send all the properly
    > formed LCP packets you want and the host will never see any of them!
    > But, it will continue to send out its own ConfReq's. The person who
    > wrote that crappy PPP code outta be run out of town.


    I don't think it's the server that's bad. The chat script was bad.

    > I'd sure like to know the name and version of this fragile PPP
    > software so that people can be warned about it. Geeezzzz.


    Ask the ISP. But it's likely that it's one of the many commercial
    versions, and you just suffered from having a bad chat script.

    --
    James Carlson, IP Systems Group
    Sun Microsystems / 1 Network Drive 71.234W Vox +1 781 442 2084
    MS UBUR02-212 / Burlington MA 01803-2757 42.497N Fax +1 781 442 1677

  12. Re: Linux ppp + MegaPOP dialup change = mrru related LCP timeout

    On 20 Jan 2005 09:21:33 -0500
    James Carlson wrote:

    > Typically, dial-in servers attempt to detect what protocol the peer is
    > using automatically. If the server sees a carriage return, then it
    > assumes that it's a human at a regular tty, not a machine using PPP.



    That I can understand, but remember that the ISP in question continued
    to send valid PPP ConfReq requests, but ignored all my PPP ConfAck
    responses. Something is obviously broken on their end. There is no text
    based login with their system. The very reason ISPs went to pure PPP login
    and skipped the text based login altogether is because of the difficulties
    of handling tech support for all the other different types of
    login/Login/username, text based configurations. Going deaf after the first
    CRLF kind of defeats the purpose of the default PPP approach because it is,
    IMHO, unsafe to trust the first few characters after the initial connect -
    there is always the possibility that the client will still be chatting with
    the modem or the modem itself may issue a CR at first connect (I've never
    personally seen this, but it would not surprise me in the least if some
    modems did just that). The PPP protocol was designed to handle all types
    of these kinds of initial missteps.


    > I don't think it's the server that's bad. The chat script was bad.



    I agree that my end did something that it should not have. However, remember
    that the ISP continued to send valid ConfReq requests - and so this is a
    PPP protocol issue (because it happened within PPP negotiation) and I
    don't think the ISP is allowed to do this according to the PPP standards -
    invalid PPP data should be silently discarded and then one should resume
    scanning for valid PPP config requests - the latter of which was not done.



    Mike Shell

  13. Re: Linux ppp + MegaPOP dialup change = mrru related LCP timeout

    Michael Shell writes:

    > On 20 Jan 2005 09:21:33 -0500
    > James Carlson wrote:
    >
    > > Typically, dial-in servers attempt to detect what protocol the peer is
    > > using automatically. If the server sees a carriage return, then it
    > > assumes that it's a human at a regular tty, not a machine using PPP.

    >
    >
    > That I can understand, but remember that the ISP in question continued
    > to send valid PPP ConfReq requests, but ignored all my PPP ConfAck
    > responses. Something is obviously broken on their end.


    If I remember correctly, your LCP negotiation started off strangely,
    with one side (probably theirs) suggesting an asyncmap (ACCM) of
    0xa0000, and the other (probably yours) suggesting 0. That's
    technically legal per RFC 1662, but is often in practice a good
    indicator of bugs in the peer implementation, and usually results in a
    failure to negotiate that's remarkably similar to what you saw.

    The fix is to add "asyncmap 0xa0000" to your configuration, after some
    obligatory swearing at the people who built the bad implementation.
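
    For readers unfamiliar with the option: the ACCM (pppd's "asyncmap") is a
    32-bit mask in which bit N marks control character N as one that must be
    escaped on the wire. The sketch below is not from the thread; the helper
    name accm_chars is made up for illustration. It only shows which
    characters a given mask selects.

```python
# Illustrative only: decode a PPP ACCM (asyncmap) bitmask into the
# control characters (0x00-0x1f) it marks for escaping.
# accm_chars is a hypothetical helper name, not part of pppd.

def accm_chars(accm):
    """Return the control-character codes whose bit is set in the ACCM."""
    return [code for code in range(32) if accm & (1 << code)]


if __name__ == "__main__":
    for value in (0xA0000, 0x0, 0xFFFFFFFF):
        codes = [hex(c) for c in accm_chars(value)]
        print(f"asyncmap {value:#x}: {codes}")
```

    A mask of 0xa0000 sets bits 17 and 19, which are DC1 (XON, 0x11) and
    DC3 (XOFF, 0x13), so it typically means the peer wants the software
    flow-control characters escaped.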

    > There is no text
    > based login with their system. The very reason ISPs went to pure PPP login
    > and skipped the text based login altogether is because of the difficulties
    > of handling tech support for all the other different types of
    > login/Login/username, text based configurations.


    Sure.

    > Going deaf after the first
    > CRLF kind of defeats the purpose of the default PPP approach because it is,
    > IMHO, unsafe to trust the first few characters after the initial connect -
    > there is always the possibility that the client will still be chatting with
    > the modem or the modem itself may issue a CR at first connect (I've never
    > personally seen this, but it would not surprise me in the least if some
    > modems did just that). The PPP protocol was designed to handle all types
    > of these kinds of initial missteps.


    I'm pretty sure I know something like that.

    > > I don't think it's the server that's bad. The chat script was bad.

    >
    >
    > I agree that my end did something that it should not have. However, remember
    > that the ISP continued to send valid ConfReq requests - and so this is a
    > PPP protocol issue (because it happened within PPP negotiation) and I
    > don't think the ISP is allowed to do this according to the PPP standards -
    > invalid PPP data should be silently discarded and then one should resume
    > scanning for valid PPP config requests - the latter of which was not done.


    I'm not sure I understand what you're saying here, and I don't see any
    specific error that is directly traceable to a violation of any of the
    standards.

    If the other side cannot hear your side due to communications errors
    (which is what I expect is going on here during the failure scenario),
    then it rightly should continue sending the same Configure-Request
    messages at each Restart timer expiry until the restart limit is
    reached.

    There's no way that any of the PPP documents can require the peer to
    do what it is unable to do. If the packets are getting garbled in
    transit (which I expect is true, given the symptoms), there's not much
    the peer can do but allow the connection to fail and hope the human
    can fix things.
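
    The retransmission behaviour described above can be sketched roughly as
    follows. This is a simplified illustration, not the full RFC 1661 state
    machine; the function name and the reply_at parameter are invented for
    the example, and the defaults (3 second Restart timer, Max-Configure of
    10) are the RFC 1661 defaults as I recall them.

```python
# Simplified sketch of Configure-Request retransmission: resend at each
# Restart timer expiry until a usable reply arrives or the counter runs out.
# retransmit_schedule and reply_at are hypothetical names for illustration.

def retransmit_schedule(reply_at=None, restart_timer=3.0, max_configure=10):
    """Return (send_times, outcome) for one negotiation attempt.

    reply_at is the time at which a reply is actually heard from the peer,
    or None if nothing the peer sends ever gets through.
    """
    sent_at = []
    now = 0.0
    for _ in range(max_configure):
        sent_at.append(now)
        # A reply heard before the next timer expiry stops the retransmits.
        if reply_at is not None and reply_at < now + restart_timer:
            return sent_at, "reply received"
        now += restart_timer
    return sent_at, "timeout sending Config-Requests"


if __name__ == "__main__":
    print(retransmit_schedule())              # peer never hears us
    print(retransmit_schedule(reply_at=4.0))  # reply after the second send
```

    Under these assumptions a side that never hears a reply sends ten
    identical requests over about half a minute and then gives up, which
    matches the general shape of the failure reported in this thread.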

    Now if the peer is switching the ACCM too early (before LCP is in
    Opened state) or if the implementor confused the transmit and receive
    directions for the escaping logic (altogether *way* too common), then
    that's indeed an implementation bug. The real issue, though, is the
    lack of interoperability, not the conformance (or lack thereof) with
    respect to the standards.

    --
    James Carlson, IP Systems Group
    Sun Microsystems / 1 Network Drive 71.234W Vox +1 781 442 2084
    MS UBUR02-212 / Burlington MA 01803-2757 42.497N Fax +1 781 442 1677

  14. Re: Linux ppp + MegaPOP dialup change = mrru related LCP timeout

    On 21 Jan 2005 13:32:36 -0500
    James Carlson wrote:

    > If the other side cannot hear your side due to communications errors
    > (which is what I expect is going on here during the failure scenario),
    > then it rightly should continue sending the same Configure-Request
    > messages at each Restart timer expiry until the restart limit is
    > reached.




    See, this is what is so surprising about the whole thing and why
    nobody, including myself, suspected this type of bug triggered by
    the chat script. The comm link was/is fine and without error. However,
    when a leading CR and/or LF is sent to the host at the start of
    PPP negotiations, the host receiver will "lock-up" and never
    be able to see any of my PPP Conf Requests or Acks from that
    point on. However, the host will continue to transmit its own
    valid Conf Requests - indicating clearly that it is trying to
    establish a PPP connection. I can watch the whole thing unfold
    at the byte level using a serial line sniffer and I can
    reproduce the problem at will by sending one LF just prior to
    end of the chat script - as well as avoiding the problem and
    getting a good PPP negotiation by removing the spurious LF.
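
    To make the expectation concrete, here is a minimal sketch of a receiver
    that behaves the way I read RFC 1662: anything between flags that is not
    a valid frame is dropped and scanning resumes at the next flag. It is
    not the ISP's code or pppd's; the function names are hypothetical, and
    it ignores receive-side ACCM filtering and address/control compression.

```python
# Minimal HDLC-like framing sketch (after RFC 1662), for illustration only.
FLAG, ESC, XOR = 0x7E, 0x7D, 0x20


def fcs16(data):
    """PPP 16-bit FCS (reflected polynomial 0x8408), complemented."""
    fcs = 0xFFFF
    for byte in data:
        fcs ^= byte
        for _ in range(8):
            fcs = (fcs >> 1) ^ 0x8408 if fcs & 1 else fcs >> 1
    return fcs ^ 0xFFFF


def encode_frame(payload):
    """Frame a payload, escaping every control character (default ACCM)."""
    fcs = fcs16(payload)
    body = payload + bytes([fcs & 0xFF, fcs >> 8])  # FCS goes out LSB first
    out = bytearray([FLAG])
    for byte in body:
        if byte < 0x20 or byte in (FLAG, ESC):
            out += bytes([ESC, byte ^ XOR])
        else:
            out.append(byte)
    out.append(FLAG)
    return bytes(out)


def decode_stream(stream):
    """Return valid payloads; silently discard junk and damaged frames."""
    frames = []
    for chunk in stream.split(bytes([FLAG])):
        body = bytearray()
        escaped = False
        for byte in chunk:
            if escaped:
                body.append(byte ^ XOR)
                escaped = False
            elif byte == ESC:
                escaped = True
            else:
                body.append(byte)
        if escaped or len(body) < 4:
            continue  # too short or truncated: not a frame, keep scanning
        payload = bytes(body[:-2])
        received_fcs = body[-2] | (body[-1] << 8)
        if fcs16(payload) == received_fcs:
            frames.append(payload)
    return frames


if __name__ == "__main__":
    # An empty LCP Configure-Request (id 1), preceded by a stray LF.
    conf_req = bytes.fromhex("ff03c02101010004")
    print(decode_stream(b"\n" + encode_frame(conf_req)) == [conf_req])
```

    With a receiver written like this, the stray LF ahead of the first flag
    is simply thrown away and the Configure-Request that follows is still
    seen.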

    I just know this bug is going to bite others and when it does
    it is a real bear to understand what the heck is going wrong.

    I did ask my ISP what software is being used, but it is
    unlikely that they'll ever tell me this. I'd sure like to
    know if anybody knows the make of crappy code that does
    this.



    Mike


  15. Re: Linux ppp + MegaPOP dialup change = mrru related LCP timeout

    Michael Shell writes:

    >On 21 Jan 2005 13:32:36 -0500
    >James Carlson wrote:


    >> If the other side cannot hear your side due to communications errors
    >> (which is what I expect is going on here during the failure scenario),
    >> then it rightly should continue sending the same Configure-Request
    >> messages at each Restart timer expiry until the restart limit is
    >> reached.



    Seems to me that you have two options:
    1) Figure out what you can do to make the link work.
    2) Rant and rail against the rest of the world and how if only they did
    things better it would make it easier for you.



    >See, this is what is so surprising about the whole thing and why
    >nobody, including myself, suspected this type of bug triggered by
    >the chat script. The comm link was/is fine and without error. However,


    That kind of bug is EXTREMELY common. Yes, there are some pretty bad
    programmers out there.

    >when a leading CR and/or LF is sent to the host at the start of
    >PPP negotiations, the host receiver will "lock-up" and never
    >be able to see any of my PPP Conf Requests or Acks from that
    >point on. However, the host will continue to transmit its own
    >valid Conf Requests - indicating clearly that it is trying to


    As Carlson said, the reason may very well have been that it demands an
    asyncmap of 0xa0000 and you offered 0. Nothing illegal, but it is very well
    known in the community that this is a recipe for disaster. There is a badly
    written program out there (by some organisation from Washington State) that
    breaks in that situation. Should it? No. Does it? Yes. Should it be fixed?
    Yes. Do you have the influence to get it done? Probably not.

    >establish a PPP connection. I can watch the whole thing unfold
    >at the byte level using a serial line sniffer and I can
    >reproduce the problem at will by sending one LF just prior to
    >end of the chat script - as well as avoiding the problem and
    >getting a good PPP negotiation by removing the spurious LF.


    So, remove it.


    >I just know this bug is going to bite others and when it does
    >it is a real bear to understand what the heck is going wrong.


    It has for many many years. So? Windows is set up not to trigger this bug.
    Do you think that MS is going to change things to make life for other
    operating systems easier? I could rather suspect it is there on purpose to
    make life as hard as possible for others.



    >I did ask my ISP what software is being used, but it is
    >unlikely that they'll ever tell me this. I'd sure like to
    >know if anybody knows the make of crappy code that does
    >this.


    What code do you think most ISPs run?



  16. Re: Linux ppp + MegaPOP dialup change = mrru related LCP timeout

    On 22 Jan 2005 18:41:23 GMT
    unruh@string.physics.ubc.ca (Bill Unruh) wrote:


    > Seems to me that you have two options-- 1) Figure out what you can do to
    > make the link work. 2) Rant and rail against the rest of the world
    > and how if only they did things better it would make it easier for you.



    I did and I am not. I simply wanted to track down the source of the
    problem for the benefit of future readers of this thread - a lesser poster
    would not have followed-up after he got his system working.


    > As Carlson said, the reason may very well have been that
    > it demands an asyncmap of 0xa0000 and you offered 0.



    This is a bit misleading because it implies a configuration problem
    with pppd rather than a *single* unescaped LF at the very start of
    PPP negotiation. If you had read the entire thread from the beginning
    before posting, you would have seen that a different pppd asyncmap
    setting was one of the very first things we checked, for on Jan 6th,
    Clifford Kite wrote:


    : It's not an ACCM problem. At this point ACCM has not been negotiated
    : and all Control Characters are escaped.


    Indeed, setting my asyncmap to match that asked of the host had no
    effect. Of course, the origin of my single unescaped LF was "outside"
    of pppd, but we would not expect this single rogue LF to hang the
    entire receiver of the host for the remainder of the call.
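
    Clifford's point can be illustrated with a small sketch of transmit-side
    escaping. It is an illustration only; the escape function is a made-up
    name, not pppd's code. Until LCP reaches Opened the default ACCM of
    0xffffffff applies, so every control character that the PPP sender
    itself emits goes out as a two-byte escape sequence.

```python
# Illustrative transmit-side escaping for async PPP (after RFC 1662).
# escape is a hypothetical helper name, not a pppd function.

def escape(data, accm=0xFFFFFFFF):
    """Escape 0x7d, 0x7e and any control character selected by the ACCM."""
    out = bytearray()
    for byte in data:
        if byte in (0x7D, 0x7E) or (byte < 0x20 and accm & (1 << byte)):
            out += bytes([0x7D, byte ^ 0x20])
        else:
            out.append(byte)
    return bytes(out)


if __name__ == "__main__":
    print(escape(b"\n").hex())            # default ACCM during LCP
    print(escape(b"\n", 0xA0000).hex())   # LF is not selected by 0xa0000
    print(escape(b"\x11", 0xA0000).hex()) # XON is selected by 0xa0000
```

    Under the default map an LF inside a frame is sent as 0x7d 0x2a, so a
    raw 0x0a on the line before negotiation finishes has to come from
    somewhere other than the framing code, such as the chat script.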


    > Do you think that MS is going to change things to make life for other
    > operating systems easier?



    Of course not.


    > I could rather suspect it is there on purpose to make life as hard
    > as possible for others.



    I agree that they often do just that, but somehow I don't feel that this
    is the case here. As I tried to point out several times, there might
    be some *hardware* running Windows that could be bitten by this bug.


    > What code do you think most ISPs run?



    I don't know. After using dozens of dialup numbers from several
    different ISPs over half a decade with the exact same chat scripts, this
    is the first time I've run into a problem quite like this, and it does not
    occur with the several other dialup numbers that I have tried, which is
    why I am/was so curious about it.

    Have some heart. It took me a lot of effort to track down this "simple"
    problem even with the generous help and advice of other posters; and I am
    not a newbie. I am sure that the info here will help somebody out of a
    jam in the future - maybe even somebody running MS Windows.



    Mike


  17. Re: Linux ppp + MegaPOP dialup change = mrru related LCP timeout

    Michael Shell writes:

    >On 22 Jan 2005 18:41:23 GMT
    >unruh@string.physics.ubc.ca (Bill Unruh) wrote:



    >> Seems to me that you have two options-- 1) Figure out what you can do to
    >> make the link work. 2) Rant and rail against the rest of the world
    >> and how if only they did things better it would make it easier for you.



    >I did and I am not. I simply wanted to track down the source of the
    >problem for the benefit of future readers of this thread - a lesser poster
    >would not have followed-up after he got his system working.



    >> As Carlson said, the reason may very well have been that
    >> it demands an asyncmap of 0xa0000 and you offered 0.



    >This is a bit misleading because it implies a configuration problem
    >with pppd rather than a *single* unescaped LF at the very start of
    >PPP negotiation. If you had read the entire thread from the beginning
    >before posting, you would have seen that a different pppd asyncmap
    >setting was one of the very first things we checked, for on Jan 6th,
    >Clifford Kite wrote:



    >: It's not an ACCM problem. At this point ACCM has not been negotiated
    >: and all Control Characters are escaped.



    >Indeed, setting my asyncmap to match that asked of the host had no
    >effect. Of course, the origin of my single unescaped LF was "outside"
    >of pppd, but we would not expect this single rogue LF to hang the
    >entire receiver of the host for the remainder of the call.


    I agree that one would not expect it. On the other hand, I spent some time
    trying to understand what actually happened in the real world when I wrote
    www.theory.physics.ubc.ca/ppp-linux.html
    I came to the conclusion that the ways ISPs had of screwing up were
    infinite. Most of them are ways that should not happen, that properly
    written/set up pppd's would not do those things, but nevertheless they
    did. Perhaps I am just too cynical.




  18. Re: Linux ppp + MegaPOP dialup change = mrru related LCP timeout


    h@string.physics.ubc.ca wrote:
    > Michael Shell writes:
    > >On 22 Jan 2005 18:41:23 GMT
    > >unruh@string.physics.ubc.ca (Bill Unruh) wrote:
    > >> Seems to me that you have two options-- 1) Figure out what you can do to
    > >> make the link work. 2) Rant and rail against the rest of the world
    > >> and how if only they did things better it would make it easier for you.

    > >I did and I am not. I simply wanted to track down the source of the
    > >problem for the benefit of future readers of this thread - a lesser poster
    > >would not have followed-up after he got his system working.
    > >> As Carlson said, the reason may very well have been that
    > >> it demands an asyncmap of 0xa0000 and you offered 0.

    > >This is a bit misleading because it implies a configuration problem
    > >with pppd rather than a *single* unescaped LF at the very start of
    > >PPP negotiation. If you had read the entire thread from the beginning
    > >before posting, you would have seen that a different pppd asyncmap
    > >setting was one of the very first things we checked, for on Jan 6th,
    > >Clifford Kite wrote:
    > >: It's not an ACCM problem. At this point ACCM has not been negotiated
    > >: and all Control Characters are escaped.
    > >Indeed, setting my asyncmap to match that asked of the host had no
    > >effect. Of course, the origin of my single unescaped LF was "outside"
    > >of pppd, but we would not expect this single rogue LF to hang the
    > >entire receiver of the host for the remainder of the call.

    > I agree that one would not expect it. On the other hand, I spent some time
    > trying to understand what actually happened in the real world when I wrote
    > www.theory.physics.ubc.ca/ppp-linux.html
    > I came to the conclusion that the ways ISPs had of screwing up were
    > infinite. Most of them are ways that should not happen, that properly
    > written/set up pppd's would not do those things, but nevertheless they
    > did. Perhaps I am just too cynical.


    Great

    damon_w@fobsig.org


  19. Re: Linux ppp + MegaPOP dialup change = mrru related LCP timeout

    Well - it seems I have almost the same problem here.
    I can do what I want, send what I want, the remote side is simply ignoring
    everything and sends

    LCP ConfReq id=0x3

    even if I offer (and ACK) a pap auth. To make sure I am not sending any
    junk, I patched serial_core.c, uart_write() to dump everything I am
    sending. No linefeeds or cr's in my data stream.

    Needless to say, the card (Audiovox RTM-8000 CF) works flawlessly in my
    Sharp Zaurus (linux 2.4.20) and with the very Laptop under Windows.

    I have no idea and *any* help would be greatly appreciated.

    Michaela



  20. Re: Linux ppp + MegaPOP dialup change = mrru related LCP timeout

    "steyla" writes:

    >Well - it seems I have almost the same problem here.
    >I can do what I want, send what I want, the remote side is simply ignoring
    >everything and sends


    >LCP ConfReq id=0x3


    >even if I offer (and ACK) a pap auth. To make sure I am not sending any
    >junk, I patched serial_core.c, uart_write() to dump everything I am
    >sending. No linefeeds or cr's in my data stream.


    >Needless to say, the card (Audiovox RTM-8000 CF) works flawlessly in my
    >Sharp Zaurus (linux 2.4.20) and with the very Laptop under Windows.


    >I have no idea and *any* help would be greatly appreciated.


    You give almost no information.
    That is indicative that the other end is not getting your messages, or they
    are garbled. Often this is indicated by the far side asking for a non-zero
    asyncmap.
    You must use exactly the same asyncmap.

    If you want more help, post the output exactly (cut and paste), including
    the time stamps, from the syslog logs (daemon log).
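
    As a rough aid for checking the asyncmap point in such a log, the sketch
    below pulls the asyncmap value out of sent and received LCP ConfReq
    lines. It assumes the usual pppd "debug" line format; the sample lines
    are invented for illustration and are not from any poster's log, and
    the function name is hypothetical.

```python
# Illustrative only: compare the asyncmap each side requests in pppd debug
# output. SAMPLE is made-up data in the usual pppd debug format.
import re

CONFREQ_RE = re.compile(r"(sent|rcvd) \[LCP ConfReq id=0x[0-9a-f]+(.*)\]")
ASYNCMAP_RE = re.compile(r"<asyncmap (0x[0-9a-f]+)>")

SAMPLE = [
    "pppd[1234]: sent [LCP ConfReq id=0x1 <asyncmap 0x0> <magic 0x12345678>]",
    "pppd[1234]: rcvd [LCP ConfReq id=0x3 <asyncmap 0xa0000> <auth pap>]",
]


def requested_asyncmaps(lines):
    """Return {'sent': value, 'rcvd': value} from the last ConfReq seen."""
    found = {}
    for line in lines:
        match = CONFREQ_RE.search(line)
        if not match:
            continue
        option = ASYNCMAP_RE.search(match.group(2))
        if option:
            found[match.group(1)] = int(option.group(1), 16)
    return found


if __name__ == "__main__":
    maps = requested_asyncmaps(SAMPLE)
    print({side: hex(value) for side, value in maps.items()})
    if maps.get("sent") != maps.get("rcvd"):
        print("asyncmaps differ; try setting asyncmap to the peer's value")
```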


    >Michaela



