Linux ppp + MegaPOP dialup change = mrru related LCP timeout - PPP
-
Linux ppp + MegaPOP dialup change = mrru related LCP timeout
I have been using Newsguy's dialup ISP service for some time now and
have been happy with it, until last month when my Linux box could no
longer establish a ppp connection. The symptom was that pppd errored
out with "LCP: timeout sending Config-Requests". I had been running
pppd version 2.4.1; upgrading to 2.4.3 did not help.
The strange thing is that I can connect to the Atlanta numbers
(e.g., 678-538-1522) without issue, but the problem does show up on
the Augusta, Georgia line (706-849-0578). The Augusta line is on the
MegaPOP network which is owned by Starnet (www.starnetinc.com).
So, I enabled my pppd's debug option to see what was going on. With
the Atlanta number, all is well:
rcvd [LCP ConfReq id=0x1
]
sent [LCP ConfAck id=0x1
]
But, there is a problem with the Augusta number:
sent [LCP ConfReq id=0x1 ]
rcvd [LCP ConfReq id=0x1
]
sent [LCP ConfRej id=0x1 ]
After which point the host (peer from my machine's perspective) seems to
ignore my machine's ConfRej message and simply reissues its previous
ConfReq options until my machine times out. I am not sure if the ppp
software these ISPs use is even capable of full negotiation (that
might be too much to ask). However, unless they do negotiate, they
should not default to multilink operation (the mrru option above), which
is for use with multiple modems (so as to get a ppp multiline
connection over 56Kbps).
The behavior of the Augusta ppp host seems to me to be a violation
of ppp standards. Specifically, section 5.1.1 of RFC1990
(http://www.ietf.org/rfc/rfc1990.txt) states:
The presence of this [mrru] LCP option indicates that the system
sending it implements the PPP Multilink Protocol. If not rejected,
the system will construe all packets received on this link as being
able to be processed by a common protocol machine with any other
packets received from the same peer on any other link on which
this option has been accepted.
-
Re: Linux ppp + MegaPOP dialup change = mrru related LCP timeout
In comp.protocols.ppp Michael Shell wrote:
....
> sent [LCP ConfReq id=0x1 ]
> rcvd [LCP ConfReq id=0x1
>
> ]
> sent [LCP ConfRej id=0x1 ]
> After which point the host (peer from my machine's perspective) seems to
> ignore my machine's ConfRej message and simply reissues its previous
> ConfReq options until my machine times-out.
....
> 4. To see if there are any workarounds (which may involve pppd code hacks).
> For the record, my /etc/ppp/options contains the lines:
> nodetach
> modem
> crtscts
> defaultroute
> asyncmap 0
> mtu 1500
> mru 1500
> noipdefault
> lock
> noauth
> usepeerdns
> noccp
> lcp-echo-interval 30
> lcp-echo-failure 4
> noipx
> I am running Linux kernel 2.6.8.1 with the options:
> <*> PPP (point-to-point protocol) support
> [ ] PPP multilink support (EXPERIMENTAL)
> [*] PPP filtering
> <*> PPP support for async serial ports
> < > PPP support for sync tty ports
> < > PPP Deflate compression
> < > PPP BSD-Compress compression
> < > PPP over Ethernet (EXPERIMENTAL)
Try compiling multilink support above and then use the multilink
option; it *might* be a workaround since my ISP, also via a regular
landline, will negotiate MP and be happy with just one MP connection.
I won't try to give detailed answers to your other questions. But the
ISP's PPP implementation is simply broken in my eyes. My ISP will
also accept a Configure-Reject of mrru and complete PPP negotiations.
In addition it will complete negotiations when using the nomultilink
option. I believe this is as it should be and that, generally, all
your conclusions are correct.
--
Clifford Kite Email: "echo xvgr_yvahk-ccc@ri1.arg|rot13"
PPP-Q&A links, downloads: http://ckite.no-ip.net/
/* The generation of random numbers is too important to be left
to chance. */
-
Re: Linux ppp + MegaPOP dialup change = mrru related LCP timeout
In article news:20050106004854.282a69ec@bashir, Michael Shell wrote:
[...]
> sent [LCP ConfReq id=0x1
> ]
> rcvd [LCP ConfReq id=0x1
>
> ]
> sent [LCP ConfRej id=0x1 ]
>
>
> After which point the host (peer from my machine's perspective) seems
> to ignore my machine's ConfRej message and simply reissues its
> previous ConfReq options until my machine times-out. I am not sure if
[...]
Isn't there a possibility that a zero ACCM can't be used here? This POP
asks for 0x0A0000 whereas the other one asks for zero. It could be that
your Config-Reject is getting lost because of ACCM problems and that is
why the peer appears to be ignoring it. Does the peer respond to your
Config-Request?
[...]
> One final rant is the irony of knowing that a lot of these ISP's were
> built using Linux - at least they should return the favor by testing
> their systems with Linux clients and not relying on the fact that
> MS Windows clients can login as proof that everything is OK.
>
Won't it be funny if it turns out to be an ACCM issue?
--
Alan J. McFarlane
http://www.alanjmcf.me.uk/
Please follow-up in the newsgroup for the benefit of all.
-
Re: Linux ppp + MegaPOP dialup change = mrru related LCP timeout
In comp.protocols.ppp Alan McFarlane wrote:
> In article news:20050106004854.282a69ec@bashir, Michael Shell wrote:
> [...]
>> sent [LCP ConfReq id=0x1
>> ]
>> rcvd [LCP ConfReq id=0x1
>>
>> ]
>> sent [LCP ConfRej id=0x1 ]
>>
>>
>> After which point the host (peer from my machine's perspective) seems
>> to ignore my machine's ConfRej message and simply reissues its
>> previous ConfReq options until my machine times-out. I am not sure if
> [...]
> Isn't there a possibility that a zero ACCM can't be used here. This POP
> asks for 0x0A0000 whereas the other one asks for zero. It could be that
> your Config-Reject is getting lost because of ACCM problems and that is
> why the peer appears to be ignoring it. Does the peer respond to your
> Config-Request?
It's not an ACCM problem. At this point ACCM has not been negotiated
and all Control Characters are escaped.
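For readers following the ACCM point, here is a minimal sketch of the async escaping rule from RFC 1662, assuming the usual reading that bit n of the map covers control character n. The function names are hypothetical and are not taken from pppd. It shows that the default map escapes every control character, while 0xa0000 (the value the Augusta POP asks for) selects only 0x11 and 0x13, i.e. XON and XOFF.

```python
FLAG = 0x7E                 # frame delimiter, always escaped inside a frame
ESCAPE = 0x7D               # control escape, always escaped inside a frame
DEFAULT_ACCM = 0xFFFFFFFF   # before negotiation: escape every control character


def escaped_controls(accm):
    """List the control characters (0x00-0x1F) that an ACCM marks for escaping."""
    return [c for c in range(0x20) if (accm >> c) & 1]


def escape_payload(data, accm=DEFAULT_ACCM):
    """Apply async byte stuffing to the bytes that sit between two flags."""
    out = bytearray()
    for b in data:
        if b in (FLAG, ESCAPE) or (b < 0x20 and (accm >> b) & 1):
            out += bytes([ESCAPE, b ^ 0x20])   # escape byte, then original XOR 0x20
        else:
            out.append(b)
    return bytes(out)


if __name__ == "__main__":
    # Address, control, protocol (LCP), code, id of a typical first ConfReq.
    lcp_start = bytes([0xFF, 0x03, 0xC0, 0x21, 0x01, 0x01])
    print(escape_payload(lcp_start).hex())
    print([hex(c) for c in escaped_controls(0x000A0000)])
```

With the default map, the control byte 0x03 and the code and id bytes all go out escaped, which is why the start of LCP negotiation does not depend on what either side later requests for asyncmap.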
--
Clifford Kite Email: "echo xvgr_yvahk-ccc@ri1.arg|rot13"
PPP-Q&A links, downloads: http://ckite.no-ip.net/
/* 97.3% of all statistics are made up. */
-
Re: Linux ppp + MegaPOP dialup change = mrru related LCP timeout
In article news:nc8krc.218.ln@corncob.inetport.com, Clifford Kite wrote:
> In comp.protocols.ppp Alan McFarlane wrote:
>> In article news:20050106004854.282a69ec@bashir, Michael Shell wrote:
>> [...]
[...]
> It's not an ACCM problem. At this point ACCM has not been negotiated
> and all Control Characters are escaped.
>
Ahh yes, apologies.
--
Alan J. McFarlane
http://www.alanjmcf.me.uk/
Please follow-up in the newsgroup for the benefit of all.
-
Re: Linux ppp + MegaPOP dialup change = mrru related LCP timeout
On Thu, 6 Jan 2005 11:43:07 -0600
Clifford Kite wrote:
> Try compiling multilink support above and then use the multilink
> option; it *might* be a workaround since my ISP, also via a regular
> landline, will negotiate MP and be happy with just one MP connection.
Thanks for the help, trying this, I get:
CONNECT 45333/ARQ/V90/LAPM/V42BIS
Connected!
Serial connection established.
using channel 1
Starting negotiation on /dev/modem
sent [LCP ConfReq id=0x1 ]
rcvd [LCP ConfReq id=0x1 ]
sent [LCP ConfAck id=0x1 ]
sent [LCP ConfReq id=0x1 ]
sent [LCP ConfReq id=0x1 ]
sent [LCP ConfReq id=0x1 ]
sent [LCP ConfReq id=0x1 ]
sent [LCP ConfReq id=0x1 ]
rcvd [LCP ConfReq id=0x2 ]
sent [LCP ConfAck id=0x2 ]
sent [LCP ConfReq id=0x1 ]
sent [LCP ConfReq id=0x1 ]
sent [LCP ConfReq id=0x1 ]
sent [LCP ConfReq id=0x1 ]
LCP: timeout sending Config-Requests
Connection terminated.
What gets me is that the host never seems to alter its behavior
regardless of what my machine sends - ConfRej or ConfAck. It is as
if it never sees any of the data sent from my machine. I tried the
asyncmap 0xa0000 option just for the heck of it, but it did not help.
I note that the ppp standard has a lot of conditions for silently
dropping packets. Could it be that they have a buggy ppp host that
sees all Linux pppd generated LCP packets as being invalid? I have
no idea how robust LCP packets are (7 bit, etc.). If so, I wonder
how MS Windows does it differently.
Another possibility is a modem firmware problem. That is, *after* my
particular modem connects, their end never sees any of the data sent
from my modem. I am using a TI chipset based 56Kbps hardware modem
which I have never had a problem with. I can connect to my backup ISP
without trouble, but with that Augusta number I have not been able to
connect for a month (I would think that I would eventually get a "good"
modem after dozens of tries). I tried connecting at 14.4Kbps, but this
did not change anything. I even had the gall to bring my now ancient
Hayes 2400 external modem out of mothball, but I could hear from the
tones that modern modems have long since forgotten about the pre-14.4K
days. (IMHO, 9600bps was the last time everything worked as it should.)
Of course, I would be able to check this with minicom if they still
offered a "login: " prompt (which they don't).
Yet another possibility is something related to this bogus "high speed
dialup" (aka the AOL runner) feature everyone is offering. I sure hope
that they do not require special bits for this to be sent during ppp
negotiation.
Now I am beginning to wonder if what they told me about MS Windows XP
clients being able to connect is really true. Maybe that line is
totally hosed and they are covering it up. 
Mike Shell
-
Re: Linux ppp + MegaPOP dialup change = mrru related LCP timeout
In comp.protocols.ppp Michael Shell wrote:
> On Thu, 6 Jan 2005 11:43:07 -0600
> Clifford Kite wrote:
>> Try compiling multilink support above and then use the multilink
>> option; it *might* be a workaround since my ISP, also via a regular
>> landline, will negotiate MP and be happy with just one MP connection.
....
> sent [LCP ConfReq id=0x1 ]
> sent [LCP ConfReq id=0x1 ]
> sent [LCP ConfReq id=0x1 ]
> sent [LCP ConfReq id=0x1 ]
> LCP: timeout sending Config-Requests
> Connection terminated.
> What gets me is that the host never seems to alter its behavior
> regardless of what my machine sends - ConfRej or ConfAck. It is as
> if it never sees any of the data sent from my machine. I tried the
> asyncmap 0xa0000 option just for the heck of it, but it did not help.
Okay, I focused on MP because it appears you use the same host and
device file for each connection. There is only one other thing I know
about that can cause the peer not to "hear" any of your LCP requests,
given that a good serial connection is established and knowing that
pppd is sending and receiving valid LCP requests.
If the type of UART configured for the device file differs from the
actual UART type then that would cause the problem. I still don't
see how it's possible in this case since you can connect to the other
POP, and seem to have no problem connecting to both until recently.
But that's all I have left to suggest.
(An FYI - the most common UART is a 16550A and configuring the device
file for a 16550 won't work even though the package/manual for the
serial device may say 16550. The UART type can be changed using the
setserial program.)
> I note that the ppp standard has a lot of conditions for silently
> dropping packets. Could it be that they have a buggy ppp host that
> sees all Linux pppd generated LCP packets as being invalid? I have
> no idea how robust LCP packets are (7 bit, etc.). If so, I wonder
> how MS Windows does it differently.
I'm no longer sure it's a buggy peer. PPP is a standard and though
PPP implementations vary they should be compatible enough to provide
a connection (cell-phones excepted).
> Another possibility is a modem firmware problem. That is, *after* my
> particular modem connects, their end never sees any of the data sent
> from my modem.
Since it was able to connect to the troublesome POP previously, I don't
see how firmware could be the problem unless something broke. I'd expect
the other POP connection would also fail if that happened.
....
> Yet another possibility is something related to this bogus "high speed
> dialup" (aka the AOL runner) feature everyone is offering. I sure hope
> that they do not require special bits for this to be sent during ppp
> negotiation.
I *think* that is accomplished with a server at the ISP that caches
web pages and client software provided by the ISP to MS clients.
--
Clifford Kite Email: "echo xvgr_yvahk-ccc@ri1.arg|rot13"
PPP-Q&A links, downloads: http://ckite.no-ip.net/
-
Re: Linux ppp + MegaPOP dialup change = mrru related LCP timeout
OK, I decided to boot with MS Windows 2000 (same machine) and see if I
could connect with that. Indeed, I could - the byte-level details of the
log file are at the end of this post.
Manually decoding the bytes in the MS log to a pppd-like format, I came up
with this:
sent [LCP ConfReq id=0x00 len=0x32 ]
sent [LCP ConfReq id=0x01 len=0x32 ]
rcvd [LCP ConfReq id=0x01 len=0x2c ]
sent [LCP ConfAck id=0x01 len=0x2c ]
rcvd [LCP ConfRej id=0x01 len=0x07 ]
sent [LCP ConfReq id=0x02 len=0x2f
rcvd [LCP ConfAck id=0x02 len=0x2f
What the heck is going on?! This is the exact same hardware, so now I
don't think it is a modem firmware issue. The 0D 03 06 LCP option
from Windows 2000 is strange. My, possibly incorrect, interpretation of
this is that it is the callback option (0x0d=13) of Section 2.3 of RFC1570.
However, the operation code of 6 is strange in that RFC1570 only lists
up to number 4. Furthermore, why in the heck would MS Windows be requesting
a callback anyway?! The host does wake up to it and reject it after which
all is well. I have no idea if pppd can be configured to issue this
strange option - I would try it if I could.
The $10,000 question is why does the host seem to see the Windows 2000
generated LCP packets, but not those from Linux's pppd? Remember, I can
connect to other numbers just fine under Linux with the same setup,
options and dialscripts, so the serial line/modem cannot be broken.
I tried using a pppd option:
endpoint local:1c.79.3b.b1.2d.8c.47.d0.9b.fc.a8.ca.50.78.98.e9.00.00.00.00
so as to more closely mimic MS Windows, but the host didn't respond any
differently to it. Ditto for resetting the modem to factory defaults and
trying the same mrru (1614) as MS Windows.
I only see two possibilities:
1. Something is going wrong at the byte level that causes the host
to silently drop pppd's ConfAck and ConfRej's. I am assuming that my
pppd would put something in the debug output if it received and
dropped something improper from the host. I want to look at the
byte-level conversation between pppd and the host to see if anything
differs from the LCP bytes MS Windows sends. What is the best way to
eavesdrop on the conversation that flows through /dev/modem?
2. That callback 0x06 invokes some special MS witchcraft.
What a creepy situation!
Mike
Windows 2000 ppp log file details are as follows:
-----
..
..
[1072] 20:13:57:356:
[1072] 20:13:57:356:
[1072] 20:13:57:356:
[1072] 20:13:57:356: <15 8E 07 02 08 02 0D 03 06 11 04 06 4E 13 17 01 |............N...|
[1072] 20:13:57:356: <1C 79 3B B1 2D 8C 47 D0 9B FC A8 CA 50 78 98 E9 |.y;.-.G.....Px..|
[1072] 20:13:57:356: <00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
[1072] 20:13:57:356:
[1072] 20:13:57:356: InsertInTimerQ called portid=0,Id=0,Protocol=c021,EventType=0,fAuth=0
[1072] 20:13:57:356: InsertInTimerQ called portid=0,Id=0,Protocol=0,EventType=3,fAuth=0
[1072] 20:13:59:359: Recv timeout event received for portid=0,Id=0,Protocol=c021,fAuth=0
[1072] 20:13:59:359: NotifyCaller(hPort=5, dwMsgId=9)
[1072] 20:13:59:359:
[1072] 20:13:59:359:
[1072] 20:13:59:359:
[1072] 20:13:59:359: <15 8E 07 02 08 02 0D 03 06 11 04 06 4E 13 17 01 |............N...|
[1072] 20:13:59:359: <1C 79 3B B1 2D 8C 47 D0 9B FC A8 CA 50 78 98 E9 |.y;.-.G.....Px..|
[1072] 20:13:59:359: <00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
[1072] 20:13:59:359:
[1072] 20:13:59:359: InsertInTimerQ called portid=0,Id=1,Protocol=c021,EventType=0,fAuth=0
[1016] 20:13:59:509: Packet received (46 bytes) for hPort 5
[1072] 20:13:59:509: >PPP packet received at 01/08/2005 01:13:59:509
[1072] 20:13:59:509: >Protocol = LCP, Type = Configure-Req, Length = 0x2e, Id = 0x1, Port = 5
[1072] 20:13:59:509: >C0 21 01 01 00 2C 01 04 05 DD 02 06 00 0A 00 00 |.!...,..........|
[1016] 20:13:59:519: Packet received (9 bytes) for hPort 5
[1072] 20:13:59:509: >03 04 C0 23 05 06 48 BB E1 42 07 02 08 02 11 04 |...#..H..B......|
[1072] 20:13:59:509: >05 F4 13 0C 01 77 64 63 34 2D 6C 6E 73 31 00 00 |.....wdc4-lns1..|
[1072] 20:13:59:519:
[1072] 20:13:59:519:
[1072] 20:13:59:519:
[1072] 20:13:59:519:
[1072] 20:13:59:519: <03 04 C0 23 05 06 48 BB E1 42 07 02 08 02 11 04 |...#..H..B......|
[1072] 20:13:59:519: <05 F4 13 0C 01 77 64 63 34 2D 6C 6E 73 31 00 00 |.....wdc4-lns1..|
[1072] 20:13:59:519:
[1072] 20:13:59:519: >PPP packet received at 01/08/2005 01:13:59:519
[1072] 20:13:59:519: >Protocol = LCP, Type = Configure-Reject, Length = 0x9, Id = 0x1, Port = 5
[1072] 20:13:59:519: >C0 21 04 01 00 07 0D 03 06 00 00 00 00 00 00 00 |.!..............|
[1072] 20:13:59:519:
[1072] 20:13:59:519: RemoveFromTimerQ called portid=0,Id=1,Protocol=c021,EventType=0,fAuth=0
[1072] 20:13:59:519:
[1072] 20:13:59:519:
[1072] 20:13:59:519:
[1072] 20:13:59:519: <15 8E 07 02 08 02 11 04 06 4E 13 17 01 1C 79 3B |.........N....y;|
[1072] 20:13:59:519:
[1072] 20:13:59:519: <00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
[1072] 20:13:59:519:
[1072] 20:13:59:519: InsertInTimerQ called portid=0,Id=2,Protocol=c021,EventType=0,fAuth=0
[1016] 20:13:59:700: Packet received (49 bytes) for hPort 5
[1072] 20:13:59:700: >PPP packet received at 01/08/2005 01:13:59:700
[1072] 20:13:59:700: >Protocol = LCP, Type = Configure-Ack, Length = 0x31, Id = 0x2, Port = 5
[1072] 20:13:59:700: >C0 21 02 02 00 2F 02 06 00 00 00 00 05 06 3A 0B |.!.../........:.|
[1072] 20:13:59:700: >15 8E 07 02 08 02 11 04 06 4E 13 17 01 1C 79 3B |.........N....y;|
[1072] 20:13:59:700: >B1 2D 8C 47 D0 9B FC A8 CA 50 78 98 E9 00 00 00 |.-.G.....Px.....|
[1072] 20:13:59:700: >00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
[1072] 20:13:59:700:
[1072] 20:13:59:700: RemoveFromTimerQ called portid=0,Id=2,Protocol=c021,EventType=0,fAuth=0
[1072] 20:13:59:700: FsmThisLayerUp called for protocol = c021, port = 5
[1072] 20:13:59:700: LCP Local Options-------------
[1072] 20:13:59:700: MRU=1500,ACCM=0,Auth=0,MagicNumber=973804942,PFC=ON,ACFC=ON
[1072] 20:13:59:700: Recv Framing = PPP Multilink,SSHF=OFF,MRRU=1614,LinkDiscrim=0,BAP=OFF
[1072] 20:13:59:700: ED Class = 1, ED Value = 1c793bb12d8c47d09bfca8ca507898e900000000
[1072] 20:13:59:700: LCP Remote Options-------------
[1072] 20:13:59:700: MRU=1501,ACCM=655360,Auth=c023,MagicNumber=1220272450,PFC=ON,ACFC=ON
[1072] 20:13:59:700: Send Framing = PPP Multilink,SSHF=OFF,MRRU=1524,LinkDiscrim=0
[1072] 20:13:59:700: ED Class = 1, ED Value = 776463342d6c6e73310000000000000000000000
[1072] 20:13:59:700: LCP Configured successfully
..
..
---
EOM
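The manual decoding described in this post can be reproduced with a short script. This is only a sketch: the two packets are copied from the Windows 2000 log above (the Configure-Ack with Id 2 and the Configure-Reject with Id 1), the option names imitate pppd's debug output, and decode_lcp is a hypothetical helper, not part of any PPP package.

```python
import struct

LCP_CODES = {1: "ConfReq", 2: "ConfAck", 3: "ConfNak", 4: "ConfRej"}
OPTION_NAMES = {
    1: "mru", 2: "asyncmap", 3: "auth", 4: "quality", 5: "magic",
    7: "pcomp", 8: "accomp", 13: "callback", 17: "mrru", 19: "endpoint",
}

# Bytes as shown in the log, starting at the PPP protocol field (C0 21).
ACK_HEX = (
    "C0 21 02 02 00 2F 02 06 00 00 00 00 05 06 3A 0B "
    "15 8E 07 02 08 02 11 04 06 4E 13 17 01 1C 79 3B "
    "B1 2D 8C 47 D0 9B FC A8 CA 50 78 98 E9 00 00 00 "
    "00"
)
REJ_HEX = "C0 21 04 01 00 07 0D 03 06"


def decode_lcp(frame):
    """Split an LCP packet into (code name, id, [(option name, value bytes)])."""
    proto, code, ident, length = struct.unpack("!HBBH", frame[:6])
    if proto != 0xC021:
        raise ValueError("not an LCP packet")
    body = frame[6:2 + length]      # length counts code, id, length and options
    options = []
    i = 0
    while i < len(body):
        otype, olen = body[i], body[i + 1]
        if olen < 2:
            raise ValueError("bad option length")
        options.append((OPTION_NAMES.get(otype, str(otype)), body[i + 2:i + olen]))
        i += olen
    return LCP_CODES.get(code, str(code)), ident, options


if __name__ == "__main__":
    for hex_dump in (ACK_HEX, REJ_HEX):
        name, ident, options = decode_lcp(bytes.fromhex(hex_dump))
        print(name, hex(ident), [(n, v.hex()) for n, v in options])
```

Run on the Configure-Ack, the mrru value bytes are 06 4e (1614) and the magic number bytes are 3a 0b 15 8e (973804942), which agrees with the "LCP Local Options" summary in the log. The Configure-Reject carries only the callback option with operation 6.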
-
Re: Linux ppp + MegaPOP dialup change = mrru related LCP timeout
In comp.protocols.ppp Michael Shell wrote:
> OK, I decided to boot with MS Windows 2000 (same machine) and see if I
> could connect with that. Indeed, I could - the byte-level details of the
> log file are at the end of this post.
> Manually decoding the bytes in the MS log to a pppd-like format, I came up
> with this:
> sent [LCP ConfReq id=0x00 len=0x32 ]
> sent [LCP ConfReq id=0x01 len=0x32 ]
> rcvd [LCP ConfReq id=0x01 len=0x2c ]
> sent [LCP ConfAck id=0x01 len=0x2c ]
> rcvd [LCP ConfRej id=0x01 len=0x07 ]
> sent [LCP ConfReq id=0x02 len=0x2f
> rcvd [LCP ConfAck id=0x02 len=0x2f
> What the heck is going on?! This is the exact same hardware, so now I
> don't think it is a modem firmware issue. The 0D 03 06 LCP option
> from Windows 2000 is strange. My, possibly incorrect, interpretation of
> this is that it is the callback option (0x0d=13) of Section 2.3 of RFC1570.
You are correct, it is the call-back option.
> However, the operation code of 6 is strange in that RFC1570 only lists
> up to number 4. Furthermore, why in the heck would MS Windows be requesting
> a callback anyway?! The host does wakeup to it and reject it after which
> all is well. I have no idea if pppd can be configured to issue this
> strange option - I would try it if I could.
It's a MS thing (used google to find this):
http://www.microsoft.com/resources/d...dura_tools.asp
Search the page for CBCP, read about the 6, then search for CBCP again
for a section titled CBCP. You may be able to make more sense of it
than I could. My read of it is that the 6 tells the peer that the
request is to use MS CBCP to negotiate call-back after authentication.
Pppd has an undocumented option named `callback' that might generate
what the MS side of your host generated. But you'll likely have to
edit pppd/Makefile in the pppd source, uncomment the second line below:
# Enable Microsoft proprietary Callback Control Protocol
#CBCP=y
and recompile. I don't have it compiled into pppd here and so can't
readily test the option. It may or may not take a call-back number
as a value (callback ) - I'm not a PPP implementor and my C
reading skill is low.
> The $10,000 question is why does the host seem to see the Windows 2000
> generated LCP packets, but not those from Linux's pppd? Remember, I can
> connect to other numbers just fine under Linux with the same setup,
> options and dialscripts, so the serial line/modem cannot be broken.
I've been hoping that James Carlson would participate but his last
post here was on Monday. He's a regular poster here and the one most
likely to come up with an answer. It is indeed a "creepy" problem;
I've never seen anything quite like it.
BTW, your tenacity and your manual translation of raw hex to Linux
PPP-ese are admirable. Also impressive is knowing what information
could be useful in finding a solution to the problem and trying to
imitate the MS requests before posting - sometimes that works.
--
Clifford Kite Email: "echo xvgr_yvahk-ccc@ri1.arg|rot13"
PPP-Q&A links, downloads: http://ckite.no-ip.net/
-
Re: Linux ppp + MegaPOP dialup change = mrru related LCP timeout
Well, I finally solved the mystery - and it took some doing to uncover
it. I used serial port sniffers under both Linux (slsnif) and
Windows to see exactly what each was sending over the link.
Everything looked great with each PPP frame, which just deepened the
riddle.
I even made sure that Linux was using the exact same modem reset and
initialization strings that MS Windows was using - to no avail. BTW,
for future reference, Clifford's advice on the pppd code did indeed
allow me to enable the callback option under Linux, but
unfortunately this did not change anything either.
I then noticed that the bad host was not even sending me a TermAck to my
TermReq when I used control-c to prematurely shut down a connect attempt.
LCP Termination Requests are very simple packets - there isn't much that
can go wrong with them.
So I decided to take another look at my connect script. The final part
of my chat script went like this:
TIMEOUT 50 \
SAY "\nWaiting for Connection..." \
ECHO ON \
"ONNEC" "\c" \
"\n" "\r\n" \
SAY "\nConnected!\n"
What could possibly go wrong here you might ask? Plenty, if the PPP
host has a framing parser so fragile that it cannot withstand a leading
carriage return and/or line feed before the PPP negotiation sequence
begins!
That's right folks, the initial \r\n permanently broke the host's PPP
frame receiver code! After that happens, you can send all the properly
formed LCP packets you want and the host will never see any of them!
But, it will continue to send out its own ConfReq's. The person who
wrote that crappy PPP code ought to be run out of town.
The reason I even have this initial new line in there was that some
time in the past, some ISP's PPP or login code would not "wake up"
until it received a CR or LF after connect.
For the record, you can't do any of these:
"\n" "\r\c" \
"\n" "\n\c" \
"\n" "\r\n\c" \
However, you can do these:
"\n" "\c" \
"\n" "\N\c" \
"\n" "\s\c" \
So, nulls and spaces don't hang the receiver, but CR and/or LF do.
I decided to put in a little delay for good measure, so:
"\n" "\p\c" \
is what I use now and all is well.
I am pretty sure this will help somebody out in the future. Can you
imagine what those poor souls with modems that happen to output a
spurious new line just after initial connect will go through?!
I'd sure like to know the name and version of this fragile PPP
software so that people can be warned about it. Geeezzzz.
Thanks for all your help and advice,
Mike Shell
-
Re: Linux ppp + MegaPOP dialup change = mrru related LCP timeout
Michael Shell writes:
> What could possibly go wrong here you might ask? Plenty, if the PPP
> host has a framing parser so fragile that it cannot withstand a leading
> carriage return and/or line feed before the PPP negotiation sequence
> begins!
Typically, dial-in servers attempt to detect what protocol the peer is
using automatically. If the server sees a carriage return, then it
assumes that it's a human at a regular tty, not a machine using PPP.
It's obviously not the best way to do things. A better way than this
is to spit out a text message welcoming the user (which will just be
discarded by any PPP-speaking peer), but _continuously_ look for PPP
data on input and switch modes when appropriate, rather than switching
on the first one or two characters. Doing that right takes a little
more than a minute's thought, though, so it's often not done.
Plus, there's the Windows-effect to consider: most ISP equipment these
days is designed for the least-common-denominator. If it works with
Windows DUN, then that's "good enough." It doesn't have to work well
anywhere else.
(The same is unfortunately true of a lot of consumer gear these days.)
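To make the two autodetection strategies above concrete, here is a rough sketch, not any vendor's actual code. It contrasts deciding on the first character with scanning continuously for the start of an LCP frame. The byte signatures are assumptions chosen for illustration (flag 7E, address FF, control 03 sent either raw or escaped as 7D 23, then protocol C0 21), and both function names are hypothetical.

```python
# Assumed byte patterns that mark the start of an LCP frame on an async line.
PPP_SIGNATURES = (
    bytes.fromhex("7eff03c021"),     # control field sent unescaped
    bytes.fromhex("7eff7d23c021"),   # control field 0x03 escaped as 7d 23
)


def naive_detect(stream):
    """Decide once, from the first byte only (the fragile approach)."""
    if stream[:1] in (b"\r", b"\n"):
        return "text"
    return "ppp" if stream[:1] == b"\x7e" else "text"


def continuous_detect(stream):
    """Keep scanning the input and switch to PPP whenever a frame start appears."""
    window = bytearray()
    longest = max(len(sig) for sig in PPP_SIGNATURES)
    for b in stream:
        window.append(b)
        if len(window) > longest:
            del window[0]            # keep only the most recent bytes
        if any(window.endswith(sig) for sig in PPP_SIGNATURES):
            return "ppp"
    return "text"


if __name__ == "__main__":
    # A stray CR/LF from a chat script, followed by the start of a ConfReq.
    line = b"\r\n" + bytes.fromhex("7eff7d23c0217d217d21")
    print(naive_detect(line), continuous_detect(line))
```

With a leading CR/LF, the first-character test settles on text mode and never reconsiders, while the continuous scan still finds the PPP frame that follows.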
> That's right folks, the initial \r\n permanently broke the host's PPP
> frame receiver code! After that happens, you can send all the properly
> formed LCP packets you want and the host will never see any of them!
> But, it will continue to send out its own ConfReq's. The person who
> wrote that crappy PPP code outta be run out of town.
I don't think it's the server that's bad. The chat script was bad.
> I'd sure like to know the name and version of this fragile PPP
> software so that people can be warned about it. Geeezzzz.
Ask the ISP. But it's likely that it's one of the many commercial
versions, and you just suffered from having a bad chat script.
--
James Carlson, IP Systems Group
Sun Microsystems / 1 Network Drive 71.234W Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757 42.497N Fax +1 781 442 1677
-
Re: Linux ppp + MegaPOP dialup change = mrru related LCP timeout
On 20 Jan 2005 09:21:33 -0500
James Carlson wrote:
> Typically, dial-in servers attempt to detect what protocol the peer is
> using automatically. If the server sees a carriage return, then it
> assumes that it's a human at a regular tty, not a machine using PPP.
That I can understand, but remember that the ISP in question continued
to send valid PPP ConfReq requests, but ignored all my PPP ConfAck
responses. Something is obviously broken on their end. There is no text
based login with their system. The very reason ISPs went to pure PPP login
and skipped the text based login altogether is because of the difficulties
of handling tech support for all the other different types of
login/Login/username, text based configurations. Going deaf after the first
CRLF kind of defeats the purpose of the default PPP approach because it is,
IMHO, unsafe to trust the first few characters after the initial connect -
there is always the possibility that the client will still be chatting with
the modem or the modem itself may issue a CR at first connect (I've never
personally seen this, but it would not surprise me in the least if some
modems did just that). The PPP protocol was designed to handle all types
of these kinds of initial missteps.
> I don't think it's the server that's bad. The chat script was bad.
I agree that my end did something that it should not have. However, remember
that the ISP continued to send valid ConfReq requests - and so this is a
PPP protocol issue (because it happened within PPP negotiation) and I
don't think the ISP is allowed to do this according to the PPP standards -
invalid PPP data should be silently discarded and then one should resume
scanning for valid PPP config requests - the latter of which was not done.
Mike Shell
-
Re: Linux ppp + MegaPOP dialup change = mrru related LCP timeout
Michael Shell writes:
> On 20 Jan 2005 09:21:33 -0500
> James Carlson wrote:
>
> > Typically, dial-in servers attempt to detect what protocol the peer is
> > using automatically. If the server sees a carriage return, then it
> > assumes that it's a human at a regular tty, not a machine using PPP.
>
>
> That I can understand, but remember that the ISP in question continued
> to send valid PPP ConfReq requests, but ignored all my PPP ConfAck
> responses. Something is obviously broken on their end.
If I remember correctly, your LCP negotiation started off strangely,
with one side (probably theirs) suggesting an asyncmap (ACCM) of
0xa0000, and the other (probably yours) suggesting 0. That's
technically legal per RFC 1662, but is often in practice a good
indicator of bugs in the peer implementation, and usually results in a
failure to negotiate that's remarkably similar to what you saw.
The fix is to add "asyncmap 0xa0000" to your configuration, after some
obligatory swearing at the people who built the bad implementation.
> There is no text
> based login with their system. The very reason ISPs went to pure PPP login
> and skipped the text based login altogether is because of the difficulties
> of handling tech support for all the other different types of
> login/Login/username, text based configurations.
Sure.
> Going deaf after the first
> CRLF kind of defeats the purpose of the default PPP approach because it is,
> IMHO, unsafe to trust the first few characters after the initial connect -
> there is always the possibility that the client will still be chatting with
> the modem or the modem itself may issue a CR at first connect (I've never
> personally seen this, but it would not surprise me in the least if some
> modems did just that). The PPP protocol was designed to handle all types
> of these kinds of initial missteps.
I'm pretty sure I know something like that.
> > I don't think it's the server that's bad. The chat script was bad.
>
>
> I agree that my end did something that it should not have. However, remember
> that the ISP continued to send valid ConfReq requests - and so this is a
> PPP protocol issue (because it happened within PPP negotiation) and I
> don't think the ISP is allowed to do this according to the PPP standards -
> invalid PPP data should be silently discarded and then one should resume
> scanning for valid PPP config requests - the latter of which was not done.
I'm not sure I understand what you're saying here, and I don't see any
specific error that is directly traceable to a violation of any of the
standards.
If the other side cannot hear your side due to communications errors
(which is what I expect is going on here during the failure scenario),
then it rightly should continue sending the same Configure-Request
messages at each Restart timer expiry until the restart limit is
reached.
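The retransmission behaviour described here can be sketched as a small simulation. It assumes the RFC 1661 defaults of a 3 second Restart timer and a Max-Configure of 10, which are also pppd's defaults for lcp-restart and lcp-max-configure. The function name and the reply_at parameter are hypothetical.

```python
def negotiate(restart_timer=3, max_configure=10, reply_at=None):
    """Simulate one side's Configure-Request retransmissions.

    reply_at is the time in seconds at which a usable reply is heard,
    or None if the other side never hears us and so never answers.
    Returns (send_times, outcome, end_time).
    """
    send_times = []
    now = 0
    for _ in range(max_configure):
        send_times.append(now)               # transmit the same Configure-Request
        deadline = now + restart_timer       # then wait for the Restart timer
        if reply_at is not None and reply_at < deadline:
            return send_times, "acked", reply_at
        now = deadline
    return send_times, "timeout", now        # restart limit reached, give up


if __name__ == "__main__":
    print(negotiate())               # peer is deaf to our packets
    print(negotiate(reply_at=4))     # peer answers the second request
```

With no reply at all, the sketch sends ten requests three seconds apart and gives up at 30 seconds, which is consistent with the ten "sent [LCP ConfReq id=0x1 ]" lines followed by "LCP: timeout sending Config-Requests" in the pppd log earlier in the thread.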
There's no way that any of the PPP documents can require the peer to
do what it is unable to do. If the packets are getting garbled in
transit (which I expect is true, given the symptoms), there's not much
the peer can do but allow the connection to fail and hope the human
can fix things.
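For readers following along, the retransmission rule described above can be sketched as a small simulation. This is only an illustration, assuming the default values suggested in RFC 1661 (Restart timer of 3 seconds, Max-Configure of 10); the function name `negotiate` and the `peer_replies` list are hypothetical and not taken from any PPP implementation.

```python
# Simplified sketch of the LCP Configure-Request retransmission rule.
# RESTART_TIMER and MAX_CONFIGURE are the defaults suggested by RFC 1661;
# negotiate() and peer_replies are hypothetical names for this illustration.

RESTART_TIMER = 3    # seconds between retransmissions
MAX_CONFIGURE = 10   # Configure-Requests sent before giving up


def negotiate(peer_replies):
    """Simulate one side sending Configure-Requests.

    peer_replies[i] is True if a usable reply was heard after the
    (i+1)-th Configure-Request, False if nothing usable arrived.
    Returns (outcome, requests_sent, seconds_waited).
    """
    waited = 0
    for attempt in range(1, MAX_CONFIGURE + 1):
        # Send (or resend) the same Configure-Request.
        heard = peer_replies[attempt - 1] if attempt <= len(peer_replies) else False
        if heard:
            return ("reply", attempt, waited)
        # Restart timer expires without a usable reply: wait, then resend.
        waited += RESTART_TIMER
    return ("timeout", MAX_CONFIGURE, waited)


if __name__ == "__main__":
    print(negotiate([]))                    # peer never hears us
    print(negotiate([False, False, True]))  # third request gets through
```

With these defaults a side that never hears a usable reply gives up after ten requests, which is consistent with the "timeout sending Config-Requests" symptom reported at the start of the thread.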
Now if the peer is switching the ACCM too early (before LCP is in
Opened state) or if the implementor confused the transmit and receive
directions for the escaping logic (altogether *way* too common), then
that's indeed an implementation bug. The real issue, though, is the
lack of interoperability, not the conformance (or lack thereof) with
respect to the standards.
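As a rough illustration of the transmit-side escaping that the ACCM controls, here is a minimal sketch based on the async framing rules in RFC 1662. The function names are hypothetical, the FCS is left out, and the map 0x000A0000 (XON/XOFF) is only an example value, not necessarily the one seen in this thread.

```python
# Minimal sketch of async-HDLC transmit escaping (RFC 1662).
# must_escape() and escape() are hypothetical helper names.

FLAG = 0x7E                # frame delimiter
ESCAPE = 0x7D              # control-escape octet
XOR_MASK = 0x20            # escaped octets are XORed with this value
DEFAULT_ACCM = 0xFFFFFFFF  # escape every control character (pre-negotiation default)


def must_escape(byte, accm):
    """True if this octet has to be escaped under the given ACCM."""
    if byte in (FLAG, ESCAPE):
        return True  # always escaped, whatever the map says
    return byte < 0x20 and bool(accm & (1 << byte))


def escape(payload, accm=DEFAULT_ACCM):
    """Return payload with the required octets escaped."""
    out = bytearray()
    for byte in payload:
        if must_escape(byte, accm):
            out.append(ESCAPE)
            out.append(byte ^ XOR_MASK)
        else:
            out.append(byte)
    return bytes(out)


if __name__ == "__main__":
    print(escape(b"\x0a").hex())                   # LF under the default map
    print(escape(b"\x0a", accm=0).hex())           # LF with an all-zero map
    print(escape(b"\x11", accm=0x000A0000).hex())  # XON with an XON/XOFF map
```

A sender that switches to the negotiated map before LCP is Opened, or that applies the peer's map in the wrong direction, would emit unescaped control characters that the other side may not be prepared to accept.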
--
James Carlson, IP Systems Group
Sun Microsystems / 1 Network Drive 71.234W Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757 42.497N Fax +1 781 442 1677
-
Re: Linux ppp + MegaPOP dialup change = mrru related LCP timeout
On 21 Jan 2005 13:32:36 -0500
James Carlson wrote:
> If the other side cannot hear your side due to communications errors
> (which is what I expect is going on here during the failure scenario),
> then it rightly should continue sending the same Configure-Request
> messages at each Restart timer expiry until the restart limit is
> reached.
See, this is what is so surprising about the whole thing and why
nobody, including myself, suspected this type of bug triggered by
the chat script. The comm link was/is fine and without error. However,
when a leading CR and/or LF is sent to the host at the start of
PPP negotiations, the host receiver will "lock-up" and never
be able to see any of my PPP Conf Requests or Acks from that
point on. However, the host will continue to transmit its own
valid Conf Requests - indicating clearly that it is trying to
establish a PPP connection. I can watch the whole thing unfold
at the byte level using a serial line sniffer and I can
reproduce the problem at will by sending one LF just prior to
end of the chat script - as well as avoiding the problem and
getting a good PPP negotiation by removing the spurious LF.
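To make the expectation concrete, here is a minimal sketch of a receiver that hunts for the 0x7E flag and so is not disturbed by a stray LF ahead of the first frame. It is an illustration only, assuming RFC 1662 async framing; `extract_frames` is a hypothetical name, no FCS check is done, and it says nothing about what the ISP's equipment actually runs.

```python
# Sketch of a flag-hunting receiver for async PPP framing (RFC 1662).
# extract_frames() is a hypothetical helper; the FCS is NOT verified here.

FLAG = 0x7E
ESCAPE = 0x7D
XOR_MASK = 0x20
MIN_FRAME = 4  # RFC 1662: frames shorter than 4 octets are silently discarded


def extract_frames(stream):
    """Return unescaped frame bodies found between 0x7E flags.

    Octets seen before the first flag are ignored, and runts between
    flags (such as a lone LF) are dropped.
    """
    frames = []
    current = None   # None means "still hunting for the first flag"
    escaped = False
    for byte in stream:
        if byte == FLAG:
            if current is not None and len(current) >= MIN_FRAME:
                frames.append(bytes(current))
            current = bytearray()  # start collecting the next frame
            escaped = False
        elif current is None:
            continue               # junk before any flag: discard
        elif byte == ESCAPE:
            escaped = True
        else:
            current.append(byte ^ XOR_MASK if escaped else byte)
            escaped = False
    return frames


if __name__ == "__main__":
    # A stray LF followed by one frame (0x03 is sent escaped as 7d 23).
    noisy = b"\n" + b"\x7e\xff\x7d\x23\xc0\x21\x01\x7e"
    print([f.hex() for f in extract_frames(noisy)])
```

A receiver built along these lines returns the same frame with or without the leading LF, which is the behaviour I was expecting from the host.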
I just know this bug is going to bite others and when it does
it is a real bear to understand what the heck is going wrong.
I did ask my ISP what software is being used, but it is
unlikely that they'll ever tell me this. I'd sure like to
know if anybody knows the make of crappy code that does
this.
Mike
-
Re: Linux ppp + MegaPOP dialup change = mrru related LCP timeout
Michael Shell writes:
>On 21 Jan 2005 13:32:36 -0500
>James Carlson wrote:
>> If the other side cannot hear your side due to communications errors
>> (which is what I expect is going on here during the failure scenario),
>> then it rightly should continue sending the same Configure-Request
>> messages at each Restart timer expiry until the restart limit is
>> reached.
Seems to me that you have two options:
1) Figure out what you can do to make the link work.
2) Rant and rail against the rest of the world and how if only they did
things better it would make it easier for you.
>See, this is what is so surprising about the whole thing and why
>nobody, including myself, suspected this type of bug triggered by
>the chat script. The comm link was/is fine and without error. However,
That kind of bug is EXTREMELY common. Yes, there are some pretty bad
programmers out there.
>when a leading CR and/or LF is sent to the host at the start of
>PPP negotiations, the host receiver will "lock-up" and never
>be able to see any of my PPP Conf Requests or Acks from that
>point on. However, the host will continue to transmit its own
>valid Conf Requests - indicating clearly that it is trying to
As Carlson said, the reason may very very well have been that it demands an
asyncmap of a000 and you did 0000. Nothing illegal, but it is very well
known in the community that this is a recipe for disaster. There is a badly
written program out there (by some organisation from Washington State) that
breaks in that situation. Should it? No. Does it? Yes. Should it be fixed?
Yes. Do you have the influence to get it done? Probably not.
>establish a PPP connection. I can watch the whole thing unfold
>at the byte level using a serial line sniffer and I can
>reproduce the problem at will by sending one LF just prior to
>end of the chat script - as well as avoiding the problem and
>getting a good PPP negotiation by removing the spurious LF.
So, remove it.
>I just know this bug is going to bite others and when it does
>it is a real bear to understand what the heck is going wrong.
It has for many many years. So? Windows is set up not to trigger this bug.
Do you think that MS is going to change things to make life for other
operating systems easier? I could rather suspect it is there on purpose to
make life as hard as possible for others.
>I did ask my ISP what software is being used, but it is
>unlikely that they'll ever tell me this. I'd sure like to
>know if anybody knows the make of crappy code that does
>this.
What code do you think most ISPs run?
-
Re: Linux ppp + MegaPOP dialup change = mrru related LCP timeout
On 22 Jan 2005 18:41:23 GMT
unruh@string.physics.ubc.ca (Bill Unruh) wrote:
> Seems to me that you have two options-- 1) Figure out what you can do to
> make the link work. 2) Rant and rail against the rest of the world
> and how if only they did things better it would make it easier for you.
I did and I am not. I simply wanted to track down the source of the
problem for the benefit of future readers of this thread - a lesser poster
would not have followed-up after he got his system working.
> As Carlson said, the reason may very very well have been that
> it demands an asyncmap of a000 and you did 0000.
This is a bit misleading because it implies a configuration problem
with pppd rather than a *single* unescaped LF at the very start of
PPP negotiation. If you had read the entire thread from the beginning
before posting, you would have seen that a different pppd asyncmap
setting was one of the very first things we checked, for on Jan 6th,
Clifford Kite wrote:
: It's not an ACCM problem. At this point ACCM has not been negotiated
: and all Control Characters are escaped.
Indeed, setting my asyncmap to match that asked of the host had no
effect. Of course, the origin of my single unescaped LF was "outside"
of pppd, but we would not expect this single rogue LF to hang the
entire receiver of the host for the remainder of the call.
> Do you think that MS is going to change things to make life for other
> operating systems easier?
Of course not.
> I could rather suspect it is there on purpose to make life as hard
> as possible for others.
I agree that they often do just that, but somehow I don't feel that this
is the case here. As I tried to point out several times, there might
be some *hardware* running Windows that could be bitten by this bug.
> What code do you think most ISPs run?
I don't know. After using dozens of dialup numbers from several
different ISPs over half a decade with the exact same chat scripts, this
is the first time I've run into a problem quite like this, and it does not
occur with the several other dialup numbers that I have tried, which is
why I am/was so curious about it.
Have some heart. It took me a lot of effort to track down this "simple"
problem even with the generous help and advice of other posters; and I am
not a newbie. I am sure that the info here will help somebody out of a
jam in the future - maybe even somebody running MS Windows. 
Mike
-
Re: Linux ppp + MegaPOP dialup change = mrru related LCP timeout
Michael Shell writes:
>On 22 Jan 2005 18:41:23 GMT
>unruh@string.physics.ubc.ca (Bill Unruh) wrote:
>> Seems to me that you have two options-- 1) Figure out what you can do to
>> make the link work. 2) Rant and rail against the rest of the world
>> and how if only they did things better it would make it easier for you.
>I did and I am not. I simply wanted to track down the source of the
>problem for the benefit of future readers of this thread - a lesser poster
>would not have followed-up after he got his system working.
>> As Carlson said, the reason may very very well have been that
>> it demands an asyncmap of a000 and you did 0000.
>This is a bit misleading because it implies a configuration problem
>with pppd rather than a *single* unescaped LF at the very start of
>PPP negotiation. If you had read the entire thread from the beginning
>before posting, you would have seen that a different pppd asyncmap
>setting was one of the very first things we checked, for on Jan 6th,
>Clifford Kite wrote:
>: It's not an ACCM problem. At this point ACCM has not been negotiated
>: and all Control Characters are escaped.
>Indeed, setting my asyncmap to match that asked of the host had no
>effect. Of course, the origin of my single unescaped LF was "outside"
>of pppd, but we would not expect this single rogue LF to hang the
>entire receiver of the host for the remainder of the call.
I agree that one would not expect it. On the other hand, I spent some time
trying to understand what actually happened in the real world when I wrote
www.theory.physics.ubc.ca/ppp-linux.html
I came to the conclusion that the ways ISPs had of screwing up were
infinite. Most of them are ways that should not happen, that properly
written/set up pppd's would not do those things, but nevertheless they
did. Perhaps I am just too cynical.
-
Re: Linux ppp + MegaPOP dialup change = mrru related LCP timeout
Well - it seems I have almost the same problem here.
I can do what I want, send what I want, the remote side is simply ignoring
everything and sends
LCP ConfReq id=0x3
even if I offer (and ACK) a pap auth. To make sure I am not sending any
junk, I patched serial_core.c, uart_write() to dump everything I am
sending. No linefeeds or CRs in my data stream.
Needless to say, the card (Audiovox RTM-8000 CF) works flawlessly in my
Sharp Zaurus (Linux 2.4.20) and with the very same laptop under Windows.
I have no idea and *any* help would be greatly appreciated.
Michaela
-
Re: Linux ppp + MegaPOP dialup change = mrru related LCP timeout
"steyla" writes:
>Well - it seems I have almost the same problem here.
>I can do what I want, send what I want, the remote side is simply ignoring
>everything and sends
>LCP ConfReq id=0x3
>even if I offer (and ACK) a pap auth. To make sure I am not sending any
>junk, I patched serial_core.c, uart_write() to dump everything I am
>sending. No linefeeds or CRs in my data stream.
>Needless to say, the card (Audiovox RTM-8000 CF) works flawlessly in my
>Sharp Zaurus (Linux 2.4.20) and with the very same laptop under Windows.
>I have no idea and *any* help would be greatly appreciated.
You give almost no information.
That is indicative that the other end is not getting your messages, or that
they are garbled. Often this is indicated by the far side asking for a
non-zero asyncmap.
You must use exactly the same asyncmap.
If you want more help, post the output exactly (cut and paste), including
the time stamps, from the syslog logs (daemon log).
>Michaela