pppd locking up randomly after hours of use in Linux - PPP
-
pppd locking up randomly after hours of use in Linux
I am currently working on a project that uses multiple cellular datacards in
a mobile environment. Specifically, we have a PC104 Pentium 3 stack that
has 8 Sierra Wireless PCMCIA Aircards. 4 of the Aircards are model 550s
through Sprint, and 4 of the Aircards are model 555s through Verizon. We
are multiplexing real-time video over all 8 of these cellular devices using
custom software which is irrelevant to the problem I am having.
Basically, all 8 Aircards are connected to the cellular network using pppd
with a simple chat script provided by Sierra Wireless in the
unsupported Linux section of their web site. pppd uses the persist option to
maintain the connection. Because the environment is mobile and the testing
location is rural, the cellular connectivity will disconnect and reconnect
relatively often.
The problem is that after a while if the cellular datacards are out of
coverage, the pppd process will freeze. Even after returning to an area
with coverage, the frozen pppd process will never redial the modem, while
the other pppd processes redial and reconnect just fine. This does not
happen all the time, but over the course of a day, it is possible for 3 or
4 of the 8 pppd processes to stop responding. We are using a 2.4.20 kernel
with pppd 2.4.1.
I have identified 2 possible solutions, but would like more feedback from
the experts. First, I plan to add "AT&D2&C1" to the chat script, in case our
datacards are randomly failing to switch to command mode. If the chat
script changes do not work, I plan to incorporate Clifford Kite's patch.
pppd options:
-detach
/dev/modem1
persist
maxfail 0
lcp-echo-interval 3
lcp-echo-failure 3
debug
usepeerdns
user blah
show-password
crtscts
lock
connect '/usr/sbin/chat -v -t3 -f /etc/ppp/peers/ac550chat'
chat script:
'' AT
OK ATD#777
CONNECT ''
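A rough sketch of how the AT&D2&C1 change could be folded into the chat script, written as a small shell snippet that generates the file. The ABORT strings and TIMEOUT values are assumptions, not anything documented for these cards in this thread, and the output path is a placeholder rather than the real /etc/ppp/peers/ac550chat. Whether the Aircards honor &D2 and &C1 is untested here.

```shell
#!/bin/sh
# Sketch only: generate a more defensive variant of the chat script.
# CHAT_OUT is a placeholder path, not the real /etc/ppp/peers/ac550chat.
CHAT_OUT="${CHAT_OUT:-./ac550chat.example}"

write_chat_script() {
    # ABORT makes chat give up at once when the modem returns one of these
    # strings, instead of waiting out the timeout.
    # The ABORT strings and TIMEOUT values below are assumptions.
    cat > "$1" <<'EOF'
ABORT 'NO CARRIER'
ABORT ERROR
ABORT BUSY
TIMEOUT 5
'' AT&D2&C1
OK ATD#777
TIMEOUT 30
CONNECT ''
EOF
}

write_chat_script "$CHAT_OUT"
echo "wrote $CHAT_OUT"
```

A TIMEOUT directive inside the script changes the timeout for the expect strings that follow it, so the 3 seconds given with -t3 on the chat command line would no longer limit the wait for CONNECT.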
pppd debug output:
Serial connection established.
using channel 62
Using interface ppp3
Connect: ppp3 <--> /dev/modem1
rcvd [LCP ConfReq id=0x2d ]
sent [LCP ConfReq id=0xa ]
sent [LCP ConfAck id=0x2d ]
rcvd [LCP ConfAck id=0xa ]
sent [LCP EchoReq id=0x0 magic=0xe80562be]
sent [IPCP ConfReq id=0x7
0.0.0.0> ]
sent [CCP ConfReq id=0x4 ]
rcvd [LCP DiscReq id=0x2e magic=0x90287b]
rcvd [LCP EchoRep id=0x0 magic=0x90287b e8 05 62 be]
rcvd [IPCP ConfReq id=0x2f ]
sent [IPCP ConfAck id=0x2f ]
rcvd [LCP ProtRej id=0x30 80 fd 01 04 00 0c 1a 04 78 00 18 04 78 00]
rcvd [IPCP ConfNak id=0x7
]
sent [IPCP ConfReq id=0x8
68.28.186.11> ]
rcvd [IPCP ConfAck id=0x8
68.28.186.11> ]
local IP address 68.240.48.222
remote IP address 68.28.160.192
primary DNS address 68.28.186.11
secondary DNS address 68.28.178.11
Script /etc/ppp/ip-up started (pid 30298)
Script /etc/ppp/ip-up finished (pid 30298), status = 0x0
sent [LCP EchoReq id=0x1 magic=0xe80562be]
rcvd [LCP EchoRep id=0x1 magic=0x90287b]
sent [LCP EchoReq id=0x2 magic=0xe80562be]
rcvd [LCP EchoRep id=0x2 magic=0x90287b]
sent [LCP EchoReq id=0x3 magic=0xe80562be]
rcvd [LCP EchoRep id=0x3 magic=0x90287b]
sent [LCP EchoReq id=0x4 magic=0xe80562be]
rcvd [LCP EchoRep id=0x4 magic=0x90287b]
sent [LCP EchoReq id=0x5 magic=0xe80562be]
rcvd [LCP EchoRep id=0x5 magic=0x90287b]
sent [LCP EchoReq id=0x6 magic=0xe80562be]
rcvd [LCP EchoRep id=0x6 magic=0x90287b]
sent [LCP EchoReq id=0x7 magic=0xe80562be]
rcvd [LCP EchoRep id=0x7 magic=0x90287b]
sent [LCP EchoReq id=0x8 magic=0xe80562be]
rcvd [LCP EchoRep id=0x8 magic=0x90287b]
sent [LCP EchoReq id=0x9 magic=0xe80562be]
rcvd [LCP EchoRep id=0x9 magic=0x90287b]
sent [LCP EchoReq id=0xa magic=0xe80562be]
sent [LCP EchoReq id=0xb magic=0xe80562be]
rcvd [LCP EchoRep id=0xa magic=0x90287b]
sent [LCP EchoReq id=0xc magic=0xe80562be]
sent [LCP EchoReq id=0xd magic=0xe80562be]
sent [LCP EchoReq id=0xe magic=0xe80562be]
rcvd [LCP EchoRep id=0xb magic=0x90287b]
rcvd [LCP EchoRep id=0xc magic=0x90287b]
sent [LCP EchoReq id=0xf magic=0xe80562be]
sent [LCP EchoReq id=0x10 magic=0xe80562be]
rcvd [LCP EchoRep id=0xd magic=0x90287b]
rcvd [LCP EchoRep id=0xe magic=0x90287b]
sent [LCP EchoReq id=0x11 magic=0xe80562be]
sent [LCP EchoReq id=0x12 magic=0xe80562be]
sent [LCP EchoReq id=0x13 magic=0xe80562be]
No response to 3 echo-requests
Serial link appears to be disconnected.
Script /etc/ppp/ip-down started (pid 30536)
sent [LCP TermReq id=0xb "Peer not responding"]
Script /etc/ppp/ip-down finished (pid 30536), status = 0x0
sent [LCP TermReq id=0xc "Peer not responding"]
Connection terminated.
Connect time 1.1 minutes.
Sent 92843 bytes, received 379 bytes.
Connect script failed
Connect script failed
Connect script failed
Connect script failed
Connect script failed
Connect script failed
Connect script failed
Connect script failed
Connect script failed
As you can see above, the pppd process tried to reconnect multiple times,
but eventually quit responding.
Any help would be greatly appreciated.
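To compare a frozen pppd against the ones that recover, it may help to tally each process's debug log. The following is a minimal sketch that assumes the debug output has been saved to a file in the same format as above; the function name is made up for illustration.

```shell
#!/bin/sh
# Sketch only: summarize a saved pppd debug log.
# Usage: summarize_ppp_log /path/to/logfile
summarize_ppp_log() {
    log="$1"
    # grep -c prints the number of matching lines; "|| true" keeps a
    # zero-match result from counting as a failure.
    req=$(grep -c 'sent \[LCP EchoReq' "$log" || true)
    rep=$(grep -c 'rcvd \[LCP EchoRep' "$log" || true)
    fail=$(grep -c 'Connect script failed' "$log" || true)
    echo "echo-requests sent: $req"
    echo "echo-replies received: $rep"
    echo "connect script failures: $fail"
}

# Run on a log file if one was given on the command line.
if [ -n "${1:-}" ]; then
    summarize_ppp_log "$1"
fi
```

Run on the excerpt above, this reports 20 echo-requests sent, 15 echo-replies received and 9 connect script failures.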
-
Re: pppd locking up randomly after hours of use in Linux
Larry Goats wrote:
> The problem is that after a while if the cellular datacards are out of
> coverage, the pppd process will freeze. Even after returning to an area
> with coverage, the frozen pppd process will never redial the modem, while
> the other pppd processes redial and reconnect just fine. This does not
> happen all the time, but over the course of a day, it is possible for 3 or
> 4 of the 8 pppd processes to stop responding. We are using a 2.4.20 kernel
> with pppd 2.4.1.
> I have identified 2 possible solutions, but would like more feedback from
> the experts. First, I plan to add "AT&D2&C1" to the chat script, in case our
> datacards are randomly failing to switch to command mode. If the chat
> script changes do not work, I plan to incorporate Clifford Kite's patch.
I don't know that the patch will help you. It was generated because
at the time of the first patch the stty program lacked the -F option.
So if pppd didn't restore the clocal terminal line setting then stty
failed to show the terminal line settings (standard input/output had
to be used). Pppd itself had no problem reusing the line without
clocal when started from scratch.
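With a stty that does support -F, the clocal setting can be checked on the port directly. A minimal sketch: the helper name is made up for illustration, and /dev/ttyS1 is only an example device.

```shell
#!/bin/sh
# Sketch only: report whether clocal is set, given `stty -a` output on stdin.
# Typical use (device name is an example):
#   stty -F /dev/ttyS1 -a | clocal_state
clocal_state() {
    # Put each flag on its own line, drop the ';' separators, then look
    # for a line that is exactly "clocal" or "-clocal".
    flag=$(tr ' ' '\n' | tr -d ';' | grep -E -x -- '-?clocal' | head -n 1 || true)
    case "$flag" in
        clocal)  echo "clocal set" ;;
        -clocal) echo "clocal cleared" ;;
        *)       echo "clocal not reported" ;;
    esac
}
```

Running it before pppd starts and again after a link drops would show whether the setting was restored.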
Since the problem occurs when you lose the "physical connection,"
as a test I tried the nearest thing I can do to imitate that, which
was to connect to my ISP using "persist" and "maxfail 0" and then
unplug the modem from the wall jack. Pppd tried to connect 15 times
before I plugged the modem back in, and then was able to reconnected
on the first try.
You showed 9 attempts to reconnect, but I don't know whether that was
typical or not. The S/N ratio will degrade slowly when a wireless
looses the connection, and it's remotely possible that that may make
a difference. Plus what you are doing is, IMHO, rather unusual.
The bottom line is that I know very little about cellular technology
and can't say what's going wrong. But I have come to believe, from
reading posts to this newsgroup, that some of the PPP implementations
used in connection with that technology do strange things.
FWIW, using minicom to configure my modem with the profile I use
with PPP shows &C1 &D1 %E1 . Considering the divergence from the
old Hayes standard, this may be less than worthless.
--
Clifford Kite Email: "echo xvgr_yvahk-ccc@ri1.arg|rot13"
PPP-Q&A links, downloads: http://ckite.no-ip.net/
/* Better is the enemy of good enough. */
-
Re: pppd locking up randomly after hours of use in Linux
Clifford Kite wrote:
> Since the problem occurs when you lose the "physical connection,"
> as a test I tried the nearest thing I can do to imitate that, which
> was to connect to my ISP using "persist" and "maxfail 0" and then
> unplug the modem from the wall jack. Pppd tried to connect 15 times
> before I plugged the modem back in, and then was able to reconnected
> on the first try.
I forgot to add that I did this for both the modified pppd and the
unmodified pppd. No difference, except that the modified pppd did
revert to the original ttyS1 configuration before trying to reconnect
while the unmodified one did not.
> You showed 9 attempts to reconnect, but I don't know whether that was
> typical or not. The S/N ratio will degrade slowly when a wireless
> looses the connection, and it's remotely possible that that may make
> a difference. Plus what you are doing is, IMHO, rather unusual.
Gak! It's really disconcerting to see editing-for-clarity-errors
(reconnected) as well as plain typos (looses) in my posts that should
have been caught prior to posting. :/
--
Clifford Kite Email: "echo xvgr_yvahk-ccc@ri1.arg|rot13"
PPP-Q&A links, downloads: http://ckite.no-ip.net/
/* 97.3% of all statistics are made up. */
-
Re: pppd locking up randomly after hours of use in Linux
Larry Goats writes:
]Connect script failed
]Connect script failed
]Connect script failed
]Connect script failed
]Connect script failed
]Connect script failed
]Connect script failed
]Connect script failed
]Connect script failed
]As you can see above, the pppd process tried to reconnect multiple times,
]but eventually quit responding.
]Any help would be greatly appreciated.
There are many reasons why that connect script could fail: the remote end
never answers, the cell phone stops working, the port freezes up, etc.
There is not enough information here to tell which.
You should also have chat script reporting turned on.
That is, chat should be run with the -v option, and you should have syslog steer
local2 somewhere (e.g. the same file the ppp debug output is going to).
For example, have the line
local2.*;daemon.* /var/log/ppplog
in /etc/syslog.conf and then do
killall -1 syslogd
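The same steps as a small shell sketch that adds the rule only if it is not already present. The function name is made up for illustration, and the file is passed as an argument so it can be tried on a copy of syslog.conf first.

```shell
#!/bin/sh
# Sketch only: add the logging rule to a syslog.conf-style file if missing.
# Some syslogd implementations expect a tab rather than a space between
# the selector and the file name; adjust the rule if yours does.
PPP_LOG_RULE='local2.*;daemon.* /var/log/ppplog'

add_ppp_log_rule() {
    conf="$1"
    # -F: fixed string, -x: whole line, -q: quiet. Append only when absent.
    if ! grep -qxF -- "$PPP_LOG_RULE" "$conf" 2>/dev/null; then
        printf '%s\n' "$PPP_LOG_RULE" >> "$conf"
    fi
}

# Example (as root, on the real file), followed by the HUP:
#   add_ppp_log_rule /etc/syslog.conf && killall -1 syslogd
```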
It looks to me like pppd is NOT the problem. Rather, the chat script is
failing for one of the reasons above.
Note that pppd can sometimes leave a serial port in a weird state.
Adapting a script of Carlson's, I have a perl program that wakes up and
resets the serial port, at
www.theory.physics.ubc.ca/modem-chk.html
You could try running it in the ip-down script to reset the port (if the
cell cards operate via a serial port).
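A sketch of what such an ip-down hook might look like. pppd runs ip-down with the interface name and tty device as its first two arguments. RESET_CMD and its default path are hypothetical placeholders for whatever reset program is installed, and the assumption that it takes the device as its first argument is mine.

```shell
#!/bin/sh
# Sketch only: body for /etc/ppp/ip-down that resets the port after a drop.
# pppd runs ip-down as: ip-down interface tty speed local-ip remote-ip ipparam
# RESET_CMD is a placeholder; the default path below is hypothetical, and
# the program is assumed to take the tty device as its first argument.
RESET_CMD="${RESET_CMD:-/usr/local/sbin/reset-serial-port}"

reset_port_on_down() {
    iface="$1"
    tty="$2"
    if [ -z "$tty" ]; then
        echo "ip-down: no tty given for $iface, skipping reset" >&2
        return 0
    fi
    # Hand the device to the reset program; report an error but do not
    # make ip-down itself fail.
    "$RESET_CMD" "$tty" || echo "ip-down: reset of $tty failed" >&2
}

# In a real ip-down script the call would be:
#   reset_port_on_down "$@"
```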