[9fans] /net panic - Plan9
[9fans] /net panic
muzgo (from irc) was playing around with /net in qemu and came across this gem:
on drawterm:
cpu% cd /net/tcp
cpu% cat clone
23cpu% cd 23
cpu% echo connect 10.0.2.1!12345 >ctl
cpu% cat status
Finwait2 qin 0 qout 0 srtt 0 mdev 0 cwin 1461 swin 32850>>0 rwin
65535>>0 timer.start 10 timer.count 10 rerecv 0 katimer.start 200
katimer.count 159
cpu% echo connect 10.0.2.1!12345 >ctl
cpu%
this causes the CPU server to reboot with:
panic: timerstate1
panic: timerstate1
dumpstack disabled
cpu0 exiting
The usage of /net is invalid, but you wouldn't really expect that to
reboot the machine (or maybe it's a holdover from before /dev/reboot
existed?).
Just tried it on my cpu server and got the same panic, so we can rule
qemu out. I adjusted the ip!port to something that would accept my
connection, and my status was something like Timedwait rather than
Finwait2. Second time around I skipped the cat status and still hit
the panic so the status read isn't affecting things (which is probably
blindingly obvious to anyone familiar with the /net code, but oh
well).
I appear to be running a realtek 8169 nic:
#l0: rtl8169: 100Mbps port 0xE400 irq 11: 000aeb2ff32c
Don't know what muzgo was using in qemu, but let me know if I can
provide any useful information.
-sqweek
-
Re: [9fans] /net panic
On Fri, Feb 15, 2008 at 7:20 AM, sqweek wrote:
> Don't know what muzgo was using in qemu, but let me know if I can
> provide any useful information.
> -sqweek
>
no *strange* things running, i guess.
iru
-
Re: [9fans] /net panic
On Fri, Feb 15, 2008 at 1:19 PM, Iruata Souza wrote:
> no *strange* things running, i guess.
>
> iru
>
some info about my environment is in http://iru.oitobits.net/9netpanic/ where:
QEMU_Plan9 - sh script to run emulated Plan 9
cpuemu_config.tgz - CPU server's /cfg/cpuemu
qemu-ifup - sh script to bring up host<->guest tun interfaces, Plan 9 gets tun0
ifconfig.tun0 - tun interface configuration
listen.c - listener running on host
panic - complete scenario of snap pasted by sqweek
iru
-
Re: [9fans] /net panic
On Fri, Feb 15, 2008 at 2:23 PM, erik quanstrom wrote:
>
>i'm not sure this is a perfect solution. i just don't have enough
>of the plan 9 ip stack loaded into cache to be sure nothing's
>been forgotten. but give this patch a whirl. basically, i think
>the problem is that inittcpctl() was stepping on timers that might
>have been active. these timers need to be shut down. unfortunately,
>tcpclose() and localclose() are too aggressive. cleanupconnection()
>is a chopped-down version of localclose().
>
>- erik
>
>
>/n/sources/plan9//sys/src/9/ip/tcp.c:782,787 - tcp.c:782,813
>  	return mtu;
>  }
>
>+ static void
>+ cleanupconnection(Conv *s)
>+ {
>+ 	Tcpctl *tcb;
>+ 	Reseq *rp,*rp1;
>+ 	Tcppriv *tpriv;
>+
>+ 	tpriv = s->p->priv;
>+ 	tcb = (Tcpctl*)s->ptcl;
>+
>+ 	iphtrem(&tpriv->ht, s);
>+
>+ 	tcphalt(tpriv, &tcb->timer);
>+ 	tcphalt(tpriv, &tcb->rtt_timer);
>+ 	tcphalt(tpriv, &tcb->acktimer);
>+ 	tcphalt(tpriv, &tcb->katimer);
>+
>+ 	/* Flush reassembly queue; nothing more can arrive */
>+ 	for(rp = tcb->reseq; rp != nil; rp = rp1) {
>+ 		rp1 = rp->next;
>+ 		freeblist(rp->bp);
>+ 		free(rp);
>+ 	}
>+ 	tcb->reseq = nil;
>+ }
>+
>  void
>  inittcpctl(Conv *s, int mode)
>  {
>/n/sources/plan9//sys/src/9/ip/tcp.c:792,798 - tcp.c:818,827
>
>  	tcb = (Tcpctl*)s->ptcl;
>
>- 	memset(tcb, 0, sizeof(Tcpctl));
>+ 	if(tcb->timer.arg)	// c->state != Idle?
>+ 		cleanupconnection(s);
>+ 	else
>+ 		memset(tcb, 0, sizeof(Tcpctl));
>
>  	tcb->ssthresh = 65535;
>  	tcb->srtt = tcp_irtt<
>
works for me.
I don't know the internal workings of the Plan 9 IP stack, so at the
risk of being silly: could it be that the bug is not TCP-only?
iru
-
Re: [9fans] /net panic
> works for me.
> I don't know the internal workings of the Plan 9 IP stack, so at the
> risk of being silly: could it be that the bug is not TCP-only?
> iru
no. the problem is that active tcp timers are overwritten.
all the tcp timer code is contained within ip/tcp.c
- erik
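
For anyone following along, here is a toy model in portable C of why overwriting an active timer can corrupt a timer list. It is only a sketch under assumptions: Timer, Timers, timergo(), timerhalt() and the two reuse_*() functions are made-up names for illustration, not the definitions in ip/tcp.c, and the consistency check merely stands in for the kind of test that would lead to a panic. The idea is that a timer still linked on the active list gets zeroed, so the list head keeps pointing at a timer that now claims to be off.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: names and layout are assumptions, not the tcp.c definitions. */
typedef struct Timer Timer;
struct Timer {
	Timer *next;
	Timer *prev;
	int on;		/* nonzero while linked on the active list */
};

typedef struct Timers Timers;
struct Timers {
	Timer *head;
};

/* Link t at the head of the active list unless it claims to be on already. */
static void
timergo(Timers *tl, Timer *t)
{
	assert(tl != NULL && t != NULL);
	if(t->on)
		return;
	t->prev = NULL;
	t->next = tl->head;
	if(t->next != NULL)
		t->next->prev = t;
	tl->head = t;
	t->on = 1;
}

/* Unlink t. Returns -1 where a kernel would panic on an inconsistent list. */
static int
timerhalt(Timers *tl, Timer *t)
{
	assert(tl != NULL && t != NULL);
	if(!t->on)
		return 0;
	if(tl->head == t){
		if(t->prev != NULL)
			return -1;	/* the head must not have a predecessor */
		tl->head = t->next;
	}
	if(t->next != NULL)
		t->next->prev = t->prev;
	if(t->prev != NULL)
		t->prev->next = t->next;
	t->next = t->prev = NULL;
	t->on = 0;
	return 0;
}

/* Second "connect" wipes the timer while it is still linked. */
int
reuse_with_memset(void)
{
	Timers tl = {0};
	Timer t = {0};

	timergo(&tl, &t);		/* first connect starts the timer */
	memset(&t, 0, sizeof t);	/* reinit; tl.head still points at t */
	timergo(&tl, &t);		/* t is linked behind itself: t.prev == &t */
	return timerhalt(&tl, &t);	/* inconsistency detected */
}

/* Second "connect" halts the timer before reinitialising it. */
int
reuse_after_halt(void)
{
	Timers tl = {0};
	Timer t = {0};

	timergo(&tl, &t);
	if(timerhalt(&tl, &t) < 0)
		return -1;
	memset(&t, 0, sizeof t);	/* safe: t is no longer on the list */
	timergo(&tl, &t);
	return timerhalt(&tl, &t);
}
```

In this model reuse_with_memset() returns -1 and reuse_after_halt() returns 0, which matches the shape of the fix in the patch: halt the timers before reusing the control block rather than zeroing it while they may still be active.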