[9fans] a question on APE - Plan9


  1. [9fans] a question on APE

    we're doing some work here with Andrey's port of ssh2. It *almost*
    works. But I'm seeing a stack trace I don't understand.

    I can't give you all the details -- it's ssh, therefore it is pretty
    awful -- but here is the short form: There is a proc called fromnet()
    which has this inner loop:
    for(;;){
        if((n = libssh2_channel_read(c, buf, Bufsize)) > 0)
            write(1, buf, n);
        else
            goto Donenet;
    }

    When this proc is entered, ape has forked off two procs to handle the
    fd 'c'. From the fromnet function, we see the libssh2_channel_read
    does a select. here is where I get confused. The stk() for the two
    procs looks like this:
    pread()+0x7 /sys/src/libc/9syscall/pread.s:5
    read(fd=0x5,buf=0x110414,n=0x1000)+0x2f /sys/src/libc/9sys/read.c:7
    recv(flags=0x0,fd=0x5,a=0x110414,n=0x1000)+0x3e /sys/src/ape/lib/bsd/send.c:30
    libssh2_packet_read(session=0x1102f8)+0x176
    /usr/bootes/libssh2/libssh2-0.18/src/transport.c:326
    libssh2_channel_read_ex(channel=0x114460,buflen=0x1000,stream_id=0x0,buf=0xdfffdee8)+0x2a7
    /usr/bootes/libssh2/libssh2-0.18/src/channel.c:1442
    fromnet(c=0x114460,s=0x1102f8)+0x2e
    /usr/bootes/libssh2/libssh2-0.18/clients/ssh2.c:75
    main(argc=0x2,argv=0xdfffef94)+0x47c
    /usr/bootes/libssh2/libssh2-0.18/clients/ssh2.c:253
    _main+0x31 /sys/src/libc/386/main9.s:16

    Note the read on fd 5. That's the socket. Here is the other proc.

    _PREAD()+0x7 /sys/src/ape/lib/ap/syscall/_PREAD.s:5
    _READ(fd=0x5,buf=0x600003c,n=0x2000)+0x2f /sys/src/ape/lib/ap/plan9/9read.c:10
    _copyproc(b=0x6000028,fd=0x5)+0x86 /sys/src/ape/lib/ap/plan9/_buf.c:166
    _startbuf(fd=0x5)+0x1dd /sys/src/ape/lib/ap/plan9/_buf.c:107
    select(timeout=0xdfffde90,rfds=0xdfffde80,wfds=0x0,efds=0x0,nfds=0x6)+0xe9
    /sys/src/ape/lib/ap/plan9/_buf.c:292
    libssh2_waitsocket(session=0x1102f8,seconds=0x0)+0x7b
    /usr/bootes/libssh2/libssh2-0.18/src/packet.c:1054
    libssh2_channel_read_ex(channel=0x114460,buflen=0x1000,stream_id=0x0,buf=0xdfffdee8)+0x69
    /usr/bootes/libssh2/libssh2-0.18/src/channel.c:1408
    fromnet(c=0x114460,s=0x1102f8)+0x2e
    /usr/bootes/libssh2/libssh2-0.18/clients/ssh2.c:75
    main(argc=0x2,argv=0xdfffef94)+0x47c
    /usr/bootes/libssh2/libssh2-0.18/clients/ssh2.c:253
    _main+0x31 /sys/src/libc/386/main9.s:16

    ok, I think this stack is a bit messed up, since I don't see how we
    can have the copyproc in the call chain from select(), but ... is it?

    I realize there is very little information here, sorry ... here's what
    is bothering me. It seems we have two procs hanging on a read on fd 5.
    I think the copyproc and some other proc are in conflict but ... I am
    unsure. The problems we are seeing might be explained by the wrong
    proc grabbing output at the wrong time -- it feels like a race
    condition. Any acid trips we can take to hammer this one down?

    Anyone ever done a select on a socket in ape?

  2. Re: [9fans] a question on APE

    > Anyone ever done a select on a socket in ape?
    >


    the links port does that and it works fine, at least for a while.

    the code snippet you gave is suspect, although i don't know how that
    relates to the stack trace. libssh2 lacks documentation, but from the
    little that i read libssh2_channel_read() can return zero without
    receiving EOF from the remote site. one needs to go through
    libssh2_channel_eof() or something to that effect to check whether the
    other side closed, and the code above doesn't do it (it's my fault, i
    hadn't gotten to debugging that part).

    then the code needs to do it for stderr too
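
    A minimal sketch of the loop shape described above, where a zero
    return is not treated as end of stream until an explicit EOF check
    says so. Chan, chanread(), chaneof() and drain() are hypothetical
    stand-ins so the example is self-contained; in the real client the
    calls would be libssh2_channel_read() and libssh2_channel_eof(), and
    the output would go to write(1, ...) rather than a memory buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an ssh channel; not the libssh2 type. */
typedef struct Chan Chan;
struct Chan {
	const char *data;	/* bytes the "remote side" still has to send */
	size_t len;
	int stalls;		/* reads that return 0 before data arrives */
};

/* Stand-in for a channel read: it may return 0 without meaning EOF. */
long
chanread(Chan *c, char *buf, size_t n)
{
	assert(c != NULL && buf != NULL);
	if(c->stalls > 0){
		c->stalls--;
		return 0;
	}
	if(n > c->len)
		n = c->len;
	memcpy(buf, c->data, n);
	c->data += n;
	c->len -= n;
	return (long)n;
}

/* Stand-in for the explicit EOF query. */
int
chaneof(Chan *c)
{
	return c->len == 0 && c->stalls == 0;
}

/* Copy everything from the channel into out; returns the byte count. */
size_t
drain(Chan *c, char *out, size_t outsz)
{
	char buf[8];
	size_t total = 0;
	long n;

	for(;;){
		n = chanread(c, buf, sizeof buf);
		if(n < 0)
			break;			/* real error */
		if(n == 0){
			if(chaneof(c))
				break;		/* remote side really closed */
			continue;		/* nothing yet; real code would wait, not spin */
		}
		if(total + (size_t)n > outsz)
			break;			/* out of room in this toy buffer */
		memcpy(out + total, buf, (size_t)n);
		total += (size_t)n;
	}
	return total;
}
```

    The same loop shape would then be repeated for the stderr stream.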

  3. Re: [9fans] a question on APE

    ron minnich wrote:

    >we're doing some work here with Andrey's port of ssh2. It *almost*
    >works. But I'm seeing a stack trace I don't understand.
    >
    >I can't give you all the details -- it's ssh, therefore it is pretty
    >awful -- but here is the short form: There is a proc called fromnet()
    >which has this inner loop:
    > for(;;){
    > if((n = libssh2_channel_read(c, buf, Bufsize)) > 0)
    > write(1, buf, n);
    > else
    > goto Donenet;
    > }
    >
    >When this proc is entered, ape has forked off two procs to handle the
    >fd 'c'. From the fromnet function, we see the libssh2_channel_read
    >does a select. here is where I get confused. The stk() for the two
    >procs looks like this:
    >pread()+0x7 /sys/src/libc/9syscall/pread.s:5
    >read(fd=0x5,buf=0x110414,n=0x1000)+0x2f /sys/src/libc/9sys/read.c:7
    >recv(flags=0x0,fd=0x5,a=0x110414,n=0x1000)+0x3e /sys/src/ape/lib/bsd/send.c:30
    >libssh2_packet_read(session=0x1102f8)+0x176
    >/usr/bootes/libssh2/libssh2-0.18/src/transport.c:326
    >libssh2_channel_read_ex(channel=0x114460,buflen=0x1000,stream_id=0x0,buf=0xdfffdee8)+0x2a7
    >/usr/bootes/libssh2/libssh2-0.18/src/channel.c:1442
    >fromnet(c=0x114460,s=0x1102f8)+0x2e
    >/usr/bootes/libssh2/libssh2-0.18/clients/ssh2.c:75
    >main(argc=0x2,argv=0xdfffef94)+0x47c
    >/usr/bootes/libssh2/libssh2-0.18/clients/ssh2.c:253
    >_main+0x31 /sys/src/libc/386/main9.s:16
    >
    >The the read on fd 5. That's the socket. Here is the other proc.
    >
    >_PREAD()+0x7 /sys/src/ape/lib/ap/syscall/_PREAD.s:5
    >_READ(fd=0x5,buf=0x600003c,n=0x2000)+0x2f /sys/src/ape/lib/ap/plan9/9read.c:10
    >_copyproc(b=0x6000028,fd=0x5)+0x86 /sys/src/ape/lib/ap/plan9/_buf.c:166
    >_startbuf(fd=0x5)+0x1dd /sys/src/ape/lib/ap/plan9/_buf.c:107
    >select(timeout=0xdfffde90,rfds=0xdfffde80,wfds=0x0,efds=0x0,nfds=0x6)+0xe9
    >/sys/src/ape/lib/ap/plan9/_buf.c:292
    >libssh2_waitsocket(session=0x1102f8,seconds=0x0)+0x7b
    >/usr/bootes/libssh2/libssh2-0.18/src/packet.c:1054
    >libssh2_channel_read_ex(channel=0x114460,buflen=0x1000,stream_id=0x0,buf=0xdfffdee8)+0x69
    >/usr/bootes/libssh2/libssh2-0.18/src/channel.c:1408
    >fromnet(c=0x114460,s=0x1102f8)+0x2e
    >/usr/bootes/libssh2/libssh2-0.18/clients/ssh2.c:75
    >main(argc=0x2,argv=0xdfffef94)+0x47c
    >/usr/bootes/libssh2/libssh2-0.18/clients/ssh2.c:253
    >_main+0x31 /sys/src/libc/386/main9.s:16
    >
    >ok, I think this stack is a bit messed up, since I don't see how we
    >can have the coyproc in the call chain from select(), but ... is it?
    >
    >
    >

    Plan 9 has no select functionality. Select is emulated in APE by forking
    a child proc that reads an fd and fills a buffer (in a shared memory
    area). Read() should then pick up the data from the buffer and wake up
    the reader proc if it is sleeping (because the buffer filled up).
    Select() will start such a reader proc (_startbuf()) if the fd is not
    already "buffered" and then check whether the buffer has data available,
    so the stack trace looks valid to me.

    Maybe the buffered file descriptors don't work with the recv() call and
    are only implemented for read()? I think you should find some kind of
    switch in read() that checks whether the fd is buffered and then calls
    some _buf.c function that copies the data from the buffer. Maybe this
    is missing for recv()?
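
    To make the idea concrete, here is a toy model of that switch. It is
    only a sketch under assumptions: Fdbuf, fdtab, startbuf(), copyin(),
    rawread() and bufread() are invented names, not the ones in _buf.c,
    and there are no procs or shared memory here. The point is that a
    read path which skips the "is this fd buffered?" test goes straight
    to the raw read and competes with the copy proc for the same data.

```c
#include <assert.h>
#include <string.h>

enum { Nfd = 8, Bufsz = 64 };

/* Hypothetical per-fd state, loosely following the description above. */
typedef struct Fdbuf Fdbuf;
struct Fdbuf {
	int buffered;		/* set once select() has started a copy proc */
	char data[Bufsz];	/* filled by the copy proc */
	int n;			/* bytes currently available */
};

Fdbuf fdtab[Nfd];
int rawreads;			/* counts reads that bypassed the buffer */

/* What select() would do the first time it sees an fd. */
void
startbuf(int fd)
{
	assert(fd >= 0 && fd < Nfd);
	fdtab[fd].buffered = 1;
	fdtab[fd].n = 0;
}

/* What the copy proc would do after its own read on the fd. */
void
copyin(int fd, const char *p, int n)
{
	Fdbuf *b = &fdtab[fd];

	assert(b->buffered && n >= 0 && b->n + n <= Bufsz);
	memcpy(b->data + b->n, p, (size_t)n);
	b->n += n;
}

/* Stand-in for the raw system call; it only records that it was used. */
int
rawread(int fd, char *buf, int n)
{
	(void)fd; (void)buf; (void)n;
	rawreads++;
	return 0;
}

/* The switch: buffered fds are served from the buffer, others go raw. */
int
bufread(int fd, char *buf, int n)
{
	Fdbuf *b;

	assert(fd >= 0 && fd < Nfd);
	b = &fdtab[fd];
	if(!b->buffered)
		return rawread(fd, buf, n);
	if(n > b->n)
		n = b->n;	/* real code would sleep when the buffer is empty */
	memcpy(buf, b->data, (size_t)n);
	memmove(b->data, b->data + n, (size_t)(b->n - n));
	b->n -= n;
	return n;
}
```

    If recv() ended up in something like rawread() directly while a copy
    proc was also reading fd 5, you would see two procs blocked in pread
    on the same fd, as in the two stacks.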

    >I realize there is very little information here, sorry ... here's what
    >is bothering me. It seems we have two procs hanging on a read on fd 5.
    >I think the copyproc and some other proc are in conflict but ... I am
    >unsure. The problems we are seeing might be explained by the wrong
    >proc grabbing output at the wrong time -- it feels like a race
    >condition. And acid trips we can take to hammer this one down?
    >
    >Anyone ever done a select on a socket in ape?
    >
    >
    >



  4. Re: [9fans] a question on APE

    Kernel Panic wrote:

    > ron minnich wrote:
    >
    >> we're doing some work here with Andrey's port of ssh2. It *almost*
    >> works. But I'm seeing a stack trace I don't understand.
    >>
    >> I can't give you all the details -- it's ssh, therefore it is pretty
    >> awful -- but here is the short form: There is a proc called fromnet()
    >> which has this inner loop:
    >> for(;;){
    >> if((n = libssh2_channel_read(c, buf, Bufsize)) > 0)
    >> write(1, buf, n);
    >> else
    >> goto Donenet;
    >> }
    >>
    >> When this proc is entered, ape has forked off two procs to handle the
    >> fd 'c'. From the fromnet function, we see the libssh2_channel_read
    >> does a select. here is where I get confused. The stk() for the two
    >> procs looks like this:
    >> pread()+0x7 /sys/src/libc/9syscall/pread.s:5
    >> read(fd=0x5,buf=0x110414,n=0x1000)+0x2f /sys/src/libc/9sys/read.c:7
    >> recv(flags=0x0,fd=0x5,a=0x110414,n=0x1000)+0x3e
    >> /sys/src/ape/lib/bsd/send.c:30
    >> libssh2_packet_read(session=0x1102f8)+0x176
    >> /usr/bootes/libssh2/libssh2-0.18/src/transport.c:326
    >> libssh2_channel_read_ex(channel=0x114460,buflen=0x1000,stream_id=0x0,buf=0xdfffdee8)+0x2a7
    >>
    >> /usr/bootes/libssh2/libssh2-0.18/src/channel.c:1442
    >> fromnet(c=0x114460,s=0x1102f8)+0x2e
    >> /usr/bootes/libssh2/libssh2-0.18/clients/ssh2.c:75
    >> main(argc=0x2,argv=0xdfffef94)+0x47c
    >> /usr/bootes/libssh2/libssh2-0.18/clients/ssh2.c:253
    >> _main+0x31 /sys/src/libc/386/main9.s:16
    >>
    >> The the read on fd 5. That's the socket. Here is the other proc.
    >>
    >> _PREAD()+0x7 /sys/src/ape/lib/ap/syscall/_PREAD.s:5
    >> _READ(fd=0x5,buf=0x600003c,n=0x2000)+0x2f
    >> /sys/src/ape/lib/ap/plan9/9read.c:10
    >> _copyproc(b=0x6000028,fd=0x5)+0x86 /sys/src/ape/lib/ap/plan9/_buf.c:166
    >> _startbuf(fd=0x5)+0x1dd /sys/src/ape/lib/ap/plan9/_buf.c:107
    >> select(timeout=0xdfffde90,rfds=0xdfffde80,wfds=0x0,efds=0x0,nfds=0x6)+0xe9
    >>
    >> /sys/src/ape/lib/ap/plan9/_buf.c:292
    >> libssh2_waitsocket(session=0x1102f8,seconds=0x0)+0x7b
    >> /usr/bootes/libssh2/libssh2-0.18/src/packet.c:1054
    >> libssh2_channel_read_ex(channel=0x114460,buflen=0x1000,stream_id=0x0,buf=0xdfffdee8)+0x69
    >>
    >> /usr/bootes/libssh2/libssh2-0.18/src/channel.c:1408
    >> fromnet(c=0x114460,s=0x1102f8)+0x2e
    >> /usr/bootes/libssh2/libssh2-0.18/clients/ssh2.c:75
    >> main(argc=0x2,argv=0xdfffef94)+0x47c
    >> /usr/bootes/libssh2/libssh2-0.18/clients/ssh2.c:253
    >> _main+0x31 /sys/src/libc/386/main9.s:16
    >>
    >> ok, I think this stack is a bit messed up, since I don't see how we
    >> can have the coyproc in the call chain from select(), but ... is it?

    >


    Ahh... just looked at the code...
    Ok, as I expected... recv() calls a different read(), the one from
    /sys/src/libc/9sys/read.c. It would all work if recv() called the one
    from /sys/src/ape/lib/ap/plan9/read.c instead.

    I guess you could work around it by using read() instead of recv() in
    the ssh code, but the right thing is to fix ape and have recv() call
    the read() from ap/plan9/read.c.

    > Plan9 has no select functionality. Select is emulated in APE by
    > forking a childproc that reads an fd and
    > fills a buffer (on a shared memory area). Read() should then pick up
    > the data from the buffer and
    > wakeup the reader proc if it sleeps (because the buffer got filled
    > up). Select() will startup such a
    > reader proc (startbuf()) if it is not already "bufferd" and then check
    > if the buffer has data available,
    > so the stacktrace looks valid to me.
    >
    > Maybe the bufferd filedescriptors doesnt work with the recv() call and
    > are only implemented for read()?
    > I think you should find some kind of switch in read() that checks if
    > the fd is bufferd and then calls
    > some _buf.c function that copies the data from the buffer.
    > Maybe this is missing for recv()?
    >
    >> I realize there is very little information here, sorry ... here's what
    >> is bothering me. It seems we have two procs hanging on a read on fd 5.
    >> I think the copyproc and some other proc are in conflict but ... I am
    >> unsure. The problems we are seeing might be explained by the wrong
    >> proc grabbing output at the wrong time -- it feels like a race
    >> condition. And acid trips we can take to hammer this one down?
    >>
    >> Anyone ever done a select on a socket in ape?
    >>



  5. Re: [9fans] a question on APE

    On Dec 17, 2007 11:54 PM, Kernel Panic wrote:

    > Ahh... just looked at the code...
    > Ok, as i expected... recv() calls a different read() from
    > /sys/src/libc/9sys/read.c. It will all work if
    > recv() would call the thing from this one:
    > /sys/src/ape/lib/ap/plan9/read.c.


    I'm not seeing that. I would be happy if you are right but I can't confirm it.

    I run acid on the binary:
    recv 0x000cd7b6 SUBL $0x10,SP
    recv+0x3 0x000cd7b9 MOVL flags+0xc(FP),AX
    recv+0x7 0x000cd7bd ANDL $0x1,AX
    recv+0xa 0x000cd7c0 CMPL AX,$0x0
    recv+0xd 0x000cd7c3 JEQ recv+0x22(SB)
    recv+0xf 0x000cd7c5 MOVL $0x29,errno(SB)
    recv+0x19 0x000cd7cf MOVL $0xffffffff,AX
    recv+0x1e 0x000cd7d4 ADDL $0x10,SP
    recv+0x21 0x000cd7d7 RET
    recv+0x22 0x000cd7d8 MOVL fd+0x0(FP),CX
    recv+0x26 0x000cd7dc MOVL CX,0x0(SP)
    recv+0x29 0x000cd7df MOVL a+0x4(FP),CX
    recv+0x2d 0x000cd7e3 MOVL CX,0x4(SP)
    recv+0x31 0x000cd7e7 MOVL n+0x8(FP),CX
    recv+0x35 0x000cd7eb MOVL CX,0x8(SP)
    recv+0x39 0x000cd7ef CALL read(SB)
    recv+0x3e 0x000cd7f4 ADDL $0x10,SP
    recv+0x41 0x000cd7f7 RET

    so it calls read.

    Read is this:
    read 0x000c3834 SUBL $0x28,SP
    read+0x3 0x000c3837 MOVL nbytes+0x8(FP),DI
    read+0x7 0x000c383b MOVL buf+0x4(FP),SI
    read+0xb 0x000c383f MOVL d+0x0(FP),BX
    read+0xf 0x000c3843 CMPL BX,$0x0
    read+0x12 0x000c3846 JLT read+0x19(SB)
    read+0x14 0x000c3848 CMPL BX,$0x60
    read+0x17 0x000c384b JLT read+0x2c(SB)
    read+0x19 0x000c384d MOVL $0x4,errno(SB)
    read+0x23 0x000c3857 MOVL $0xffffffff,AX
    read+0x28 0x000c385c ADDL $0x28,SP
    read+0x2b 0x000c385f RET
    read+0x2c 0x000c3860 LEAL 0x0(BX)(BX*4),CX
    read+0x2f 0x000c3863 SHLL $0x2,CX
    read+0x32 0x000c3866 LEAL _fdinfo(SB)(CX*1),AX
    read+0x39 0x000c386d MOVL 0x0(AX),AX
    read+0x3b 0x000c386f ANDL $0x2,AX
    read+0x3e 0x000c3872 CMPL AX,$0x0
    read+0x41 0x000c3875 JEQ read+0x19(SB)
    read+0x43 0x000c3877 CMPL DI,$0x0
    read+0x46 0x000c387a JHI read+0x4e(SB)
    read+0x48 0x000c387c XORL AX,AX
    read+0x4a 0x000c387e ADDL $0x28,SP
    read+0x4d 0x000c3881 RET
    read+0x4e 0x000c3882 CMPL SI,$0x0
    read+0x51 0x000c3885 JNE read+0x66(SB)
    read+0x53 0x000c3887 MOVL $0x9,errno(SB)
    read+0x5d 0x000c3891 MOVL $0xffffffff,AX
    read+0x62 0x000c3896 ADDL $0x28,SP
    read+0x65 0x000c3899 RET

    which is the ape version. There is only one read symbol in the binary,
    and it's a T.
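
    For anyone following along, here is roughly how the listed prefix of
    read reads back as C. This is a reconstruction from the disassembly
    only, not a copy of the APE source: the names Fdinfo, fdinfo, Nfds,
    Isopen, lasterr, markopen() and readchecks() are made up, and the
    meaning given to the 0x2 flag bit is a guess. The constants (0x60,
    the 20-byte stride, the 0x2 mask, error values 4 and 9) come straight
    from the listing.

```c
#include <assert.h>
#include <stddef.h>

enum { Nfds = 0x60, Isopen = 0x2 };	/* values from the listing; names invented */

typedef struct Fdinfo Fdinfo;
struct Fdinfo {
	unsigned int flags;	/* tested at offset 0 with mask 0x2 */
	unsigned int pad[4];	/* placeholder; the listing only shows the stride */
};

Fdinfo fdinfo[Nfds];
int lasterr;			/* stands in for errno */

/* Helper for experimenting: mark a descriptor as passing the flag test. */
void
markopen(int d)
{
	assert(d >= 0 && d < Nfds);
	fdinfo[d].flags |= Isopen;
}

/*
 * The checks visible in the listing, in order.
 * Returns -1 on error, 0 for a zero-length read,
 * and 1 where the listing continues at read+0x66.
 */
int
readchecks(int d, void *buf, size_t nbytes)
{
	if(d < 0 || d >= Nfds || !(fdinfo[d].flags & Isopen)){
		lasterr = 4;		/* read+0x19 */
		return -1;
	}
	if(nbytes == 0)			/* JHI is an unsigned compare */
		return 0;
	if(buf == NULL){
		lasterr = 9;		/* read+0x53 */
		return -1;
	}
	return 1;
}
```

    A table lookup on the fd before doing anything else is what the APE
    wrapper would be expected to do; a bare libc read would go more or
    less straight to pread.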

    So I am not convinced the recv is calling the wrong thing. That said,
    I'm still going to change it in source to see what happens :-)

    ron
