Extra bytes - Unix

  1. Extra bytes

    Hi all,

    I've been looking for an explanation to this issue, but I couldn't even
    get a clue about it.

    I have a Unix application (C++) that starts listening on a socket,
    then forks in order to accept incoming requests through this socket.
    When a new request arrives, any child process accept()s it and read()s/
    write()s data through a new socket. Basic sockets programming.

    Almost all the time, everything is fine. Sometimes, one of the
    processes accepts an incoming request, reads the data and... it seems
    to get some "extra bytes", some bytes that weren't put there by the
    client.

    While the client is sending something like this: "USER:abcdefgh\n"
    (ends with a newline), the accepting process reads
    "USER:abcdefghp]]\n". The extra bytes vary (not always "p]]"). The
    request is NEVER truncated, it's just "polluted" by these 2 or 3 extra
    characters.

    The client is a Java application (JRE 1.5.0_11) running on a Windows
    XP SP2 machine. The server is AIX 5.3, running a C++ application.

    I've run NetMon on the client side, trying to get some evidence that the
    client (the application, Java libraries or Windows infrastructure) was
    inserting those bytes, but the problem never arose while I was
    monitoring. I've checked the application's code and I can't understand
    how those bytes could appear.

    Any ideas?

    Thanks
    Elisabeth Silva


  2. Re: Extra bytes

    On Wed, 15 Oct 2008 14:33:37 -0700, eb_csilva wrote:

    > [...]


    Is the garbage always at the end of the line?

    This kind of error is often caused by falsely assuming a nul-terminated
    string when in fact only a (non-terminated) buffer has been read, e.g.

    char buff[SIZ];
    rc = read(fd, buff, sizeof buff);
    printf("%s", buff);

    Similar things can happen when you use the string functions (e.g. index()
    or str[r]chr()) on non-terminated input.
    But it could also be a miscalculation in buffer management. Do you resize
    the input buffers? Is there a pattern in the string length that triggers
    the bug?
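
    A minimal sketch of the usual fix for the read()/printf("%s") pattern
    shown above, assuming a POSIX read(): reserve one byte of the buffer and
    store the terminator yourself. The helper name read_terminated() is made
    up for illustration; it is not from the poster's code.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper (not from the thread): read at most cap - 1 bytes
 * and always nul-terminate, so buf can safely be handed to printf("%s")
 * or the str*() functions. Returns the byte count, or -1 on error. */
ssize_t read_terminated(int fd, char *buf, size_t cap)
{
    ssize_t n;

    assert(buf != NULL && cap > 0);
    n = read(fd, buf, cap - 1);   /* leave room for the terminator */
    if (n < 0) {
        buf[0] = '\0';
        return -1;
    }
    buf[n] = '\0';                /* read() never adds this itself */
    return n;
}
```

    If the buffer cannot spare a byte, printf("%.*s", (int)rc, buff) is
    another option: the precision limits how many bytes are printed, so no
    terminator is needed.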

    HTH,
    AvK


  3. Re: Extra bytes

    On Oct 16, 12:33 am, eb_csilva wrote:
    > Hi all,
    >
    > I've been looking for an explanation to this issue, but I couldn't even get
    > a clue about it.
    >

    ....

    Can you post the relevant code?

    Michael.

  4. Re: Extra bytes

    On 16 Oct, 05:14, mman wrote:
    > On Oct 16, 12:33 am, eb_csilva wrote:
    > > Hi all,
    > >
    > > I've been looking for an explanation to this issue, but I couldn't even get
    > > a clue about it.
    >
    > ...
    >
    > Can you post the relevant code?
    >
    > Michael.


    Hi Michael,

    Here it goes. And to AvK: good hints, but I don't believe it can be
    explained that way... please look at the code snippet and its
    comments.

    Thanks
    Beth

    ==========================

    char *linestart = null;
    char buffer[1024];
    char *ptr = &(buffer[0]);
    char *buffer_end = ptr;
    char* user_tag = "USER:";
    char *user = NULL;

    while ((readcount = read(sockdes, ptr, sizeof(buffer) - (ptr - buffer))) > 0) {

        if (linestart == NULL) {
            linestart = buffer_end;
        }
        buffer_end += readcount;

        /* searches for '\n' */

        while (ptr < buffer_end) {
            if (*ptr == '\n') {

                /* grabbed whole line */

                int sz = (ptr - linestart);

                /* searches for "USER:" tag within the captured stream of bytes */

                if (strncmp(linestart, user_tag, strlen(user_tag)) == 0 &&
                    (sz >= strlen(user_tag)) &&
                    user == NULL) {
                    int asz;

                    /* processes "user" field */

                    linestart += strlen(user_tag);

                    /* get rid of leading blanks */
                    while ((*linestart == ' ') && linestart <= ptr) ++linestart;

                    asz = (int)(ptr - linestart);
                    user = realloc(user, asz + 1);
                    if (asz > 0) {
                        strncpy(user, linestart, asz);
                    } else {
                        *user = '\0';
                    }
                }

                if (user != NULL) {

                    /* found tag "USER:", followed by some chars, followed by \n */

                    /***
                    AT THIS POINT, WHEN PRINTS THE *user BUFFER,
                    IT SHOWS "abcdefghp]]", WHILE THE CLIENT STATES
                    IT HAS SENT "abcdefgh\n".
                    NOTE THAT THIS SERVER APP ONLY TRIES TO GET
                    THE "USER:" TAG WHEN IT FINDS A \n, SO IT SEEMS THE
                    EXTRA BYTES ARE INSERTED INTO THE INPUT (NOT AT
                    INPUT END) ***/
                    (... processes the user field ...)

                }

            }
            ++ptr;
        }
    }

  5. Re: Extra bytes

    eb_csilva wrote:
    > On 16 Oct, 05:14, mman wrote:
    >> On Oct 16, 12:33 am, eb_csilva wrote:
    >>> Hi all,
    >>>
    >>> I've been looking for an explanation to this issue, but I couldn't even get
    >>> a clue about it.
    >>
    >> ...
    >>
    >> Can you post the relevant code?
    >>
    >> Michael.
    >
    > Hi Michael,
    >
    > Here it goes. And to AvK: good hints, but I don't believe it can be
    > explained that way... please look at the code snippet and its
    > comments.
    >
    > Thanks
    > Beth
    >
    > ==========================
    >
    > char *linestart = null;


    Did you mean NULL? (Follow-up: Is this your actual code, or
    are you asking us to debug a paraphrase?)

    > char buffer[1024];
    > char *ptr = &(buffer[0]);
    > char *buffer_end = ptr;
    > char* user_tag = "USER:";
    > char *user = NULL;
    >
    > [...]
    > user = realloc(user, asz + 1);
    > if (asz > 0) {
    > strncpy(user, linestart, asz);


    Note that the memory pointed to by `user' might not hold a
    valid C string at this point. The first `asz' characters are all
    right, but the value of the byte immediately following them is
    unpredictable. Unless `user[asz]' just happens to be '\0', by
    sheer coincidence, you don't have a string. "Moi's" diagnosis is
    looking better and better ...

    General observation: The C language comes with a library, and
    you could benefit by becoming familiar with more of that library
    than you seem to be aware of at the moment. The memchr() function,
    for example, should prove useful. If you stick a terminating '\0'
    in the input buffer (perhaps replacing the '\n'), strstr() would
    also be very handy.
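
    A rough sketch of that suggestion, assuming the received bytes sit in a
    writable buffer of known length: memchr() finds the '\n' without needing
    a terminator, and overwriting it with '\0' turns the line into a real
    string. The names terminate_line() and user_field() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: terminate the first line held in buf[0..len) by
 * overwriting its '\n' with '\0'. Returns a pointer just past the line,
 * or NULL if no complete line has arrived yet. */
char *terminate_line(char *buf, size_t len)
{
    char *nl;

    assert(buf != NULL);
    nl = memchr(buf, '\n', len);   /* bounded search, no terminator needed */
    if (nl == NULL)
        return NULL;
    *nl = '\0';                    /* buf now starts with a valid C string */
    return nl + 1;
}

/* Hypothetical helper: return the text following "USER:" in a terminated
 * line, or NULL if the line does not start with that tag. */
const char *user_field(const char *line)
{
    static const char tag[] = "USER:";

    if (strncmp(line, tag, sizeof tag - 1) != 0)
        return NULL;
    return line + (sizeof tag - 1);
}
```

    Once the line is terminated, strlen(), strstr() and friends stop at the
    end of the line instead of running into whatever follows it in memory.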

    --
    Eric.Sosman@sun.com

  6. Re: Extra bytes

    On Thu, 16 Oct 2008 12:10:47 -0400, Eric Sosman wrote:


    >> [...]
    >> user = realloc(user, asz + 1);
    >> if (asz > 0) {
    >>


    { strncpy(user, linestart, asz); user[asz] = 0; }
    or { memcpy(user, linestart, asz); user[asz] = 0; }

    should do the trick.

    >
    > Note that the memory pointed to by `user' might not hold a
    > valid C string at this point. The first `asz' characters are all right,
    > but the value of the byte immediately following them is unpredictable.
    > Unless `user[asz]' just happens to be '\0', by sheer coincidence, you
    > don't have a string. "Moi's" diagnosis is looking better and better ...
    >


    Thanks.

    > General observation: The C language comes with a library, and
    > you could benefit by becoming familiar with more of that library than
    > you seem to be aware of at the moment. The memchr() function, for
    > example, should prove useful. If you stick a terminating '\0' in the
    > input buffer (perhaps replacing the '\n'), strstr() would also be very
    > handy.


    or memstr() which is non-standard^H^H^H^H^H non-existent.

    Please remember: strncpy(dst,src,len) is a bitch: it is guaranteed not to
    write at dst+len and beyond, but in the corner-case (strlen >= len) it
    fails to write a nul character at dst[len-1].
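
    To make the corner case concrete, here is a small sketch (the function
    names are invented for this example). byte_after_strncpy() shows that
    the byte after the copied prefix is left untouched, and dup_bytes() is
    one possible way to copy a slice and terminate it explicitly.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Shows the corner case: with strlen(src) >= len, strncpy() writes len
 * bytes and no terminator. Returns the byte left at dst[len]. */
char byte_after_strncpy(void)
{
    char dst[8];

    memset(dst, '#', sizeof dst);    /* stand-in for heap garbage */
    strncpy(dst, "abcdefgh\n", 4);   /* copies "abcd", nothing more */
    return dst[4];                   /* still '#', not '\0' */
}

/* Hypothetical helper: duplicate the n bytes at src into a fresh,
 * nul-terminated string. Returns NULL if allocation fails; the caller
 * frees the result. */
char *dup_bytes(const char *src, size_t n)
{
    char *dst;

    assert(src != NULL);
    dst = malloc(n + 1);
    if (dst == NULL)
        return NULL;
    memcpy(dst, src, n);   /* exactly n bytes */
    dst[n] = '\0';         /* the step the posted code was missing */
    return dst;
}
```

    This would also explain why the bug shows up so rarely: freshly
    allocated memory often happens to contain a zero byte in the right
    place, so the missing terminator goes unnoticed most of the time.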

    HTH,
    AvK

  7. Re: Extra bytes

    On 16 Oct, 14:00, Moi wrote:
    > On Thu, 16 Oct 2008 12:10:47 -0400, Eric Sosman wrote:
    > >> [...]
    > >> user = realloc(user, asz + 1);
    > >> if (asz > 0) {
    >
    > { strncpy(user, linestart, asz); user[asz] = 0; }
    > or { memcpy(user, linestart, asz); user[asz] = 0; }
    >
    > should do the trick.
    >
    > > Note that the memory pointed to by `user' might not hold a
    > > valid C string at this point. The first `asz' characters are all right,
    > > but the value of the byte immediately following them is unpredictable.
    > > Unless `user[asz]' just happens to be '\0', by sheer coincidence, you
    > > don't have a string. "Moi's" diagnosis is looking better and better ...
    >
    > Thanks.
    >
    > > General observation: The C language comes with a library, and
    > > you could benefit by becoming familiar with more of that library than
    > > you seem to be aware of at the moment. The memchr() function, for
    > > example, should prove useful. If you stick a terminating '\0' in the
    > > input buffer (perhaps replacing the '\n'), strstr() would also be very
    > > handy.
    >
    > or memstr() which is non-standard^H^H^H^H^H non-existent.
    >
    > Please remember: strncpy(dst,src,len) is a bitch: it is guaranteed not to
    > write at dst+len and beyond, but in the corner-case (strlen >= len) it
    > fails to write a nul character at dst[len-1].
    >
    > HTH,
    > AvK


    OK AvK,

    You're right, the flaw in my reasoning was the assumption that
    strncpy() would append a '\0'. It's strange how difficult it is to
    reproduce the problem - now that I can understand it, it's amazing
    that it happens so rarely.

    And Eric,

    "In case that ptr is NULL, the [realloc] function behaves exactly as
    malloc, assigning a new block of size bytes and returning a pointer to
    the beginning of it." (from cplusplus.com)
    "The realloc function changes the size of an allocated memory block.
    The memblock argument points to the beginning of the memory block. If
    memblock is NULL, realloc behaves the same way as malloc and allocates
    a new block of size bytes. If memblock is not NULL, it should be a
    pointer returned by a previous call to calloc, malloc, or
    realloc." (from microsoft.com).

    Are you telling me that this behaviour can be platform-dependent or
    vendor-dependent?

    I don't understand your comment "debug a paraphrase". This is my
    production code, from which I stripped out some irrelevant (in my
    opinion) details.

    Thank you all,
    Beth



  8. Re: Extra bytes

    Moi writes:

    > On Thu, 16 Oct 2008 12:10:47 -0400, Eric Sosman wrote:
    >> General observation: The C language comes with a library, and
    >> you could benefit by becoming familiar with more of that library than
    >> you seem to be aware of at the moment. The memchr() function, for
    >> example, should prove useful. If you stick a terminating '\0' in the
    >> input buffer (perhaps replacing the '\n'), strstr() would also be very
    >> handy.

    >
    > or memstr() which is non-standard^H^H^H^H^H non-existent.


    Some systems do have memmem() however, which could be used together with
    strlen(). It's still non-standard, though.

    >
    > Please remember: strncpy(dst,src,len) is a bitch: it is guaranteed not to
    > write at dst+len and beyond, but in the corner-case (strlen >= len) it
    > fails to write a nul character at dst[len-1].


    strlcpy() is better for this reason. It's also non-standard, but if
    your system doesn't have it, you can write your own in three lines.
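
    A sketch of such a hand-written replacement, following the BSD
    strlcpy() contract as I understand it (copy at most size - 1 bytes,
    always terminate when size > 0, return strlen(src)). It is named
    my_strlcpy() here, a made-up name, to avoid clashing with a C library
    that already provides strlcpy().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for strlcpy(): copies src into dst, truncating
 * if necessary, and terminates dst whenever size > 0. Returns
 * strlen(src), so a result >= size means the copy was truncated. */
size_t my_strlcpy(char *dst, const char *src, size_t size)
{
    size_t len;

    assert(src != NULL);
    len = strlen(src);
    if (size > 0) {
        size_t n = (len < size - 1) ? len : size - 1;

        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return len;
}
```

    One caveat: like strlcpy() itself, this calls strlen() on the source,
    so the source must already be a nul-terminated string. For a slice of
    a raw receive buffer, as in the code posted earlier, memcpy() plus an
    explicit terminator is probably the safer choice.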

  9. Re: Extra bytes

    On Thu, 16 Oct 2008 11:04:46 -0700, eb_csilva wrote:


    >>
    >> Please remember: strncpy(dst,src,len) is a bitch: it is guaranteed not
    >> to write at dst+len and beyond, but in the corner-case (strlen >= len)
    >> it fails to write a nul character at dst[len-1].
    >>
    >> HTH,
    >> AvK

    >
    > OK AvK,
    >
    > You're right, the flaw in my reasoning was the assumption that strncpy()
    > would append a '\0'. It's strange how difficult it is to reproduce the
    > problem - now that I can understand it, it's amazing that it happens so
    > rarely.


    Well, **** happens...
    A few more observations and pieces of advice:
    1) You mix pointers and indexes, which leads to harder-to-read code.
    (I guess you snipped the code and removed some "fold or reuse input
    buffer when needed" code.)

    2) in these kinds of loops, it is often handy to replace the \n by a \0
    once you found it.
    After that, you can be sure the strxxx() functions work as expected.

    3) strspn() and strcspn() are IMHO the clearest way to deal with
    whitespace skipping. The libraries are your friends.
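
    A small sketch of point 3, assuming the line has already been
    nul-terminated as described in point 2. The helper name field_value()
    and the choice of delimiter characters are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: given the text that follows a tag, skip leading
 * blanks with strspn() and measure the value up to the next blank, CR
 * or LF with strcspn(). Returns the start of the value and stores its
 * length in *len. */
const char *field_value(const char *after_tag, size_t *len)
{
    const char *start;

    assert(after_tag != NULL && len != NULL);
    start = after_tag + strspn(after_tag, " \t");   /* skip blanks */
    *len = strcspn(start, " \t\r\n");               /* value length */
    return start;
}
```

    Unlike the hand-written `while (*linestart == ' ')` loop, both calls
    stop at the terminator on their own, so no separate bounds check is
    needed.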

    > And Eric,
    >
    > "In case that ptr is NULL, the [realloc] function behaves exactly as
    > malloc, assigning a new block of size bytes and returning a pointer to
    > the beginning of it." (from cplusplus.com) "The realloc function changes
    > the size of an allocated memory block. The memblock argument points to
    > the beginning of the memory block. If memblock is NULL, realloc behaves
    > the same way as malloc and allocates a new block of size bytes. If
    > memblock is not NULL, it should be a pointer returned by a previous call
    > to calloc, malloc, or realloc." (from microsoft.com).
    >
    > Are you telling me that this behaviour can be platform-dependent or
    > vendor-dependent?


    They should not be.
    (In this case: realloc(NULL,siz) should be equivalent to malloc(siz) )

    For functions that are part of the standard library, the vendor should
    *not* be considered a supplier of documentation. Ignore it. If it does
    *not* comply with the standards, inform them or walk away.

    Also: C++ != C. The details do matter. (in comp.lang.c you can get roasted
    for compiling C with a C++ compiler. Or mentioning Herb Schildt ;-)

    > Thank you all,
    > Beth


    You're welcome,
    AvK



  10. Re: Extra bytes

    In <0638f53b-701a-4f2a-80a6-ba6831335b02@k30g2000hse.googlegroups.com> eb_csilva writes:

    > I don't understand your comment "debug a paraphrase". This is my
    > production code, from which I stripped out some irrelevant (in my
    > opinion) details.


    He was commenting on this line of code:

    char *linestart = null;

    In standard C code, "null" is not defined. (NULL is, but "null" is not.)

    Either you omitted the definition of null (which naturally leads one to
    ask what else you omitted), or you mistyped NULL as null (which naturally
    leads one to ask what else you mistyped.)

    --
    John Gordon                  A is for Amy, who fell down the stairs
    gordon@panix.com             B is for Basil, assaulted by bears
                                 -- Edward Gorey, "The Gashlycrumb Tinies"


  11. Re: Extra bytes

    eb_csilva wrote:
    > [...]
    > And Eric,
    >
    > "In case that ptr is NULL, the [realloc] function behaves exactly as
    > malloc, assigning a new block of size bytes and returning a pointer to
    > the beginning of it." (from cplusplus.com)
    > "The realloc function changes the size of an allocated memory block.
    > The memblock argument points to the beginning of the memory block. If
    > memblock is NULL, realloc behaves the same way as malloc and allocates
    > a new block of size bytes. If memblock is not NULL, it should be a
    > pointer returned by a previous call to calloc, malloc, or
    > realloc." (from microsoft.com).
    >
    > Are you telling me that this behaviour can be platform-dependent or
    > vendor-dependent?


    No, there was nothing wrong with your use of realloc (except that
    you didn't compare the returned value to NULL to check for failure).
    I was trying to tell you that what you filled the allocated memory
    with wasn't a string, and that its non-stringness could explain the
    presence of the extra characters. I also pointed you at some library
    functions that could replace a good deal of your code.

    > I don't understand your comment "debug a paraphrase". This is my
    > production code, from which I stripped out some irrelevant (in my
    > opinion) details.


    The fact that a pointer was initialized to `null' and later
    compared to `NULL' led me to suspect otherwise. It is, of course,
    perfectly legal to define your own `null' macro or to create a
    pointer variable named `null' with the value `NULL', but it seems
    a strange thing to do. But if you say so ...

    Also, since the problem at hand is extra characters in the
    printed output, stripping out the code that printed the output
    may not have been the very best choice.

    --
    Eric.Sosman@sun.com

  12. Re: Extra bytes

    Eric Sosman wrote in
    news:1224190499.877583@news1nwk:

    > No, there was nothing wrong with your use of realloc
    > (except that
    > you didn't compare the returned value to NULL to check for
    > failure).


    Depending on the context, it might be better to store the value
    returned by realloc in a temporary variable until the returned value
    is checked against NULL, so that the original pointer is not lost
    (possibly causing a leak) if realloc does fail.

    E.g.,

    char* tmp = realloc( user, asz + 1 );
    if ( tmp )
    {
        user = tmp;
    }
    else
    {
        // handle realloc failure
    }

    MV

    --
    I do not want replies; please follow-up to the group.

  13. Re: Extra bytes

    Martin Vuille wrote:
    > Eric Sosman wrote in
    > news:1224190499.877583@news1nwk:
    >
    >> No, there was nothing wrong with your use of realloc
    >> (except that
    >> you didn't compare the returned value to NULL to check for
    >> failure).

    >
    > Depending on the context, it might be better to store the value
    > returned by realloc in a temporary variable until the returned value
    > is checked against NULL, so that the original pointer is not lost
    > (possibly causing a leak) if realloc does fail.
    >
    > E.g.,
    >
    > char* tmp = realloc( user, asz + 1 );
    > if ( tmp )
    > {
    >     user = tmp;
    > }
    > else
    > {
    >     // handle realloc failure
    > }


    Yes, this is how to use realloc() in most cases. In the O.P.'s
    code, `user' was known to be NULL and there was no "original pointer"
    to be lost. IMHO malloc() would have been a better choice, but ...

    --
    Eric.Sosman@sun.com


  14. Re: Extra bytes

    Eric Sosman wrote in
    news:1224195304.812954@news1nwk:

    >
    > Yes, this is how to use realloc() in most cases. In the
    > O.P.'s
    > code, `user' was known to be NULL and there was no "original
    > pointer" to be lost. IMHO malloc() would have been a better
    > choice, but ...
    >


    I may have scanned the code too quickly but I thought that the
    realloc was in a loop and that the buffer got expanded as more data
    was received over the socket.

    MV

    --
    I do not want replies; please follow-up to the group.

  15. Re: Extra bytes

    Martin Vuille wrote:
    > Eric Sosman wrote in
    > news:1224195304.812954@news1nwk:
    >
    >> Yes, this is how to use realloc() in most cases. In the
    >> O.P.'s
    >> code, `user' was known to be NULL and there was no "original
    >> pointer" to be lost. IMHO malloc() would have been a better
    >> choice, but ...
    >>

    >
    > I may have scanned the code too quickly but I thought that the
    > realloc was in a loop and that the buffer got expanded as more data
    > was received over the socket.


    It's inside a loop, yes, but it's in a block that looks
    like

    if (... && user == NULL) {
        ...
        user = realloc(user, asz + 1);
        ...
    }

    The structure of the code seems inside-out to me, but I
    recall writing contorted code myself when I was new to the
    idioms of string-bashing. That's another reason I suggested
    looking at some of the other library functions: by pulling
    some of the loops out of the visible code they might help the
    O.P. discover that things are simpler than she's making them.

    --
    Eric Sosman
    esosman@ieee-dot-org.invalid

  16. Re: Extra bytes

    Moi writes:

    [...]

    > 2) in these kinds of loops, it is often handy to replace the \n by a \0
    > once you found it.


    \0 is in no way different from 0 (or any other constant integer
    expression whose value is 0).

    [...]

    >> Are you telling me that this behaviour can be platform-dependent or
    >> vendor-dependent?

    >
    > They should not be.
    > (In this case: realloc(NULL,siz) should be equivalent to malloc(siz) )
    >
    > For functions that are part of the standard library, the vendor should
    > *not* be considered a supplier of documentation. Ignore it.


    'Standards' document situation-specific requirements, e.g. a conforming
    C implementation shall behave like ... when ... (or a conforming
    UNIX(*)-implementation shall ... when ...). Without consulting vendor
    documentation, it is not even possible to determine if conformance to
    X or Y is even claimed.


  17. Re: Extra bytes

    Eric Sosman wrote in
    news:Pfidnbaidql4mmXVnZ2dnUVZ_j2dnZ2d@comcast.com:

    > It's inside a loop, yes, but it's in a block that looks
    > like
    >
    > if (... && user == NULL) {
    >     ...
    >     user = realloc(user, asz + 1);
    >     ...
    > }


    Ah, thanks. I missed that.

    >
    > The structure of the code seems inside-out to me, but I
    > recall writing contorted code myself when I was new to the
    > idioms of string-bashing. That's another reason I suggested
    > looking at some of the other library functions: by pulling
    > some of the loops out of the visible code they might help the
    > O.P. discover that things are simpler than she's making them.
    >


    Yes, that was part of the reason I did not spend a lot of time on the
    code: it seemed that the best thing to do would be to throw the code
    away and start over again.

    MV

    --
    I do not want replies; please follow-up to the group.

  18. Re: Extra bytes

    Moi wrote:
    > [...]
    > For functions that are part of the standard library, the vendor should
    > *not* be considered a supplier of documentation. Ignore it. If it does
    > *not* comply with the standards, inform them or walk away.


    I must disagree here. The applicable standards (C, C++,
    POSIX, whatever) often leave some areas undefined or open to
    extension, and the vendor's documentation is the place to look
    for information on those extensions. For example,

    FILE *stream = fopen("foo.dat", "wb,recfm=f,reclen=76");

    is permitted by C and (I think) by POSIX, but you will not
    find the extra goodies described in ISO or POSIX documents.

    The choice of whether you want to rely on such extensions
    is a separate matter, of course, but if you want to use them
    you've got to learn how they work.

    --
    Eric Sosman
    esosman@ieee-dot-org.invalid

  19. Re: Extra bytes

    On Thu, 16 Oct 2008 11:34:26 -0700, Nate Eldredge wrote:
    >> Please remember: strncpy(dst,src,len) is a bitch: it is guaranteed
    >> not to write at dst+len and beyond, but in the corner-case (strlen >=
    >> len) it fails to write a nul character at dst[len-1].

    >
    > strlcpy() is better for this reason. It's also non-standard, but if
    > your system doesn't have it, you can write your own in three lines.


    It's also trivial to copy a BSD licensed strlcpy() implementation and
    conditionally compile it in a program when the system's C library does
    not have it. The source of the Varnish cache reverse proxy includes a
    very good way of doing that.

