UNICODE system calls? - Linux

Thread: UNICODE system calls?

  1. UNICODE system calls?

    Does Linux provide unicode (wide char) versions of system calls? Like open() for
    example?

    thx
    --
    _______________
    hello friend!
    Please don't reply to me about: top, bottom, or middle posting, grammar,
    spelling. Thank you!

  2. Re: UNICODE system calls?

    On Fri, 25 Jan 2008 13:58:21 -0700, takeout wrote:

    > Does Linux provide unicode (wide char) versions of system calls? Like
    > open() for example?


    I don't think so. Linux uses the UTF-8 representation for unicode
    filenames, and UTF-8 is compatible with the existing open() call (see
    "man utf-8" for info). There are C library functions to convert between
    wchar_t strings and UTF-8.

    In addition, on Unix systems there is really no difference between text
    and binary for read() and write(), so there would be no reason to have
    special calls for wide characters.

    Many higher-level libraries provide special handling for unicode, but
    the kernel doesn't really deal with that AFAIK.
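
As a rough illustration of the point above, the sketch below encodes a Unicode name to UTF-8 and hands the resulting bytes to the ordinary open() call through Python's `os.open` wrapper. The helper name `create_with_utf8_name` and the sample file name are made up for this example, and it assumes a Linux filesystem that accepts such names.

```python
import os
import tempfile


def create_with_utf8_name(directory: str, name: str) -> bytes:
    """Create an empty file whose name is `name` encoded as UTF-8.

    Returns the byte string that was passed to open().
    (Hypothetical helper, for illustration only.)
    """
    raw_name = name.encode("utf-8")  # ordinary bytes, no wide characters
    raw_path = os.path.join(os.fsencode(directory), raw_name)
    # os.open() is a thin wrapper over the regular open() system call.
    fd = os.open(raw_path, os.O_CREAT | os.O_WRONLY, 0o644)
    os.close(fd)
    return raw_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = create_with_utf8_name(tmp, "résumé.txt")
        print(path)
        # Listing with a bytes argument returns the names as raw bytes.
        print(os.listdir(os.fsencode(tmp)))
```

No separate wide-character variant of the call is involved here: the name reaches the kernel as a plain byte string.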


    --
    -| Bob Hauck
    -| http://www.haucks.org/

  3. Re: UNICODE system calls?

    ok, makes sense. thanks
    _______________
    hello friend!
    Please don't reply to me about: top, bottom, or middle posting, grammar,
    spelling. Thank you!

  4. Re: UNICODE system calls?

    On Fri, 25 Jan 2008 19:19:13 -0500 Bob Hauck wrote:

    | On Fri, 25 Jan 2008 13:58:21 -0700, takeout wrote:
    |
    |> Does Linux provide unicode (wide char) versions of system calls? Like
    |> open() for example?
    |
    | I don't think so. Linux uses the UTF-8 representation for unicode
    | filenames, and UTF-8 is compatible with the existing open() call (see
    | "man utf-8" for info). There are C library functions to convert between
    | wchar_t strings and UTF-8.
    |
    | In addition, on Unix systems there is really no difference between text
    | and binary for read() and write(), so there would be no reason to have
    | special calls for wide characters.
    |
    | Many higher-level libraries provide special handling for unicode, but
    | the kernel doesn't really deal with that AFAIK.

    An added point here. UTF-8 was specifically designed to work reasonably
    well in environments that give certain ASCII characters special meanings.
    Only byte values in the upper half (128 to 255) are used to encode Unicode
    code points of 128 and above. The lower half (0 to 127) is just ASCII
    as-is. So when you use UTF-8 for file NAMES, the '/' still works correctly
    to separate directory levels. There will be no single byte with the '/'
    value unless it really is the '/' character as opposed to any other
    Unicode character. Although some programs and display devices might not
    like UTF-8 encodings, it should be entirely transparent as file names,
    aside from the reduced length limit depending on the actual codes (e.g.
    the 255 limit for a single file name component applies to the UTF-8
    encoded bytes, which can hold fewer than 255 Unicode characters).
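
The two properties described above can be checked with a short sketch. The function names are invented for this example, and the 255 limit is taken from the post rather than queried from the filesystem.

```python
def slash_only_from_slash(text: str) -> bool:
    """True if every 0x2F byte in the UTF-8 form comes from a literal '/'."""
    return text.encode("utf-8").count(b"/") == text.count("/")


def fits_name_limit(name: str, limit: int = 255) -> bool:
    """Compare the encoded length in bytes, not the character count,
    against the limit (255 is assumed here, as in the post)."""
    return len(name.encode("utf-8")) <= limit


if __name__ == "__main__":
    # A non-ASCII character is encoded using only bytes >= 0x80.
    print("é".encode("utf-8"))
    # U+2215 DIVISION SLASH looks like '/', but contains no 0x2F byte.
    print("\u2215".encode("utf-8"))
    print(slash_only_from_slash("dir/\u2215café"))
    # 127 two-byte characters fit (254 bytes); 128 do not (256 bytes).
    print(fits_name_limit("é" * 127), fits_name_limit("é" * 128))
```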

    --
    |---------------------------------------/----------------------------------|
    | Phil Howard KA9WGN (ka9wgn.ham.org) / Do not send to the address below |
    | first name lower case at ipal.net / spamtrap-2008-01-25-2045@ipal.net |
    |------------------------------------/-------------------------------------|

  5. Re: UNICODE system calls?

    Bob Hauck writes:

    > On Fri, 25 Jan 2008 13:58:21 -0700, takeout wrote:
    >
    >> Does Linux provide unicode (wide char) versions of system calls? Like
    >> open() for example?

    >
    > I don't think so. Linux uses the UTF-8 representation for unicode
    > filenames,


    That's not entirely accurate. Linux, and indeed any POSIX system,
    does not interpret filenames according to any encoding. They are
    merely a sequence of bytes. The only bytes with a special meaning are
    the nul byte, which terminates the filename, and the ASCII '/'
    character, which is the directory delimiter.
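
To make this concrete, the sketch below creates a file whose name is not valid UTF-8 at all and checks that the directory listing returns the same bytes. The helper `roundtrip_raw_name` is hypothetical, and the result assumes a typical Linux filesystem such as ext4 or tmpfs; other filesystems or operating systems may reject such names.

```python
import os
import tempfile


def roundtrip_raw_name(name: bytes) -> bool:
    """Create a file named by arbitrary bytes in a temporary directory and
    report whether the directory listing returns exactly those bytes.
    (Hypothetical helper, for illustration only.)
    """
    with tempfile.TemporaryDirectory() as tmp:
        directory = os.fsencode(tmp)
        fd = os.open(os.path.join(directory, name),
                     os.O_CREAT | os.O_WRONLY, 0o644)
        os.close(fd)
        # A bytes argument makes os.listdir() return undecoded names.
        return name in os.listdir(directory)


if __name__ == "__main__":
    # b"caf\xe9.txt" is Latin-1 for "café.txt" and is not valid UTF-8.
    print(roundtrip_raw_name(b"caf\xe9.txt"))
    # The UTF-8 spelling of the same name is a different byte sequence,
    # so it would name a different file.
    print(roundtrip_raw_name("café.txt".encode("utf-8")))
```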

    --
    Måns Rullgård
    mans@mansr.com
