[9fans] Acme Edit/tcs and different character sets - Plan9

  1. [9fans] Acme Edit/tcs and different character sets

    Hey,

    Today I was trying to deal with some Japanese text data in acme and
    tried to Edit ,|tcs -f ms-kanji on the text. It ended up as gibberish.
    However, when I did a regular tcs -f ms-kanji on the file outside of
    acme it worked. Can anybody who understands Edit and the way that acme
    deals with non-Unicode text explain to me what is going wrong here? Is
    it fixable? If so what do I need to do to make it work?

    Best Regards,

    Noah

  2. Re: [9fans] Acme Edit/tcs and different character sets

    I have a suspicion that your file gets munged when acme reads it, as acme
    expects valid Unicode and ms-kanji is not valid Unicode. I think
    you have no option but to translate to/from ms-kanji outside acme.

    If this is a common problem for you then you could write a little file server
    which invokes tcs to translate to and from ms-kanji transparently and run it
    behind acme (so acme inherits its namespace).

    -Steve

  3. Re: [9fans] Acme Edit/tcs and different character sets

    Acme treats all text as UTF-8. If the input text was ms-kanji, it won't
    be UTF-8 and when acme reads it, it will end up full of encoding errors
    - represented in UTF-8. Running that UTF-8 text back through
    tcs -f ms-kanji will produce gibberish.

    You need to use tcs on the raw files before putting them into the editor
    (or almost any other Plan 9 tool).

    -rob

  4. Re: [9fans] Acme Edit/tcs and different character sets

    > Acme treats all text as UTF-8. If the input text was ms-kanji, it won't
    > be UTF-8 and when acme reads it, it will end up full of encoding errors
    > - represented in UTF-8. Running that UTF-8 text back through
    > tcs -f ms-kanji will produce gibberish.
    >
    > You need to use tcs on the raw files before putting them into the editor
    > (or almost any other Plan 9 tool).
    >
    > -rob


    one of the best decisions made in plan 9 is to have
    one character set. there are a few downsides, but
    plan 9 doesn't need locales, and the tools may be ignorant
    of other character sets.

    gnu grep is a good example of why locales are a bad idea.

    - erik


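Following the advice to run tcs on the raw files before opening them, here is a minimal Go sketch of a pre-flight check that flags files which are not valid UTF-8. The program and its needsConversion function are hypothetical, not an existing Plan 9 tool. It cannot tell which encoding a flagged file uses, so the -f argument to tcs still has to be chosen by hand. A file in another encoding that happens to be valid UTF-8 (pure ASCII, for instance) is not flagged.

```go
package main

import (
	"fmt"
	"os"
	"unicode/utf8"
)

// needsConversion reports whether data is not valid UTF-8 and should
// therefore be run through tcs before being opened in acme.
func needsConversion(data []byte) bool {
	return !utf8.Valid(data)
}

func main() {
	// Check every file named on the command line.
	for _, name := range os.Args[1:] {
		data, err := os.ReadFile(name)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			continue
		}
		if needsConversion(data) {
			fmt.Printf("%s: not valid UTF-8; convert it with tcs before editing\n", name)
		}
	}
}
```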