[9fans] Acme Edit/tcs and different character sets - Plan9
-
[9fans] Acme Edit/tcs and different character sets
Hey,
Today I was trying to deal with some Japanese text data in acme and
tried to Edit ,|tcs -f ms-kanji on the text. It ended up as gibberish.
However when I did a regular tcs -f ms-kanji on the file outside of
acme it worked. Can anybody who understands Edit and the way that acme
deals with non-Unicode text explain to me what is going wrong here? Is
it fixable? If so what do I need to do to make it work?
Best Regards,
Noah
-
Re: [9fans] Acme Edit/tcs and different character sets
I suspect that your file gets munged when acme reads it, because acme
expects valid Unicode and ms-kanji is not valid Unicode. I think
you have no option but to translate to/from ms-kanji outside acme.
If this is a common problem for you then you could write a little file server
which invokes tcs to translate to and from ms-kanji transparently and run it
behind acme (so acme inherits its namespace).
-Steve
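
The translation such a server would do on each read and write can be sketched roughly as below. This is not a file server and does not call tcs: the function names are hypothetical, and Python's "cp932" codec is assumed as a stand-in for ms-kanji.

```python
import os
import tempfile

LEGACY = "cp932"  # assumption: stand-in for tcs's ms-kanji

def read_translated(path):
    """Hypothetical read hook: legacy bytes on disk become UTF-8 for the editor."""
    with open(path, "rb") as f:
        return f.read().decode(LEGACY).encode("utf-8")

def write_translated(path, utf8_bytes):
    """Hypothetical write hook: UTF-8 from the editor becomes legacy bytes on disk."""
    with open(path, "wb") as f:
        f.write(utf8_bytes.decode("utf-8").encode(LEGACY))

# Demo: a legacy-encoded file is "opened", edited as UTF-8, and "saved".
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "notes.txt")
with open(path, "wb") as f:
    f.write("日本語".encode(LEGACY))

editor_view = read_translated(path)             # valid UTF-8 bytes
edited = editor_view + "のテキスト".encode("utf-8")
write_translated(path, edited)

with open(path, "rb") as f:
    on_disk = f.read()                          # still legacy-encoded
```

The editor only ever sees UTF-8, and the file on disk stays in its original encoding.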
-
Re: [9fans] Acme Edit/tcs and different character sets
Acme treats all text as UTF-8. If the input text was ms-kanji, it won't
be UTF-8 and when acme reads it, it will end up full of encoding errors
- represented in UTF-8. Running that UTF-8 text back through
tcs -f ms-kanji will produce gibberish.
You need to use tcs on the raw files before putting them into the editor
(or almost any other Plan 9 tool).
-rob
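
A small illustration of the failure described above, outside acme. It is only a sketch: Python's "cp932" codec is assumed as a stand-in for ms-kanji, and replacing invalid bytes with U+FFFD is assumed to approximate what a UTF-8-only editor does on read.

```python
# Illustration only: "cp932" stands in for ms-kanji, and errors="replace"
# stands in for how a UTF-8-only editor might mark invalid input bytes.
original = "日本語のテキスト"

# The bytes as they sit on disk in the legacy encoding.
raw = original.encode("cp932")

# What the editor holds after reading: invalid sequences become U+FFFD.
seen_by_editor = raw.decode("utf-8", errors="replace")

# The buffer is now valid UTF-8, and that is what a pipe hands to the converter.
piped = seen_by_editor.encode("utf-8")

# Interpreting those UTF-8 bytes as the legacy encoding cannot recover the text.
garbled = piped.decode("cp932", errors="replace")

# Converting the raw bytes first, before the editor touches them, works.
converted = raw.decode("cp932")

print(garbled == original)    # False
print(converted == original)  # True
```

The original byte values are lost at the moment of reading, so no later conversion inside the editor can restore them.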
-
Re: [9fans] Acme Edit/tcs and different character sets
> Acme treats all text as UTF-8. If the input text was ms-kanji, it won't
> be UTF-8 and when acme reads it, it will end up full of encoding errors
> - represented in UTF-8. Running that UTF-8 text back through
> tcs -f ms-kanji will produce gibberish.
>
> You need to use tcs on the raw files before putting them into the editor
> (or almost any other Plan 9 tool).
>
> -rob
one of the best decisions made in plan 9 is to have
one character set. there are a few downsides, but
plan 9 doesn't need locales and the tools may be ignorant
of other character sets.
gnu grep is a good example of why locales are a bad idea.
- erik