DDP CVS commit by jseidel: ddp/manuals.sgml/release-notes/de release-note ... - Debian


  1. DDP CVS commit by jseidel: ddp/manuals.sgml/release-notes/de release-note ...

    CVSROOT: /cvs/debian-doc
    Module name: ddp
    Changes by: jseidel 05/01/09 09:30:55

    Modified files:
    manuals.sgml/release-notes/de: release-notes.de.sgml

    Log message:
    use latin1 encoding instead of HTML entities to simplify proofreading and to increase compatibility with various tools


    --
    To UNSUBSCRIBE, email to debian-doc-REQUEST@lists.debian.org
    with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org

  2. Re: DDP CVS commit by jseidel: ddp/manuals.sgml/release-notes/de release-note ...

    On Sun, Jan 09, 2005 at 09:30:55AM -0700, DDP CVS wrote:
    > CVSROOT: /cvs/debian-doc
    > Module name: ddp
    > Changes by: jseidel 05/01/09 09:30:55
    >
    > Modified files:
    > manuals.sgml/release-notes/de: release-notes.de.sgml
    >
    > Log message:
    > use latin1 encoding instead of HTML entities to simplify proofreading
    > and to increase compatibility with various tools


    And which tools are those?
    I think you broke compatibility with XML & SGML tools. [1]


    Cheers
    Geert Stappers

    [1] both are ASCII only.




  3. Re: DDP CVS commit by jseidel: ddp/manuals.sgml/release-notes/de release-note ...

    On Sun, Jan 09, 2005 at 05:57:22PM +0100, Geert Stappers wrote:
    > On Sun, Jan 09, 2005 at 09:30:55AM -0700, DDP CVS wrote:
    > > CVSROOT: /cvs/debian-doc
    > > Module name: ddp
    > > Changes by: jseidel 05/01/09 09:30:55
    > >
    > > Modified files:
    > > manuals.sgml/release-notes/de: release-notes.de.sgml
    > >
    > > Log message:
    > > use latin1 encoding instead of HTML entities to simplify proofreading
    > > and to increase compatibility with various tools

    >
    > And which tools are those?
    > I think you broke compatibility with XML & SGML tools. [1]
    > [1] both are ASCII only.


    OK, you're partially right.

    The old code, using &uuml; and &szlig; in every third word, makes the SGML
    source code nearly unreadable. Please note that the previous version
    already contained a mixture of latin1 and ASCII.

    I very often run grep -ri "errror" on the German texts I maintain and
    proofread, to find and fix errors. I also wrote a few small scripts that
    check for transposed characters, ... which is much faster than spell
    checkers.

    IIRC aspell or ispell has trouble with HTML entities (I'm not sure about
    this). I know both support an HTML option but ...
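One way around the entity problem for grep-style checks is to decode the entities before searching. The following is a minimal sketch, not what Jens's scripts did: the helper names `decode_entities` and `find_pattern` are hypothetical, and it assumes the SGML entity names used in the release notes match the HTML names that Python's `html.unescape` knows.

```python
import html
import re


def decode_entities(sgml_text: str) -> str:
    """Replace character entities such as &uuml; with the characters they name."""
    # Assumption: the entity names match HTML's named character references.
    return html.unescape(sgml_text)


def find_pattern(sgml_text: str, pattern: str) -> list:
    """Return the decoded lines that match pattern, ignoring case."""
    regex = re.compile(pattern, re.IGNORECASE)
    decoded = decode_entities(sgml_text)
    return [line for line in decoded.splitlines() if regex.search(line)]


sample = "Die &Uuml;bersetzung ist gro&szlig;.\nKeine &Auml;nderung."
print(decode_entities(sample))
print(find_pattern(sample, "übersetzung"))
```

Note that `html.unescape` also decodes markup-significant entities such as `&lt;` and `&amp;`, so the decoded text is suitable for proofreading but should not be written back as SGML source.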

    I also found a missing umlaut in the output (last PDF page) because
    name="&Uuml;bersetzungen"
    was used, whereas name="Übersetzungen" works great.

    PS: Please note that the website uses latin1 encoding as well. Nobody
    complained ...

    Jens



  4. Re: DDP CVS commit by jseidel: ddp/manuals.sgml/release-notes/de release-note ...

    On Sun, Jan 09, 2005 at 06:24:17PM +0100, Jens Seidel wrote:
    > On Sun, Jan 09, 2005 at 05:57:22PM +0100, Geert Stappers wrote:
    > > On Sun, Jan 09, 2005 at 09:30:55AM -0700, DDP CVS wrote:


    > > > Modified files:
    > > > manuals.sgml/release-notes/de: release-notes.de.sgml
    > > >
    > > > Log message:
    > > > use latin1 encoding instead of HTML entities to simplify proofreading
    > > > and to increase compatibility with various tools

    > >
    > > And which tools are those?
    > > I think you broke compatibility with XML & SGML tools. [1]
    > > [1] both are ASCII only.

    >
    > OK, you're partially right.
    >
    > The old code, using &uuml; and &szlig; in every third word, makes the SGML
    > source code nearly unreadable. Please note that the previous version
    > already contained a mixture of latin1 and ASCII.
    >
    > I very often run grep -ri "errror" on the German texts I maintain and
    > proofread, to find and fix errors. I also wrote a few small scripts that
    > check for transposed characters, ... which is much faster than spell
    > checkers.
    >
    > IIRC aspell or ispell has trouble with HTML entities (I'm not sure about
    > this). I know both support an HTML option but ...
    >
    > I also found a missing umlaut in the output (last PDF page) because
    > name="&Uuml;bersetzungen"
    > was used, whereas name="Übersetzungen" works great.
    >
    > PS: Please note that the website uses latin1 encoding as well. Nobody
    > complained ...


    My concern is that the source is not pure 7-bit ASCII.
    It should be ASCII only for XML and SGML.

    Jens, that you spend time on the release notes is good.
    I do respect that.

    But your arguments to break stuff are poor.

    * The website has latin1
    That is because it is converted to latin1
    With latin1 "precompiled" codes, you can't convert to other encodings

    * aspell can't handle HTML entities.
    Then use another tool or aspell on another file format

    * the source had already latin1 codes.
    That set you on the wrong track,
    but is no excuse to go further downhill

    * it is hard to proofread größe dateien
    Consider the SGML source as computer program source
    and proofreading as running the program.
    Each "bug" you find has to be fixed in the source.
    The edit-compile-test cycle can indeed be boring, but
    you shouldn't cheat by putting "compiled blobs"
    in the source code.


    As one volunteer to another volunteer:

    Please revert the latin1 changes



    Cheers
    Geert Stappers



  5. Re: DDP CVS commit by jseidel: ddp/manuals.sgml/release-notes/de release-note ...

    On Sun, Jan 09, 2005 at 08:04:58PM +0100, Geert Stappers wrote:
    > On Sun, Jan 09, 2005 at 06:24:17PM +0100, Jens Seidel wrote:
    > > On Sun, Jan 09, 2005 at 05:57:22PM +0100, Geert Stappers wrote:
    > > > On Sun, Jan 09, 2005 at 09:30:55AM -0700, DDP CVS wrote:

    >
    > > > > Modified files:
    > > > > manuals.sgml/release-notes/de: release-notes.de.sgml
    > > > >
    > > > > Log message:
    > > > > use latin1 encoding instead of HTML entities to simplify proofreading
    > > > > and to increase compatibility with various tools
    > > >
    > > > I think you broke compatibility with XML & SGML tools. [1]
    > > > [1] both are ASCII only.

    > >
    > > The old code, using &uuml; and &szlig; in every third word, makes the SGML
    > > source code nearly unreadable. Please note that the previous version
    > > already contained a mixture of latin1 and ASCII.

    [snip]
    > > I also found a missing umlaut in the output (last PDF page) because
    > > name="&Uuml;bersetzungen"
    > > was used, whereas name="Übersetzungen" works great.

    >
    > My concern is that the source is not pure 7-bit ASCII.


    Please note that all versions of Release Notes (except English one)
    use 8bit characters. Do you really expect that a Japanese translator
    writes &entity1;&entity2;&entity3;...?

    > It should be ASCII only for XML and SGML.


    But debiandoc-sgml supports all common locales, especially latin1,
    latin2,...

    > Jens, that you spend time on the release notes is good.
    > I do respect that.
    >
    > But your arguments to break stuff are poor.


    Maybe. I agree that pure ASCII has advantages but the current file can
    easily be converted to another locale using iconv, recode, konwert, ...
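The conversion mentioned here is lossless in the latin1-to-UTF-8 direction, because every latin1 byte maps to exactly one Unicode code point. On the command line this would be something like `iconv -f ISO-8859-1 -t UTF-8`. The Python sketch below shows the same step; the function name `latin1_to_utf8` is only illustrative.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO-8859-1 (latin1) bytes as UTF-8."""
    # Decoding latin1 cannot fail: each of the 256 byte values is a valid character.
    return data.decode("iso-8859-1").encode("utf-8")


latin1_bytes = "Grüße".encode("iso-8859-1")
utf8_bytes = latin1_to_utf8(latin1_bytes)
print(latin1_bytes, utf8_bytes)
```

The reverse direction only works for text whose characters all exist in latin1, which is the case for German but not for, say, Japanese.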

    > * it is hard to proofread größe dateien


    It is!

    > Consider the SGML source as computer program source
    > and proofreading as running the program.
    > Each "bug" you find has to be fixed in the source.
    > The edit-compile-test cycle can indeed be boring, but
    > you shouldn't cheat by putting "compiled blobs"
    > in the source code.


    I would really like to know other people's opinions.

    > As one volunteer to another volunteer:
    >
    > Please revert the latin1 changes


    First I will wait for more feedback. If the majority agrees with you I
    will revert my changes (but what about name="&Uuml;bersetzungen" which
    doesn't work for url tags?).

    Jens



  6. Re: DDP CVS commit by jseidel: ddp/manuals.sgml/release-notes/de release-note ...

    On Sunday 09 January 2005 21:38, Jens Seidel wrote:
    > First I will wait for more feedback. If the majority agrees with you I
    > will revert my changes (but what about name="&Uuml;bersetzungen" which
    > doesn't work for url tags?).


    The Dutch version is in ISO-8859-1. I tried converting the sgml file to
    UTF-8 the first time I did the translation, but somehow the build system
    didn't like that.

    IMHO the people who do the building of the Release Notes should have the
    final say in this. If they want things in a different encoding, then fine
    by me, but as the current system works and I am not aware of any
    practical disadvantages, I for one see no reason to change it.
    I could see arguments to change all translations to UTF-8 and would have
    no problems with that.

    BTW. AFAIK the default encoding for XML files is UTF-8. Other encodings
    are allowed if an encoding header is added, like:

    At least, that's how it works for the Installation Manual.
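The example declaration did not survive in this archived copy of the message. In standard XML such a declaration has the form `<?xml version="1.0" encoding="ISO-8859-1"?>`. As a rough illustration of why it matters, the following sketch (sample documents and helper names are made up) parses the same latin1 bytes with and without the declaration using Python's standard library parser.

```python
import xml.etree.ElementTree as ET

# The same latin1-encoded content, once with and once without a declaration.
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?><p>Gr\xfc\xdfe</p>'
undeclared = b'<p>Gr\xfc\xdfe</p>'


def parse_text(document: bytes) -> str:
    """Parse an XML document given as bytes and return the root element's text."""
    return ET.fromstring(document).text


def is_well_formed(document: bytes) -> bool:
    """Report whether the parser accepts the document."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


print(parse_text(declared))
print(is_well_formed(undeclared))
```

Without the declaration the parser falls back to UTF-8, and the latin1 bytes for ü and ß are not valid UTF-8, so the document is rejected.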

    Cheers,
    Frans Pop

    P.S. No way am I going to use &egrave; if è is acceptable as well...



  7. Re: DDP CVS commit by jseidel: ddp/manuals.sgml/release-notes/de release-note ...

    On Sun, Jan 09, 2005 at 05:57:22PM +0100, Geert Stappers wrote:
    > On Sun, Jan 09, 2005 at 09:30:55AM -0700, DDP CVS wrote:
    > And which tools are those?
    > I think you broke compatibility with XML & SGML tools. [1]
    >
    >
    > Cheers
    > Geert Stappers
    >
    > [1] both are ASCII only.


    Really? This should be UTF-8, shouldn't it? You don't really
    propose to write languages like Japanese or Greek in ASCII + Entities?
    Or have I completely misunderstood you?

    Gruesse,
    --
    Frank Lichtenheld
    www: http://www.djpig.de/



  8. Re: DDP CVS commit by jseidel: ddp/manuals.sgml/release-notes/de release-note ...

    On Mon, Jan 10, 2005 at 12:24:50AM +0100, Frank Lichtenheld wrote:
    > On Sun, Jan 09, 2005 at 05:57:22PM +0100, Geert Stappers wrote:
    > > On Sun, Jan 09, 2005 at 09:30:55AM -0700, DDP CVS wrote:
    > > And which tools are those?
    > > I think you broke compatibility with XML & SGML tools. [1]
    > >
    > >
    > > Cheers
    > > Geert Stappers
    > >
    > > [1] both are ASCII only.

    >
    > Really? This should be UTF-8, shouldn't it?


    The XML definition at http://www.w3.org/TR/REC-xml/#charsets
    refers to ISO 10646, and it states:
    "All XML processors MUST accept the UTF-8 and UTF-16 encodings of
    Unicode 3.1"


    > You don't really propose to write languages like Japanese or
    > Greek in ASCII + Entities?

    Nope.

    > Or have I completely misunderstood you?


    I'm worried about the transformation into latin1.
    I shall elaborate on my e-mail in another message in this thread.

    >
    > Gruesse,


    Grüße Geert



  9. Re: DDP CVS commit by jseidel: ddp/manuals.sgml/release-notes/de release-note ...

    On Sun, Jan 09, 2005 at 09:38:12PM +0100, Jens Seidel wrote:
    > On Sun, Jan 09, 2005 at 08:04:58PM +0100, Geert Stappers wrote:
    > > On Sun, Jan 09, 2005 at 06:24:17PM +0100, Jens Seidel wrote:
    > > > On Sun, Jan 09, 2005 at 05:57:22PM +0100, Geert Stappers wrote:
    > > > > On Sun, Jan 09, 2005 at 09:30:55AM -0700, DDP CVS wrote:

    > >
    > > > > > Modified files:
    > > > > > manuals.sgml/release-notes/de: release-notes.de.sgml
    > > > > >
    > > > > > Log message:
    > > > > > use latin1 encoding instead of HTML entities to simplify proofreading
    > > > > > and to increase compatibility with various tools
    > > > >
    > > > > I think you broke compatibility with XML & SGML tools. [1]
    > > > > [1] both are ASCII only.


    I should have reacted with:

    Why latin1 and how does it fit in XML & SGML tools?
    I think you broke compatibility with those tools.

    I was wrong about the ASCII-only argument.

    > > > The old code, using &uuml; and &szlig; in every third word, makes the SGML
    > > > source code nearly unreadable. Please note that the previous version
    > > > already contained a mixture of latin1 and ASCII.

    > [snip]
    > > > I also found a missing umlaut in the output (last PDF page) because
    > > > name="&Uuml;bersetzungen"
    > > > was used, whereas name="Übersetzungen" works great.

    > >
    > > My concern is that the source is not pure 7-bit ASCII.

    >
    > Please note that all versions of Release Notes (except English one)
    > use 8bit characters. Do you really expect that a Japanese translator
    > writes &entity1;&entity2;&entity3;...?
    >
    > > It should be ASCII only for XML and SGML.

    >
    > But debiandoc-sgml supports all common locales, especially latin1,
    > latin2,...
    >
    > > Jens, that you spend time on the release notes is good.
    > > I do respect that.
    > >
    > > But your arguments to break stuff are poor.

    >
    > Maybe. I agree that pure ASCII has advantages but the current file can
    > easily be converted to another locale using iconv, recode, konwert, ...
    >
    > > * it is hard to proofread größe dateien

    >
    > It is!
    >
    > > Consider the SGML source as computer program source
    > > and proofreading as running the program.
    > > Each "bug" you find has to be fixed in the source.
    > > The edit-compile-test cycle can indeed be boring, but
    > > you shouldn't cheat by putting "compiled blobs"
    > > in the source code.

    >
    > I would really like to know other people's opinions.
    >
    > > As one volunteer to another volunteer:
    > >
    > > Please revert the latin1 changes

    >
    > First I will wait for more feedback. If the majority agrees with you I
    > will revert my changes (but what about name="&Uuml;bersetzungen" which
    > doesn't work for url tags?).


    Here is feedback from me, based on my previous posting.

    | My concern is that the source is not pure 7-bit ASCII.
    | It should be ASCII only for XML and SGML.

    I was wrong, or at least incomplete: ISO 10646 is allowed.

    | Jens, that you spend time on the release notes is good.
    | I do respect that.
    I stay with that :-)

    | But your arguments to break stuff are poor.
    I shall calm down.

    | * The website has latin1
    | That is because it is converted to latin1
    | With latin1 "precompiled" codes, you can't convert to other encodings
    See below

    | * aspell can't handle HTML entities.
    | Then use another tool or aspell on another file format
    |
    | * the source had already latin1 codes.
    | That set you on the wrong track,
    | but is no excuse to go further downhill
    See below

    | * it is hard to proofread größe dateien
    | Consider the SGML source as computer program source
    | and proofreading as running the program.
    | Each "bug" you find has to be fixed in the source.
    | The edit-compile-test cycle can indeed be boring, but
    | you shouldn't cheat by putting "compiled blobs"
    | in the source code.
    The blobs are allowed, when they are UTF-8 or UTF-16 encoded.

    | As one volunteer to another volunteer:
    |
    | Please revert the latin1 changes

    I couldn't find where latin1 and UTF-8 differ for our usage,
    release-notes.de.sgml. It could be that we are safe.


    I did find that

    latin1 ~~ rfc1345 ~~ iso 8859
    UTF-8 ~~ rfc3629 ~~ iso 10646


    But I did not find that ë is the same in both.
    Since checking now costs time, let us _assume_ they are equal.[1]


    Cheers
    Geert Stappers

    [1] please prove me wrong,
    but a confirmation is also welcome.
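The open question about ë can be checked directly. The character has the same code point in latin1 and in Unicode (U+00EB), but its byte encoding differs: one byte in latin1, two bytes in UTF-8. Only the ASCII range is byte-identical in both. A small sketch (the helper name `same_bytes_in_both` is made up for illustration):

```python
char = "ë"
code_point = ord(char)                    # the same number in latin1 and Unicode
latin1_bytes = char.encode("iso-8859-1")  # a single byte
utf8_bytes = char.encode("utf-8")         # a two-byte sequence

print(f"U+{code_point:04X}", latin1_bytes, utf8_bytes)


def same_bytes_in_both(text: str) -> bool:
    """True if the text has identical byte sequences in latin1 and UTF-8."""
    return text.encode("iso-8859-1") == text.encode("utf-8")


print(same_bytes_in_both("release notes"), same_bytes_in_both(char))
```

So a latin1 file containing umlauts is not valid UTF-8 as it stands and has to be converted or declared explicitly.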



  10. Re: DDP CVS commit by jseidel: ddp/manuals.sgml/release-notes/de release-note ...

    Hi,
    On Sun, Jan 09, 2005 at 06:24:17PM +0100, Jens Seidel wrote:
    > I also found a missing umlaut in the output (last PDF page) because
    > name="&Uuml;bersetzungen"
    > was used, whereas name="Übersetzungen" works great.


    Yep. These &...; constructs work well in most parts but not in all
    parts. You may call it a bug, but I do not see it being fixed anytime soon.

    As long as encodings are consistent, I think it is fine to use any working
    one: EUC, latin1, latin2, ...

    Oh, we use the &...; construct to put Japanese on our web site.

    See my name at
    http://www.debian.org/doc/user-manuals#quick-reference
    with a Japanese-capable browser, even though the charset is
    charset=iso-8859-1.
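The trick described here presumably relies on numeric character references, which let a pure-ASCII (and therefore latin1-compatible) page carry any Unicode character. Below is a minimal sketch of how such references can be produced and read back. The sample string is an arbitrary illustration, not the name on the Debian page, and `to_char_refs` is a hypothetical helper.

```python
import html


def to_char_refs(text: str) -> str:
    """Replace every non-ASCII character with a decimal reference like &#NNNN;."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


sample = "日本語"  # arbitrary sample text, chosen only for illustration
refs = to_char_refs(sample)
print(refs)
print(html.unescape(refs))
```

The resulting string contains only ASCII characters, so it is unaffected by whether the page is served as iso-8859-1 or UTF-8.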

    Osamu
