DICOM and UNICODE.. - DICOM



Thread: DICOM and UNICODE..

  1. DICOM and UNICODE..

    Hi,
    Does DICOM support Unicode only for the Chinese character set?
    Or can I write my Japanese data in UTF-8 encoding as well?


  2. Re: DICOM and UNICODE..

    DICOM has no rules about what language is to be encoded with any
    particular character set. This is a decision that must be agreed at
    each site, and will generally be driven by the character set that
    is used by the hospital information system.

    So you can encode Japanese characters using Unicode, but the receiver
    of your SOP Instances may not be able to process Unicode characters,
    and may therefore reject those SOP Instances.

    My recommendation for a robust implementation would be for it to
    receive data in any character set, convert it to Unicode for all its
    internal processing, and convert to a configured character set for
    output.


  3. Re: DICOM and UNICODE..

    Omkar Parkhi wrote:

    > Does DICOM support Unicode only for the Chinese character set?
    > Or can I write my Japanese data in UTF-8 encoding as well?


    Harry Solomon wrote:

    > DICOM has no rules about what language is to be encoded with any
    > particular character set. This is a decision that must be agreed at
    > each location, and will generally be driven by the character set that
    > is used by the hospital information system.


    Unfortunately, DICOM does not provide any means of
    negotiating this at association establishment either.

    > So you can encode Japanese characters using Unicode, but the receiver
    > of your SOP Instances may not be able to process Unicode characters,
    > and may therefore reject those SOP Instances.


    Or accept the SOP Instances and simply display garbage in
    the browser or viewer.

    > My recommendation for a robust implementation would be for it to
    > receive data in any character set, convert it to Unicode for all its
    > internal processing, and convert to a configured character set for
    > output.


    Which raises the question of what to configure it to.

    Since an increasing number of systems support the Unicode
    UTF-8 (ISO_IR 192) Specific Character Set, it would seem
    reasonable nowadays to always use that as the default
    unless it is necessary to configure something different
    (like ISO_IR 100 for older Latin-1 systems, or ISO 2022 IR 87
    for older Japanese systems that do not support ISO_IR 192).

    Note that converting to Unicode internally will usually undo
    the distinctions that cause Japanese to sometimes prefer IR 87
    over Unicode in the first place (Google "Han unification").

    A Google of "ISO IR 192 UTF-8 dicom conformance statement" is
    quite encouraging, but I would imagine there are many systems
    out there that still don't handle it.

    Another approach is to survey the encoded object (or query
    response) for the repertoire of characters actually used:
    if none are other than ASCII, use that; if a constrained
    subset is used that fits in a commonly available single-byte
    character set, use that; otherwise use UTF-8. I had a lot of
    fun playing around with this in the PixelMed toolkit (see
    SpecificCharacterSet.getSetOfUnicodeBlocksUsedBy(String) and
    the corresponding constructor), though I am not sure if this
    is really a good idea in practice.
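    The survey idea above can be sketched in a few lines of Python. The escalation order (ASCII, then a single-byte set, then UTF-8) follows the paragraph; restricting the single-byte step to Latin-1 / ISO_IR 100 is my simplification, since a fuller version would try each commonly available single-byte repertoire:

    ```python
    def choose_charset(text: str) -> str:
        """Pick the narrowest widely supported Specific Character Set
        defined term that can represent every character actually used.
        A sketch only: real code would consider more single-byte sets."""
        try:
            text.encode("ascii")
            return ""            # default repertoire suffices
        except UnicodeEncodeError:
            pass
        try:
            text.encode("latin-1")
            return "ISO_IR 100"  # fits a common single-byte set
        except UnicodeEncodeError:
            return "ISO_IR 192"  # fall back to UTF-8
    ```

    As the post notes, whether this per-object variability is a good idea in practice is debatable, since receivers then see a mix of character sets from the same sender.
    
    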

    David
