Cyrillic and UTF-8 - DICOM


  1. Cyrillic and UTF-8

    Hi all,
    The DICOM standard, Part 3, mentions UTF-8 (ISO_IR 192) only in the
    context of multi-byte character sets without code extensions. For
    Cyrillic it specifies ISO_IR 144. However, I have heard that some
    vendors are considering UTF-8 for Cyrillic.
    Is this simply a deviation from the existing standard, or does it
    anticipate upcoming changes to it?
    I would appreciate any information on this matter.
    Thanks
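For readers less familiar with the two options: the choice is recorded in the Specific Character Set attribute (0008,0005), and each Defined Term corresponds to an ordinary text encoding (ISO_IR 144 is ISO 8859-5, Latin/Cyrillic; ISO_IR 192 is UTF-8). The Python sketch below assumes the standard-library codecs `iso8859_5` and `utf_8` are adequate stand-ins for those repertoires. The mapping table and `encode_value` helper are hypothetical illustrations, not a complete list of Defined Terms or any toolkit's API.

```python
# Hypothetical mapping from Specific Character Set (0008,0005) Defined Terms
# to Python codec names; only the terms discussed in this thread are listed.
DICOM_TO_CODEC = {
    "": "ascii",                # attribute absent or empty: default repertoire
    "ISO_IR 144": "iso8859_5",  # Latin/Cyrillic, ISO 8859-5 (single byte)
    "ISO_IR 192": "utf_8",      # Unicode in UTF-8 (variable length)
}

def encode_value(text: str, specific_character_set: str) -> bytes:
    """Encode a text value with the codec implied by the Defined Term."""
    return text.encode(DICOM_TO_CODEC[specific_character_set])

name = "Иванов^Иван"  # illustrative Person Name (family^given)
single_byte = encode_value(name, "ISO_IR 144")
utf8 = encode_value(name, "ISO_IR 192")
print(len(single_byte), len(utf8))  # 11 21
```

The same name costs one byte per Cyrillic letter in ISO_IR 144 and two in UTF-8, and the two byte strings are not interchangeable: a receiver must know which term was used.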

  2. Re: Cyrillic and UTF-8

    Hi Igor

    UTF-8 can represent essentially all of the older, regionally specific
    character sets, not just the Asian multi-byte ones, with a single
    character set.

    This means, for example, that there are two ways to encode Cyrillic
    (ISO_IR 144 and ISO_IR 192), and recipients are encouraged to support
    both.
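On the receiving side, supporting both amounts to choosing the decoder from the Specific Character Set value found in the dataset. A minimal sketch, assuming a single-valued attribute (no code extensions) and hypothetical helper names; a real toolkit handles many more Defined Terms as well as the ISO 2022 escape-sequence cases.

```python
# Hypothetical receiver-side decoder: pick the codec from the value of
# Specific Character Set (0008,0005). Only single-valued terms are handled.
CODECS = {
    "": "ascii",                # absent or empty: default repertoire
    "ISO_IR 144": "iso8859_5",  # Cyrillic as a single-byte set
    "ISO_IR 192": "utf_8",      # Cyrillic (and everything else) as UTF-8
}

def decode_value(raw: bytes, specific_character_set: str) -> str:
    """Decode a text value according to the dataset's character set term."""
    term = specific_character_set.strip()  # CS values may carry pad spaces
    if term not in CODECS:
        raise ValueError(f"unsupported Specific Character Set: {term!r}")
    # DICOM pads text values to even length with a trailing space; drop it.
    return raw.decode(CODECS[term]).rstrip(" ")

# The same name as two different senders might encode it.
from_144 = decode_value("Петров^Петр ".encode("iso8859_5"), "ISO_IR 144")
from_192 = decode_value("Петров^Петр ".encode("utf_8"), "ISO_IR 192")
print(from_144 == from_192)  # True
```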

    The benefit is mostly on the deployment side, where less site-
    specific configuration is necessary if UTF-8 is used in preference
    to the plethora of "older" character sets.
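One illustration of why a single character set reduces configuration: ISO_IR 144 covers only ASCII plus Cyrillic, so a value that mixes Cyrillic with, for instance, German accented letters cannot be written in it at all, whereas UTF-8 accepts it unchanged. A small Python check, assuming ISO 8859-5 as the equivalent of ISO_IR 144; `can_encode` and the sample name are hypothetical.

```python
def can_encode(text: str, codec: str) -> bool:
    """Return True if every character of `text` is representable in `codec`."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

mixed = "Иванов^Jürgen"  # illustrative: Cyrillic family name, German given name
print(can_encode(mixed, "iso8859_5"))  # False: 'ü' is not in ISO 8859-5
print(can_encode(mixed, "utf_8"))      # True
```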

    David



  3. Re: Cyrillic and UTF-8

    On Aug 18, 2:31 pm, David Clunie wrote:
    > This means, for example, that there are two ways to encode Cyrillic,
    > and recipients are encouraged to support both.


    Hi David,
    I understand that UTF-8 can replace all single- and multi-byte
    character sets, and I totally agree that it is better for recipients
    to support both. The question is whether producers of images should
    employ UTF-8 (or another Unicode encoding) or, for now, stick with the
    regionally specific character sets that the current DICOM standard
    (ftp://medical.nema.org/medical/dicom/2008/08_03pu.pdf) requires, if I
    understand it correctly. Unlike recipients, producers cannot support
    both at the same time: when you write a DICOM header, you have to
    select the character set first. What is your opinion?
    Thank you
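Igor's point about producers can be made concrete: encoded bytes are meaningless without the Specific Character Set chosen for them, so a writer has to fix the term before encoding any text. A minimal sketch with hypothetical names that returns the term together with the encoded value, so the (0008,0005) value and the bytes cannot drift apart. It assumes a text VR such as PN, which DICOM pads to even length with a trailing space.

```python
from typing import Tuple

CODECS = {"ISO_IR 144": "iso8859_5", "ISO_IR 192": "utf_8"}

def encode_for_header(text: str, term: str) -> Tuple[str, bytes]:
    """Encode `text` under the chosen Specific Character Set term.

    Returns (value to write in (0008,0005), encoded value bytes).
    Raises UnicodeEncodeError if the chosen repertoire cannot represent
    the text, and KeyError for a term this sketch does not know.
    """
    data = text.encode(CODECS[term])
    if len(data) % 2:  # text values must have even length: pad with a space
        data += b" "
    return term, data

term, data = encode_for_header("Иванов^Иван", "ISO_IR 144")
print(term, len(data))  # ISO_IR 144 12
```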

  4. Re: Cyrillic and UTF-8

    Hi Igor

    The choice is a pragmatic one; I would weigh the probability that
    the current installed base of receivers supports a) UTF-8 (ISO_IR 192),
    b) ISO_IR 144, or c) neither.

    A quick Google for:

    - 'dicom "ISO_IR 192" "conformance statement"'
    - 'dicom "ISO_IR 144" "conformance statement"'

    reveals a greater number of hits for 144, for what it is worth.

    Regardless, though, I would make the sender configurable to send
    either, depending on the site's preference.
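A configurable sender along these lines might keep a site default plus per-destination overrides and look up the Defined Term when a dataset is written. The AE titles, the class, and the choice of ISO_IR 144 as default below are hypothetical placeholders; which default is appropriate depends on the installed base, as discussed above.

```python
from dataclasses import dataclass, field
from typing import Dict

SUPPORTED_TERMS = {"ISO_IR 144", "ISO_IR 192"}

@dataclass
class SenderConfig:
    # Hypothetical site setting: term used when a destination is not listed.
    default_term: str = "ISO_IR 144"
    # Hypothetical per-destination overrides, keyed by AE title.
    per_destination: Dict[str, str] = field(default_factory=dict)

    def term_for(self, ae_title: str) -> str:
        """Return the Specific Character Set term to use for a destination."""
        term = self.per_destination.get(ae_title, self.default_term)
        if term not in SUPPORTED_TERMS:
            raise ValueError(f"unsupported term in configuration: {term!r}")
        return term

config = SenderConfig(per_destination={"PACS_NEW": "ISO_IR 192"})
print(config.term_for("PACS_NEW"))  # ISO_IR 192
print(config.term_for("PACS_OLD"))  # ISO_IR 144
```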

    Another thing to consider is support for either in Modality
    Worklist SCPs ... you may need to support receiving Cyrillic in
    the MWL query in one encoding and converting it to another
    encoding for saving in the images, depending on the capabilities
    of the RIS and PACS.
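The worklist case amounts to transcoding: decode the value using the Specific Character Set of the worklist response, re-encode it using the term chosen for the images, and write that term into the image header. A minimal sketch assuming the RIS answers in ISO_IR 144 and the PACS expects ISO_IR 192 (both assumptions; `transcode` is hypothetical). Trailing pad spaces are stripped before re-encoding and re-added as needed to keep the value at even length.

```python
CODECS = {"": "ascii", "ISO_IR 144": "iso8859_5", "ISO_IR 192": "utf_8"}

def transcode(raw: bytes, source_term: str, target_term: str) -> bytes:
    """Re-encode a text value from one Specific Character Set to another."""
    text = raw.decode(CODECS[source_term]).rstrip(" ")
    data = text.encode(CODECS[target_term])  # may raise UnicodeEncodeError
    if len(data) % 2:  # keep even length for text VRs such as PN
        data += b" "
    return data

mwl_value = "Сидоров^Олег".encode("iso8859_5")  # as received from the RIS
image_value = transcode(mwl_value, "ISO_IR 144", "ISO_IR 192")
print(len(mwl_value), len(image_value))  # 12 24
```

The reverse direction can fail: text that arrived as UTF-8 may contain characters outside ISO 8859-5, which is one more reason to make the target term configurable.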

    David


