Cyrillic and UTF-8


  #1  
08-18-2008, 10:13 AM
Cyrillic and UTF-8

Hi all,
Part 3 of the DICOM standard mentions UTF-8 (ISO_IR 192) only in the
context of multi-byte character sets without code extensions. For
Cyrillic it specifies ISO_IR 144. However, I have heard that some
vendors are considering UTF-8 for Cyrillic.
What is this: a simple deviation from the existing standard, or an
anticipation of coming changes to it?
I would appreciate any information on this matter.
Thanks
  #2  
08-18-2008, 02:31 PM
Re: Cyrillic and UTF-8

Hi Igor

UTF-8 can essentially support all older, regionally specific
character sets, not just Asian multi-byte character sets, with a
single character set.

This means, for example, that there are two ways to encode Cyrillic,
and recipients are encouraged to support both.
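
As a rough illustration of "two ways to encode Cyrillic", the sketch below encodes one name both ways and decodes it again according to the declared Specific Character Set (0008,0005). It assumes that Python's `iso8859_5` codec is an adequate stand-in for ISO_IR 144 and `utf_8` for ISO_IR 192; the table and the `decode_value` helper are hypothetical names, not part of any DICOM toolkit.

```python
# Hypothetical mapping from DICOM Specific Character Set defined terms
# to Python codec names (an assumption made for this sketch).
DICOM_TO_PYTHON_CODEC = {
    "ISO_IR 144": "iso8859_5",  # Cyrillic, one byte per character
    "ISO_IR 192": "utf_8",      # Unicode in UTF-8, two bytes per Cyrillic letter
}


def decode_value(raw: bytes, specific_character_set: str) -> str:
    """Decode a text value using the character set declared in (0008,0005)."""
    codec = DICOM_TO_PYTHON_CODEC[specific_character_set]
    return raw.decode(codec)


name = "Иванов"
as_144 = name.encode(DICOM_TO_PYTHON_CODEC["ISO_IR 144"])
as_192 = name.encode(DICOM_TO_PYTHON_CODEC["ISO_IR 192"])

# Same text, different bytes on the wire.
print(len(as_144), as_144.hex())
print(len(as_192), as_192.hex())

# A recipient that supports both recovers the same string either way.
print(decode_value(as_144, "ISO_IR 144") == decode_value(as_192, "ISO_IR 192"))
```

A recipient built this way only needs to look at the declared character set before decoding, which is what makes supporting both encodings cheap on the receiving side.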

The benefit is mostly on the deployment side, where less site-
specific configuration is necessary if UTF-8 is used in preference
to the plethora of "older" character sets.

David

Igor Orlovsky wrote:
> Hi all,
> Dicom standard Part 3 mentions UTF-8 (ISO_IR 192) only in context of
> multi-byte characters sets without code extensions. For Cyrillic it
> specifies ISO_IR 144. However I heard that some vendors are
> considering UTF-8 for Cyrillic.
> What is this - simple deviation of existing standard or anticipation
> of its coming changes?
> I would appreciate any information about that matter.
> Thanks

  #3  
08-19-2008, 08:10 AM
Re: Cyrillic and UTF-8

On Aug 18, 2:31 pm, David Clunie wrote:
> Hi Igor
>
> UTF-8 can essentially support all older, regionally specific
> character sets, not just Asian multi-byte character sets, with a
> single character set.
>
> This means, for example, that there are two ways to encode Cyrillic,
> and recipients are encouraged to support both.
>
> The benefit is mostly on the deployment side, where less site-
> specific configuration is necessary if UTF-8 is used in preference
> to the plethora of "older" character sets.
>
> David

Hi David,
I understand that UTF-8 can replace all single-byte and multi-byte
character sets, and I totally agree that it is better for a recipient
to support both. The question is whether producers of images should
employ UTF-8 (or another Unicode encoding) or stick for now with the
regionally specific character sets, as the current DICOM standard
(ftp://medical.nema.org/medical/dicom/2008/08_03pu.pdf) requires, if I
understand it correctly. Unlike recipients, producers cannot support
both at the same time: when you write a DICOM header, you have to
select the character set first. What is your opinion?
Thank you
  #4  
08-19-2008, 11:47 AM
Re: Cyrillic and UTF-8

Hi Igor

The choice is a pragmatic one, but I would weigh the probability
that the current installed base of receivers supports a) UTF-8
(ISO_IR 192), b) ISO_IR 144, or c) neither.

A quick Google for:

- 'dicom "ISO_IR 192" "conformance statement"'
- 'dicom "ISO_IR 144" "conformance statement"'

reveals a greater number of hits for 144, for what it is worth.

Regardless, though, I would make the sender configurable to send
either, depending on the site's preference.
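
A minimal sketch of such a configurable sender, assuming a per-site setting that holds the Specific Character Set defined term. `SenderConfig`, `encode_text`, and the codec table are hypothetical names for illustration, and Python's `iso8859_5` codec is assumed to stand in for ISO_IR 144.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumed mapping from DICOM defined terms to Python codecs.
CODECS = {"ISO_IR 144": "iso8859_5", "ISO_IR 192": "utf_8"}


@dataclass
class SenderConfig:
    # Per-site choice; which default is appropriate is a deployment decision.
    specific_character_set: str = "ISO_IR 144"


def encode_text(value: str, config: SenderConfig) -> Tuple[str, bytes]:
    """Return the (0008,0005) value to declare and the encoded text bytes."""
    codec = CODECS[config.specific_character_set]
    # Raises UnicodeEncodeError if the text does not fit the chosen set.
    return config.specific_character_set, value.encode(codec)


legacy_site = SenderConfig("ISO_IR 144")
unicode_site = SenderConfig("ISO_IR 192")

print(encode_text("Петров", legacy_site))
print(encode_text("Петров", unicode_site))
```

The producer still commits to exactly one character set per object; the configuration only moves that decision from build time to installation time.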

Another thing to consider is support for either encoding in Modality
Worklist SCPs ... you may need to support receiving Cyrillic in
the MWL response in one encoding and converting it to another
encoding before saving it in the images, depending on the
capabilities of the RIS and PACS.
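
The conversion step might look roughly like the sketch below: decode with the character set the worklist declared, then re-encode with the one the images will declare. The `transcode` helper and codec table are hypothetical, and Python's `iso8859_5` codec is again assumed to stand in for ISO_IR 144.

```python
# Assumed mapping from DICOM defined terms to Python codecs.
CODECS = {"ISO_IR 144": "iso8859_5", "ISO_IR 192": "utf_8"}


def transcode(raw: bytes, source_set: str, target_set: str) -> bytes:
    """Re-encode text bytes from one Specific Character Set to another."""
    text = raw.decode(CODECS[source_set])
    try:
        return text.encode(CODECS[target_set])
    except UnicodeEncodeError as exc:
        # UTF-8 can carry characters that the target set cannot represent.
        bad = text[exc.start:exc.end]
        raise ValueError(f"{bad!r} cannot be represented in {target_set}") from exc


# Worklist delivers UTF-8; the images are to be written as ISO_IR 144.
from_mwl = "Сидоров".encode("utf_8")
for_image = transcode(from_mwl, "ISO_IR 192", "ISO_IR 144")
print(for_image.hex())
```

Going from ISO_IR 144 to UTF-8 always succeeds, while the reverse direction can fail for non-Cyrillic text, so the failure path needs a site-level policy (reject, substitute, or fall back to UTF-8).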

David

