Cyrillic and UTF-8 - DICOM
-
Cyrillic and UTF-8
Hi all,
The DICOM standard, Part 3, mentions UTF-8 (ISO_IR 192) only in the context of
multi-byte character sets without code extensions. For Cyrillic it
specifies ISO_IR 144. However, I have heard that some vendors are
considering UTF-8 for Cyrillic.
Is this simply a deviation from the existing standard, or an anticipation
of coming changes to it?
I would appreciate any information on this matter.
Thanks
-
Re: Cyrillic and UTF-8
Hi Igor
UTF-8 can essentially support all older, regionally specific
character sets, not just Asian multi-byte character sets, with a
single character set.
This means, for example, that there are two ways to encode Cyrillic,
and recipients are encouraged to support both.
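A minimal sketch of the "two ways to encode Cyrillic", assuming Python and its standard codecs. The mapping of the defined terms to the codec names `iso8859_5` and `utf-8`, the helper `encode_value`, and the sample name are illustrative assumptions, not something taken from a DICOM toolkit.

```python
# Assumed mapping from Specific Character Set (0008,0005) defined terms
# to Python codec names; only the two terms discussed here are covered.
PYTHON_CODEC = {
    "ISO_IR 144": "iso8859_5",  # ISO 8859-5: one byte per Cyrillic letter
    "ISO_IR 192": "utf-8",      # UTF-8: two bytes per Cyrillic letter
}


def encode_value(text: str, charset_term: str) -> bytes:
    """Encode an attribute value for the given Specific Character Set term."""
    return text.encode(PYTHON_CODEC[charset_term])


name = "Иванов^Иван"  # hypothetical Patient's Name value

as_144 = encode_value(name, "ISO_IR 144")
as_192 = encode_value(name, "ISO_IR 192")

# Same text, two different byte sequences.
print(len(as_144), as_144.hex(" "))
print(len(as_192), as_192.hex(" "))
```

A receiver that supports both only has to look at Specific Character Set to know which of the two byte sequences it has been given.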
The benefit is mostly on the deployment side, where less site-
specific configuration is necessary if UTF-8 is used in preference
to the plethora of "older" character sets.
David
Igor Orlovsky wrote:
> Hi all,
> The DICOM standard, Part 3, mentions UTF-8 (ISO_IR 192) only in the context of
> multi-byte character sets without code extensions. For Cyrillic it
> specifies ISO_IR 144. However, I have heard that some vendors are
> considering UTF-8 for Cyrillic.
> Is this simply a deviation from the existing standard, or an anticipation
> of coming changes to it?
> I would appreciate any information on this matter.
> Thanks
-
Re: Cyrillic and UTF-8
On Aug 18, 2:31 pm, David Clunie wrote:
> Hi Igor
>
> UTF-8 can essentially support all older, regionally specific
> character sets, not just Asian multi-byte character sets, with a
> single character set.
>
> This means, for example, that there are two ways to encode Cyrillic,
> and recipients are encouraged to support both.
>
> The benefit is mostly on the deployment side, where less site-
> specific configuration is necessary if UTF-8 is used in preference
> to the plethora of "older" character sets.
>
> David
Hi David,
I understand that UTF-8 can replace all single-byte and multi-byte
character sets, and I fully agree that it is better for a recipient to
support both. The question is whether producers of images should employ
UTF-8 (or another Unicode encoding) or stick for now with regionally
specific character sets, as the current DICOM standard
(ftp://medical.nema.org/medical/dicom/2008/08_03pu.pdf) requires, if I
understand it correctly. Unlike recipients, producers cannot support
both at the same time: when you write the DICOM header you have to
select the character set first. What is your opinion?
Thank you
-
Re: Cyrillic and UTF-8
Hi Igor
The choice is a pragmatic one, but I would weigh the probability
that the current installed base of receivers supports a) UTF-8,
b) ISO_IR 144, or c) neither.
A quick Google for:
- 'dicom "ISO_IR 192" "conformance statement"'
- 'dicom "ISO_IR 144" "conformance statement"'
reveals a greater number of hits for 144, for what it is worth.
Regardless though, I would make the sender configurable to send
either, depending on the site's preference.
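One way such a configurable sender might look, as a rough sketch in Python. The `Destination` class, the `DESTINATIONS` table, the AE titles, and `encode_for` are all hypothetical names invented for illustration; a real product would read the setting from its own configuration.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumed mapping of defined terms to Python codec names.
CODECS = {"ISO_IR 144": "iso8859_5", "ISO_IR 192": "utf-8"}


@dataclass(frozen=True)
class Destination:
    ae_title: str
    charset_term: str  # value to place in Specific Character Set (0008,0005)


# Hypothetical per-site configuration: one entry per receiving system.
DESTINATIONS = {
    "OLD_PACS": Destination("OLD_PACS", "ISO_IR 144"),
    "NEW_PACS": Destination("NEW_PACS", "ISO_IR 192"),
}


def encode_for(ae_title: str, text: str) -> Tuple[str, bytes]:
    """Return the charset term and encoded bytes for one destination."""
    dest = DESTINATIONS[ae_title]
    codec = CODECS[dest.charset_term]
    # errors="strict" raises UnicodeEncodeError instead of silently
    # substituting characters the chosen set cannot represent.
    return dest.charset_term, text.encode(codec, errors="strict")


print(encode_for("OLD_PACS", "Иван"))
print(encode_for("NEW_PACS", "Иван"))
```

The same text goes out in whichever encoding the receiving site was configured for, and the producer still writes exactly one character set per object.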
Another thing to consider is the support of either in Modality
Worklist SCPs ... you may need to support receiving Cyrillic in
the MWL query in one encoding and converting it to another
encoding for saving in the images, depending on the capabilities
of the RIS and PACS.
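The conversion step can be sketched roughly as follows, assuming Python's standard codecs. The `transcode` helper, the codec mapping, and the sample name are illustrative assumptions; the sketch handles a single value and ignores everything else a Modality Worklist response contains.

```python
# Assumed mapping of defined terms to Python codec names.
CODECS = {"ISO_IR 144": "iso8859_5", "ISO_IR 192": "utf-8"}


def transcode(value: bytes, src_term: str, dst_term: str) -> bytes:
    """Re-encode one attribute value from one character set to another.

    Raises UnicodeEncodeError if the text contains characters that the
    destination character set cannot represent.
    """
    text = value.decode(CODECS[src_term])
    return text.encode(CODECS[dst_term])


# Hypothetical case: the RIS returns the name in UTF-8 in the MWL
# response, but the PACS is configured for ISO_IR 144.
mwl_value = "Петров^Пётр".encode("utf-8")
image_value = transcode(mwl_value, "ISO_IR 192", "ISO_IR 144")

print(len(mwl_value), len(image_value))
```

Going from ISO_IR 144 to UTF-8 always succeeds, while the reverse direction can fail for text outside the Cyrillic repertoire, so the failure case needs handling on that path.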
David