A little help getting started with DICOM - DICOM

  1. A little help getting started with DICOM

    I'm trying to wrap my head around the dicom format, but I have to
    admit it's making me feel a bit retarded. I'd much appreciate a little
    help getting on my feet.

    I've downloaded the DICOM specification reference and am reading
    through the Medical Image FAQ found here and on alt.medical.images et
    al.

    My understanding of the reference (or references in general) is that
    they are not intended to be read front to back like a novel or
    memorized in every detail like a text book. However, the "flow" of
    this specification reference is eluding me.

    I'll step through what I'm doing and welcome any suggestions.

    1) I'm interested in learning the file format, so I open up dicom
    specification part 10 "Dicom File Format" (07_10pu.pdf) and begin
    reading "DICOM File Meta Information"
    2) I see that the header is of the format: 128 byte preamble followed
    by 4 bytes of DICM denoting the beginning of the dicom header proper.
    (excellent...I'm on my way)
    3) Note that the header consists of meta elements demarked by numbered
    tags of the form: group,element. These are found in Table 7.1-1
    4) Then we get a nice pretty table of tags and descriptions.

    The first tag of interest is the "Group Length" tag (0002,0000). This
    should tell us the length of the 0002 group.

    Now the first thing I notice is this reference table has no legend and
    I don't know what the "Type" column represents. The PDFs don't have
    descriptive names so I just start in the beginning with Part 1 and see
    if I can discover the meaning of the "Type" column. Perhaps in
    Normative reference? Nope, Definitions? wait must be in Symbols and
    Abbreviations...nada.

    At this point, I'm down to searching the entire document collection
    with spotlight and being that I only have the keyword "type" or "dicom
    element type" etc I get a hit in every pdf.

    5) Hmmm ok, well lets dump a header and see what it looks like...maybe
    that will shed some light.

    So I do just that and produce the following output by doing a byte
    wise grab of the first 10 bytes with a 132 byte offset to account for
    preamble and DICM element:

    Each byte dumped into an array: [2, 0, 0, 0, 85, 76, 4, 0, 200, 0]
    Expressed as a string with escaped special characters:
    "\002\000\000\000UL\004\000\310\000"
    ASCII output of string: UL?

    Ok, well I can get some sort of meaning out of that.

    Looks like we have the group,element marker (0002,0000) followed by
    some null padding and the ascii characters of UL.

    Huh, now the attribute description from table 7.1-1 of group,element
    0002,0000 says:
    -Number of bytes following this File Meta Element (end of the Value
    field) up to and including the last File Meta Element of the Group 2
    File Meta Information.

    From that I was expecting a number representing the byte offset...not
    ASCII such as UL. From here I begin searching documents for "UL" much
    like I did above with much the same result. But of course, it may not
    be the ASCII UL. It may be the numeric value of 85. Hm.

    This then gets compounded by the fact that when I do find references
    to, say, the Type field or the UL marker, it inevitably cross
    references other tables, uses other hard to search for acronyms, and
    so on until I have 6 or 7 PDFs, 9 web pages, etc etc open.

    I also tried downloading the source to a few of the more popular dicom
    readers/viewers, but they were in large part uncommented and
    undocumented.

    ..........and I still haven't determined what the length of the first
    group (0002) is.

    My apologies if this is all perfectly obvious and I'm just being very
    dense. I would very much appreciate a little guidance, preferably in
    the form of how to find the answers to questions myself rather than
    being spoon fed a few answers at a time.


  2. Re: A little help getting started with DICOM


    I read your recent request for information. I found this posting in
    this group and the related threads most informative.

    Locate this posting in this group and more importantly the replying
    threads:

    From: jondelac@gmail.com
    Newsgroups: comp.protocols.dicom
    Subject: Storing medical images
    Date: 30 Dec 2006 13:05:54 -0800

    You may find the thread reply from eric.goodall@gmail.com most useful
    or at least give clarity to your understanding.




    On 20 Mar 2007 09:56:21 -0700, "krunk-" wrote:

    >I'm trying to wrap my head around the dicom format, but I have to
    >admit it's making me feel a bit retarded. I'd much appreciate a little
    >help getting on my feet. [...]


  3. Re: A little help getting started with DICOM

    (Repost)
    Greetings!
    I'm working on a C# DICOM SDK class library of my own as a school
    project, so I'm quite familiar with the file format itself (I'm yet to
    look up character code pages, though).
    I'm going to try and summarize the format. If anyone finds mistakes,
    corrections will be most appreciated.
    Most of what you need is in 04_05PU.pdf (Part 5).
    Terms used that are important for the actual file encoding/decoding
    process:

    - Element: A base unit where data is stored. Consists of:
    -- Element.Tag: Unique identifier of the element within the dataset
    the element is stored in. Treat it as an unsigned integer. It consists
    of two unsigned short values, Group Number and Element Number (within
    that group).
    -- Element.VR: "Value Representation", two ASCII characters that
    tell you how to interpret the element value(s). There are 26 of them,
    I think. You can find them in PS3.5
    -- Element.VL: "Value Length", 16 or 32 bits, depending on the VR
    and Transfer Syntax, that tell you the length of the element value in
    bytes. If the VL is 0xFFFFFFFF (only 32bit), it's called "Undefined"
    and in that case, delimiters are used to store the value(s).
    -- Element.Value: This is actually one or more values. In my
    opinion, DICOM is a bit inconsistent on this subject and on the VM.
    -- Element.VM: "Value Multiplicity", an integer that tells you the
    number of values; it cannot be zero.

    - Dataset: Basically, an ordered list of elements with unique tags. In
    DICOM definition however, a dataset cannot contain groups 0 and 2, I
    think.

    - Transfer Syntax: A term used to define three things:
    -- TransferSyntax.ByteOrder: "Little Endian" or "Big Endian". Order
    of bytes used when writing non-string stuff. Little Endian is least
    significant byte first, and Big Endian vice versa. Here's a list of
    things that are affected by byte ordering: Tag (Group Number and
    Element Number separately), Value Length, Element Value for elements
    with VR of: US, SS, OW, AT, UL, SL, FL, OF or FD.
    -- TransferSyntax.Explicit/Implicit: This tells you the way to get
    the VR associated with the Tag.
    If an element is written the Explicit way, you can read the VR as two
    ASCII characters right after the Tag. In that case, the following two
    bytes are either Value Length, or reserved and set to 0x0000 (this
    applies for VRs of UN, UT, SQ, OB, OF and OW) and in that case, the
    Value Length is stored in the next four bytes. (Note that it can be
    undefined for SQ, OB, OW, OF and possibly UN).
    If an element is written Implicit way, you need to get the VR from
    some form of dictionary. In this case, Value Length is always four
    bytes and right after the Tag.
    -- TransferSyntax.Compression: T.S. also tells you the compression
    type used for the image, if any. If a compression is used, T.S. is
    always Explicit - Little Endian. Every Transfer Syntax is presented as
    a UID (a string consisting of characters 0-9 separated by dots).
    Default T.S. is Implicit - Little Endian.

    - Dictionary: A set of pairs Tag-VR, as defined by DICOM. It also
    includes Tag descriptions and possible VM (usually 1 or 1-n).
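
To make the Transfer Syntax idea concrete, here is a rough Python sketch of a lookup table from UID to the three properties described above. The `TransferSyntax` type and `lookup_transfer_syntax` function are names made up for this illustration, and the table holds only a handful of well-known UIDs, not the full list from the standard.

```python
from typing import NamedTuple


class TransferSyntax(NamedTuple):
    explicit_vr: bool    # True: the VR is written in the file after the tag
    little_endian: bool  # True: least significant byte first
    encapsulated: bool   # True: pixel data is compressed / encapsulated


# Deliberately incomplete: just a few well-known UIDs.
KNOWN_TRANSFER_SYNTAXES = {
    "1.2.840.10008.1.2": TransferSyntax(False, True, False),     # Implicit VR Little Endian (default)
    "1.2.840.10008.1.2.1": TransferSyntax(True, True, False),    # Explicit VR Little Endian
    "1.2.840.10008.1.2.2": TransferSyntax(True, False, False),   # Explicit VR Big Endian
    "1.2.840.10008.1.2.4.50": TransferSyntax(True, True, True),  # JPEG Baseline
    "1.2.840.10008.1.2.5": TransferSyntax(True, True, True),     # RLE Lossless
}


def lookup_transfer_syntax(uid: str) -> TransferSyntax:
    """Return the properties for a UID, ignoring trailing padding."""
    uid = uid.rstrip("\x00 ")
    try:
        return KNOWN_TRANSFER_SYNTAXES[uid]
    except KeyError:
        raise ValueError(f"unknown transfer syntax: {uid!r}") from None
```

A real library would carry the complete table from PS3.6; the point here is only that the UID string is a key, not something you parse.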

    Now let's have a look at all the VRs out there.

    Classic numeric types:
    - US - Unsigned Short, 16-bit
    - SS - Signed Short, 16-bit
    - UL - Unsigned Long, 32-bit
    - SL - Signed Long, 32-bit
    - FL - Float, 32-bit
    - FD - Double, 64-bit
    - AT - Attribute Tag, 32-bit (2 * 16-bit)
    With these VRs, when VM is greater than one, the values are not
    delimited because each value is of fixed length.
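
As a sketch of how those fixed-length numeric values could be decoded, the following uses Python's `struct` module. The `decode_numeric` helper and its format table are invented for this example; AT is left out because it is really a pair of 16-bit numbers.

```python
import struct

# struct format characters for the fixed-size numeric VRs.
NUMERIC_VR_FORMATS = {
    "US": "H",  # unsigned 16-bit
    "SS": "h",  # signed 16-bit
    "UL": "I",  # unsigned 32-bit
    "SL": "i",  # signed 32-bit
    "FL": "f",  # 32-bit float
    "FD": "d",  # 64-bit float
}


def decode_numeric(vr: str, raw: bytes, little_endian: bool = True) -> list:
    """Decode every value in `raw`; the VM is simply len(raw) / value size."""
    code = NUMERIC_VR_FORMATS[vr]
    prefix = "<" if little_endian else ">"
    size = struct.calcsize(prefix + code)
    if len(raw) % size:
        raise ValueError(f"{vr} value length {len(raw)} is not a multiple of {size}")
    count = len(raw) // size
    return list(struct.unpack(f"{prefix}{count}{code}", raw))
```

For example, `decode_numeric("US", b"\x01\x00\x02\x00")` gives `[1, 2]`: two values, no delimiter between them.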

    Binary arrays:
    - OB - Other Byte, an array of bytes.
    - OW - Other Word, an array of.. hmm.. sets of two bytes
    - OF - Other Float, an array of sets of four bytes.
    Now, according to DICOM, elements with these VRs always have VM of
    one. But they can still have multiple values with use of delimiters
    (explained later).

    Sequence (- SQ -): These elements contain nested datasets, written
    with use of delimiters. There's a recursion, several ways of writing
    the value, and so on. I'll try and explain the format later. VM for SQ
    elements, as defined by DICOM, is always one (1).

    Values of elements with all other VRs are basically strings. The VR
    defines the format of that string, but it's not important at this
    point. Some of these strings are always ASCII (AE, AS, CS, DA, DS, DT,
    IS, TM, UI) and others are affected by code pages, and the information
    needed to interpret them properly is stored in the Specific Character
    Set element, tag (0008, 0005). Nested datasets use the same
    information, unless they have their own (0008, 0005) element.
    Elements with VR of ST, LT and UT always have the VM of one (1). Other
    string elements can have multiple values and in that case, the values
    are separated with the backslash character ('\', 0x5C), which is the
    only value delimiter PS3.5 defines.
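
A small sketch of splitting a multi-valued string element, assuming the backslash (0x5C) is the value delimiter as PS3.5 specifies, and that trailing space or NUL bytes are just even-length padding. `split_values` is a made-up helper name, and it ignores character sets other than the one you pass in.

```python
def split_values(raw: bytes, encoding: str = "ascii") -> list[str]:
    """Split a string-type element value into its individual values."""
    text = raw.decode(encoding).rstrip(" \x00")  # drop even-length padding
    if not text:
        return []  # empty element: no values at all
    return text.split("\\")
```

So a raw value of `ORIGINAL\PRIMARY\AXIAL` followed by one padding space comes back as three separate strings.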

    Now, some lovely exceptions to the rules, special elements/groups
    -Command Group: Elements with Group Number = 0 are always written in
    Implicit Little Endian
    -File Meta Information Group: Elements with Group Number = 2 are
    always written in Explicit Little Endian
    (Note: I think DICOM treats these as if they are not part of a
    dataset)
    -Group Length: Elements with Element Number = 0 always have the VR of
    UL and VM of 1, and their value represents the length of their group.
    -Repeating Groups: When written implicitly, elements with Group Number
    of 0x5000 - 0x501E and 0x6000 - 0x601E need special treatment to get
    their VR. The Dictionary defines the VR of these elements only for one
    group. Elements with same Element Number, but different Group Number
    (in the specified range) get the same VR as their brothers from the
    group that has defined VRs.
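
The group length and repeating group rules could be folded into the dictionary lookup roughly like this. Both function names are invented for this sketch, the `dictionary` argument is a plain dict standing in for the real PS3.6 data dictionary, and only even group numbers are folded, on the assumption that odd groups are private.

```python
def dictionary_key(group: int, element: int) -> tuple[int, int]:
    """Map a tag to the tag under which its VR is listed in the dictionary."""
    for base in (0x5000, 0x6000):
        # Repeating groups 50xx / 60xx share the entries of the base group.
        if base <= group <= base + 0x1E and group % 2 == 0:
            return (base, element)
    return (group, element)


def implied_vr(group: int, element: int, dictionary: dict) -> str:
    """Find the VR for a tag when the file does not spell it out."""
    if element == 0x0000:
        return "UL"  # group length elements are always UL
    return dictionary.get(dictionary_key(group, element), "UN")
```

With a dictionary holding only `{(0x6000, 0x0010): "US"}`, a lookup of tag (6004, 0010) would still come back as US.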

    -- Delimiters --
    These are not real elements. They are used to store multiple values
    for Binary Array elements (OB, OW and OF), and datasets in the SQ
    elements. They are defined by Tags, and they don't have a VR.
    (FFFE, E000) - "Item Tag"
    (FFFE, E00D) - "Item Delimitation Item Tag"
    (FFFE, E0DD) - "Sequence Delimitation Item Tag"
    Their Value Length is always four bytes, and the two delimitation ones
    always have the VL of zero. VL of an "Item" can be proper Value
    Length, or undefined (in case of some SQ elements).

    Phew.. Now, let's have a peek at a dicom file. It begins with a
    128-byte preamble (usually all zeros), followed by 4 ASCII chars
    "DICM". That's the file header, the way I see it. It's followed by a
    FMI group (0002, eeee), and that group is followed by a proper dataset.
    To read a file properly, the ~only~ element value you need to
    interpret is the ASCII string stored in the (0002, 0010) element.
    That's the Transfer Syntax UID. From that string you can find out the
    byte order and the explicit/implicit thingie. If there's no such
    element, the dataset uses the default Transfer Syntax (Implicit Little
    Endian). All nested datasets use same T.S. as their mother dataset.
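
Here is a minimal Python sketch of that reading order: skip the preamble, check for "DICM", then read the group 0002 elements (always Explicit Little Endian) until another group shows up. `read_file_meta` is a made-up name and the error handling is deliberately thin.

```python
import io
import struct

# Explicit VRs that use 2 reserved bytes followed by a 32-bit length.
LONG_VRS = {b"OB", b"OW", b"OF", b"SQ", b"UT", b"UN"}


def read_file_meta(stream) -> dict:
    """Return {(group, element): (vr, raw_value)} for the group 0002 elements."""
    preamble = stream.read(128)
    if len(preamble) != 128 or stream.read(4) != b"DICM":
        raise ValueError("not a DICOM Part 10 file")
    meta = {}
    while True:
        header = stream.read(8)
        if len(header) < 8:
            break  # end of file
        group, element, vr, length = struct.unpack("<HH2sH", header)
        if group != 0x0002:
            stream.seek(-8, io.SEEK_CUR)  # rewind: the main dataset starts here
            break
        if vr in LONG_VRS:
            (length,) = struct.unpack("<I", stream.read(4))
        meta[(group, element)] = (vr.decode("ascii"), stream.read(length))
    return meta
```

After the call, the stream is positioned at the first element of the main dataset, and `meta[(0x0002, 0x0010)][1].rstrip(b"\x00")` holds the Transfer Syntax UID needed to read the rest.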

    Here's an example of a few elements. The [] are there to delimit the
    byte groups, and commas to delimit the bytes in groups. Bytes are
    written as hex, in two chars, or as ASCII strings in "", or as ASCII
    chars in ''. I'll use "xx" when I can't tell the actual value of the
    byte.

    Length of File Meta Information group (0002, 0000). Written in
    Explicit Little Endian:
    [02, 00] [00, 00] ['U', 'L'] [04, 00] [xx, 00, 00, 00]
    Byte groups are: Group Number, Element Number, VR, VL (Value Length),
    Value.
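
The ten bytes dumped in the first post follow exactly this layout, so they can be decoded with a few lines of Python. The dump stops two bytes short of the full four-byte value, so this sketch assumes the missing high-order bytes are zero.

```python
import struct

# The ten bytes quoted in the first post, read at offset 132 of the file.
dump = bytes([2, 0, 0, 0, 85, 76, 4, 0, 200, 0])

# "<" = little endian, H = unsigned 16-bit, 2s = two raw bytes (the VR).
group, element, vr, value_length = struct.unpack("<HH2sH", dump[:8])
print(f"tag=({group:04X},{element:04X}) VR={vr.decode('ascii')} VL={value_length}")
# prints: tag=(0002,0000) VR=UL VL=4

# Assumption: pad the two bytes that were not dumped with zeros.
value_bytes = dump[8:].ljust(value_length, b"\x00")
(group_length,) = struct.unpack("<I", value_bytes)
print("group length:", group_length)
# prints: group length: 200
```

So the "UL" is the VR, the 4 is the Value Length, and the 200 is the start of the actual group length value.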

    Transfer Syntax UID (0002, 0010). Written in Explicit Little Endian:
    [02, 00] [10, 00] ['U', 'I'] [12, 00] ["1.2.840.10008.1.2", 00]
    Byte groups are same as last one. The UID is 17 characters long, so
    one trailing zero byte pads the value to an even length of 18 (0x12).

    From this one we learn that the Transfer Syntax in the dataset is
    Implicit Little Endian. So, now, in the dataset we start reading these
    elements:

    Length of group 0008 (0008, 0000). Written in Implicit Little Endian:
    [08, 00] [00, 00] [04, 00, 00, 00] [xx, 00, 00, 00]
    Byte groups are: Group Number, Element Number, VL, Value. No VR when
    writing the Implicit Way!

    Specific Character Set (0008, 0005). Written in Implicit Little
    Endian:
    [08, 00] [05, 00] [0A, 00, 00, 00] ["ISO_IR 100"]
    Byte groups are: Group Number, Element Number, VL, Value.
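
For comparison, reading one Implicit Little Endian element might look like the sketch below. `MINI_DICTIONARY` is a tiny hand-written stand-in for the real data dictionary, `read_implicit_element` is an invented name, and undefined lengths (0xFFFFFFFF) are not handled here.

```python
import struct

# Hypothetical two-entry subset of the data dictionary.
MINI_DICTIONARY = {
    (0x0008, 0x0005): "CS",  # Specific Character Set
    (0x0008, 0x0060): "CS",  # Modality
}


def read_implicit_element(buf: bytes, offset: int = 0):
    """Return (tag, vr, raw_value, next_offset) for the element at `offset`."""
    # Tag (2 + 2 bytes) followed directly by a 32-bit length; no VR on disk.
    group, element, length = struct.unpack_from("<HHI", buf, offset)
    if length == 0xFFFFFFFF:
        raise NotImplementedError("undefined length needs delimiter handling")
    if element == 0x0000:
        vr = "UL"  # group length
    else:
        vr = MINI_DICTIONARY.get((group, element), "UN")
    start = offset + 8
    return (group, element), vr, buf[start:start + length], start + length
```

Feeding it the Specific Character Set bytes from the example returns the tag (0008, 0005), the VR "CS" from the dictionary, and the ten value bytes.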

    Okay.. As for the sequences, you can figure that out on your own by
    looking at the examples they gave in PS3.5
    However, I came up with this little interpretation of the rules, it
    might help:
    - Value of an SQ element is a set of datasets, and the nested
    datasets are called "Items".
    - Each dataset is placed within the value of the "Item Tag"
    delimiter.
    - If an Item Tag is of undefined length, the item ends with an "Item
    Delimitation Item Tag" delimiter.
    - If the SQ element is of undefined length, "Sequence Delimitation
    Item Tag" will be written after the last item.

    In a similar way, Pixel Data (and perhaps some other elements) can
    have multiple values, although its VR is OB or OW. In that case, the
    element is of undefined length, and followed by Items with defined
    length that hold the data.
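
A rough sketch of walking such a value: read Item after Item until the Sequence Delimitation tag appears. It assumes little endian byte order and items of defined length, which is the Pixel Data case described above; nested datasets with undefined-length items would need a full recursive parser. `split_items` is an invented name.

```python
import struct

ITEM = (0xFFFE, 0xE000)             # Item Tag
SEQUENCE_DELIM = (0xFFFE, 0xE0DD)   # Sequence Delimitation Item Tag


def split_items(buf: bytes, offset: int = 0):
    """Return (list of raw item values, offset just past the delimiter)."""
    items = []
    while offset + 8 <= len(buf):
        # Delimiters have no VR: tag, then a 32-bit length.
        group, element, length = struct.unpack_from("<HHI", buf, offset)
        offset += 8
        if (group, element) == SEQUENCE_DELIM:
            break
        if (group, element) != ITEM:
            raise ValueError(f"unexpected tag ({group:04X},{element:04X})")
        if length == 0xFFFFFFFF:
            raise ValueError("undefined-length item: needs a dataset parser")
        items.append(buf[offset:offset + length])
        offset += length
    return items, offset
```

For compressed Pixel Data, as far as I can tell from PS3.5, the first item is an offset table rather than image data, so a caller would treat `items[0]` separately.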


    I hope this helps. It would have helped me a lot, when I started
    reading those PS documents about a month ago. However, it would be
    nice if the experienced DICOM-ers would correct my errors before you
    followed up on all this.


  4. Re: A little help getting started with DICOM

    krunk- wrote:
    > I'm trying to wrap my head around the dicom format, but I have to
    > admit it's making me feel a bit retarded. I'd much appreciate a little
    > help getting on my feet. [...]


    The problem with reading the DICOM standard specification is that it
    is organized in a way that facilitates maintenance of the standard
    by avoiding redundancy wherever possible, but makes reading rather difficult.
    Once you have worked your way through all of the document and then start
    again for a second round, many things will start to make sense. However,
    few people ever reach that point given the 3000+ pages of specification.

    Concerning the file format, you need to consult the following documents
    in parallel:
    - part 3 describes the so-called IODs (Information Object Definitions).
    You can think of each IOD as one specific file format. There are different
    formats for CT images, MR images, X-ray images, Ultrasound etc.
    They all use the same encoding rules, but require different elements
    in the file. Part 3 describes which elements (attributes) are required,
    optional or even forbidden for each IOD, and also explains the meaning
    of each element.
    - Each of the elements (attributes) can then be looked up in part 6.
    This document is just a list that defines the data type (called
    Value Representation or, in short, "VR" in DICOM) and the number
    of values that each element with the given tag may contain.
    - Part 5 then defines the meaning (possible content) of each VR.
    This document also defines the so-called Transfer Syntaxes (encoding rules).
    DICOM has many different sets of encoding rules (big endian, little endian,
    files with and without explicitly encoded VRs, compressed and uncompressed).
    This document describes how attribute tags, VRs, length fields and the
    values themselves are encoded as a byte stream depending on the transfer
    syntax.
    - Part 10 finally describes the file format (as opposed to the byte stream
    representing an object during network transmission). The file format
    consists of preamble, magic word, metaheader, followed by a dataset
    as defined in part 5.

    Regards,
    Marco Eichelberg
    OFFIS

  5. Re: A little help getting started with DICOM

    This is all great stuff guys, thanks for the time/efforts.

    @boysha: The repost is a great plain English explanation of what's
    going on.

    @Eichelberg: Thanks, I'm sure this will make using the specification
    docs a bit less painful. I was beginning to feel schizophrenic when
    using the dicom specs and though it still feels that way....at least I
    know it's by design.

    @binary also a good read for getting started.

    James Kyle
    UCLA

