64 bytes for SOP Instance UID - DICOM

This is a discussion on 64 bytes for SOP Instance UID - DICOM ; Hello, I have just finished the implementation of the generation of UID in gdcm(*), and I wanted to share my experience with you. The main problem I had here was not really the cross plateform implementation on different -broken- plateforms ...

+ Reply to Thread
Results 1 to 15 of 15

Thread: 64 bytes for SOP Instance UID

  1. 64 bytes for SOP Instance UID

    Hello,

    I have just finished the implementation of the generation of UID in
    gdcm(*), and I wanted to share my experience with you.

    The main problem I had here was not really the cross plateform
    implementation on different -broken- plateforms but rather dealing with
    the extremely limited size of UIDs.

    Explanation of the size. GDCM is used with ITK(**) and thus I had to use
    the DICOM prefix we had after David Clunie send us the link for a free
    one(***). Already the size of this prefix was 27 bytes. AS we wanted a
    subspace of it (ex: ".1") the prefix ITK + GDCM is now 29 bytes.

    Then -still following David advices- I need to incode a spacial
    identifier: the mac address of the machine (and not IP address). After
    some google reserch I came to a final implementation. And the result is
    coded on unsigned char[6] this can then be represented as a number up to
    256^6 = 281474976710656 which can fit on 15 bytes.

    After the spatial identifier I needed a temporal identifier. This is far
    easier. I just used regular time function and coded something like:

    "Year Month Day Hour Minutes Second Millisecond"
    which fits on
    4 + 2 + 2 + 2 + 2 + 2 + 3 = 17 bytes

    Therefore I have 29 + 15 + 17 = 62 bytes !

    So I only have two bytes left for the random number.

    Implementing a software generation of unique identifier that fits on 64
    bytes is clearly impossible.

    Any advices to squeeze all this *without* loosing the uniqueness ?

    Thanks
    Mathieu
    (*) http://www.creatis.insa-lyon.fr/Public/Gdcm/Main.html
    (**) http://www.itk.org/
    (***)
    http://www.dclunie.com/medical-image...IDRegistration


  2. Re: 64 bytes for SOP Instance UID

    Hi Mathieu.

    The assigned UID root and the MAC together make a pretty good UID
    scheme. From what's left, adding the date/time down to the millisecond
    is good.
    The worst part seems to me to be the random number.
    Keeping its impact in the UID schema down to a minimum looks like a
    good thing to do, so I won't try to create a larger number of bytes to
    be filled in by pseudo-random values.

    I'd always rely to a maximum on organization-managed UIDs, if possible
    and keep the random parts as low as possible.
    By the way, what random-number generator are you using?
    ~~Razvan


  3. Re: 64 bytes for SOP Instance UID

    Hi Mathieu,

    I don't see any compelling reason for the random number.
    Your scheme fits within 64 bytes and is unique assuming
    the application never attempts to generate more than one UID
    with the value returned from the time function.
    With the two extra digits available, 100 UID's are possible
    using one time value.

    You can save digits by taking advantage of the fact that
    the number of milliseconds in a year easily fits within 11 digits
    rather than the 13 you now allocate. Adopting the year
    the ISO prefix was granted as year zero, saves two more
    digits over the next few hundred years even allowing for
    period delimiters to bracket the time.

    Don't forget (as many do) that leading zeros are not allowed in the UID.

    I advise that GDCM allow its client to configure a UID prefix
    for users that already have an ISO prefix, and perhaps also
    to allow the application to set the UID with its own scheme.

    - Doug

    "Mathieu Malaterre" wrote in message
    news:NjiId.36678$Xs6.13811@twister.nyroc.rr.com...
    > Hello,
    >
    > I have just finished the implementation of the generation of UID in gdcm(*), and I wanted to share
    > my experience with you.
    >
    > The main problem I had here was not really the cross plateform
    > implementation on different -broken- plateforms but rather dealing with
    > the extremely limited size of UIDs.
    >
    > Explanation of the size. GDCM is used with ITK(**) and thus I had to use
    > the DICOM prefix we had after David Clunie send us the link for a free
    > one(***). Already the size of this prefix was 27 bytes. AS we wanted a
    > subspace of it (ex: ".1") the prefix ITK + GDCM is now 29 bytes.
    >
    > Then -still following David advices- I need to incode a spacial
    > identifier: the mac address of the machine (and not IP address). After
    > some google reserch I came to a final implementation. And the result is
    > coded on unsigned char[6] this can then be represented as a number up to
    > 256^6 = 281474976710656 which can fit on 15 bytes.
    >
    > After the spatial identifier I needed a temporal identifier. This is far
    > easier. I just used regular time function and coded something like:
    >
    > "Year Month Day Hour Minutes Second Millisecond"
    > which fits on
    > 4 + 2 + 2 + 2 + 2 + 2 + 3 = 17 bytes
    >
    > Therefore I have 29 + 15 + 17 = 62 bytes !
    >
    > So I only have two bytes left for the random number.
    >
    > Implementing a software generation of unique identifier that fits on 64
    > bytes is clearly impossible.
    >
    > Any advices to squeeze all this *without* loosing the uniqueness ?
    >
    > Thanks
    > Mathieu
    > (*) http://www.creatis.insa-lyon.fr/Public/Gdcm/Main.html
    > (**) http://www.itk.org/
    > (***)
    > http://www.dclunie.com/medical-image...IDRegistration
    >




  4. Re: 64 bytes for SOP Instance UID

    Mathieu Malaterre wrote:

    > The main problem I had here was not really the cross plateform
    > implementation on different -broken- platforms ...


    Actually, I thought your work on getting a MAC address regardless
    of what platform one is running on was most excellent, and must
    have taken forever to research and test.

    > Explanation of the size. GDCM is used with ITK(**) and thus I had to use
    > the DICOM prefix we had after David Clunie send us the link for a free
    > one(***). Already the size of this prefix was 27 bytes. AS we wanted a
    > subspace of it (ex: ".1") the prefix ITK + GDCM is now 29 bytes.


    As I think I pointed out earlier on the ITK mailing list, you could
    choose a root from a source that is shorter.

    You (well, ITK) is using one from Dave Harvey, which is long. Specifically
    "1.2.826.0.1.3680043.2.1125", which is 27 bytes.

    I suggested you use one from the IANA registry, which is relatively
    short. E.g. mine is 17 characters before I start to subdivide the space.

    The highest number in the IANA registry right now is 22233, so they
    are only into 5 digits on top of the 12+period prefix, so if you
    act quickly (some time in the next decade or so), you should probably
    be able to get an 18 byte root.

    That is 9 bytes better than what you are currently using.

    > After the spatial identifier I needed a temporal identifier. This is far
    > easier. I just used regular time function and coded something like:

    ....
    > So I only have two bytes left for the random number.


    Safest not ever to assume time is sufficient; if the granularity of time
    is milliseconds (and it may not be that granular even if the return
    value is in milliseconds), this could be too short on a modern fast
    machine that can generate a lot of UIDs in a millisecond.

    For example, I just ran a quick check on the Java UID generator in
    one of my toolkits and it generates over 10 UIDs per millisecond if
    the same system and time based prefix is used, and still over 5
    UIDs per second if a new system and time based prefix is generated,
    and that's on my laptop. Once it has been running for a while and
    the JIT optimizer gets to it, it actually exceeds 20 UIDs per
    millisecond for the latter scenario.

    So a random number, or cache of counts per time increment is
    definitely required I think (java.rmi.server.UID does the latter
    for you).

    David


  5. Re: 64 bytes for SOP Instance UID

    Why use a random number generator at all?
    A counter is guaranteed not to repeat a prior value
    but an RNG might.

    "David Clunie" wrote in message news:41F2C335.8000002@dclunie.com...
    > Mathieu Malaterre wrote:
    >
    > > The main problem I had here was not really the cross plateform
    > > implementation on different -broken- platforms ...

    >
    > Actually, I thought your work on getting a MAC address regardless
    > of what platform one is running on was most excellent, and must
    > have taken forever to research and test.
    >
    >> Explanation of the size. GDCM is used with ITK(**) and thus I had to use
    >> the DICOM prefix we had after David Clunie send us the link for a free
    >> one(***). Already the size of this prefix was 27 bytes. AS we wanted a
    >> subspace of it (ex: ".1") the prefix ITK + GDCM is now 29 bytes.

    >
    > As I think I pointed out earlier on the ITK mailing list, you could
    > choose a root from a source that is shorter.
    >
    > You (well, ITK) is using one from Dave Harvey, which is long. Specifically
    > "1.2.826.0.1.3680043.2.1125", which is 27 bytes.
    >
    > I suggested you use one from the IANA registry, which is relatively
    > short. E.g. mine is 17 characters before I start to subdivide the space.
    >
    > The highest number in the IANA registry right now is 22233, so they
    > are only into 5 digits on top of the 12+period prefix, so if you
    > act quickly (some time in the next decade or so), you should probably
    > be able to get an 18 byte root.
    >
    > That is 9 bytes better than what you are currently using.
    >
    >> After the spatial identifier I needed a temporal identifier. This is far
    >> easier. I just used regular time function and coded something like:

    > ...
    >> So I only have two bytes left for the random number.

    >
    > Safest not ever to assume time is sufficient; if the granularity of time
    > is milliseconds (and it may not be that granular even if the return
    > value is in milliseconds), this could be too short on a modern fast
    > machine that can generate a lot of UIDs in a millisecond.
    >
    > For example, I just ran a quick check on the Java UID generator in
    > one of my toolkits and it generates over 10 UIDs per millisecond if
    > the same system and time based prefix is used, and still over 5
    > UIDs per second if a new system and time based prefix is generated,
    > and that's on my laptop. Once it has been running for a while and
    > the JIT optimizer gets to it, it actually exceeds 20 UIDs per
    > millisecond for the latter scenario.
    >
    > So a random number, or cache of counts per time increment is
    > definitely required I think (java.rmi.server.UID does the latter
    > for you).
    >
    > David
    >




  6. Re: 64 bytes for SOP Instance UID

    Doug Sluis wrote:
    > Why use a random number generator at all?
    > A counter is guaranteed not to repeat a prior value
    > but an RNG might.


    I have full spatial resolution (MAC Address) + full temporal precision
    (well almost). Assuming I would generate my UID slowly it is impossible
    to find twice the same one. The problem only comes with fast machine/
    multi process / multi thread, that's why RNG comes handy.

    If you are saying that I could use a static counter while generating the
    UID then I have to deal will mutex locks to garantee I am not writting
    (=incrementing) this counter at the same time from two threads. Which
    seems to me a lot more troubles (cross plateform speaking) than the
    simple call to 'rand()' ...

    Thanks anyway,
    Mathieu

  7. Re: 64 bytes for SOP Instance UID

    David Clunie wrote:
    > Mathieu Malaterre wrote:
    >
    > > The main problem I had here was not really the cross plateform
    > > implementation on different -broken- platforms ...

    >
    > Actually, I thought your work on getting a MAC address regardless
    > of what platform one is running on was most excellent, and must
    > have taken forever to research and test.


    Thank you. The thing I am happy with is that it compiled right away on
    all machines without errors. I am not sure it works (in fact it does not
    on old HPUX if you were to copy/paste it).

    > I suggested you use one from the IANA registry, which is relatively
    > short. E.g. mine is 17 characters before I start to subdivide the space.


    Ok I'll try to convince people to ask for a new one...

    > Safest not ever to assume time is sufficient; if the granularity of time
    > is milliseconds (and it may not be that granular even if the return
    > value is in milliseconds), this could be too short on a modern fast
    > machine that can generate a lot of UIDs in a millisecond.
    >
    > For example, I just ran a quick check on the Java UID generator in
    > one of my toolkits and it generates over 10 UIDs per millisecond if
    > the same system and time based prefix is used, and still over 5
    > UIDs per second if a new system and time based prefix is generated,
    > and that's on my laptop. Once it has been running for a while and
    > the JIT optimizer gets to it, it actually exceeds 20 UIDs per
    > millisecond for the latter scenario.
    >
    > So a random number, or cache of counts per time increment is
    > definitely required I think (java.rmi.server.UID does the latter
    > for you).


    Yeah I know, we have some fancy machines here which were generating
    DICOM like lightspeed, and the UID ended up being mainly the same ( I
    was generating 5-10 DICOM within a second before I start using the
    milliseconds).

    Mathieu

  8. Re: 64 bytes for SOP Instance UID


    > You can save digits by taking advantage of the fact that
    > the number of milliseconds in a year easily fits within 11 digits
    > rather than the 13 you now allocate.


    Very nice. I'll try this right away.

    > Adopting the year
    > the ISO prefix was granted as year zero, saves two more
    > digits over the next few hundred years even allowing for
    > period delimiters to bracket the time.


    That's also very smart. Plus in 100 years I won't be blamed for the
    implementation :P

    > Don't forget (as many do) that leading zeros are not allowed in the UID.


    D. Clunie remind me this - and I was doing it -

    > I advise that GDCM allow its client to configure a UID prefix
    > for users that already have an ISO prefix, and perhaps also
    > to allow the application to set the UID with its own scheme.


    that the easiest part and is already done. There is also a special
    easter egg. By default the function doesn't take any parameter and set
    the prefix as "GDCM" in binary:

    $ echo "GDCM" | od -b

    107 104 103 115 012


    Mathieu

  9. Re: 64 bytes for SOP Instance UID


    > I'd always rely to a maximum on organization-managed UIDs, if possible
    > and keep the random parts as low as possible.
    > By the way, what random-number generator are you using?


    The C rand() function:

    CONFORMING TO
    The functions rand() and srand() conform to SVID 3, BSD 4.3, ISO
    9899, POSIX 1003.1-2003. The function rand_r() is from POSIX 1003.1-2003.


    I am doing a cross plateform software so this is very hard to do a nice
    random generator.

    Mathieu
    If everybody was using linux it would be easy to read /dev/random

  10. Re: 64 bytes for SOP Instance UID

    I presume then, that your RNG is a service that runs in its
    own thread. Otherwise it suffers from the same weakness as
    the static counter you cite. From my experience, a threadsafe
    counter is trivial to implement, and the one that I tend to prefer,
    because it NEVER returns the same value within the
    allocated range it cycles through.


    "Mathieu Malaterre" wrote in message
    news:f_CId.159060$Uf.149904@twister.nyroc.rr.com.. .
    > Doug Sluis wrote:
    >> Why use a random number generator at all?
    >> A counter is guaranteed not to repeat a prior value
    >> but an RNG might.

    >
    > I have full spatial resolution (MAC Address) + full temporal precision (well almost). Assuming I
    > would generate my UID slowly it is impossible to find twice the same one. The problem only comes
    > with fast machine/ multi process / multi thread, that's why RNG comes handy.
    >
    > If you are saying that I could use a static counter while generating the UID then I have to deal
    > will mutex locks to garantee I am not writting (=incrementing) this counter at the same time from
    > two threads. Which seems to me a lot more troubles (cross plateform speaking) than the simple call
    > to 'rand()' ...
    >
    > Thanks anyway,
    > Mathieu




  11. Re: 64 bytes for SOP Instance UID

    HI,
    Is adding MAC number is needed in generating UID. I have read in
    dicom standard that suffix can be generated using device serial number,
    study ID, serial and timestamp. This suffix along with root UID is used
    to generate UID appropriately, i mean provided your having your own
    root UID. Is this right?



    Razvan wrote:
    > Hi Mathieu.
    >
    > The assigned UID root and the MAC together make a pretty good UID
    > scheme. From what's left, adding the date/time down to the

    millisecond
    > is good.
    > The worst part seems to me to be the random number.
    > Keeping its impact in the UID schema down to a minimum looks like a
    > good thing to do, so I won't try to create a larger number of bytes

    to
    > be filled in by pseudo-random values.
    >
    > I'd always rely to a maximum on organization-managed UIDs, if

    possible
    > and keep the random parts as low as possible.
    > By the way, what random-number generator are you using?
    > ~~Razvan



  12. Re: 64 bytes for SOP Instance UID

    Hi Gurikar.

    >From my reading of the standard, I cannot see a constraint imposed by

    the Dicom standard itself on how the uid suffix part is generated.

    The MAC address is a good thing to have as part of the UID, because it
    is guaranteed to be unique, being managed globally, as opposed to a
    randomly generated number.

    So the MAC is not needed nor required, but it is a good path to follow
    in constructing the UID. You can add on top on that numbers generated
    by some incremental schemes, random numbers or combinations of the two,
    there doesn't seem to be a best way to do this.

    ~~Razvan


  13. Re: 64 bytes for SOP Instance UID

    Razvan wrote:

    > The MAC address is a good thing to have as part of the UID, because it
    > is guaranteed to be unique, being managed globally, as opposed to a
    > randomly generated number.


    I agree that the MAC address is a good candidate to be used for UID
    creation. However, the MAC address is _not_ guaranteed to be world-wide
    unique. The uniqueness is only required within a (local) network. For
    example most onboard Ethernet adapters allow to change the MAC address.
    Furthermore, most OS also allow to change the MAC address (under Linux
    "ifconfig eth0 hw ether 00:01:02:03:04:05").

    See http://en.wikipedia.org/wiki/MAC_address for details.

    Regards,
    Joerg Riesmeier

  14. Re: 64 bytes for SOP Instance UID

    Mathieu Malaterre wrote:

    > Implementing a software generation of unique identifier that fits
    > on 64 bytes is clearly impossible.


    I agree that it is impossible to _guarantee_ that a software generated
    UID is really world-wide unique unless you have a unique identifier for
    each system where your software runs on. As I wrote in another posting
    the MAC address is _not_ such an identifier, but e.g. the "host id" of
    Sun Sparc processors should be. Unfortunately, there are many other
    systems ...

    Btw, the approach we've taken for the DCMTK can be found in file
    "dcmdata/libsrc/dcuid.cc". In order to make sure that the generated
    UID fits into the 64 bytes restriction we create a hash code from
    many system dependent parameters, add the process id, a time stamp
    and a threadsafe counter.

    Regards,
    Joerg Riesmeier

  15. Re: 64 bytes for SOP Instance UID

    Joerg Riesmeier schrieb:

    > As I wrote in another posting
    > the MAC address is _not_ such an identifier, but e.g. the "host id" of
    > Sun Sparc processors should be. Unfortunately, there are many other
    > systems ...


    Joerg,

    I used to work for a SPARC house, and one of the topics the staff
    regularly did is change the host ID in hardware-upgraded systems. So no
    uniqueness guaranteed, either.

    Matthieu,

    the way I found to be the most reliable is to register each single
    implementation - a php script on your webserver could do - and put it
    into you UID root. Unfortuately, some users may fear a loss of privacy
    if the product is open source, but you may explain the item in product
    documentation.

    A scheme I deployed assumed the following:
    * I have a UID root for the software producer
    * There are less than 1000 installations in one ZIP Postcode area.
    => Deployed:
    "UID root (company specific)."
    + "companies software code (2 digits)."
    + "Zip (5 digit) plus 4 digits installation counter."
    + "further stuff (2-6 digits)."
    + "YYYYMMDDHHMMSS plus running counter."

    which gave enough space within the 64 byte to encode different types of
    UIDs (done by the "further stuff" to distinguish different UID types.

    If you forget about the ZIP-Code stuff, you may use a running counter of
    some digits for an "Installation Counter" that can be povided by a php /
    cgi script where users can download the "system specific UID".

    DICOM states that suppliers _must_ guarantee uniqueness of the
    generatesd UIDs, so any statistical guessing and Random number stuff is
    *not* sufficient.


    My two cents.

    HTH,

    Peter

+ Reply to Thread