how to best "characterize" a thread - Unix

  1. how to best "characterize" a thread

    This is an unusual thread question because it's not about thread
    programming per se but rather thread _auditing_. I'm working on a
    process auditing tool (think strace but put to a different purpose).
    Imagine that a threaded program is run twice under control of this
    auditor and the results are left in two output files "trace1" and
    "trace2". I need to be able to "match up" the threads between trace1 and
    trace2 for comparison purposes. And BTW, for now I only want to consider
    pthreads; if the answer can be adapted to other implementations I can
    probably handle that myself later on.

    The obvious answer is to compare thread ids but this breaks down pretty
    quickly. Some platforms issue random thread ids while others increment
    from 1, but even on the latter type, due to the basic asynchronous
    nature of threads, thread #3 in one run might be #4 in another.

    Right now the best I can think of is a combination of (1) the address
    from which pthread_create() was called, (2) the address of the
    start_routine, and (3) the address of the argument to the start_routine.
    It seems to me that, though it's possible for that same combination to
    occur multiple times in the same process, even if it did the threads
    would be logically interchangeable (i.e. they'd be doing the same thing
    so it would be ok to treat them the same).
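
    A minimal sketch of that three-part key, assuming a GCC or Clang
    toolchain (for __builtin_return_address) and a wrapper that the
    auditor arranges to be called in place of pthread_create(). The
    names thread_sig, thread_sig_equal, audited_pthread_create and
    noop_worker are hypothetical, made up for illustration only:

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical record an auditor could keep for each pthread_create() call. */
typedef struct {
    uintptr_t call_site;     /* return address of the wrapper, i.e. its caller */
    uintptr_t start_routine; /* address of the thread's entry function */
    uintptr_t arg;           /* address passed as the start_routine argument */
} thread_sig;

/* Two threads are treated as "the same" when all three addresses agree. */
static int thread_sig_equal(const thread_sig *a, const thread_sig *b)
{
    return a->call_site == b->call_site &&
           a->start_routine == b->start_routine &&
           a->arg == b->arg;
}

/* Hypothetical stand-in for pthread_create() that also fills in the key.
 * noinline keeps __builtin_return_address(0) (a GCC/Clang extension)
 * pointing into the caller rather than further up the stack. */
__attribute__((noinline))
static int audited_pthread_create(pthread_t *tid, const pthread_attr_t *attr,
                                  void *(*start)(void *), void *arg,
                                  thread_sig *sig)
{
    sig->call_site = (uintptr_t)__builtin_return_address(0);
    sig->start_routine = (uintptr_t)start;
    sig->arg = (uintptr_t)arg;
    return pthread_create(tid, attr, start, arg);
}

/* Trivial start routine used only to exercise the wrapper. */
static void *noop_worker(void *arg)
{
    return arg;
}
```

    Raw addresses are only comparable between runs if the program is
    loaded at the same place each time; with address-space layout
    randomization they would presumably have to be recorded as offsets
    from the containing module's load address instead.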

    I can see at least one flaw in the above scheme: the data that the 'arg'
    pointer refers to could change even if the pointer itself didn't. I don't
    see an answer to that.

    It might make sense to use the first argument to pthread_create (the
    address to which the thread id is written, as opposed to the id itself)
    as well, since I can't see it usually making sense to write multiple
    thread ids to the same place.

    These are my ideas; does anyone have better ones? Or is this by chance a
    problem with a documented solution?

    Thanks,
    Arch Stanton

  2. Re: how to best "characterize" a thread

    Arch Stanton wrote:
    > This is an unusual thread question because it's not about thread
    > programming per se but rather thread _auditing_. I'm working on a
    > process auditing tool (think strace but put to a different purpose).
    > Imagine that a threaded program is run twice under control of this
    > auditor and the results are left in two output files "trace1" and
    > "trace2". I need to be able to "match up" the threads between trace1 and
    > trace2 for comparison purposes. And BTW, for now I only want to consider
    > pthreads; if the answer can be adapted to other implementations I can
    > probably handle that myself later on.
    >
    > The obvious answer is to compare thread ids but this breaks down pretty
    > quickly. Some platforms issue random thread ids while others increment
    > from 1, but even on the latter type, due to the basic asynchronous
    > nature of threads, thread #3 in one run might be #4 in another.
    >
    > Right now the best I can think of is a combination of (1) the address
    > from which pthread_create() was called, (2) the address of the
    > start_routine, and (3) the address of the argument to the start_routine.
    > It seems to me that, though it's possible for that same combination to
    > occur multiple times in the same process, even if it did the threads
    > would be logically interchangeable (i.e. they'd be doing the same thing
    > so it would be ok to treat them the same).
    >
    > I can see at least one flaw in the above scheme: the data that the 'arg'
    > pointer refers to could change even if the pointer itself didn't. I don't
    > see an answer to that.
    >
    > It might make sense to use the first argument to pthread_create (the
    > address to which the thread id is written, as opposed to the id itself)
    > as well, since I can't see it usually making sense to write multiple
    > thread ids to the same place.
    >
    > These are my ideas; does anyone have better ones? Or is this by chance a
    > problem with a documented solution?


    I may have misunderstood, so let me restate what I think your
    problem is. You have a multi-threaded program in which various
    threads write data to the same output file, perhaps a log of some
    kind. Running the program a second time, even with the same inputs,
    may produce a different log because "corresponding" threads may be
    scheduled differently in the two executions. Your job is to separate
    the individual threads' log entries from each of the two logs, and to
    match them with the "corresponding" threads' entries from the other log.
    If that's not what you're after, just skip this message ...

    It seems to me that you must be in control of what is written to
    the log, since you're contemplating making a change to it. Instead
    of relying on implementation-specific and out-of-your-control things
    like thread IDs and so on, why not just assign each thread an "ID"
    of your own when you launch it? Just pass your own sequence number
    or string or whatever to each new thread at pthread_create() time,
    and thereafter have each thread tag its own log entries with the ID
    you gave it. That is, your log entries would be tagged "T1" and "T2"
    even if the system's thread IDs in one run were 42 and 4242, and in
    the other run 25 and 99.
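
    As a rough sketch of that idea, assuming the program controls both
    thread creation and logging: each thread receives a small context
    holding a label chosen by the program, and every log line is
    prefixed with it. The names thread_ctx, tagged_log, worker and
    run_workers are hypothetical, not part of any library:

```c
#include <pthread.h>
#include <stdio.h>

/* Hypothetical per-thread context: the label is chosen by the program,
 * not by the threading library. */
typedef struct {
    char  label[16]; /* e.g. "T1", "T2" */
    FILE *log;       /* shared log stream */
} thread_ctx;

static pthread_mutex_t log_lock = PTHREAD_MUTEX_INITIALIZER;

/* Write one log entry prefixed with the caller's own label. */
static void tagged_log(const thread_ctx *ctx, const char *msg)
{
    pthread_mutex_lock(&log_lock);
    fprintf(ctx->log, "%s: %s\n", ctx->label, msg);
    pthread_mutex_unlock(&log_lock);
}

/* Example start routine: every entry it writes carries its label. */
static void *worker(void *p)
{
    const thread_ctx *ctx = p;
    tagged_log(ctx, "started");
    tagged_log(ctx, "finished");
    return NULL;
}

/* Launch n workers labelled T1..Tn in creation order and wait for them.
 * Returns 0 on success, -1 for a bad n, or a pthread_create() error. */
static int run_workers(FILE *log, int n)
{
    enum { MAX_WORKERS = 64 };
    pthread_t tids[MAX_WORKERS];
    thread_ctx ctxs[MAX_WORKERS];
    int started = 0;
    int rc = 0;

    if (n < 0 || n > MAX_WORKERS)
        return -1;

    for (int i = 0; i < n; i++) {
        snprintf(ctxs[i].label, sizeof ctxs[i].label, "T%d", i + 1);
        ctxs[i].log = log;
        rc = pthread_create(&tids[i], NULL, worker, &ctxs[i]);
        if (rc != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return rc;
}
```

    The order of lines in the log can still differ from run to run, but
    the entries tagged "T1" in one log can be matched against the
    entries tagged "T1" in the other, whatever thread IDs the system
    handed out.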

    ... or have I mistaken your intent?

    --
    Eric.Sosman@sun.com

  3. Re: how to best "characterize" a thread

    On Sep 18, 5:00 pm, Arch Stanton wrote:

    > It might make sense to use the first argument to
    > pthread_create (the address to which the thread id is written,
    > as opposed to the id itself) as well, since I can't see it
    > usually making sense to write multiple thread ids to the same
    > place.


    If the threads are detached, there's no reason to keep the
    pthread_t variable around. When starting a detached thread,
    I'll use a local variable for it, which disappears when I return
    from the function. And if I call the function again, there's a
    distinct possibility that it will end up at the same address.
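
    The pattern described above might look roughly like this. It is
    only a sketch: start_detached and detached_worker are hypothetical
    names, and the address is reported through an out-parameter purely
    so that it can be inspected:

```c
#include <pthread.h>
#include <stdint.h>

/* Trivial start routine for the detached thread. */
static void *detached_worker(void *arg)
{
    (void)arg;
    return NULL;
}

/* Start a detached thread using a local pthread_t, and report where
 * that pthread_t lived. Returns 0 or a pthreads error code. */
static int start_detached(uintptr_t *tid_addr)
{
    pthread_t tid; /* exists only for the duration of this call */
    pthread_attr_t attr;
    int rc = pthread_attr_init(&attr);

    if (rc != 0)
        return rc;
    rc = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    if (rc == 0)
        rc = pthread_create(&tid, &attr, detached_worker, NULL);
    pthread_attr_destroy(&attr);

    /* The address that an auditor would see as the first argument. */
    *tid_addr = (uintptr_t)&tid;
    return rc;
}
```

    Calling start_detached() twice from the same place will typically,
    though not necessarily, report the same address both times, so the
    first argument to pthread_create() does not tell the two threads
    apart.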

    --
    James Kanze (GABI Software) email:james.kanze@gmail.com
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
