Multi-CPU ? - Unix

  1. Multi-CPU ?

    Hi All,
    I have a multithreaded application. It runs properly on a
    single-CPU machine, but crashes on a multi-CPU machine. Analysing the
    core did not help either, as each time the core dump happens at a
    different place (the core is generated by signal 11).
    Any idea what may be happening? Documents or links that cover
    the points to consider for a program on a multi-CPU machine
    would be a help.
    I am using RHEL 4 update 2 (kernel-2.6.9-5EL.smp), running the app
    on IBM eServer 336 (Dual Intel Xeon processor).

    Regards,
    Seenu.


  2. Re: Multi-CPU ?

    seenutn@gmail.com wrote:

    > Hi All,
    > I have a multithreaded application. It runs properly on a
    > single-CPU machine, but crashes on a multi-CPU machine. Analysing the
    > core did not help either, as each time the core dump happens at a
    > different place (the core is generated by signal 11).
    > Any idea what may be happening? Documents or links that cover
    > the points to consider for a program on a multi-CPU machine
    > would be a help.
    > I am using RHEL 4 update 2 (kernel-2.6.9-5EL.smp), running the app
    > on IBM eServer 336 (Dual Intel Xeon processor).


    My bet is that it's caused by bad ram.

    --
    mail: teeuu at qwest dot net

  3. Re: Multi-CPU ?

    seenutn@gmail.com wrote:
    > Hi All,
    > I have a multithreaded application. It runs properly on a
    > single-CPU machine, but crashes on a multi-CPU machine. Analysing the
    > core did not help either, as each time the core dump happens at a
    > different place (the core is generated by signal 11).
    > Any idea what may be happening? Documents or links that cover
    > the points to consider for a program on a multi-CPU machine
    > would be a help.


    One thread deleting something another is still using maybe? Or
    modifying something another doesn't expect to be modified?

    It all depends on what you are doing and how thread-safe your
    implementation is.
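
    To make the first possibility concrete, here is a minimal C sketch. It
    assumes POSIX threads (the original post does not say which language or
    thread library is in use), and the names job, sum_worker and
    sum_with_worker are made up for the example. The owner frees the buffer
    only after pthread_join() has returned; freeing it any earlier is the
    kind of bug that can appear to work on one CPU and end in signal 11 on
    two.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical job descriptor shared between the owner and one worker. */
struct job {
    const int *data;   /* borrowed by the worker, owned by the caller */
    size_t     len;
    long       sum;    /* written by the worker, read only after the join */
};

static void *sum_worker(void *arg)
{
    struct job *j = arg;
    long total = 0;

    for (size_t i = 0; i < j->len; i++)
        total += j->data[i];
    j->sum = total;
    return NULL;
}

/* Fills a buffer with 1..n, sums it in a worker thread, and frees the
 * buffer only after the worker has been joined. Returns -1 on failure. */
long sum_with_worker(size_t n)
{
    int *buf = malloc(n * sizeof *buf);
    if (buf == NULL)
        return -1;
    for (size_t i = 0; i < n; i++)
        buf[i] = (int)(i + 1);

    struct job j = { buf, n, 0 };
    pthread_t tid;
    if (pthread_create(&tid, NULL, sum_worker, &j) != 0) {
        free(buf);
        return -1;
    }

    /* Calling free(buf) here, before the join, would be the bug:
     * the worker might still be reading the buffer. */
    int rc = pthread_join(tid, NULL);
    assert(rc == 0);
    (void)rc;

    free(buf);          /* safe: the worker has finished */
    return j.sum;
}
```

    With gcc this would be built with the -pthread flag. Calling
    sum_with_worker(1000) should return 500500, the sum of 1 to 1000.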

    --
    Ian Collins.

  4. Re: Multi-CPU ?

    ["Followup-To:" header set to comp.unix.programmer
    for this really doesn't need to be crossposted.]
    Begin <1140670783.883180.120380@i40g2000cwc.googlegroups.com>
    On 2006-02-23, seenutn@gmail.com wrote:
    > I have a multithreaded application. It runs properly on a
    > single-CPU machine, but crashes on a multi-CPU machine. Analysing the
    > core did not help either, as each time the core dump happens at a
    > different place (the core is generated by signal 11).
    > Any idea what may be happening? Documents or links that cover
    > the points to consider for a program on a multi-CPU machine
    > would be a help.


    Any text on multithreading on multiple CPUs will do. You might have a
    race condition that somehow produces null or meaningless pointers, and no
    lock to protect against it. Using multiple CPUs really is harder than
    merely faking it with one CPU.
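
    As a rough illustration of "a lock to protect against it", the sketch
    below guards a shared counter with a pthread mutex. It assumes POSIX
    threads; counter, add_many and run_counter are hypothetical names, not
    anything from the original application. Without the lock/unlock pair
    the increments from different CPUs can overlap and the final count
    usually comes out short.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_THREADS 16

static long counter;   /* shared state touched by every worker */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

static void *add_many(void *arg)
{
    long iterations = *(const long *)arg;

    for (long i = 0; i < iterations; i++) {
        pthread_mutex_lock(&counter_lock);
        counter++;     /* read-modify-write, serialised by the mutex */
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Starts nthreads workers that each increment the counter `iterations`
 * times and returns the final value, or -1 for a bad thread count. */
long run_counter(int nthreads, long iterations)
{
    pthread_t tids[MAX_THREADS];
    int rc;

    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;

    counter = 0;
    for (int i = 0; i < nthreads; i++) {
        /* `iterations` is only read by the workers and outlives them,
         * because every worker is joined before this function returns. */
        rc = pthread_create(&tids[i], NULL, add_many, &iterations);
        assert(rc == 0);
    }
    for (int i = 0; i < nthreads; i++) {
        rc = pthread_join(tids[i], NULL);
        assert(rc == 0);
    }
    (void)rc;
    return counter;
}
```

    With the mutex in place, run_counter(4, 100000) should return exactly
    400000 on any number of CPUs. Removing the two mutex calls is an easy
    way to watch the race appear on an SMP box.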


    > I am using RHEL 4 update 2 (kernel-2.6.9-5EL.smp), running the app
    > on IBM eServer 336 (Dual Intel Xeon processor).


    IIRC these things usually run with ECC RAM, and keep logs of ECC faults
    in the BIOS. That makes me less quick to assume bad RAM than another
    poster was, although the random crash location does point in that
    direction.

    It won't hurt to check with a memory checker, and it's probably easier
    than trying to pin down multi-cpu race conditions. Note that it will
    need to go through the entire memory at least a few times as no error
    reported is no guarantee there is no error.


    --
    j p d (at) d s b (dot) t u d e l f t (dot) n l .
    This message was originally posted on Usenet in plain text.
    Any other representation, additions, or changes do not have my
    consent and may be a violation of international copyright law.

  5. Re: Multi-CPU ?

    seenutn@gmail.com wrote:
    > Hi All,
    > I have a multithreaded application. It runs properly on a
    > single-CPU machine, but crashes on a multi-CPU machine. Analysing the
    > core did not help either, as each time the core dump happens at a
    > different place (the core is generated by signal 11).
    > Any idea what may be happening? Documents or links that cover
    > the points to consider for a program on a multi-CPU machine
    > would be a help.


    The most likely scenario is that you have bugs in your code.

    Now that you have threads that are actually running simultaneously,
    you're hitting race conditions that you didn't see before.

    Chris

  6. Re: Multi-CPU ?

    >>>>> "seenutn" == seenutn writes:

    seenutn> Hi All, I have a multithreaded application. It runs
    seenutn> properly on a single-CPU machine, but crashes on a
    seenutn> multi-CPU machine. Analysing the core did not help
    seenutn> either, as each time the core dump happens at a different
    seenutn> place (the core is generated by signal 11). Any idea what
    seenutn> may be happening? Documents or links that cover the
    seenutn> points to consider for a program on a multi-CPU
    seenutn> machine would be a help. I am using RHEL 4 update 2
    seenutn> (kernel-2.6.9-5EL.smp), running the app on IBM eServer
    seenutn> 336 (Dual Intel Xeon processor).

    Sounds like problems in your code. First time running your own
    multi-threaded code on a real SMP machine?

    You should read standard university textbooks on OS or parallel
    programming. Keywords include "race condition", "critical section",
    "synchronization", etc. Many of these concurrent-programming problems
    do not show up when running a multi-threaded program in a single-CPU
    system.

    BTW, learn to use Apache Log4j (or the less popular and less flexible
    java.util.logging.*) to print out log messages from your code. It's
    very helpful for tracing what your threads are really doing, and more
    importantly, how they dance together. Very likely, one of your
    threads is stepping on the foot of the other, causing the latter to
    cry! (Try also the Chainsaw GUI for viewing the log messages!)
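
    Log4j and java.util.logging are Java libraries. If the application is
    written in C (an assumption; the original post does not say), the same
    idea can be sketched by hand: one log function that takes a lock, stamps
    each line with a global sequence number and a caller-supplied thread
    label, and writes the line in one go. trace_log is a hypothetical name,
    not an existing library call.

```c
#include <assert.h>
#include <pthread.h>
#include <stdarg.h>
#include <stdio.h>

static pthread_mutex_t log_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long log_seq;   /* global order in which calls were logged */

/* Writes "[seq] who: message" as one uninterrupted line and returns seq.
 * `who` is a label chosen by the calling thread, e.g. "reader-2". */
unsigned long trace_log(FILE *out, const char *who, const char *fmt, ...)
{
    va_list ap;
    unsigned long seq;

    assert(out != NULL && who != NULL && fmt != NULL);

    pthread_mutex_lock(&log_lock);
    seq = ++log_seq;
    fprintf(out, "[%lu] %s: ", seq, who);
    va_start(ap, fmt);
    vfprintf(out, fmt, ap);
    va_end(ap);
    fputc('\n', out);
    fflush(out);                /* keep the line if the process crashes next */
    pthread_mutex_unlock(&log_lock);
    return seq;
}
```

    A worker would call something like
    trace_log(stderr, "worker-1", "freeing buffer %p", (void *)buf).
    One caveat: the lock inside the logger adds synchronisation of its own,
    so heavy logging can change the timing enough to hide the very race you
    are chasing.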



    --
    Lee Sau Dan 李守敦

    E-mail: danlee@informatik.uni-freiburg.de
    Home page: http://www.informatik.uni-freiburg.de/~danlee

  7. Re: Multi-CPU ?

    seenutn@gmail.com wrote:
    > Hi All,
    > I have a multithreaded application. It runs properly on a
    > single-CPU machine, but crashes on a multi-CPU machine. Analysing the
    > core did not help either, as each time the core dump happens at a
    > different place (the core is generated by signal 11).
    > Any idea what may be happening? Documents or links that cover
    > the points to consider for a program on a multi-CPU machine
    > would be a help.
    > I am using RHEL 4 update 2 (kernel-2.6.9-5EL.smp), running the app
    > on IBM eServer 336 (Dual Intel Xeon processor).
    >
    > Regards,
    > Seenu.
    >


    Been there, done that.

    Debugging multi-threaded code is no simple task. A very talented guy I
    used to work with ran his MT code on a dual-processor Linux box with no
    hassle. He then tried it on a quad-processor Sun SPARCstation 20 and
    found it crashed. It turned out to be a real bug that just never seemed
    to occur on single-processor Suns or a dual-processor Linux box.

    I wrote a program that worked fine on numerous UNIX boxes, *all* of
    them multi-processor. These included PCs running Linux, *BSD etc., Suns,
    HPs running HP-UX, Cray and others. It also ran on a single-processor
    DEC Alpha. But it would not run properly on a Sun running Linux. I
    tended to ignore that and blame Linux on SPARC, as multi-threaded Linux
    on SPARC is not well tested.

    But then I found it sometimes misbehaved on a quad-processor IBM
    RS/6000. Then I became suspicious of my code, as I trusted AIX somewhat
    more than Linux on a SPARC.

    Sure enough, I found a bug in my code.

    There is an open-source library that you can link in that is supposed to
    help find such bugs. I can't say I was over-impressed with it, but it
    might be worth a try.

    Multi-threaded code is just hard to write properly. Finding bugs is hard.

    BTW, I have a couple of book recommendations.

    1) This is one *NOT* to buy. It is the second worst technical book I
    have ever seen. I am afraid to say it is published by Sun.

    "Multithreaded programming with pthreads"

    by B. Lewis and Daniel J Berg, published by Sun Microsystems.

    IMHO a total waste of time. It tries to cover far too much, including
    threads on OS/2, yet ignores LWPs on Solaris. That seems odd for a book
    published by Sun.

    2) "Foundations of Multithreaded, Parallel, and Distributed Programming"

    by Gregory R. Andrews, Addison-Wesley. Excellent book.

    I'm not sure if the author is the same 'Greg Andrews' who used to post
    to the Sun-related newsgroups, but I suspect not. Either way, the book
    is well worth the money. Just forget the Sun book.


    --
    Dave K

    Minefield Consultant and Solitaire Expert (MCSE).

    Please note my email address changes periodically to avoid spam.
    It is always of the form: month-year@domain. Hitting reply will work
    for a couple of months only. Later set it manually.

  8. Re: Multi-CPU ?

    seenutn@gmail.com wrote:
    > Hi All,
    > I have a multithreaded application. It runs properly on a
    > single-CPU machine, but crashes on a multi-CPU machine. Analysing the
    > core did not help either, as each time the core dump happens at a
    > different place (the core is generated by signal 11).
    > Any idea what may be happening? Documents or links that cover
    > the points to consider for a program on a multi-CPU machine
    > would be a help.
    > I am using RHEL 4 update 2 (kernel-2.6.9-5EL.smp), running the app
    > on IBM eServer 336 (Dual Intel Xeon processor).



    I would wager this is a bug in your code.

    Finding such bugs is very difficult so you need to:

    a) Instrument your code so that debug builds uncover broken assumptions
    b) Keep to a small number of well tested MT helper classes that "just
    work". (not necessarily simple classes BTW)
    c) Monte-carlo unit tests (stress tests).
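
    Points (a) to (c) can be sketched together in C with pthreads. Everything
    here is a made-up illustration, not the poster's actual classes: a small
    mutex-guarded stack stands in for a "well tested MT helper", an assert()
    inside it checks an invariant in debug builds, and stress_stack() hammers
    it from several threads with random operations, then checks that pushes
    minus pops equals what is left. The random numbers come from a simple
    per-thread LCG, chosen only to keep the example self-contained.

```c
#include <assert.h>
#include <pthread.h>

#define CAP      64
#define NTHREADS 4

/* Hypothetical MT helper: a bounded stack guarded by one mutex. */
struct stack {
    pthread_mutex_t lock;
    int items[CAP];
    int count;
};

static int stack_push(struct stack *s, int v)
{
    int ok = 0;
    pthread_mutex_lock(&s->lock);
    assert(s->count >= 0 && s->count <= CAP);   /* (a) invariant check */
    if (s->count < CAP) {
        s->items[s->count++] = v;
        ok = 1;
    }
    pthread_mutex_unlock(&s->lock);
    return ok;
}

static int stack_pop(struct stack *s, int *v)
{
    int ok = 0;
    pthread_mutex_lock(&s->lock);
    assert(s->count >= 0 && s->count <= CAP);   /* (a) invariant check */
    if (s->count > 0) {
        *v = s->items[--s->count];
        ok = 1;
    }
    pthread_mutex_unlock(&s->lock);
    return ok;
}

struct worker {
    struct stack *s;
    unsigned state;             /* private random state, no sharing */
    long ops, pushed, popped;
};

/* Small linear congruential generator; adequate for a stress test only. */
static unsigned next_rand(unsigned *state)
{
    *state = *state * 1103515245u + 12345u;
    return (*state >> 16) & 0x7fffu;
}

static void *random_ops(void *arg)
{
    struct worker *w = arg;
    int v;

    for (long i = 0; i < w->ops; i++) {
        if (next_rand(&w->state) % 2)
            w->pushed += stack_push(w->s, (int)i);
        else
            w->popped += stack_pop(w->s, &v);
    }
    return NULL;
}

/* (c) Randomised stress test: returns 1 if successful pushes minus
 * successful pops equals the number of items left on the stack. */
int stress_stack(unsigned seed, long ops_per_thread)
{
    struct stack s;
    struct worker w[NTHREADS];
    pthread_t tids[NTHREADS];
    long pushed = 0, popped = 0;
    int rc;

    s.count = 0;
    rc = pthread_mutex_init(&s.lock, NULL);
    assert(rc == 0);
    for (int i = 0; i < NTHREADS; i++) {
        w[i] = (struct worker){ &s, seed + (unsigned)i, ops_per_thread, 0, 0 };
        rc = pthread_create(&tids[i], NULL, random_ops, &w[i]);
        assert(rc == 0);
    }
    for (int i = 0; i < NTHREADS; i++) {
        rc = pthread_join(tids[i], NULL);
        assert(rc == 0);
        pushed += w[i].pushed;
        popped += w[i].popped;
    }
    pthread_mutex_destroy(&s.lock);
    (void)rc;
    return pushed - popped == s.count;
}
```

    A test run would call stress_stack() with a few different seeds and a
    large operation count on the multi-CPU machine, for example
    stress_stack(12345u, 200000), and expect 1 every time. A passing run
    proves nothing by itself, but a failing one gives a reproducible seed to
    start from.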

    I recently deployed a heavily multithreaded system which followed these
    principles and no MT related bugs have been found yet (in the wild).

    The monte-carlo tests really did work to ferret out the most unthinkable
    issues.

    MT code puts severe limitations on what your design looks like. I
    remember a number of engineers complaining about the design
    constraints because of the "strange" way the MT classes worked. I now
    hear amazement at the stability of the application.


  9. Re: Multi-CPU ?

    >
    > Sure enough, I found a bug in my code.
    >
    > There is an open-source library that you can link in that is supposed to
    > help find such bugs. I can't say I was over-impressed with it, but it
    > might be worth a try.


    Hi Dave,

    Any idea which library you used? I thought I would give it a
    try.

    Regards,
    Seenu.


  10. Re: Multi-CPU ?

    seenutn@gmail.com wrote:
    > Hi All,
    > I have a multithreaded application. It runs properly on a
    > single-CPU machine, but crashes on a multi-CPU machine. Analysing the
    > core did not help either, as each time the core dump happens at a
    > different place (the core is generated by signal 11).
    > Any idea what may be happening? Documents or links that cover
    > the points to consider for a program on a multi-CPU machine
    > would be a help.
    > I am using RHEL 4 update 2 (kernel-2.6.9-5EL.smp), running the app
    > on IBM eServer 336 (Dual Intel Xeon processor).


    The article
    http://docs.sun.com/app/docs/doc/816...delines&a=view
    (Working With Multiprocessors section) helps a little bit.

    Regards,
    Seenu.


  11. Re: Multi-CPU ?

    seenutn@gmail.com wrote:
    >>Sure enough, I found a bug in my code.
    >>
    >>There is an open-source library that you can link in that is supposed to
    >>help find such bugs. I can't say I was over-impressed with it, but it
    >>might be worth a try.

    >
    >
    > Hi Dave,
    >
    > Any idea which library you used? I thought I would give it a
    > try.
    >
    > Regards,
    > Seenu.
    >

    No, I can't recall. Thinking about it more, it might have been a patch
    to gcc. There are some thread-specific newsgroups - that would probably
    be the place to ask.

    --
    Dave K MCSE.

    MCSE = Minefield Consultant and Solitaire Expert.

    Please note my email address changes periodically to avoid spam.
    It is always of the form: month-year@domain. Hitting reply will work
    for a couple of months only. Later set it manually.
