Unexplained Hang During Boot - Embedded


Thread: Unexplained Hang During Boot

  1. Unexplained Hang During Boot

    I am experiencing a very bizarre problem with vxWorks and I am hoping
    that someone might be able to offer some suggestions on where to start
    looking to determine the root of the problem.

    VxWorks is being used on a Synergy Microsystems VME SBC which is PPC
    based. The problem seems to arise at random times after rebuilding the
    OS image. For instance, by commenting out a single 'printf' statement
    such as 'printf("Message Received\n");' in an application-level piece of
    code that is not even invoked, and rebuilding the image, the image can
    hang while booting (early in the boot procedure). Uncomment this
    'printf' statement, rebuild the image, and the OS will boot without
    error. Note that this routine is not called at any time during the boot
    procedure so the code containing that printf is never even executed.

    This problem has been experienced by multiple developers on different
    modules. I am not sure if this is a hardware, or a software type of
    problem. Can anyone think of any reason why something as non-intrusive
    as commenting out a printf statement, in a function that is never even
    invoked, would cause the OS to hang during boot?

    The printf statement is only adding a handful of bytes to the resultant
    image and larger images than the ones that fail have been booted
    successfully.

    Similar hangs have been produced by changing array sizes in uncalled
    routines, etc. (e.g., add a few more bytes to an array in an uncalled
    function and the image hangs during boot; add a few more bytes and the
    image loads fine).


  2. Re: Unexplained Hang During Boot

    On 28 Jun 2006, eon_blue_80@verizon.net wrote:
    > I am experiencing a very bizarre problem with vxWorks and I am
    > hoping that someone might be able to offer some suggestions on where
    > to start looking to determine the root of the problem.


    [snip]

    > The printf statement is only adding a handful of bytes to the
    > resultant image and larger images than the ones that fail have been
    > booted successfully.


    This sounds like a cache problem. The "printf" itself is unrelated to
    the hang. It just changes the image size at the "right" place. You could
    add a ".bytes 7" or something in the code section and the same thing
    would result.

    At some point in the boot sequence, there may be an alias between data
    and code cache. It could be when the MMU is turned on. The address
    space will change and code must often jump in a very specific
    sequence. It may be a conflict with a device. For instance, an "eieio"
    instruction may be necessary in some cases, but due to code section
    alignment, the code executes with different timing and the "eieio"
    becomes necessary or unnecessary depending on the build.

    It is very good that you try to hunt this down. I've known several
    "senior" people who have let this type of problem go on forever.

    You can toggle an LED, watch a general-purpose I/O pin with a scope, or
    use some polled console output to provide checkpoints in the boot
    sequence to see where the hang occurs.
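
    A minimal sketch of the checkpoint idea, assuming the board exposes a
    memory-mapped 8-bit debug register (an LED latch or a GPIO data
    register). The function name, the checkpoint codes, and the register
    pointer are hypothetical and not taken from any BSP:

```c
#include <stdint.h>

/* Hypothetical checkpoint codes, one per boot stage of interest. */
enum {
    CHECKPOINT_RESET  = 0x01,
    CHECKPOINT_RAM_OK = 0x02,
    CHECKPOINT_CACHE  = 0x03,
    CHECKPOINT_MMU    = 0x04
};

/*
 * Write a checkpoint code to a memory-mapped debug register.
 * 'reg' is assumed to point at an LED latch or GPIO data register;
 * the volatile qualifier keeps the compiler from dropping the store.
 */
void boot_checkpoint(volatile uint8_t *reg, uint8_t code)
{
    *reg = code;
}
```

    Calling boot_checkpoint() with the real register address at each
    stage leaves the code of the last stage reached visible on the LEDs
    or the scope after the hang.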

    The important point is that the "printf" has nothing to do with the
    problem besides making the code move around. You can verify this by
    inserting different dummy routines with different lengths (a cache
    line is typically 32 or 64 bytes). Observing a map file of the full
    image and knowing the location of these bytes can be helpful. For
    instance, if the code following them is an Ethernet driver, that may
    be helpful to know.
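
    When comparing map files from a booting and a hanging build, it can
    help to translate symbol addresses into cache-line positions. A
    host-side sketch, assuming a power-of-two line size; the function
    names are made up for illustration:

```c
#include <stdint.h>

/* Offset of an address within its cache line (line_size must be a power of two). */
uint32_t cache_line_offset(uint32_t addr, uint32_t line_size)
{
    return addr & (line_size - 1u);
}

/* Returns 1 if [addr, addr + len) crosses a cache-line boundary, else 0. */
int straddles_cache_line(uint32_t addr, uint32_t len, uint32_t line_size)
{
    if (len == 0u) {
        return 0;
    }
    return (addr / line_size) != ((addr + len - 1u) / line_size);
}
```

    For example, with 32-byte lines a 16-byte object at address 0x1018
    straddles two lines, while the same object at 0x1000 does not.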

    It could also be reading of garbage strings, code, or constant data. I
    have also seen one section of code round the size of an MMU-protected
    region while another reads right up to the last byte. Sometimes this
    rounding is wrong and a "bus error" happens because the memory is not
    sized correctly.

    hth,
    Bill Pringlemeir.

    --
    You have the right to remain silent -- so shut up!

    vxWorks FAQ, "http://www.xs4all.nl/~borkhuis/vxworks/vxworks.html"

  3. Re: Unexplained Hang During Boot

    eon_blue_80@verizon.net wrote:

    > VxWorks is being used on a Synergy Microsystems VME SBC which is PPC
    > based. The problem seems to arise at random times after rebuilding the
    > OS image. For instance, by commenting out a single 'printf' statement
    > such as "printf("Message Received\n"); in an application level piece of
    > code that is not even invoked; and rebuilding the image, the image can
    > hang while booting (early in the boot procedure). Uncomment this
    > 'printf' statement, rebuild the image, and the OS will boot without
    > error. Note that this routine is not called at any time during the boot
    > procedure so the code containing that printf is never even executed.
    >
    > This problem has been experienced by multiple developers on different
    > modules. I am not sure if this is a hardware, or a software type of
    > problem. Can anyone think of any reason why something as non-intrusive
    > as commenting out a printf statement, in a function that is never even
    > invoked, would cause the OS to hang during boot?
    >
    > The printf statement is only adding a handful of bytes to the resultant
    > image and larger images than the ones that fail have been booted
    > successfully.
    >
    > Similar hangs have been produced by changing array sizes in uncalled
    > routines, etc., (i.e., add a few more bytes to an array in an uncalled
    > function and the images hangs during boot, add a few more bytes and the
    > image loads fine).
    >


    Another possibility is that errant code is corrupting memory during the
    boot process. The most common case is the "wild pointer", where an
    uninitialized pointer is used to write data. Other possibilities would be
    over-running the stack reserved area or using pointers to buffers that
    have been returned to the buffer pool and re-used. I have also seen
    incorrect function prototypes cause this type of problem. If you are
    using vector tables in RAM, walking on them will cause this type of
    problem too.
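
    One common way to catch an overrun of a reserved area is to fill a
    guard region with a known pattern and check it later. A minimal
    sketch; the pattern value and function names are assumptions made for
    this example:

```c
#include <stddef.h>
#include <stdint.h>

#define GUARD_PATTERN 0xEEu  /* arbitrary fill value chosen for this sketch */

/* Fill a guard region (e.g., the far end of a stack area) with a known pattern. */
void guard_fill(volatile uint8_t *guard, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        guard[i] = GUARD_PATTERN;
    }
}

/* Return the number of guard bytes that no longer hold the pattern (0 means intact). */
size_t guard_damage(const volatile uint8_t *guard, size_t len)
{
    size_t damaged = 0;
    for (size_t i = 0; i < len; i++) {
        if (guard[i] != GUARD_PATTERN) {
            damaged++;
        }
    }
    return damaged;
}
```

    Filling the guard early in boot and checking it at later stages shows
    whether something has written past the area it was given.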

    The way I would attempt to solve this problem is with a logic analyzer.
    Start out by finding where the code hangs. Then see if the instruction
    sequence to get there took any unexplainable jumps. Check whether the
    values at the departure point of the unexplainable sequence match the
    expected values for that address. If they don't, use writes to those
    locations to trigger the logic analyzer and you should be able to
    locate the errant code. The departure from expected execution could
    also be caused by uninitialized or corrupted vectors in the vector table.

    I am not familiar with the particular VME card you mentioned, but memory
    management hardware could protect you from a number of the things I
    described. Because it is a boot sequence problem, memory management
    hardware may not be operational at this point.

    Another place to look would be the linker command file. Are all of the
    segments large enough and in non-overlapping regions of memory? The
    logic analyzer approach would lead you to this type of problem, but it
    could be a painful path that could be avoided by careful study.
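
    The overlap check can be done mechanically once the start address and
    size of each segment have been copied out of the linker command file
    or the map file. A host-side sketch with a hypothetical region type:

```c
#include <stdint.h>

/* A memory region as transcribed from a linker command file or map file. */
typedef struct {
    uint32_t start;  /* first address of the region */
    uint32_t size;   /* length in bytes */
} region_t;

/* Returns 1 if the two regions share at least one address, else 0. */
int regions_overlap(region_t a, region_t b)
{
    /* 64-bit ends (one past the last byte) avoid wrap-around at 4 GB. */
    uint64_t a_end = (uint64_t)a.start + a.size;
    uint64_t b_end = (uint64_t)b.start + b.size;

    return a.size != 0u && b.size != 0u &&
           a.start < b_end && b.start < a_end;
}
```

    Regions that merely touch, where one ends exactly where the next
    begins, are reported as non-overlapping.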

    Good Luck,
    Bob



  4. Re: Unexplained Hang During Boot

    > eon_blue_80@verizon.net wrote:

    > Another possibility is that errant code is corrupting memory during
    > the boot process.


    This is *unlikely*, as the OP noted that adding unexecuted code would
    cause the problem. If the code is directly corrupting memory, this
    would be unlikely to introduce the problem, especially if the added
    code makes no allocations and no writes to memory. If simply
    turning the cache on or off changes whether the crash occurs, I find
    it extremely unlikely that it is a memory corruption.

    So there is a quick way to rule this out. Disable/enable the cache
    with a crashing image. Often you can arrange the code so that the
    size is the same, just a constant has changed to disable/enable the
    cache.
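
    A sketch of how the same-size arrangement might look, assuming the
    cache state is selected by a data constant. The enable/disable
    routines below are hypothetical stand-ins for the platform's real
    cache calls, and both mode values are deliberately nonzero so that
    the variable stays in initialized data for either setting:

```c
/* Hypothetical stand-ins for the platform's cache control routines. */
static int cache_is_on = 0;
static void hypothetical_cache_enable(void)  { cache_is_on = 1; }
static void hypothetical_cache_disable(void) { cache_is_on = 0; }

enum { CACHE_MODE_ON = 1, CACHE_MODE_OFF = 2 };

/*
 * Because the variable is volatile, the compiler must read it at run
 * time and keep both branches, so editing the initializer changes one
 * word of data rather than the size or layout of the code.
 */
volatile int boot_cache_mode = CACHE_MODE_ON;

void boot_cache_setup(void)
{
    if (boot_cache_mode == CACHE_MODE_ON) {
        hypothetical_cache_enable();
    } else {
        hypothetical_cache_disable();
    }
}
```

    Comparing the map files of the two builds is a quick way to confirm
    that only the constant changed.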

    fwiw,
    Bill Pringlemeir.

    --
    Anyone who trades liberty for security deserves neither liberty nor
    security - Benjamin Franklin

    vxWorks FAQ, "http://www.xs4all.nl/~borkhuis/vxworks/vxworks.html"

  5. Re: Unexplained Hang During Boot

    eon_blue_80@verizon.net wrote:

    > I am experiencing a very bizarre problem with vxWorks and I am hoping
    > that someone might be able to offer some suggestions on where to start
    > looking to determine the root of the problem.
    >
    > VxWorks is being used on a Synergy Microsystems VME SBC which is PPC
    > based. The problem seems to arise at random times after rebuilding the
    > OS image. For instance, by commenting out a single 'printf' statement
    > such as "printf("Message Received\n"); in an application level piece of
    > code that is not even invoked; and rebuilding the image, the image can
    > hang while booting (early in the boot procedure). Uncomment this
    > 'printf' statement, rebuild the image, and the OS will boot without
    > error. Note that this routine is not called at any time during the boot
    > procedure so the code containing that printf is never even executed.


    Reading your post, it's not clear how many different
    physical units you've tried this on. If the answer
    is one, the problem could be a flash memory byte
    with a bad bit.
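
    If the image is stored in flash and can be read back, a byte-for-byte
    comparison against the file that was programmed would show such a
    fault. A minimal sketch; the function name is made up, and how the
    flash contents are obtained is left to the board's tools:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Compare the image as read back from flash against the file that was
 * programmed. Returns the offset of the first differing byte, or -1 if
 * the two buffers match over 'len' bytes.
 */
long first_mismatch(const uint8_t *flash, const uint8_t *reference, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (flash[i] != reference[i]) {
            return (long)i;
        }
    }
    return -1;
}
```

    A mismatch that appears at the same offset on one unit but not on
    others would point at that unit's flash rather than at the build.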


