11/40 misbehaviour - DEC



  1. 11/40 misbehaviour

    I'm working on an 11/40 running RSX which is somewhat flaky: It keeps getting
    T04 halts, which the manual tells me indicate that the trap bit (4) of the PSW
    is getting set, but that RSX has no handler for this. It happens fairly often,
    but not predictably.

    I don't have a set of 11/40 diagnostics, only 11/34--some will run, most will
    not. Where can I find RX01 disk images with 11/40 diags? Or does anyone have
    a feel for what might be causing the T04 halts?

    --
    Rich Alderson | /"\ ASCII ribbon |
    news@alderson.users.panix.com | \ / campaign against |
    "You get what anybody gets. You get a lifetime." | x HTML mail and |
    --Death, of the Endless | / \ postings |

  2. Re: 11/40 misbehaviour

    On 4 Aug 2004, Rich Alderson wrote:

    > I'm working on an 11/40 running RSX which is somewhat flaky: It keeps getting
    > T04 halts, which the manual tells me indicate that the trap bit (4) of the PSW
    > is getting set, but that RSX has no handler for this. It happens fairly often,
    > but not predictably.
    >
    > I don't have a set of 11/40 diagnostics, only 11/34--some will run, most will
    > not. Where can I find RX01 disk images with 11/40 diags? Or does anyone have
    > a feel for what might be causing the T04 halts?


    Are you sure it is PSW bit 4 (the T bit, which causes a trace trap), and not a
    trap through the vector at 4? Debuggers use the trace trap in conjunction
    with the RTT instruction to single-step: they set the T bit in the saved
    PSW, then RTT. One instruction executes and then it traps again. The debugger
    has to set up the handler for the trace trap (at the vector at 14) and
    proceed accordingly. Getting spurious trace traps is very weird. Either
    you have bad memory in the kernel stack area (setting bit 4 in the PSW
    saved by various random interrupts), or the PS hardware itself is
    flaky and is reading or writing bit 4 incorrectly.
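A minimal Python sketch of the PSW layout discussed above, assuming the 11/40's kernel/user-only memory management (the helper `decode_psw` and the example values are illustrative, not from any DEC tool). It shows how one bit picked up in a PSW saved on the kernel stack would arm the T bit once RTT restores that word.

```python
# PDP-11/40 processor status word (PSW) fields, by bit position:
#   15-14 current mode, 13-12 previous mode, 7-5 priority,
#   4 T (trace trap), 3 N, 2 Z, 1 V, 0 C.
# Mode names assume the 11/40's kernel/user-only memory management.
MODES = {0b00: "kernel", 0b11: "user"}

T_BIT = 0o000020  # bit 4


def decode_psw(psw: int) -> dict:
    """Split a 16-bit PSW into its named fields."""
    return {
        "current_mode": MODES.get((psw >> 14) & 0b11, "reserved"),
        "previous_mode": MODES.get((psw >> 12) & 0b11, "reserved"),
        "priority": (psw >> 5) & 0b111,
        "T": bool(psw & T_BIT),
        "N": bool(psw & 0o10),
        "Z": bool(psw & 0o4),
        "V": bool(psw & 0o2),
        "C": bool(psw & 0o1),
    }


# A PSW saved on the kernel stack by an interrupt at priority 7 ...
saved = 0o000340
# ... and the same word read back from a bad stack location with bit 4 stuck on.
corrupted = saved | T_BIT

print(decode_psw(saved)["T"], decode_psw(corrupted)["T"])  # False True
```

If RTT restored `corrupted`, the CPU would take a trace trap through 14 after the next instruction, which is the "bad memory in the kernel stack" scenario.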

    Trap through 4 could be anything... T4 is Timeout and other errors,
    according to my PDP 11/40 Processor Handbook. A Unibus timeout could
    be caused by incorrect device configuration, but I think the device
    setup code at boot time should try to protect you against this. (It
    does on RSTS, by disabling non-existent devices, but I'm not nearly
    as familiar with RSX.) Memory could be going out to lunch, and failing
    to respond to reads or writes. Or the bus cable(s), jumper(s) or
    terminators could be bad, and randomly changing an address bit causing
    a memory or device address to be wrong.

    Have you tried all the standard things, i.e. reseating boards, checking
    PS voltages, etc.?

    If that doesn't help, try paring down the system to the minimal
    working configuration (CPU, console terminal, enough memory to run,
    and the system disk) and then add things back one at a time until
    the problem recurs.

    I haven't seen PDP-11/40 diags in years. We used to have a tape
    (9-track, 800 bpi) of them, but it is long gone.

    HTH.

    --
    John Santos
    Evans Griffiths & Hart, Inc.
    781-861-0670 ext 539


  3. Re: 11/40 misbehaviour

    In article <1040804230752.7326D-100000@Ives.egh.com>,
    John Santos wrote:
    >On 4 Aug 2004, Rich Alderson wrote:



    >I haven't seen PDP-11/40 diags in years. We used to have a tape
    >(9-track, 800 bpi) of them, but it is long gone.


    Would the PDP-10 KL frontend diags work?

    /BAH

    Subtract a hundred and four for e-mail.

  4. Re: 11/40 misbehaviour

    We have the diags in RX-01 format.
    But I agree with the previous email: you need to shorten the bus,
    check the P/Ss, and swap the first two banks of memory cards by changing the
    jumpers/switches to see if the symptom changes.

    Plus, posted on the web somewhere are a few hand-toggled programs that
    will check memory, etc.

    Bob B.


  5. Re: 11/40 misbehaviour

    RHB wrote:

    > But I agree with the previous email you need to shorten the bus,
    > ck P/S's and swap the first two banks of memory cards by changing the
    > jumpers/switches and see if the symptom changes.
    >


    Moving memory banks is always a good first test, but beware that most
    decently configured systems interleave the banks, so a straight
    memory address swap might just move the error from even to odd (16-bit)
    words.
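John's point can be made concrete with a small Python model. It is a sketch only: the two-board, 16K-word layout and the helper names `board_for` and `words_on` are assumptions for illustration, not taken from the MM11 manuals. With two-way interleave a failing board answers every other word, so swapping the pair moves the errors between odd- and even-numbered words instead of into a different address range.

```python
# Hypothetical model: two memory boards of 16K words each
# (32 KB = 0o100000 bytes per board), byte addresses starting at 0.
BOARD_BYTES = 0o100000


def board_for(addr: int, interleaved: bool, swapped: bool = False) -> int:
    """Return which board (0 or 1) answers a word reference at byte address addr."""
    if interleaved:
        # Two-way word interleave: consecutive words alternate between boards,
        # so bit 1 of the byte address picks the board.
        board = (addr >> 1) & 1
    else:
        # Contiguous: the first 16K words are on board 0, the next 16K on board 1.
        board = addr // BOARD_BYTES
    # Swapping the boards' jumper settings exchanges their roles.
    return board ^ 1 if swapped else board


def words_on(board: int, interleaved: bool, swapped: bool = False, limit: int = 0o20) -> list:
    """Byte addresses of the words below limit that the given board answers."""
    return [a for a in range(0, limit, 2) if board_for(a, interleaved, swapped) == board]


# Suppose board 1 is the flaky one. Interleaved, its faults show up in the
# odd-numbered words; after a swap they move to the even-numbered ones.
print([f"{a:o}" for a in words_on(1, interleaved=True)])                # ['2', '6', '12', '16']
print([f"{a:o}" for a in words_on(1, interleaved=True, swapped=True)])  # ['0', '4', '10', '14']
```

Without interleave, the same swap would instead move the faults from the upper 16K words to the lower 16K words.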

    > Plus posted on the web somewhere are a few hand toggle in programs that
    > will check memory, etc.
    >

    Have a look at http://www.psych.usyd.edu.au/pdp-11/hints.html,
    especially the trap catcher.

  6. Re: 11/40 misbehaviour

    John Holden writes:
    > Moving memory banks is always a good first test, but beware that most
    > decently configured systems would interleave the banks, so a straight
    > memory address swap might just move the error from even to odd words
    > (16 bits)


    Memory interleave on an 11/40? How do you configure that? I've only
    seen them use normal Unibus memory with each memory unit responding to
    all addresses in a contiguous range.


  7. Re: 11/40 misbehaviour

    On the 11/40 you interleave with jumpers on the memory board
    controllers.

    Using the MM11-U type memory, two contiguously addressed 16K banks
    yield 32K by properly jumpering the M8293 timing board:

    W9 is in and W10 is cut for one of the 16K pair,
    and W9 is cut and W10 is in for the other 16K pair.
    Both interleaved memories should be cut for the same starting
    address.

    To non-interleave:

    W1 is in; W2, W8, W9 and W10 are cut.

    Using the MM11-L type memory, two contiguously addressed 8K banks
    yield 16K by properly jumpering the G110 control board using jumpers
    W7 and W8:

    If W7 and W8 are configured in an “X” pattern, they
    are interleaved.

    If W7 and W8 run straight across to their own posts, they are
    non-interleaved.

    That info is from the book, but as I recall it sounds correct.

  9. Re: 11/40 misbehaviour

    My last post had the text garbled in one of the sentences.

    It should read:
    If W7 and W8 are configured in an X pattern, they
    are interleaved.

  10. Re: 11/40 misbehaviour

    jmfbahciv@aol.com writes:

    > In article <1040804230752.7326D-100000@Ives.egh.com>,
    > John Santos wrote:
    > >On 4 Aug 2004, Rich Alderson wrote:

    >


    >> I haven't seen PDP-11/40 diags in years. We used to have a tape
    >> (9-track, 800 bpi) of them, but it is long gone.


    > Would the PDP-10 KL frontend diags work?


    No, this is clearly a problem with the 11/40 itself; the only 11-based-11 diags
    on the KLAD pack are XXDP, XTECO, and 3 RP04 exercisers (ZRJA, ZRJB, and ZRJC).
    Everything else that runs on the 11/40 exercises the KL.

    --
    Rich Alderson | /"\ ASCII ribbon |
    news@alderson.users.panix.com | \ / campaign against |
    "You get what anybody gets. You get a lifetime." | x HTML mail and |
    --Death, of the Endless | / \ postings |

  11. Re: 11/40 misbehaviour

    John Santos writes:

    > Are you sure it is a PSW bit 4 (T bit, indicates trace trap), and not a
    > trap through the vector at 4?


    Yes, I was quite sure. I was, however, confusing two occurrences. Error code
    TBT is "T-bit trap"--and I did get it on the other box, quite often.

    > Trap through 4 could be anything... T4 is Timeout and other errors,
    > according to my PDP 11/40 Processor Handbook. A Unibus timeout could
    > be caused by incorrect device configuration, but I think the device
    > setup code at boot time should try to protect you against this. (It
    > does on RSTS, by disabling non-existent devices, but I'm not nearly
    > as familiar with RSX.) Memory could be going out to lunch, and failing
    > to respond to reads or writes. Or the bus cable(s), jumper(s) or
    > terminators could be bad, and randomly changing an address bit causing
    > a memory or device address to be wrong.


    Error code T04 is precisely this, bus timeouts or odd-address word references.
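As a rough illustration of those two causes, here is a Python sketch of when a reference would trap through 4. The address map is a hypothetical one for a 28KW machine running with 16-bit addresses (only memory and the console registers at 177560-177566 are modeled), and `trap4_cause` is an illustrative name; other trap-4 causes are deliberately left out.

```python
# Hypothetical address map for a 28KW 11/40 running with 16-bit addresses:
# RAM answers 000000-157776; in the I/O page only the console DL11
# registers (177560-177566) are modeled. Other devices are left out.
PRESENT = [
    (0o000000, 0o160000),  # 28K words of memory
    (0o177560, 0o177570),  # console terminal registers
]


def trap4_cause(addr: int, word: bool = True, present=PRESENT):
    """Return why a reference to addr would trap through 4, or None.

    Models only the two causes named above: an odd-address word reference
    and a bus timeout (no device answers). Other trap-4 causes are ignored.
    """
    if word and addr & 1:
        return "odd address"
    if not any(lo <= addr < hi for lo, hi in present):
        return "bus timeout"
    return None


print(trap4_cause(0o001001))              # odd address
print(trap4_cause(0o001001, word=False))  # None (byte references may be odd)
print(trap4_cause(0o160000))              # bus timeout
```

A flaky address line fits this picture: flipping one bit of an otherwise valid address can land the reference in a hole in the map, which shows up as a timeout.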

    > Have you tried all the standard things, i.e. reseating boards, checking
    > PS voltages, etc.?


    Yes.

    > If that doesn't help, try paring down the system to the minimal
    > working configuration (CPU, console terminal, enough memory to run,
    > and the system disk) and then add things back one at a time until
    > the problem recurs.


    This is a minimal system: 28KW memory, console, RH11, RX01.

    > I haven't seen PDP-11/40 diags in years. We used to have a tape
    > (9-track, 800 bpi) of them, but it is long gone.


    Just on general principles, and not my own need, that's too bad.

    --
    Rich Alderson | /"\ ASCII ribbon |
    news@alderson.users.panix.com | \ / campaign against |
    "You get what anybody gets. You get a lifetime." | x HTML mail and |
    --Death, of the Endless | / \ postings |

  12. Re: 11/40 misbehaviour

    In article ,
    Rich Alderson wrote:
    >jmfbahciv@aol.com writes:
    >
    >> In article <1040804230752.7326D-100000@Ives.egh.com>,
    >> John Santos wrote:
    >> >On 4 Aug 2004, Rich Alderson wrote:

    >>

    >
    >>> I haven't seen PDP-11/40 diags in years. We used to have a tape
    >>> (9-track, 800 bpi) of them, but it is long gone.

    >
    >> Would the PDP-10 KL frontend diags work?

    >
    >No, this is clearly a problem with the 11/40 itself; the only 11-based-11 diags
    >on the KLAD pack are XXDP, XTECO, and 3 RP04 exercisers (ZRJA, ZRJB, and ZRJC).
    >Everything else that runs on the 11/40 exercises the KL.
    >

    Then what did Field Service use to check out the 11s on the
    KLs? PDP-11s were good but not that good :-).

    /BAH

    Subtract a hundred and four for e-mail.

  13. Re: 11/40 misbehaviour

    Do you have a scope to look at levels on the Unibus?

    Does the system run but die intermittently, or is it flat-out dead?

  14. Re: 11/40 misbehaviour

    dcs8506@bellsouth.net (RHB) writes:

    > Do you have a scope to look at levels on the unibus ?


    There is a scope available, yes. Please teach me to fish: What am I looking
    for?

    > Does the system run but die intermittently or is it flat out dead ?


    Runs intermittently. Was running for hours at a time, now dies in under 2
    hours.

    --
    Rich Alderson | /"\ ASCII ribbon |
    news@alderson.users.panix.com | \ / campaign against |
    "You get what anybody gets. You get a lifetime." | x HTML mail and |
    --Death, of the Endless | / \ postings |

  15. Re: 11/40 misbehaviour

    Rich Alderson wrote in message news:...
    > dcs8506@bellsouth.net (RHB) writes:
    >
    > > Do you have a scope to look at levels on the unibus ?

    >
    > There is a scope available, yes. Please teach me to fish: What am I looking
    > for?
    >
    > > Does the system run but die intermittently or is it flat out dead ?

    >
    > Runs intermittently. Was running for hours at a time, now dies in under 2
    > hours.


    The old DEC field service procedure on tough problems usually
    entailed:
    1. Check all the fans: above each P/S (H7444, H745), the small one
    above the 54-9728 regulator, plus those above and below the boards.
    Make sure they are running.

    2. Check the power supply voltages at the CPU, memory and device
    backplanes, not at each regulator. I believe pin A2 was +5 VDC and
    C1 was -15 VDC. They might look good at each power supply but be
    lower at the backplane pins due to cable loss.

    3. Un-interleave the memory, swap the starting addresses among the
    stacks, and see if the symptoms change.

    4. At the CPU backplane Unibus slots A and B, hang a scope probe on the
    Unibus address pins and look at the "1" and "0" levels and the
    quiescent levels. The old 8881 and 380 driver/receiver chips (I
    think those were the part numbers) would deteriorate over time, hold
    the "1" level down to -2 VDC or less, and cause traps to 4, etc. Or the
    "0" level would not be 0 VDC but maybe -.8, etc. You can compare known
    good signals to suspect ones. I can get back to you on the exact pin
    numbers to check, as I don't have them right in front of me. If you
    find a suspect signal, pull driver/receiver boards out until the
    suspect level goes back to normal.

    5. Obviously, shorten the bus as previous posts suggest.

    6. Check and clean dirty contacts as previous posts suggest.

    7. This might be a stretch, but here goes... back in the days of tubes
    and discrete components, one procedure was to run the intermittently
    failing program, lightly tap a few times on the board handles with
    a screwdriver, and see if the error occurs. Many a bad board was
    detected this way without diags. Step 2 of this procedure was to use a
    heat gun or hand-held hair dryer to blow hot air on suspect boards.
    Step 3 was equally or more bizarre; I will detail it later if need
    be.

    Granted, this might seem like a fishing expedition, but after 30+ years
    of fixing PDPs, these worked and still do. Good luck.
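Step 2 is Ohm's law at work. A small Python sketch with made-up numbers (the 20 A load, the 0.015 ohm harness resistance and the 5% tolerance are illustrative assumptions, not DEC specifications) shows how a supply that reads correctly at its own terminals can still be out of tolerance at the backplane.

```python
def backplane_voltage(supply_volts: float, load_amps: float, harness_ohms: float) -> float:
    """Voltage left at the backplane after the I*R drop in the harness."""
    return supply_volts - load_amps * harness_ohms


def in_tolerance(volts: float, nominal: float, fraction: float) -> bool:
    """True if volts is within +/- fraction of nominal."""
    return abs(volts - nominal) <= nominal * fraction


# Illustrative figures only: a +5 V supply measured at its own terminals,
# feeding a 20 A load through 0.015 ohm of cable and connectors.
at_supply = 5.00
at_backplane = backplane_voltage(at_supply, 20.0, 0.015)

print(in_tolerance(at_supply, 5.0, 0.05))     # True
print(round(at_backplane, 2))                 # 4.7
print(in_tolerance(at_backplane, 5.0, 0.05))  # False
```

The same reading taken at the regulator would pass, which is why the check belongs at the backplane pins.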

  16. Re: 11/40 misbehaviour

    Thanks! I'll report back when I have some results.

    --
    Rich Alderson | /"\ ASCII ribbon |
    news@alderson.users.panix.com | \ / campaign against |
    "You get what anybody gets. You get a lifetime." | x HTML mail and |
    --Death, of the Endless | / \ postings |

  17. Re: 11/40 misbehaviour

    jmfbahciv@aol.com wrote:

    > In article ,
    > Rich Alderson wrote:
    >>jmfbahciv@aol.com writes:
    >>
    >>> In article <1040804230752.7326D-100000@Ives.egh.com>,
    >>> John Santos wrote:
    >>> >On 4 Aug 2004, Rich Alderson wrote:
    >>>

    >>
    >>>> I haven't seen PDP-11/40 diags in years. We used to have a tape
    >>>> (9-track, 800 bpi) of them, but it is long gone.

    >>
    >>> Would the PDP-10 KL frontend diags work?

    >>
    >>No, this is clearly a problem with the 11/40 itself; the only 11-based-11 diags
    >>on the KLAD pack are XXDP, XTECO, and 3 RP04 exercisers (ZRJA, ZRJB, and ZRJC).
    >>Everything else that runs on the 11/40 exercises the KL.
    >>

    > Then what did Field Service use to check out the 11s on the
    > KLs? PDP-11s were good but not that good :-).


    We used the standard 11/40 diags. Besides, the /40 rarely gave us any
    trouble. It was the -10/20 stuff that gave us grief.

    --

    Stu

  18. Re: 11/40 misbehaviour

    Rich Alderson wrote:

    > Runs intermittently. Was running for hours at a time, now dies in under 2
    > hours.
    >


    When it 'dies', does it halt, or does it lock up so that none of the console
    switches work unless you press the 'start' key to do a bus reset? If the
    latter, then it's hanging up in the microcode. If it runs for a couple
    of hours and then halts, a strong possibility is memory (remember that in
    core memory every access is destructive, and the data must be written back).

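    Try zeroing all memory (assuming 28KW, as on this machine, so the
    program sits at the top of memory, just below the I/O page at 160000):

    157770 005000 clr r0
    157772 005020 clr (r0)+
    157774 000776 br .-2

    Start at 157770, and let it run until it halts (a fraction of a second).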
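To see why the loop stops by itself, here is a tiny Python model of just the three instructions involved. It is a sketch, not a PDP-11 emulator: it assumes 28K words of RAM with the program at 157770, and `run_clear_loop` and the fill pattern are hypothetical. Each pass clears the next word; when R0 reaches the program's own `clr (r0)+`, that word becomes 0, which is HALT, so the branch lands on a HALT and the machine stops with 157774 in the lights. That is also why the program belongs at the top of memory: placed lower, it halts before clearing anything above itself.

```python
# Minimal model of the zero-memory toggle-in: only CLR R0, CLR (R0)+,
# BR and HALT are decoded.
MEM_TOP = 0o160000   # assumption: 28K words of RAM, byte addresses 0..157776
FILL = 0o052525      # arbitrary "garbage" left in memory beforehand


def run_clear_loop(start: int = 0o157770, max_steps: int = 200_000):
    """Deposit the three-word program at start, run it, return (pc, memory)."""
    mem = {a: FILL for a in range(0, MEM_TOP, 2)}
    mem[start] = 0o005000      # clr r0
    mem[start + 2] = 0o005020  # clr (r0)+
    mem[start + 4] = 0o000776  # br .-2
    pc, r0 = start, 0
    for _ in range(max_steps):
        inst = mem[pc]
        pc += 2                            # PC already points past the instruction
        if inst == 0o000000:               # HALT: the lights show the updated PC
            return pc, mem
        if inst == 0o005000:               # clr r0
            r0 = 0
        elif inst == 0o005020:             # clr (r0)+
            if r0 not in mem:
                raise RuntimeError("bus timeout: trap through 4 is not modeled")
            mem[r0] = 0
            r0 += 2
        elif (inst & 0o177400) == 0o000400:  # BR with an 8-bit signed word offset
            offset = inst & 0o377
            if offset & 0o200:
                offset -= 0o400
            pc += 2 * offset
        else:
            raise ValueError(f"unmodeled instruction {inst:06o}")
    raise RuntimeError("did not halt")


pc, mem = run_clear_loop()
print(f"{pc:06o}")  # 157774
```

Calling `run_clear_loop(start=0o1000)` instead halts with 001004 displayed and leaves everything above the program untouched.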

    Then load a trap catcher (see http://www.psych.usyd.edu.au/pdp-11/hints.html)
    and finally a continuous loop:

    000774 012706 mov #770,sp   ; set the stack
    000776 000770
    001000 000777 br .          ; infinite loop


    Start at 774, then wait. When it halts, look at the address to see whether
    it trapped or simply halted. The display will show the halt address + 2, and
    if it was via the trap catcher, the display will show the vector + 4.

    If the address is 10, then it was a bus error trap. Examine the stack at
    address 764, which will hold the address where it was when the trap
    happened (the trap pushes the PSW first, to 766, then the PC, to 764).

    If the address is 14, then it was a reserved instruction trap. Check as above.

    If the address displayed at halt was 30, then there was a power-fail
    interrupt and you have power supply problems.

    If the address is above 1006, subtract 2 to get the halt location and
    check whether it has only a single bit set. So a halt at 10002
    or 20002 might indicate problems with an address line (or decoding in
    a memory module).
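The decoding rules above can be collected into a small Python helper. It is a sketch under stated assumptions: the trap catcher covers the vectors below 1000 (each vector V holds V+2, and V+2 holds a HALT), the stack pointer was set to 770 as in the loop above, and `interpret_halt` and `trap_frame` are hypothetical names. The vector names are the standard PDP-11 assignments.

```python
# Standard PDP-11 trap vectors (octal) relevant here.
VECTORS = {
    0o004: "bus error (timeout, odd address, ...)",
    0o010: "reserved instruction",
    0o014: "BPT / trace trap",
    0o020: "IOT",
    0o024: "power fail",
    0o030: "EMT",
    0o034: "TRAP",
}


def interpret_halt(display: int) -> str:
    """Explain the address lights after the machine halts.

    Assumes a trap catcher below 1000: a trap through vector V jumps to
    V+2, which holds HALT, so the lights show V+4. Any other halt shows
    the HALT's own address + 2.
    """
    if display < 0o1000:
        vector = display - 4
        return f"trap through {vector:03o}: {VECTORS.get(vector, 'other vector')}"
    where = display - 2
    if (where & (where - 1)) == 0:
        # Only one bit set: the PC may have been thrown there by a bad address line.
        return f"halt at {where:06o}: single bit {where.bit_length() - 1}, suspect address line"
    return f"halt at {where:06o}"


def trap_frame(sp: int = 0o770) -> dict:
    """Where a trap leaves the old PSW and PC: the PSW is pushed first, then the PC."""
    return {"psw": sp - 2, "pc": sp - 4}


print(interpret_halt(0o10))     # trap through 004: bus error (timeout, odd address, ...)
print(interpret_halt(0o10002))  # halt at 010000: single bit 12, suspect address line
frame = trap_frame()
print(f"{frame['pc']:o} {frame['psw']:o}")  # 764 766
```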

    Also, check the original contents of location 1000, which should still
    be 000777. If it has changed (usually by one bit), then it's a memory
    problem. You may pick up or lose a memory bit depending on the memory
    design and where the fault is (data latch, memory timing, noise margin,
    etc.).
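Comparing the word at location 1000 against the expected 000777 is a bitwise difference. A short Python sketch (`bit_changes` is a hypothetical helper, not part of any diagnostic) reports which data bits were picked up and which were lost, which tells you which bit of the memory to suspect.

```python
def bit_changes(expected: int, observed: int) -> dict:
    """Compare two 16-bit words and report which bits were gained or lost."""
    expected &= 0o177777
    observed &= 0o177777
    gained = observed & ~expected   # bits set now that should be clear
    lost = expected & ~observed     # bits clear now that should be set
    return {
        "picked_up": [b for b in range(16) if (gained >> b) & 1],
        "lost": [b for b in range(16) if (lost >> b) & 1],
    }


print(bit_changes(0o000777, 0o000777))  # {'picked_up': [], 'lost': []}
print(bit_changes(0o000777, 0o000775))  # {'picked_up': [], 'lost': [1]}
print(bit_changes(0o000777, 0o002777))  # {'picked_up': [10], 'lost': []}
```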

  19. Re: 11/40 misbehaviour

    I am very sure I have some 8881 chips, etc., that I scrounged years ago. They
    are new 'very old' stock. Please post back on this newsgroup if you have to
    replace chips and are looking for a source.
    Thanks.



