PC164 -- Memory Problem -- SROM Codes - Linux




  1. PC164 -- Memory Problem -- SROM Codes

    -----BEGIN PGP SIGNED MESSAGE-----
    Hash: SHA1

    I have a PC164 with AlphaBIOS. I had a copy of Windows NT4 on it, and I now
    want to install Linux on it. It had worked fine except for a memory timing
    issue that was fixed in Service Pack 3. It hasn't been booting: no video, no
    serial console, and usually no beeps (I got some once, 5 or 6 beeps, and it
    has never been duplicated). So I dropped down to the SROM level in order to
    find out what the board is doing. The debug port displays the following
    sequence:

    21164A.01.02.03.04.05.06.07.08.0b.3f.05.3f.05.3f.05.3f.05.3f.05.3f.05.3f.05
    ....

    From the manual:


    00 Firmware initialization is complete
    01 CPU speed detected
    02 CPU speed converted
    03 Configuration jumpers read
    04 Bcache configuration value computed
    05 Bcache control value computed
    06 Bcache turned off
    07 Memory timing registers written
    08 Memory control register written
    09 Memory bank 0 register written
    0B DRAMs awakened
    0C Memory sized and memory bank 0 written
    0F Bcache turned on
    13 All of memory rewritten (good data parity written)
    14 Memory errors cleared; start reading system ROM
    3F Fatal error. Second code identifies source of error:
       05 = No memory found
       06 = Checksum error detected when image was read back from memory
       07 = Could not determine the SIMM type

    It appears this memory initialization is having a problem. I can also
    duplicate the problem by removing all the memory.
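
    The table can be applied to the debug-port output mechanically. Below is a
    minimal Python sketch, not anything from the PC164 tooling: the names
    SROM_CODES, FATAL_SOURCES and decode_trace are made up for illustration,
    the code meanings are copied from the manual excerpt above, and it assumes
    the first dot-separated field is a CPU banner rather than a status code.

```python
# Status codes as listed in the manual excerpt quoted in this post.
SROM_CODES = {
    0x00: "Firmware initialization is complete",
    0x01: "CPU speed detected",
    0x02: "CPU speed converted",
    0x03: "Configuration jumpers read",
    0x04: "Bcache configuration value computed",
    0x05: "Bcache control value computed",
    0x06: "Bcache turned off",
    0x07: "Memory timing registers written",
    0x08: "Memory control register written",
    0x09: "Memory bank 0 register written",
    0x0B: "DRAMs awakened",
    0x0C: "Memory sized and memory bank 0 written",
    0x0F: "Bcache turned on",
    0x13: "All of memory rewritten (good data parity written)",
    0x14: "Memory errors cleared; start reading system ROM",
}
FATAL = 0x3F
FATAL_SOURCES = {
    0x05: "No memory found",
    0x06: "Checksum error detected when image was read back from memory",
    0x07: "Could not determine the SIMM type",
}


def decode_trace(trace):
    """Decode a dot-separated trace such as '21164A.01.02.3f.05'."""
    # Drop any whitespace and leading/trailing dots before splitting.
    fields = "".join(trace.split()).strip(".").split(".")
    # Assumption: the first field is a CPU banner, not a status code.
    cpu = fields[0]
    codes = [int(f, 16) for f in fields[1:] if f]
    lines = ["CPU: " + cpu]
    i = 0
    while i < len(codes):
        code = codes[i]
        if code == FATAL and i + 1 < len(codes):
            # 3F is followed by a second code naming the error source.
            src = codes[i + 1]
            reason = FATAL_SOURCES.get(src, "source not in the quoted table")
            lines.append(f"3F {src:02X}: fatal error, {reason}")
            i += 2
        else:
            meaning = SROM_CODES.get(code, "not in the quoted table")
            lines.append(f"{code:02X}: {meaning}")
            i += 1
    return lines


if __name__ == "__main__":
    for line in decode_trace("21164A.01.02.03.04.05.06.07.08.0b.3f.05.3f.05"):
        print(line)
```

    Fed the sequence above, the last decoded line is the 3F 05 pair, that is,
    "No memory found".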

    I also have not checked anything on the power plane to make sure power is
    being supplied correctly.

    If any of you have an idea of what is going on, I would appreciate some
    help. One person has suggested that I might need to upload a new SROM
    image via the SROM Port/Debug Monitor.

    If this is the case, does anyone know where a copy of the debug monitor
    and/or the SROM images can be found (at least for a PC164)? I assume
    that a standard MMJ cable will work fine for uploading (hardware flow
    control?). Is this right? I had to make a custom cable and I don't have a
    proper "SROM cable". It gets the POST codes, but I have not tried anything
    else.

    Thanks again.


    - --
    Philip Thiem -- Icequake dot net Administrator
    Email To -- witwerg at icequake dot net
    Isn't it obvious lumberjacks love traffic lights?
    GPG Pub Key Archived at wwwkeys.us.pgp.net
    -----BEGIN PGP SIGNATURE-----
    Version: GnuPG v1.2.4 (GNU/Linux)

    iD8DBQFAIsiOoGWoM/hCe+YRAuY5AJ0f7yx/aUX29/nCsliy5ylU6QHbUgCdETJL
    XfglzrnJG4El0I0tcmxZUUI=
    =UJPD
    -----END PGP SIGNATURE-----

  2. Re: PC164 -- Memory Problem -- SROM Codes

    -----BEGIN PGP SIGNED MESSAGE-----
    Hash: SHA1

    Update to my previous post.

    I've since run into some more data.

    I got into the mini-debugger. Running mt over very short ranges
    of memory seems to work, but anything major shows a number of wrong memory
    values (sometimes 0's turn into F's!). I'm thinking that I may have a cache
    issue. Has anyone else experienced bad cache with these boards? Care to
    share your experiences? The power supply is good and the memory is known to
    be good. That leaves the CPU, cache, and mobo. Any last-ditch suggestions
    would be appreciated.

    - --
    Philip Thiem -- Icequake.net Administrator
    Isn't it obvious lumberjacks love traffic lights?
    GPG Pub Key Archived at wwwkeys.us.pgp.net
    -----BEGIN PGP SIGNATURE-----
    Version: GnuPG v1.2.4 (GNU/Linux)

    iD8DBQFAJvVqoGWoM/hCe+YRAv5SAJ4o2m9cd2X9yEsl+jwH/A+N2Olj6ACdFaML
    +hG6isCU86qmaGEBhQKCfNg=
    =IVQ4
    -----END PGP SIGNATURE-----

  3. Re: PC164 -- Memory Problem -- SROM Codes

    Philip Thiem wrote:
    >
    > -----BEGIN PGP SIGNED MESSAGE-----
    > Hash: SHA1
    >
    > Update to my previous post.
    >
    > I've since run into some more data.
    >
    > I got into the mini-debugger. Running mt over very short ranges
    > of memory seems to work, but anything major shows a number of wrong memory
    > values (sometimes 0's turn into F's!). I'm thinking that I may have a cache
    > issue. Has anyone else experienced bad cache with these boards? Care to
    > share your experiences? The power supply is good and the memory is known to
    > be good. That leaves the CPU, cache, and mobo. Any last-ditch suggestions
    > would be appreciated.
    >


    YES. Exactly the same symptoms. From one hour to the next, SIMMs that
    used to work fine did not work at all. I got the beep codes for bad
    memory, and the SROM said the same.
    I got started with the SROM mini-debugger and found the same behaviour.
    Everything beyond 8 or 16KB in the mt (memory test) showed
    miscompares and stuck bits.
    I was even able to chase it down to the guilty cache chip and the bit
    line.
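
    For sorting through a list of miscompares like this, a short script can
    separate bits that look stuck from bits that were merely wrong once. This
    is a sketch in Python under assumptions: stuck_bits is a hypothetical
    helper, the 32-bit word width is a guess, and the sample values are
    invented for illustration, not captured from either board.

```python
def stuck_bits(samples, width=32):
    """Summarise (expected, actual) word pairs from a memory test.

    A bit is reported as stuck high if it read back as 1 in every sample
    and differed from the expected value at least once; stuck low is the
    mirror case. The 32-bit default width is an assumption.
    """
    mask = (1 << width) - 1
    always_one = mask    # bits that were 1 in every value read back
    always_zero = mask   # bits that were 0 in every value read back
    wrong = 0            # bits that miscompared at least once
    for expected, actual in samples:
        always_one &= actual
        always_zero &= ~actual & mask
        wrong |= (expected ^ actual) & mask
    return {
        "stuck_high": always_one & wrong,
        "stuck_low": always_zero & wrong,
    }


if __name__ == "__main__":
    # Invented readings: one nibble comes back as F regardless of the data.
    readings = [
        (0x00000000, 0x000000F0),
        (0xFFFFFFFF, 0xFFFFFFFF),
        (0x12345678, 0x123456F8),
    ]
    result = stuck_bits(readings)
    print(f"stuck high: {result['stuck_high']:#010x}")
    print(f"stuck low:  {result['stuck_low']:#010x}")
```

    With enough test patterns, the resulting mask narrows the fault to
    particular data lines, which can then be looked up against the schematics.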

    There is an undocumented way to disable the Bcache: it is a setting of
    the Bcache speed that is not documented. Let's see... CF1 in.

    Now, replacing the bad chip would be possible. One could swap it for the
    one that is used as tag RAM, but only if you are really lucky w.r.t. the bad
    bit. The likelihood of finding such chips nowadays is extremely slim, as
    they are synchronous 36-bit high-speed SRAMs. PC-class COAST boards used
    pipelined burst chips with just 32 bits. What might work (but I haven't
    yet checked it) is using cache chips from dead Pentium II processors
    (Slot 1). They have 36 bits, but I'm not so sure about the PB versus
    synchronous business.

    And no, I haven't ventured as far as desoldering and resoldering
    that chip...

    --
    Michael Joosten, SBS C-LAB, joost@c-lab.de
    Fuerstenallee 11, 33094 Paderborn, Germany
    Phone: +49 5251 606127, Fax: +49 5251 606065
    C-LAB is a cooperation of University Paderborn & SIEMENS

  4. Re: PC164 -- Memory Problem -- SROM Codes

    Thanks for the information. I found it useful. However,
    I did try the CF1 jumper, and had no change in symptoms. Perhaps I have
    a different revision of the PCB? Were you able to determine
    which was the bad line from the debugger, or did you have to
    pull out something along the lines of a logic probe? Just curious.
    If it is the motherboard or the cache, then fixing it isn't a near-term
    option and I'll find another board. Desoldering and replacing the chip
    looks like quite a job, and as stated a replacement isn't going to be easy
    to find. Just to rule out the CPU, I have an additional working CPU on the
    way. I'll swap it out and see if there is a change (I'd like to have a
    replacement on hand anyway). Though I must admit I have a hard time
    believing that the CPU is acting up, since the mini-debugger is able to
    function.

    Philip Thiem



    --
    Philip Thiem -- Icequake.net Administrator
    Isn't it obvious lumberjacks love traffic lights?
    GPG Pub Key Archived at wwwkeys.us.pgp.net

  5. Re: PC164 -- Memory Problem -- SROM Codes

    -----BEGIN PGP SIGNED MESSAGE-----
    Hash: SHA1

    Thanks for the information, I found it useful. However, "CF1 in" didn't seem
    to help me. Perhaps I have a different revision of the PCB. Just curious,
    were you able to determine the bad line from the debugger, or did you have
    to resort to something along the lines of a logic probe? If it's the
    motherboard or the cache (and I can't figure out how to disable it), I'll
    probably end up retiring the board.

    I had wanted a backup CPU around, so I have another working one on the way.
    I'll swap it out just to rule out the possibility, though I admit I have a
    hard time believing that the CPU is acting up since the debugger works OK.
    The only thing I could think of would be the on-chip data cache.


    Philip Thiem


    - --
    Philip Thiem -- Icequake.net Administrator
    Isn't it obvious lumberjacks love traffic lights?
    GPG Pub Key Archived at wwwkeys.us.pgp.net
    -----BEGIN PGP SIGNATURE-----
    Version: GnuPG v1.2.4 (GNU/Linux)

    iD8DBQFAKb/YoGWoM/hCe+YRAjzzAJwNfsQ1+7hH6qovdKxzXqSJCP3cZACffq3W
    w9dO/+BVJEIhza6ICV8vlyQ=
    =LF+F
    -----END PGP SIGNATURE-----

  6. Re: PC164 -- Memory Problem -- SROM Codes

    Philip Thiem wrote:
    >
    > Thanks for the information. I found it useful. However,
    > I did try the CF1 jumper, and had no change in symptoms. Perhaps I have
    > a different revision of the PCB? Were you able to determine
    > which was the bad line from the debugger, or did you have to
    > pull out something along the lines of a logic probe? Just curious.


    In fact, I just switched the crate into SROM mode and tried it again.
    Though I cannot enter anything anymore (right, I also soldered a
    cable with a DB9 instead of hunting for MMJ stuff, which is no-no here,
    and the cable broke somewhere), I still get the output:

    21164A.01.02.03.04.05.06.07.08.09.0b.0c.0f.13.14.15.3f.06.15.17.18.3a


    So, this really seems to be a different error here. Having a whole set of
    bits reverting from 0 to 1 (resulting in an F) probably means some
    trouble with the memory controller, I'd guess. Do you get 5 or 6 beeps?
    With my broken cache, I get 6.
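
    To see exactly where the two boards part ways, the two sequences posted in
    this thread can be compared field by field. A small Python sketch follows;
    first_divergence is a hypothetical helper, and the repeated 3f.05 tail of
    the first trace is cut to a single pair for brevity.

```python
def first_divergence(trace_a, trace_b):
    """Return (index, field_a, field_b) at the first differing field.

    Fields are the dot-separated tokens, compared case-insensitively.
    Returns None if one trace is a prefix of the other or they are equal.
    """
    a = trace_a.lower().split(".")
    b = trace_b.lower().split(".")
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i, x, y
    return None


# The two debug-port sequences as posted in this thread.
PHILIP = "21164A.01.02.03.04.05.06.07.08.0b.3f.05"
MICHAEL = ("21164A.01.02.03.04.05.06.07.08.09.0b.0c.0f."
           "13.14.15.3f.06.15.17.18.3a")

if __name__ == "__main__":
    print(first_divergence(PHILIP, MICHAEL))
```

    The traces agree through 08. The first one then shows 0b where the second
    shows 09, and it ends in 3f.05 instead of carrying on to 0c and beyond.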

    As for hunting down the chip: the zip/tgz with the schematics of the
    PC164 in PostScript is perhaps still around. With those, I could identify
    the actual chip. Then I used the loop commands and the write and read
    commands to check that zeroes were actually written to the bit lines of
    the SRAM in question, and then to see whether zeroes were read back. You
    don't even need a scope; a simple DMM will give an average value, and that
    already shows you what's going on.

    > If it is the motherboard or the cache, then fixing it isn't a near term
    > option and I'll find another board. Desoldering and replacing the chip
    > looks like quiet a job, and as stated a replacement isn't going to be easy
    > to find.


    I'd check the SIMM connectors twice. I'm not sure what happens if a
    whole SIMM is missing; there should be four bytes in limbo, but they
    could be spread over the whole bus width of 128 or 256 bits.

    > Just to rule out the CPU, I have an additional working CPU on the
    > way. I'll swap it out and see if there is a change (I'd like to have a
    > replacement on hand anyway). Though I must admit I have a hard time
    > believing that the CPU is acting up, since the mini-debugger is able to
    > function.


    The mini-debugger is loaded from a serial EEPROM, so there could be
    quite a lot of stuff (pins, periphery) damaged and it would still be
    running! So, it might even be a defective CPU.
    I'd try contact spray first.


    > --
    > Philip Thiem -- Icequake.net Administrator
    > Isn't it obvious lumberjacks love traffic lights?
    > GPG Pub Key Archived at wwwkeys.us.pgp.net


    Nice pic of that slipped icebear!

    --
    Michael Joosten, SBS C-LAB, joost@c-lab.de
    Fuerstenallee 11, 33094 Paderborn, Germany
    Phone: +49 5251 606127, Fax: +49 5251 606065
    C-LAB is a cooperation of University Paderborn & SIEMENS
