VAX floating-point instruction timing? - DEC

Thread: VAX floating-point instruction timing?

  1. VAX floating-point instruction timing?

    Does anybody know the cycle counts for the various VAX floating-point
    instructions?

    Any model will do, as I don't have proper info on any of them!

    Also, integer mul/div cycle counts for VAX 11/780, 750, 730 would also be
    very much appreciated

    -Peter

  2. Re: VAX floating-point instruction timing?

    In article , "Peter \"Firefly\" Lund" writes:
    > Does anybody know the cycle counts for the various VAX floating-point
    > instructions?
    >
    > Any model will do, as I don't have proper info on any of them!
    >
    > Also, integer mul/div cycle counts for VAX 11/780, 750, 730 would also be
    > very much appreciated
    >


    DEC never published counts for any of the VAX instructions. A trivial
    program will demonstrate how much it depends on instruction cache hits,
    data cache hits, interrupts (some of the more complex instructions can be
    interrupted partially executed), page faults (can cause all or part of an
    instruction to be re-executed), software implemented subsets, FPA options,
    and in many cases data.

    About all you can count on is that a NOP in the instruction cache
    should be one cycle.

    Cycle counts are about as useful as Marketing Instructions Per
    Second, which have the same and a few more issues.


  3. Re: VAX floating-point instruction timing?

    On Tue, 5 Sep 2006, Bob Koehler wrote:

    > DEC never published counts for any of the VAX instructions. A trivial


    I have the cycle counts for the VLSI VAXen's integer instructions (except
    for mul/div in most cases). Depending on the chip, they might or might
    not include the time it takes to decode the operand specifiers and fetch
    the operands.

    > program will demonstrate how much it depends on instruction cache hits,
    > data cache hits, interrupts (some of the more complex instructions can be
    > interrupted partially executed), page faults (can cause all or part of an
    > instruction to be re-executed), software implemented subsets, FPA options,
    > and in many cases data.


    I find it hard not to read this as "you are stupid or naive, kid, let me
    tell you how it works".

    I hope that was not the intent.

    > About all you can count on is that a NOP in the instruction cache
    > should be one cycle.


    On a microcoded machine? Not necessarily

    (Yes, it was one cycle on the VLSI VAXen -- was it on VAX 11/780, too? I
    know next to nothing about its cycle times, unfortunately.)

    > Cycle counts are about as useful as Marketing Instructions Per
    > Second, which have the same and a few more issues.


    They are very useful if you are trying to understand the various
    microarchitectures used.

    They are also very useful for optimization purposes, particularly if the
    microarchitecture is simple. "Pairing"/multiple issue rules (Pentium,
    68060, Alpha 21064, Alpha 21164) make it harder. Out-of-order execution
    makes it almost impossible -- but cycle counts and issue rates are still
    very useful hints, you just need to profile more carefully.

    -Peter

  4. Re: VAX floating-point instruction timing?

    Peter "Firefly" Lund wrote:
    > On Tue, 5 Sep 2006, Bob Koehler wrote:
    >
    >> DEC never published counts for any of the VAX instructions. A trivial

    >
    > I have the cycle counts for the VLSI VAXen's integer instructions
    > (except for mul/div in most cases).


    The VLSI-based VAX systems used microcode, too.

    >> About all you can count on is that a NOP in the instruction cache
    >> should be one cycle.

    >
    > On a microcoded machine? Not necessarily


    I'd have to check the architecture specs to see if that was a certainty, but
    I'd tend to assume that some VAX systems used more than one cycle for a NOP. It
    may well have been less than a cycle in some cases, depending on whether it was
    compressed out of the instruction stream. (On VAX 9000?) NOP was also useful
    for branch alignment, as unaligned branches were slower than aligned branches.

    > (Yes, it was one cycle on the VLSI VAXen -- was it on VAX 11/780, too?
    > I know next to nothing about its cycle times, unfortunately.)


    IIRC, it was a 188 bit microword on the VAX-11/780, and 80 on the VAX-11/750.

    And not all cycles and not all ticks are necessarily the same length.

    >> Cycle counts are about as useful as Marketing Instructions Per
    >> Second, which have the same and a few more issues.

    >
    > They are very useful if you are trying to understand the various
    > microarchitectures used.


    And various VAX families used different microarchitectures, and instruction
    subsets can and did vary across the VAX family. In various cases, it was faster
    to execute a series of discrete instructions than some of the more complex
    instructions. CRC comes to mind here.

    > They are also very useful for optimization purposes, particularly if the
    > microarchitecture is simple.


    I've never heard a VAX referred to as "simple," and a VAX microarchitecture
    is not "simple." (Dig around for any discussions of the WCS Writable Control
    Store option if you want details of what one VAX looked like. The WCS allowed
    user-written microcode for the VAX-11/780, and there were some folks that wanted
    or needed that extra margin of performance enough to use WCS.)

    I remember the "fun" that this tuning approach (and WCS, for that matter) led
    to decades ago. (And back when computes were truly scarce.) Sure, you can
    tune an instruction stream for one VAX model, but the effort could be wasted on
    the next. The VAX-11/790 (officially released as the VAX 8600) had its own
    coding oddities, for instance, and the subset VAX systems have their own
    considerations -- most VAX systems had instruction oddities, and you'll find
    occasional artifacts of a few of these lurking within present-day OpenVMS VAX
    itself.

    And then there's the old 700 microinstructions per programmer per year
    benchmark... (Maybe from Bell Labs?)

    > "Pairing"/multiple issue rules (Pentium,
    > 68060, Alpha 21064, Alpha 21164) make it harder. Out-of-order execution
    > makes it almost impossible -- but cycle counts and issue rates are still
    > very useful hints, you just need to profile more carefully.


    And you have to expect to toss all your careful profiling work out and start
    over again either periodically -- there were several releases of microcode for
    various VAX systems, and more than a few of the instruction tuners had to
    re-tune -- or when you moved to a different VAX platform. (Or when the data was
    aligned versus unaligned.)

    The old black processor binders (black five-ring binders with a white grill
    pattern) had the microcode listings for various of the VAX-11 series boxes. And
    the WCS option had tools and such for one of the boxes.

    BitSavers claims to have:

    dec/vax/AA-H306B_780uprogToolsMar82.pdf VAX-11/780 Microprogramming Tools User's
    Guide (Mar 1982)



  5. Re: VAX floating-point instruction timing?

    On Tue, 5 Sep 2006, Hoff Hoffman wrote:

    > The VLSI-based VAX systems used microcode, too.


    I know. Bob Supnik released the microcode source code for those.

    >>> About all you can count on is that a NOP in the instruction cache
    >>> should be one cycle.

    >>
    >> On a microcoded machine? Not necessarily

    >
    > I'd have to check the architecture specs to see if that was a certainty,
    > but I'd tend to assume that some VAX systems used more than one cycle for a
    > NOP. It may well have been less than a cycle in some cases, depending on
    > whether it was compressed out of the instruction stream.


    It seems to have cost exactly one (micro)cycle, not counting instruction
    fetch time, on the VLSI machines. Wouldn't be surprised if the really old
    machines took more.

    > (On VAX 9000?) NOP
    > was also useful for branch alignment, as unaligned branches were slower than
    > aligned branches.
    >
    >> (Yes, it was one cycle on the VLSI VAXen -- was it on VAX 11/780, too? I
    >> know next to nothing about its cycle times, unfortunately.)

    >
    > IIRC, it was a 188 bit microword on the VAX-11/780, and 80 on the
    > VAX-11/750.


    188 is grotesque! I hope that meant that at least they didn't have to
    decode any fields of the microword. The VLSI CPUs used around 40 bits per
    microinstruction (but used (simple) decoders).

    > And not all cycles and not all ticks are necessarily the same length.


    Huh?!

    Do we talk past each other? What I call a cycle is a clock tick, in which
    a single microinstruction can be executed. I think some people call them
    ticks and reserve the word cycle for however long it takes to execute a
    macroinstruction.

    As far as I know, all the VAX implementations were synchronous so all
    clock ticks were equally long.

    I think some PDPs were partially asynchronous, though.

    >>> Cycle counts are about as useful as Marketing Instructions Per
    >>> Second, which have the same and a few more issues.

    >>
    >> They are very useful if you are trying to understand the various
    >> microarchitectures used.

    >
    > And various VAX families used different microarchitectures, and instruction
    > subsets can and did vary across the VAX family. In various cases, it was
    > faster to execute a series of discrete instructions than some of the more
    > complex instructions. CRC comes to mind here.


    Sure. CRC and POLY (and queue manipulation, privilege changes, EDITPC,
    and vector stuff) don't interest me much, though.

    >> They are also very useful for optimization purposes, particularly if the
    >> microarchitecture is simple.

    >
    > I've never heard a VAX referred to as "simple," and a VAX microarchitecture
    > is not "simple."


    Compared to a modern CPU, they were.

    I have reason to believe the microarchitectures were more complicated than
    necessary. If you take the Pentium microarchitecture, strip out one of
    the two pipelines and change it so it fits the VAX architecture instead of
    the IA32 architecture (basically yank out one set of idiosyncrasies and
    stuff in another), then you should end up with something both faster and
    simpler than what DEC used. And you should be able to use that basic
    microarchitecture over a much wider range of implementations, even TTL.

    > (Dig around for any discussions of the WCS Writable
    > Control Store option if you want details of what one VAX looked like. The
    > WCS allowed user-written microcode for the VAX-11/780, and there were some
    > folks that wanted or needed that extra margin of performance enough to use
    > WCS.)


    Yes, I read some articles on that. People writing Prolog accelerators,
    for example. It sounds like ego stroking for clever little engineers to
    me

    > And you have to expect to toss all your careful profiling work out and
    > start over again either periodically -- there were several releases of
    > microcode for various VAX systems, and more than a few of the instruction
    > tuners had to re-tune -- or when you moved to a different VAX platform. (Or
    > when the data was aligned versus unaligned.)


    Make sure your data is aligned, then.

    This is no different from having to reoptimize when going from one x86
    generation to the next -- or from Intel to Cyrix to AMD to VIA to IIT.

    There's a fun story about the Watcom compiler for x86. It was not very
    popular but everybody knew it generated the best machine code bar none.
    They redid their optimizations for the 486 and got a modest speed
    improvement after trying really, really hard. Then the Pentium came out
    and they had to find a way to optimize for that. They succeeded -- and the
    Pentium-optimized code happened to run much faster on the 486 than their
    previous 486-optimized code.

    > The old black processor binders (black five-ring binders with a white grill
    > pattern) had the microcode listings for various of the VAX-11 series boxes.
    > And the WCS option had tools and such for one of the boxes.
    >
    > BitSavers claims to have:
    >
    > dec/vax/AA-H306B_780uprogToolsMar82.pdf VAX-11/780 Microprogramming Tools
    > User's Guide (Mar 1982)


    Thank you

    -Peter

  6. Re: VAX floating-point instruction timing?

    I don't have figures for the number of cycles, but there
    are measurements for the elapsed time needed for a number
    of floating point operations.

    I wrote a test suite a couple of decades ago, and the results
    for running the tests have been posted in a number of places.
    Encompasserve is probably the easiest way to get them. You
    can telnet to encompasserve.org and read the login banner to
    find out how to get a free account.

    Bart.

  7. Re: VAX floating-point instruction timing?

    On Wed, 6 Sep 2006, Bart Z. Lederman wrote:

    > I wrote a test suite a couple of decades ago, and the results
    > for running the tests have been posted in a number of places.
    > Encompasserve is probably the easiest way to get them. You
    > can telnet to encompasserve.org and read the login banner to
    > find out how to get a free account.


    Thanks

    Another thing I was wondering -- how well does lmbench run on an old VAX
    machine?

    -Peter

  8. Re: VAX floating-point instruction timing?

    On Wed, 6 Sep 2006, Bart Z. Lederman wrote:

    > Encompasserve is probably the easiest way to get them. You
    > can telnet to encompasserve.org and read the login banner to
    > find out how to get a free account.


    Well, I got the account now (LUND) but I don't really know VMS :/

    Gotta start learning, I guess

    -Peter

  9. Re: VAX floating-point instruction timing?

    koehler@eisner.nospam.encompasserve.org (Bob Koehler) writes:

    > In article , "Peter \"Firefly\" Lund" writes:
    >> Does anybody know the cycle counts for the various VAX floating-point
    >> instructions?
    >>
    >> Any model will do, as I don't have proper info on any of them!
    >>
    >> Also, integer mul/div cycle counts for VAX 11/780, 750, 730 would also be
    >> very much appreciated


    > DEC never published counts for any of the VAX instructions. A
    > trivial program will demonstrate how much it depends on
    > instruction cache hits, data cache hits, interrupts (some of the
    > more complex instructions can be interrupted partially executed),
    > page faults (can cause all or part of an instruction to be
    > re-executed), software implemented subsets, FPA options, and in
    > many cases data.


    I think the 780 prints had cycle data in the u-code flow diagrams.

    --
    Paul Repacholi 1 Crescent Rd.,
    +61 (08) 9257-1001 Kalamunda.
    West Australia 6076
    comp.os.vms,- The Older, Grumpier Slashdot
    Raw, Cooked or Well-done, it's all half baked.
    EPIC, The Architecture of the future, always has been, always will be.

  10. Re: VAX floating-point instruction timing?

    Peter "Firefly" Lund wrote:
    > On Tue, 5 Sep 2006, Hoff Hoffman wrote:


    >> And not all cycles and not all ticks are necessarily the same length.

    >
    > Huh?!
    >
    > Do we talk past each other? What I call a cycle is a clock tick, in
    > which a single microinstruction can be executed. I think some people
    > call them ticks and reserve the word cycle for however long it takes to
    > execute a macroinstruction.


    Not all clock cycles are the same.

    Not all clock ticks are the same.

    >
    > As far as I know, all the VAX implementations were synchronous so all
    > clock ticks were equally long.


    AFAIK, you're wrong. There were stretched clock cycles on at least one VAX
    box; there was a VAX with two different lengths for its clock cycle, depending
    on what is going on in the i-stream. (And I'm not talking about the
    lower-performing microcode, that is a whole different discussion and a whole
    different matter.)

    There are also what amount to stretched (or runt, I've forgotten which) clock
    ticks on Alpha systems, too.

    Most folks assume that the VAX systems are all similar, and from a user-level
    environment that's true. That's not the case, however, from the system-level
    environment, as there are some very large implementation differences among the
    various VAX families.

    >>> They are also very useful for optimization purposes, particularly if
    >>> the microarchitecture is simple.

    >>
    >> I've never heard a VAX referred to as "simple," and a VAX
    >> microarchitecture is not "simple."

    >
    > Compared to a modern CPU, they were.



    In some ways, and not in others.


    > I have reason to believe the microarchitectures were more complicated
    > than necessary. If you take the Pentium microarchitecture, strip out
    > one of the two pipelines and change it so it fits the VAX architecture
    > instead of the IA32 architecture (basically yank out one set of
    > idiosyncrasies and stuff in another), then you should end up with
    > something both faster and simpler than what DEC used. And you should be
    > able to use that basic microarchitecture over a much wider range of
    > implementations, even TTL.


    IA-32 is not what I would have implemented. (IA-32e is closer.)

    As far as building your own microcode "real" VAX (as I might infer the goal
    to be here), do have fun with that. The other obvious approach to building your
    own ("real") VAX is via an FPGA VAX, of course.

    There are certainly ways to greatly simplify the VAX implementations and the
    architectures -- that said, the installed base tended to preclude that sort of
    thing. If you're going to break compatibility (a little), you might as well
    break it (a lot).

    Bootstrapping even OpenVMS VAX is only testing a very small part of the VAX
    architecture, for instance. Applications tended to depend on far more of the
    architecture, some more so than others.


    > Yes, I read some articles on that. People writing Prolog accelerators,
    > for example. It sounds like ego stroking for clever little engineers to
    > me


    Thirty years ago, folks usually wrote assembler or microcode because they
    needed to, and they needed the extra performance for various tasks. (I was
    consulting for a place that was accepting gobs of data off communications links,
    and they were running the SBI and the VAX-11/780 processor and the comm-boards
    flat-out -- the speeds and feeds are nothing now, but were right at the SBI
    bandwidth back then. WCS was one of the few ways available toward additional
    performance, and was the choice at this site prior to the availability of the
    VAX-11/785. I expect they continued to use WCS after the inevitable series of
    hardware upgrades, too.)

    There are still applications that are processor-limited, but there are
    relatively fewer of them -- various other considerations have moved to the
    forefront at many sites.


    As for ego, IIRC, hydrogen, stupidity and ego are universal constants.



  11. Re: VAX floating-point instruction timing?

    On Wed, 6 Sep 2006, Hoff Hoffman wrote:

    > Not all clock cycles are the same.
    >
    > Not all clock ticks are the same.


    So what do you mean by each of the words "cycle" and "tick" -- just to
    make sure I don't misunderstand anything?

    >> As far as I know, all the VAX implementations were synchronous so all
    >> clock ticks were equally long.

    >
    > AFAIK, you're wrong. There were stretched clock cycles on at least one VAX
    > box; there was a VAX with two different lengths for its clock cycle,
    > depending on what is going on in the i-stream. (And I'm not talking about
    > the lower-performing microcode, that is a whole different discussion and a
    > whole different matter.)
    >
    > There are also what amount to stretched (or runt, I've forgotten which)
    > clock ticks on Alpha systems, too.


    Are you talking about time borrowing? That is, where some pipe stage
    actually requires a bit more than a cycle and a neighbouring stage
    requires a bit less and they play a bit with the clock arrival times to
    make it work?

    Or do you actually mean that the cycle length of a clock to the same pipe
    stage may differ from one cycle to the next?

    If the latter, then I'm very sure you are wrong about the Alpha.

    > Most folks assume that the VAX systems are all similar, and from a


    I know they aren't.

    Some microcoded everything, some thankfully used traps and emulation in
    macrocode. Some had PDP-11 compatibility, others didn't. Some had vector
    instructions, most thankfully didn't. Some had virtualization support,
    some didn't. Many different buses were used. Some had a PDP-11 as a
    frontend processor, some in the form of J-11 single-chip CPU. Some
    trapped to "BIOS" code in ROM for the console stuff. Some used TTL chips,
    some used ECL macro arrays in various technologies, some used custom ECL
    chips (VAX 9000 went really crazy there). Some used custom CMOS on
    anywhere from a handful of chips to just one.

    Some microcoded the operand decoding and the operation executing in the
    same state machine.

    Others had a semi-independent state machine for instruction fetch, operand
    decode and operand fetch/store/address computation.

    The number of internal registers available to the microcode varied.
    How many ports they had varied.

    Some had single-bit shifters, some had funnel shifters. Some had to
    implement floating-point on top of a single-bit shifter and an adder,
    others had dedicated chips with Booth multipliers using redundant
    representation.

    Oh, and write buffers. Took DEC a real long time to put a
    decent number of write buffers into their machines, which seems odd given how
    many writes in a row you need to handle exceptions and CALLS/CALLG.

    Etc.

    >>> I've never heard a VAX referred to as "simple," and a VAX
    >>> microarchitecture is not "simple."

    >>
    >> Compared to a modern CPU, they were.

    >
    >
    > In some ways, and not in others.


    Actually, in all ways.

    > As far as building your own microcode "real" VAX (as I might infer the goal
    > to be here), do have fun with that. The other obvious approach to building
    > your own ("real") VAX is via an FPGA VAX, of course.


    I know, but that wouldn't be quite so fun.

    I don't expect it to take all that much space or require all that many
    chips. I can use SRAMs for the registers, for the register renamer, for
    flag generation (as a PLA replacement), for the microopstore (that keeps
    mostly linear sequences of microops for complicated operand specifiers,
    complicated instructions, boot, and the
    trap/exception/fault/interrupt/machine check stuff).

    I expect to use an 8051 to write the right content into the SRAMs during
    boot, either from a set of (E?)EPROMs or downloaded from a PC.

    (I am also writing a Verilog version of the microarchitecture, to make
    unit testing easier.)

    > There are certainly ways to greatly simplify the VAX implementations and
    > the architectures -- that said, the installed base tended to preclude that
    > sort of thing. If you're going to break compatibility (a little), you might
    > as well break it (a lot).


    I am not even /considering/ changing the architecture. I want it to be
    completely compatible. Yeah, I know, they weren't completely compatible
    with each other but they all implemented the architecture modulo various
    subsets and the inevitable bug that would creep in. It's going to look
    most like a CVAX:

    o Console stuff in ROM (and I'm not going to copy the actual VAX monitors
      -- as long as it can do character I/O then I'm happy)
    o no virtualization support
    o no vector support
    o 30-bit physical address space
    o no PDP-11 compatibility mode
    o little to no multicpu support
    o strong memory ordering
    o cache coherence, both internally, and with I/O
    o single-level TLBs, shared between code and data -- direct-mapped like
      the 730, 8800, uVAX I, and the VAX 9000.
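
    For the direct-mapped TLB in the last item, a minimal Python sketch
    follows. The 512-byte page size is the VAX page size; the 64-entry
    default and the class and method names are made up for illustration and
    are not taken from any of the machines listed.

```python
PAGE_SIZE = 512     # VAX pages are 512 bytes
TLB_ENTRIES = 64    # hypothetical size, chosen only for illustration


class DirectMappedTLB:
    """One slot per index; a new translation evicts whatever was there."""

    def __init__(self, entries=TLB_ENTRIES):
        self.entries = entries
        self.slots = [None] * entries    # each slot: (tag, page_frame_number)

    def lookup(self, vaddr):
        """Return the physical address, or None on a TLB miss."""
        vpn = vaddr // PAGE_SIZE
        index, tag = vpn % self.entries, vpn // self.entries
        slot = self.slots[index]
        if slot is not None and slot[0] == tag:
            return slot[1] * PAGE_SIZE + vaddr % PAGE_SIZE
        return None                      # miss: caller walks the page table

    def fill(self, vaddr, pfn):
        """Install a translation, replacing any entry with the same index."""
        vpn = vaddr // PAGE_SIZE
        self.slots[vpn % self.entries] = (vpn // self.entries, pfn)
```

    Two virtual pages whose page numbers differ by a multiple of the entry
    count share a slot and evict each other, which is the price of the
    simple lookup.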

    Multiplication/division and floating-point are probably going to be
    implemented last. And they are going to be slooow because my shifter is
    only going to handle one bit at a time -- but some of the VLSI
    implementations had the same problem.
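
    To put a rough number on "slooow": with a one-bit shifter, a
    shift-and-add multiply needs one step per multiplier bit. A minimal
    Python sketch, assuming one shift-and-add step per microcycle (an
    illustrative assumption, not a figure from any VAX microcode):

```python
def shift_add_multiply(a, b, width=32):
    """Unsigned multiply using only a 1-bit shift and an add per step.

    Returns (product, steps). Counting one step per bit is a modelling
    assumption for illustration, not a measured VAX cycle count.
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    product = 0
    steps = 0
    for _ in range(width):
        if b & 1:          # low multiplier bit set: add the shifted multiplicand
            product += a
        a <<= 1            # shift multiplicand left by one bit
        b >>= 1            # shift multiplier right by one bit
        steps += 1
    return product, steps
```

    Under that assumption a 32-bit multiply always costs 32 steps, before
    any operand fetch or sign handling is counted.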

    Exceptions should end up being lightning fast compared to the old
    implementations because I do not intend to have a "modification stack"
    that gets updated by autoincrement/autodecrement addressing opspecs and
    then have to be unwound by the exception-handling microcode. Instead, I
    intend to use register renaming in such a way that 1) every register
    modification goes to a new physical register and 2) there are sufficiently
    many more physical than architected registers that all the original
    register values are kept untouched by even the worst case autoupdating
    instruction. When the exception comes, the register renamer can then
    "just" rollback to before the first register write.

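    The two renaming rules above can be sketched in a few lines of Python.
    The 16 architected registers match the VAX; the 32 physical registers
    and every name in the code are assumptions made for illustration only.

```python
class Renamer:
    """Maps architected registers to physical ones; writes never overwrite."""

    def __init__(self, n_arch=16, n_phys=32):
        self.map = list(range(n_arch))            # arch reg -> phys reg
        self.free = list(range(n_arch, n_phys))   # unused physical regs
        self.phys = [0] * n_phys                  # physical register file
        self.checkpoint = None

    def begin_instruction(self):
        # Snapshot the mapping; old values stay put in their physical regs.
        self.checkpoint = (list(self.map), list(self.free))

    def read(self, r):
        return self.phys[self.map[r]]

    def write(self, r, value):
        p = self.free.pop(0)      # rule 1: every write gets a fresh register
        self.phys[p] = value
        self.map[r] = p

    def commit(self):
        # Instruction completed: anything no longer mapped becomes free.
        self.free = [p for p in range(len(self.phys)) if p not in self.map]
        self.checkpoint = None

    def rollback(self):
        # Exception: restore the mapping from before the first write.
        self.map, self.free = self.checkpoint
        self.checkpoint = None
```

    Rule 2 shows up as the requirement that the free list never runs dry
    within one instruction; this sketch simply raises an IndexError if it
    does.
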
    The same register renamer should also allow a special lightning fast
    interrupt mode where the interrupt code gets to do whatever it likes with
    5-8 fresh registers it doesn't have to save first. Such an interrupt mode
    would have been very handy for the system-on-a-chip implementations (for
    any low-cost VAX, actually).

    It should also allow for any temporaries needed by the microops that an
    instruction breaks down to.

    If I get this far, then I might consider adding:
    o a bit in the page table so it is cheaper to distinguish between
      "read/executed", "dirty (written to)" and "not touched at all".
    o a 4K page size. Perhaps combined with a traditional page table tree,
      perhaps not.
    o a CPUID-like instruction with flags indicating what stuff is supported
      and what isn't. Having to look at model numbers and revision numbers
      for that isn't particularly clever.
    o IEEE floating-point
    o real sqrt, 1/x, 1/sqrt, sin/cos/tan instructions

    I might (probably will) implement page table snooping so REI doesn't have
    to flush the TLBs. If I add caches, they will be virtually indexed and
    physically tagged and I will have to implement some snooping stuff that
    automatically invalidates "mirrored" cache lines so no byte of memory can
    be in more than one cache line at a time.

    > Bootstrapping even OpenVMS VAX is only testing a very small part of the VAX
    > architecture, for instance. Applications tended to depend on far more of the
    > architecture, some more so than others.


    What I would really like to get my grubby hands on is AXE, the
    architectural verifier they used at DEC

    > Thirty years ago, folks usually wrote assembler or microcode because they
    > needed to, and they needed the extra performance for various tasks. (I was
    > consulting for a place that was accepting gobs of data off communications
    > links, and they were running the SBI and the VAX-11/780 processor and the
    > comm-boards flat-out -- the speeds and feeds are nothing now, but were right
    > at the SBI bandwidth back then. WCS was one of the few ways available toward
    > additional performance, and was the choice at this site prior to the
    > availability of the VAX-11/785. I expect they continued to use WCS after the
    > inevitable series of hardware upgrades, too.)


    Wasn't it a problem that the microcode was *hard* to port between
    generations? And that the programmers often implemented too much in
    microcode so the implementation took too long (so a new machine generation
    would arrive by the time they were done)?

    > There are still applications that are processor-limited, but there are
    > relatively fewer of them -- various other considerations have moved to the
    > forefront at many sites.


    I'd say gcc often is processor limited.

    -Peter

  12. Re: VAX floating-point instruction timing?

    On Tue, 5 Sep 2006, Hoff Hoffman wrote:

    > The old black processor binders (black five-ring binders with a white grill
    > pattern) had the microcode listings for various of the VAX-11 series boxes.
    > And the WCS option had tools and such for one of the boxes.
    >
    > BitSavers claims to have:
    >
    > dec/vax/AA-H306B_780uprogToolsMar82.pdf VAX-11/780 Microprogramming Tools
    > User's Guide (Mar 1982)


    Got it -- and the other 780 PDFs.

    I am apparently now in possession of a complete set of schematics and
    a parts list for the VAX 11/780.

    Apparently, the 780 had 96-bit microinstructions but there were other
    control ROMs here and there, among other things to decode the operand size
    and type. And the floating-point accelerator had 48-bit
    microinstructions.

    Oooh, and there are (very!) nice block diagrams.

    Here's a quote from EK-11780-UG-001_780hwUG.pdf, p.1-11:

    1.3.5 Floating-Point Accelerator Option
    The floating-point accelerator is an optional high-speed
    processor extension. When included in the processor, the floating-point
    accelerator executes the addition, subtraction, multiplication, and
    division instructions that operate on single- and double-precision
    floating-point operands, including the special EMOD and POLY instructions
    in both single- and double-precision formats. Additionally, the
    floating-point accelerator enhances the performance of 32-bit integer
    multiply instructions.

    The processor does not have to include the floating-point accelerator to
    execute floating-point operand instructions. The floating-point
    accelerator can be added or removed without changing any existing
    software.

    When the floating-point accelerator is included in the processor, a
    floating-point operand register-to-register add instruction takes as
    little as 800 ns to execute. A register-to-register multiply instruction
    takes as little as 1 us. The inner loop of the POLY instruction takes
    approximately 1 us/degree of polynomial.

    [end quote]

    So now I have a very good idea of how fast the floating-point and
    multiplication stuff was on the VAX 11/780 (with FPA) and also of how it
    was implemented.
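
    The quoted times can be turned into approximate cycle counts. The sketch
    below assumes the commonly cited 200 ns microcycle for the VAX-11/780,
    a figure that is not given in this thread; the function name is
    illustrative. Since the manual says "as little as", the results are
    best-case numbers.

```python
CYCLE_NS = 200  # assumed VAX-11/780 microcycle; not stated in the thread


def ns_to_cycles(ns, cycle_ns=CYCLE_NS):
    """Convert a quoted execution time to a whole number of microcycles."""
    return -(-ns // cycle_ns)   # ceiling division on integers


# Best-case times quoted from the hardware user's guide, in nanoseconds.
quoted_ns = {
    "add (reg-reg)": 800,
    "multiply (reg-reg)": 1000,
    "POLY inner loop, per degree": 1000,
}

for name, ns in quoted_ns.items():
    print(f"{name}: {ns} ns is about {ns_to_cycles(ns)} cycles")
```

    With that assumption the add comes out at 4 cycles and the multiply and
    each POLY degree at 5.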

    Haven't found the actual microcode yet, though. Maybe it'll turn up
    later.

    Thank you for the help!

    -Peter

  13. Re: VAX floating-point instruction timing?

    Peter "Firefly" Lund wrote:
    > On Wed, 6 Sep 2006, Hoff Hoffman wrote:
    >
    >> Not all clock cycles are the same.
    >>
    >> Not all clock ticks are the same.

    >
    > So what do you mean by each of the words "cycle" and "tick" -- just to
    > make sure I don't misunderstand anything?


    The system clock rotates forward as ticks arrive, and not all ticks are the
    same size on Alpha. On Alpha, the clock ticks arrive periodically, but there's
    potentially a weird-length clock "fix-up" tick.

    VAX processor cycles process parts of microcode, and not all cycles are the same.

    > Or do you actually mean that the cycle length of a clock to the same
    > pipe stage may differ from one cycle to the next?


    The cycle varied.

    The microword itself could ask for a longer cycle.

    320 vs 480ns, IIRC.

    Guess which VAX. :-)

    > If the latter, then I'm very sure you are wrong about the Alpha.


    AFAIK, Alpha isn't microcoded.

    There is PALcode, but that's not microcode.

    Alpha clock ticks vary.

    AFAIK, Alpha cycle times do not vary.


    >> Most folks assume that the VAX systems are all similar, and from a

    >
    > I know they aren't.
    >
    > Some microcoded everything, some thankfully used traps and emulation in
    > macrocode. Some had PDP-11 compatibility, others didn't.


    All had PDP-11 compatibility. Some had it in hardware. Most didn't.

    > Some had
    > vector instructions, most thankfully didn't. Some had virtualization
    > support, some didn't.


    I'm not aware of any VAX systems that shipped with virtualization support.

    > What I would really like to get my grubby hands on is AXE, the
    > architectural verifier they used at DEC


    The AXE and MAX architecture tools are not cleared for release. I know. I
    tried. (I know where the test suites are, and where the results are.)

    > Wasn't it a problem that the microcode was *hard* to port between
    > generations? And that the programmers often implemented too much in
    > microcode so the implementation took too long (so a new machine
    > generation would arrive by the time they were done)?


    And folks that tuned instruction paths also got into trouble, too.




  14. Re: VAX floating-point instruction timing?

    On Wed, 6 Sep 2006, Hoff Hoffman wrote:

    >> So what do you mean by each of the words "cycle" and "tick" -- just to
    >> make sure I don't misunderstand anything?

    >
    > The system clock rotates forward as ticks arrive, and not all ticks are the
    > same size on Alpha. On Alpha, the clock ticks arrive periodically, but
    > there's potentially a weird-length clock "fix-up" tick.
    >
    > VAX processor cycles process parts of microcode, and not all cycles are the
    > same.


    Ok, I give up. I don't understand what you are saying. Could you
    rephrase it using more current terminology, please?

    Or use an example but be /very/ explicit and detailed?

    >> Or do you actually mean that the cycle length of a clock to the same pipe
    >> stage may differ from one cycle to the next?

    >
    > The cycle varied.
    >
    > The microword itself could ask for a longer cycle.
    >
    > 320 vs 480ns, IIRC.
    >
    > Guess which VAX. :-)


    VAX-11/750?

    >
    >> If the latter, then I'm very sure you are wrong about the Alpha.

    >
    > AFAIK, Alpha isn't microcoded.
    >
    > There is PALcode, but that's not microcode.


    I actually knew that -- it is a bit harder to read than most Alpha machine
    code but it is not impossible.

    > Alpha clock ticks vary.


    And again I give up. Are you talking about time borrowing or not?

    >>> Most folks assume that the VAX systems are all similar, and from a

    >>
    >> I know they aren't.
    >>
    >> Some microcoded everything, some thankfully used traps and emulation in
    >> macrocode. Some had PDP-11 compatibility, others didn't.

    >
    > All had PDP-11 compatibility. Some had it in hardware. Most didn't.


    Ok, fine. Some had to use a pure software emulator, others didn't.
    In other words, some had microarchitectural/microcode support for it and
    others didn't.

    >> Some had vector instructions, most thankfully didn't. Some had
    >> virtualization support, some didn't.

    >
    > I'm not aware of any VAX systems that shipped with virtualization support.


    I'll have to check with the architecture reference (STD-032) but I'm
    pretty sure one or two got out.

    -Peter

  15. Re: VAX floating-point instruction timing?

    In article <44ff8f23@usenet01.boi.hp.com>, Hoff Hoffman writes:
    >
    > Guess which VAX. :-)
    >


    Well, there were microcode stalls in some of the VAX 11 series, but
    I think they were in terms of even microcode cycles.

    Nmemonics released an accelerator for the 11/750 that supposedly
    had the stalls removed, but some customers found some disk drives
    wouldn't work with it. They pulled the ads for a couple of
    months and the new ads came out with mention of a supported
    peripherals list.

    The 11/785 was rumored to be an 11/780 with the stalls removed,
    pieces like the massbus adapter had been worked on to handle this.

    And there were rumors that the 11/750 originally ran faster than the
    11/780 during development, but DEC added stalls just to make it
    slower. I think more likely DEC wanted those disks to work.

    So I would guess the 9000 which had a "RISC-like" processor in
    each CPU block might be the one with all the weirdest stuff.


  16. Re: VAX floating-point instruction timing?

    Peter "Firefly" Lund wrote:
    > On Wed, 6 Sep 2006, Hoff Hoffman wrote:
    >> VAX processor cycles process parts of microcode, and not all cycles
    >> are the same.

    >
    > Ok, I give up. I don't understand what you are saying. Could you
    > rephrase it using more current terminology, please?



    Microcode requires some number of cycles (usually one or more, but possibly
    less when there are some brains in the i-stream; quite obviously a typical
    microcode-based box will require more than one cycle for various operations, as
    otherwise you have something which looks like the EPIC VLIW "superscalar
    microcode engines") to interpret and perform a microcode-based instruction, but
    there are VAX systems where the duration of the cycles involved in this process
    -- not the numbers of cycles required, but the divisor off the crystal that is
    used to generate the length of a cycle -- can change. The cycle duration is
    individually variable in duration, and -- in the case of the particular VAX
    processor(s) involved -- in two sizes, and as determined by the microcode itself.

    There is a standard clock cycle, and -- when the microcode needs it -- there
    is a stretched clock cycle. More often than not, the particular VAX involved
    runs on the shorter of the two clock cycle durations. Obviously. But when the
    microcode needs it, the particular microengine can wait slightly longer for
    completion of the particular microinstruction.
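
    A minimal sketch of the difference, assuming the 320 ns and 480 ns figures
    recalled earlier in the thread (given there with an "IIRC") and a
    hypothetical per-microword flag that requests the long cycle. The stall
    model is only for comparison and does not describe any particular VAX.

```python
SHORT_NS = 320   # standard cycle, as recalled earlier in the thread
LONG_NS = 480    # stretched cycle, as recalled earlier in the thread


def stretched_time_ns(wants_long_cycle):
    """Total time when each microword picks its own cycle length.

    wants_long_cycle: one boolean per microword executed, True if that
    microword asks for the stretched cycle (hypothetical encoding).
    """
    return sum(LONG_NS if slow else SHORT_NS for slow in wants_long_cycle)


def stalled_time_ns(wants_long_cycle):
    """Comparison model: fixed cycle length, and a slow microword stalls
    for one whole extra cycle instead of stretching its own."""
    return sum(2 * SHORT_NS if slow else SHORT_NS for slow in wants_long_cycle)
```

    For three microwords where only the last one is slow, the stretched model
    gives 320 + 320 + 480 = 1120 ns, while the whole-cycle stall model gives
    1280 ns. The instruction takes three cycles either way; only the length of
    one of them changes.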








  17. Re: VAX floating-point instruction timing?

    On Thu, 7 Sep 2006, Hoff Hoffman wrote:

    > Microcode requires some number of cycles (usually one or more, but possibly
    > less when there are some brains in the i-stream; quite obviously a typical
    > microcode-based box will require more than one cycle for various operations,
    > as otherwise you have something which looks like the EPIC VLIW "superscalar
    > microcode engines") to interpret and perform a microcode-based instruction,
    > but there are VAX systems where the duration of the cycles involved in this
    > process -- not the numbers of cycles required, but the divisor off the
    > crystal that is used to generate the length of a cycle -- can change. The
    > cycle duration is individually variable in duration, and -- in the case of
    > the particular VAX processor(s) involved -- in two sizes, and as determined
    > by the microcode itself.
    >
    > There is a standard clock cycle, and -- when the microcode needs it --
    > there is a stretched clock cycle. More often than not, the particular VAX
    > involved runs on the shorter of the two clock cycle durations. Obviously.
    > But when the microcode needs it, the particular microengine can wait slightly
    > longer for completion of the particular microinstruction.


    Thank you, that explanation worked for me.

    It sounds like stalls would be simpler (but perhaps a bit slower) -- and
    then again, no, I can see how they could also be more complicated depending
    on the microarchitecture.

    -Peter

  18. Re: VAX floating-point instruction timing?

    Bob Koehler wrote:


    > The 11/785 was rumored to be an 11/780 with the stalls removed,
    > pieces like the massbus adapter had been worked on to handle this.


    Nope. The 780 was implemented in standard TTL. The 785 was intended to
    be the same design, using Fast TTL to achieve a 50% gain in speed with
    negligible design effort.

    (It didn't work out as planned, because cranking up the clock speed
    exposed a number of signal paths that would not work at the higher
    clock rate due to crosstalk. IIRC, most of the CPU boards required
    re-layout to improve the signal paths; for sure, the FPU boards all
    needed tweaking before the 785 delivered the same results as the 780.)

    --
    Cheers, Bob

  19. Re: VAX floating-point instruction timing?

    In message
    Bob Willard wrote:

    > Bob Koehler wrote:
    >
    >
    >> The 11/785 was rumored to be an 11/780 with the stalls removed,
    >> pieces like the massbus adapter had been worked on to handle this.

    >
    > Nope. The 780 was implemented in standard TTL. The 785 was intended to
    > be the same design, using Fast TTL to achieve a 50% gain in speed with
    > negligible design effort.
    >
    > (It didn't work out as planned, because cranking up the clock speed
    > exposed a number of signal paths that would not work at the higher
    > clock rate due to crosstalk. IIRC, most of the CPU boards required
    > re-layout to improve the signal paths; for sure, the FPU boards all
    > needed tweaking before the 785 delivered the same results as the 780.)
    >


    Would this be related to a couple of engineers arriving to do an ECO
    on our 785, which consisted of some wire-wrapping on, I believe, the
    backplane?

    It seemed to work OK both before and after surgery.


    --
    Alan Adams, from Northamptonshire
    alan.adams@orchard-way.freeserve.co.uk
    http://www.nckc.org.uk/

  20. Re: VAX floating-point instruction timing?

    In article , Alan Adams writes:
    >
    > Would this be related to a couple of engineers arriving to do an ECO
    > on our 785, which consisted of some wire-wrapping on, I believe, the
    > backplane?
    >
    > It seemed to work OK both before and after surgery.


    I think it's related to upgrading some old 11/780 we had to a later
    rev, which required backplane changes, microcode changes, and later
    revs of the massbus adapter.

    In the process we found two 11/780s with broken backplanes and one
    with incompatible revs on various parts.

    I was surprised how low the price for a new backplane was.

