Compiler error with Inline assembly - Linux



Thread: Compiler error with Inline assembly

  1. Compiler error with Inline assembly

    Hi,

    I'm trying to implement AMD's suggestion for multiplying two
    64-bit operands into a 128-bit result.

    This is the code they give (in the "Software Optimization Guide
    for AMD Hammer Processor"):

    ; 64bitalu_64x64 (int * a, int * b, int *c)
    mov rax, [rcx]
    mul [rdx]
    mov [r8], rax
    mov [r8+8], rdx

    They include more code around it, since theirs is a full procedure.
    What I'm trying to do is to "inline" this in a function
    mul64x64 (int * a, int * b, int * c)
    that returns with *c containing the result of *a times *b.

    I translated the syntax following instructions from some online
    tutorials, so here's what I have:

    void mul128 (int * a, int * b, int * c)
    {
    __asm__ (
    "mov (%rcx), %rax\n"
    "mul (%rdx)\n"
    "mov %rax, (%r8)\n"
    "mov %rdx, 0x8(%r8)\n");
    }

    I get the following error for the second line, "mul (%rdx)":

    Error: no instruction mnemonic suffix given and no register operands;
    can't size instruction

    I tried adding %rax and (%rax) before and after (all four
    combinations), but nothing works.

    Any ideas?

    Thanks,

    Carlos
    --

  2. Re: Compiler error with Inline assembly

    On Nov 10, 1:26 pm, Carlos Moreno wrote:

    > I'm trying to implement AMD's suggestion for multiplying two
    > 64-bit operands into a 128-bit result.
    >
    > This is the code they give (in the "Software Optimization Guide
    > for AMD Hammer Processor"):
    >
    > ; 64bitalu_64x64 (int * a, int * b, int *c)
    > mov rax, [rcx]
    > mul [rdx]
    > mov [r8], rax
    > mov [r8+8], rdx


    This code is messed up. In one place, it uses 'a', 'b', and 'c', and
    then later it uses 'rax', 'rdx', and 'rcx'. You need to fix the code
    to be consistent.

    > They put more stuff, since that's a procedure. What I'm trying
    > to do is to "inline" this in a function mul64x64 (int * a, int * b,
    > int *c)
    > that returns with *c containing the result of *a times *b.
    >
    > I translated the syntax following instructions from some online
    > tutorials, so here's what I have:
    >
    > void mul128 (int * a, int * b, int * c)
    > {
    >     __asm__ (
    >     "mov (%rcx), %rax\n"
    >     "mul (%rdx)\n"
    >     "mov %rax, (%r8)\n"
    >     "mov %rdx, 0x8(%r8)\n");
    >
    > }


    You forgot to use 'a' 'b' and 'c', like you promised and used 'rax',
    'rcx', and 'rdx' instead. What said 'a' was in 'rax'?

    > I get the following error for the second line, "mul (%rdx)":
    >
    > Error: no instruction mnemonic suffix given and no register operands;
    > can't size instruction
    >
    > I tried adding %rax and (%rax) before and after (all four
    > combinations),
    > but nothing works.
    >
    > Any ideas?


    You need to use proper GCC inline syntax. You need to map the input
    and output parameters to registers, and you need to tell GCC what can
    and cannot be considered valid across the inline assembly.

    http://www.ibiblio.org/gferg/ldp/GCC...bly-HOWTO.html

    DS
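
A rough sketch of what "mapping the parameters to registers" can look like with GCC extended asm, assuming an x86-64 target and GCC or Clang. The name mul64x64, the by-value unsigned parameters, and the hi out-parameter are choices made for this illustration, not something from AMD's guide. The constraints tell the compiler that a must be in RAX, that b may be in any register or in memory, and that the results come back in RAX and RDX, so nothing is assumed about where the arguments happen to live.

```c
#include <stdint.h>

/* Hypothetical helper: returns the low 64 bits of a * b (unsigned) and
 * stores the high 64 bits through *hi. */
static inline uint64_t mul64x64(uint64_t a, uint64_t b, uint64_t *hi)
{
    uint64_t lo, high;
#if defined(__x86_64__)
    __asm__ ("mulq %[b]"                /* RDX:RAX = RAX * operand */
             : "=a" (lo), "=d" (high)   /* outputs: RAX -> lo, RDX -> high */
             : "a" (a), [b] "rm" (b)    /* inputs: a in RAX, b in reg or mem */
             : "cc");                   /* mul changes the flags */
#else
    /* Fallback for other 64-bit targets, using the __int128 extension. */
    unsigned __int128 p = (unsigned __int128)a * b;
    lo = (uint64_t)p;
    high = (uint64_t)(p >> 64);
#endif
    *hi = high;
    return lo;
}
```

Because the outputs are declared, the compiler knows RAX and RDX are overwritten and will not keep live values there across the asm statement.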

  3. Re: Compiler error with Inline assembly

    Carlos Moreno writes:

    > Hi,
    >
    > I'm trying to implement AMD's suggestion for multiplying two
    > 64-bit operands into a 128-bit result.
    >
    > This is the code they give (in the "Software Optimization Guide
    > for AMD Hammer Processor"):
    >
    > ; 64bitalu_64x64 (int * a, int * b, int *c)
    > mov rax, [rcx]
    > mul [rdx]
    > mov [r8], rax
    > mov [r8+8], rdx
    >
    > They put more stuff, since that's a procedure. What I'm trying
    > to do is to "inline" this in a function mul64x64 (int * a, int * b,
    > int *c)
    > that returns with *c containing the result of *a times *b.


    Why do you pass pointers to 'a' and 'b'? Furthermore, int is 32 bits,
    not 64. Use long for 64 bits (or use int64_t). To return 128 bits,
    you need to specify 'c' as a pointer to long/int64_t, and make sure it
    has room for two elements. I'd use a prototype like this:

    void mul128(int64_t a, int64_t b, int64_t c[2]);

    > I translated the syntax following instructions from some online
    > tutorials, so here's what I have:
    >
    > void mul128 (int * a, int * b, int * c)
    > {
    > __asm__ (
    > "mov (%rcx), %rax\n"
    > "mul (%rdx)\n"
    > "mov %rax, (%r8)\n"
    > "mov %rdx, 0x8(%r8)\n");
    > }
    >
    > I get the following error for the second line, "mul (%rdx)":
    >
    > Error: no instruction mnemonic suffix given and no register operands;
    > can't size instruction
    >
    > I tried adding %rax and (%rax) before and after (all four
    > combinations), but nothing works.


    You need to change mul to mulq for 64-bit multiplication (mull for
    32-bit). Your function also makes invalid assumptions about which
    registers hold the operands (if the function is inlined, they could be
    anywhere). The proper way is to let gcc tell you.

    The function should look something like this (untested):

    void mul128(int64_t a, int64_t b, int64_t c[2])
    {
    __asm__ (
    "imulq %1 \n\t"
    "mov %rax, (%2) \n\t"
    "mov %rdx, 0x8(%2) \n\t"
    : "a"(a), "g"(b) : "r"(c));
    }

    --
    Måns Rullgård
    mans@mansr.com
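
To illustrate the suffix point on its own: when the only operand of mul is a memory reference, the assembler has no register name to infer the size from, which is what the "can't size instruction" message is about. The sketch below keeps the pointer-based shape of the original question but switches to uint64_t; the name mul128_ptr and the "m" constraint are assumptions made for this example, and it targets x86-64 with GCC or Clang.

```c
#include <stdint.h>

/* Hypothetical pointer-based variant: c[0] receives the low half of
 * (*a) * (*b), c[1] the high half. Unsigned multiplication. */
static inline void mul128_ptr(const uint64_t *a, const uint64_t *b,
                              uint64_t c[2])
{
#if defined(__x86_64__)
    uint64_t lo, hi;
    /* %[b] expands to a memory reference, so the q suffix is what tells
     * the assembler this is a 64-bit multiply. */
    __asm__ ("mulq %[b]"
             : "=a" (lo), "=d" (hi)
             : "a" (*a), [b] "m" (*b)
             : "cc");
    c[0] = lo;
    c[1] = hi;
#else
    /* Fallback for other 64-bit targets, using the __int128 extension. */
    unsigned __int128 p = (unsigned __int128)*a * *b;
    c[0] = (uint64_t)p;
    c[1] = (uint64_t)(p >> 64);
#endif
}
```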

  4. Re: Compiler error with Inline assembly

    Måns Rullgård writes:

    > Carlos Moreno writes:
    >
    >> Hi,
    >>
    >> I'm trying to implement AMD's suggestion for multiplying two
    >> 64-bit operands into a 128-bit result.
    >>
    >> This is the code they give (in the "Software Optimization Guide
    >> for AMD Hammer Processor"):
    >>
    >> ; 64bitalu_64x64 (int * a, int * b, int *c)
    >> mov rax, [rcx]
    >> mul [rdx]
    >> mov [r8], rax
    >> mov [r8+8], rdx
    >>
    >> They put more stuff, since that's a procedure. What I'm trying
    >> to do is to "inline" this in a function mul64x64 (int * a, int * b,
    >> int *c)
    >> that returns with *c containing the result of *a times *b.

    >
    > Why do you pass pointers to 'a' and 'b'? Furthermore, int is 32 bits,
    > not 64. Use long for 64 bits (or use int64_t). To return 128 bits,
    > you need to specify 'c' as a pointer to long/int64_t, and make sure it
    > has room for two elements. I'd use a prototype like this:
    >
    > void mul128(int64_t a, int64_t b, int64_t c[2]);
    >
    >> I translated the syntax following instructions from some online
    >> tutorials, so here's what I have:
    >>
    >> void mul128 (int * a, int * b, int * c)
    >> {
    >> __asm__ (
    >> "mov (%rcx), %rax\n"
    >> "mul (%rdx)\n"
    >> "mov %rax, (%r8)\n"
    >> "mov %rdx, 0x8(%r8)\n");
    >> }
    >>
    >> I get the following error for the second line, "mul (%rdx)":
    >>
    >> Error: no instruction mnemonic suffix given and no register operands;
    >> can't size instruction
    >>
    >> I tried adding %rax and (%rax) before and after (all four
    >> combinations), but nothing works.

    >
    > You need to change mul to mulq for 64-bit multiplication (mull for
    > 32-bit). Your function also makes invalid assumptions about which
    > registers hold the operands (if the function is inlined, they could be
    > anywhere). The proper way is to let gcc tell you.
    >
    > The function should look something like this (untested):


    Sorry, it should be more like this:

    void mul128(int64_t a, int64_t b, int64_t c[2])
    {
    __asm__ (
    "imulq %1 \n\t"
    "mov %rax, (%2) \n\t"
    "mov %rdx, 0x8(%2) \n\t"
    :: "a"(a), "g"(b), "r"(c) : "memory");
    }

    --
    Måns Rullgård
    mans@mansr.com

  5. Re: Compiler error with Inline assembly

    Carlos Moreno wrote:

    > Hi,
    >
    > I'm trying to implement AMD's suggestion for multiplying two
    > 64-bit operands into a 128-bit result.


    GCC supports operations on 128 bit integers (at least on the AMD64
    architecture); for instance:

    #include <stdint.h>
    typedef unsigned int uint128_t __attribute__((mode(TI)));

    uint128_t mul128(uint64_t a, uint64_t b)
    {
    return (uint128_t) a * b;
    }

    results in the following code:

    mul128:
    movq %rsi, %rax
    mulq %rdi
    ret

    --
    Huibert
    "The Commercial Channel! All commercials all the time.
    An eternity of useless products to rot your skeevy little mind, forever!"
    -- Mike the TV (Reboot)
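
For anyone who wants the two 64-bit halves out of such a product without assembly, here is a small sketch. It assumes a current GCC or Clang on a 64-bit target, where the unsigned __int128 extension type is available; the names u128, mul128_wide, lo64 and hi64 are made up for this example.

```c
#include <stdint.h>

/* unsigned __int128 is a GCC/Clang extension on 64-bit targets. */
typedef unsigned __int128 u128;

/* Widen one operand first so the multiplication is done in 128 bits. */
static inline u128 mul128_wide(uint64_t a, uint64_t b)
{
    return (u128)a * b;
}

/* Split a 128-bit value into its low and high 64-bit halves. */
static inline uint64_t lo64(u128 x) { return (uint64_t)x; }
static inline uint64_t hi64(u128 x) { return (uint64_t)(x >> 64); }
```

A caller would write something like `u128 p = mul128_wide(a, b);` and then store `lo64(p)` and `hi64(p)` wherever the result is needed.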

  6. Re: Compiler error with Inline assembly

    On Nov 10, 5:06 pm, Måns Rullgård wrote:

    > > ; 64bitalu_64x64 (int * a, int * b, int *c)
    > > mov rax, [rcx]
    > > mul [rdx]
    > > mov [r8], rax
    > > mov [r8+8], rdx

    >
    > > They put more stuff, since that's a procedure. What I'm trying
    > > to do is to "inline" this in a function mul64x64 (int * a, int * b,
    > > int *c)
    > > that returns with *c containing the result of *a times *b.

    >
    > Why do you pass pointers to 'a' and 'b'?


    Well, it wasn't me :-)

    That was more or less verbatim the example from AMD's document;
    they wrote it as i64alu64x64 (int * a, int * b, int *c)

    Which I assumed would be directly associated to the way a C
    compiler would treat the parameters --- IOW, I assumed that
    everything was already in place and that that little fragment
    of ASM code would deal directly with parameters passed by a
    C or C++ function written with that same prototype.

    > Furthermore, int is 32 bits,
    > not 64. Use long for 64 bits


    Actually, long is *also* 32 bits!!

    Anyway yes, I was aware of that --- but again, I had assumed
    that since AMD's sample was written like that, this was already
    taken into account; in my test program, I was trying to test it
    like this:

    int a[2] = {0xBLAH, 0xBLAL}; // Hope you understand the pun!
    int b[2] = { ... };
    int c[4];

    mul128 (a, b, c);

    And was assuming that it would work.

    > The function should look something like this (untested):
    > [ ... ]


    I'm going to give it a try (well, the corrected version you sent
    in your follow-up message).

    Thanks!

    Carlos
    --

  7. Re: Compiler error with Inline assembly


    > > I'm trying to implement AMD's suggestion for multiplying two
    > > 64-bit operands into a 128-bit result.

    >
    > GCC supports operations on 128 bit integers (at least on the AMD64
    > architecture);


    Aaahhh, thank you!!!

    I was actually going to include this as a "corollary" to my question
    (whether there was some way to do that without having to resort
    to assembler), but then forgot to do it ... Again, thanks for
    bringing it up! However:

    >   #include <stdint.h>
    >   typedef unsigned int uint128_t __attribute__((mode(TI)));


    Could you explain (or point me to the right reading) this line?
    (that is, I'm familiar with typedef, but not sure about the 128-bit
    issue, and pretty much the rest of the line, starting at the
    __attribute__ )

    Thanks again!

    Carlos
    --

  8. Re: Compiler error with Inline assembly

    Carlos Moreno writes:

    > On Nov 10, 5:06 pm, Måns Rullgård wrote:
    >
    >> > ; 64bitalu_64x64 (int * a, int * b, int *c)
    >> > mov rax, [rcx]
    >> > mul [rdx]
    >> > mov [r8], rax
    >> > mov [r8+8], rdx

    >>
    >> > They put more stuff, since that's a procedure. What I'm trying
    >> > to do is to "inline" this in a function mul64x64 (int * a, int * b,
    >> > int *c)
    >> > that returns with *c containing the result of *a times *b.

    >>
    >> Why do you pass pointers to 'a' and 'b'?

    >
    > Well, it wasn't me :-)
    >
    > That was more or less verbatim the example from AMD's document;
    > they wrote it as i64alu64x64 (int * a, int * b, int *c)


    Never take examples as anything more than that, especially not
    examples in CPU manuals.

    > Which I assumed would be directly associated to the way a C
    > compiler would treat the parameters --- IOW, I assumed that
    > everything was already in place and that that little fragment
    > of ASM code would deal directly with parameters passed by a
    > C or C++ written with that same prototype.


    If inlined, there is no call, so the standard calling conventions
    don't apply.

    >> Furthermore, int is 32 bits, not 64. Use long for 64 bits

    >
    > Actually, long is *also* 32 bits!!


    On Linux it is, and you're posting in a Linux group.

    --
    Måns Rullgård
    mans@mansr.com

  9. Re: Compiler error with Inline assembly


    > >> Furthermore, int is 32 bits, not 64. Use long for 64 bits

    >
    > > Actually, long is *also* 32 bits!!

    >
    > On Linux it is, and you're posting in a Linux group.


    ???

    Let me clarify --- as I mentioned, I was aware that int is 32 bits
    (with gcc/g++, on either an Intel 32-bit or AMD 64-bit machine).

    You mentioned that I should use long for 64-bit, but I insist: with
    gcc/g++, on an AMD64 *with a 64-bit Linux distribution*, long
    *is still* 32-bits!! One has to use "long long" to get the 64-bit
    (or of course, as you pointed out, int64_t or uint64_t)

    Did I misunderstand what you were saying?

    Thanks,

    Carlos
    --

  10. Re: Compiler error with Inline assembly

    Carlos Moreno wrote:

    >
    >> >>Furthermore, int is 32 bits, not 64. Use long for 64 bits

    >>
    >> > Actually, long is *also* 32 bits!!

    >>
    >> On Linux it is, and you're posting in a Linux group.

    >
    > ???
    >
    > Let me clarify --- as I mentioned, I was aware that int is 32 bits
    > (with gcc/g++, on either an Intel 32-bit or AMD 64-bit machine).
    >
    > You mentioned that I should use long for 64-bit, but I insist: with
    > gcc/g++, on an AMD64 *with a 64-bit Linux distribution*, long
    > *is still* 32-bits!! One has to use "long long" to get the 64-bit
    > (or of course, as you pointed out, int64_t or uint64_t)



    According to the x86-64 documentation a long is 8 bytes/64-bit.
    http://www.x86-64.org/documentation/abi.pdf

    You're probably not using the processor or the toolchain in 64-bit mode.

    Are you using the compat/IA-32 mode?

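    Windows does things differently: it uses an IL4P8 (LLP64) model, where
    int and long are 4 bytes and pointers are 8, because of some design
    issues. Linux and BSD generally use I4LP8 (LP64), where int is 4 bytes
    and long and pointers are 8.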


    --George
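
A quick way to settle this kind of question on a given machine is to ask the compiler directly. The sketch below is only an illustration: the helper name data_model is made up, and the three models it recognises are the common ones, not an exhaustive list.

```c
#include <stddef.h>

/* Hypothetical helper: names the data model the current build targets,
 * based on the sizes of int, long and pointers. */
static const char *data_model(void)
{
    if (sizeof(int) == 4 && sizeof(long) == 8 && sizeof(void *) == 8)
        return "LP64";    /* e.g. 64-bit Linux and BSD */
    if (sizeof(int) == 4 && sizeof(long) == 4 && sizeof(void *) == 8)
        return "LLP64";   /* e.g. 64-bit Windows */
    if (sizeof(int) == 4 && sizeof(long) == 4 && sizeof(void *) == 4)
        return "ILP32";   /* typical 32-bit targets, or -m32 builds */
    return "other";
}
```

Printing `data_model()` together with `sizeof(long)` from a small test program shows whether the toolchain is really building 64-bit code; a build with -m32 on a 64-bit machine would report ILP32.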

  11. Re: Compiler error with Inline assembly


    > > on an AMD64 *with a 64-bit Linux distribution*, long
    > > *is still* 32-bits!! One has to use "long long" to get the 64-bit
    > > (or of course, as you pointed out, int64_t or uint64_t)

    >
    > According to the x86-64 documentation a long is 8 bytes/64-bit.
    > http://www.x86-64.org/documentation/abi.pdf
    >
    > You're probably not using the processor or the toolchain in 64-bit mode.


    *Major* D'OH !!!! That was an extreme and complete brain-shutdown
    on my part --- *of course* long int is 64-bit !!!

    My confusion was that, when I got my very first AMD64 processor
    (a few years ago), I was expecting *int* to be 64-bit... You know,
    the usual idea that the size of int is supposed to be the "native"
    processor register width, etc. With 16-bit processors, int used to
    be 16 bits; with 32-bit processors, it is 32, so I was expecting int
    to be 64 bits (and possibly long int to be 128 bits). Right now, my
    brain "mistranslated" that memory of my initial surprise....

    Oh well, sorry for the confusion and the noise! :-(

    Carlos
    --

  12. Re: Compiler error with Inline assembly

    Am Mon, 10 Nov 2008 15:18:04 -0800 schrieb Carlos Moreno:

    > mul [rdx]


    you also forgot the value

    mul %rdx,%rax or similar

  13. Re: Compiler error with Inline assembly

    Carlos Moreno wrote:

    >> typedef unsigned int uint128_t __attribute__((mode(TI)));

    >
    > Could you explain (or point me to the right reading) this line?
    > (that is, I'm familiar with typedef, but not sure about the 128-bit
    > issue, and pretty much the rest of the line, starting at the
    > __attribute__ )


    Attributes are the way GCC allows you to request features that are not
    covered by the standard. In this case it requests that the type
    "uint128_t" will be using "mode" "TI".

    Modes are what GCC uses to represent the sizes of the various types, in
    this case TI means that GCC will allocate 4 (tetra) words (or 16 bytes)
    of storage to the type.

    You can read more about attributes (and other extensions) in the GCC
    documentation:

    info '(gcc)C Extensions'

    (Of course, using extensions ties you to the GCC compiler, but then
    again, so will inline assembly.)

    --
    Huibert
    "The Commercial Channel! All commercials all the time.
    An eternity of useless products to rot your skeevy little mind, forever!"
    -- Mike the TV (Reboot)
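
A small check of the size claim above, as a sketch that assumes GCC or Clang on a 64-bit target (the mode attribute is a compiler extension). The typedef name u128_ti and the helper high_half are invented for this example.

```c
#include <stdint.h>

/* GCC/Clang extension: an unsigned integer type with machine mode TI. */
typedef unsigned int u128_ti __attribute__((mode(TI)));

/* TI mode occupies 16 bytes, i.e. 128 bits; checked at compile time. */
_Static_assert(sizeof(u128_ti) == 16, "TI mode should be 128 bits wide");

/* Returns the upper 64 bits of the 128-bit product a * b. */
static inline uint64_t high_half(uint64_t a, uint64_t b)
{
    u128_ti p = (u128_ti)a * b;   /* widen first, then multiply */
    return (uint64_t)(p >> 64);
}
```

If the compiler or target did not support TI mode, the typedef itself would be rejected, so the static assertion mostly documents the expected width.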

  14. Re: Compiler error with Inline assembly

    Måns Rullgård wrote:


    > Sorry, it should be more like this:
    >
    > void mul128(int64_t a, int64_t b, int64_t c[2])
    > {
    > __asm__ (
    > "imulq %1 \n\t"
    > "mov %rax, (%2) \n\t"
    > "mov %rdx, 0x8(%2) \n\t"
    > :: "a"(a), "g"(b), "r"(c) : "memory");
    > }


    Still broken, it trashes %rax and %rdx.

    The memory clobber is overkill: if you specify memory output operands,
    gcc will do the right thing. Better still, leave the storing into the
    array to C/C++ code so that the compiler can optimise it away.
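
Putting those three remarks together, one possible shape for the signed version is sketched below, assuming x86-64 and GCC or Clang. RAX and RDX are declared as outputs, so the compiler knows they are overwritten. There is no "memory" clobber because the asm itself never touches memory on its own. The stores into c are ordinary C. The "rm" constraint is used instead of "g" as an assumption on my part: "g" also permits an immediate, which the one-operand form of imul does not accept.

```c
#include <stdint.h>

/* Signed 64x64 -> 128 multiply: c[0] gets the low half, c[1] the high
 * half of a * b. */
static inline void mul128(int64_t a, int64_t b, int64_t c[2])
{
    int64_t lo, hi;
#if defined(__x86_64__)
    __asm__ ("imulq %[b]"             /* RDX:RAX = RAX * operand (signed) */
             : "=a" (lo), "=d" (hi)   /* results declared as outputs */
             : "a" (a), [b] "rm" (b)  /* register or memory, no immediate */
             : "cc");                 /* imul changes the flags */
#else
    /* Fallback for other 64-bit targets, using the __int128 extension. */
    __int128 p = (__int128)a * b;
    lo = (int64_t)(uint64_t)p;
    hi = (int64_t)(p >> 64);
#endif
    /* Plain C stores: the compiler can drop or forward them if unused. */
    c[0] = lo;
    c[1] = hi;
}
```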
