Thread: L1,L2 caches and MMU

  1. Re: L1,L2 caches and MMU

    Nick Maclaren wrote:
    > In article <1bac347hx5.fsf@viper.cs.nmsu.edu>,
    > Joe Pfeiffer writes:
    > |>
    > |> Associative main memory is sort of outside the mainline at this point...
    >
    > It has existed, and there are reasons to believe that it should be
    > 'rediscovered', but I don't expect to see it back again. I don't
    > see that it would help with the 'memory wall' without changes in
    > programming paradigms, either. But you have to admit that it does
    > meet the criterion of being neither real nor virtual memory :-)


    "Associative Memory: Perfectly suited to speed up your Object-Oriented DB!"

    Terje

    --
    -
    "almost all programming can be viewed as an exercise in caching"

  2. Re: L1,L2 caches and MMU


    In article <92d424-9p1.ln1@osl016lin.hda.hydro.com>,
    Terje Mathisen writes:
    |>
    |> "Associative Memory: Perfectly suited to speed up your Object-Oriented DB!"

    Nothing wrong with that. Vector systems were perfectly suited to speed
    up linear algebra. But we come back to viability.

    As I have said before, I would like to see if moving associative lookup
    into application space would fly. Technically, I think that it would,
    but I doubt that it would get past the politics.


    Regards,
    Nick Maclaren.

  3. Re: L1,L2 caches and MMU

    Nick Maclaren wrote:
    > As I have said before, I would like to see if moving associative lookup
    > into application space would fly. Technically, I think that it would,
    > but I doubt that it would get past the politics.


    Associative memory would in fact be a good fit for perl programmers, who
    are very comfortable with associative arrays (mis-)used for pretty much
    everything. :-)

    Terje

    --
    -
    "almost all programming can be viewed as an exercise in caching"

  4. Re: L1,L2 caches and MMU

    On Tue, 07 Nov 2006 20:57:05 +0100, Terje Mathisen wrote:

    > Nick Maclaren wrote:
    >> As I have said before, I would like to see if moving associative lookup
    >> into application space would fly. Technically, I think that it would,
    >> but I doubt that it would get past the politics.

    >
    > Associative memory would in fact be a good fit for perl programmers, who
    > are very comfortable with associative arrays (mis-)used for pretty much
    > everything. :-)


    Or Lua, where dictionaries (associative arrays) are the only data
    structure.

    However, I have to wonder whether hardware "associative memory" would
    really be much faster than a good hash or tree based lookup. It seems as
    though it would be just as likely to make everything else slower and more
    expensive. Putting something like a crypto-style hash algorithm into an
    instruction might work just as well, or better, and be useful for other
    things to boot (like actual crypto work).
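
    To put a number on that comparison, here is a minimal sketch in C of
    the software baseline such hardware would have to beat (the table
    size and hash constant are illustrative choices, not from any real
    design). The hash itself is one multiply and one shift, so the
    lookup cost is dominated by the single, possibly cache-missing,
    memory access that follows it:

        #include <stdint.h>

        #define TABLE_BITS 20
        #define TABLE_SIZE (1u << TABLE_BITS)

        struct entry { uint64_t key, value; };
        static struct entry table[TABLE_SIZE];

        /* Fibonacci hashing: one multiply and one shift. */
        static inline uint32_t hash64(uint64_t key)
        {
            return (uint32_t)((key * 0x9E3779B97F4A7C15ull)
                              >> (64 - TABLE_BITS));
        }

        /* Linear probing; returns 0 when the key is absent (this
         * sketch assumes 0 is never a valid key, for brevity). */
        uint64_t lookup(uint64_t key)
        {
            for (uint32_t i = hash64(key); table[i].key != 0;
                 i = (i + 1) & (TABLE_SIZE - 1)) {
                if (table[i].key == key)
                    return table[i].value;
            }
            return 0;
        }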

    Cheers,

    --
    Andrew


  5. Re: L1,L2 caches and MMU

    Andrew Reilly wrote:
    > On Tue, 07 Nov 2006 20:57:05 +0100, Terje Mathisen wrote:
    >
    >> Nick Maclaren wrote:
    >>> As I have said before, I would like to see if moving associative lookup
    >>> into application space would fly. Technically, I think that it would,
    >>> but I doubt that it would get past the politics.

    >> Associative memory would in fact be a good fit for perl programmers, who
    >> are very comfortable with associative arrays (mis-)used for pretty much
    >> everything. :-)

    >
    > Or Lua, where dictionaries (associative arrays) are the only data
    > structure.
    >
    > However, I have to wonder whether hardware "associative memory" would
    > really be much faster than a good hash or tree based lookup. It seems as
    > though it would be just as likely to make everything else slower and more
    > expensive. Putting something like a crypto-style hash algorithm into an
    > instruction might work just as well, or better, and be useful for other
    > things to boot (like actual crypto work).


    Right.

    If you can do the hashing in time comparable to the cost of a
    (cache-missing) memory access, then the overhead isn't bad at all, and
    you still have _much_ cheaper ram for all your other needs.

    Terje

    --
    -
    "almost all programming can be viewed as an exercise in caching"

  6. Re: L1,L2 caches and MMU


    In article ,
    Terje Mathisen writes:
    |> Andrew Reilly wrote:
    |> >
    |> > However, I have to wonder whether hardware "associative memory" would
    |> > really be much faster than a good hash or tree based lookup. It seems as
    |> > though it would be just as likely to make everything else slower and more
    |> > expensive. Putting something like a crypto-style hash algorithm into an
    |> > instruction might work just as well, or better, and be useful for other
    |> > things to boot (like actual crypto work).
    |>
    |> Right.
    |>
    |> If you can do the hashing in time comparable to the cost of a
    |> (cache-missing) memory access, then the overhead isn't bad at all, and
    |> you still have _much_ cheaper ram for all your other needs.

    Yes. However, that is not what I favour, which is support for an
    associative indirection operation. In THAT case, the correct comparison
    is whether your hashed lookup in software can be brought down to the
    cost of an L3 cache HIT access. And I'll bet that you can't do it :-)

    My estimate is that it could reduce the cost of such accesses (including
    sparse matrix handling) by a factor of about 10, but it is also the key
    component needed to abolish TLB misses. The latter would not usually
    be a massive performance gain, but - oh! - what a simplification!


    Regards,
    Nick Maclaren.

  7. Re: L1,L2 caches and MMU

    Nick Maclaren wrote:
    > In article ,
    > Terje Mathisen writes:
    > |> Andrew Reilly wrote:
    > |> >
    > |> > However, I have to wonder whether hardware "associative memory" would
    > |> > really be much faster than a good hash or tree based lookup. It seems as
    > |> > though it would be just as likely to make everything else slower and more
    > |> > expensive. Putting something like a crypto-style hash algorithm into an
    > |> > instruction might work just as well, or better, and be useful for other
    > |> > things to boot (like actual crypto work).
    > |>
    > |> Right.
    > |>
    > |> If you can do the hashing in time comparable to the cost of a
    > |> (cache-missing) memory access, then the overhead isn't bad at all, and
    > |> you still have _much_ cheaper ram for all your other needs.
    >
    > Yes. However, that is not what I favour, which is support for an
    > associative indirection operation.



    Perhaps it's a language or terminology thing, but I don't understand what
    you are asking for here. Can you give more specifics of what you are
    suggesting? What syntax to invoke the operation, what the operation
    does, etc.? Thanks.

    --
    - Stephen Fuld
    (e-mail address disguised to prevent Spam)

  8. Re: L1,L2 caches and MMU


    In article ,
    Stephen Fuld writes:
    |>
    |> Perhaps it's a language or terminology thing, but I don't understand what
    |> you are asking for here. Can you give more specifics of what you are
    |> suggesting? What syntax to invoke the operation, what the operation
    |> does, etc.? Thanks.

    Consider the following:

    Each process has a hardware lookup table, which is initialised by calling
    a special (unprivileged) instruction that sets the address of a callback
    routine, which is called on misses (within the process).

    This is used by an instruction that takes an 8-byte value and returns an
    8-byte value from the lookup table. Just that. If there is no such
    value, the callback routine is called and the instruction restarted
    (or, if the callback says "no", 0 is returned).

    The callback routine enters new translations by passing a pair of 8-byte
    values and a (say) 2-byte priority. This replaces the value of lowest
    priority below its priority or, failing that, the least recently used
    value of its priority. Or some similar algorithm.

    The specification says that it is unspecified when entries are dropped
    from the hardware lookup table, so the callback routine must be usable
    for reloading at any time. Also CPUs may have different sizes of lookup
    table.

    Whatever. There are numerous variations that could be argued to be
    better. The point is that this provides a cheap TLB-style lookup,
    needing no hardware that isn't well understood and causing no
    problems for security, threading, etc.
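
    To make that concrete, here is a rough software model of the facility
    (all names are hypothetical; in real hardware the table would be a
    small CAM and assoc_lookup() a single instruction):

        #include <stdint.h>

        #define TABLE_SLOTS 64        /* CPU-dependent, per the spec */

        struct slot {
            uint64_t key, value;
            uint16_t priority;
            uint32_t last_used;       /* ages entries for LRU choice */
            int      valid;
        };

        static struct slot table[TABLE_SLOTS];
        static uint32_t clock_tick;

        /* Miss callback: returns nonzero after entering a
         * translation, or 0 to say "no". */
        static int (*miss_callback)(uint64_t key);

        /* The unprivileged setup instruction. */
        void assoc_set_callback(int (*cb)(uint64_t key))
        {
            miss_callback = cb;
        }

        /* The callback enters a translation.  Victim choice here:
         * lowest priority, then least recently used -- one of the
         * "similar algorithms" the description above allows. */
        void assoc_enter(uint64_t key, uint64_t value, uint16_t priority)
        {
            int victim = 0;
            for (int i = 1; i < TABLE_SLOTS; i++) {
                if (!table[i].valid) { victim = i; break; }
                if (table[i].priority < table[victim].priority ||
                    (table[i].priority == table[victim].priority &&
                     table[i].last_used < table[victim].last_used))
                    victim = i;
            }
            table[victim] =
                (struct slot){ key, value, priority, ++clock_tick, 1 };
        }

        /* The indirection operation: 8-byte key in, 8-byte value out.
         * On a miss, call back and restart; if the callback says
         * "no", return 0. */
        uint64_t assoc_lookup(uint64_t key)
        {
            for (;;) {
                for (int i = 0; i < TABLE_SLOTS; i++) {
                    if (table[i].valid && table[i].key == key) {
                        table[i].last_used = ++clock_tick;
                        return table[i].value;
                    }
                }
                if (!miss_callback || !miss_callback(key))
                    return 0;
            }
        }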


    Regards,
    Nick Maclaren.

  9. Re: L1,L2 caches and MMU


    Terje Mathisen wrote:
    > If you can do the hashing in time comparable to the cost of a
    > (cache-missing) memory access, then the overhead isn't bad at all, and
    > you still have _much_ cheaper ram for all your other needs.


    I'd be curious not about the cost in time, but about the cost in power.
    Memory subsystems already chew up a lot of power (FB-DIMMs especially),
    and it's not clear to me whether the heat/power costs incurred would
    scale up with memory size, or are fixed WRT memory.

    DK


  10. Re: L1,L2 caches and MMU

    xu_feng_xu@yahoo.com wrote:

    > Hi,
    >
    > Does L1 "memory cache" cache virtual addresses or physical addresses?
    > I.E is L1 cache located after the memory management unit (MMU) and the
    > TLB or before? What about L2 cache? which relative location gives
    > better performances? I am more interested on intel x86 processors.


    The most common arrangement is CPU - MMU - Caches - Memory, so the
    cache works on physical addresses.

    However, on some architectures (most notably ARM), the first-level
    cache historically existed before the MMU, so the MMU is strapped
    onto the memory-facing side of the cache.

    In other words, the cache is caching logical addresses. This has
    fun implications when running a multitasking OS where all userspace
    applications start at the same logical address. The ARM-Linux
    implementation invalidates the cache when switching tasks.

    ARM have come up with a 'solution' to this: a mode that replaces some
    of the logical address bits with a process ID, which effectively gives
    each process distinct logical addresses. The drawback is that, because
    of the reassigned bits, processes are limited to (IIRC) 32 MBytes.
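
    For concreteness, a sketch of that remapping (ARM call the mode the
    Fast Context Switch Extension; the function name here is mine):

        #include <stdint.h>

        #define FCSE_WINDOW (1u << 25)   /* 32 MB: hence the limit */

        /* Addresses below 32 MB get the 7-bit process ID spliced
         * into the top bits before they reach the (virtually
         * addressed) cache; everything above passes through. */
        uint32_t fcse_modified_va(uint32_t va, uint32_t pid /* 0..127 */)
        {
            if (va < FCSE_WINDOW)
                return va | (pid << 25);
            return va;
        }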

    >
    > Another question regarding the TLB contents on a context switch: does
    > the system store its entire contents in memory and then load them back
    > once the process is re-activated? Or does it instead simply invalidate
    > its contents, so the process's addresses always miss in the TLB once it
    > is restarted?


    The contents are not explicitly stored in memory before invalidation,
    as they already are there in the form of the page tables.
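
    In sketch form (function names hypothetical; on x86 the base register
    is CR3, and writing it also flushes the non-global TLB entries):

        struct mm { unsigned long page_table_base; };

        /* Stub standing in for the privileged operation, e.g. a
         * move to CR3 on x86. */
        static void write_page_table_base(unsigned long base)
        {
            (void)base;
        }

        void switch_address_space(struct mm *next)
        {
            /* Nothing to save: the TLB is only a cache of the page
             * tables, which stay in memory. */
            write_page_table_base(next->page_table_base);
            /* Old translations are now gone; each subsequent miss
             * makes the hardware walk next's page tables and refill
             * the TLB lazily. */
        }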

    Kind regards,

    Iwo

