Re: [9fans] Plan 9 and multicores/parallelism/concurrency? - Plan9

Thread: Re: [9fans] Plan 9 and multicores/parallelism/concurrency?

  1. Re: [9fans] Plan 9 and multicores/parallelism/concurrency?

    On Mon, 2008-07-14 at 12:35 -0400, erik quanstrom wrote:
    > > > Plan 9 makes it easy via 9p, its file system/resource sharing
    > > > protocol. In plan 9, things like graphics and network drivers export a
    > > > 9p interface (a filetree). Furthermore, 9p is network transparent
    > > > which means accesses to remote resources look exactly like accesses to
    > > > local resources, and this is the main trick - processes do not care
    > > > whether the file they are interested in is being served by the kernel,
    > > > a userspace process, or a machine half way across the world.

    > >
    > > All very true. And it sure does provide enormous benefits on distributed
    > > memory architectures. But do you know of any part that would be
    > > beneficial for highly-SMP systems?

    >
    > do you have some reason to believe that 9p (or just read and write)
    > is not effective on such a machine?


    I have some (not a whole lot, since I haven't looked at the source code
    for a while) reason to believe that the current 9P implementation
    doesn't exploit the opportunity when both ends happen to run
    on the same shared memory. I would love to be proved wrong. That said,
    the higher-level issue that I have with 9P on shared-memory
    architectures is that file and communication abstractions
    might not be the best way to represent shared-memory resources
    to begin with. IOW, mmap()-like things might be a closer match.

    > since scheduling would be the main shared resource, do you think
    > it would be the limiting factor?


    Yes. And that's where the comment in my first email came from:
    scheduling is a tricky thing on shared-memory, NUMA-like systems.
    Solaris's scheduler is not shy when it comes to big iron (100+ CPU SMP
    boxes) but even it had to be heavily tuned when a Batoka box first
    came to the labs. When you have physical threads (CPUs), virtual
    threads and a non-trivial memory hierarchy, the decision of what
    is the best place (hardware-wise) for a given thread to run becomes
    a non-trivial one. Kernels that can track affinity properly rule
    the day. I don't think that the Plan 9 scheduler has had an
    opportunity to be tuned for such an environment. The same goes for
    virtual-memory page-related algorithms.

    Here's a decent (albeit brief) overview of what a kernel has to
    face these days in order to be reasonably savvy on shared-memory,
    multicore architectures with a NUMA-like memory hierarchy:
    http://www.redhat.com/promo/summit/2...Hot_Topics.pdf
    Start from slide #13.

    Thanks,
    Roman.



  2. Re: [9fans] Plan 9 and multicores/parallelism/concurrency?

    On Mon, Jul 14, 2008 at 4:33 PM, Roman V. Shaposhnik wrote:
    > the day. I don't think that the Plan 9 scheduler has had an
    > opportunity to be tuned for such an environment. The same goes for
    > virtual-memory page-related algorithms.


    The scheduling code does have a heuristic for processor affinity, so
    there's a model for what to tune when you have the MSMP machine to
    play with.
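    (For the curious, a toy rendition of such a heuristic; this is not the
    actual Plan 9 code, and Starving and the field names are invented:
    prefer a process that last ran on this cpu, but take one that has
    waited too long rather than starve it.)

```c
/* Toy affinity heuristic (invented, not Plan 9's scheduler):
 * prefer a runnable process whose cache is likely still warm on
 * this cpu, but pick up a long-waiting process anyway so affinity
 * never turns into starvation. */
#include <assert.h>
#include <stddef.h>

typedef struct Proc Proc;
struct Proc {
    int lastcpu;     /* cpu this process last ran on     */
    int waited;      /* scheduling rounds spent runnable */
    Proc *next;
};

enum { Starving = 4 };   /* assumed threshold, not a Plan 9 constant */

/* pick from a run list: take the first affine or starving process,
 * else take the head rather than idle the cpu */
Proc *pickproc(Proc *runq, int cpu)
{
    Proc *p;

    for (p = runq; p != NULL; p = p->next)
        if (p->lastcpu == cpu || p->waited >= Starving)
            return p;
    return runq;
}
```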

    --Joel


  3. Re: [9fans] Plan 9 and multicores/parallelism/concurrency?

    On Mon, 14 Jul 2008 13:33:01 PDT "Roman V. Shaposhnik" wrote:
    > Solaris's scheduler is not shy when it comes to big iron (100+ CPU SMP
    > boxes) but even it had to be heavily tuned when a Batoka box first
    > came to the labs. When you have physical threads (CPUs), virtual
    > threads and a non-trivial memory hierarchy, the decision of what
    > is the best place (hardware-wise) for a given thread to run becomes
    > a non-trivial one. Kernels that can track affinity properly rule
    > the day.


    I suspect a lot of this complexity will end up being dropped
    when you don't have to worry about efficiently using the last
    N% of cpu cycles. When your bottleneck is memory bandwidth,
    using cores at 100% is not going to happen in general. And I am
    not sure thread placement belongs in the kernel. Why not let
    an application manage its allocation of h/w thread x cycle
    resources? I am not even sure a full kernel belongs on every
    core.

    Unlike you I think the kernel should do even less as more and
    more cores are added. It should basically stay out of the
    way. Less government, more privatization :-) So maybe
    the Plan 9 kernel would be a better starting point than a Unix
    kernel.
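    (As a concrete instance of application-managed placement, a Linux
    sketch, not Plan 9, using pthread_setaffinity_np; pin_self and
    pinned_core are made-up helper names.)

```c
/* One concrete form of "let the application manage placement"
 * (Linux, not Plan 9): a thread pins itself to a chosen core, and
 * the kernel's own placement policy drops out of the picture. */
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* pin the calling thread to one core; returns 0 on success */
int pin_self(int core)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(core, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof set, &set);
}

/* report the first core the calling thread is still allowed on */
int pinned_core(void)
{
    cpu_set_t set;
    int i;

    if (pthread_getaffinity_np(pthread_self(), sizeof set, &set) != 0)
        return -1;
    for (i = 0; i < CPU_SETSIZE; i++)
        if (CPU_ISSET(i, &set))
            return i;
    return -1;
}
```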


  4. Re: [9fans] Plan 9 and multicores/parallelism/concurrency?


    On 15-Jul-08, at 1:01 AM, Bakul Shah wrote:
    >
    > I suspect a lot of this complexity will end up being dropped
    > when you don't have to worry about efficiently using the last
    > N% of cpu cycles.


    Would that I weren't working on a multi-core graphics part... That N%
    is what the game is all about.

    > When your bottleneck is memory bandwidth,
    > using cores at 100% is not going to happen in general.


    But in most cases, that memory movement has to share the bus with
    increasingly remote cache accesses, which in turn take bandwidth.
    Affinity is a serious win for reducing on-chip bandwidth usage in
    cache-coherent many-core systems.

    > And I am
    > not sure thread placement belongs in the kernel. Why not let
    > an application manage its allocation of h/w thread x cycle
    > resources? I am not even sure a full kernel belongs on every
    > core.


    I'm still looking for the right scheduler, in kernel or user space,
    that lets me deal with affinitizing 3 resources that run at different
    granularities: per-core cache, hardware-thread-to-core, and cross-chip
    caches. There's a rough hierarchy implied by these three resources,
    and perfect scheduling might be possible in a purely cooperative
    world, but reality imposes pre-emption and resource virtualization.
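    (One way to make that hierarchy concrete: a toy cost model, with
    invented costs and names, where moving a task gets more expensive as it
    crosses hw-thread, core, and chip boundaries.)

```c
/* Toy cost model for the three granularities above, under an
 * assumed topology (all ids and costs invented): migration cost
 * grows as a task moves from its own hw thread, to a sibling
 * thread on the same core, to another core on the chip, to
 * another chip entirely. */
#include <assert.h>

typedef struct {
    int chip;
    int core;
    int hwthread;
} Place;

enum {                   /* assumed relative costs, not measured */
    SameThread = 0,      /* everything warm                      */
    SameCore   = 1,      /* shared per-core cache                */
    SameChip   = 4,      /* cross-core, on-chip cache traffic    */
    CrossChip  = 16,     /* off-chip: the expensive move         */
};

int movecost(Place from, Place to)
{
    if (from.chip != to.chip)
        return CrossChip;
    if (from.core != to.core)
        return SameChip;
    if (from.hwthread != to.hwthread)
        return SameCore;
    return SameThread;
}

/* index of the cheapest of n candidate placements */
int cheapest(Place from, Place *cand, int n)
{
    int i, best = 0;

    for (i = 1; i < n; i++)
        if (movecost(from, cand[i]) < movecost(from, cand[best]))
            best = i;
    return best;
}
```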

    > Unlike you I think the kernel should do even less as more and
    > more cores are added. It should basically stay out of the
    > way. Less government, more privatization :-) So maybe
    > the Plan 9 kernel would be a better starting point than a Unix
    > kernel.


    Agreed, less and less in the kernel, but *enough*. I like resource
    virtualization, and as long as it gets affinity right, I win.

    Paul




  5. Re: [9fans] Plan 9 and multicores/parallelism/concurrency?

    On Tue, 15 Jul 2008 10:50:46 PDT Paul Lalonde wrote:
    >
    > On 15-Jul-08, at 1:01 AM, Bakul Shah wrote:
    > >
    > > I suspect a lot of this complexity will end up being dropped
    > > when you don't have to worry about efficiently using the last
    > > N% of cpu cycles.

    >
    > Would that I weren't working on a multi-core graphics part... That N%
    > is what the game is all about.


    I was really wondering about what might happen when there are
    100s of cores per die. My reasoning was that more and more
    cores can be (and will be) put on a die, but a corresponding
    increase in off-chip memory bandwidth will not be possible, so
    at some point the memory bottleneck will prevent 100% use of
    the cores even if you assume ideal placement of threads and no
    thread movement to a different core.
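    (A back-of-envelope sketch of that argument, with invented numbers: the
    number of cores you can actually feed is off-chip bandwidth divided by
    per-core demand, capped by what the die carries.)

```c
/* Back-of-envelope for the bandwidth argument, all numbers
 * invented: with fixed off-chip bandwidth, the cores that can run
 * flat out is bandwidth / per-core demand, no matter how many
 * cores the die actually carries. */
#include <assert.h>

/* cores the memory system can keep at 100% */
int saturated_cores(int die_cores, int offchip_gbps, int percore_gbps)
{
    int fed = offchip_gbps / percore_gbps;

    return fed < die_cores ? fed : die_cores;
}
```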

    > > When your bottleneck is memory bandwidth,
    > > using cores at 100% is not going to happen in general.

    >
    > But in most cases, that memory movement has to share the bus with
    > increasingly remote cache accesses, which in turn take bandwidth.
    > Affinity is a serious win for reducing on-chip bandwidth usage in
    > cache-coherent many-core systems.


    I was certainly not suggesting moving threads around. I was
    speculating that as the number of cores goes up, perhaps the
    kernel is not the right place to do affinity scheduling, or
    much of any sophisticated scheduling.

    > > And I am
    > > not sure thread placement belongs in the kernel. Why not let
    > > an application manage its allocation of h/w thread x cycle
    > > resources? I am not even sure a full kernel belongs on every
    > > core.

    >
    > I'm still looking for the right scheduler, in kernel or user space,
    > that lets me deal with affinitizing 3 resources that run at different
    > granularities: per-core cache, hardware-thread-to-core, and cross-chip
    > caches. There's a rough hierarchy implied by these three resources,
    > and perfect scheduling might be possible in a purely cooperative
    > world, but reality imposes pre-emption and resource virtualization.


    Some friends of mine are able to squeeze a lot of parallelism
    out of supposedly hard-to-parallelize code. But this is in a
    purely cooperative world where they assume threads don't
    move and where machines are dedicated to specific tasks.

    > > Unlike you I think the kernel should do even less as more and
    > > more cores are added. It should basically stay out of the
    > > way. Less government, more privatization :-) So may be
    > > the plan9 kernel would a better starting point than a Unix
    > > kernel.

    >
    > Agreed, less and less in the kernel, but *enough*. I like resource
    > virtualization, and as long as it gets affinity right, I win.



  6. Re: [9fans] Plan 9 and multicores/parallelism/concurrency?



    On Jul 17, 2008, at 12:29 PM, Bakul Shah wrote:
    > My reasoning was that more and more
    > cores can be (and will be) put on a die, but a corresponding
    > increase in off-chip memory bandwidth will not be possible, so
    > at some point the memory bottleneck will prevent 100% use of
    > the cores even if you assume ideal placement of threads and no
    > thread movement to a different core.


    As the number of cores increases you have to hugely increase the
    amount of cache: you need enough cache to hold a working set
    large enough to keep a core busy during the long wait for its
    next slice of bandwidth (a figurative slice; the multiplexing
    should clearly be finer-grained). Latency hiding on those
    fetches is critically important.
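    (A toy version of that sizing rule, units and the round-robin
    assumption invented: with one memory channel served round-robin, each
    core waits out everyone else's slices, so the cache a die needs grows
    roughly quadratically with the core count.)

```c
/* Toy sizing rule for the cache argument above, all units and the
 * round-robin service assumption invented: each core must work out
 * of cache for the whole interval between its bandwidth slices, so
 * per-core cache >= consumption rate * wait time, and die-wide
 * cache grows roughly quadratically with cores. */
#include <assert.h>

long die_cache_needed(int cores, long bytes_per_us, long slice_us)
{
    /* one shared memory channel served round-robin: each core
     * waits (cores - 1) slices between its turns */
    long wait_us = (long)(cores - 1) * slice_us;

    return (long)cores * bytes_per_us * wait_us;
}
```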

    >
    > I was certainly not suggesting moving threads around. I was
    > speculating that as the number of cores goes up, perhaps the
    > kernel is not the right place to do affinity scheduling, or
    > much of any sophisticated scheduling.


    Largely agreed. The real tension is in virtualizing the resources,
    which beats against affinity. Affinity is clearly an early loser in
    oversubscribed situations, but it would be a major win to have a
    scheduler (in or out of kernel) that could degrade intelligently in
    the face of oversubscription, instead of the hard wall you get when
    you throw away affinity.

    > Some friends of mine are able to squeeze a lot of parallelism
    > out of supposedly hard-to-parallelize code. But this is in a
    > purely cooperative world where they assume threads don't
    > move and where machines are dedicated to specific tasks.


    Envy.

    The other part not to forget is data parallelism. At least in
    graphics we get to recast most of our heavy loads as data-parallel,
    which has huge benefits. If you can manage data-parallel work with
    a nice task DAG and decent load balancing, you can do wonders at
    keeping data on-chip while pushing lots of flops.
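    (A miniature of that recast, serial driver and names invented: the
    same kernel runs over disjoint slices of the data, so a load balancer
    could hand slices to cores with no sharing between them.)

```c
/* Data-parallel recast in miniature (driver and names invented):
 * one task runs the same kernel over disjoint slices of the data.
 * The driver here is serial; the slices are what a load balancer
 * would hand to cores, with no coherence traffic between them. */
#include <assert.h>

typedef void (*Kernel)(float *data, int n);

static void scale2(float *d, int n)   /* sample kernel: d[i] *= 2 */
{
    int i;

    for (i = 0; i < n; i++)
        d[i] *= 2;
}

/* split data into nslice chunks and run the kernel on each;
 * each call touches a disjoint range of the array */
void foreach_slice(Kernel k, float *data, int n, int nslice)
{
    int i, lo, hi;

    for (i = 0; i < nslice; i++) {
        lo = n * i / nslice;
        hi = n * (i + 1) / nslice;
        k(data + lo, hi - lo);
    }
}
```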

    Paul, patiently awaiting hardware announcements so he can talk freely.


