[RFC] [PATCH 0/5 V2] Huge page backed user-space stacks - Kernel

Thread: [RFC] [PATCH 0/5 V2] Huge page backed user-space stacks

  1. Re: [RFC] [PATCH 0/5 V2] Huge page backed user-space stacks

    On Thursday 31 July 2008 21:27, Mel Gorman wrote:
    > On (31/07/08 16:26), Nick Piggin didst pronounce:


    > > I imagine it should be, unless you're using a CPU with separate TLBs for
    > > small and huge pages, and your large data set is mapped with huge pages,
    > > in which case you might now introduce *new* TLB contention between the
    > > stack and the dataset

    >
    > Yes, this can happen particularly on older CPUs. For example, on my
    > crash-test laptop the Pentium III there reports
    >
    > TLB and cache info:
    > 01: Instruction TLB: 4KB pages, 4-way set assoc, 32 entries
    > 02: Instruction TLB: 4MB pages, 4-way set assoc, 2 entries


    Oh? Newer CPUs tend to have unified TLBs?


    > > Also, interestingly I have actually seen some CPUs whose memory operations
    > > get significantly slower when operating on large pages than small (in the
    > > case when there is full TLB coverage for both sizes). This would make
    > > sense if the CPU only implements a fast L1 TLB for small pages.

    >
    > It's also possible there is a micro-TLB involved that only supports small
    > pages.


    That is the case on a couple of contemporary CPUs I've tested with
    (although granted they are engineering samples, I don't expect that to
    be the cause).


    > > So for the vast majority of workloads, where stacks are relatively small
    > > (or slowly changing), and relatively hot, I suspect this could easily
    > > have no benefit at best and slowdowns at worst.

    >
    > I wouldn't expect an application with small stacks to request its stack
    > to be backed by hugepages either. Ideally, it would be enabled because a
    > large enough number of DTLB misses were found to be in the stack
    > although catching this sort of data is tricky.


    Sure, as I said, I have nothing against this functionality just because
    it has the possibility to cause a regression. I was just pointing out
    there are a few possibilities there, so it will take a particular type
    of app to take advantage of it. Ie. it is not something you would ever
    just enable "just in case the stack starts thrashing the TLB".


    > > But I'm not saying that as a reason not to merge it -- this is no
    > > different from any other hugepage allocations and as usual they have to
    > > be used selectively where they help.... I just wonder exactly where huge
    > > stacks will help.

    >
    > Benchmark wise, SPECcpu and SPEComp have stack-dependent benchmarks.
    > Computations that partition problems with recursion I would expect to
    > benefit as well as some JVMs that heavily use the stack (see how many docs
    > suggest setting ulimit -s unlimited). Bit out there, but stack-based
    > languages would stand to gain by this. The potential gap is for threaded
    > apps as there will be stacks that are not the "main" stack. Backing those
    > with hugepages depends on how they are allocated (malloc, it's easy,
    > MAP_ANONYMOUS not so much).


    Oh good, then there should be lots of possibilities to demonstrate it.

    Thanks,
    Nick

  2. Re: [RFC] [PATCH 0/5 V2] Huge page backed user-space stacks

    On (31/07/08 21:51), Nick Piggin didst pronounce:
    > On Thursday 31 July 2008 21:27, Mel Gorman wrote:
    > > On (31/07/08 16:26), Nick Piggin didst pronounce:

    >
    > > > I imagine it should be, unless you're using a CPU with seperate TLBs for
    > > > small and huge pages, and your large data set is mapped with huge pages,
    > > > in which case you might now introduce *new* TLB contention between the
    > > > stack and the dataset

    > >
    > > Yes, this can happen particularly on older CPUs. For example, on my
    > > crash-test laptop the Pentium III there reports
    > >
    > > TLB and cache info:
    > > 01: Instruction TLB: 4KB pages, 4-way set assoc, 32 entries
    > > 02: Instruction TLB: 4MB pages, 4-way set assoc, 2 entries

    >
    > Oh? Newer CPUs tend to have unified TLBs?
    >


    I've seen more unified DTLBs (ITLB tends to be split) than not, but it could
    just be where I'm looking. For example, on the machine I'm writing this on
    (a Core Duo), it's

    TLB and cache info:
    51: Instruction TLB: 4KB and 2MB or 4MB pages, 128 entries
    5b: Data TLB: 4KB and 4MB pages, 64 entries

    The DTLB is unified there, but on my T60p laptop, where I guess they want the
    CPU to use less power and be cheaper, it's

    TLB info
    Instruction TLB: 4K pages, 4-way associative, 128 entries.
    Instruction TLB: 4MB pages, fully associative, 2 entries
    Data TLB: 4K pages, 4-way associative, 128 entries.
    Data TLB: 4MB pages, 4-way associative, 8 entries

    So I would expect huge pages to be slower there than in other cases.
    On one Xeon, I see 32 entries for huge pages and 256 for small pages, so
    it's not straightforward to predict. On another Xeon, I see the DTLB is 64
    entries, unified.

    To make all this more complex, huge pages can be a win because less L2 cache
    is consumed on page table information. The gains are due to fewer accesses to
    main memory and less to do with TLB misses. So let's say we do have a TLB
    that is set-associative with very few large page entries; it could still
    end up winning because the increased usage of L2 offsets the increased TLB
    misses. Predicting when huge pages are a win and when they are a loss is
    just not particularly straightforward.
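
    As a rough illustration of why entry counts alone don't decide it, here is the
    back-of-the-envelope TLB reach for configurations quoted in this thread (a
    sketch; entry counts are taken from the dumps above, and reach ignores
    associativity and the L2 effect just described):

        #include <stdio.h>

        int main(void)
        {
            /* DTLB reach = entries * page size, for the CPUs quoted above. */
            struct { const char *cpu; unsigned long entries, page_kb; } tlb[] = {
                { "T60p 4KB DTLB",      128, 4        },
                { "T60p 4MB DTLB",        8, 4 * 1024 },
                { "Core Duo DTLB 4KB",   64, 4        },
                { "Core Duo DTLB 4MB",   64, 4 * 1024 },
            };

            for (unsigned int i = 0; i < sizeof(tlb) / sizeof(tlb[0]); i++)
                printf("%-20s %4lu entries -> %8lu KB reach\n",
                       tlb[i].cpu, tlb[i].entries,
                       tlb[i].entries * tlb[i].page_kb);
            return 0;
        }

    Even the T60p's 8-entry 4MB DTLB covers 32MB against 512KB for its 128
    small-page entries, so raw entry counts say little on their own; the
    associativity and L2 effects above are what make the prediction hard.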

    >
    > > > Also, interestingly I have actually seen some CPUs whos memory operations
    > > > get significantly slower when operating on large pages than small (in the
    > > > case when there is full TLB coverage for both sizes). This would make
    > > > sense if the CPU only implements a fast L1 TLB for small pages.

    > >
    > > It's also possible there is a micro-TLB involved that only support small
    > > pages.

    >
    > That is the case on a couple of contemporary CPUs I've tested with
    > (although granted they are engineering samples, but I don't expect
    > that to be the cause)
    >


    I found it hard to determine if the CPU I was using had a uTLB or not. The
    manuals didn't cover the subject, but it was a theory as to why large pages
    might be slower on a particular CPU. Whatever the reason, I'm ok
    admitting that large pages can be slower on smaller data sets and in
    other situations for whatever reason. It's not a major surprise.

    >
    > > > So for the vast majority of workloads, where stacks are relatively small
    > > > (or slowly changing), and relatively hot, I suspect this could easily
    > > > have no benefit at best and slowdowns at worst.

    > >
    > > I wouldn't expect an application with small stacks to request its stack
    > > to be backed by hugepages either. Ideally, it would be enabled because a
    > > large enough number of DTLB misses were found to be in the stack
    > > although catching this sort of data is tricky.

    >
    > Sure, as I said, I have nothing against this functionality just because
    > it has the possibility to cause a regression. I was just pointing out
    > there are a few possibilities there, so it will take a particular type
    > of app to take advantage of it. Ie. it is not something you would ever
    > just enable "just in case the stack starts thrashing the TLB".
    >


    No, it's something you'd enable because you know your app is using a lot
    of stack. If you are lazy, you might do a test run of the app with it
    enabled for the sake of curiosity and take the option that's faster.

    >
    > > > But I'm not saying that as a reason not to merge it -- this is no
    > > > different from any other hugepage allocations and as usual they have to
    > > > be used selectively where they help.... I just wonder exactly where huge
    > > > stacks will help.

    > >
    > > Benchmark wise, SPECcpu and SPEComp have stack-dependent benchmarks.
    > > Computations that partition problems with recursion I would expect to
    > > benefit as well as some JVMs that heavily use the stack (see how many docs
    > > suggest setting ulimit -s unlimited). Bit out there, but stack-based
    > > languages would stand to gain by this. The potential gap is for threaded
    > > apps as there will be stacks that are not the "main" stack. Backing those
    > > with hugepages depends on how they are allocated (malloc, it's easy,
    > > MAP_ANONYMOUS not so much).

    >
    > Oh good, then there should be lots of possibilities to demonstrate it.
    >


    There should

    --
    Mel Gorman
    Part-time Phd Student Linux Technology Center
    University of Limerick IBM Dublin Software Lab

  3. Re: [RFC] [PATCH 0/5 V2] Huge page backed user-space stacks

    On Thu, 2008-07-31 at 14:50 +0100, Mel Gorman wrote:
    > On (31/07/08 21:51), Nick Piggin didst pronounce:
    > > On Thursday 31 July 2008 21:27, Mel Gorman wrote:
    > > > On (31/07/08 16:26), Nick Piggin didst pronounce:

    > >
    > > > > I imagine it should be, unless you're using a CPU with seperate TLBs for
    > > > > small and huge pages, and your large data set is mapped with huge pages,
    > > > > in which case you might now introduce *new* TLB contention between the
    > > > > stack and the dataset
    > > >
    > > > Yes, this can happen particularly on older CPUs. For example, on my
    > > > crash-test laptop the Pentium III there reports
    > > >
    > > > TLB and cache info:
    > > > 01: Instruction TLB: 4KB pages, 4-way set assoc, 32 entries
    > > > 02: Instruction TLB: 4MB pages, 4-way set assoc, 2 entries

    > >
    > > Oh? Newer CPUs tend to have unified TLBs?
    > >

    >
    > I've seen more unified DTLBs (ITLB tends to be split) than not but it could
    > just be where I'm looking. For example, on the machine I'm writing this
    > (Core Duo), it's
    >
    > TLB and cache info:
    > 51: Instruction TLB: 4KB and 2MB or 4MB pages, 128 entries
    > 5b: Data TLB: 4KB and 4MB pages, 64 entries
    >
    > DTLB is unified there but on my T60p laptop where I guess they want the CPU
    > to be using less power and be cheaper, it's
    >
    > TLB info
    > Instruction TLB: 4K pages, 4-way associative, 128 entries.
    > Instruction TLB: 4MB pages, fully associative, 2 entries
    > Data TLB: 4K pages, 4-way associative, 128 entries.
    > Data TLB: 4MB pages, 4-way associative, 8 entries


    Clearly I've been living under a rock, but I didn't know one could get
    such nicely formatted info.

    In case I'm not the only one, a bit of googling turned up "x86info",
    courtesy of davej - apt-get'able and presumably yum'able too.
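
    For reference, the descriptors x86info decodes come straight out of CPUID
    leaf 2. A minimal sketch of dumping the raw bytes yourself (x86 only, GCC's
    <cpuid.h>; turning the bytes into lines like the ones quoted above still
    needs Intel's descriptor table):

        #include <stdio.h>
        #include <cpuid.h>

        int main(void)
        {
            unsigned int regs[4];
            int i, shift;

            /* Leaf 2 packs TLB/cache descriptor bytes into EAX..EDX. */
            if (!__get_cpuid(2, &regs[0], &regs[1], &regs[2], &regs[3]))
                return 1;

            for (i = 0; i < 4; i++) {
                if (regs[i] & 0x80000000)       /* bit 31: register invalid */
                    continue;
                for (shift = 0; shift < 32; shift += 8) {
                    unsigned char desc = (regs[i] >> shift) & 0xff;

                    if (i == 0 && shift == 0)   /* AL is an iteration count */
                        continue;
                    if (desc)                   /* e.g. 0x51, 0x5b above    */
                        printf("descriptor 0x%02x\n", desc);
                }
            }
            return 0;
        }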

    cheers

    --
    Michael Ellerman
    OzLabs, IBM Australia Development Lab

    wwweb: http://michael.ellerman.id.au
    phone: +61 2 6212 1183 (tie line 70 21183)

    We do not inherit the earth from our ancestors,
    we borrow it from our children. - S.M.A.R.T Person



  4. Re: [RFC] [PATCH 0/5 V2] Huge page backed user-space stacks

    On Thu, 2008-07-31 at 11:31 +0100, Mel Gorman wrote:
    > We are a lot more reliable than we were although exact quantification is
    > difficult because it's workload dependent. For a long time, I've been able
    > to test bits and pieces with hugepages by allocating the pool at the time
    > I needed it even after days of uptime. Previously this required a reboot.


    This is also a pretty big expansion of fs/hugetlbfs/ use outside of the
    filesystem itself. It is hacking the existing shared memory
    kernel-internal user to spit out effectively anonymous memory.

    Where do we draw the line where we stop using the filesystem for this?
    Other than the immediate code reuse, does it gain us anything?

    I have to think that actually refactoring the filesystem code and making
    it usable for really anonymous memory, then using *that* in these
    patches would be a lot more sane. Especially for someone that goes to
    look at it in a year.

    -- Dave


  5. Re: [RFC] [PATCH 0/5 V2] Huge page backed user-space stacks

    On (04/08/08 14:10), Dave Hansen didst pronounce:
    > On Thu, 2008-07-31 at 11:31 +0100, Mel Gorman wrote:
    > > We are a lot more reliable than we were although exact quantification is
    > > difficult because it's workload dependent. For a long time, I've been able
    > > to test bits and pieces with hugepages by allocating the pool at the time
    > > I needed it even after days of uptime. Previously this required a reboot.

    >
    > This is also a pretty big expansion of fs/hugetlb/ use outside of the
    > filesystem itself. It is hacking the existing shared memory
    > kernel-internal user to spit out effectively anonymous memory.
    >
    > Where do we draw the line where we stop using the filesystem for this?
    > Other than the immediate code reuse, does it gain us anything?
    >
    > I have to think that actually refactoring the filesystem code and making
    > it usable for really anonymous memory, then using *that* in these
    > patches would be a lot more sane. Especially for someone that goes to
    > look at it in a year.
    >


    See, that's great until you start dealing with MAP_SHARED|MAP_ANONYMOUS.
    To get that right between children, you end up with something very fs-like
    when the child needs to fault in a page that is already populated by the
    parent. I strongly suspect we end up back at hugetlbfs backing it :/

    If you were going to do such a thing, you'd end up converting something
    like ramfs to hugetlbfs and sharing that.
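
    To make the fs-like requirement concrete, here is the semantic a
    SHARED|ANONYMOUS hugepage backing would have to preserve; a small
    4K-page sketch (error handling trimmed): whichever process faults a
    page in, the other must find that same page and see its contents.

        #include <stdio.h>
        #include <string.h>
        #include <sys/mman.h>
        #include <sys/wait.h>
        #include <unistd.h>

        int main(void)
        {
            /* One shared anonymous page, visible to parent and child alike. */
            char *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                             MAP_SHARED | MAP_ANONYMOUS, -1, 0);
            if (buf == MAP_FAILED)
                return 1;

            if (fork() == 0) {
                strcpy(buf, "written by the child"); /* child faults it in */
                _exit(0);
            }
            wait(NULL);
            printf("parent sees: %s\n", buf);        /* store is visible   */
            return 0;
        }

    Whatever object hands both processes the same page at fault time is, in
    effect, a file, which is the point about ending up back at hugetlbfs.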


    --
    Mel Gorman
    Part-time Phd Student Linux Technology Center
    University of Limerick IBM Dublin Software Lab

  6. Re: [RFC] [PATCH 0/5 V2] Huge page backed user-space stacks

    On Tue, 2008-08-05 at 12:11 +0100, Mel Gorman wrote:
    > See, that's great until you start dealing with MAP_SHARED|MAP_ANONYMOUS.
    > To get that right between children, you end up something very fs-like
    > when the child needs to fault in a page that is already populated by the
    > parent. I strongly suspect we end up back at hugetlbfs backing it :/


    Yeah, but the case I'm worried about is plain anonymous. We already
    have the fs to back SHARED|ANONYMOUS, and they're not really
    anonymous.

    This patch *really* needs anonymous pages, and it kinda shoehorns them
    in with the filesystem. Stacks aren't shared at all, so this is a
    perfect example of where we can forget the fs, right?

    -- Dave


  7. Re: [RFC] [PATCH 0/5 V2] Huge page backed user-space stacks

    On (05/08/08 09:12), Dave Hansen didst pronounce:
    > On Tue, 2008-08-05 at 12:11 +0100, Mel Gorman wrote:
    > > See, that's great until you start dealing with MAP_SHARED|MAP_ANONYMOUS.
    > > To get that right between children, you end up something very fs-like
    > > when the child needs to fault in a page that is already populated by the
    > > parent. I strongly suspect we end up back at hugetlbfs backing it :/

    >
    > Yeah, but the case I'm worried about is plain anonymous. We already
    > have the fs to back SHARED|ANONYMOUS, and they're not really
    > anonymous.
    >
    > This patch *really* needs anonymous pages, and it kinda shoehorns them
    > in with the filesystem. Stacks aren't shared at all, so this is a
    > perfect example of where we can forget the fs, right?
    >


    Ok sure, you could do direct inserts for MAP_PRIVATE as conceptually it
    suits this patch. However, I don't see what you gain. By reusing hugetlbfs,
    we get things like proper reservations which we can do for MAP_PRIVATE these
    days. Again, we could call that sort of thing directly if the reservation
    layer was split out separate from hugetlbfs but I still don't see the gain
    for all that churn.

    What am I missing?

    --
    Mel Gorman
    Part-time Phd Student Linux Technology Center
    University of Limerick IBM Dublin Software Lab

  8. Re: [RFC] [PATCH 0/5 V2] Huge page backed user-space stacks

    On Tue, 2008-08-05 at 17:28 +0100, Mel Gorman wrote:
    > Ok sure, you could do direct inserts for MAP_PRIVATE as conceptually it
    > suits this patch. However, I don't see what you gain. By reusing hugetlbfs,
    > we get things like proper reservations which we can do for MAP_PRIVATE these
    > days. Again, we could call that sort of thing directly if the reservation
    > layer was split out separate from hugetlbfs but I still don't see the gain
    > for all that churn.
    >
    > What am I missing?


    This is good for getting us incremental functionality. It is probably
    the smallest amount of code to get it functional.

    My concern is that we're going down a path that all large page usage
    should be through the one and only filesystem. Once we establish that
    dependency, it is going to be awfully hard to undo it; just think of all
    of the inherent behavior in hugetlbfs. So, we better be sure that the
    filesystem really is the way to go, especially if we're going to start
    having other areas of the kernel depend on it internally.

    That said, this particular patch doesn't appear *too* bound to hugetlb
    itself. But, some of its limitations *do* come from the filesystem,
    like its inability to handle VM_GROWS...

    -- Dave


  9. Re: [RFC] [PATCH 0/5 V2] Huge page backed user-space stacks

    On (05/08/08 10:53), Dave Hansen didst pronounce:
    > On Tue, 2008-08-05 at 17:28 +0100, Mel Gorman wrote:
    > > Ok sure, you could do direct inserts for MAP_PRIVATE as conceptually it
    > > suits this patch. However, I don't see what you gain. By reusing hugetlbfs,
    > > we get things like proper reservations which we can do for MAP_PRIVATE these
    > > days. Again, we could call that sort of thing directly if the reservation
    > > layer was split out separate from hugetlbfs but I still don't see the gain
    > > for all that churn.
    > >
    > > What am I missing?

    >
    > This is good for getting us incremental functionality. It is probably
    > the smallest amount of code to get it functional.
    >


    I'm not keen on the idea of introducing another specialised path just for
    stacks. Testing coverage is tricky enough as it is and problems still slip
    through occasionally. Maybe going through hugetlbfs is less than ideal,
    but at least it is a shared path.

    > My concern is that we're going down a path that all large page usage
    > should be through the one and only filesystem. Once we establish that
    > dependency, it is going to be awfully hard to undo it;


    Not much harder than it is to write any alternative in the first place
    :/

    > just think of all
    > of the inherent behavior in hugetlbfs. So, we better be sure that the
    > filesystem really is the way to go, especially if we're going to start
    > having other areas of the kernel depend on it internally.
    >


    So far, it is working out as a decent model. It is able to track reservations
    and deal with the differences between SHARED and PRIVATE without massive
    difficulties. While we could add another specialised path to directly insert
    the pages into pagetables for private mappings, I find it hard to justify
    adding more test coverage problems. There might be minimal gains to be had
    in lock granularity but that's about it.

    > That said, this particular patch doesn't appear *too* bound to hugetlb
    > itself. But, some of its limitations *do* come from the filesystem,
    > like its inability to handle VM_GROWS...
    >


    The lack of VM_GROWSX is an issue, but on its own it does not justify
    the amount of churn necessary to support direct pagetable insertions for
    MAP_ANONYMOUS|MAP_PRIVATE. I think we'd need another case or two that would
    really benefit from direct insertions to pagetables instead of hugetlbfs so
    that the path would get adequately tested.

    --
    Mel Gorman
    Part-time Phd Student Linux Technology Center
    University of Limerick IBM Dublin Software Lab

  10. Re: [RFC] [PATCH 0/5 V2] Huge page backed user-space stacks

    Andrew Morton writes:

    > Do we expect that this change will be replicated in other
    > memory-intensive apps? (I do).


    The catch with 2MB pages on x86 is that x86 CPUs generally have
    far fewer 2MB TLB entries than 4K entries. So if you're unlucky
    and access a lot of mappings, you might actually thrash more with
    them. That is why they are not necessarily a universal win.

    -Andi

  11. Re: [RFC] [PATCH 0/5 V2] Huge page backed user-space stacks

    On Wed, 2008-08-06 at 10:02 +0100, Mel Gorman wrote:
    > > That said, this particular patch doesn't appear *too* bound to hugetlb
    > > itself. But, some of its limitations *do* come from the filesystem,
    > > like its inability to handle VM_GROWS...

    >
    > The lack of VM_GROWSX is an issue, but on its own it does not justify
    > the amount of churn necessary to support direct pagetable insertions for
    > MAP_ANONYMOUS|MAP_PRIVATE. I think we'd need another case or two that would
    > really benefit from direct insertions to pagetables instead of hugetlbfs so
    > that the path would get adequately tested.


    I'm jumping around here a bit, but I'm trying to get to the core of what
    my problem with these patches is. I'll see if I can close the loop
    here.

    The main thing this set of patches does that I care about is take an
    anonymous VMA and replace it with a hugetlb VMA. It does this on a
    special cue, but does it nonetheless.

    This patch has crossed a line in that it is really the first
    *replacement* of a normal VMA with a hugetlb VMA instead of the creation
    of the VMAs at the user's request. I'm really curious what the plan is
    to follow up on this. Will this stack stuff turn out to be one-off
    code, or is this *the* route for getting transparent large pages in the
    future?

    Because of the limitations like its inability to grow the VMA, I can't
    imagine that this would be a generic mechanism that we can use
    elsewhere.

    -- Dave


  12. Re: [RFC] [PATCH 0/5 V2] Huge page backed user-space stacks

    On (06/08/08 12:50), Dave Hansen didst pronounce:
    > On Wed, 2008-08-06 at 10:02 +0100, Mel Gorman wrote:
    > > > That said, this particular patch doesn't appear *too* bound to hugetlb
    > > > itself. But, some of its limitations *do* come from the filesystem,
    > > > like its inability to handle VM_GROWS...

    > >
    > > The lack of VM_GROWSX is an issue, but on its own it does not justify
    > > the amount of churn necessary to support direct pagetable insertions for
    > > MAP_ANONYMOUS|MAP_PRIVATE. I think we'd need another case or two that would
    > > really benefit from direct insertions to pagetables instead of hugetlbfs so
    > > that the path would get adequately tested.

    >
    > I'm jumping around here a bit, but I'm trying to get to the core of what
    > my problem with these patches is. I'll see if I can close the loop
    > here.
    >
    > The main thing this set of patches does that I care about is take an
    > anonymous VMA and replace it with a hugetlb VMA. It does this on a
    > special cue, but does it nonetheless.
    >


    This is not actually a new thing. For a long time now, it has been possible to
    back malloc() with hugepages at a userspace level using the morecore glibc
    hook. That is replacing anonymous memory with a file-backed VMA. It happens
    in a different place but it's just as deliberate as backing stack and the
    end result is very similar. As the file is ram-based, it doesn't have the
    same types of consequences like dirty page syncing that you'd ordinarily
    watch for when moving from anonymous to file-backed memory.
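
    For anyone who hasn't seen it, the userspace trick looks roughly like this;
    a simplified sketch (the /mnt/huge path, pool size and file name are
    assumptions, and a real implementation such as libhugetlbfs also handles
    shrinking, alignment and fallback to small pages):

        #include <fcntl.h>
        #include <malloc.h>
        #include <stddef.h>
        #include <sys/mman.h>
        #include <unistd.h>

        /* Feed malloc() hugepage-backed memory through glibc's old
         * __morecore hook instead of sbrk().  One hugetlbfs-backed region
         * is reserved up front and handed out sbrk()-style. */
        #define POOL_SIZE (64UL * 1024 * 1024)

        static char *pool, *top;

        static void *huge_morecore(ptrdiff_t increment)
        {
            void *ret;

            if (!pool) {
                int fd = open("/mnt/huge/heap", O_CREAT | O_RDWR, 0600);

                if (fd < 0)
                    return NULL;
                if (ftruncate(fd, POOL_SIZE) < 0) {
                    close(fd);
                    return NULL;
                }
                pool = mmap(NULL, POOL_SIZE, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE, fd, 0);
                close(fd);
                if (pool == MAP_FAILED) {
                    pool = NULL;
                    return NULL;
                }
                top = pool;
            }
            if (increment < 0 || top + increment > pool + POOL_SIZE)
                return NULL;            /* no shrinking, no overflow */
            ret = top;
            top += increment;
            return ret;                 /* contiguous, like sbrk()   */
        }

        static void __attribute__((constructor)) setup(void)
        {
            __morecore = huge_morecore; /* used in place of sbrk()       */
            mallopt(M_MMAP_MAX, 0);     /* keep big allocations off mmap */
        }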

    > This patch has crossed a line in that it is really the first
    > *replacement* of a normal VMA with a hugetlb VMA instead of the creation
    > of the VMAs at the user's request.


    We crossed that line with morecore; it's back there somewhere. We're just
    doing it in the kernel this time because backing stacks with hugepages in
    userspace turned out to be a hairy endeavour.

    Properly supporting anonymous hugepages would either require larger
    changes to the core or reimplementing yet more of mm/ in mm/hugetlb.c.
    Neither is a particularly appealing approach, nor is it likely to be a
    very popular one.

    > I'm really curious what the plan is
    > to follow up on this. Will this stack stuff turn out to be one-off
    > code, or is this *the* route for getting transparent large pages in the
    > future?
    >


    Conceivably, we could also implement MAP_LARGEPAGE for MAP_ANONYMOUS,
    which would use the same hugetlb_file_setup() as for shmem and stacks
    with this patch. It would be a relatively straightforward patch if reusing
    hugetlb_file_setup(), as the flags can be passed through almost verbatim. In
    that case, hugetlbfs still makes a good fit without making direct pagetable
    insertions necessary.
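
    That shmem path is already reachable from userspace today via SHM_HUGETLB,
    the existing in-kernel caller of hugetlb_file_setup(); a minimal sketch
    (needs huge pages in the pool and suitable vm.hugetlb_shm_group membership
    or privileges):

        #include <stdio.h>
        #include <string.h>
        #include <sys/ipc.h>
        #include <sys/shm.h>

        #ifndef SHM_HUGETLB
        #define SHM_HUGETLB 04000             /* from <linux/shm.h> */
        #endif

        #define LENGTH (16UL * 1024 * 1024)   /* multiple of the hugepage size */

        int main(void)
        {
            /* shmget() with SHM_HUGETLB goes through hugetlb_file_setup()
             * in the kernel -- the same helper the stack patches reuse. */
            int id = shmget(IPC_PRIVATE, LENGTH, IPC_CREAT | SHM_HUGETLB | 0600);
            char *addr;

            if (id < 0) {
                perror("shmget");
                return 1;
            }
            addr = shmat(id, NULL, 0);
            if (addr == (char *)-1) {
                perror("shmat");
                return 1;
            }
            memset(addr, 0, LENGTH);          /* touching it faults in huge pages */
            printf("hugepage-backed segment at %p\n", (void *)addr);

            shmdt(addr);
            shmctl(id, IPC_RMID, NULL);
            return 0;
        }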

    > Because of the limitations like its inability to grow the VMA, I can't
    > imagine that this would be a generic mechanism that we can use
    > elsewhere.
    >


    What other than a stack even cares about VM_GROWSDOWN working? Besides,
    VM_GROWSDOWN could be supported in a hugetlbfs file by mapping the end of
    the file and moving the offset backwards (yeah ok, it ain't the prettiest
    but it's less churn than introducing significantly different codepaths). It's
    just not something that needs to be supported at first cut.

    brk(), if you wanted to back it with hugepages, conceivably needs a resizing
    VMA, but in that case it's growing up, so just extend the end of the VMA and
    increase the size of the file.

    --
    Mel Gorman
    Part-time Phd Student Linux Technology Center
    University of Limerick IBM Dublin Software Lab

  13. Re: [RFC] [PATCH 0/5 V2] Huge page backed user-space stacks

    On Thu, 2008-08-07 at 17:06 +0100, Mel Gorman wrote:
    > On (06/08/08 12:50), Dave Hansen didst pronounce:
    > > The main thing this set of patches does that I care about is take an
    > > anonymous VMA and replace it with a hugetlb VMA. It does this on a
    > > special cue, but does it nonetheless.

    >
    > This is not actually a new thing. For a long time now, it has been possible to
    > back malloc() with hugepages at a userspace level using the morecore glibc
    > hook. That is replacing anonymous memory with a file-backed VMA. It happens
    > in a different place but it's just as deliberate as backing stack and the
    > end result is very similar. As the file is ram-based, it doesn't have the
    > same types of consequences like dirty page syncing that you'd ordinarily
    > watch for when moving from anonymous to file-backed memory.


    Yes, it has already been done in userspace. That's fine. It isn't
    adding any complexity to the kernel. This is adding behavior that the
    kernel has to support as well as complexity.

    > > This patch has crossed a line in that it is really the first
    > > *replacement* of a normal VMA with a hugetlb VMA instead of the creation
    > > of the VMAs at the user's request.

    >
    > We crossed that line with morecore, it's back there somewhere. We're just
    > doing in kernel this time because backing stacks with hugepages in userspace
    > turned out to be a hairy endevour.
    >
    > Properly supporting anonymous hugepages would either require larger
    > changes to the core or reimplementing yet more of mm/ in mm/hugetlb.c.
    > Neither is a particularly appealing approach, nor is it likely to be a
    > very popular one.


    I agree. It is always much harder to write code that can work
    generically (and get it accepted) than just write the smallest possible
    hack and stick it in fs/exec.c.

    Could this patch at least get fixed up to look like it could be used
    more generically? Some code to look up and replace anonymous VMAs with
    hugetlb-backed ones.
    > > Because of the limitations like its inability to grow the VMA, I can't
    > > imagine that this would be a generic mechanism that we can use
    > > elsewhere.

    >
    > What other than a stack even cares about VM_GROWSDOWN working? Besides,
    > VM_GROWSDOWN could be supported in a hugetlbfs file by mapping the end of
    > the file and moving the offset backwards (yeah ok, it ain't the prettiest
    > but it's less churn than introducing significantly different codepaths). It's
    > just not something that needs to be supported at first cut.
    >
    > brk() if you wanted to back hugepages with it conceivably needs a resizing
    > VMA but in that case it's growing up in which case just extend the end of
    > the VMA and increase the size of the file.


    I'm more worried about a small huge page size (say 64k) and not being
    able to merge the VMAs. I guess it could start in the *middle* of a
    file and map both directions.

    I guess you could always just have a single (very sparse) hugetlb file
    per mm to do all of this 'anonymous' hugetlb memory stuff, and
    just map its offsets 1:1 onto the process's virtual address space.
    That would make sure you could always merge VMAs, no matter how they
    grew together.
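
    A userspace analogue of that idea, just to make it concrete (the mount
    point, file name and base address are assumptions; in the kernel it would
    be one such file per mm rather than a process mapping it by hand): every
    'anonymous' hugepage range maps the file at offset == virtual address, so
    ranges that grow into each other use contiguous offsets and their VMAs can
    merge.

        #include <fcntl.h>
        #include <stdio.h>
        #include <sys/mman.h>
        #include <unistd.h>

        #define HPAGE (2UL * 1024 * 1024)

        /* Map [addr, addr + len) from the single sparse backing file so
         * that file offset == virtual address.  Adjacent ranges then have
         * adjacent offsets, which is what allows VMA merging. */
        static void *anon_huge(int fd, unsigned long addr, unsigned long len)
        {
            return mmap((void *)addr, len, PROT_READ | PROT_WRITE,
                        MAP_SHARED | MAP_FIXED, fd, (off_t)addr);
        }

        int main(void)
        {
            int fd = open("/mnt/huge/per-mm-backing", O_CREAT | O_RDWR, 0600);
            unsigned long base = 0x40000000UL;       /* hugepage-aligned */
            char *a, *b;

            if (fd < 0)
                return 1;
            a = anon_huge(fd, base, HPAGE);          /* first region          */
            b = anon_huge(fd, base + HPAGE, HPAGE);  /* "grows" into the next */
            if (a == MAP_FAILED || b == MAP_FAILED)
                return 1;

            a[0] = 1;
            b[0] = 2;
            printf("adjacent hugepage-backed regions at %p and %p\n",
                   (void *)a, (void *)b);
            return 0;
        }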

    -- Dave


  14. Re: [RFC] [PATCH 0/5 V2] Huge page backed user-space stacks

    On (07/08/08 10:29), Dave Hansen didst pronounce:
    > On Thu, 2008-08-07 at 17:06 +0100, Mel Gorman wrote:
    > > On (06/08/08 12:50), Dave Hansen didst pronounce:
    > > > The main thing this set of patches does that I care about is take an
    > > > anonymous VMA and replace it with a hugetlb VMA. It does this on a
    > > > special cue, but does it nonetheless.

    > >
    > > This is not actually a new thing. For a long time now, it has been possible to
    > > back malloc() with hugepages at a userspace level using the morecore glibc
    > > hook. That is replacing anonymous memory with a file-backed VMA. It happens
    > > in a different place but it's just as deliberate as backing stack and the
    > > end result is very similar. As the file is ram-based, it doesn't have the
    > > same types of consequences like dirty page syncing that you'd ordinarily
    > > watch for when moving from anonymous to file-backed memory.

    >
    > Yes, it has already been done in userspace. That's fine. It isn't
    > adding any complexity to the kernel. This is adding behavior that the
    > kernel has to support as well as complexity.
    >


    The complexity is minimal and the progression logical.
    hugetlb_file_setup() is the API shmem was using to create a file on an
    internal mount suitable for MAP_SHARED. This patchset adds support for
    MAP_PRIVATE and the additional complexity is a lot less than supporting
    direct pagetable inserts.

    > > > This patch has crossed a line in that it is really the first
    > > > *replacement* of a normal VMA with a hugetlb VMA instead of the creation
    > > > of the VMAs at the user's request.

    > >
    > > We crossed that line with morecore, it's back there somewhere. We're just
    > > doing in kernel this time because backing stacks with hugepages in userspace
    > > turned out to be a hairy endevour.
    > >
    > > Properly supporting anonymous hugepages would either require larger
    > > changes to the core or reimplementing yet more of mm/ in mm/hugetlb.c.
    > > Neither is a particularly appealing approach, nor is it likely to be a
    > > very popular one.

    >
    > I agree. It is always much harder to write code that can work
    > generically (and get it accepted) than just write the smallest possible
    > hack and stick it in fs/exec.c.
    >
    > Could this patch at least get fixed up to look like it could be used
    > more generically? Some code to look up and replace anonymous VMAs with
    > hugetlb-backed ones.


    Ok, this latter point can be looked into at least, although the
    underlying principle may still be using hugetlb_file_setup() rather than
    direct pagetable insertions.

    > > > Because of the limitations like its inability to grow the VMA, I can't
    > > > imagine that this would be a generic mechanism that we can use
    > > > elsewhere.

    > >
    > > What other than a stack even cares about VM_GROWSDOWN working? Besides,
    > > VM_GROWSDOWN could be supported in a hugetlbfs file by mapping the end of
    > > the file and moving the offset backwards (yeah ok, it ain't the prettiest
    > > but it's less churn than introducing significantly different codepaths). It's
    > > just not something that needs to be supported at first cut.
    > >
    > > brk() if you wanted to back hugepages with it conceivably needs a resizing
    > > VMA but in that case it's growing up in which case just extend the end of
    > > the VMA and increase the size of the file.

    >
    > I'm more worried about a small huge page size (say 64k) and not being
    > able to merge the VMAs. I guess it could start in the *middle* of a
    > file and map both directions.
    >
    > I guess you could always just have a single (very sparse) hugetlb file
    > per mm to do all of this 'anonymous' hugetlb memory memory stuff, and
    > just map its offsets 1:1 on to the process's virtual address space.
    > That would make sure you could always merge VMAs, no matter how they
    > grew together.
    >


    That's an interesting idea. It isn't as straightforward as it sounds
    due to reservation tracking, but on the face of it, I can't see why it
    couldn't be made to work.

    --
    Mel Gorman
    Part-time Phd Student Linux Technology Center
    University of Limerick IBM Dublin Software Lab
