Re: SLUB defrag pull request? - Kernel



Thread: Re: SLUB defrag pull request?

  1. Re: SLUB defrag pull request?

    On Wed, 22 Oct 2008, Miklos Szeredi wrote:

    > Why? The kmem_cache_free() doesn't touch the contents of the object,
    > does it?


    Because filesystem code may be running on other processors which may be
    freeing the dentry.

    >> Because the slab starts out with a series of objects left in a slab. It
    >> needs to build a list of objects etc. in a way that is as independent as
    >> possible from the user of the slab page. It does that by locking the slab
    >> page so that free operations stall until the reference has been
    >> established. If it would not be shutting off frees then the objects could
    >> vanish under us.

    >
    > It doesn't matter. All we care about is that the dentry is on the
    > lru: it's cached but unused. Every other state (being created,
    > active, being freed, freed) is uninteresting.


    We cannot figure out that it is on the lru if we do not have a stable
    reference to the object.

    > Sure, and all that is possible without doing this messy 2 phase thing.
    > Unless I'm still missing something obvious...


    Obviously one cannot free or handle an object that may be concurrently
    freed on another processor.

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: SLUB defrag pull request?

    On Wed, 22 Oct 2008, Christoph Lameter wrote:
    > On Wed, 22 Oct 2008, Miklos Szeredi wrote:
    >
    > > Why? The kmem_cache_free() doesn't touch the contents of the object,
    > > does it?

    >
    > Because filesystem code may be running on other processors which may be
    > freeing the dentry.


    You are not actually listening to what I'm saying. Please read the
    question carefully again.

    Thanks,
    Miklos

  3. Re: SLUB defrag pull request?

    On Wed, 22 Oct 2008, Christoph Lameter wrote:
    > On Wed, 22 Oct 2008, Miklos Szeredi wrote:
    >
    > > You are not actually listening to what I'm saying. Please read the
    > > question carefully again.

    >
    > That is the impression that I got from you too. I have listed the options
    > to get a reliable reference to an object and you seem to just skip over
    > it.


    Because you don't _need_ a reliable reference to access the contents
    of the dentry. The dentry is still there after being freed, as long
    as the underlying slab is there and isn't being reused for some other
    purpose. But you can easily ensure that from the slab code.

    Hmm?

    Miklos

  4. Re: SLUB defrag pull request?

    On Wed, 22 Oct 2008, Miklos Szeredi wrote:

    > You are not actually listening to what I'm saying. Please read the
    > question carefully again.


    That is the impression that I got from you too. I have listed the options
    to get a reliable reference to an object and you seem to just skip over
    it.



  5. Re: SLUB defrag pull request?

    On Wed, Oct 22, 2008 at 11:26 PM, Miklos Szeredi wrote:
    > Because you don't _need_ a reliable reference to access the contents
    > of the dentry. The dentry is still there after being freed, as long
    > as the underlying slab is there and isn't being reused for some other
    > purpose. But you can easily ensure that from the slab code.
    >
    > Hmm?


    Actually, when debugging is enabled, it's customary to poison the
    object, for example (see free_debug_processing() in mm/slub.c). So we
    really can't "easily ensure" that in the allocator unless we by-pass
    all the current debugging code.

  6. Re: SLUB defrag pull request?

    On Wed, 22 Oct 2008, Pekka Enberg wrote:
    > On Wed, Oct 22, 2008 at 11:26 PM, Miklos Szeredi wrote:
    > > Because you don't _need_ a reliable reference to access the contents
    > > of the dentry. The dentry is still there after being freed, as long
    > > as the underlying slab is there and isn't being reused for some other
    > > purpose. But you can easily ensure that from the slab code.
    > >
    > > Hmm?

    >
    > Actually, when debugging is enabled, it's customary to poison the
    > object, for example (see free_debug_processing() in mm/slub.c). So we
    > really can't "easily ensure" that in the allocator unless we by-pass
    > all the current debugging code.


    Thank you, that does actually answer my question. I would still think
    it's a good sacrifice to not let the dentries be poisoned for the sake
    of a simpler dentry defragmenter.

    Miklos

  7. Re: SLUB defrag pull request?

    On Wed, 22 Oct 2008, Pekka Enberg wrote:

    > Actually, when debugging is enabled, it's customary to poison the
    > object, for example (see free_debug_processing() in mm/slub.c). So we
    > really can't "easily ensure" that in the allocator unless we by-pass
    > all the current debugging code.


    We may be talking of different frees here. Maybe what he means by freeing
    is that the object was put on the lru? And we understand it to mean a kfree().

  8. Re: SLUB defrag pull request?

    On Wed, 22 Oct 2008, Miklos Szeredi wrote:

    >> That is the impression that I got from you too. I have listed the options
    >> to get a reliable reference to an object and you seem to just skip over
    >> it.

    >
    > Because you don't _need_ a reliable reference to access the contents
    > of the dentry. The dentry is still there after being freed, as long
    > as the underlying slab is there and isn't being reused for some other
    > purpose. But you can easily ensure that from the slab code.


    With the two callbacks that I described that would take the global
    lock? That was already discussed before. Please read! It does not scale
    and the lock would have to be acquired before objects in a slab page are
    scanned and handled in any way.

    Without that locking any other processor can go into reclaim and start
    evicting the dentries that we are operating upon.

    Freeing in the slab sense means that a kfree ran to get rid of the
    object.

  9. Re: SLUB defrag pull request?

    Hi Miklos,

    On Thu, Oct 23, 2008 at 12:04 AM, Miklos Szeredi wrote:
    >> Actually, when debugging is enabled, it's customary to poison the
    >> object, for example (see free_debug_processing() in mm/slub.c). So we
    >> really can't "easily ensure" that in the allocator unless we by-pass
    >> all the current debugging code.

    >
    > Thank you, that does actually answer my question. I would still think
    > it's a good sacrifice to not let the dentries be poisoned for the sake
    > of a simpler dentry defragmenter.


    To be honest, I haven't paid enough attention to the discussion to see
    how much simpler it would be. But I don't like the idea of forcibly
    disabling debugging for slab caches because of a new core feature in
    the allocator. Keep in mind that it's not just dentries we're talking
    about here, we're defragmenting inodes as well.

    Pekka

  10. Re: SLUB defrag pull request?

    On Wed, 22 Oct 2008, Miklos Szeredi wrote:

    >> Actually, when debugging is enabled, it's customary to poison the
    >> object, for example (see free_debug_processing() in mm/slub.c). So we
    >> really can't "easily ensure" that in the allocator unless we by-pass
    >> all the current debugging code.


    Plus the allocator may be reusing parts of the freed object for a freelist
    etc even if the object is not poisoned.

    > Thank you, that does actually answer my question. I would still think
    > it's a good sacrifice to not let the dentries be poisoned for the sake
    > of a simpler dentry defragmenter.


    You can simplify defrag by not doing anything in the get() method. That
    means some of the objects passed to the kick() method may already have
    been freed in the interim.

    The kick method then must be able to determine if the object has already
    been freed (or is undergoing freeing) by inspecting the object contents
    (allocations are held off until kick() is complete). It then needs to free
    only the objects that are still allocated.

    That way you could get to a one stage system.... If the dentry code can
    give us that then the approach would become much simpler.
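
A minimal user-space sketch of such a one-stage kick(), not taken from the
patch set. The `struct obj` layout, the `obj_state` values and the `kick()`
signature are hypothetical stand-ins for whatever state the dentry code could
expose, and all locking is left out. It assumes the allocator does not
overwrite freed objects, so the state field stays readable after a free.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical states; a real dentry would encode this differently. */
enum obj_state { OBJ_FREE, OBJ_IN_USE, OBJ_ON_LRU };

struct obj {
    enum obj_state state;   /* assumed to survive a free untouched */
    int payload;
};

/*
 * One-stage kick: there is no get() phase, so some entries in objs[] may
 * already have been freed. Each object is inspected and only those that are
 * cached but unused (on the LRU) are evicted. Returns the number evicted.
 */
size_t kick(struct obj **objs, size_t nr)
{
    size_t evicted = 0;

    assert(objs != NULL || nr == 0);
    for (size_t i = 0; i < nr; i++) {
        struct obj *o = objs[i];

        if (o->state != OBJ_ON_LRU)
            continue;           /* freed, being freed, or in use: skip */
        o->state = OBJ_FREE;    /* stand-in for the real eviction + free */
        evicted++;
    }
    return evicted;
}
```

In real code the state check and the eviction would have to happen under the
lock that protects the object, which is exactly the part this sketch omits.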




  11. Re: SLUB defrag pull request?

    On Wed, 22 Oct 2008, Christoph Lameter wrote:
    > On Wed, 22 Oct 2008, Miklos Szeredi wrote:
    >
    > >> Actually, when debugging is enabled, it's customary to poison the
    > >> object, for example (see free_debug_processing() in mm/slub.c). So we
    > >> really can't "easily ensure" that in the allocator unless we by-pass
    > >> all the current debugging code.

    >
    > Plus the allocator may be reusing parts of the freed object for a freelist
    > etc even if the object is not poisoned.


    Actually, no: looking at the slub code it already makes sure that
    objects are neither poisoned, nor touched in any way _if_ there is a
    constructor for the object. And for good reason too, otherwise a
    reused object would contain rubbish after a second allocation.

    Come on guys, you should be the experts in this thing!

    So again, just checking d_lru should work fine. There's absolutely
    no need to mess with extra references in a separate phase, which leads
    to lots of complications.

    Miklos

  12. Re: SLUB defrag pull request?

    On Thu, 23 Oct 2008, Miklos Szeredi wrote:

    > So again, just checking d_lru should work fine. There's absolutely
    > no need to mess with extra references in a separate phase, which leads
    > to lots of complications.


    Then try it the way I outlined it by skipping the get() stage. You just
    need to add checks for the poison in case debugging is on and then you
    should be fine.
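
A rough user-space sketch of what such a poison check could look like, not
part of the original message. The byte values mirror POISON_FREE and
POISON_END from include/linux/poison.h; the helper names
`poison_object()` and `object_is_poisoned()` are made up for illustration and
are not SLUB functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define POISON_FREE 0x6b    /* fill byte written over a freed object */
#define POISON_END  0xa5    /* marker written to the last byte */

/* Hypothetical helper: overwrite an object the way a debug free would. */
void poison_object(void *obj, size_t size)
{
    unsigned char *p = obj;

    assert(obj != NULL);
    if (size == 0)
        return;
    memset(p, POISON_FREE, size - 1);
    p[size - 1] = POISON_END;
}

/*
 * Hypothetical helper: true if the object carries the complete poison
 * pattern, i.e. it looks like it has been freed and not touched since.
 */
bool object_is_poisoned(const void *obj, size_t size)
{
    const unsigned char *p = obj;

    if (size == 0)
        return false;
    for (size_t i = 0; i + 1 < size; i++)
        if (p[i] != POISON_FREE)
            return false;
    return p[size - 1] == POISON_END;
}
```

A kick() without a get() stage would call a check like this first and skip
any object that matches, before looking at fields such as d_lru.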


  13. Re: SLUB defrag pull request?

    Hi Miklos,

    On Thu, 2008-10-23 at 00:10 +0200, Miklos Szeredi wrote:
    > Actually, no: looking at the slub code it already makes sure that
    > objects are neither poisoned, nor touched in any way _if_ there is a
    > constructor for the object. And for good reason too, otherwise a
    > reused object would contain rubbish after a second allocation.


    There's no inherent reason why we cannot poison slab caches with a
    constructor. As a matter of fact SLAB does it which is probably why I
    got confused here. The only thing that needs to disable slab poisoning
    by design is SLAB_DESTROY_BY_RCU.

    But for SLUB, you're obviously right.

    On Thu, 2008-10-23 at 00:10 +0200, Miklos Szeredi wrote:
    > Come on guys, you should be the experts in this thing!


    Yeah, I know. Yet you're stuck with us. That's sad.

    Pekka


  14. Re: SLUB defrag pull request?

    On Thu, 23 Oct 2008, Pekka Enberg wrote:
    > On Thu, 2008-10-23 at 00:10 +0200, Miklos Szeredi wrote:
    > > Actually, no: looking at the slub code it already makes sure that
    > > objects are neither poisoned, nor touched in any way _if_ there is a
    > > constructor for the object. And for good reason too, otherwise a
    > > reused object would contain rubbish after a second allocation.

    >
    > There's no inherent reason why we cannot poison slab caches with a
    > constructor.


    Right, it just needs to call the constructor for every allocation.

    > > Come on guys, you should be the experts in this thing!

    >
    > Yeah, I know. Yet you're stuck with us. That's sad.


    No, I was a bit rude, sorry.

    I think the _real_ problem is that instead of fancy features like this
    defragmenter, SLUB should first concentrate on getting the code solid
    enough to replace the other allocators.

    Miklos

  15. Re: SLUB defrag pull request?

    On Thu, 23 Oct 2008, Miklos Szeredi wrote:

    > I think the _real_ problem is that instead of fancy features like this
    > defragmenter, SLUB should first concentrate on getting the code solid
    > enough to replace the other allocators.


    Solid? What is not solid? The SLUB design was made in part because of the
    defrag problems that were not easy to solve with SLAB. The ability to lock
    down a slab allows stabilizing objects. We discussed solutions to the
    fragmentation problem for years and did not get anywhere with SLAB.



  16. Re: SLUB defrag pull request?

    On Thu, Oct 23, 2008 at 4:40 PM, Christoph Lameter wrote:
    > Solid? What is not solid? The SLUB design was made in part because of the
    > defrag problems that were not easy to solve with SLAB. The ability to lock
    > down a slab allows stabilizing objects. We discussed solutions to the
    > fragmentation problem for years and did not get anywhere with SLAB.


    I'd assume he's talking about the Intel-reported regression that's yet
    to be resolved.

  17. Re: SLUB defrag pull request?

    On Thu, Oct 23, 2008 at 5:09 PM, Christoph Lameter wrote:
    > Got a draft of a patch here that does freelist handling differently. Instead
    > of building linked lists it uses free objects to build arrays of pointers to
    > free objects. That improves cache cold free behavior since the object
    > contents itself does not have to be touched on free.
    >
    > The problem looks like it's freeing objects on a different processor than
    > where they were last used. With the pointer array it is only necessary to touch
    > the objects that contain the arrays.


    Interesting. SLAB gets away with this because of per-cpu caches or
    because it uses the bufctls instead of a freelist?

  18. Re: SLUB defrag pull request?

    On Thu, 23 Oct 2008, Pekka Enberg wrote:

    > On Thu, Oct 23, 2008 at 4:40 PM, Christoph Lameter wrote:
    >> Solid? What is not solid? The SLUB design was made in part because of the
    >> defrag problems that were not easy to solve with SLAB. The ability to lock
    >> down a slab allows stabilizing objects. We discussed solutions to the
    >> fragmentation problem for years and did not get anywhere with SLAB.

    >
    > I'd assume he's talking about the Intel-reported regression that's yet
    > to be resolved.


    On that subject:

    Got a draft of a patch here that does freelist handling differently.
    Instead of building linked lists it uses free objects to build arrays of
    pointers to free objects. That improves cache cold free behavior since the
    object contents itself does not have to be touched on free.

    The problem looks like it's freeing objects on a different processor than
    where they were last used. With the pointer array it is only necessary to
    touch the objects that contain the arrays.

  19. Re: SLUB defrag pull request?

    On Thu, 23 Oct 2008, Pekka Enberg wrote:

    >> The problem looks like it's freeing objects on a different processor than
    >> where they were last used. With the pointer array it is only necessary to touch
    >> the objects that contain the arrays.

    >
    > Interesting. SLAB gets away with this because of per-cpu caches or
    > because it uses the bufctls instead of a freelist?


    Exactly. Slab adds a special management structure to each slab page that
    contains the freelist and other stuff. Freeing first occurs to a per cpu
    queue that contains an array of pointers. Then later the objects are moved
    from the pointer array into the management structure for the slab.

    What we could do for SLUB is to generate a linked list of pointer arrays
    in the free objects of a slab page. If all objects are allocated then no
    pointer array is needed. The first object freed would become the first
    pointer array. If that is found to be exhausted then the object currently
    being freed is becoming the next pointer array and we put a link to the
    old one into the object as well.
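
The scheme described above can be sketched in user space roughly as follows.
This is an illustration only, not the draft patch: `OBJ_SIZE`, `struct
ptr_array`, `union object`, `struct slab`, `slab_free()` and `slab_alloc()`
are all hypothetical names, a single slab page is modelled, and there is no
locking or per-cpu handling.

```c
#include <assert.h>
#include <stddef.h>

#define OBJ_SIZE 64     /* assumed object size for this sketch */
#define SLOTS ((OBJ_SIZE - sizeof(void *) - sizeof(size_t)) / sizeof(void *))

/* A free object reused as an array of pointers to other free objects. */
struct ptr_array {
    struct ptr_array *next;     /* link to the older, full array */
    size_t count;               /* number of used slots */
    void *slots[SLOTS];
};

/* One slab object; large and aligned enough to hold a pointer array. */
union object {
    struct ptr_array as_array;
    unsigned char bytes[OBJ_SIZE];
};

_Static_assert(sizeof(union object) == OBJ_SIZE,
               "pointer array must fit in one object");

struct slab {
    struct ptr_array *free_head;    /* NULL when every object is allocated */
};

void slab_free(struct slab *s, void *obj)
{
    struct ptr_array *head = s->free_head;

    assert(obj != NULL);
    if (head == NULL || head->count == SLOTS) {
        /* No array yet, or it is full: this object becomes the next array. */
        struct ptr_array *arr = obj;

        arr->next = head;
        arr->count = 0;
        s->free_head = arr;
        return;
    }
    /* Only the array object is written; obj itself is not touched. */
    head->slots[head->count++] = obj;
}

void *slab_alloc(struct slab *s)
{
    struct ptr_array *head = s->free_head;

    if (head == NULL)
        return NULL;                        /* slab fully allocated */
    if (head->count > 0)
        return head->slots[--head->count];
    s->free_head = head->next;              /* array is empty: hand it out */
    return head;
}
```

The point of the layout is that a free normally writes only into the object
holding the current array, so a cache-cold object being freed does not need
to be pulled into the freeing processor's cache.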


  20. Re: SLUB defrag pull request?

    Christoph Lameter wrote:
    > On Thu, 23 Oct 2008, Pekka Enberg wrote:
    >
    >>> The problem looks like it's freeing objects on a different processor than
    >>> where they were last used. With the pointer array it is only necessary
    >>> to touch
    >>> the objects that contain the arrays.

    >>
    >> Interesting. SLAB gets away with this because of per-cpu caches or
    >> because it uses the bufctls instead of a freelist?

    >
    > Exactly. Slab adds a special management structure to each slab page that
    > contains the freelist and other stuff. Freeing first occurs to a per cpu
    > queue that contains an array of pointers. Then later the objects are
    > moved from the pointer array into the management structure for the slab.
    >
    > What we could do for SLUB is to generate a linked list of pointer arrays
    > in the free objects of a slab page. If all objects are allocated then no
    > pointer array is needed. The first object freed would become the first
    > pointer array. If that is found to be exhausted then the object
    > currently being freed is becoming the next pointer array and we put a
    > link to the old one into the object as well.
    >


    This idea is very nice, especially considering that many objects are freed
    by RCU, and their rcu_head (which is hot at kfree() time), might be far
    away from the linked list anchor actually used in SLUB.

    At alloc time, I remember I added a prefetchw() call in SLAB in __cache_alloc(),
    this could explain some differences between SLUB and SLAB too, since SLAB
    gives a hint to processor to warm its cache.




