kmemcheck caught read from freed memory (cfq_free_io_context) - Kernel


Thread: kmemcheck caught read from freed memory (cfq_free_io_context)

  1. Re: kmemcheck caught read from freed memory (cfq_free_io_context)

    On Wed, Apr 02 2008, Peter Zijlstra wrote:
    > On Wed, 2008-04-02 at 03:40 -0700, Paul E. McKenney wrote:
    > > On Wed, Apr 02, 2008 at 09:17:10AM +0200, Jens Axboe wrote:
    > > > On Tue, Apr 01 2008, Peter Zijlstra wrote:
    > > > > On Tue, 2008-04-01 at 23:08 +0200, Vegard Nossum wrote:
    > > > > > Hi,
    > > > > >
    > > > > > This appeared in my logs:
    > > > > >
    > > > > > kmemcheck: Caught 32-bit read from freed memory (f7042348)
    > > > > >
    > > > > > Pid: 1374, comm: bash Not tainted (2.6.25-rc7 #92)
    > > > > > EIP: 0060:[] EFLAGS: 00210202 CPU: 0
    > > > > > EIP is at call_for_each_cic+0x2d/0x44
    > > > > > EAX: 00200286 EBX: 00000001 ECX: c200e908 EDX: f7042348
    > > > > > ESI: f6c26c60 EDI: c0503310 EBP: f70fff38 ESP: c082ec88
    > > > > > DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
    > > > > > CR0: 8005003b CR2: f7826904 CR3: 36cd7000 CR4: 000006c0
    > > > > > DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
    > > > > > DR6: ffff4ff0 DR7: 00000400
    > > > > > [] kmemcheck_read+0xa8/0xe0
    > > > > > [] kmemcheck_access+0x1a5/0x244
    > > > > > [] do_page_fault+0x622/0x6fc
    > > > > > [] error_code+0x72/0x78
    > > > > > [] cfq_free_io_context+0xf/0x70
    > > > > > [] put_io_context+0x4f/0x58
    > > > > > [] exit_io_context+0x60/0x6c
    > > > > > [] do_exit+0x4d9/0x6f0
    > > > > > [] do_group_exit+0x29/0x88
    > > > > > [] sys_exit_group+0xf/0x14
    > > > > > [] sysenter_past_esp+0x6d/0xa4
    > > > > > [] 0xffffffff
    > > > > >
    > > > > > The error occurs in cfq_free_io_context()'s call to
    > > > > > call_for_each_cic() which looks like this:
    > > > > >
    > > > > > rcu_read_lock();
    > > > > > hlist_for_each_entry_rcu(cic, n, &ioc->cic_list, cic_list) {
    > > > > >         func(ioc, cic);
    > > > > >         called++;
    > > > > > }
    > > > > > rcu_read_unlock();
    > > > > >
    > > > > > The function that is called is cic_free_func(). It is postulated that
    > > > > > hlist_for_each_entry_rcu() will dereference the previously freed list
    > > > > > element to get the ->next pointer.
    > > > > >
    > > > > > After a short discussion with Pekka Enberg and Peter Zijlstra, it
    > > > > > seemed evident that this list traversal should use
    > > > > > hlist_for_each_entry_safe_rcu() instead, which would buffer the next
    > > > > > pointer before the object is freed.
    > > > > >
    > > > > > Does this report seem to be valid?
    > > > > >
    > > > > > The kernel is 2.6.25-rc7.
    > > > >
    > > > > The missing hlist for loop would look something like so:
    > > > >
    > > > > #define hlist_for_each_entry_safe_rcu(tpos, pos, n, head, member) \
    > > > >         for (pos = (head)->first; \
    > > > >              rcu_dereference(pos) && ({ n = pos->next; 1; }) && \
    > > > >              ({ tpos = hlist_entry(pos, typeof(*tpos), member); 1;}); \
    > > > >              pos = n)
    > > >
    > > > Good catch, I wonder why it didn't complain in my testing. I've added a
    > > > patch to fix that, please see it here:
    > > >
    > > > http://git.kernel.dk/?p=linux-2.6-bl...059044083ea151

    > >
    > > I am still confused.
    > >
    > > o The hlist_for_each_entry_safe_rcu() is under rcu_read_lock().
    > >
    > > o The kmem_cache has SLAB_DESTROY_BY_RCU.
    > >
    > > o This means that a given slab should not be returned to the
    > > system until a grace period elapses.
    > >
    > > o So the bugginess (or not) of this code should not be affected
    > > by adding hlist_for_each_entry_safe_rcu() here.
    > >
    > > (I am not seeing the checks that would be needed to avoid
    > > something being kmem_cache_free()ed while being accessed,
    > > but might be missing something.)

    >
    > Agreed, when looking at this code it's not making sense.


    Ditto agree

    > cfq_cic_lookup() is also mightily confusing. Only the actual
    > radix_tree_lookup() call is protected by RCU, I'm not seeing what
    > guarantees the existence of cic after rcu_read_unlock().
    >
    > Nor does it do a validation check to see if cic->key == cfqd, something
    > that would be needed when using SLAB_DESTROY_BY_RCU.


    It checks cic->key != NULL, it's set to NULL when it's invalid. Not sure
    if it could transition to some other cfqd and radix_tree_lookup() still
    returning the cic for the old key, if so it would need a check for ->key
    == cfqd there as well (like in the one-hit cache above, it checks ->key
    == cfqd).

    > This is most fishy code.


    Well, it's definitely not straightforward ;-)

    --
    Jens Axboe

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/
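
Editor's sketch: the safe-traversal idea raised in the report above (read the next pointer before the callback may free the current element) can be illustrated in user space. This is a minimal sketch under stated assumptions, not the kernel macro: `struct node`, `push`, `for_each_safe` and `demo_safe_free` are hypothetical names, plain `malloc()`/`free()` stand in for the slab cache, and no RCU is involved. It shows only the iteration pattern; it says nothing about whether freed memory may be reused by someone else, which is the separate question the thread goes on to discuss.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a list element such as a cic. */
struct node {
    struct node *next;
    int id;
};

/* Push a new node on the front; on allocation failure the list is unchanged. */
static struct node *push(struct node *head, int id)
{
    struct node *n = malloc(sizeof(*n));

    if (!n)
        return head;
    n->id = id;
    n->next = head;
    return n;
}

/*
 * Visit every node and let func() dispose of it. The next pointer is
 * saved in a local before func() runs, so the loop never reads a node
 * after it has been freed.
 */
static int for_each_safe(struct node *head, void (*func)(struct node *))
{
    struct node *pos, *next;
    int called = 0;

    for (pos = head; pos != NULL; pos = next) {
        next = pos->next;   /* buffer before the node can go away */
        func(pos);
        called++;
    }
    return called;
}

static void free_node(struct node *n)
{
    assert(n != NULL);
    free(n);
}

/* Build a three-element list, free it safely, report how many were visited. */
int demo_safe_free(void)
{
    struct node *head = NULL;
    int i;

    for (i = 0; i < 3; i++)
        head = push(head, i);
    return for_each_safe(head, free_node);
}
```

Calling `demo_safe_free()` from a `main()` visits and frees all three nodes. Writing the loop as `pos = pos->next` in the increment clause instead would read `pos` after `func()` has freed it, which is the access pattern the kmemcheck report points at.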

  2. Re: kmemcheck caught read from freed memory (cfq_free_io_context)

    On Wed, 2008-04-02 at 14:11 +0300, Pekka Enberg wrote:
    > > Nothing. So you cannot access the object at all after you've called
    > > kmem_cache_free(). SLAB_RCU or no SLAB_RCU.


    On Wed, Apr 2, 2008 at 2:14 PM, Peter Zijlstra wrote:
    > Well, you can, but you have to validate you get the object you were
    > looking for.


    Yes. I keep forgetting that. Sorry.

  3. Re: kmemcheck caught read from freed memory (cfq_free_io_context)

    On Wed, 2008-04-02 at 14:11 +0300, Pekka Enberg wrote:
    > On Wed, 2008-04-02 at 13:07 +0200, Jens Axboe wrote:
    > > > Makes sense, and to me Paul's analysis of the code looks totally correct
    > > > - there's no bug there, at least related to hlist traversal and
    > > > kmem_cache_free(), since we are under rcu_read_lock() and thus hold off
    > > > the grace for freeing.

    >
    > On Wed, Apr 2, 2008 at 2:08 PM, Peter Zijlstra wrote:
    > > but what holds off the slab allocator reissuing that same object and
    > > someone else writing other stuff into it?

    >
    > Nothing. So you cannot access the object at all after you've called
    > kmem_cache_free(). SLAB_RCU or no SLAB_RCU.


    Well, you can, but you have to validate you get the object you were
    looking for.


  4. Re: kmemcheck caught read from freed memory (cfq_free_io_context)

    On Wed, 2008-04-02 at 13:07 +0200, Jens Axboe wrote:
    > > Makes sense, and to me Paul's analysis of the code looks totally correct
    > > - there's no bug there, at least related to hlist traversal and
    > > kmem_cache_free(), since we are under rcu_read_lock() and thus hold off
    > > the grace for freeing.


    On Wed, Apr 2, 2008 at 2:08 PM, Peter Zijlstra wrote:
    > but what holds off the slab allocator reissuing that same object and
    > someone else writing other stuff into it?


    Nothing. So you cannot access the object at all after you've called
    kmem_cache_free(). SLAB_RCU or no SLAB_RCU.

  5. Re: kmemcheck caught read from freed memory (cfq_free_io_context)

    On Wed, Apr 02 2008, Peter Zijlstra wrote:
    > On Wed, 2008-04-02 at 13:07 +0200, Jens Axboe wrote:
    > > On Wed, Apr 02 2008, Pekka Enberg wrote:
    > > > Hi Paul,
    > > >
    > > > On Wed, Apr 2, 2008 at 1:55 PM, Paul E. McKenney
    > > > wrote:
    > > > > I will check this when I get back to some bandwidth -- but in the meantime,
    > > > > does kmemcheck special-case SLAB_DESTROY_BY_RCU? It is legal to access
    > > > > newly-freed items in that case, as long as you did rcu_read_lock()
    > > > > before gaining a reference to them and don't hold the reference past
    > > > > the matching rcu_read_unlock().
    > > >
    > > > No, kmemcheck is work in progress and does not know about
    > > > SLAB_DESTROY_BY_RCU yet. The reason I asked Vegard to post the warning
    > > > was because Peter, Vegard, and myself identified this particular
    > > > warning as a real problem. But yeah, kmemcheck can cause false
    > > > positives for RCU for now.

    > >
    > > Makes sense, and to me Paul's analysis of the code looks totally correct
    > > - there's no bug there, at least related to hlist traversal and
    > > kmem_cache_free(), since we are under rcu_read_lock() and thus hold off
    > > the grace for freeing.

    >
    > but what holds off the slab allocator reissuing that same object and
    > someone else writing other stuff into it?


    Nothing, that's how rcu destroy works here. But for the validation to be
    WRONG radix_tree_lookup(..., old_key) must return cic for new_key, not
    NULL.

    --
    Jens Axboe


  6. Re: kmemcheck caught read from freed memory (cfq_free_io_context)

    On Wed, 2008-04-02 at 13:20 +0200, Peter Zijlstra wrote:
    > On Wed, 2008-04-02 at 13:14 +0200, Jens Axboe wrote:
    > > On Wed, Apr 02 2008, Peter Zijlstra wrote:
    > > > On Wed, 2008-04-02 at 13:07 +0200, Jens Axboe wrote:
    > > > > On Wed, Apr 02 2008, Pekka Enberg wrote:
    > > > > > Hi Paul,
    > > > > >
    > > > > > On Wed, Apr 2, 2008 at 1:55 PM, Paul E. McKenney
    > > > > > wrote:
    > > > > > > I will check this when I get back to some bandwidth -- but in the meantime,
    > > > > > > does kmemcheck special-case SLAB_DESTROY_BY_RCU? It is legal to access
    > > > > > > newly-freed items in that case, as long as you did rcu_read_lock()
    > > > > > > before gaining a reference to them and don't hold the reference past
    > > > > > > the matching rcu_read_unlock().
    > > > > >
    > > > > > No, kmemcheck is work in progress and does not know about
    > > > > > SLAB_DESTROY_BY_RCU yet. The reason I asked Vegard to post the warning
    > > > > > was because Peter, Vegard, and myself identified this particular
    > > > > > warning as a real problem. But yeah, kmemcheck can cause false
    > > > > > positives for RCU for now.
    > > > >
    > > > > Makes sense, and to me Paul's analysis of the code looks totally correct
    > > > > - there's no bug there, at least related to hlist traversal and
    > > > > kmem_cache_free(), since we are under rcu_read_lock() and thus hold off
    > > > > the grace for freeing.
    > > >
    > > > but what holds off the slab allocator reissuing that same object and
    > > > someone else writing other stuff into it?

    > >
    > > Nothing, that's how rcu destroy works here. But for the validation to be
    > > WRONG radix_tree_lookup(..., old_key) must return cic for new_key, not
    > > NULL.
    > >

    >
    >
    > A                                       B                   C
    >
    > cfq_cic_lookup(cfqd_1, ioc)
    >
    > rcu_read_lock()
    > cic = radix_tree_lookup(, cfqd_q);
    >
    >                                         cfq_cic_free()
    >
    >                                                             cfq_cic_link(cfqd_2, ioc,)
    >
    > rcu_read_unlock()
    >
    >
    > and now we have that:
    >
    > cic->key == cfqd_2
    >
    >
    > I'm not seeing anything stopping this from happening.
    >
    > Which is also why we need hlist_for_each_safe_rcu() because as soon as
    > we kfree()d the thing, someone else might get the object and start
    > poking at the hlist pointers, wrecking our iteration.


    Or worse, when C doesn't happen and B frees the very last object and
    the slab does get returned, any usage of cic after rcu_read_unlock()
    might poke into free memory.


  7. Re: kmemcheck caught read from freed memory (cfq_free_io_context)

    On Wed, 2008-04-02 at 13:14 +0200, Jens Axboe wrote:
    > On Wed, Apr 02 2008, Peter Zijlstra wrote:
    > > On Wed, 2008-04-02 at 13:07 +0200, Jens Axboe wrote:
    > > > On Wed, Apr 02 2008, Pekka Enberg wrote:
    > > > > Hi Paul,
    > > > >
    > > > > On Wed, Apr 2, 2008 at 1:55 PM, Paul E. McKenney
    > > > > wrote:
    > > > > > I will check this when I get back to some bandwidth -- but in the meantime,
    > > > > > does kmemcheck special-case SLAB_DESTROY_BY_RCU? It is legal to access
    > > > > > newly-freed items in that case, as long as you did rcu_read_lock()
    > > > > > before gaining a reference to them and don't hold the reference past
    > > > > > the matching rcu_read_unlock().
    > > > >
    > > > > No, kmemcheck is work in progress and does not know about
    > > > > SLAB_DESTROY_BY_RCU yet. The reason I asked Vegard to post the warning
    > > > > was because Peter, Vegard, and myself identified this particular
    > > > > warning as a real problem. But yeah, kmemcheck can cause false
    > > > > positives for RCU for now.
    > > >
    > > > Makes sense, and to me Paul's analysis of the code looks totally correct
    > > > - there's no bug there, at least related to hlist traversal and
    > > > kmem_cache_free(), since we are under rcu_read_lock() and thus hold off
    > > > the grace for freeing.

    > >
    > > but what holds off the slab allocator reissuing that same object and
    > > someone else writing other stuff into it?

    >
    > Nothing, that's how rcu destroy works here. But for the validation to be
    > WRONG radix_tree_lookup(..., old_key) must return cic for new_key, not
    > NULL.
    >



    A                                       B                   C

    cfq_cic_lookup(cfqd_1, ioc)

    rcu_read_lock()
    cic = radix_tree_lookup(, cfqd_q);

                                            cfq_cic_free()

                                                                cfq_cic_link(cfqd_2, ioc,)

    rcu_read_unlock()


    and now we have that:

    cic->key == cfqd_2


    I'm not seeing anything stopping this from happening.

    Which is also why we need hlist_for_each_safe_rcu() because as soon as
    we kfree()d the thing, someone else might get the object and start
    poking at the hlist pointers, wrecking our iteration.




  8. Re: kmemcheck caught read from freed memory (cfq_free_io_context)

    > From: Peter Zijlstra
    > Date: Wed, Apr 02, 2008 12:59:21PM +0200
    >
    > On Wed, 2008-04-02 at 03:55 -0700, Paul E. McKenney wrote:
    > > On Wed, Apr 02, 2008 at 09:28:46AM +0200, Ingo Molnar wrote:
    > > >
    > > > * Jens Axboe wrote:
    > > >
    > > > > On Wed, Apr 02 2008, Pekka J Enberg wrote:
    > > > > > On Wed, 2 Apr 2008, Jens Axboe wrote:
    > > > > > > Good catch, I wonder why it didn't complain in my testing. I've added a
    > > > > > > patch to fix that, please see it here:
    > > > > >
    > > > > > You probably don't have kmemcheck in your kernel ;-)
    > > > >
    > > > > Ehm no, you are right
    > > >
    > > > ... and you can get kmemcheck by testing on x86.git/latest:
    > > >
    > > > http://people.redhat.com/mingo/x86.git/README
    > > >
    > > > ;-)

    > >
    > > I will check this when I get back to some bandwidth -- but in the meantime,
    > > does kmemcheck special-case SLAB_DESTROY_BY_RCU? It is legal to access
    > > newly-freed items in that case, as long as you did rcu_read_lock()
    > > before gaining a reference to them and don't hold the reference past
    > > the matching rcu_read_unlock().

    >
    > I don't think it does.
    >
    > It would have to register a call_rcu callback itself in order to mark
    > it freed - and handle the race with the object being handed out again.
    >


    I had the same problem while debugging a cfq-derived i/o scheduler,
    and I found nothing preventing the reuse of the freed memory.
    The patch below seemed to fix the logic.

    Signed-off-by: Fabio Checconi
    ---
    diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
    index 0f962ec..f26da2b 100644
    --- a/block/cfq-iosched.c
    +++ b/block/cfq-iosched.c
    @@ -1143,24 +1143,37 @@ static void cfq_put_queue(struct cfq_queue *cfqq)
     }

     /*
    - * Call func for each cic attached to this ioc. Returns number of cic's seen.
    + * Call func for each cic attached to this ioc.
      */
    -static unsigned int
    +static void
     call_for_each_cic(struct io_context *ioc,
     		  void (*func)(struct io_context *, struct cfq_io_context *))
     {
     	struct cfq_io_context *cic;
     	struct hlist_node *n;
    -	int called = 0;

     	rcu_read_lock();
    -	hlist_for_each_entry_rcu(cic, n, &ioc->cic_list, cic_list) {
    +	hlist_for_each_entry_rcu(cic, n, &ioc->cic_list, cic_list)
     		func(ioc, cic);
    -		called++;
    -	}
     	rcu_read_unlock();
    +}
    +
    +static void cfq_cic_free_rcu(struct rcu_head *head)
    +{
    +	struct cfq_io_context *cic;
    +
    +	cic = container_of(head, struct cfq_io_context, rcu_head);
    +
    +	kmem_cache_free(cfq_ioc_pool, cic);
    +	elv_ioc_count_dec(ioc_count);
    +
    +	if (ioc_gone && !elv_ioc_count_read(ioc_count))
    +		complete(ioc_gone);
    +}

    -	return called;
    +static void cfq_cic_free(struct cfq_io_context *cic)
    +{
    +	call_rcu(&cic->rcu_head, cfq_cic_free_rcu);
     }

     static void cic_free_func(struct io_context *ioc, struct cfq_io_context *cic)
    @@ -1174,24 +1187,18 @@ static void cic_free_func(struct io_context *ioc, struct cfq_io_context *cic)
     	hlist_del_rcu(&cic->cic_list);
     	spin_unlock_irqrestore(&ioc->lock, flags);

    -	kmem_cache_free(cfq_ioc_pool, cic);
    +	cfq_cic_free(cic);
     }

     static void cfq_free_io_context(struct io_context *ioc)
     {
    -	int freed;
    -
     	/*
    -	 * ioc->refcount is zero here, so no more cic's are allowed to be
    -	 * linked into this ioc. So it should be ok to iterate over the known
    -	 * list, we will see all cic's since no new ones are added.
    +	 * ioc->refcount is zero here, or we are called from elv_unregister(),
    +	 * so no more cic's are allowed to be linked into this ioc. So it
    +	 * should be ok to iterate over the known list, we will see all cic's
    +	 * since no new ones are added.
     	 */
    -	freed = call_for_each_cic(ioc, cic_free_func);
    -
    -	elv_ioc_count_mod(ioc_count, -freed);
    -
    -	if (ioc_gone && !elv_ioc_count_read(ioc_count))
    -		complete(ioc_gone);
    +	call_for_each_cic(ioc, cic_free_func);
     }

     static void cfq_exit_cfqq(struct cfq_data *cfqd, struct cfq_queue *cfqq)
    @@ -1458,15 +1465,6 @@ cfq_get_queue(struct cfq_data *cfqd, int is_sync, struct io_context *ioc,
     	return cfqq;
     }

    -static void cfq_cic_free(struct cfq_io_context *cic)
    -{
    -	kmem_cache_free(cfq_ioc_pool, cic);
    -	elv_ioc_count_dec(ioc_count);
    -
    -	if (ioc_gone && !elv_ioc_count_read(ioc_count))
    -		complete(ioc_gone);
    -}
    -
     /*
      * We drop cfq io contexts lazily, so we may find a dead one.
      */
    @@ -2138,7 +2136,7 @@ static int __init cfq_slab_setup(void)
     	if (!cfq_pool)
     		goto fail;

    -	cfq_ioc_pool = KMEM_CACHE(cfq_io_context, SLAB_DESTROY_BY_RCU);
    +	cfq_ioc_pool = KMEM_CACHE(cfq_io_context, 0);
     	if (!cfq_ioc_pool)
     		goto fail;

    @@ -2286,7 +2284,6 @@ static void __exit cfq_exit(void)
     	smp_wmb();
     	if (elv_ioc_count_read(ioc_count))
     		wait_for_completion(ioc_gone);
    -	synchronize_rcu();
     	cfq_slab_kill();
     }

    diff --git a/include/linux/iocontext.h b/include/linux/iocontext.h
    index 1b4ccf2..50e448c 100644
    --- a/include/linux/iocontext.h
    +++ b/include/linux/iocontext.h
    @@ -54,6 +54,8 @@ struct cfq_io_context {

     	void (*dtor)(struct io_context *); /* destructor */
     	void (*exit)(struct io_context *); /* called on task exit */
    +
    +	struct rcu_head rcu_head;
     };

     /*
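
Editor's sketch: the patch above replaces an immediate `kmem_cache_free()` with a callback queued through `call_rcu()`, and recovers the enclosing object from the embedded `rcu_head` with `container_of()`. The shape of that pattern can be shown in user space. This is a toy model under stated assumptions, not the kernel API: `struct deferred_head`, `defer_call`, `grace_period_elapsed`, `struct item` and `demo_deferred_free` are hypothetical names, the "grace period" is simply a function the caller invokes once no reader can remain, and the code is single-threaded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-space stand-in for struct rcu_head. */
struct deferred_head {
    struct deferred_head *next;
    void (*func)(struct deferred_head *);
};

/* Same idea as the kernel's container_of(): member pointer -> enclosing object. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct item {
    int key;
    struct deferred_head head;      /* embedded, like rcu_head in the patch */
};

static struct deferred_head *pending;   /* callbacks waiting for the "grace period" */
static int live_items;                  /* plays the role of ioc_count */

/* Queue a callback instead of freeing right away (toy call_rcu()). */
static void defer_call(struct deferred_head *h,
                       void (*func)(struct deferred_head *))
{
    h->func = func;
    h->next = pending;
    pending = h;
}

/* Run all queued callbacks; the caller decides when no reader can remain. */
static void grace_period_elapsed(void)
{
    struct deferred_head *h = pending, *next;

    pending = NULL;
    for (; h != NULL; h = next) {
        next = h->next;     /* the callback frees the enclosing object */
        h->func(h);
    }
}

/* Deferred destructor: free the object and drop the live count. */
static void item_free_cb(struct deferred_head *h)
{
    struct item *it = container_of(h, struct item, head);

    free(it);
    live_items--;
}

static struct item *item_new(int key)
{
    struct item *it = malloc(sizeof(*it));

    if (!it)
        return NULL;
    it->key = key;
    it->head.next = NULL;
    it->head.func = NULL;
    live_items++;
    return it;
}

/* Returns 1 if the object stayed readable until the grace period ended. */
int demo_deferred_free(void)
{
    struct item *it = item_new(42);
    int key_seen;

    if (!it)
        return -1;
    defer_call(&it->head, item_free_cb);    /* "freed", but memory still valid */
    key_seen = it->key;                     /* a reader still holding a pointer */
    assert(live_items == 1);                /* count drops only in the callback */
    grace_period_elapsed();                 /* now the memory really goes away */
    return key_seen == 42 && live_items == 0;
}
```

The point of the sketch is the ordering: the count is decremented and the memory released only inside the callback, so a pointer obtained before the deferral stays valid until the grace period ends, and the object cannot be handed out again in the meantime.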

  9. Re: kmemcheck caught read from freed memory (cfq_free_io_context)

    On Wed, Apr 02 2008, Peter Zijlstra wrote:
    > On Wed, 2008-04-02 at 13:14 +0200, Jens Axboe wrote:
    > > On Wed, Apr 02 2008, Peter Zijlstra wrote:
    > > > On Wed, 2008-04-02 at 13:07 +0200, Jens Axboe wrote:
    > > > > On Wed, Apr 02 2008, Pekka Enberg wrote:
    > > > > > Hi Paul,
    > > > > >
    > > > > > On Wed, Apr 2, 2008 at 1:55 PM, Paul E. McKenney
    > > > > > wrote:
    > > > > > > I will check this when I get back to some bandwidth -- but in the meantime,
    > > > > > > does kmemcheck special-case SLAB_DESTROY_BY_RCU? It is legal to access
    > > > > > > newly-freed items in that case, as long as you did rcu_read_lock()
    > > > > > > before gaining a reference to them and don't hold the reference past
    > > > > > > the matching rcu_read_unlock().
    > > > > >
    > > > > > No, kmemcheck is work in progress and does not know about
    > > > > > SLAB_DESTROY_BY_RCU yet. The reason I asked Vegard to post the warning
    > > > > > was because Peter, Vegard, and myself identified this particular
    > > > > > warning as a real problem. But yeah, kmemcheck can cause false
    > > > > > positives for RCU for now.
    > > > >
    > > > > Makes sense, and to me Paul's analysis of the code looks totally correct
    > > > > - there's no bug there, at least related to hlist traversal and
    > > > > kmem_cache_free(), since we are under rcu_read_lock() and thus hold off
    > > > > the grace for freeing.
    > > >
    > > > but what holds off the slab allocator reissuing that same object and
    > > > someone else writing other stuff into it?

    > >
    > > Nothing, that's how rcu destroy works here. But for the validation to be
    > > WRONG radix_tree_lookup(..., old_key) must return cic for new_key, not
    > > NULL.
    > >

    >
    >
    > A                                       B                   C
    >
    > cfq_cic_lookup(cfqd_1, ioc)
    >
    > rcu_read_lock()
    > cic = radix_tree_lookup(, cfqd_q);
    >
    >                                         cfq_cic_free()
    >
    >                                                             cfq_cic_link(cfqd_2, ioc,)
    >
    > rcu_read_unlock()
    >
    >
    > and now we have that:
    >
    > cic->key == cfqd_2
    >
    >
    > I'm not seeing anything stopping this from happening.


    I don't follow your A-B-C here, what do they refer to?

    --
    Jens Axboe


  10. Re: kmemcheck caught read from freed memory (cfq_free_io_context)

    On Wed, 2008-04-02 at 13:32 +0200, Jens Axboe wrote:
    > On Wed, Apr 02 2008, Peter Zijlstra wrote:
    > > On Wed, 2008-04-02 at 13:14 +0200, Jens Axboe wrote:
    > > > On Wed, Apr 02 2008, Peter Zijlstra wrote:
    > > > > On Wed, 2008-04-02 at 13:07 +0200, Jens Axboe wrote:
    > > > > > On Wed, Apr 02 2008, Pekka Enberg wrote:
    > > > > > > Hi Paul,
    > > > > > >
    > > > > > > On Wed, Apr 2, 2008 at 1:55 PM, Paul E. McKenney
    > > > > > > wrote:
    > > > > > > > I will check this when I get back to some bandwidth -- but in the meantime,
    > > > > > > > does kmemcheck special-case SLAB_DESTROY_BY_RCU? It is legal to access
    > > > > > > > newly-freed items in that case, as long as you did rcu_read_lock()
    > > > > > > > before gaining a reference to them and don't hold the reference past
    > > > > > > > the matching rcu_read_unlock().
    > > > > > >
    > > > > > > No, kmemcheck is work in progress and does not know about
    > > > > > > SLAB_DESTROY_BY_RCU yet. The reason I asked Vegard to post the warning
    > > > > > > was because Peter, Vegard, and myself identified this particular
    > > > > > > warning as a real problem. But yeah, kmemcheck can cause false
    > > > > > > positives for RCU for now.
    > > > > >
    > > > > > Makes sense, and to me Paul's analysis of the code looks totally correct
    > > > > > - there's no bug there, at least related to hlist traversal and
    > > > > > kmem_cache_free(), since we are under rcu_read_lock() and thus hold off
    > > > > > the grace for freeing.
    > > > >
    > > > > but what holds off the slab allocator reissuing that same object and
    > > > > someone else writing other stuff into it?
    > > >
    > > > Nothing, that's how rcu destroy works here. But for the validation to be
    > > > WRONG radix_tree_lookup(..., old_key) must return cic for new_key, not
    > > > NULL.
    > > >

    > >
    > >
    > > A                                       B                   C
    > >
    > > cfq_cic_lookup(cfqd_1, ioc)
    > >
    > > rcu_read_lock()
    > > cic = radix_tree_lookup(, cfqd_q);
    > >
    > >                                         cfq_cic_free()
    > >
    > >                                                             cfq_cic_link(cfqd_2, ioc,)
    > >
    > > rcu_read_unlock()
    > >
    > >
    > > and now we have that:
    > >
    > > cic->key == cfqd_2
    > >
    > >
    > > I'm not seeing anything stopping this from happening.

    >
    > I don't follow your A-B-C here, what do they refer to?


    A does a radix_tree_lookup() of cfqd_1 (darn typos)
    B does a kfree of the same cic found by A
    C does an alloc and gets the same cic as freed by B and inserts it
    in a different location.

    So that when we return to A, cic->key == cfqd_2 even though we did a
    lookup for cfqd_1.
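
Editor's sketch: the A/B/C sequence described here can be replayed deterministically in user space to show why checking only `key != NULL` is not enough once memory can be recycled. This is a single-threaded toy under stated assumptions: `struct obj`, `cache_alloc`, `cache_free`, `valid_not_null`, `valid_same_key` and `replay_abc` are hypothetical names, the one-slot cache stands in for a type-stable slab, and clearing the key on free mirrors the remark earlier in the thread that `->key` is set to NULL when a cic is invalid. The locking a real reader would need is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of a cic: the key records which queue owns it. */
struct obj {
    const void *key;
};

/* One-slot, type-stable cache: a freed object is reused by the next alloc. */
static struct obj slot;
static int slot_in_use;

static struct obj *cache_alloc(const void *key)
{
    if (slot_in_use)
        return NULL;
    slot_in_use = 1;
    slot.key = key;
    return &slot;
}

static void cache_free(struct obj *o)
{
    o->key = NULL;      /* mark the object dead */
    slot_in_use = 0;
}

/* Weak check: only asks whether the object is alive. */
static int valid_not_null(const struct obj *o)
{
    return o->key != NULL;
}

/* Identity check: asks whether it is still the object that was looked up. */
static int valid_same_key(const struct obj *o, const void *wanted)
{
    return o->key == wanted;
}

/*
 * Replays the A/B/C interleaving in one thread.
 * Returns a bitmask: bit 0 is set if the weak check accepts the recycled
 * object, bit 1 is set if the identity check accepts it.
 */
int replay_abc(void)
{
    static const int cfqd_1 = 1, cfqd_2 = 2;    /* stand-ins for two queues */
    struct obj *installed, *seen_by_a, *recycled;
    int result = 0;

    installed = cache_alloc(&cfqd_1);
    seen_by_a = installed;                  /* A: lookup for cfqd_1 */
    cache_free(installed);                  /* B: frees that object */
    recycled = cache_alloc(&cfqd_2);        /* C: gets the same memory */
    assert(recycled == seen_by_a);

    if (valid_not_null(seen_by_a))
        result |= 1;
    if (valid_same_key(seen_by_a, &cfqd_1))
        result |= 2;

    cache_free(recycled);                   /* leave the cache empty again */
    return result;
}
```

In this replay the weak check accepts the object A is holding even though it now belongs to `cfqd_2`, while the identity check rejects it. That is the validation step the thread says is required when relying on SLAB_DESTROY_BY_RCU.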




  11. Re: kmemcheck caught read from freed memory (cfq_free_io_context)

    On Wed, 2008-04-02 at 13:42 +0200, Jens Axboe wrote:
    > On Wed, Apr 02 2008, Peter Zijlstra wrote:
    > > On Wed, 2008-04-02 at 13:32 +0200, Jens Axboe wrote:
    > > > On Wed, Apr 02 2008, Peter Zijlstra wrote:
    > > > > On Wed, 2008-04-02 at 13:14 +0200, Jens Axboe wrote:
    > > > > > On Wed, Apr 02 2008, Peter Zijlstra wrote:
    > > > > > > On Wed, 2008-04-02 at 13:07 +0200, Jens Axboe wrote:
    > > > > > > > On Wed, Apr 02 2008, Pekka Enberg wrote:
    > > > > > > > > Hi Paul,
    > > > > > > > >
    > > > > > > > > On Wed, Apr 2, 2008 at 1:55 PM, Paul E. McKenney
    > > > > > > > > wrote:
    > > > > > > > > > I will check this when I get back to some bandwidth -- but in the meantime,
    > > > > > > > > > does kmemcheck special-case SLAB_DESTROY_BY_RCU? It is legal to access
    > > > > > > > > > newly-freed items in that case, as long as you did rcu_read_lock()
    > > > > > > > > > before gaining a reference to them and don't hold the reference past
    > > > > > > > > > the matching rcu_read_unlock().
    > > > > > > > >
    > > > > > > > > No, kmemcheck is work in progress and does not know about
    > > > > > > > > SLAB_DESTROY_BY_RCU yet. The reason I asked Vegard to post the warning
    > > > > > > > > was because Peter, Vegard, and myself identified this particular
    > > > > > > > > warning as a real problem. But yeah, kmemcheck can cause false
    > > > > > > > > positives for RCU for now.
    > > > > > > >
    > > > > > > > Makes sense, and to me Paul's analysis of the code looks totally correct
    > > > > > > > - there's no bug there, at least related to hlist traversal and
    > > > > > > > kmem_cache_free(), since we are under rcu_read_lock() and thus hold off
    > > > > > > > the grace for freeing.
    > > > > > >
    > > > > > > but what holds off the slab allocator reissuing that same object and
    > > > > > > someone else writing other stuff into it?
    > > > > >
    > > > > > Nothing, that's how rcu destroy works here. But for the validation to be
    > > > > > WRONG radix_tree_lookup(..., old_key) must return cic for new_key, not
    > > > > > NULL.
    > > > > >
    > > > >
    > > > >
    > > > > A                                       B                   C
    > > > >
    > > > > cfq_cic_lookup(cfqd_1, ioc)
    > > > >
    > > > > rcu_read_lock()
    > > > > cic = radix_tree_lookup(, cfqd_q);
    > > > >
    > > > >                                         cfq_cic_free()
    > > > >
    > > > >                                                             cfq_cic_link(cfqd_2, ioc,)
    > > > >
    > > > > rcu_read_unlock()
    > > > >
    > > > >
    > > > > and now we have that:
    > > > >
    > > > > cic->key == cfqd_2
    > > > >
    > > > >
    > > > > I'm not seeing anything stopping this from happening.
    > > >
    > > > I don't follow your A-B-C here, what do they refer to?

    > >
    > > A does a radix_tree_lookup() of cfqd_1 (darn typos)
    > > B does a kfree of the same cic found by A
    > > C does an alloc and gets the same cic as freed by B and inserts it
    > > in a different location.
    > >
    > > So that when we return to A, cic->key == cfqd_2 even though we did a
    > > lookup for cfqd_1.

    >
    > That I follow, my question was if A, B, and C refer to different
    > processes but with a shared io context? I'm assuming that is correct...


    Ah, yeah, whatever is needed to make this race happen :-)

    > And it does look buggy. It looks like my assumption of what slab rcu destroy
    > did is WRONG, it should be replaced by a manual call_rcu() freeing
    > instead.


    Yeah, SLAB_DESTROY_BY_RCU should have a _HUGE_ comment explaining it,
    I'm sure this is not the first (nor the last) time people get that
    wrong.

    This would be one of those things that score very low on Rusty's API
    list.


  12. Re: kmemcheck caught read from freed memory (cfq_free_io_context)

    On Wed, Apr 02 2008, Fabio Checconi wrote:
    > > From: Peter Zijlstra
    > > Date: Wed, Apr 02, 2008 12:59:21PM +0200
    > >
    > > On Wed, 2008-04-02 at 03:55 -0700, Paul E. McKenney wrote:
    > > > On Wed, Apr 02, 2008 at 09:28:46AM +0200, Ingo Molnar wrote:
    > > > >
    > > > > * Jens Axboe wrote:
    > > > >
    > > > > > On Wed, Apr 02 2008, Pekka J Enberg wrote:
    > > > > > > On Wed, 2 Apr 2008, Jens Axboe wrote:
    > > > > > > > Good catch, I wonder why it didn't complain in my testing. I've added a
    > > > > > > > patch to fix that, please see it here:
    > > > > > >
    > > > > > > You probably don't have kmemcheck in your kernel ;-)
    > > > > >
    > > > > > Ehm no, you are right
    > > > >
    > > > > ... and you can get kmemcheck by testing on x86.git/latest:
    > > > >
    > > > > http://people.redhat.com/mingo/x86.git/README
    > > > >
    > > > > ;-)
    > > >
    > > > I will check this when I get back to some bandwidth -- but in the meantime,
    > > > does kmemcheck special-case SLAB_DESTROY_BY_RCU? It is legal to access
    > > > newly-freed items in that case, as long as you did rcu_read_lock()
    > > > before gaining a reference to them and don't hold the reference past
    > > > the matching rcu_read_unlock().

    > >
    > > I don't think it does.
    > >
    > > It would have to register a call_rcu callback itself in order to mark
    > > it freed - and handle the race with the object being handed out again.
    > >

    >
    > I had the same problem while debugging a cfq-derived i/o scheduler,
    > and I found nothing preventing the reuse of the freed memory.
    > The patch below seemed to fix the logic.


    Thanks, from a first look this looks like it'll fix this bad rcu slab
    usage. I'll give it some closer scrutiny and testing.

    --
    Jens Axboe


  13. Re: kmemcheck caught read from freed memory (cfq_free_io_context)

    On Wed, Apr 02 2008, Peter Zijlstra wrote:
    > On Wed, 2008-04-02 at 13:32 +0200, Jens Axboe wrote:
    > > On Wed, Apr 02 2008, Peter Zijlstra wrote:
    > > > On Wed, 2008-04-02 at 13:14 +0200, Jens Axboe wrote:
    > > > > On Wed, Apr 02 2008, Peter Zijlstra wrote:
    > > > > > On Wed, 2008-04-02 at 13:07 +0200, Jens Axboe wrote:
    > > > > > > On Wed, Apr 02 2008, Pekka Enberg wrote:
    > > > > > > > Hi Paul,
    > > > > > > >
    > > > > > > > On Wed, Apr 2, 2008 at 1:55 PM, Paul E. McKenney
    > > > > > > > wrote:
    > > > > > > > > I will check this when I get back to some bandwidth -- but in the meantime,
    > > > > > > > > does kmemcheck special-case SLAB_DESTROY_BY_RCU? It is legal to access
    > > > > > > > > newly-freed items in that case, as long as you did rcu_read_lock()
    > > > > > > > > before gaining a reference to them and don't hold the reference past
    > > > > > > > > the matching rcu_read_unlock().
    > > > > > > >
    > > > > > > > No, kmemcheck is work in progress and does not know about
    > > > > > > > SLAB_DESTROY_BY_RCU yet. The reason I asked Vegard to post the warning
    > > > > > > > was because Peter, Vegard, and myself identified this particular
    > > > > > > > warning as a real problem. But yeah, kmemcheck can cause false
    > > > > > > > positives for RCU for now.
    > > > > > >
    > > > > > > Makes sense, and to me Paul's analysis of the code looks totally correct
    > > > > > > - there's no bug there, at least related to hlist traversal and
    > > > > > > kmem_cache_free(), since we are under rcu_read_lock() and thus hold off
    > > > > > > the grace period for freeing.
    > > > > >
    > > > > > but what holds off the slab allocator re-issuing that same object and
    > > > > > someone else writing other stuff into it?
    > > > >
    > > > > Nothing, that's how rcu destroy works here. But for the validation to be
    > > > > WRONG radix_tree_lookup(..., old_key) must return cic for new_key, not
    > > > > NULL.
    > > > >
    > > >
    > > >
    > > >      A                               B                   C
    > > >
    > > > cfq_cic_lookup(cfqd_1, ioc)
    > > >
    > > >   rcu_read_lock()
    > > >   cic = radix_tree_lookup(, cfqd_q);
    > > >
    > > >                                 cfq_cic_free()
    > > >
    > > >                                                     cfq_cic_link(cfqd_2, ioc,)
    > > >
    > > >   rcu_read_unlock()
    > > >
    > > >
    > > > and now we have that:
    > > >
    > > > cic->key == cfqd_2
    > > >
    > > >
    > > > I'm not seeing anything stopping this from happening.

    > >
    > > I don't follow your A-B-C here, what do they refer to?

    >
    > A does a radix_tree_lookup() of cfqd_1 (darn typos)
    > B does a kfree of the same cic found by A
    > C does an alloc and gets the same cic as freed by B and inserts it
    > in a different location.
    >
    > So that when we return to A, cic->key == cfqd_2 even though we did a
    > lookup for cfqd_1.


    That I follow, my question was if A, B, and C refer to different
    processes but with a shared io context? I'm assuming that is correct...

    And it does look buggy. It looks like my assumption of what slab rcu
    destroy did is WRONG; it should be replaced by a manual call_rcu()
    freeing instead.

    --
    Jens Axboe
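
The "manual call_rcu() freeing" idea can be modelled in user space. The sketch below is a toy, not kernel code: toy_call_rcu(), toy_grace_period_elapsed() and struct toy_cic are hypothetical names, and the grace period is simulated by an explicit function call instead of being tracked by RCU. It only illustrates the property that matters here: the object is neither freed nor recycled until every callback has run.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for struct rcu_head: one queued callback. */
struct toy_rcu_head {
    struct toy_rcu_head *next;
    void (*func)(struct toy_rcu_head *head);
};

/* Callbacks waiting for the (simulated) grace period to end. */
static struct toy_rcu_head *pending;

/* Hypothetical analogue of call_rcu(): queue the callback, free nothing yet. */
static void toy_call_rcu(struct toy_rcu_head *head,
                         void (*func)(struct toy_rcu_head *head))
{
    head->func = func;
    head->next = pending;
    pending = head;
}

/* Pretend all readers have left their read-side sections; run the callbacks. */
static void toy_grace_period_elapsed(void)
{
    while (pending) {
        struct toy_rcu_head *head = pending;

        pending = head->next;
        head->func(head);
    }
}

struct toy_cic {
    void *key;
    struct toy_rcu_head rcu;
};

static int freed_count;

static void toy_cic_free_rcu(struct toy_rcu_head *head)
{
    /* Recover the enclosing object from its embedded head (container_of). */
    struct toy_cic *cic =
        (struct toy_cic *)((char *)head - offsetof(struct toy_cic, rcu));

    free(cic);
    freed_count++;
}

/* Returns 1 if the object stayed intact until the simulated grace period ended. */
static int toy_demo(void)
{
    static int cfqd_1;                 /* stands in for a queue pointer */
    int freed_before = freed_count;
    struct toy_cic *cic = malloc(sizeof(*cic));
    int intact;

    if (!cic)
        return 0;
    cic->key = &cfqd_1;

    toy_call_rcu(&cic->rcu, toy_cic_free_rcu);   /* "free" requested */
    /* A reader that looked the object up earlier can still trust it here. */
    intact = (cic->key == &cfqd_1) && (freed_count == freed_before);

    toy_grace_period_elapsed();                  /* all readers are done */
    return intact && freed_count == freed_before + 1;
}
```

Calling toy_demo() from a test program should return 1. The contrast with SLAB_DESTROY_BY_RCU is that here the object itself, not just its backing page, outlives the readers.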


  14. Re: kmemcheck caught read from freed memory (cfq_free_io_context)

    On Wed, Apr 02 2008, Peter Zijlstra wrote:
    > On Wed, 2008-04-02 at 13:42 +0200, Jens Axboe wrote:
    > > On Wed, Apr 02 2008, Peter Zijlstra wrote:
    > > > On Wed, 2008-04-02 at 13:32 +0200, Jens Axboe wrote:
    > > > > On Wed, Apr 02 2008, Peter Zijlstra wrote:
    > > > > > On Wed, 2008-04-02 at 13:14 +0200, Jens Axboe wrote:
    > > > > > > On Wed, Apr 02 2008, Peter Zijlstra wrote:
    > > > > > > > On Wed, 2008-04-02 at 13:07 +0200, Jens Axboe wrote:
    > > > > > > > > On Wed, Apr 02 2008, Pekka Enberg wrote:
    > > > > > > > > > Hi Paul,
    > > > > > > > > >
    > > > > > > > > > On Wed, Apr 2, 2008 at 1:55 PM, Paul E. McKenney
    > > > > > > > > > wrote:
    > > > > > > > > > > I will check this when I get back to some bandwidth -- but in the meantime,
    > > > > > > > > > > does kmemcheck special-case SLAB_DESTROY_BY_RCU? It is legal to access
    > > > > > > > > > > newly-freed items in that case, as long as you did rcu_read_lock()
    > > > > > > > > > > before gaining a reference to them and don't hold the reference past
    > > > > > > > > > > the matching rcu_read_unlock().
    > > > > > > > > >
    > > > > > > > > > No, kmemcheck is work in progress and does not know about
    > > > > > > > > > SLAB_DESTROY_BY_RCU yet. The reason I asked Vegard to post the warning
    > > > > > > > > > was because Peter, Vegard, and myself identified this particular
    > > > > > > > > > warning as a real problem. But yeah, kmemcheck can cause false
    > > > > > > > > > positives for RCU for now.
    > > > > > > > >
    > > > > > > > > Makes sense, and to me Paul's analysis of the code looks totally correct
    > > > > > > > > - there's no bug there, at least related to hlist traversal and
    > > > > > > > > kmem_cache_free(), since we are under rcu_read_lock() and thus hold off
    > > > > > > > > the grace period for freeing.
    > > > > > > >
    > > > > > > > but what holds off the slab allocator re-issuing that same object and
    > > > > > > > someone else writing other stuff into it?
    > > > > > >
    > > > > > > Nothing, that's how rcu destroy works here. But for the validation to be
    > > > > > > WRONG radix_tree_lookup(..., old_key) must return cic for new_key, not
    > > > > > > NULL.
    > > > > > >
    > > > > >
    > > > > >
    > > > > >      A                               B                   C
    > > > > >
    > > > > > cfq_cic_lookup(cfqd_1, ioc)
    > > > > >
    > > > > >   rcu_read_lock()
    > > > > >   cic = radix_tree_lookup(, cfqd_q);
    > > > > >
    > > > > >                                 cfq_cic_free()
    > > > > >
    > > > > >                                                     cfq_cic_link(cfqd_2, ioc,)
    > > > > >
    > > > > >   rcu_read_unlock()
    > > > > >
    > > > > >
    > > > > > and now we have that:
    > > > > >
    > > > > > cic->key == cfqd_2
    > > > > >
    > > > > >
    > > > > > I'm not seeing anything stopping this from happening.
    > > > >
    > > > > I don't follow your A-B-C here, what do they refer to?
    > > >
    > > > A does a radix_tree_lookup() of cfqd_1 (darn typos)
    > > > B does a kfree of the same cic found by A
    > > > C does an alloc and gets the same cic as freed by B and inserts it
    > > > in a different location.
    > > >
    > > > So that when we return to A, cic->key == cfqd_2 even though we did a
    > > > lookup for cfqd_1.

    > >
    > > That I follow, my question was if A, B, and C refer to different
    > > processes but with a shared io context? I'm assuming that is correct...

    >
    > Ah, yeah, whatever is needed to make this race happen :-)


    The only place where you'll have multiple processes involved with this
    at all is if they share io contexts. That is also why the bug isn't that
    critical, since it's not possible right now (CLONE_IO flag must be
    used).

    > > And it does look buggy. It looks like my assumption of what slab rcu destroy
    > > did is WRONG, it should be replaced by a manual call_rcu() freeing
    > > instead.

    >
    > Yeah, SLAB_DESTROY_BY_RCU should have a _HUGE_ comment explaining it,
    > I'm sure this is not the first (nor the last) time people get that
    > wrong.


    It should, SLAB_DESTROY_BY_RCU is definitely useful, but it is expected
    to be an 'easier' way of doing the call_rcu() manually. So it definitely
    needs more documentation.

    --
    Jens Axboe


  15. Re: kmemcheck caught read from freed memory (cfq_free_io_context)

    On Wed, 2008-04-02 at 13:53 +0200, Jens Axboe wrote:

    > > Yeah, SLAB_DESTROY_BY_RCU should have a _HUGE_ comment explaining it,
    > > I'm sure this is not the first (nor the last) time people get that
    > > wrong.

    >
    > It should, SLAB_DESTROY_BY_RCU is definitely useful, but it is expected
    > to be an 'easier' way of doing the call_rcu() manually. So it definitely
    > needs more documentation.
    >


    Ok I gave it a go, how bad is this text?

    Signed-off-by: Peter Zijlstra
    ---
    diff --git a/include/linux/slab.h b/include/linux/slab.h
    index f950a89..e049ddc 100644
    --- a/include/linux/slab.h
    +++ b/include/linux/slab.h
    @@ -25,6 +25,32 @@
    #define SLAB_CACHE_DMA 0x00004000UL /* Use GFP_DMA memory */
    #define SLAB_STORE_USER 0x00010000UL /* DEBUG: Store the last owner for bug hunting */
    #define SLAB_PANIC 0x00040000UL /* Panic if kmem_cache_create() fails */
    +/*
    + * SLAB_DESTROY_BY_RCU - **WARNING** READ THIS!
    + *
    + * This delays freeing the SLAB page by a grace period, it does _NOT_
    + * delay object freeing. This means that if you do kmem_cache_free()
    + * that memory location is free to be reused at any time. Thus it may
    + * be possible to see another object there in the same RCU grace period.
    + *
    + * This feature only ensures the memory location backing the object
    + * stays valid, the trick to using this is relying on an independent
    + * object validation pass. Something like:
    + *
    + *  rcu_read_lock()
    + * again:
    + *  obj = lockless_lookup(key);
    + *  if (obj) {
    + *    if (!try_get_ref(obj)) // might fail for free objects
    + *      goto again;
    + *
    + *    if (obj->key != key) { // not the object we expected
    + *      put_ref(obj);
    + *      goto again;
    + *    }
    + *  }
    + *  rcu_read_unlock();
    + */
    #define SLAB_DESTROY_BY_RCU 0x00080000UL /* Defer freeing slabs to RCU */
    #define SLAB_MEM_SPREAD 0x00100000UL /* Spread some memory over cpuset */
    #define SLAB_TRACE 0x00200000UL /* Trace allocations and frees */
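
The validation pass in the comment above can be tried out in user space. The following is a minimal sketch under stated assumptions: struct obj, the fixed-size table and lookup_validated() are made up for illustration, C11 atomics stand in for the kernel's reference counting, and the RCU read-side lock is left out because nothing is freed concurrently in this single-threaded model. Unlike the comment's pseudocode, the sketch gives up (returns NULL) instead of retrying with "goto again", so that a stale table entry cannot make it loop forever.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct obj {
    atomic_int refs;   /* 0 means the object is currently free */
    int key;           /* identity the object was allocated for */
};

#define NSLOTS 4
static struct obj *table[NSLOTS];

/* Stand-in for the lockless lookup; may hand back a stale or recycled object. */
static struct obj *lockless_lookup(int key)
{
    return table[(unsigned)key % NSLOTS];
}

/* Take a reference unless the count has already dropped to zero. */
static int try_get_ref(struct obj *o)
{
    int old = atomic_load(&o->refs);

    while (old != 0) {
        /* On failure, old is reloaded with the current count. */
        if (atomic_compare_exchange_weak(&o->refs, &old, old + 1))
            return 1;
    }
    return 0;
}

static void put_ref(struct obj *o)
{
    atomic_fetch_sub(&o->refs, 1);
}

/* Lookup followed by the independent validation pass. */
static struct obj *lookup_validated(int key)
{
    struct obj *o = lockless_lookup(key);

    if (!o)
        return NULL;
    if (!try_get_ref(o))        /* object is free right now */
        return NULL;
    if (o->key != key) {        /* memory was reused for another key */
        put_ref(o);
        return NULL;
    }
    return o;                   /* caller now holds a reference */
}

/* Returns 1 if all three cases behave as described. */
static int validation_demo(void)
{
    static struct obj a;
    int ok;

    atomic_init(&a.refs, 1);
    a.key = 1;
    table[1] = &a;

    ok = (lookup_validated(1) == &a) && (atomic_load(&a.refs) == 2);

    a.key = 5;                  /* recycled for key 5, stale pointer remains */
    ok = ok && (lookup_validated(1) == NULL) && (atomic_load(&a.refs) == 2);

    atomic_store(&a.refs, 0);   /* object is free */
    ok = ok && (lookup_validated(1) == NULL);
    return ok;
}
```

The order of the two checks matters: the key comparison is only meaningful once a reference is held, because before that point the object could be recycled between the comparison and its use.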



  16. Re: kmemcheck caught read from freed memory (cfq_free_io_context)

    On Wed, Apr 02 2008, Peter Zijlstra wrote:
    > On Wed, 2008-04-02 at 13:53 +0200, Jens Axboe wrote:
    >
    > > > Yeah, SLAB_DESTROY_BY_RCU should have a _HUGE_ comment explaining it,
    > > > I'm sure this is not the first (nor the last) time people get that
    > > > wrong.

    > >
    > > It should, SLAB_DESTROY_BY_RCU is definitely useful, but it is expected
    > > to be an 'easier' way of doing the call_rcu() manually. So it definitely
    > > needs more documentation.
    > >

    >
    > Ok I gave it a go, how bad is this text?


    I think it looks good. The key point is this:

    "This delays freeing the SLAB page by a grace period, it does _NOT_ delay
    object freeing."

    which is right in the front of the text and with sample validation
    below. So you can add my acked-by to that, if you want.

    --
    Jens Axboe


  17. Re: kmemcheck caught read from freed memory (cfq_free_io_context)

    On Wed, 2008-04-02 at 13:53 +0200, Jens Axboe wrote:
    > On Wed, Apr 02 2008, Peter Zijlstra wrote:
    > > On Wed, 2008-04-02 at 13:42 +0200, Jens Axboe wrote:
    > > > On Wed, Apr 02 2008, Peter Zijlstra wrote:
    > > > > On Wed, 2008-04-02 at 13:32 +0200, Jens Axboe wrote:
    > > > > > On Wed, Apr 02 2008, Peter Zijlstra wrote:
    > > > > > > On Wed, 2008-04-02 at 13:14 +0200, Jens Axboe wrote:
    > > > > > > > On Wed, Apr 02 2008, Peter Zijlstra wrote:
    > > > > > > > > On Wed, 2008-04-02 at 13:07 +0200, Jens Axboe wrote:
    > > > > > > > > > On Wed, Apr 02 2008, Pekka Enberg wrote:
    > > > > > > > > > > Hi Paul,
    > > > > > > > > > >
    > > > > > > > > > > On Wed, Apr 2, 2008 at 1:55 PM, Paul E. McKenney
    > > > > > > > > > > wrote:
    > > > > > > > > > > > I will check this when I get back to some bandwidth -- but in the meantime,
    > > > > > > > > > > > does kmemcheck special-case SLAB_DESTROY_BY_RCU? It is legal to access
    > > > > > > > > > > > newly-freed items in that case, as long as you did rcu_read_lock()
    > > > > > > > > > > > before gaining a reference to them and don't hold the reference past
    > > > > > > > > > > > the matching rcu_read_unlock().
    > > > > > > > > > >
    > > > > > > > > > > No, kmemcheck is work in progress and does not know about
    > > > > > > > > > > SLAB_DESTROY_BY_RCU yet. The reason I asked Vegard to post the warning
    > > > > > > > > > > was because Peter, Vegard, and myself identified this particular
    > > > > > > > > > > warning as a real problem. But yeah, kmemcheck can cause false
    > > > > > > > > > > positives for RCU for now.
    > > > > > > > > >
    > > > > > > > > > Makes sense, and to me Paul's analysis of the code looks totally correct
    > > > > > > > > > - there's no bug there, at least related to hlist traversal and
    > > > > > > > > > kmem_cache_free(), since we are under rcu_read_lock() and thus hold off
    > > > > > > > > > the grace period for freeing.
    > > > > > > > >
    > > > > > > > > but what holds off the slab allocator re-issuing that same object and
    > > > > > > > > someone else writing other stuff into it?
    > > > > > > >
    > > > > > > > Nothing, that's how rcu destroy works here. But for the validation to be
    > > > > > > > WRONG radix_tree_lookup(..., old_key) must return cic for new_key, not
    > > > > > > > NULL.
    > > > > > > >
    > > > > > >
    > > > > > >
    > > > > > >      A                               B                   C
    > > > > > >
    > > > > > > cfq_cic_lookup(cfqd_1, ioc)
    > > > > > >
    > > > > > >   rcu_read_lock()
    > > > > > >   cic = radix_tree_lookup(, cfqd_q);
    > > > > > >
    > > > > > >                                 cfq_cic_free()
    > > > > > >
    > > > > > >                                                     cfq_cic_link(cfqd_2, ioc,)
    > > > > > >
    > > > > > >   rcu_read_unlock()
    > > > > > >
    > > > > > >
    > > > > > > and now we have that:
    > > > > > >
    > > > > > > cic->key == cfqd_2
    > > > > > >
    > > > > > >
    > > > > > > I'm not seeing anything stopping this from happening.
    > > > > >
    > > > > > I don't follow your A-B-C here, what do they refer to?
    > > > >
    > > > > A does a radix_tree_lookup() of cfqd_1 (darn typos)
    > > > > B does a kfree of the same cic found by A
    > > > > C does an alloc and gets the same cic as freed by B and inserts it
    > > > > in a different location.
    > > > >
    > > > > So that when we return to A, cic->key == cfqd_2 even though we did a
    > > > > lookup for cfqd_1.
    > > >
    > > > That I follow, my question was if A, B, and C refer to different
    > > > processes but with a shared io context? I'm assuming that is correct...

    > >
    > > Ah, yeah, whatever is needed to make this race happen :-)

    >
    > The only place where you'll have multiple processes involved with this
    > at all is if they share io contexts. That is also why the bug isn't that
    > critical, since it's not possible right now (CLONE_IO flag must be
    > used).


    There are 3 races here:

    1) A continues with another object than intended
    (requires CLONE_IO)

    2) A does hlist_for_each_rcu() and races with B,C so that
    we continue the iteration on a possibly unrelated list.

    3) cic is freed after the !cic->key check.

    I'm not familiar enough with the code yet to see if 3 really is a
    possibility. But from what I can see there is nothing guarding its
    existence.
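
Race 1 can be replayed deterministically in user space. The sketch below assumes a toy LIFO free list (toy_alloc() and toy_free() are hypothetical and far simpler than the real slab allocator) and runs the steps of A, B and C in a fixed order on one thread, so it shows only the address reuse, not the actual concurrency.

```c
#include <assert.h>
#include <stddef.h>

struct toy_cic {
    const void *key;            /* which queue this context belongs to */
    struct toy_cic *next_free;  /* free-list link */
};

static struct toy_cic pool[2];
static struct toy_cic *free_list;

static void toy_cache_init(void)
{
    pool[0].next_free = &pool[1];
    pool[1].next_free = NULL;
    free_list = &pool[0];
}

/* Pop the most recently freed object and stamp it with the new key. */
static struct toy_cic *toy_alloc(const void *key)
{
    struct toy_cic *cic = free_list;

    if (cic) {
        free_list = cic->next_free;
        cic->key = key;
    }
    return cic;
}

/* Push the object back; it may be handed out again immediately. */
static void toy_free(struct toy_cic *cic)
{
    cic->next_free = free_list;
    free_list = cic;
}

/* Returns 1 if A's pointer ends up describing cfqd_2 instead of cfqd_1. */
static int replay_race(void)
{
    static const int cfqd_1 = 1, cfqd_2 = 2;
    struct toy_cic *installed, *seen_by_a, *reused;

    toy_cache_init();
    installed = toy_alloc(&cfqd_1);

    seen_by_a = installed;          /* A: lookup for cfqd_1 finds the cic */
    toy_free(installed);            /* B: frees that same cic */
    reused = toy_alloc(&cfqd_2);    /* C: allocates and gets the same memory */

    /* Back in A: same address, but the key no longer matches the lookup. */
    return reused == seen_by_a && seen_by_a->key == &cfqd_2;
}
```

In this model replay_race() returns 1: the pointer A holds is still valid memory, yet its key is now cfqd_2, which is why a key recheck (or deferring the free itself) is needed.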


  18. Re: kmemcheck caught read from freed memory (cfq_free_io_context)

    On Wed, Apr 02 2008, Jens Axboe wrote:
    > On Wed, Apr 02 2008, Fabio Checconi wrote:
    > > > From: Peter Zijlstra
    > > > Date: Wed, Apr 02, 2008 12:59:21PM +0200
    > > >
    > > > On Wed, 2008-04-02 at 03:55 -0700, Paul E. McKenney wrote:
    > > > > On Wed, Apr 02, 2008 at 09:28:46AM +0200, Ingo Molnar wrote:
    > > > > >
    > > > > > * Jens Axboe wrote:
    > > > > >
    > > > > > > On Wed, Apr 02 2008, Pekka J Enberg wrote:
    > > > > > > > On Wed, 2 Apr 2008, Jens Axboe wrote:
    > > > > > > > > Good catch, I wonder why it didn't complain in my testing. I've added a
    > > > > > > > > patch to fix that, please see it here:
    > > > > > > >
    > > > > > > > You probably don't have kmemcheck in your kernel ;-)
    > > > > > >
    > > > > > > Ehm no, you are right
    > > > > >
    > > > > > ... and you can get kmemcheck by testing on x86.git/latest:
    > > > > >
    > > > > > http://people.redhat.com/mingo/x86.git/README
    > > > > >
    > > > > > ;-)
    > > > >
    > > > > I will check this when I get back to some bandwidth -- but in the meantime,
    > > > > does kmemcheck special-case SLAB_DESTROY_BY_RCU? It is legal to access
    > > > > newly-freed items in that case, as long as you did rcu_read_lock()
    > > > > before gaining a reference to them and don't hold the reference past
    > > > > the matching rcu_read_unlock().
    > > >
    > > > I don't think it does.
    > > >
    > > > It would have to register a call_rcu callback itself in order to mark
    > > > it freed - and handle the race with the object being handed out again.
    > > >

    > >
    > > I had the same problem while debugging a cfq-derived i/o scheduler,
    > > and I found nothing preventing the reuse of the freed memory.
    > > The patch below seemed to fix the logic.

    >
    > Thanks, from a first look this looks like it'll fix this bad rcu slab
    > usage. I'll give it some closer scrutiny and testing.


    (CC reinstated, sometimes mutt is really annoying and drops the person
    you are replying to :-(

    Looks good and tests fine as well. I've applied it, on top of the
    hlist_for_each_entry_safe_rcu() fix.

    http://git.kernel.dk/?p=linux-2.6-bl...6312545f126661

    --
    Jens Axboe


  19. Re: kmemcheck caught read from freed memory (cfq_free_io_context)

    On Wed, Apr 02 2008, Peter Zijlstra wrote:
    > On Wed, 2008-04-02 at 13:53 +0200, Jens Axboe wrote:
    > > On Wed, Apr 02 2008, Peter Zijlstra wrote:
    > > > On Wed, 2008-04-02 at 13:42 +0200, Jens Axboe wrote:
    > > > > On Wed, Apr 02 2008, Peter Zijlstra wrote:
    > > > > > On Wed, 2008-04-02 at 13:32 +0200, Jens Axboe wrote:
    > > > > > > On Wed, Apr 02 2008, Peter Zijlstra wrote:
    > > > > > > > On Wed, 2008-04-02 at 13:14 +0200, Jens Axboe wrote:
    > > > > > > > > On Wed, Apr 02 2008, Peter Zijlstra wrote:
    > > > > > > > > > On Wed, 2008-04-02 at 13:07 +0200, Jens Axboe wrote:
    > > > > > > > > > > On Wed, Apr 02 2008, Pekka Enberg wrote:
    > > > > > > > > > > > Hi Paul,
    > > > > > > > > > > >
    > > > > > > > > > > > On Wed, Apr 2, 2008 at 1:55 PM, Paul E. McKenney
    > > > > > > > > > > > wrote:
    > > > > > > > > > > > > I will check this when I get back to some bandwidth -- but in the meantime,
    > > > > > > > > > > > > does kmemcheck special-case SLAB_DESTROY_BY_RCU? It is legal to access
    > > > > > > > > > > > > newly-freed items in that case, as long as you did rcu_read_lock()
    > > > > > > > > > > > > before gaining a reference to them and don't hold the reference past
    > > > > > > > > > > > > the matching rcu_read_unlock().
    > > > > > > > > > > >
    > > > > > > > > > > > No, kmemcheck is work in progress and does not know about
    > > > > > > > > > > > SLAB_DESTROY_BY_RCU yet. The reason I asked Vegard to post the warning
    > > > > > > > > > > > was because Peter, Vegard, and myself identified this particular
    > > > > > > > > > > > warning as a real problem. But yeah, kmemcheck can cause false
    > > > > > > > > > > > positives for RCU for now.
    > > > > > > > > > >
    > > > > > > > > > > Makes sense, and to me Paul's analysis of the code looks totally correct
    > > > > > > > > > > - there's no bug there, at least related to hlist traversal and
    > > > > > > > > > > kmem_cache_free(), since we are under rcu_read_lock() and thus hold off
    > > > > > > > > > > the grace period for freeing.
    > > > > > > > > >
    > > > > > > > > > but what holds off the slab allocator re-issuing that same object and
    > > > > > > > > > someone else writing other stuff into it?
    > > > > > > > >
    > > > > > > > > Nothing, that's how rcu destroy works here. But for the validation to be
    > > > > > > > > WRONG radix_tree_lookup(..., old_key) must return cic for new_key, not
    > > > > > > > > NULL.
    > > > > > > > >
    > > > > > > >
    > > > > > > >
    > > > > > > >      A                               B                   C
    > > > > > > >
    > > > > > > > cfq_cic_lookup(cfqd_1, ioc)
    > > > > > > >
    > > > > > > >   rcu_read_lock()
    > > > > > > >   cic = radix_tree_lookup(, cfqd_q);
    > > > > > > >
    > > > > > > >                                 cfq_cic_free()
    > > > > > > >
    > > > > > > >                                                     cfq_cic_link(cfqd_2, ioc,)
    > > > > > > >
    > > > > > > >   rcu_read_unlock()
    > > > > > > >
    > > > > > > >
    > > > > > > > and now we have that:
    > > > > > > >
    > > > > > > > cic->key == cfqd_2
    > > > > > > >
    > > > > > > >
    > > > > > > > I'm not seeing anything stopping this from happening.
    > > > > > >
    > > > > > > I don't follow your A-B-C here, what do they refer to?
    > > > > >
    > > > > > A does a radix_tree_lookup() of cfqd_1 (darn typos)
    > > > > > B does a kfree of the same cic found by A
    > > > > > C does an alloc and gets the same cic as freed by B and inserts it
    > > > > > in a different location.
    > > > > >
    > > > > > So that when we return to A, cic->key == cfqd_2 even though we did a
    > > > > > lookup for cfqd_1.
    > > > >
    > > > > That I follow, my question was if A, B, and C refer to different
    > > > > processes but with a shared io context? I'm assuming that is correct...
    > > >
    > > > Ah, yeah, whatever is needed to make this race happen :-)

    > >
    > > The only place where you'll have multiple processes involved with this
    > > at all is if they share io contexts. That is also why the bug isn't that
    > > critical, since it's not possible right now (CLONE_IO flag must be
    > > used).

    >
    > There are 3 races here:
    >
    > 1) A continues with another object than intended
    > (requires CLONE_IO)
    >
    > 2) A does hlist_for_each_rcu() and races with B,C so that
    > we continue the iteration on a possibly unrelated list.
    >
    > 3) cic is freed after the !cic->key check.
    >
    > I'm not familiar enough with the code yet to see if 3 really is a
    > possibility. But from what I can see there is nothing guarding its
    > existence.


    All 3 require CLONE_IO, because if that is not set, there's a 1:1
    mapping between a process and io context (no sharing occurs).

    --
    Jens Axboe


  20. Re: kmemcheck caught read from freed memory (cfq_free_io_context)

    On Wed, Apr 02 2008, Jens Axboe wrote:
    > On Wed, Apr 02 2008, Jens Axboe wrote:
    > > On Wed, Apr 02 2008, Fabio Checconi wrote:
    > > > > From: Peter Zijlstra
    > > > > Date: Wed, Apr 02, 2008 12:59:21PM +0200
    > > > >
    > > > > On Wed, 2008-04-02 at 03:55 -0700, Paul E. McKenney wrote:
    > > > > > On Wed, Apr 02, 2008 at 09:28:46AM +0200, Ingo Molnar wrote:
    > > > > > >
    > > > > > > * Jens Axboe wrote:
    > > > > > >
    > > > > > > > On Wed, Apr 02 2008, Pekka J Enberg wrote:
    > > > > > > > > On Wed, 2 Apr 2008, Jens Axboe wrote:
    > > > > > > > > > Good catch, I wonder why it didn't complain in my testing. I've added a
    > > > > > > > > > patch to fix that, please see it here:
    > > > > > > > >
    > > > > > > > > You probably don't have kmemcheck in your kernel ;-)
    > > > > > > >
    > > > > > > > Ehm no, you are right
    > > > > > >
    > > > > > > ... and you can get kmemcheck by testing on x86.git/latest:
    > > > > > >
    > > > > > > http://people.redhat.com/mingo/x86.git/README
    > > > > > >
    > > > > > > ;-)
    > > > > >
    > > > > > I will check this when I get back to some bandwidth -- but in the meantime,
    > > > > > does kmemcheck special-case SLAB_DESTROY_BY_RCU? It is legal to access
    > > > > > newly-freed items in that case, as long as you did rcu_read_lock()
    > > > > > before gaining a reference to them and don't hold the reference past
    > > > > > the matching rcu_read_unlock().
    > > > >
    > > > > I don't think it does.
    > > > >
    > > > > It would have to register a call_rcu callback itself in order to mark
    > > > > it freed - and handle the race with the object being handed out again.
    > > > >
    > > >
    > > > I had the same problem while debugging a cfq-derived i/o scheduler,
    > > > and I found nothing preventing the reuse of the freed memory.
    > > > The patch below seemed to fix the logic.

    > >
    > > Thanks, from a first look this looks like it'll fix this bad rcu slab
    > > usage. I'll give it some closer scrutiny and testing.

    >
    > (CC reinstated, sometimes mutt is really annoying and drops the person
    > you are replying to :-(


    (for real...)

    >
    > Looks good and tests fine as well. I've applied it, on top of the
    > hlist_for_each_entry_safe_rcu() fix.
    >
    > http://git.kernel.dk/?p=linux-2.6-bl...6312545f126661
    >
    > --
    > Jens Axboe
    >


    --
    Jens Axboe

