__vm_enough_memory(), OVERCOMMIT_NEVER, current->mm, kernel thread - Kernel

This is a discussion on __vm_enough_memory(), OVERCOMMIT_NEVER, current->mm, kernel thread - Kernel ; Hello all, When sysctl_overcommit_memory is set OVERCOMMIT_NEVER, __vm_enough_memory() refers current->mm. For example, # exportfs -i -o ... localhost:/tmpfs # mkdir /tmp/w # mount -o ... localhost:/tmpfs /tmp/w # yes > /tmp/w/fileA In this case, NFSD (kernel thread) calls shmem_file_write() or ...

+ Reply to Thread
Results 1 to 8 of 8

Thread: __vm_enough_memory(), OVERCOMMIT_NEVER, current->mm, kernel thread

  1. __vm_enough_memory(), OVERCOMMIT_NEVER, current->mm, kernel thread


    Hello all,

    When sysctl_overcommit_memory is set OVERCOMMIT_NEVER,
    __vm_enough_memory() refers current->mm.

    For example,
    # exportfs -i -o ... localhost:/tmpfs
    # mkdir /tmp/w
    # mount -o ... localhost:/tmpfs /tmp/w
    # yes > /tmp/w/fileA

    In this case, NFSD (kernel thread) calls shmem_file_write() or
    shmem_write_begin() and __vm_enough_memory() is called. But current->mm
    is NULL and the kernel crashes.
    If a user have to set OVERCOMMIT_NEVER, where should we fix?


    Junjiro R. Okajima
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: __vm_enough_memory(), OVERCOMMIT_NEVER, current->mm, kernel thread

    > In this case, NFSD (kernel thread) calls shmem_file_write() or
    > shmem_write_begin() and __vm_enough_memory() is called. But current->mm
    > is NULL and the kernel crashes.
    > If a user have to set OVERCOMMIT_NEVER, where should we fix?


    Calling into the file system code assuming that current->mm is
    NULL isn't safe and hasn't been for a very long time since someone added
    the 3% hack.

    The shmem case is actually a bit special so my thoughts are:

    Make security_vm_enough_memory() WARN() if current->mm = NULL
    Make security_vm_enough_memory_mm() WARN() if the passed mm = NULL
    Add security_vm_enough_memory_fs() which does not do the warning test

    All would still call security->ops->vm_enough_memory and then
    __vm_enough_memory() would skip the 3% adjustment when the passed mm was
    NULL

    Does that sound sensible ?

    Alan
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  3. Re: __vm_enough_memory(), OVERCOMMIT_NEVER, current->mm, kernel thread


    > I guess
    > - people don't care overcommit and leave it as default, so they don't
    > meet the problem
    > - people who cares overcommit has rich memory, and they don't meet the
    > problem too.


    Correction: the problem is unrelated to the memory size.
    I was sleepy when I wrote that.
    Regardless the memory size, when nfsd writes to tmpfs the system surely
    crashes.


    Junjiro R. Okajima
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  4. Re: __vm_enough_memory(), OVERCOMMIT_NEVER, current->mm, kernel thread

    > In your first option, write() to the exported tmpfs will produce the
    > warning on nfs server even if much memory is left. I don't think it is a


    No - the shmem routine would use vm_enough_memory_fs which wouldn't care
    if current->mm is NULL, but the others would check.

    > good idea.
    > I'd suggest to make __vm_enough_memory() would skip the 3% adjustment
    > only.
    >
    > --- /src/linux-2.6/linux-2.6.27/mm/mmap.c 2008-10-10 07:13:53.000000000 +0900
    > +++ /tmp/mmap.c 2008-10-22 08:07:09.000000000 +0900
    > @@ -173,9 +173,10 @@
    > allowed -= allowed / 32;
    > allowed += total_swap_pages;
    >
    > - /* Don't let a single process grow too big:
    > + /* Don't let a single user process grow too big:
    > leave 3% of the size of this process for other processes */
    > - allowed -= mm->total_vm / 32;
    > + if (mm)
    > + allowed -= mm->total_vm / 32;


    Doing this means we lose the ability to trap incorrect calls to
    vm_enough_memory.

    Alan
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  5. Re: __vm_enough_memory(), OVERCOMMIT_NEVER, current->mm, kernel thread


    Alan Cox:
    > No - the shmem routine would use vm_enough_memory_fs which wouldn't care
    > if current->mm is NULL, but the others would check.


    Unfortunately I cannot imagine what new vm_enough_memory_fs() will be.
    Would you show me its draft or pseudo code?


    Junjiro R. Okajima
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  6. Re: __vm_enough_memory(), OVERCOMMIT_NEVER, current->mm, kernel thread

    On Wed, 22 Oct 2008 20:26:19 +0900
    hooanon05@yahoo.co.jp wrote:

    >
    > Alan Cox:
    > > No - the shmem routine would use vm_enough_memory_fs which wouldn't care
    > > if current->mm is NULL, but the others would check.

    >
    > Unfortunately I cannot imagine what new vm_enough_memory_fs() will be.
    > Would you show me its draft or pseudo code?


    This is the patch I propose:

    nfsd: Fix vm overcommit crash

    From: Alan Cox

    Junjiro R. Okajima reported a problem where knfsd crashes if you are using
    it to export shmemfs objects and run strict overcommit. In this situation
    the current->mm based modifier to the overcommit goes through a NULL
    pointer.

    We could simply check for NULL and skip the modifier but we've caught other
    real bugs in the past from mm being NULL here - cases where we did need a
    valid mm set up (eg the exec bug about a year ago).

    To preserve the checks and get the logic we want shuffle the checking
    around and add a new helper to the vm_ security wrappers

    Also fix a current->mm reference in nommu that should use the passed mm
    ---

    include/linux/security.h | 1 +
    mm/mmap.c | 3 ++-
    mm/nommu.c | 3 ++-
    mm/shmem.c | 4 ++--
    security/security.c | 9 +++++++++
    5 files changed, 16 insertions(+), 4 deletions(-)


    diff --git a/include/linux/security.h b/include/linux/security.h
    index f5c4a51..a2b8430 100644
    --- a/include/linux/security.h
    +++ b/include/linux/security.h
    @@ -1585,6 +1585,7 @@ int security_syslog(int type);
    int security_settime(struct timespec *ts, struct timezone *tz);
    int security_vm_enough_memory(long pages);
    int security_vm_enough_memory_mm(struct mm_struct *mm, long pages);
    +int security_vm_enough_memory_kern(long pages);
    int security_bprm_alloc(struct linux_binprm *bprm);
    void security_bprm_free(struct linux_binprm *bprm);
    void security_bprm_apply_creds(struct linux_binprm *bprm, int unsafe);
    diff --git a/mm/mmap.c b/mm/mmap.c
    index 74f4d15..de14ac2 100644
    --- a/mm/mmap.c
    +++ b/mm/mmap.c
    @@ -175,7 +175,8 @@ int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin)

    /* Don't let a single process grow too big:
    leave 3% of the size of this process for other processes */
    - allowed -= mm->total_vm / 32;
    + if (mm)
    + allowed -= mm->total_vm / 32;

    /*
    * cast `allowed' as a signed long because vm_committed_space
    diff --git a/mm/nommu.c b/mm/nommu.c
    index 2696b24..7695dc8 100644
    --- a/mm/nommu.c
    +++ b/mm/nommu.c
    @@ -1454,7 +1454,8 @@ int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin)

    /* Don't let a single process grow too big:
    leave 3% of the size of this process for other processes */
    - allowed -= current->mm->total_vm / 32;
    + if (mm)
    + allowed -= mm->total_vm / 32;

    /*
    * cast `allowed' as a signed long because vm_committed_space
    diff --git a/mm/shmem.c b/mm/shmem.c
    index d38d7e6..1677b3e 100644
    --- a/mm/shmem.c
    +++ b/mm/shmem.c
    @@ -162,7 +162,7 @@ static inline struct shmem_sb_info *SHMEM_SB(struct super_block *sb)
    static inline int shmem_acct_size(unsigned long flags, loff_t size)
    {
    return (flags & VM_ACCOUNT)?
    - security_vm_enough_memory(VM_ACCT(size)): 0;
    + security_vm_enough_memory_kern(VM_ACCT(size)): 0;
    }

    static inline void shmem_unacct_size(unsigned long flags, loff_t size)
    @@ -180,7 +180,7 @@ static inline void shmem_unacct_size(unsigned long flags, loff_t size)
    static inline int shmem_acct_block(unsigned long flags)
    {
    return (flags & VM_ACCOUNT)?
    - 0: security_vm_enough_memory(VM_ACCT(PAGE_CACHE_SIZE) );
    + 0: security_vm_enough_memory_kern(VM_ACCT(PAGE_CACHE_ SIZE));
    }

    static inline void shmem_unacct_blocks(unsigned long flags, long pages)
    diff --git a/security/security.c b/security/security.c
    index 255b085..c0acfa7 100644
    --- a/security/security.c
    +++ b/security/security.c
    @@ -198,14 +198,23 @@ int security_settime(struct timespec *ts, struct timezone *tz)

    int security_vm_enough_memory(long pages)
    {
    + WARN_ON(current->mm == NULL);
    return security_ops->vm_enough_memory(current->mm, pages);
    }

    int security_vm_enough_memory_mm(struct mm_struct *mm, long pages)
    {
    + WARN_ON(mm == NULL);
    return security_ops->vm_enough_memory(mm, pages);
    }

    +int security_vm_enough_memory_kern(long pages)
    +{
    + /* If current->mm is a kernel thread then we will pass NULL,
    + for this specific case that is fine */
    + return security_ops->vm_enough_memory(current->mm, pages);
    +}
    +
    int security_bprm_alloc(struct linux_binprm *bprm)
    {
    return security_ops->bprm_alloc_security(bprm);
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  7. Re: __vm_enough_memory(), OVERCOMMIT_NEVER, current->mm, kernel thread


    Alan Cox:
    > This is the patch I propose:

    :::
    > We could simply check for NULL and skip the modifier but we've caught other
    > real bugs in the past from mm being NULL here - cases where we did need a
    > valid mm set up (eg the exec bug about a year ago).


    OK.
    I didn't know there was an actual bug but now I understand why you want
    to insert WARN_ON().

    I have nothing to say about your patch.


    Thank you
    Junjiro R. Okajima
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  8. Re: __vm_enough_memory(), OVERCOMMIT_NEVER, current->mm, kernel thread

    On Wed, 22 Oct 2008, Alan Cox wrote:

    [Adding lsm list to the cc]

    > On Wed, 22 Oct 2008 20:26:19 +0900
    > hooanon05@yahoo.co.jp wrote:
    >
    > >
    > > Alan Cox:
    > > > No - the shmem routine would use vm_enough_memory_fs which wouldn't care
    > > > if current->mm is NULL, but the others would check.

    > >
    > > Unfortunately I cannot imagine what new vm_enough_memory_fs() will be.
    > > Would you show me its draft or pseudo code?

    >
    > This is the patch I propose:
    >
    > nfsd: Fix vm overcommit crash
    >
    > From: Alan Cox
    >
    > Junjiro R. Okajima reported a problem where knfsd crashes if you are using
    > it to export shmemfs objects and run strict overcommit. In this situation
    > the current->mm based modifier to the overcommit goes through a NULL
    > pointer.
    >
    > We could simply check for NULL and skip the modifier but we've caught other
    > real bugs in the past from mm being NULL here - cases where we did need a
    > valid mm set up (eg the exec bug about a year ago).
    >
    > To preserve the checks and get the logic we want shuffle the checking
    > around and add a new helper to the vm_ security wrappers
    >
    > Also fix a current->mm reference in nommu that should use the passed mm


    Looks ok to me.

    Acked-by: James Morris


    > ---
    >
    > include/linux/security.h | 1 +
    > mm/mmap.c | 3 ++-
    > mm/nommu.c | 3 ++-
    > mm/shmem.c | 4 ++--
    > security/security.c | 9 +++++++++
    > 5 files changed, 16 insertions(+), 4 deletions(-)
    >
    >
    > diff --git a/include/linux/security.h b/include/linux/security.h
    > index f5c4a51..a2b8430 100644
    > --- a/include/linux/security.h
    > +++ b/include/linux/security.h
    > @@ -1585,6 +1585,7 @@ int security_syslog(int type);
    > int security_settime(struct timespec *ts, struct timezone *tz);
    > int security_vm_enough_memory(long pages);
    > int security_vm_enough_memory_mm(struct mm_struct *mm, long pages);
    > +int security_vm_enough_memory_kern(long pages);
    > int security_bprm_alloc(struct linux_binprm *bprm);
    > void security_bprm_free(struct linux_binprm *bprm);
    > void security_bprm_apply_creds(struct linux_binprm *bprm, int unsafe);
    > diff --git a/mm/mmap.c b/mm/mmap.c
    > index 74f4d15..de14ac2 100644
    > --- a/mm/mmap.c
    > +++ b/mm/mmap.c
    > @@ -175,7 +175,8 @@ int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin)
    >
    > /* Don't let a single process grow too big:
    > leave 3% of the size of this process for other processes */
    > - allowed -= mm->total_vm / 32;
    > + if (mm)
    > + allowed -= mm->total_vm / 32;
    >
    > /*
    > * cast `allowed' as a signed long because vm_committed_space
    > diff --git a/mm/nommu.c b/mm/nommu.c
    > index 2696b24..7695dc8 100644
    > --- a/mm/nommu.c
    > +++ b/mm/nommu.c
    > @@ -1454,7 +1454,8 @@ int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin)
    >
    > /* Don't let a single process grow too big:
    > leave 3% of the size of this process for other processes */
    > - allowed -= current->mm->total_vm / 32;
    > + if (mm)
    > + allowed -= mm->total_vm / 32;
    >
    > /*
    > * cast `allowed' as a signed long because vm_committed_space
    > diff --git a/mm/shmem.c b/mm/shmem.c
    > index d38d7e6..1677b3e 100644
    > --- a/mm/shmem.c
    > +++ b/mm/shmem.c
    > @@ -162,7 +162,7 @@ static inline struct shmem_sb_info *SHMEM_SB(struct super_block *sb)
    > static inline int shmem_acct_size(unsigned long flags, loff_t size)
    > {
    > return (flags & VM_ACCOUNT)?
    > - security_vm_enough_memory(VM_ACCT(size)): 0;
    > + security_vm_enough_memory_kern(VM_ACCT(size)): 0;
    > }
    >
    > static inline void shmem_unacct_size(unsigned long flags, loff_t size)
    > @@ -180,7 +180,7 @@ static inline void shmem_unacct_size(unsigned long flags, loff_t size)
    > static inline int shmem_acct_block(unsigned long flags)
    > {
    > return (flags & VM_ACCOUNT)?
    > - 0: security_vm_enough_memory(VM_ACCT(PAGE_CACHE_SIZE) );
    > + 0: security_vm_enough_memory_kern(VM_ACCT(PAGE_CACHE_ SIZE));
    > }
    >
    > static inline void shmem_unacct_blocks(unsigned long flags, long pages)
    > diff --git a/security/security.c b/security/security.c
    > index 255b085..c0acfa7 100644
    > --- a/security/security.c
    > +++ b/security/security.c
    > @@ -198,14 +198,23 @@ int security_settime(struct timespec *ts, struct timezone *tz)
    >
    > int security_vm_enough_memory(long pages)
    > {
    > + WARN_ON(current->mm == NULL);
    > return security_ops->vm_enough_memory(current->mm, pages);
    > }
    >
    > int security_vm_enough_memory_mm(struct mm_struct *mm, long pages)
    > {
    > + WARN_ON(mm == NULL);
    > return security_ops->vm_enough_memory(mm, pages);
    > }
    >
    > +int security_vm_enough_memory_kern(long pages)
    > +{
    > + /* If current->mm is a kernel thread then we will pass NULL,
    > + for this specific case that is fine */
    > + return security_ops->vm_enough_memory(current->mm, pages);
    > +}
    > +
    > int security_bprm_alloc(struct linux_binprm *bprm)
    > {
    > return security_ops->bprm_alloc_security(bprm);
    > --
    > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    > the body of a message to majordomo@vger.kernel.org
    > More majordomo info at http://vger.kernel.org/majordomo-info.html
    > Please read the FAQ at http://www.tux.org/lkml/
    >


    --
    James Morris

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

+ Reply to Thread