[PATCH 1/2] coredump_filter: add hugepage dumping v3 - Kernel


  1. [PATCH 1/2] coredump_filter: add hugepage dumping v3


    Changelog
    -----------------
    v2 -> v3
    - split the /proc/[pid]/coredump_filter hugetlb bits into
    shared and private mapping pages.
    - updated documentation

    v1 -> v2
    - updated documentation


    ============================================
    Subject: [PATCH v3] coredump_filter: add hugepage core dumping

    Currently, a hugepage vma has the VM_RESERVED flag set so that it is not
    swapped out. But VM_RESERVED vmas aren't core dumped, because this flag is
    often used for some kernel vmas (e.g. vmalloc, sound related).

    As a result, hugepages are never dumped and can't be debugged easily.
    Many developers want hugepages to be included in core dumps.

    However, we can't read VM_RESERVED areas in most cases, because they are
    often used for I/O mapping. Reading such areas may change device state,
    which is definitely an undesirable side effect.

    So it is better to add a hugepage-specific bit to the coredump filter.
    That enables hugepage core dumping without causing any side effects on
    I/O devices.
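    The resulting decision order can be modelled in user space. The sketch
    below is a simplified, hypothetical model of the hugetlb branch that this
    patch adds to vma_dump_size() in fs/binfmt_elf.c. Assumptions: the M_*
    flag values are invented for the model (they are not the kernel's VM_*
    constants), the filter argument uses the user-visible coredump_filter bit
    numbers (bit 5 = hugetlb private, bit 6 = hugetlb shared), and all rules
    after the VM_IO/VM_RESERVED test are collapsed into one result.

```c
#include <stdbool.h>

/* Hypothetical flag values for this model only; the real VM_* constants
 * are defined in include/linux/mm.h and have different values. */
#define M_ALWAYSDUMP (1u << 0)
#define M_HUGETLB    (1u << 1)
#define M_SHARED     (1u << 2)
#define M_IO         (1u << 3)
#define M_RESERVED   (1u << 4)

/* User-visible coredump_filter bits added by this patch. */
#define CDF_HUGETLB_PRIVATE (1ul << 5)
#define CDF_HUGETLB_SHARED  (1ul << 6)

enum dump_decision {
    DUMP_NOTHING,     /* vma_dump_size() would return 0 */
    DUMP_WHOLE,       /* vma_dump_size() would "goto whole" */
    DUMP_OTHER_RULES  /* falls through to the anon/file-backed rules */
};

static enum dump_decision model_vma_dump(unsigned vm_flags, unsigned long filter)
{
    bool shared = vm_flags & M_SHARED;

    if (vm_flags & M_ALWAYSDUMP)
        return DUMP_WHOLE;

    /* The hugetlb check runs before the VM_IO/VM_RESERVED test, so a
     * hugetlb vma (which carries VM_RESERVED) can still be dumped. */
    if (vm_flags & M_HUGETLB) {
        if (shared && (filter & CDF_HUGETLB_SHARED))
            return DUMP_WHOLE;
        if (!shared && (filter & CDF_HUGETLB_PRIVATE))
            return DUMP_WHOLE;
    }

    /* I/O and special mappings are still never read; a hugetlb vma whose
     * filter bit is clear is also excluded here through VM_RESERVED. */
    if (vm_flags & (M_IO | M_RESERVED))
        return DUMP_NOTHING;

    return DUMP_OTHER_RULES;
}
```

    With the new default filter 0x23, a private hugetlb vma is dumped whole
    while a shared one is still skipped; with 0x43 it is the other way round.
    A plain VM_IO vma is skipped regardless of the filter value.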

    In addition, libhugetlbfs uses hugetlb private mapping pages as anonymous
    pages, so hugepage private mapping pages should be core dumped by default.


    So /proc/[pid]/coredump_filter gets two new bits:

    - bit 5 selects whether hugetlb private mapping pages are dumped (default: yes)
    - bit 6 selects whether hugetlb shared mapping pages are dumped (default: no)
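    To check which memory types a given coredump_filter value selects, bits
    0-6 (as listed in the proc.txt hunk of this patch) can be decoded with a
    small helper. This is a minimal user-space sketch; the function name and
    the short type names are illustrative, not part of the kernel.

```c
#include <string.h>

/* Memory type selected by each coredump_filter bit (bits 0-6),
 * as listed in Documentation/filesystems/proc.txt after this patch. */
static const char *const cdf_names[] = {
    "anonymous private",
    "anonymous shared",
    "file-backed private",
    "file-backed shared",
    "ELF headers",
    "hugetlb private",
    "hugetlb shared",
};

/* Write a comma-separated list of the types enabled in `mask` into buf
 * (always NUL-terminated when len > 0) and return buf. */
static char *cdf_describe(unsigned long mask, char *buf, size_t len)
{
    size_t n = sizeof(cdf_names) / sizeof(cdf_names[0]);

    if (len == 0)
        return buf;
    buf[0] = '\0';
    for (size_t bit = 0; bit < n; bit++) {
        if (!(mask & (1ul << bit)))
            continue;
        if (buf[0] != '\0')
            strncat(buf, ", ", len - strlen(buf) - 1);
        strncat(buf, cdf_names[bit], len - strlen(buf) - 1);
    }
    return buf;
}
```

    For example, the 0x43 used in the test procedure decodes to anonymous
    private, anonymous shared and hugetlb shared, while the new default 0x23
    decodes to anonymous private, anonymous shared and hugetlb private.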


    I tested it with the following steps:

    % ulimit -c unlimited
    % ./crash_hugepage 50
    % ./crash_hugepage 50 -p
    % ls -lh
    % gdb ./crash_hugepage core
    %
    % echo 0x43 > /proc/self/coredump_filter
    % ./crash_hugepage 50
    % ./crash_hugepage 50 -p
    % ls -lh
    % gdb ./crash_hugepage core

    crash_hugepage.c
    ------------------------------------------------
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/mman.h>

    #include "hugetlbfs.h"  /* libhugetlbfs */

    int main(int argc, char **argv)
    {
        char *p;
        int ch;
        int mmap_flags = MAP_SHARED;
        int fd;
        int nr_pages;

        /* -p selects a private mapping instead of a shared one */
        while ((ch = getopt(argc, argv, "p")) != -1) {
            switch (ch) {
            case 'p':
                mmap_flags &= ~MAP_SHARED;
                mmap_flags |= MAP_PRIVATE;
                break;
            default:
                /* nothing */
                break;
            }
        }
        argc -= optind;
        argv += optind;

        if (argc == 0) {
            printf("need # of pages\n");
            exit(1);
        }

        nr_pages = atoi(argv[0]);
        if (nr_pages < 2) {
            printf("nr_pages must be >= 2\n");
            exit(1);
        }

        /* map nr_pages hugepages from an unlinked hugetlbfs file */
        fd = hugetlbfs_unlinked_fd();
        if (fd < 0) {
            perror("hugetlbfs_unlinked_fd");
            exit(1);
        }
        p = mmap(NULL, nr_pages * gethugepagesize(),
                 PROT_READ | PROT_WRITE, mmap_flags, fd, 0);
        if (p == MAP_FAILED) {
            perror("mmap");
            exit(1);
        }

        sleep(2);

        *(p + gethugepagesize()) = 1; /* COW */
        sleep(2);

        /* crash! */
        *(int *)0 = 1;

        return 0;
    }

    -----------------------------------------------------


    Signed-off-by: KOSAKI Motohiro
    CC: Kawai Hidehiro
    CC: Hugh Dickins
    CC: Adam Litke
    CC: Mel Gorman

    ---
    Documentation/filesystems/proc.txt | 15 ++++++++++-----
    fs/binfmt_elf.c | 12 ++++++++++--
    include/linux/sched.h | 7 +++++--
    3 files changed, 25 insertions(+), 9 deletions(-)

    Index: b/Documentation/filesystems/proc.txt
    ===================================================================
    --- a/Documentation/filesystems/proc.txt 2008-09-25 21:19:13.000000000 +0900
    +++ b/Documentation/filesystems/proc.txt 2008-09-25 21:21:05.000000000 +0900
    @@ -2408,24 +2408,29 @@ will be dumped when the process is
     of memory types. If a bit of the bitmask is set, memory segments of the
     corresponding memory type are dumped, otherwise they are not dumped.

    -The following 4 memory types are supported:
    +The following 7 memory types are supported:
       - (bit 0) anonymous private memory
       - (bit 1) anonymous shared memory
       - (bit 2) file-backed private memory
       - (bit 3) file-backed shared memory
       - (bit 4) ELF header pages in file-backed private memory areas (it is
                 effective only if the bit 2 is cleared)
    +  - (bit 5) hugetlb private memory
    +  - (bit 6) hugetlb shared memory

       Note that MMIO pages such as frame buffer are never dumped and vDSO pages
       are always dumped regardless of the bitmask status.

    -Default value of coredump_filter is 0x3; this means all anonymous memory
    -segments are dumped.
    +  Note bit 0-4 doesn't effect any hugetlb memory. hugetlb memory are only
    +  effected by bit 5-6.
    +
    +Default value of coredump_filter is 0x23; this means all anonymous memory
    +segments and hugetlb private memory are dumped.

     If you don't want to dump all shared memory segments attached to pid 1234,
    -write 1 to the process's proc file.
    +write 21 to the process's proc file.

    -  $ echo 0x1 > /proc/1234/coredump_filter
    +  $ echo 0x21 > /proc/1234/coredump_filter

     When a new process is created, the process inherits the bitmask status from its
     parent. It is useful to set up coredump_filter before the program runs.
    Index: b/fs/binfmt_elf.c
    ===================================================================
    --- a/fs/binfmt_elf.c 2008-09-25 21:18:58.000000000 +0900
    +++ b/fs/binfmt_elf.c 2008-09-25 21:19:16.000000000 +0900
    @@ -1156,16 +1156,24 @@ static int dump_seek(struct file *file,
     static unsigned long vma_dump_size(struct vm_area_struct *vma,
                                        unsigned long mm_flags)
     {
    +#define FILTER(type) (mm_flags & (1UL << MMF_DUMP_##type))
    +
             /* The vma can be set up to tell us the answer directly. */
             if (vma->vm_flags & VM_ALWAYSDUMP)
                     goto whole;

    +        /* Hugetlb memory check */
    +        if (vma->vm_flags & VM_HUGETLB) {
    +                if ((vma->vm_flags & VM_SHARED) && FILTER(HUGETLB_SHARED))
    +                        goto whole;
    +                if (!(vma->vm_flags & VM_SHARED) && FILTER(HUGETLB_PRIVATE))
    +                        goto whole;
    +        }
    +
             /* Do not dump I/O mapped devices or special mappings */
             if (vma->vm_flags & (VM_IO | VM_RESERVED))
                     return 0;

    -#define FILTER(type) (mm_flags & (1UL << MMF_DUMP_##type))
    -
             /* By default, dump shared memory if mapped from an anonymous file. */
             if (vma->vm_flags & VM_SHARED) {
                     if (vma->vm_file->f_path.dentry->d_inode->i_nlink == 0 ?
    Index: b/include/linux/sched.h
    ===================================================================
    --- a/include/linux/sched.h 2008-09-25 21:18:58.000000000 +0900
    +++ b/include/linux/sched.h 2008-09-25 21:19:16.000000000 +0900
    @@ -403,12 +403,15 @@ extern int get_dumpable(struct mm_struct
     #define MMF_DUMP_MAPPED_PRIVATE 4
     #define MMF_DUMP_MAPPED_SHARED 5
     #define MMF_DUMP_ELF_HEADERS 6
    +#define MMF_DUMP_HUGETLB_PRIVATE 7
    +#define MMF_DUMP_HUGETLB_SHARED 8
     #define MMF_DUMP_FILTER_SHIFT MMF_DUMPABLE_BITS
    -#define MMF_DUMP_FILTER_BITS 5
    +#define MMF_DUMP_FILTER_BITS 7
     #define MMF_DUMP_FILTER_MASK \
             (((1 << MMF_DUMP_FILTER_BITS) - 1) << MMF_DUMP_FILTER_SHIFT)
     #define MMF_DUMP_FILTER_DEFAULT \
    -        ((1 << MMF_DUMP_ANON_PRIVATE) | (1 << MMF_DUMP_ANON_SHARED))
    +        ((1 << MMF_DUMP_ANON_PRIVATE) | (1 << MMF_DUMP_ANON_SHARED) |\
    +         (1 << MMF_DUMP_HUGETLB_PRIVATE))

     struct sighand_struct {
             atomic_t count;
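
    The kernel keeps the filter in mm->flags shifted left by
    MMF_DUMP_FILTER_SHIFT, so the new default can be checked by hand. The
    sketch below redoes that arithmetic in user space. MMF_DUMP_HUGETLB_*,
    MMF_DUMP_FILTER_BITS and the MASK/DEFAULT expressions are copied from the
    hunk above; MMF_DUMPABLE_BITS = 2, MMF_DUMP_ANON_PRIVATE = 2 and
    MMF_DUMP_ANON_SHARED = 3 are not visible in the hunk and are assumed from
    the surrounding sched.h. The two conversion helpers are hypothetical.

```c
/* Values copied from, or assumed from, include/linux/sched.h of this era. */
#define MMF_DUMPABLE_BITS        2   /* assumed */
#define MMF_DUMP_ANON_PRIVATE    2   /* assumed */
#define MMF_DUMP_ANON_SHARED     3   /* assumed */
#define MMF_DUMP_HUGETLB_PRIVATE 7
#define MMF_DUMP_HUGETLB_SHARED  8

#define MMF_DUMP_FILTER_SHIFT    MMF_DUMPABLE_BITS
#define MMF_DUMP_FILTER_BITS     7
#define MMF_DUMP_FILTER_MASK \
    (((1 << MMF_DUMP_FILTER_BITS) - 1) << MMF_DUMP_FILTER_SHIFT)
#define MMF_DUMP_FILTER_DEFAULT \
    ((1 << MMF_DUMP_ANON_PRIVATE) | (1 << MMF_DUMP_ANON_SHARED) | \
     (1 << MMF_DUMP_HUGETLB_PRIVATE))

/* Hypothetical helper: mm->flags layout -> value seen in coredump_filter. */
static unsigned long filter_to_proc(unsigned long mm_flags)
{
    return (mm_flags & MMF_DUMP_FILTER_MASK) >> MMF_DUMP_FILTER_SHIFT;
}

/* Hypothetical helper: coredump_filter value -> mm->flags layout. */
static unsigned long proc_to_filter(unsigned long val)
{
    return (val << MMF_DUMP_FILTER_SHIFT) & MMF_DUMP_FILTER_MASK;
}
```

    The default works out to 0x8c in mm->flags, which reads back as 0x23.
    Widening MMF_DUMP_FILTER_BITS from 5 to 7 is what makes room for the new
    bits: the old mask, ((1 << 5) - 1) << 2 = 0x7c, would not cover mm->flags
    bits 7 and 8.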


    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. [PATCH 2/2] hugepage: support ZERO_PAGE()

    Changelog:
    v1 -> v3
    o Coding style fix (thanks Mel, Adam)


    ==========================================
    Subject: [PATCH v3] hugepage: support ZERO_PAGE()

    Currently, hugepages don't use the zero page at all, because the zero page
    is mostly used for core dumping and hugepages could not be core dumped
    until now.

    However, now that we have implemented hugepage core dumping, we should
    implement zero page support for hugepages as well.

    This patch does that.


    Implementation note:
    -------------------------------------------------------------
    o Why do we only check VM_SHARED for the zero page?
    For normal pages, the check is:

    static inline int use_zero_page(struct vm_area_struct *vma)
    {
            if (vma->vm_flags & (VM_LOCKED | VM_SHARED))
                    return 0;

            return !vma->vm_ops || !vma->vm_ops->fault;
    }

    First, hugepages are never mlock()ed, so we don't need to be concerned
    about VM_LOCKED.

    Second, hugetlbfs is a pseudo filesystem, not a real filesystem, and it
    doesn't have any file backing. Therefore, the ops->fault check is
    meaningless.


    o Why don't we use the zero page if !pte?

    !pte indicates that the {pud, pmd} doesn't exist or that some error
    happened. We shouldn't return the zero page if an error happened.
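
    The conditions above can be summarised as a truth table. The sketch below
    is a hypothetical user-space restatement of huge_zeropage_ok() from this
    patch; the page-table lookup is replaced by two booleans (whether a pte
    pointer was found, and whether that pte is empty). It is not kernel code.

```c
#include <stdbool.h>

/* Hypothetical model of huge_zeropage_ok():
 *   have_pte - huge_pte_offset() returned a non-NULL pointer
 *   pte_none - the huge pte it points to is empty (nothing faulted in yet)
 *   write    - the caller wants write access
 *   shared   - the vma has VM_SHARED set
 * The zero page may stand in only for an empty pte of a private mapping
 * that is being read. */
static bool model_huge_zeropage_ok(bool have_pte, bool pte_none,
                                   bool write, bool shared)
{
    if (!have_pte || write || shared)
        return false;   /* no page table, write access, or shared vma */
    return pte_none;
}
```

    Only the combination "pte found, pte empty, read access, private vma"
    selects ZERO_PAGE(0); in every other case the helper returns 0.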


    Signed-off-by: KOSAKI Motohiro
    CC: Adam Litke
    CC: Hugh Dickins
    CC: Kawai Hidehiro
    CC: Mel Gorman

    ---
    mm/hugetlb.c | 22 +++++++++++++++++++---
    1 file changed, 19 insertions(+), 3 deletions(-)

    Index: b/mm/hugetlb.c
    ===================================================================
    --- a/mm/hugetlb.c 2008-09-25 21:22:41.000000000 +0900
    +++ b/mm/hugetlb.c 2008-09-26 02:54:10.000000000 +0900
    @@ -2071,6 +2071,14 @@ follow_huge_pud(struct mm_struct *mm, un
             return NULL;
     }

    +static int huge_zeropage_ok(pte_t *ptep, int write, int shared)
    +{
    +        if (!ptep || write || shared)
    +                return 0;
    +        else
    +                return huge_pte_none(huge_ptep_get(ptep));
    +}
    +
     int follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
                             struct page **pages, struct vm_area_struct **vmas,
                             unsigned long *position, int *length, int i,
    @@ -2080,6 +2088,8 @@ int follow_hugetlb_page(struct mm_struct
             unsigned long vaddr = *position;
             int remainder = *length;
             struct hstate *h = hstate_vma(vma);
    +        int zeropage_ok = 0;
    +        int shared = vma->vm_flags & VM_SHARED;

             spin_lock(&mm->page_table_lock);
             while (vaddr < vma->vm_end && remainder) {
    @@ -2092,8 +2102,11 @@ int follow_hugetlb_page(struct mm_struct
                      * first, for the page indexing below to work.
                      */
                     pte = huge_pte_offset(mm, vaddr & huge_page_mask(h));
    +                if (huge_zeropage_ok(pte, write, shared))
    +                        zeropage_ok = 1;

    -                if (!pte || huge_pte_none(huge_ptep_get(pte)) ||
    +                if (!pte ||
    +                    (huge_pte_none(huge_ptep_get(pte)) && !zeropage_ok) ||
                         (write && !pte_write(huge_ptep_get(pte)))) {
                             int ret;

    @@ -2113,8 +2126,11 @@ int follow_hugetlb_page(struct mm_struct
                     page = pte_page(huge_ptep_get(pte));
     same_page:
                     if (pages) {
    -                        get_page(page);
    -                        pages[i] = page + pfn_offset;
    +                        if (zeropage_ok)
    +                                pages[i] = ZERO_PAGE(0);
    +                        else
    +                                pages[i] = page + pfn_offset;
    +                        get_page(pages[i]);
                     }

                     if (vmas)



  3. Re: [PATCH 1/2] coredump_filter: add hugepage dumping v3

    Hi Kosaki-san,

    KOSAKI Motohiro wrote:
    > Index: b/Documentation/filesystems/proc.txt
    > ===================================================================
    > --- a/Documentation/filesystems/proc.txt 2008-09-25 21:19:13.000000000 +0900
    > +++ b/Documentation/filesystems/proc.txt 2008-09-25 21:21:05.000000000 +0900
    > @@ -2408,24 +2408,29 @@ will be dumped when the process is
    >  of memory types. If a bit of the bitmask is set, memory segments of the
    >  corresponding memory type are dumped, otherwise they are not dumped.
    >
    > -The following 4 memory types are supported:
    > +The following 7 memory types are supported:
    >    - (bit 0) anonymous private memory
    >    - (bit 1) anonymous shared memory
    >    - (bit 2) file-backed private memory
    >    - (bit 3) file-backed shared memory
    >    - (bit 4) ELF header pages in file-backed private memory areas (it is
    >              effective only if the bit 2 is cleared)
    > +  - (bit 5) hugetlb private memory
    > +  - (bit 6) hugetlb shared memory
    >
    >    Note that MMIO pages such as frame buffer are never dumped and vDSO pages
    >    are always dumped regardless of the bitmask status.
    >
    > -Default value of coredump_filter is 0x3; this means all anonymous memory
    > -segments are dumped.
    > +  Note bit 0-4 doesn't effect any hugetlb memory. hugetlb memory are only
    > +  effected by bit 5-6.
    > +
    > +Default value of coredump_filter is 0x23; this means all anonymous memory
    > +segments and hugetlb private memory are dumped.
    >
    >  If you don't want to dump all shared memory segments attached to pid 1234,
    > -write 1 to the process's proc file.
    > +write 21 to the process's proc file.


    This should be:
    write 0x21 to the process's proc file.
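
    The "0x" prefix matters because the written value is parsed with base
    auto-detection. Assuming the proc handler behaves like strtoul() with
    base 0 (an assumption here, not something stated in the patch), a plain
    "21" is read as decimal 21 = 0x15, which selects bits 0, 2 and 4 instead
    of bits 0 and 5. The standard C library shows the difference; the helper
    name is hypothetical.

```c
#include <stdlib.h>

/* Parse a coredump_filter string the way strtoul() does with base 0:
 * a "0x" prefix means hexadecimal, a leading "0" means octal, anything
 * else is decimal.  Returns (unsigned long)-1 if no digits were parsed. */
static unsigned long parse_filter(const char *s)
{
    char *end;
    unsigned long val = strtoul(s, &end, 0);

    if (end == s)
        return (unsigned long)-1;
    return val;
}
```

    So "0x21" yields 0x21 as intended, while "21" yields 0x15 (anonymous
    private, file-backed private and ELF headers), which is not what the
    documentation example describes.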

    Except for this, it seems OK. Thanks.

    Reviewed-by: Hidehiro Kawai

    > -  $ echo 0x1 > /proc/1234/coredump_filter
    > +  $ echo 0x21 > /proc/1234/coredump_filter
    >
    >  When a new process is created, the process inherits the bitmask status from its
    >  parent. It is useful to set up coredump_filter before the program runs.


    --
    Hidehiro Kawai
    Hitachi, Systems Development Laboratory
    Linux Technology Center


  4. [PATCH] coredump_filter: add hugepage dumping v4

    Hi Kawai-san,

    Thanks!


    > This should be:
    > write 0x21 to the process's proc file.
    >
    > Except for this, it seems OK. Thanks.
    >
    > Reviewed-by: Hidehiro Kawai
    >
    > > -  $ echo 0x1 > /proc/1234/coredump_filter
    > > +  $ echo 0x21 > /proc/1234/coredump_filter
    > >
    > >  When a new process is created, the process inherits the bitmask status from its
    > >  parent. It is useful to set up coredump_filter before the program runs.


    A new version is attached.



    ============================================
    From: KOSAKI Motohiro
    Subject: [PATCH v4] coredump_filter: add hugepage core dumping

    Changelog
    -----------------
    v3 -> v4
    - fixed documentation typo.

    v2 -> v3
    - split the /proc/[pid]/coredump_filter hugetlb bits into
    shared and private mapping pages.
    - updated documentation

    v1 -> v2
    - updated documentation



    Currently, a hugepage vma has the VM_RESERVED flag set so that it is not
    swapped out. But VM_RESERVED vmas aren't core dumped, because this flag is
    often used for some kernel vmas (e.g. vmalloc, sound related).

    As a result, hugepages are never dumped and can't be debugged easily.
    Many developers want hugepages to be included in core dumps.

    However, we can't read generic VM_RESERVED areas, because they are often
    I/O mapping areas. Reading such areas may change device state, which is
    definitely an undesirable side effect.

    So it is better to add a hugepage-specific bit to the coredump filter.
    That enables hugepage core dumping without causing any side effects on
    I/O devices.

    In addition, libhugetlbfs uses hugetlb private mapping pages as anonymous
    pages, so hugepage private mapping pages should be core dumped by default.


    So /proc/[pid]/coredump_filter gets two new bits:

    - bit 5 selects whether hugetlb private mapping pages are dumped (default: yes)
    - bit 6 selects whether hugetlb shared mapping pages are dumped (default: no)


    I tested it with the following steps:

    % ulimit -c unlimited
    % ./crash_hugepage 50
    % ./crash_hugepage 50 -p
    % ls -lh
    % gdb ./crash_hugepage core
    %
    % echo 0x43 > /proc/self/coredump_filter
    % ./crash_hugepage 50
    % ./crash_hugepage 50 -p
    % ls -lh
    % gdb ./crash_hugepage core
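
    The `echo 0x43 > /proc/self/coredump_filter` step works because, with echo
    as a shell builtin (as in common shells), the redirection is opened by the
    shell itself; /proc/self is then the shell, and the crash_hugepage
    processes it starts inherit the mask. A program can do the same for itself
    and its future children by writing the file directly. This is a minimal
    sketch; the helper name is hypothetical and errors are reduced to a
    return code.

```c
#include <stdio.h>

/* Hypothetical helper: write `mask` in hexadecimal to a coredump_filter
 * file such as "/proc/self/coredump_filter".  Children forked afterwards
 * inherit the setting.  Returns 0 on success, -1 on failure. */
static int write_coredump_filter(const char *path, unsigned long mask)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (f == NULL)
        return -1;
    ok = fprintf(f, "0x%lx\n", mask) > 0;
    if (fclose(f) != 0)   /* the write is only committed on close */
        ok = 0;
    return ok ? 0 : -1;
}
```

    For example, calling write_coredump_filter("/proc/self/coredump_filter",
    0x43) before fork()/exec() of crash_hugepage mirrors the shell step in the
    test procedure.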

    crash_hugepage.c
    ------------------------------------------------
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/mman.h>

    #include "hugetlbfs.h"  /* libhugetlbfs */

    int main(int argc, char **argv)
    {
        char *p;
        int ch;
        int mmap_flags = MAP_SHARED;
        int fd;
        int nr_pages;

        /* -p selects a private mapping instead of a shared one */
        while ((ch = getopt(argc, argv, "p")) != -1) {
            switch (ch) {
            case 'p':
                mmap_flags &= ~MAP_SHARED;
                mmap_flags |= MAP_PRIVATE;
                break;
            default:
                /* nothing */
                break;
            }
        }
        argc -= optind;
        argv += optind;

        if (argc == 0) {
            printf("need # of pages\n");
            exit(1);
        }

        nr_pages = atoi(argv[0]);
        if (nr_pages < 2) {
            printf("nr_pages must be >= 2\n");
            exit(1);
        }

        /* map nr_pages hugepages from an unlinked hugetlbfs file */
        fd = hugetlbfs_unlinked_fd();
        if (fd < 0) {
            perror("hugetlbfs_unlinked_fd");
            exit(1);
        }
        p = mmap(NULL, nr_pages * gethugepagesize(),
                 PROT_READ | PROT_WRITE, mmap_flags, fd, 0);
        if (p == MAP_FAILED) {
            perror("mmap");
            exit(1);
        }

        sleep(2);

        *(p + gethugepagesize()) = 1; /* COW */
        sleep(2);

        /* crash! */
        *(int *)0 = 1;

        return 0;
    }

    -----------------------------------------------------


    Signed-off-by: KOSAKI Motohiro
    Reviewed-by: Kawai Hidehiro
    CC: Hugh Dickins
    CC: William Irwin
    CC: Adam Litke

    ---
    Documentation/filesystems/proc.txt | 15 ++++++++++-----
    fs/binfmt_elf.c | 12 ++++++++++--
    include/linux/sched.h | 7 +++++--
    3 files changed, 25 insertions(+), 9 deletions(-)

    Index: b/Documentation/filesystems/proc.txt
    ===================================================================
    --- a/Documentation/filesystems/proc.txt 2008-09-25 21:19:13.000000000 +0900
    +++ b/Documentation/filesystems/proc.txt 2008-09-25 21:21:05.000000000 +0900
    @@ -2408,24 +2408,29 @@ will be dumped when the process is
     of memory types. If a bit of the bitmask is set, memory segments of the
     corresponding memory type are dumped, otherwise they are not dumped.

    -The following 4 memory types are supported:
    +The following 7 memory types are supported:
       - (bit 0) anonymous private memory
       - (bit 1) anonymous shared memory
       - (bit 2) file-backed private memory
       - (bit 3) file-backed shared memory
       - (bit 4) ELF header pages in file-backed private memory areas (it is
                 effective only if the bit 2 is cleared)
    +  - (bit 5) hugetlb private memory
    +  - (bit 6) hugetlb shared memory

       Note that MMIO pages such as frame buffer are never dumped and vDSO pages
       are always dumped regardless of the bitmask status.

    -Default value of coredump_filter is 0x3; this means all anonymous memory
    -segments are dumped.
    +  Note bit 0-4 doesn't effect any hugetlb memory. hugetlb memory are only
    +  effected by bit 5-6.
    +
    +Default value of coredump_filter is 0x23; this means all anonymous memory
    +segments and hugetlb private memory are dumped.

     If you don't want to dump all shared memory segments attached to pid 1234,
    -write 1 to the process's proc file.
    +write 0x21 to the process's proc file.

    -  $ echo 0x1 > /proc/1234/coredump_filter
    +  $ echo 0x21 > /proc/1234/coredump_filter

     When a new process is created, the process inherits the bitmask status from its
     parent. It is useful to set up coredump_filter before the program runs.
    Index: b/fs/binfmt_elf.c
    ===================================================================
    --- a/fs/binfmt_elf.c 2008-09-25 21:18:58.000000000 +0900
    +++ b/fs/binfmt_elf.c 2008-09-25 21:19:16.000000000 +0900
    @@ -1156,16 +1156,24 @@ static int dump_seek(struct file *file,
     static unsigned long vma_dump_size(struct vm_area_struct *vma,
                                        unsigned long mm_flags)
     {
    +#define FILTER(type) (mm_flags & (1UL << MMF_DUMP_##type))
    +
             /* The vma can be set up to tell us the answer directly. */
             if (vma->vm_flags & VM_ALWAYSDUMP)
                     goto whole;

    +        /* Hugetlb memory check */
    +        if (vma->vm_flags & VM_HUGETLB) {
    +                if ((vma->vm_flags & VM_SHARED) && FILTER(HUGETLB_SHARED))
    +                        goto whole;
    +                if (!(vma->vm_flags & VM_SHARED) && FILTER(HUGETLB_PRIVATE))
    +                        goto whole;
    +        }
    +
             /* Do not dump I/O mapped devices or special mappings */
             if (vma->vm_flags & (VM_IO | VM_RESERVED))
                     return 0;

    -#define FILTER(type) (mm_flags & (1UL << MMF_DUMP_##type))
    -
             /* By default, dump shared memory if mapped from an anonymous file. */
             if (vma->vm_flags & VM_SHARED) {
                     if (vma->vm_file->f_path.dentry->d_inode->i_nlink == 0 ?
    Index: b/include/linux/sched.h
    ===================================================================
    --- a/include/linux/sched.h 2008-09-25 21:18:58.000000000 +0900
    +++ b/include/linux/sched.h 2008-09-25 21:19:16.000000000 +0900
    @@ -403,12 +403,15 @@ extern int get_dumpable(struct mm_struct
     #define MMF_DUMP_MAPPED_PRIVATE 4
     #define MMF_DUMP_MAPPED_SHARED 5
     #define MMF_DUMP_ELF_HEADERS 6
    +#define MMF_DUMP_HUGETLB_PRIVATE 7
    +#define MMF_DUMP_HUGETLB_SHARED 8
     #define MMF_DUMP_FILTER_SHIFT MMF_DUMPABLE_BITS
    -#define MMF_DUMP_FILTER_BITS 5
    +#define MMF_DUMP_FILTER_BITS 7
     #define MMF_DUMP_FILTER_MASK \
             (((1 << MMF_DUMP_FILTER_BITS) - 1) << MMF_DUMP_FILTER_SHIFT)
     #define MMF_DUMP_FILTER_DEFAULT \
    -        ((1 << MMF_DUMP_ANON_PRIVATE) | (1 << MMF_DUMP_ANON_SHARED))
    +        ((1 << MMF_DUMP_ANON_PRIVATE) | (1 << MMF_DUMP_ANON_SHARED) |\
    +         (1 << MMF_DUMP_HUGETLB_PRIVATE))

     struct sighand_struct {
             atomic_t count;
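
    The new documentation example follows from the new default: starting
    from 0x23 and clearing every shared-memory bit (bit 1 anonymous shared,
    bit 3 file-backed shared, bit 6 hugetlb shared) leaves 0x21. A small
    sketch of that arithmetic; the CDF_* names and the helper are
    hypothetical (the kernel uses MMF_DUMP_* internally, shifted by
    MMF_DUMP_FILTER_SHIFT).

```c
/* coredump_filter bits as documented in proc.txt after this patch. */
enum {
    CDF_ANON_PRIVATE    = 1 << 0,
    CDF_ANON_SHARED     = 1 << 1,
    CDF_MAPPED_PRIVATE  = 1 << 2,
    CDF_MAPPED_SHARED   = 1 << 3,
    CDF_ELF_HEADERS     = 1 << 4,
    CDF_HUGETLB_PRIVATE = 1 << 5,
    CDF_HUGETLB_SHARED  = 1 << 6,
};

/* Default after this patch: all anonymous memory plus hugetlb private. */
static const unsigned long cdf_default =
    CDF_ANON_PRIVATE | CDF_ANON_SHARED | CDF_HUGETLB_PRIVATE;

/* Clear every bit that selects a shared mapping. */
static unsigned long cdf_without_shared(unsigned long mask)
{
    return mask & ~(unsigned long)(CDF_ANON_SHARED | CDF_MAPPED_SHARED |
                                   CDF_HUGETLB_SHARED);
}
```

    Applied to 0x7f (all seven types), the same helper gives 0x35, i.e.
    bits 0, 2, 4 and 5.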


