[PATCH 0/5] ftrace update patches - Kernel



Thread: [PATCH 0/5] ftrace update patches

  1. [PATCH 0/5] ftrace update patches

    This patch series contains various fixes, clean ups and commenting that
    I've been doing today on ftrace. I need to go off and do other things
    right now, but I wanted these to get out before the weekend.

    -- Steve
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. [PATCH 1/5] ftrace: simple clean ups

    Andrew Morton mentioned some clean ups that should be done to ftrace.
    This patch does some of the simple clean ups.

    Signed-off-by: Steven Rostedt
    ---
    kernel/trace/trace.c | 25 ++++++++++++-------------
    1 file changed, 12 insertions(+), 13 deletions(-)

    Index: linux-sched-devel.git/kernel/trace/trace.c
    ===================================================================
    --- linux-sched-devel.git.orig/kernel/trace/trace.c 2008-04-18 15:47:22.000000000 -0400
    +++ linux-sched-devel.git/kernel/trace/trace.c 2008-04-18 15:47:40.000000000 -0400
    @@ -36,8 +36,7 @@ unsigned long __read_mostly tracing_max_
    unsigned long __read_mostly tracing_thresh;

    /* dummy trace to disable tracing */
    -static struct tracer no_tracer __read_mostly =
    -{
    +static struct tracer no_tracer __read_mostly = {
    .name = "none",
    };

    @@ -1961,8 +1960,8 @@ tracing_iter_ctrl_write(struct file *fil
    int neg = 0;
    int i;

    - if (cnt > 63)
    - cnt = 63;
    + if (cnt >= sizeof(buf))
    + return -EINVAL;

    if (copy_from_user(&buf, ubuf, cnt))
    return -EFAULT;
    @@ -2054,8 +2053,8 @@ tracing_ctrl_write(struct file *filp, co
    long val;
    char buf[64];

    - if (cnt > 63)
    - cnt = 63;
    + if (cnt >= sizeof(buf))
    + return -EINVAL;

    if (copy_from_user(&buf, ubuf, cnt))
    return -EFAULT;
    @@ -2154,10 +2153,10 @@ tracing_max_lat_read(struct file *filp,
    char buf[64];
    int r;

    - r = snprintf(buf, 64, "%ld\n",
    + r = snprintf(buf, sizeof(buf), "%ld\n",
    *ptr == (unsigned long)-1 ? -1 : nsecs_to_usecs(*ptr));
    - if (r > 64)
    - r = 64;
    + if (r > sizeof(buf))
    + r = sizeof(buf);
    return simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
    }

    @@ -2169,8 +2168,8 @@ tracing_max_lat_write(struct file *filp,
    long val;
    char buf[64];

    - if (cnt > 63)
    - cnt = 63;
    + if (cnt >= sizeof(buf))
    + return -EINVAL;

    if (copy_from_user(&buf, ubuf, cnt))
    return -EFAULT;
    @@ -2434,8 +2433,8 @@ tracing_entries_write(struct file *filp,
    unsigned long val;
    char buf[64];

    - if (cnt > 63)
    - cnt = 63;
    + if (cnt >= sizeof(buf))
    + return -EINVAL;

    if (copy_from_user(&buf, ubuf, cnt))
    return -EFAULT;


  3. [PATCH 2/5] ftrace: replace simple_strtoul with strict_strtoul

    Andrew Morton suggested using strict_strtoul over simple_strtoul.
    This patch replaces them in ftrace.

    Signed-off-by: Steven Rostedt
    ---
    kernel/trace/trace.c | 28 ++++++++++++++++++++++------
    1 file changed, 22 insertions(+), 6 deletions(-)

    Index: linux-sched-devel.git/kernel/trace/trace.c
    ===================================================================
    --- linux-sched-devel.git.orig/kernel/trace/trace.c 2008-04-18 15:47:40.000000000 -0400
    +++ linux-sched-devel.git/kernel/trace/trace.c 2008-04-18 15:49:09.000000000 -0400
    @@ -92,9 +92,16 @@ void trace_wake_up(void)

    static int __init set_nr_entries(char *str)
    {
    + unsigned long nr_entries;
    + int ret;
    +
    if (!str)
    return 0;
    - trace_nr_entries = simple_strtoul(str, &str, 0);
    + ret = strict_strtoul(str, 0, &nr_entries);
    + /* nr_entries can not be zero */
    + if (ret < 0 || nr_entries == 0)
    + return 0;
    + trace_nr_entries = nr_entries;
    return 1;
    }
    __setup("trace_entries=", set_nr_entries);
    @@ -2050,8 +2057,9 @@ tracing_ctrl_write(struct file *filp, co
    size_t cnt, loff_t *ppos)
    {
    struct trace_array *tr = filp->private_data;
    - long val;
    char buf[64];
    + long val;
    + int ret;

    if (cnt >= sizeof(buf))
    return -EINVAL;
    @@ -2061,7 +2069,9 @@ tracing_ctrl_write(struct file *filp, co

    buf[cnt] = 0;

    - val = simple_strtoul(buf, NULL, 10);
    + ret = strict_strtoul(buf, 10, &val);
    + if (ret < 0)
    + return ret;

    val = !!val;

    @@ -2165,8 +2175,9 @@ tracing_max_lat_write(struct file *filp,
    size_t cnt, loff_t *ppos)
    {
    long *ptr = filp->private_data;
    - long val;
    char buf[64];
    + long val;
    + int ret;

    if (cnt >= sizeof(buf))
    return -EINVAL;
    @@ -2176,7 +2187,9 @@ tracing_max_lat_write(struct file *filp,

    buf[cnt] = 0;

    - val = simple_strtoul(buf, NULL, 10);
    + ret = strict_strtoul(buf, 10, &val);
    + if (ret < 0)
    + return ret;

    *ptr = val * 1000;

    @@ -2432,6 +2445,7 @@ tracing_entries_write(struct file *filp,
    {
    unsigned long val;
    char buf[64];
    + int ret;

    if (cnt >= sizeof(buf))
    return -EINVAL;
    @@ -2441,7 +2455,9 @@ tracing_entries_write(struct file *filp,

    buf[cnt] = 0;

    - val = simple_strtoul(buf, NULL, 10);
    + ret = strict_strtoul(buf, 10, &val);
    + if (ret < 0)
    + return ret;

    /* must have at least 1 entry */
    if (!val)


  4. [PATCH 3/5] ftrace: modulize the number of CPU buffers

    Currently ftrace allocates a trace buffer for every possible CPU.
    Work is being done to change it to only online CPUs and add hooks
    for hotplugged CPUs.

    This patch lays out the infrastructure for such a change.

    Signed-off-by: Steven Rostedt
    ---
    kernel/trace/trace.c | 38 ++++++++++++++++++++++++--------------
    1 file changed, 24 insertions(+), 14 deletions(-)

    Index: linux-sched-devel.git/kernel/trace/trace.c
    ===================================================================
    --- linux-sched-devel.git.orig/kernel/trace/trace.c 2008-04-18 15:50:42.000000000 -0400
    +++ linux-sched-devel.git/kernel/trace/trace.c 2008-04-18 15:50:47.000000000 -0400
    @@ -35,6 +35,12 @@
    unsigned long __read_mostly tracing_max_latency = (cycle_t)ULONG_MAX;
    unsigned long __read_mostly tracing_thresh;

    +static unsigned long __read_mostly tracing_nr_buffers;
    +static cpumask_t __read_mostly tracing_buffer_mask;
    +
    +#define for_each_tracing_cpu(cpu) \
    + for_each_cpu_mask(cpu, tracing_buffer_mask)
    +
    /* dummy trace to disable tracing */
    static struct tracer no_tracer __read_mostly = {
    .name = "none",
    @@ -341,7 +347,7 @@ update_max_tr(struct trace_array *tr, st
    WARN_ON_ONCE(!irqs_disabled());
    __raw_spin_lock(&ftrace_max_lock);
    /* clear out all the previous traces */
    - for_each_possible_cpu(i) {
    + for_each_tracing_cpu(i) {
    data = tr->data[i];
    flip_trace(max_tr.data[i], data);
    tracing_reset(data);
    @@ -365,7 +371,7 @@ update_max_tr_single(struct trace_array

    WARN_ON_ONCE(!irqs_disabled());
    __raw_spin_lock(&ftrace_max_lock);
    - for_each_possible_cpu(i)
    + for_each_tracing_cpu(i)
    tracing_reset(max_tr.data[i]);

    flip_trace(max_tr.data[cpu], data);
    @@ -411,7 +417,7 @@ int register_tracer(struct tracer *type)
    * internal tracing to verify that everything is in order.
    * If we fail, we do not register this tracer.
    */
    - for_each_possible_cpu(i) {
    + for_each_tracing_cpu(i) {
    data = tr->data[i];
    if (!head_page(data))
    continue;
    @@ -430,7 +436,7 @@ int register_tracer(struct tracer *type)
    goto out;
    }
    /* Only reset on passing, to avoid touching corrupted buffers */
    - for_each_possible_cpu(i) {
    + for_each_tracing_cpu(i) {
    data = tr->data[i];
    if (!head_page(data))
    continue;
    @@ -902,7 +908,7 @@ find_next_entry(struct trace_iterator *i
    int next_cpu = -1;
    int cpu;

    - for_each_possible_cpu(cpu) {
    + for_each_tracing_cpu(cpu) {
    if (!head_page(tr->data[cpu]))
    continue;
    ent = trace_entry_idx(tr, tr->data[cpu], iter, cpu);
    @@ -1027,7 +1033,7 @@ static void *s_start(struct seq_file *m,
    iter->prev_ent = NULL;
    iter->prev_cpu = -1;

    - for_each_possible_cpu(i) {
    + for_each_tracing_cpu(i) {
    iter->next_idx[i] = 0;
    iter->next_page[i] = NULL;
    }
    @@ -1144,7 +1150,7 @@ print_trace_header(struct seq_file *m, s
    if (type)
    name = type->name;

    - for_each_possible_cpu(cpu) {
    + for_each_tracing_cpu(cpu) {
    if (head_page(tr->data[cpu])) {
    total += tr->data[cpu]->trace_idx;
    if (tr->data[cpu]->trace_idx > tr->entries)
    @@ -1574,7 +1580,7 @@ static int trace_empty(struct trace_iter
    struct trace_array_cpu *data;
    int cpu;

    - for_each_possible_cpu(cpu) {
    + for_each_tracing_cpu(cpu) {
    data = iter->tr->data[cpu];

    if (head_page(data) && data->trace_idx &&
    @@ -1886,7 +1892,7 @@ tracing_cpumask_write(struct file *filp,

    raw_local_irq_disable();
    __raw_spin_lock(&ftrace_max_lock);
    - for_each_possible_cpu(cpu) {
    + for_each_tracing_cpu(cpu) {
    /*
    * Increase/decrease the disabled counter if we are
    * about to flip a bit in the cpumask:
    @@ -2364,7 +2370,7 @@ tracing_read_pipe(struct file *filp, cha
    ftrace_enabled = 0;
    #endif
    smp_wmb();
    - for_each_possible_cpu(cpu) {
    + for_each_tracing_cpu(cpu) {
    data = iter->tr->data[cpu];

    if (!head_page(data) || !data->trace_idx)
    @@ -2664,7 +2670,7 @@ static int trace_alloc_page(void)
    int i;

    /* first allocate a page for each CPU */
    - for_each_possible_cpu(i) {
    + for_each_tracing_cpu(i) {
    array = (void *)__get_free_page(GFP_KERNEL);
    if (array == NULL) {
    printk(KERN_ERR "tracer: failed to allocate page"
    @@ -2689,7 +2695,7 @@ static int trace_alloc_page(void)
    }

    /* Now that we successfully allocate a page per CPU, add them */
    - for_each_possible_cpu(i) {
    + for_each_tracing_cpu(i) {
    data = global_trace.data[i];
    page = list_entry(pages.next, struct page, lru);
    list_del_init(&page->lru);
    @@ -2725,7 +2731,7 @@ static int trace_free_page(void)
    int ret = 0;

    /* free one page from each buffer */
    - for_each_possible_cpu(i) {
    + for_each_tracing_cpu(i) {
    data = global_trace.data[i];
    p = data->trace_pages.next;
    if (p == &data->trace_pages) {
    @@ -2776,8 +2782,12 @@ __init static int tracer_alloc_buffers(v

    global_trace.ctrl = tracer_enabled;

    + /* TODO: make the number of buffers hot pluggable with CPUS */
    + tracing_nr_buffers = num_possible_cpus();
    + tracing_buffer_mask = cpu_possible_map;
    +
    /* Allocate the first page for all buffers */
    - for_each_possible_cpu(i) {
    + for_each_tracing_cpu(i) {
    data = global_trace.data[i] = &per_cpu(global_trace_cpu, i);
    max_tr.data[i] = &per_cpu(max_data, i);



  5. [PATCH 4/5] ftrace: limit trace entries

    Currently there is nothing preventing the root user from using up all
    of memory for trace buffers. If the root user allocates too many entries,
    the OOM killer might start killing off all tasks.

    This patch adds an algorithm to check the following condition:

    pages_requested > (freeable_memory + current_trace_buffer_pages) / 4

    If the above is met then the allocation fails. The above prevents more
    than 1/4th of freeable memory from being used by trace buffers.

    To determine the freeable_memory, I made determine_dirtyable_memory in
    mm/page-writeback.c global.

    Special thanks goes to Peter Zijlstra for suggesting the above calculation.

    Signed-off-by: Steven Rostedt
    ---
    include/linux/writeback.h | 2 ++
    kernel/trace/trace.c | 38 ++++++++++++++++++++++++++++++++++++++
    mm/page-writeback.c | 10 +++++++---
    3 files changed, 47 insertions(+), 3 deletions(-)

    Index: linux-sched-devel.git/include/linux/writeback.h
    ===================================================================
    --- linux-sched-devel.git.orig/include/linux/writeback.h 2008-04-18 15:47:21.000000000 -0400
    +++ linux-sched-devel.git/include/linux/writeback.h 2008-04-18 15:53:08.000000000 -0400
    @@ -105,6 +105,8 @@ extern int vm_highmem_is_dirtyable;
    extern int block_dump;
    extern int laptop_mode;

    +extern unsigned long determine_dirtyable_memory(void);
    +
    extern int dirty_ratio_handler(struct ctl_table *table, int write,
    struct file *filp, void __user *buffer, size_t *lenp,
    loff_t *ppos);
    Index: linux-sched-devel.git/kernel/trace/trace.c
    ===================================================================
    --- linux-sched-devel.git.orig/kernel/trace/trace.c 2008-04-18 15:50:47.000000000 -0400
    +++ linux-sched-devel.git/kernel/trace/trace.c 2008-04-18 15:53:08.000000000 -0400
    @@ -27,6 +27,7 @@
    #include
    #include
    #include
    +#include

    #include

    @@ -51,6 +52,8 @@ static int trace_free_page(void);

    static int tracing_disabled = 1;

    +static unsigned long tracing_pages_allocated;
    +
    long
    ns2usecs(cycle_t nsec)
    {
    @@ -2479,12 +2482,41 @@ tracing_entries_write(struct file *filp,
    }

    if (val > global_trace.entries) {
    + long pages_requested;
    + unsigned long freeable_pages;
    +
    + /* make sure we have enough memory before mapping */
    + pages_requested =
    + (val + (ENTRIES_PER_PAGE-1)) / ENTRIES_PER_PAGE;
    +
    + /* account for each buffer (and max_tr) */
    + pages_requested *= tracing_nr_buffers * 2;
    +
    + /* Check for overflow */
    + if (pages_requested < 0) {
    + cnt = -ENOMEM;
    + goto out;
    + }
    +
    + freeable_pages = determine_dirtyable_memory();
    +
    + /* we only allow to request 1/4 of useable memory */
    + if (pages_requested >
    + ((freeable_pages + tracing_pages_allocated) / 4)) {
    + cnt = -ENOMEM;
    + goto out;
    + }
    +
    while (global_trace.entries < val) {
    if (trace_alloc_page()) {
    cnt = -ENOMEM;
    goto out;
    }
    + /* double check that we don't go over the known pages */
    + if (tracing_pages_allocated > pages_requested)
    + break;
    }
    +
    } else {
    /* include the number of entries in val (inc of page entries) */
    while (global_trace.entries > val + (ENTRIES_PER_PAGE - 1))
    @@ -2667,6 +2699,7 @@ static int trace_alloc_page(void)
    struct page *page, *tmp;
    LIST_HEAD(pages);
    void *array;
    + unsigned pages_allocated = 0;
    int i;

    /* first allocate a page for each CPU */
    @@ -2678,6 +2711,7 @@ static int trace_alloc_page(void)
    goto free_pages;
    }

    + pages_allocated++;
    page = virt_to_page(array);
    list_add(&page->lru, &pages);

    @@ -2689,6 +2723,7 @@ static int trace_alloc_page(void)
    "for trace buffer!\n");
    goto free_pages;
    }
    + pages_allocated++;
    page = virt_to_page(array);
    list_add(&page->lru, &pages);
    #endif
    @@ -2710,6 +2745,7 @@ static int trace_alloc_page(void)
    SetPageLRU(page);
    #endif
    }
    + tracing_pages_allocated += pages_allocated;
    global_trace.entries += ENTRIES_PER_PAGE;

    return 0;
    @@ -2744,6 +2780,7 @@ static int trace_free_page(void)
    page = list_entry(p, struct page, lru);
    ClearPageLRU(page);
    list_del(&page->lru);
    + tracing_pages_allocated--;
    __free_page(page);

    tracing_reset(data);
    @@ -2761,6 +2798,7 @@ static int trace_free_page(void)
    page = list_entry(p, struct page, lru);
    ClearPageLRU(page);
    list_del(&page->lru);
    + tracing_pages_allocated--;
    __free_page(page);

    tracing_reset(data);
    Index: linux-sched-devel.git/mm/page-writeback.c
    ===================================================================
    --- linux-sched-devel.git.orig/mm/page-writeback.c 2008-04-18 15:47:21.000000000 -0400
    +++ linux-sched-devel.git/mm/page-writeback.c 2008-04-18 15:53:08.000000000 -0400
    @@ -126,8 +126,6 @@ static void background_writeout(unsigned
    static struct prop_descriptor vm_completions;
    static struct prop_descriptor vm_dirties;

    -static unsigned long determine_dirtyable_memory(void);
    -
    /*
    * couple the period to the dirty_ratio:
    *
    @@ -286,7 +284,13 @@ static unsigned long highmem_dirtyable_m
    #endif
    }

    -static unsigned long determine_dirtyable_memory(void)
    +/**
    + * detremine_dirtyable_memory - amount of memory that may be used
    + *
    + * Returns the numebr of pages that can currently be freed and used
    + * by the kernel for direct mappings.
    + */
    +unsigned long determine_dirtyable_memory(void)
    {
    unsigned long x;



  6. Re: [PATCH 1/5] ftrace: simple clean ups

    On Fri, 18 Apr 2008 16:05:39 -0400
    Steven Rostedt wrote:

    > - r = snprintf(buf, 64, "%ld\n",
    > + r = snprintf(buf, sizeof(buf), "%ld\n",


    If you use scnprintf here

    > *ptr == (unsigned long)-1 ? -1 : nsecs_to_usecs(*ptr));
    > - if (r > 64)
    > - r = 64;
    > + if (r > sizeof(buf))
    > + r = sizeof(buf);


    This becomes a cant-happen (I think).


  7. [PATCH 5/5] ftrace: comment code

    This is the first installment of adding documentation to ftrace.
    Expect many more patches of this kind in the near future.

    Signed-off-by: Steven Rostedt
    ---
    kernel/trace/trace.c | 135 ++++++++++++++++++++++++++++++++++++++++++++++++++-
    kernel/trace/trace.h | 7 ++
    2 files changed, 141 insertions(+), 1 deletion(-)

    Index: linux-sched-devel.git/kernel/trace/trace.c
    ===================================================================
    --- linux-sched-devel.git.orig/kernel/trace/trace.c 2008-04-18 15:53:08.000000000 -0400
    +++ linux-sched-devel.git/kernel/trace/trace.c 2008-04-18 16:01:14.000000000 -0400
    @@ -67,26 +67,79 @@ cycle_t ftrace_now(int cpu)
    return cpu_clock(cpu);
    }

    +/*
    + * The global_trace is the descriptor that holds the tracing
    + * buffers for the live tracing. For each CPU, it contains
    + * a link list of pages that will store trace entries. The
    + * page descriptor of the pages in the memory is used to hold
    + * the link list by linking the lru item in the page descriptor
    + * to each of the pages in the buffer per CPU.
    + *
    + * For each active CPU there is a data field that holds the
    + * pages for the buffer for that CPU. Each CPU has the same number
    + * of pages allocated for its buffer.
    + */
    static struct trace_array global_trace;

    static DEFINE_PER_CPU(struct trace_array_cpu, global_trace_cpu);

    +/*
    + * The max_tr is used to snapshot the global_trace when a maximum
    + * latency is reached. Some tracers will use this to store a maximum
    + * trace while it continues examining live traces.
    + *
    + * The buffers for the max_tr are set up the same as the global_trace.
    + * When a snapshot is taken, the link list of the max_tr is swapped
    + * with the link list of the global_trace and the buffers are reset for
    + * the global_trace so the tracing can continue.
    + */
    static struct trace_array max_tr;

    static DEFINE_PER_CPU(struct trace_array_cpu, max_data);

    +/* tracer_enabled is used to toggle activation of a tracer */
    static int tracer_enabled = 1;
    +
    +/*
    + * trace_nr_entries is the number of entries that is allocated
    + * for a buffer. Note, the number of entries is always rounded
    + * to ENTRIES_PER_PAGE.
    + */
    static unsigned long trace_nr_entries = 65536UL;

    +/* trace_types holds a link list of available tracers. */
    static struct tracer *trace_types __read_mostly;
    +
    +/* current_trace points to the tracer that is currently active */
    static struct tracer *current_trace __read_mostly;
    +
    +/*
    + * max_tracer_type_len is used to simplify the allocating of
    + * buffers to read userspace tracer names. We keep track of
    + * the longest tracer name registered.
    + */
    static int max_tracer_type_len;

    +/*
    + * trace_types_lock is used to protect the trace_types list.
    + * This lock is also used to keep user access serialized.
    + * Accesses from userspace will grab this lock while userspace
    + * activities happen inside the kernel.
    + */
    static DEFINE_MUTEX(trace_types_lock);
    +
    +/* trace_wait is a waitqueue for tasks blocked on trace_poll */
    static DECLARE_WAIT_QUEUE_HEAD(trace_wait);

    +/* trace_flags holds iter_ctrl options */
    unsigned long trace_flags = TRACE_ITER_PRINT_PARENT;

    +/**
    + * trace_wake_up - wake up tasks waiting for trace input
    + *
    + * Simply wakes up any task that is blocked on the trace_wait
    + * queue. These is used with trace_poll for tasks polling the trace.
    + */
    void trace_wake_up(void)
    {
    /*
    @@ -120,6 +173,14 @@ unsigned long nsecs_to_usecs(unsigned lo
    return nsecs / 1000;
    }

    +/*
    + * trace_flag_type is an enumeration that holds different
    + * states when a trace occurs. These are:
    + * IRQS_OFF - interrupts were disabled
    + * NEED_RESCED - reschedule is requested
    + * HARDIRQ - inside an interrupt handler
    + * SOFTIRQ - inside a softirq handler
    + */
    enum trace_flag_type {
    TRACE_FLAG_IRQS_OFF = 0x01,
    TRACE_FLAG_NEED_RESCHED = 0x02,
    @@ -127,10 +188,14 @@ enum trace_flag_type {
    TRACE_FLAG_SOFTIRQ = 0x08,
    };

    +/*
    + * TRACE_ITER_SYM_MASK masks the options in trace_flags that
    + * control the output of kernel symbols.
    + */
    #define TRACE_ITER_SYM_MASK \
    (TRACE_ITER_PRINT_PARENT|TRACE_ITER_SYM_OFFSET|TRACE_ITER_SYM_ADDR)

    -/* These must match the bit postions above */
    +/* These must match the bit postions in trace_iterator_flags */
    static const char *trace_options[] = {
    "print-parent",
    "sym-offset",
    @@ -145,6 +210,15 @@ static const char *trace_options[] = {
    NULL
    };

    +/*
    + * ftrace_max_lock is used to protect the swapping of buffers
    + * when taking a max snapshot. The buffers themselves are
    + * protected by per_cpu spinlocks. But the action of the swap
    + * needs its own lock.
    + *
    + * This is defined as a raw_spinlock_t in order to help
    + * with performance when lockdep debugging is enabled.
    + */
    static raw_spinlock_t ftrace_max_lock =
    (raw_spinlock_t)__RAW_SPIN_LOCK_UNLOCKED;

    @@ -175,6 +249,13 @@ __update_max_tr(struct trace_array *tr,
    tracing_record_cmdline(current);
    }

    +/**
    + * check_pages - integrity check of trace buffers
    + *
    + * As a safty measure we check to make sure the data pages have not
    + * been corrupted. TODO: configure to disable this because it adds
    + * a bit of overhead.
    + */
    void check_pages(struct trace_array_cpu *data)
    {
    struct page *page, *tmp;
    @@ -188,6 +269,13 @@ void check_pages(struct trace_array_cpu
    }
    }

    +/**
    + * head_page - page address of the first page in per_cpu buffer.
    + *
    + * head_page returns the page address of the first page in
    + * a per_cpu buffer. This also preforms various consistency
    + * checks to make sure the buffer has not been corrupted.
    + */
    void *head_page(struct trace_array_cpu *data)
    {
    struct page *page;
    @@ -202,6 +290,17 @@ void *head_page(struct trace_array_cpu *
    return page_address(page);
    }

    +/**
    + * trace_seq_printf - sequence printing of trace information
    + * @s: trace sequence descriptor
    + * @fmt: printf format string
    + *
    + * The tracer may use either sequence operations or its own
    + * copy to user routines. To simplify formating of a trace
    + * trace_seq_printf is used to store strings into a special
    + * buffer (@s). Then the output may be either used by
    + * the sequencer or pulled into another buffer.
    + */
    int
    trace_seq_printf(struct trace_seq *s, const char *fmt, ...)
    {
    @@ -225,6 +324,16 @@ trace_seq_printf(struct trace_seq *s, co
    return len;
    }

    +/**
    + * trace_seq_puts - trace sequence printing of simple string
    + * @s: trace sequence descriptor
    + * @str: simple string to record
    + *
    + * The tracer may use either the sequence operations or its own
    + * copy to user routines. This function records a simple string
    + * into a special buffer (@s) for later retrieval by a sequencer
    + * or other mechanism.
    + */
    static int
    trace_seq_puts(struct trace_seq *s, const char *str)
    {
    @@ -320,6 +429,13 @@ trace_print_seq(struct seq_file *m, stru
    trace_seq_reset(s);
    }

    +/*
    + * flip the trace buffers between two trace descriptors.
    + * This usually is the buffers between the global_trace and
    + * the max_tr to record a snapshot of a current trace.
    + *
    + * The ftrace_max_lock must be held.
    + */
    static void
    flip_trace(struct trace_array_cpu *tr1, struct trace_array_cpu *tr2)
    {
    @@ -341,6 +457,15 @@ flip_trace(struct trace_array_cpu *tr1,
    check_pages(tr2);
    }

    +/**
    + * update_max_tr - snapshot all trace buffers from global_trace to max_tr
    + * @tr: tracer
    + * @tsk: the task with the latency
    + * @cpu: The cpu that initiated the trace.
    + *
    + * Flip the buffers between the @tr and the max_tr and record information
    + * about which task was the cause of this latency.
    + */
    void
    update_max_tr(struct trace_array *tr, struct task_struct *tsk, int cpu)
    {
    @@ -365,6 +490,8 @@ update_max_tr(struct trace_array *tr, st
    * @tr - tracer
    * @tsk - task with the latency
    * @cpu - the cpu of the buffer to copy.
    + *
    + * Flip the trace of a single CPU buffer between the @tr and the max_tr.
    */
    void
    update_max_tr_single(struct trace_array *tr, struct task_struct *tsk, int cpu)
    @@ -384,6 +511,12 @@ update_max_tr_single(struct trace_array
    __raw_spin_unlock(&ftrace_max_lock);
    }

    +/**
    + * register_tracer - register a tracer with the ftrace system.
    + * @type - the plugin for the tracer
    + *
    + * Register a new plugin tracer.
    + */
    int register_tracer(struct tracer *type)
    {
    struct tracer *t;
    Index: linux-sched-devel.git/kernel/trace/trace.h
    ===================================================================
    --- linux-sched-devel.git.orig/kernel/trace/trace.h 2008-04-18 15:47:22.000000000 -0400
    +++ linux-sched-devel.git/kernel/trace/trace.h 2008-04-18 16:01:14.000000000 -0400
    @@ -314,6 +314,13 @@ extern long ns2usecs(cycle_t nsec);

    extern unsigned long trace_flags;

    +/*
    + * trace_iterator_flags is an enumeration that defines bit
    + * positions into trace_flags that controls the output.
    + *
    + * NOTE: These bits must match the trace_options array in
    + * trace.c.
    + */
    enum trace_iterator_flags {
    TRACE_ITER_PRINT_PARENT = 0x01,
    TRACE_ITER_SYM_OFFSET = 0x02,


  8. Re: [PATCH 4/5] ftrace: limit trace entries

    On Fri, 18 Apr 2008 16:05:42 -0400
    Steven Rostedt wrote:

    > +/**
    > + * detremine_dirtyable_memory - amount of memory that may be used


    tpyo

  9. Re: [PATCH 1/5] ftrace: simple clean ups


    On Fri, 18 Apr 2008, Andrew Morton wrote:

    > On Fri, 18 Apr 2008 16:05:39 -0400
    > Steven Rostedt wrote:
    >
    > > - r = snprintf(buf, 64, "%ld\n",
    > > + r = snprintf(buf, sizeof(buf), "%ld\n",

    >
    > If you use scnprintf here


    Cool, I didn't know of that function.

    >
    > > *ptr == (unsigned long)-1 ? -1 : nsecs_to_usecs(*ptr));
    > > - if (r > 64)
    > > - r = 64;
    > > + if (r > sizeof(buf))
    > > + r = sizeof(buf);

    >
    > This becomes a cant-happen (I think).


    Yep it does. New patch on the way.

    Thanks,

    -- Steve


  10. Re: [PATCH 4/5] ftrace: limit trace entries


    On Fri, 18 Apr 2008, Andrew Morton wrote:

    > On Fri, 18 Apr 2008 16:05:42 -0400
    > Steven Rostedt wrote:
    >
    > > +/**
    > > + * detremine_dirtyable_memory - amount of memory that may be used

    >
    > tpyo
    >


    Damn! You don't miss a thing. Are you sure your name isn't Monk?

    New patch coming.

    Thanks,

    -- Steve


  11. [PATCH 1/5 -v2] ftrace: simple clean ups


    Andrew Morton mentioned some clean ups that should be done to ftrace.
    This patch does some of the simple clean ups.

    Signed-off-by: Steven Rostedt
    ---
    kernel/trace/trace.c | 23 ++++++++++-------------
    1 file changed, 10 insertions(+), 13 deletions(-)

    Index: linux-sched-devel.git/kernel/trace/trace.c
    ===================================================================
    --- linux-sched-devel.git.orig/kernel/trace/trace.c 2008-04-18 23:01:57.000000000 -0400
    +++ linux-sched-devel.git/kernel/trace/trace.c 2008-04-18 23:03:52.000000000 -0400
    @@ -36,8 +36,7 @@ unsigned long __read_mostly tracing_max_
    unsigned long __read_mostly tracing_thresh;

    /* dummy trace to disable tracing */
    -static struct tracer no_tracer __read_mostly =
    -{
    +static struct tracer no_tracer __read_mostly = {
    .name = "none",
    };

    @@ -1961,8 +1960,8 @@ tracing_iter_ctrl_write(struct file *fil
    int neg = 0;
    int i;

    - if (cnt > 63)
    - cnt = 63;
    + if (cnt >= sizeof(buf))
    + return -EINVAL;

    if (copy_from_user(&buf, ubuf, cnt))
    return -EFAULT;
    @@ -2054,8 +2053,8 @@ tracing_ctrl_write(struct file *filp, co
    long val;
    char buf[64];

    - if (cnt > 63)
    - cnt = 63;
    + if (cnt >= sizeof(buf))
    + return -EINVAL;

    if (copy_from_user(&buf, ubuf, cnt))
    return -EFAULT;
    @@ -2154,10 +2153,8 @@ tracing_max_lat_read(struct file *filp,
    char buf[64];
    int r;

    - r = snprintf(buf, 64, "%ld\n",
    + r = scnprintf(buf, sizeof(buf), "%ld\n",
    *ptr == (unsigned long)-1 ? -1 : nsecs_to_usecs(*ptr));
    - if (r > 64)
    - r = 64;
    return simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
    }

    @@ -2169,8 +2166,8 @@ tracing_max_lat_write(struct file *filp,
    long val;
    char buf[64];

    - if (cnt > 63)
    - cnt = 63;
    + if (cnt >= sizeof(buf))
    + return -EINVAL;

    if (copy_from_user(&buf, ubuf, cnt))
    return -EFAULT;
    @@ -2434,8 +2431,8 @@ tracing_entries_write(struct file *filp,
    unsigned long val;
    char buf[64];

    - if (cnt > 63)
    - cnt = 63;
    + if (cnt >= sizeof(buf))
    + return -EINVAL;

    if (copy_from_user(&buf, ubuf, cnt))
    return -EFAULT;
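
    The two patterns in the patch above — rejecting an oversized write with
    -EINVAL instead of silently clamping it to a magic 63, and relying on
    scnprintf to clamp its own return value — can be sketched in plain
    userspace C. The helper names below are invented for illustration; this
    is not the kernel code itself:

    ```c
    #include <errno.h>
    #include <stdarg.h>
    #include <stddef.h>
    #include <stdio.h>
    #include <string.h>

    /*
     * Userspace analogue of the write-handler fix: a request larger than
     * the buffer is refused up front rather than silently truncated.
     */
    static int copy_bounded(char *dst, size_t dstsz, const char *src, size_t cnt)
    {
    	if (cnt >= dstsz)	/* must leave room for the '\0' */
    		return -EINVAL;
    	memcpy(dst, src, cnt);
    	dst[cnt] = '\0';
    	return 0;
    }

    /*
     * scnprintf-style wrapper: unlike snprintf, whose return value is the
     * length the output *would* have had, this returns the number of bytes
     * actually stored (excluding the '\0'), so the result can be handed
     * straight to something like simple_read_from_buffer() with no manual
     * clamp afterwards.
     */
    static int my_scnprintf(char *buf, size_t size, const char *fmt, ...)
    {
    	va_list args;
    	int i;

    	if (size == 0)
    		return 0;
    	va_start(args, fmt);
    	i = vsnprintf(buf, size, fmt, args);
    	va_end(args);
    	if (i < 0)
    		return 0;
    	return (size_t)i >= size ? (int)(size - 1) : i;
    }
    ```

    The point of the -EINVAL change is that truncating user input silently
    can make a write appear to succeed while acting on mangled data;
    failing loudly is the safer contract for a control file.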



  12. [PATCH 4/5 -v2] ftrace: limit trace entries


    Currently nothing prevents the root user from using up all of memory
    for trace buffers. If the root user allocates too many entries, the OOM
    killer may start killing off tasks.

    This patch adds an algorithm to check the following condition:

    pages_requested > (freeable_memory + current_trace_buffer_pages) / 4

    If the condition is met, the allocation fails. This prevents trace
    buffers from consuming more than a quarter of freeable memory.

    To determine the freeable_memory, I made determine_dirtyable_memory in
    mm/page-writeback.c global.

    Special thanks goes to Peter Zijlstra for suggesting the above calculation.

    Signed-off-by: Steven Rostedt
    ---
    include/linux/writeback.h | 2 ++
    kernel/trace/trace.c | 38 ++++++++++++++++++++++++++++++++++++++
    mm/page-writeback.c | 10 +++++++---
    3 files changed, 47 insertions(+), 3 deletions(-)

    Index: linux-sched-devel.git/include/linux/writeback.h
    ===================================================================
    --- linux-sched-devel.git.orig/include/linux/writeback.h 2008-04-18 23:01:57.000000000 -0400
    +++ linux-sched-devel.git/include/linux/writeback.h 2008-04-18 23:06:43.000000000 -0400
    @@ -105,6 +105,8 @@ extern int vm_highmem_is_dirtyable;
    extern int block_dump;
    extern int laptop_mode;

    +extern unsigned long determine_dirtyable_memory(void);
    +
    extern int dirty_ratio_handler(struct ctl_table *table, int write,
    struct file *filp, void __user *buffer, size_t *lenp,
    loff_t *ppos);
    Index: linux-sched-devel.git/kernel/trace/trace.c
    ===================================================================
    --- linux-sched-devel.git.orig/kernel/trace/trace.c 2008-04-18 23:06:40.000000000 -0400
    +++ linux-sched-devel.git/kernel/trace/trace.c 2008-04-18 23:06:43.000000000 -0400
    @@ -27,6 +27,7 @@
    #include
    #include
    #include
    +#include <linux/writeback.h>

    #include

    @@ -51,6 +52,8 @@ static int trace_free_page(void);

    static int tracing_disabled = 1;

    +static unsigned long tracing_pages_allocated;
    +
    long
    ns2usecs(cycle_t nsec)
    {
    @@ -2477,12 +2480,41 @@ tracing_entries_write(struct file *filp,
    }

    if (val > global_trace.entries) {
    + long pages_requested;
    + unsigned long freeable_pages;
    +
    + /* make sure we have enough memory before mapping */
    + pages_requested =
    + (val + (ENTRIES_PER_PAGE-1)) / ENTRIES_PER_PAGE;
    +
    + /* account for each buffer (and max_tr) */
    + pages_requested *= tracing_nr_buffers * 2;
    +
    + /* Check for overflow */
    + if (pages_requested < 0) {
    + cnt = -ENOMEM;
    + goto out;
    + }
    +
    + freeable_pages = determine_dirtyable_memory();
    +
    + /* we only allow to request 1/4 of useable memory */
    + if (pages_requested >
    + ((freeable_pages + tracing_pages_allocated) / 4)) {
    + cnt = -ENOMEM;
    + goto out;
    + }
    +
    while (global_trace.entries < val) {
    if (trace_alloc_page()) {
    cnt = -ENOMEM;
    goto out;
    }
    + /* double check that we don't go over the known pages */
    + if (tracing_pages_allocated > pages_requested)
    + break;
    }
    +
    } else {
    /* include the number of entries in val (inc of page entries) */
    while (global_trace.entries > val + (ENTRIES_PER_PAGE - 1))
    @@ -2665,6 +2697,7 @@ static int trace_alloc_page(void)
    struct page *page, *tmp;
    LIST_HEAD(pages);
    void *array;
    + unsigned pages_allocated = 0;
    int i;

    /* first allocate a page for each CPU */
    @@ -2676,6 +2709,7 @@ static int trace_alloc_page(void)
    goto free_pages;
    }

    + pages_allocated++;
    page = virt_to_page(array);
    list_add(&page->lru, &pages);

    @@ -2687,6 +2721,7 @@ static int trace_alloc_page(void)
    "for trace buffer!\n");
    goto free_pages;
    }
    + pages_allocated++;
    page = virt_to_page(array);
    list_add(&page->lru, &pages);
    #endif
    @@ -2708,6 +2743,7 @@ static int trace_alloc_page(void)
    SetPageLRU(page);
    #endif
    }
    + tracing_pages_allocated += pages_allocated;
    global_trace.entries += ENTRIES_PER_PAGE;

    return 0;
    @@ -2742,6 +2778,7 @@ static int trace_free_page(void)
    page = list_entry(p, struct page, lru);
    ClearPageLRU(page);
    list_del(&page->lru);
    + tracing_pages_allocated--;
    __free_page(page);

    tracing_reset(data);
    @@ -2759,6 +2796,7 @@ static int trace_free_page(void)
    page = list_entry(p, struct page, lru);
    ClearPageLRU(page);
    list_del(&page->lru);
    + tracing_pages_allocated--;
    __free_page(page);

    tracing_reset(data);
    Index: linux-sched-devel.git/mm/page-writeback.c
    ===================================================================
    --- linux-sched-devel.git.orig/mm/page-writeback.c 2008-04-18 23:01:57.000000000 -0400
    +++ linux-sched-devel.git/mm/page-writeback.c 2008-04-18 23:07:52.000000000 -0400
    @@ -126,8 +126,6 @@ static void background_writeout(unsigned
    static struct prop_descriptor vm_completions;
    static struct prop_descriptor vm_dirties;

    -static unsigned long determine_dirtyable_memory(void);
    -
    /*
    * couple the period to the dirty_ratio:
    *
    @@ -286,7 +284,13 @@ static unsigned long highmem_dirtyable_m
    #endif
    }

    -static unsigned long determine_dirtyable_memory(void)
    +/**
    + * determine_dirtyable_memory - amount of memory that may be used
    + *
    + * Returns the number of pages that can currently be freed and used
    + * by the kernel for direct mappings.
    + */
    +unsigned long determine_dirtyable_memory(void)
    {
    unsigned long x;
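
    The admission check in the patch above can be sketched as two small
    helpers (the names, and the ENTRIES_PER_PAGE placeholder value, are
    invented for this illustration): first work out how many pages a
    request costs, then allow it only while it stays within a quarter of
    freeable-plus-already-allocated memory:

    ```c
    #include <stdbool.h>

    #define ENTRIES_PER_PAGE 64	/* placeholder; the real value depends on entry size */

    /* Pages needed for 'val' entries, rounded up, across 'nr_buffers'
     * per-CPU buffers plus their max_tr shadow copies (hence the "* 2"). */
    static long pages_for_entries(long val, long nr_buffers)
    {
    	long pages = (val + ENTRIES_PER_PAGE - 1) / ENTRIES_PER_PAGE;

    	return pages * nr_buffers * 2;
    }

    /* The 1/4 rule from the changelog: a grow request is refused once it
     * would exceed (freeable_memory + current_trace_buffer_pages) / 4.
     * A negative request signals arithmetic overflow and is refused too. */
    static bool trace_grow_allowed(long pages_requested,
    			       unsigned long freeable_pages,
    			       unsigned long pages_allocated)
    {
    	if (pages_requested < 0)
    		return false;
    	return (unsigned long)pages_requested <=
    	       (freeable_pages + pages_allocated) / 4;
    }
    ```

    Counting already-allocated trace pages on the permitted side of the
    ratio means shrinking and regrowing a buffer is never refused for a
    size that was previously granted.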




  13. Re: [PATCH 4/5] ftrace: limit trace entries

    On Fri, 18 Apr 2008 23:12:43 -0400 (EDT) Steven Rostedt wrote:

    > Are you sure your name isn't Monk?


    Nah, he'd have pointed out that it's "linked list", not "link list".

    > New patch coming.


    We could do this all day at this level of detail.

  14. Re: [PATCH 0/5] ftrace update patches


    * Steven Rostedt wrote:

    > This patch series contains various fixes, clean ups and commenting
    > that I've been doing today on ftrace. I need to go off and do other
    > things right now, but I wanted these to get out before the weekend.


    thanks Steve, applied.

    Ingo

  15. Re: [PATCH 4/5] ftrace: limit trace entries


    * Andrew Morton wrote:

    > On Fri, 18 Apr 2008 16:05:42 -0400
    > Steven Rostedt wrote:
    >
    > > +/**
    > > + * detremine_dirtyable_memory - amount of memory that may be used

    >
    > tpyo


    thanks, fixed.

    Ingo
