Thread: [PATCH] fix up perfmon to build on -mm

  1. Re: [perfmon] Re: [perfmon2] perfmon2 merge news

    Andrew Morton writes:

    > I was hoping that after the round of release-and-review which Stephane,
    > Andi and I did about twelve months ago that we were on track to merge the
    > perfmon codebase as-offered. But now it turns out that the sentiment is
    > that the code simply has too many bells-and-whistles to be acceptable.


    Whose sentiment?

    I've had a bit of a look at it today together with David Gibson. Our
    impression is that the latest version is a lot cleaner and simpler
    than it used to be. I'm also reading Stephane's technical report
    which describes the interface, and whilst I'm only part-way through
    it, I haven't seen anything yet which strikes me as unnecessary or
    overly complicated.

    Paul.
    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: [perfmon] Re: [perfmon2] perfmon2 merge news

    On Wed, 14 Nov 2007 18:24:36 +1100 Paul Mackerras wrote:

    > Andrew Morton writes:
    >
    > > I was hoping that after the round of release-and-review which Stephane,
    > > Andi and I did about twelve months ago that we were on track to merge the
    > > perfmon codebase as-offered. But now it turns out that the sentiment is
    > > that the code simply has too many bells-and-whistles to be acceptable.

    >
    > Whose sentiment?


    Andi and hch, maybe others I've forgotten about.

    > I've had a bit of a look at it today together with David Gibson. Our
    > impression is that the latest version is a lot cleaner and simpler
    > than it used to be. I'm also reading Stephane's technical report
    > which describes the interface, and whilst I'm only part-way through
    > it, I haven't seen anything yet which strikes me as unnecessary or
    > overly complicated.


    Yes, that's quite possible. I don't know how up-to-date people's
    knowledge is. I know I haven't looked seriously at the code in around
    twelve months.

    Let's get it on the wires as outlined and take a look at it all.

  3. Re: [perfmon] Re: [perfmon2] perfmon2 merge news

    On Wed, Nov 14, 2007 at 06:24:36PM +1100, Paul Mackerras wrote:
    > Whose sentiment?


    Mine, for example. The whole userspace interface is just on crack,
    and the code is full of complexities as well.


  4. Re: [perfmon] Re: [perfmon2] perfmon2 merge news

    Christoph Hellwig writes:

    > Mine, for example. The whole userspace interface is just on crack,
    > and the code is full of complexities as well.


    Could you give some _technical_ details of what you don't like?

    Paul.

  5. Re: [perfmon] Re: [perfmon2] perfmon2 merge news

    On Wed, Nov 14, 2007 at 09:43:02PM +1100, Paul Mackerras wrote:
    > Christoph Hellwig writes:
    >
    > > Mine, for example. The whole userspace interface is just on crack,
    > > and the code is full of complexities as well.

    >
    > Could you give some _technical_ details of what you don't like?


    I've done this a gazillion times before, so maybe instead of being a lazy
    bastard you could look up the mailing list archive. It's not like this is the
    first discussion of perfmon. But to get started, look at the system calls;
    many of them are beasts like:

    int pfm_read_pmds(int fd, pfarg_pmd_t *pmds, int n)

    This is basically a read(2) (or for other syscalls a write) on something
    other than the file descriptor provided to the system call. The right thing
    to do is obviously to have pmds and pmcs files in procfs for the thread being
    monitored instead of these special-case files, with another set for global
    tracing. Similarly, I'm pretty sure we can get a much better interface
    if we introduce matching files in procfs for the other calls.


  6. Re: [perfmon] Re: [perfmon2] perfmon2 merge news


    Ok, I just got 4 freakin' bounces from all of these subscriber only
    perfmon etc. mailing lists.

    Please remove those lists from the CC: as it's pointless for those of
    us not on the lists to participate if those lists can't even see the
    feedback we are giving.


  7. Re: [perfmon] Re: [perfmon2] perfmon2 merge news

    From: Christoph Hellwig
    Date: Wed, 14 Nov 2007 11:00:09 +0000

    > I've done this a gazillion times before, so maybe instead of being a lazy
    > bastard you could look up the mailing list archive. It's not like this is the
    > first discussion of perfmon. But to get started, look at the system calls;
    > many of them are beasts like:
    >
    > int pfm_read_pmds(int fd, pfarg_pmd_t *pmds, int n)
    >
    > This is basically a read(2) (or for other syscalls a write) on something
    > other than the file descriptor provided to the system call. The right thing
    > to do is obviously to have pmds and pmcs files in procfs for the thread being
    > monitored instead of these special-case files, with another set for global
    > tracing. Similarly, I'm pretty sure we can get a much better interface
    > if we introduce matching files in procfs for the other calls.


    This is my impression too, all of the things being done with
    a slew of system calls would be better served by real special
    files and appropriate fops. Whether the thing is some kind
    of misc device or procfs is less important than simply getting
    away from these system calls.

  8. Re: [perfmon] Re: [perfmon2] perfmon2 merge news

    Christoph Hellwig writes:

    > int pfm_read_pmds(int fd, pfarg_pmd_t *pmds, int n)
    >
    > This is basically a read(2) (or for other syscalls a write) on something
    > else than the file descriptor provided to the system call.


    No it's not basically a read(). It's more like a request/reply
    interface, which a read()/write() interface doesn't handle very well.
    The request in this case is "tell me about this particular collection
    of PMDs" and the reply is the values.

    It seems to me that an important part of this is to be able to collect
    values from several PMDs at a single point in time, or at least an
    approximation to a single point in time. So that means that you don't
    want a file per PMD either.

    Basically we don't have a good abstraction for a request/reply (or
    command/response) type of interface, and this is a case where we need
    one. Having a syscall that takes a struct containing the request and
    reply is as good a way as any, particularly for something that needs
    to be quick.

    Paul.

  9. Re: [perfmon] Re: [perfmon2] perfmon2 merge news

    David Miller writes:

    > This is my impression too, all of the things being done with
    > a slew of system calls would be better served by real special
    > files and appropriate fops.


    Special files and fops really only work well if you can coerce the
    interface into one where data flows predominantly one way. I don't
    think they work so well for something that is more like an RPC across
    the user/kernel barrier. For that a system call is better.

    For instance, if you have something that kind-of looks like

    read_pmds(int n, int *pmd_numbers, u64 *pmd_values);

    where the caller supplies an array of PMD numbers and the function
    returns their values (and you want that reading to be done atomically
    in some sense), how would you do that using special files and fops?

    > Whether the thing is some kind
    > of misc device or procfs is less important than simply getting
    > away from these system calls.


    Why? What's inherently offensive about system calls?

    Paul.

  10. Re: [perfmon] Re: [perfmon2] perfmon2 merge news

    From: Paul Mackerras
    Date: Wed, 14 Nov 2007 22:44:56 +1100

    > For instance, if you have something that kind-of looks like
    >
    > read_pmds(int n, int *pmd_numbers, u64 *pmd_values);
    >
    > where the caller supplies an array of PMD numbers and the function
    > returns their values (and you want that reading to be done atomically
    > in some sense), how would you do that using special files and fops?


    The same way we handle some of the multicast "getsockopt()"
    calls. The parameters passed in are both inputs and outputs.

    For the above example:

    struct pmd_info {
            int *pmd_numbers;
            u64 *pmd_values;
            int n;
    } *p;

    buffer_size = N;
    p = malloc(buffer_size);
    p->pmd_numbers = p + foo;
    p->pmd_values = p + bar;
    p->n = whatever(N);
    err = read(fd, p, N);

    It's definitely doable, use your imagination.

    You can encode all kinds of operation types into the
    header as well.

    Another alternative is to use generic netlink.

  11. Re: [perfmon] Re: [perfmon2] perfmon2 merge news

    On Wednesday 14 November 2007 22:44, Paul Mackerras wrote:
    > David Miller writes:
    > > This is my impression too, all of the things being done with
    > > a slew of system calls would be better served by real special
    > > files and appropriate fops.

    >
    > Special files and fops really only work well if you can coerce the
    > interface into one where data flows predominantly one way. I don't
    > think they work so well for something that is more like an RPC across
    > the user/kernel barrier. For that a system call is better.
    >
    > For instance, if you have something that kind-of looks like
    >
    > read_pmds(int n, int *pmd_numbers, u64 *pmd_values);
    >
    > where the caller supplies an array of PMD numbers and the function
    > returns their values (and you want that reading to be done atomically
    > in some sense), how would you do that using special files and fops?


    Could you implement it with readv()?

  12. Re: [perfmon] Re: [perfmon2] perfmon2 merge news

    From: Paul Mackerras
    Date: Wed, 14 Nov 2007 22:39:24 +1100

    > No it's not basically a read(). It's more like a request/reply
    > interface, which a read()/write() interface doesn't handle very well.


    Yes it can, see my other reply.

  13. Re: [perfmon] Re: [perfmon2] perfmon2 merge news

    From: Nick Piggin
    Date: Wed, 14 Nov 2007 10:49:48 +1100

    > On Wednesday 14 November 2007 22:44, Paul Mackerras wrote:
    > > David Miller writes:
    > > > This is my impression too, all of the things being done with
    > > > a slew of system calls would be better served by real special
    > > > files and appropriate fops.

    > >
    > > Special files and fops really only work well if you can coerce the
    > > interface into one where data flows predominantly one way. I don't
    > > think they work so well for something that is more like an RPC across
    > > the user/kernel barrier. For that a system call is better.
    > >
    > > For instance, if you have something that kind-of looks like
    > >
    > > read_pmds(int n, int *pmd_numbers, u64 *pmd_values);
    > >
    > > where the caller supplies an array of PMD numbers and the function
    > > returns their values (and you want that reading to be done atomically
    > > in some sense), how would you do that using special files and fops?

    >
    > Could you implement it with readv()?


    Sure, why not? Just cook up an iovec. pmd_numbers goes to offset
    X and pmd_values goes to offset Y, with some helpers like what
    we have in the networking already for recvmsg.

    But why would you want readv() for this? The syscall thing
    Paul asked me to translate into a read() doesn't provide
    iovec-like behavior so I don't see why readv() is necessary
    at all.

  14. Re: [perfmon] Re: [perfmon2] perfmon2 merge news

    From: Paul Mackerras
    Date: Wed, 14 Nov 2007 23:03:24 +1100

    > You're suggesting that the behaviour of a read() should depend on what
    > was in the buffer before the read? Gack! Surely you have better
    > taste than that?


    Absolutely that's what I mean, it's atomic and gives you exactly what
    you need.

    I see nothing wrong or gross with these semantics. Nothing in the
    "book of UNIX" specifies that for a device or special file the passed
    in buffer cannot contain input control data.

    > > Another alternative is to use generic netlink.

    >
    > Then you end up with two system calls to get the data rather than one
    > (one to send the request and another to read the reply). For
    > something that needs to be quick that is a suboptimal interface.


    Not necessarily, consider the possibility of using recvmsg() control
    message data. With that it could be done in one go.

    This also suggests that it could be implemented as its own protocol
    family.

  15. Re: [perfmon] Re: [perfmon2] perfmon2 merge news

    David Miller writes:

    > The same way we handle some of the multicast "getsockopt()"
    > calls. The parameters passed in are both inputs and outputs.


    For a read??!!!

    > For the above example:
    >
    > struct pmd_info {
    > int *pmd_numbers;
    > u64 *pmd_values;
    > int n;
    > } *p;
    >
    > buffer_size = N;
    > p = malloc(buffer_size);
    > p->pmd_numbers = p + foo;
    > p->pmd_values = p + bar;
    > p->n = whatever(N);
    > err = read(fd, p, N);


    You're suggesting that the behaviour of a read() should depend on what
    was in the buffer before the read? Gack! Surely you have better
    taste than that?

    Or are you saying that a read (or write) has a side-effect of altering
    some other area of memory besides the buffer you give to read()? That
    seems even worse to me.

    > Another alternative is to use generic netlink.


    Then you end up with two system calls to get the data rather than one
    (one to send the request and another to read the reply). For
    something that needs to be quick that is a suboptimal interface.

    Paul.

  16. Re: [perfmon] Re: [perfmon2] perfmon2 merge news

    On Wednesday 14 November 2007 23:07, David Miller wrote:
    > From: Paul Mackerras
    > Date: Wed, 14 Nov 2007 23:03:24 +1100
    >
    > > You're suggesting that the behaviour of a read() should depend on what
    > > was in the buffer before the read? Gack! Surely you have better
    > > taste than that?

    >
    > Absolutely that's what I mean, it's atomic and gives you exactly what
    > you need.
    >
    > I see nothing wrong or gross with these semantics. Nothing in the
    > "book of UNIX" specifies that for a device or special file the passed
    > in buffer cannot contain input control data.


    True, but is it now any different to an ioctl?

  17. Re: [perfmon] Re: [perfmon2] perfmon2 merge news

    On Wednesday 14 November 2007 22:58, David Miller wrote:
    > From: Nick Piggin
    > Date: Wed, 14 Nov 2007 10:49:48 +1100
    >
    > > On Wednesday 14 November 2007 22:44, Paul Mackerras wrote:
    > > > David Miller writes:
    > > > > This is my impression too, all of the things being done with
    > > > > a slew of system calls would be better served by real special
    > > > > files and appropriate fops.
    > > >
    > > > Special files and fops really only work well if you can coerce the
    > > > interface into one where data flows predominantly one way. I don't
    > > > think they work so well for something that is more like an RPC across
    > > > the user/kernel barrier. For that a system call is better.
    > > >
    > > > For instance, if you have something that kind-of looks like
    > > >
    > > > read_pmds(int n, int *pmd_numbers, u64 *pmd_values);
    > > >
    > > > where the caller supplies an array of PMD numbers and the function
    > > > returns their values (and you want that reading to be done atomically
    > > > in some sense), how would you do that using special files and fops?

    > >
    > > Could you implement it with readv()?

    >
    > Sure, why not? Just cook up an iovec. pmd_numbers goes to offset
    > X and pmd_values goes to offset Y, with some helpers like what
    > we have in the networking already for recvmsg.
    >
    > But why would you want readv() for this? The syscall thing
    > Paul asked me to translate into a read() doesn't provide
    > iovec-like behavior so I don't see why readv() is necessary
    > at all.


    Ah sorry, that's what I get for typing before I think: of course
    readv doesn't vectorise the right part of the equation.

    What I really mean is a readv-like syscall, but one that also
    vectorises the file offset. Maybe this is useful enough as a generic
    syscall that also helps Paul's example...

    Of course, I guess this all depends on whether the atomicity is an
    important requirement. If not, you can obviously just do it with
    multiple read syscalls...

  18. Re: [perfmon] Re: [perfmon2] perfmon2 merge news

    Christoph Hellwig writes:
    >
    > I've done this a gazillion times before, so maybe instead of being a lazy
    > bastard you could look up the mailing list archive. It's not like this is the
    > first discussion of perfmon. But to get started, look at the system calls;
    > many of them are beasts like:
    >
    > int pfm_read_pmds(int fd, pfarg_pmd_t *pmds, int n)
    >
    > This is basically a read(2) (or for other syscalls a write) on something


    At least for x86, and I suspect some other architectures, we don't
    initially need a syscall at all for this. There is an instruction,
    RDPMC, which can read a performance counter just fine. It is also much
    faster and generally preferable for the case where a process measures
    events about itself. In fact it is essential for one of the use cases
    I would like to see perfmon used for (replacement of RDTSC for cycle
    counting).

    Later a syscall might be needed with event multiplexing, but that seems
    more like a far-away, non-essential feature.

    > else than the file descriptor provided to the system call. The right thing


    I don't like read/write for this too much. I think it's better to
    have individual syscalls. After all, that is CPU state, and having
    syscalls for that does seem reasonable.

    -Andi

  19. Re: [perfmon] Re: [perfmon2] perfmon2 merge news

    Andi,

    On Wed, Nov 14, 2007 at 03:07:02AM +0100, Andi Kleen wrote:
    >
    > [dropped all these bouncing email lists. Adding closed lists to public
    > cc lists is just a bad idea]
    >


    Just want to make sure perfmon2 users participate in this discussion.

    > > int
    > > main(int argc, char **argv)
    > > {
    > > int ctx_fd;
    > > pfarg_pmd_t pd[1];
    > > pfarg_pmc_t pc[1];
    > > pfarg_ctx_t ctx;
    > > pfarg_load_t load_args;
    > >
    > > memset(&ctx, 0, sizeof(ctx));
    > > memset(pc, 0, sizeof(pc));
    > > memset(pd, 0, sizeof(pd));
    > >
    > > /* create session (context) and get file descriptor back (identifier) */
    > > ctx_fd = pfm_create_context(&ctx, NULL, NULL, 0);

    >
    > There's nothing in your example that makes the file descriptor needed.
    >


    Partially true. The file descriptor becomes really useful when you sample.
    You leverage the file descriptor to receive notifications of counter overflows
    and of a full sampling buffer. You extract notification messages via read(), and you can
    use SIGIO or select/poll.

    The example shows how you can leverage existing mechanisms to destroy the session, i.e.,
    free the associated kernel resources. For that, you use close() instead of adding yet
    another syscall. It also provides a resource limitation mechanism to control consumption
    of kernel memory, i.e., you can only create as many sessions as you can have open files.

    > >
    > > /* setup one config register (PMC0) */
    > > pc[0].reg_num = 0;
    > > pc[0].reg_value = 0x1234;

    >
    > That would be nicer if it was just two arguments.
    >

    Are you suggesting something like: pfm_write_pmcs(fd, 0, 0x1234)?

    That would be quite expensive when you have lots of registers to set up: one
    syscall per register. The perfmon syscalls to read/write registers accept vectors
    of arguments to amortize the cost of the syscall over multiple registers
    (similar to poll(2)).

    With many tools, registers are not just set up once. During certain measurements,
    data registers may be read multiple times. When you sample or multiplex at
    the user level, you do need to reprogram the PMU state, and that is on the critical
    path.

    You do not want a call that programs the entire PMU state all at once either. Many times
    you only want to modify a small subset. Having to pass the full state also causes some
    portability problems.


    > >
    > > /* setup one data register (PMD0) */
    > > pd[0].reg_num = 0;
    > > pd[0].reg_value = 0;

    >
    > Why do you need to set the data register? Wouldn't it make
    > more sense to let the kernel handle that and just return one.
    >

    It depends on what you are doing. Here, this was not really necessary. It was
    meant to show how you can program the data registers as well. Perfmon2 provides
    default values for all data registers. For counters, the value is guaranteed to
    be zero.

    But it is important to note that not all data registers are counters. That is the
    case on Itanium 2, where some are just buffers. With AMD Barcelona IBS, several are buffers as
    well, and some may need to be initialized to a non-zero value, e.g., the IBS sampling
    period.

    With event-based sampling, the period is expressed as the number of occurrences
    of an event. For instance, you can say: "take a sample every 2000 L2 cache misses".
    The way you express this with perfmon2 is that you program a counter to measure
    L2 cache misses, and then you initialize the corresponding data register (counter)
    to overflow after 2000 occurrences. Given that the interface guarantees all counters
    are 64-bit regardless of the hardware, you simply have to program the counter to -2000.
    Thus you see that you need a call to actually program the data registers.

    > >
    > > /* program the registers */
    > > pfm_write_pmcs(ctx_fd, pc, 1);
    > > pfm_write_pmds(ctx_fd, pd, 1);
    > >
    > > /* attach the context to self */
    > > load_args.load_pid = getpid();
    > > pfm_load_context(ctx_fd, &load_args);

    >
    > My replacement would be to just add a flags argument to write_pmcs
    > with one flag bit meaning "GLOBAL CONTEXT" versus "MY CONTEXT"
    > >


    You are mixing PMU programming with the type of measurement you want to do.

    Perfmon2 decouples the two operations. In fact, no PMU hardware is actually touched
    before you attach to either a CPU or a thread. This way, you can prepare your measurement
    and then attach-and-go. Thus it is possible to create batches of ready-to-go sessions.
    That is useful, for instance, when you are trying to measure across fork or pthread_create,
    which you can catch on the fly.

    Take the per-thread example: you can set up your session before you fork/exec the program
    you want to measure.

    Note also that perfmon2 supports attaching to an already running thread. So there is
    more than "GLOBAL CONTEXT" versus "MY CONTEXT".


    > > /* activate monitoring */
    > > pfm_start(ctx_fd, NULL);

    >
    > Why can't that be done by the call setting up the register?
    >


    Good question. If you do what you say, you assume that the start/stop bit lives in the
    config (or data) registers of the PMU. This is not true on all hardware. On Itanium,
    for instance, the start/stop bit is part of the Processor Status Register (psr).
    That is not a PMU register.

    On x86, you set the enable bit in PERFEVTSEL, but nothing really happens until you issue
    pfm_start(), i.e., the PERFEVTSEL registers are not touched until then.

    > Or if someone needs to do it for a specific region they can read
    > the register before and then afterwards.
    >
    > >
    > > /*
    > > * run code to measure
    > > */
    > >
    > > /* stop monitoring */
    > > pfm_stop(ctx_fd);
    > >
    > > /* read data register */
    > > pfm_read_pmds(ctx_fd, pd, 1);

    >
    > On x86 i think it would be much simpler to just let the set/alloc
    > register call return a number and then use RDPMC directly. That would
    > be actually faster and be much simpler too.
    >

    One approach does not prevent the other. Assuming you allow cr4.pce, then nothing prevents
    a self-monitoring thread from reading the counters directly. You'll just get the
    lower 32 bits of it. So if you read frequently enough, you should not have a problem.

    But keep in mind that we do want a uniform interface across all hardware and all types
    of sessions (self-monitoring, CPU-wide, monitoring of another thread). You don't want
    an interface that says on x86 you have to use RDPMC, on Itanium pfm_read_pmds(), and so
    on. You want an interface that guarantees that with pfm_read_pmds() you'll be able to
    read on any hardware platform; then on some you may be able to use a more efficient
    method, e.g., RDPMC on x86.

    Reducing performance monitoring to self-monitoring is not what we want. In fact, there
    are only a few domains where you can actually do this and HPC is one of them. But in
    many other situations, you cannot and don't want to have to instrument applications
    or libraries to collect performance data. It is quite handy to be able to do:
    $ pfmon /bin/ls
    or
    $ pfmon --attach-task=`pidof sshd` -timeout=10s


    Also note that there is no guarantee that RDPMC allows you to access all data registers
    on a PMU. For instance, on AMD Barcelona, it seems you cannot read the IBS register using
    RDPMC.


    > I suppose most architectures have similar facilities, if not a call could be
    > added for them but it's not really essential. The call might be also needed
    > for event multiplexing, but frankly I would just leave that out for now.
    >

    Itanium does allow user-level reads of the data registers. It also allows user-level
    start/stop. Perfmon2 allows this only for self-monitoring per-thread sessions.

    I think restricting per-thread mode to self-monitoring only is just too limiting,
    even for a start.


    > e.g. here is one use case I would personally see as useful. We need
    > a replacement for simple cycle counting since RDTSC doesn't do that anymore
    > on modern x86 CPUs. It could be something like:
    >

    You can do exactly this with the perfmon2 interface as it exists today.
    Your example is perfectly fine; your interface works in your case.

    But you are driving the design of the interface from your very specific need,
    and you are ignoring all the other usage models. This has been a problem with so
    many other interfaces, and that explains the current situation. You have to
    take a broader view, look at what the hardware (across the board) provides, and
    build from there. We do not need yet another interface to support one tool or one
    type of measurement; we need a true programming interface with a uniform set
    of calls. So sure, several calls may look like overkill for basic measurements, but
    they become necessary with others.

    > /* 0 is the initial value */
    >
    > /* could be either library or syscall */
    > event = get_event(COUNTER_CYCLES);
    > if (event < 0)
    > /* CPU has no cycle counter */
    >
    > reg = setup_perfctr(event, 0 /* value */, LOCAL_EVENT); /* syscall */
    >
    > rdpmc(reg, start);
    > .... some code to run ...
    > rdpmc(reg, end);
    >
    > free_perfctr(reg); /* syscall */
    >

    --
    -Stephane

  20. Re: [perfmon] Re: [perfmon2] perfmon2 merge news


    On Wed, Nov 14, 2007 at 10:44:56PM +1100, Paul Mackerras wrote:
    > David Miller writes:
    >
    > > This is my impression too, all of the things being done with
    > > a slew of system calls would be better served by real special
    > > files and appropriate fops.

    >
    > Special files and fops really only work well if you can coerce the
    > interface into one where data flows predominantly one way. I don't
    > think they work so well for something that is more like an RPC across
    > the user/kernel barrier. For that a system call is better.
    >
    > For instance, if you have something that kind-of looks like
    >
    > read_pmds(int n, int *pmd_numbers, u64 *pmd_values);
    >
    > where the caller supplies an array of PMD numbers and the function
    > returns their values (and you want that reading to be done atomically
    > in some sense), how would you do that using special files and fops?
    >

    Yes, the read call could be simplified to the level proposed above by Paul.

    --
    -Stephane
