[PATCH] fix up perfmon to build on -mm - Kernel
-
Re: [perfmon] Re: [perfmon2] perfmon2 merge news
Andrew Morton writes:
> I was hoping that after the round of release-and-review which Stephane,
> Andi and I did about twelve months ago that we were on track to merge the
> perfmon codebase as-offered. But now it turns out that the sentiment is
> that the code simply has too many bells-and-whistles to be acceptable.
Whose sentiment?
I've had a bit of a look at it today together with David Gibson. Our
impression is that the latest version is a lot cleaner and simpler
than it used to be. I'm also reading Stephane's technical report
which describes the interface, and whilst I'm only part-way through
it, I haven't seen anything yet which strikes me as unnecessary or
overly complicated.
Paul.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
-
Re: [perfmon] Re: [perfmon2] perfmon2 merge news
On Wed, 14 Nov 2007 18:24:36 +1100 Paul Mackerras wrote:
> Andrew Morton writes:
>
> > I was hoping that after the round of release-and-review which Stephane,
> > Andi and I did about twelve months ago that we were on track to merge the
> > perfmon codebase as-offered. But now it turns out that the sentiment is
> > that the code simply has too many bells-and-whistles to be acceptable.
>
> Whose sentiment?
Andi and hch, maybe others I've forgotten about.
> I've had a bit of a look at it today together with David Gibson. Our
> impression is that the latest version is a lot cleaner and simpler
> than it used to be. I'm also reading Stephane's technical report
> which describes the interface, and whilst I'm only part-way through
> it, I haven't seen anything yet which strikes me as unnecessary or
> overly complicated.
Yes, that's quite possible. I don't know how up-to-date people's
knowledge is. I know I haven't looked seriously at the code in around
twelve months.
Let's get it on the wires as outlined and take a look at it all.
-
Re: [perfmon] Re: [perfmon2] perfmon2 merge news
On Wed, Nov 14, 2007 at 06:24:36PM +1100, Paul Mackerras wrote:
> Whose sentiment?
Mine for example. The whole userspace interface is just on crack,
and the code is full of complexities as well.
-
Re: [perfmon] Re: [perfmon2] perfmon2 merge news
Christoph Hellwig writes:
> Mine for example. The whole userspace interface is just on crack,
> and the code is full of complexities as well.
Could you give some _technical_ details of what you don't like?
Paul.
-
Re: [perfmon] Re: [perfmon2] perfmon2 merge news
On Wed, Nov 14, 2007 at 09:43:02PM +1100, Paul Mackerras wrote:
> Christoph Hellwig writes:
>
> > Mine for example. The whole userspace interface is just on crack,
> > and the code is full of complexities as well.
>
> Could you give some _technical_ details of what you don't like?
I've done this a gazillion times before, so maybe instead of being a lazy
bastard you could look up the mailing list archive. It's not like this is the
first discussion of perfmon. But to get started, look at the system calls;
many of them are beasts like:
int pfm_read_pmds(int fd, pfarg_pmd_t *pmds, int n)
This is basically a read(2) (or for other syscalls a write) on something
else than the file descriptor provided to the system call. The right thing
to do is obviously to have a pmds and pmcs file in procfs for the thread being
monitored instead of these special-case files, with another set for global
tracing. Similarly I'm pretty sure we can get a much better interface
if we introduce matching files in procfs for the other calls.
-
Re: [perfmon] Re: [perfmon2] perfmon2 merge news
Ok, I just got 4 freakin' bounces from all of these subscriber only
perfmon etc. mailing lists.
Please remove those lists from the CC: as it's pointless for those of
us not on the lists to participate if those lists can't even see the
feedback we are giving.
-
Re: [perfmon] Re: [perfmon2] perfmon2 merge news
From: Christoph Hellwig
Date: Wed, 14 Nov 2007 11:00:09 +0000
> I've done this a gazillion times before, so maybe instead of being a lazy
> bastard you could look up the mailing list archive. It's not like this is the
> first discussion of perfmon. But to get started, look at the system calls;
> many of them are beasts like:
>
> int pfm_read_pmds(int fd, pfarg_pmd_t *pmds, int n)
>
> This is basically a read(2) (or for other syscalls a write) on something
> else than the file descriptor provided to the system call. The right thing
> to do is obviously to have a pmds and pmcs file in procfs for the thread being
> monitored instead of these special-case files, with another set for global
> tracing. Similarly I'm pretty sure we can get a much better interface
> if we introduce matching files in procfs for the other calls.
This is my impression too, all of the things being done with
a slew of system calls would be better served by real special
files and appropriate fops. Whether the thing is some kind
of misc device or procfs is less important than simply getting
away from these system calls.
-
Re: [perfmon] Re: [perfmon2] perfmon2 merge news
Christoph Hellwig writes:
> int pfm_read_pmds(int fd, pfarg_pmd_t *pmds, int n)
>
> This is basically a read(2) (or for other syscalls a write) on something
> else than the file descriptor provided to the system call.
No it's not basically a read(). It's more like a request/reply
interface, which a read()/write() interface doesn't handle very well.
The request in this case is "tell me about this particular collection
of PMDs" and the reply is the values.
It seems to me that an important part of this is to be able to collect
values from several PMDs at a single point in time, or at least an
approximation to a single point in time. So that means that you don't
want a file per PMD either.
Basically we don't have a good abstraction for a request/reply (or
command/response) type of interface, and this is a case where we need
one. Having a syscall that takes a struct containing the request and
reply is as good a way as any, particularly for something that needs
to be quick.
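
To make the request/reply shape described above concrete, here is a minimal user-space sketch. It is only an illustration: the names pmd_req, mock_read_pmds and fake_pmds are hypothetical, the array stands in for kernel-side counter state, and none of this is the actual perfmon2 API. Each slot carries the request (a register number) and receives the reply (its value), so one call can return several PMDs together.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_PMDS 8                 /* hypothetical size of the register file */

/* One request/reply slot: the caller fills reg_num, the callee fills value. */
struct pmd_req {
    uint16_t reg_num;              /* request: which PMD to read */
    uint64_t value;                /* reply: the value of that PMD */
};

/* Stand-in for kernel-side counter state; not real hardware. */
static uint64_t fake_pmds[NUM_PMDS] = { 10, 20, 30, 40, 50, 60, 70, 80 };

/*
 * Mock of a vectored read call. Every request is validated before any
 * reply is written, so the caller gets all values or none.
 * Returns 0 on success, -1 if any register number is out of range.
 */
static int mock_read_pmds(struct pmd_req *reqs, size_t n)
{
    assert(reqs != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
        if (reqs[i].reg_num >= NUM_PMDS)
            return -1;

    for (size_t i = 0; i < n; i++)
        reqs[i].value = fake_pmds[reqs[i].reg_num];

    return 0;
}
```

A caller would fill reg_num in each slot, call mock_read_pmds() once, and then read the value fields; a real implementation would additionally have to decide what "at a single point in time" means for the hardware in question.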
Paul.
-
Re: [perfmon] Re: [perfmon2] perfmon2 merge news
David Miller writes:
> This is my impression too, all of the things being done with
> a slew of system calls would be better served by real special
> files and appropriate fops.
Special files and fops really only work well if you can coerce the
interface into one where data flows predominantly one way. I don't
think they work so well for something that is more like an RPC across
the user/kernel barrier. For that a system call is better.
For instance, if you have something that kind-of looks like
read_pmds(int n, int *pmd_numbers, u64 *pmd_values);
where the caller supplies an array of PMD numbers and the function
returns their values (and you want that reading to be done atomically
in some sense), how would you do that using special files and fops?
> Whether the thing is some kind
> of misc device or procfs is less important than simply getting
> away from these system calls.
Why? What's inherently offensive about system calls?
Paul.
-
Re: [perfmon] Re: [perfmon2] perfmon2 merge news
From: Paul Mackerras
Date: Wed, 14 Nov 2007 22:44:56 +1100
> For instance, if you have something that kind-of looks like
>
> read_pmds(int n, int *pmd_numbers, u64 *pmd_values);
>
> where the caller supplies an array of PMD numbers and the function
> returns their values (and you want that reading to be done atomically
> in some sense), how would you do that using special files and fops?
The same way we handle some of the multicast "getsockopt()"
calls. The parameters passed in are both inputs and outputs.
For the above example:
struct pmd_info {
	int *pmd_numbers;
	u64 *pmd_values;
	int n;
} *p;
/* foo and bar are placeholder byte offsets into the buffer */
buffer_size = N;
p = malloc(buffer_size);
p->pmd_numbers = (int *)((char *)p + foo);
p->pmd_values = (u64 *)((char *)p + bar);
p->n = whatever(N);
err = read(fd, p, N);
It's definitely doable, use your imagination.
You can encode all kinds of operation types into the
header as well.
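
The idea of a read() whose buffer carries input as well as output can be sketched in user space as follows. This is an assumption-laden illustration, not code from any kernel tree: pmd_hdr, mock_pmd_read and fake_pmds are hypothetical names, the function only imitates the shape of a read() handler, and byte offsets are used in place of embedded pointers so the buffer is self-describing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>             /* ssize_t */

#define NUM_PMDS 8                 /* hypothetical size of the register file */

/* Header at the start of the caller's buffer; offsets are bytes from its start. */
struct pmd_hdr {
    uint32_t n;                    /* number of registers requested */
    uint32_t numbers_off;          /* input: n uint32_t register numbers */
    uint32_t values_off;           /* output: n uint64_t register values */
};

/* Stand-in for kernel-side counter state; not real hardware. */
static const uint64_t fake_pmds[NUM_PMDS] = { 10, 20, 30, 40, 50, 60, 70, 80 };

/*
 * Imitates a read() handler that parses control data already present in
 * the buffer and writes the reply into the same buffer.
 * Returns the number of value bytes written, or -1 on a malformed request.
 */
static ssize_t mock_pmd_read(void *buf, size_t len)
{
    unsigned char *base = buf;
    struct pmd_hdr h;

    if (len < sizeof(h))
        return -1;
    memcpy(&h, base, sizeof(h));

    /* Both arrays must lie entirely inside the buffer. */
    if (h.n > NUM_PMDS)
        return -1;
    if (h.numbers_off > len || h.n * sizeof(uint32_t) > len - h.numbers_off)
        return -1;
    if (h.values_off > len || h.n * sizeof(uint64_t) > len - h.values_off)
        return -1;

    for (uint32_t i = 0; i < h.n; i++) {
        uint32_t num;

        memcpy(&num, base + h.numbers_off + i * sizeof(num), sizeof(num));
        if (num >= NUM_PMDS)
            return -1;             /* earlier values may already be written */
        memcpy(base + h.values_off + i * sizeof(uint64_t),
               &fake_pmds[num], sizeof(uint64_t));
    }
    return (ssize_t)(h.n * sizeof(uint64_t));
}
```

The caller lays out the header, the register numbers and room for the values in one allocation and passes the whole thing in a single call, which is what makes the operation a single round trip.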
Another alternative is to use generic netlink.
-
Re: [perfmon] Re: [perfmon2] perfmon2 merge news
On Wednesday 14 November 2007 22:44, Paul Mackerras wrote:
> David Miller writes:
> > This is my impression too, all of the things being done with
> > a slew of system calls would be better served by real special
> > files and appropriate fops.
>
> Special files and fops really only work well if you can coerce the
> interface into one where data flows predominantly one way. I don't
> think they work so well for something that is more like an RPC across
> the user/kernel barrier. For that a system call is better.
>
> For instance, if you have something that kind-of looks like
>
> read_pmds(int n, int *pmd_numbers, u64 *pmd_values);
>
> where the caller supplies an array of PMD numbers and the function
> returns their values (and you want that reading to be done atomically
> in some sense), how would you do that using special files and fops?
Could you implement it with readv()?
-
Re: [perfmon] Re: [perfmon2] perfmon2 merge news
From: Paul Mackerras
Date: Wed, 14 Nov 2007 22:39:24 +1100
> No it's not basically a read(). It's more like a request/reply
> interface, which a read()/write() interface doesn't handle very well.
Yes it can, see my other reply.
-
Re: [perfmon] Re: [perfmon2] perfmon2 merge news
From: Nick Piggin
Date: Wed, 14 Nov 2007 10:49:48 +1100
> On Wednesday 14 November 2007 22:44, Paul Mackerras wrote:
> > David Miller writes:
> > > This is my impression too, all of the things being done with
> > > a slew of system calls would be better served by real special
> > > files and appropriate fops.
> >
> > Special files and fops really only work well if you can coerce the
> > interface into one where data flows predominantly one way. I don't
> > think they work so well for something that is more like an RPC across
> > the user/kernel barrier. For that a system call is better.
> >
> > For instance, if you have something that kind-of looks like
> >
> > read_pmds(int n, int *pmd_numbers, u64 *pmd_values);
> >
> > where the caller supplies an array of PMD numbers and the function
> > returns their values (and you want that reading to be done atomically
> > in some sense), how would you do that using special files and fops?
>
> Could you implement it with readv()?
Sure, why not? Just cook up an iovec. pmd_numbers goes to offset
X and pmd_values goes to offset Y, with some helpers like what
we have in the networking already for recvmsg.
But why would you want readv() for this? The syscall thing
Paul asked me to translate into a read() doesn't provide
iovec-like behavior so I don't see why readv() is necessary
at all.
-
Re: [perfmon] Re: [perfmon2] perfmon2 merge news
From: Paul Mackerras
Date: Wed, 14 Nov 2007 23:03:24 +1100
> You're suggesting that the behaviour of a read() should depend on what
> was in the buffer before the read? Gack! Surely you have better
> taste than that?
Absolutely that's what I mean, it's atomic and gives you exactly what
you need.
I see nothing wrong or gross with these semantics. Nothing in the
"book of UNIX" specifies that for a device or special file the passed
in buffer cannot contain input control data.
> > Another alternative is to use generic netlink.
>
> Then you end up with two system calls to get the data rather than one
> (one to send the request and another to read the reply). For
> something that needs to be quick that is a suboptimal interface.
Not necessarily, consider the possibility of using recvmsg() control
message data. With that it could be done in one go.
This also suggests that it could be implemented as its own protocol
family.
-
Re: [perfmon] Re: [perfmon2] perfmon2 merge news
David Miller writes:
> The same way we handle some of the multicast "getsockopt()"
> calls. The parameters passed in are both inputs and outputs.
For a read??!!!
> For the above example:
>
> struct pmd_info {
> int *pmd_numbers;
> u64 *pmd_values;
> int n;
> } *p;
>
> buffer_size = N;
> p = malloc(buffer_size);
> p->pmd_numbers = p + foo;
> p->pmd_values = p + bar;
> p->n = whatever(N);
> err = read(fd, p, N);
You're suggesting that the behaviour of a read() should depend on what
was in the buffer before the read? Gack! Surely you have better
taste than that?
Or are you saying that a read (or write) has a side-effect of altering
some other area of memory besides the buffer you give to read()? That
seems even worse to me.
> Another alternative is to use generic netlink.
Then you end up with two system calls to get the data rather than one
(one to send the request and another to read the reply). For
something that needs to be quick that is a suboptimal interface.
Paul.
-
Re: [perfmon] Re: [perfmon2] perfmon2 merge news
On Wednesday 14 November 2007 23:07, David Miller wrote:
> From: Paul Mackerras
> Date: Wed, 14 Nov 2007 23:03:24 +1100
>
> > You're suggesting that the behaviour of a read() should depend on what
> > was in the buffer before the read? Gack! Surely you have better
> > taste than that?
>
> Absolutely that's what I mean, it's atomic and gives you exactly what
> you need.
>
> I see nothing wrong or gross with these semantics. Nothing in the
> "book of UNIX" specifies that for a device or special file the passed
> in buffer cannot contain input control data.
True, but is it now any different from an ioctl?
-
Re: [perfmon] Re: [perfmon2] perfmon2 merge news
On Wednesday 14 November 2007 22:58, David Miller wrote:
> From: Nick Piggin
> Date: Wed, 14 Nov 2007 10:49:48 +1100
>
> > On Wednesday 14 November 2007 22:44, Paul Mackerras wrote:
> > > David Miller writes:
> > > > This is my impression too, all of the things being done with
> > > > a slew of system calls would be better served by real special
> > > > files and appropriate fops.
> > >
> > > Special files and fops really only work well if you can coerce the
> > > interface into one where data flows predominantly one way. I don't
> > > think they work so well for something that is more like an RPC across
> > > the user/kernel barrier. For that a system call is better.
> > >
> > > For instance, if you have something that kind-of looks like
> > >
> > > read_pmds(int n, int *pmd_numbers, u64 *pmd_values);
> > >
> > > where the caller supplies an array of PMD numbers and the function
> > > returns their values (and you want that reading to be done atomically
> > > in some sense), how would you do that using special files and fops?
> >
> > Could you implement it with readv()?
>
> Sure, why not? Just cook up an iovec. pmd_numbers goes to offset
> X and pmd_values goes to offset Y, with some helpers like what
> we have in the networking already for recvmsg.
>
> But why would you want readv() for this? The syscall thing
> Paul asked me to translate into a read() doesn't provide
> iovec-like behavior so I don't see why readv() is necessary
> at all.
Ah sorry, that's what I get for typing before I think: of course
readv doesn't vectorise the right part of the equation.
What I really mean is a readv-like syscall, but one that also
vectorises the file offset. Maybe this is useful enough as a generic
syscall that also helps Paul's example...
Of course, I guess this all depends on whether the atomicity is an
important requirement. If not, you can obviously just do it with
multiple read syscalls...
-
Re: [perfmon] Re: [perfmon2] perfmon2 merge news
Christoph Hellwig writes:
>
> I've done this a gazillion times before, so maybe instead of being a lazy
> bastard you could look up the mailing list archive. It's not like this is the
> first discussion of perfmon. But to get started, look at the system calls;
> many of them are beasts like:
>
> int pfm_read_pmds(int fd, pfarg_pmd_t *pmds, int n)
>
> This is basically a read(2) (or for other syscalls a write) on something
At least for x86, and I suspect some other architectures, we don't
initially need a syscall at all for this. There is an instruction,
RDPMC, which can read a performance counter just fine. It is also much
faster and generally preferable for the case where a process measures
events about itself. In fact it is essential for one of the use cases
I would like to see perfmon used for (replacement of RDTSC for cycle
counting).
Later a syscall might be needed with event multiplexing, but that seems
more like a far away non essential feature.
> else than the file descriptor provided to the system call. The right thing
I don't like read/write for this too much. I think it's better to
have individual syscalls. After all that is CPU state and having
syscalls for that does seem reasonable.
-Andi
-
Re: [perfmon] Re: [perfmon2] perfmon2 merge news
Andi,
On Wed, Nov 14, 2007 at 03:07:02AM +0100, Andi Kleen wrote:
>
> [dropped all these bouncing email lists. Adding closed lists to public
> cc lists is just a bad idea]
>
Just want to make sure perfmon2 users participate in this discussion.
> > int
> > main(int argc, char **argv)
> > {
> > int ctx_fd;
> > pfarg_pmd_t pd[1];
> > pfarg_pmc_t pc[1];
> > pfarg_ctx_t ctx;
> > pfarg_load_t load_args;
> >
> > memset(&ctx, 0, sizeof(ctx));
> > memset(pc, 0, sizeof(pc));
> > memset(pd, 0, sizeof(pd));
> >
> > /* create session (context) and get file descriptor back (identifier) */
> > ctx_fd = pfm_create_context(&ctx, NULL, NULL, 0);
>
> There's nothing in your example that makes the file descriptor needed.
>
Partially true. The file descriptor becomes really useful when you sample.
You leverage the file descriptor to receive notifications of counter overflows
and full sampling buffer. You extract notification messages via read() and you can
use SIGIO, select/poll.
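
The notification pattern described above (wait on the descriptor, then read() a message) looks roughly like the sketch below. The ovfl_msg layout and the wait_for_msg helper are hypothetical; the real perfmon2 message format is not shown in this thread, and any readable descriptor (a pipe, for instance) can stand in for the context descriptor when trying it out.

```c
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical notification record; not the real perfmon2 message layout. */
struct ovfl_msg {
    uint32_t type;                 /* e.g. counter overflow or buffer full */
    uint32_t pmd;                  /* register that triggered the message */
};

/*
 * Wait up to timeout_ms for one message on fd.
 * Returns 1 if a message was read, 0 on timeout, -1 on error or short read.
 */
static int wait_for_msg(int fd, struct ovfl_msg *msg, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc = poll(&pfd, 1, timeout_ms);

    if (rc <= 0)
        return rc;                 /* 0: timeout, -1: poll() failed */
    if (!(pfd.revents & POLLIN))
        return -1;                 /* hang-up or error condition */

    ssize_t got = read(fd, msg, sizeof(*msg));
    return got == (ssize_t)sizeof(*msg) ? 1 : -1;
}
```

Because the session is an ordinary descriptor, the same loop works with select() or SIGIO, and close() tears the session down without a dedicated call.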
The example shows how you can leverage existing mechanisms to destroy the session, i.e.,
free the associated kernel resources. For that, you use close() instead of adding yet
another syscall. It also provides a resource limitation mechanism to control consumption
of kernel memory, i.e., you can only create as many sessions as you can have open files.
> >
> > /* setup one config register (PMC0) */
> > pc[0].reg_num = 0;
> > pc[0].reg_value = 0x1234;
>
> That would be nicer if it was just two arguments.
>
Are you suggesting something like: pfm_write_pmcs(fd, 0, 0x1234)?
That would be quite expensive when you have lots of registers to setup: one
syscall per register. The perfmon syscalls to read/write registers accept vector
of arguments to amortize the cost of the syscall over multiple registers
(similar to poll(2)).
With many tools, registers are not just setup once. During certain measurements,
data registers may be read multiple times. When you sample or multiplex at
the user level, you do need to reprogram the PMU state and that is on the critical
path.
You do not want a call that programs the entire PMU state all at once either. Many times,
you only want to modify a small subset. Having the full state does also cause some portability
problems.
> >
> > /* setup one data register (PMD0) */
> > pd[0].reg_num = 0;
> > pd[0].reg_value = 0;
>
> Why do you need to set the data register? Wouldn't it make
> more sense to let the kernel handle that and just return one.
>
It depends on what you are doing. Here, this was not really necessary. It was
meant to show how you can program the data registers as well. Perfmon2 provides
default values for all data registers. For counters, the value is guaranteed to
be zero.
But it is important to note that not all data registers are counters. On Itanium 2,
for instance, some are just buffers. On AMD Barcelona IBS several are buffers as
well, and some may need to be initialized to a nonzero value, i.e., the IBS sampling
period.
With event-based sampling, the period is expressed as the number of occurrences
of an event. For instance, you can say: "take a sample every 2000 L2 cache misses".
The way you express this with perfmon2 is that you program a counter to measure
L2 cache misses, and then you initialize the corresponding data register (counter)
to overflow after 2000 occurrences. Given that the interface guarantees all counters
are 64-bit regardless of the hardware, you simply have to program the counter to -2000.
Thus you see that you need a call to actually program the data registers.
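
The arithmetic behind "program the counter to -2000" can be written down as a short sketch. The helper names are made up for illustration; the only assumption taken from the text is that the interface presents every counter as 64 bits wide, so a period is encoded as its two's complement and the counter wraps to zero after exactly that many events.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Initial value for a 64-bit counter that should overflow after `period`
 * events: the two's complement of the period (i.e. "-period").
 */
static uint64_t sampling_reset_value(uint64_t period)
{
    return (uint64_t)0 - period;
}

/*
 * Number of increments left before `counter` wraps past zero.
 * A result of 0 means the counter is at zero (a full 2^64 wrap away).
 */
static uint64_t events_until_overflow(uint64_t counter)
{
    return (uint64_t)0 - counter;
}
```

For a period of 2000 the reset value is 0xFFFFFFFFFFFFF830, and re-arming the sampling period after each overflow means writing that same value back to the data register.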
> >
> > /* program the registers */
> > pfm_write_pmcs(ctx_fd, pc, 1);
> > pfm_write_pmds(ctx_fd, pd, 1);
> >
> > /* attach the context to self */
> > load_args.load_pid = getpid();
> > pfm_load_context(ctx_fd, &load_args);
>
> My replacement would be to just add a flags argument to write_pmcs
> with one flag bit meaning "GLOBAL CONTEXT" versus "MY CONTEXT"
> >
You are mixing PMU programming with the type of measurement you want to do.
Perfmon2 decouples the two operations. In fact, no PMU hardware is actually touched
before you attach to either a CPU or a thread. This way, you can prepare your measurement
and then attach-and-go. Thus it is possible to create batches of ready-to-go sessions.
That is useful, for instance, when you are trying to measure across fork, pthread_create
which you can catch on-the-fly.
Take the per-thread example, you can setup your session before you fork/exec the program
you want to measure.
Note also that perfmon2 supports attaching to an already running thread. So there is
more than "GLOBAL CONTEXT" versus "MY CONTEXT".
> > /* activate monitoring */
> > pfm_start(ctx_fd, NULL);
>
> Why can't that be done by the call setting up the register?
>
Good question. If you do what you say, you assume that the start/stop bit lives in the
config (or data) registers of the PMU. This is not true on all hardware. On Itanium
for instance, the start/stop bit is part of the Processor Status Register (psr).
That is not a PMU register.
On X86, you set the enable bit in PERFEVTSEL, but nothing really happens until you issue
pfm_start(), i.e., the PERFEVTSEL registers are not touched until then.
> Or if someone needs to do it for a specific region they can read
> the register before and then afterwards.
>
> >
> > /*
> > * run code to measure
> > */
> >
> > /* stop monitoring */
> > pfm_stop(ctx_fd);
> >
> > /* read data register */
> > pfm_read_pmds(ctx_fd, pd, 1);
>
> On x86 i think it would be much simpler to just let the set/alloc
> register call return a number and then use RDPMC directly. That would
> be actually faster and be much simpler too.
>
One approach does not prevent the other. Assuming you allow cr4.pce, then nothing prevents
a self-monitoring thread from reading the counters directly. You'll just get the
lower 32 bits of it. So if you read frequently enough, you should not have a problem.
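
Why reading "frequently enough" suffices can be shown with a small sketch that widens 32-bit samples into a 64-bit count. It assumes, as stated above, that only the low 32 bits are visible to the reader; the wide_counter type and its helpers are hypothetical, and the samples would come from whatever direct read the hardware offers.

```c
#include <assert.h>
#include <stdint.h>

/* Software-extended counter built from successive 32-bit samples. */
struct wide_counter {
    uint64_t total;                /* accumulated 64-bit event count */
    uint32_t last_low;             /* low 32 bits seen at the previous sample */
};

static void wide_counter_init(struct wide_counter *w, uint32_t low)
{
    w->total = 0;
    w->last_low = low;
}

/*
 * Fold a new 32-bit sample into the running total. The unsigned
 * subtraction is modulo 2^32, so a single wrap between two samples is
 * accounted for; two or more wraps between samples would be missed.
 */
static uint64_t wide_counter_update(struct wide_counter *w, uint32_t low)
{
    uint32_t delta = low - w->last_low;

    w->total += delta;
    w->last_low = low;
    return w->total;
}
```

"Frequently enough" therefore means sampling at least once per 2^32 events of the measured type.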
But keep in mind that we do want a uniform interface across all hardware and all types
of sessions (self-monitoring, CPU-wide, monitoring of another thread). You don't want
an interface that says on x86 you have to use rdpmc, on Itanium pfm_read_pmds() and so
on. You want an interface that guarantees that with pfm_read_pmds() you'll be able to
read on any hardware platform; then on some you may be able to use a more efficient
method, e.g., rdpmc on X86.
Reducing performance monitoring to self-monitoring is not what we want. In fact, there
are only a few domains where you can actually do this and HPC is one of them. But in
many other situations, you cannot and don't want to have to instrument applications
or libraries to collect performance data. It is quite handy to be able to do:
$ pfmon /bin/ls
or
$ pfmon --attach-task=`pidof sshd` -timeout=10s
Also note that there is no guarantee that RDPMC allows you to access all data registers
on a PMU. For instance, on AMD Barcelona, it seems you cannot read the IBS register using
RDPMC.
> I suppose most architectures have similar facilities, if not a call could be
> added for them but it's not really essential. The call might be also needed
> for event multiplexing, but frankly I would just leave that out for now.
>
Itanium does allow user level read of data registers. It also allows start/stop.
Perfmon2 allows this only for self-monitoring per-thread sessions.
I think restricting per-thread mode to only self-monitoring is just too limiting
even for a start.
> e.g. here is one use case I would personally see as useful. We need
> a replacement for simple cycle counting since RDTSC doesn't do that anymore
> on modern x86 CPUs. It could be something like:
>
You can do exactly this with the perfmon2 interface as it exists today.
Your example is perfectly fine, your interface works in your case.
But you are driving the design of the interface from your very specific need
and you are ignoring all the other usage models. This has been a problem with so
many other interfaces and that explains the current situation. You have to
take a broader view, look at what the hardware (across the board) provides and
build from there. We do not need yet another interface to support one tool or one
type of measurement, we need a true programming interface with a uniform set
of calls. So sure, several calls may look like overkill for basic measurements, but
they become necessary with others.
> /* 0 is the initial value */
>
> /* could be either library or syscall */
> event = get_event(COUNTER_CYCLES);
> if (event < 0)
> /* CPU has no cycle counter */
>
> reg = setup_perfctr(event, 0 /* value */, LOCAL_EVENT); /* syscall */
>
> rdpmc(reg, start);
> .... some code to run ...
> rdpmc(reg, end);
>
> free_perfctr(reg); /* syscall */
>
--
-Stephane
-
Re: [perfmon] Re: [perfmon2] perfmon2 merge news
On Wed, Nov 14, 2007 at 10:44:56PM +1100, Paul Mackerras wrote:
> David Miller writes:
>
> > This is my impression too, all of the things being done with
> > a slew of system calls would be better served by real special
> > files and appropriate fops.
>
> Special files and fops really only work well if you can coerce the
> interface into one where data flows predominantly one way. I don't
> think they work so well for something that is more like an RPC across
> the user/kernel barrier. For that a system call is better.
>
> For instance, if you have something that kind-of looks like
>
> read_pmds(int n, int *pmd_numbers, u64 *pmd_values);
>
> where the caller supplies an array of PMD numbers and the function
> returns their values (and you want that reading to be done atomically
> in some sense), how would you do that using special files and fops?
>
Yes, the read call could be simplified to the level proposed above by Paul.
--
-Stephane