[PATCH RFC/RFB] x86_64, i386: interrupt dispatch changes - Kernel
-
Re: [PATCH RFC/RFB] x86_64, i386: interrupt dispatch changes
[H. Peter Anvin - Wed, Nov 05, 2008 at 10:04:50AM -0800]
| Cyrill Gorcunov wrote:
| >
| > Ingo, what is the conclusion? As I understand from the thread --
| >
| > 1) Implement Peter's proposed cleanup/compress.
| > 2) Test Alexander's patches.
| >
| > Did I miss something?
| >
|
| Nope, that's pretty much it.
|
| However, there is good reason to believe that using this kind of
| segment selector trick is probably a bad idea in the long term,
| especially since CPU vendors have strong incentives to reduce the size
| of the segment descriptor cache now that none of the mainstream OSes
| rely on more than a small handful of segments.
|
| I was planning to look at doing the obvious stub shrink today.
|
| -hpa
|
I see. Thanks! Btw Peter, I remember reading about segment caches a long
time ago (well... back in my DOS programming days, actually). But
there were only general statements that such a cache exists. Is it
possible to find out exactly what the size of such a cache is?
You mentioned the number 32. (heh... I hadn't remembered it until
you mentioned such a cache :-)
- Cyrill -
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
-
Re: [PATCH RFC/RFB] x86_64, i386: interrupt dispatch changes
[Andi Kleen - Tue, Nov 04, 2008 at 06:05:01PM +0100]
| > not taking into account the cost of cs reading (which I
| > don't suspect to be that expensive apart from writing,
|
| GDT accesses have an implied LOCK prefix. Especially
| on some older CPUs that could be slow.
|
| I don't know if it's a problem or not but it would need
| some careful benchmarking on different systems to make sure interrupt
| latencies are not impacted.
|
| Another reason I would also be careful with this patch is that
| it will likely trigger slow paths in JITs like qemu/vmware/etc.
|
| Also code segment switching is likely not something that
| current and future micro architectures will spend a lot of time optimizing.
|
| I'm not sure that risk is worth the small improvement in code
| size.
|
| An alternative BTW to having all the stubs in the executable
| would be to just dynamically generate them when the interrupt
| is set up. Then you would only have the stubs around for the
| interrupts which are actually used.
|
| -Andi
|
Thanks a lot for comments, Andi!
- Cyrill -
-
Re: [PATCH RFC/RFB] x86_64, i386: interrupt dispatch changes
[H. Peter Anvin - Wed, Nov 05, 2008 at 10:20:23AM -0800]
| Cyrill Gorcunov wrote:
| >
| > I see. Thanks! Btw Peter, I remember reading about segment caches a long
| > time ago (well... back in my DOS programming days, actually). But
| > there were only general statements that such a cache exists. Is it
| > possible to find out exactly what the size of such a cache is?
| > You mentioned the number 32. (heh... I hadn't remembered it until
| > you mentioned such a cache :-)
| >
|
| As with any other caching structure, you can discover its size,
| associativity, and replacement policy by artificially trying to provoke
| patterns that produce pathological timings.
|
| At Transmeta, at one time we used a 32-entry direct-mapped cache, which
| ended up with a ~96% hit rate on common Win95 benchmarks.
|
| I should, however, make it clear that there are other alternatives for
| speeding up segment descriptor loading, and not all of them rely on a cache.
|
| -hpa
|
Thanks a lot for explanation!
- Cyrill -
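
The probing idea described above (grow the working set until timings turn pathological) can be illustrated with a small simulation. This is only a sketch: it models a direct-mapped cache like the 32-entry one mentioned for Transmeta and counts misses instead of measuring real timings. The function names and the modulo indexing are assumptions made for illustration, not a description of any actual CPU.

```python
def steady_state_misses(working_set, entries=32, rounds=10):
    """Cycle through `working_set` distinct selectors and count the misses
    of a simulated direct-mapped cache, ignoring the warm-up pass."""
    slots = [None] * entries              # one cached selector per slot
    misses = 0
    for r in range(rounds + 1):           # round 0 only warms the cache
        for selector in range(working_set):
            index = selector % entries    # assumed indexing: low bits pick the slot
            if slots[index] != selector:
                slots[index] = selector
                if r > 0:
                    misses += 1
    return misses


def estimate_entries(max_probe=128, entries=32):
    """Return the largest working set that produces no steady-state misses."""
    for n in range(1, max_probe + 1):
        if steady_state_misses(n, entries) > 0:
            return n - 1
    return max_probe


print(estimate_entries())
```

On real hardware the miss counter would be replaced by a timing measurement of segment loads, and the cliff in the timings would play the role of the first non-zero miss count.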
-
Re: [PATCH RFC/RFB] x86_64, i386: interrupt dispatch changes
Cyrill Gorcunov wrote:
>
> I see. Thanks! Btw Peter, I remember reading about segment caches a long
> time ago (well... back in my DOS programming days, actually). But
> there were only general statements that such a cache exists. Is it
> possible to find out exactly what the size of such a cache is?
> You mentioned the number 32. (heh... I hadn't remembered it until
> you mentioned such a cache :-)
>
As with any other caching structure, you can discover its size,
associativity, and replacement policy by artificially trying to provoke
patterns that produce pathological timings.
At Transmeta, at one time we used a 32-entry direct-mapped cache, which
ended up with a ~96% hit rate on common Win95 benchmarks.
I should, however, make it clear that there are other alternatives for
speeding up segment descriptor loading, and not all of them rely on a cache.
-hpa
-
Re: [PATCH RFC/RFB] x86_64, i386: interrupt dispatch changes
* Jeremy Fitzhardinge wrote:
> Why are the accesses locked? Is it because it does an update of the
> accessed bit in the descriptor? (We should be pre-setting them all
> anyway.)
yes, the accessed bit in the segment descriptor has to be updated in
an atomic transaction: the CPU has to do a MESI coherent
read+compare+write transaction, without damaging other updates to the
6 bytes segment descriptor.
Old OSs implemented paging to disk by swapping out segments based on
the accessed bit, and clearing the present and accessed bit when the
segment is swapped out.
But given that all our GDT entries have the accessed bit set on Linux,
there's no physical reason why the CPU should be using a locked cycle
here - only to stay compatible with ancient stuff.
So ... that notion just survived in the backwards-compatibility stream
of CPU enhancements, over the past 10 years.
On 64-bit Linux there's no reason to maintain that principle, so i'd
expect future CPUs to relax this even more, were it ever to show up on
the performance radar. Note that SYSCALL/SYSRET already optimize that
away.
Ingo
-
Re: [PATCH RFC/RFB] x86_64, i386: interrupt dispatch changes
* Alexander van Heukelum wrote:
> > | > | Opteron (cycles): 1024 / 1157 / 3527
> > | > | Xeon E5345 (cycles): 1092 / 1085 / 6622
> > | > | Athlon XP (cycles): 1028 / 1166 / 5192
> > | >
> > | > Xeon is definitely out of luck :-)
> > |
> > | it's still OK - i.e. no outrageous showstopper overhead anywhere in
> > | that instruction sequence. The total round-trip overhead is what will
> > | matter most.
> > |
> > | Ingo
> > |
> >
> > Don't get me wrong please, I really like what Alexander has done!
> > But frankly, six times slower scares me a bit.
the cost is 6 cycles instead of 1 cycle. In a codepath that takes
thousands of cycles and is often cache-limited.
> Thanks again. Now it _is_ six times slower to do this tiny piece
> of code... But please keep in mind all the activity that follows to
> save the current data segment registers (the stack segment and code
> segment are saved automatically), the general purpose registers and
> to load most of the data segments with kernel-space values. And
> looking at it now... do_IRQ is also not exactly trivial.
>
> Also, I kept the information that is saved on the stack exactly the
> same. If this is not a requirement, "push %cs" is what is left of
> this expensive (6 cycle!) sequence. Even that could be unnecessary
> if the stack layout can be changed... But I'd like to consider that
> separately.
we really want to keep the stack frame consistent between all the
context types. We can do things like return-to-userspace-from-irq or
schedule-from-irq-initiated-event, etc. - so crossing between these
context frames has to be standard and straightforward.
Ingo
-
Re: [PATCH RFC/RFB] x86_64, i386: interrupt dispatch changes
* H. Peter Anvin wrote:
> Ingo Molnar wrote:
>>
>> yes, the accessed bit in the segment descriptor has to be updated
>> in an atomic transaction: the CPU has to do a MESI coherent
>> read+compare+write transaction, without damaging other updates to
>> the 6 bytes segment descriptor.
>
> 8 bytes, rather.
heh, yes of course :-)
Ingo
-
Re: [PATCH RFC/RFB] x86_64, i386: interrupt dispatch changes
Ingo Molnar wrote:
>
> yes, the accessed bit in the segment descriptor has to be updated in
> an atomic transaction: the CPU has to do a MESI coherent
> read+compare+write transaction, without damaging other updates to the
> 6 bytes segment descriptor.
>
8 bytes, rather.
-
Re: [PATCH RFC/RFB] x86_64, i386: interrupt dispatch changes
Alexander van Heukelum wrote:
>
> In general: after applying the patch, latencies are more
> often seen by the rdtsctest. It also seems to cause a
> small percentage decrease in speed of hackbench.
> Looking at the latency histograms I believe this is
> a real effect, but I could not do enough boots/runs to
> make this a certainty from the runtimes alone.
>
> At least for this PC, doing hpa's suggested cleanup of
> the stub table is the right way to go for now... A
> second option would be to get rid of the stub table by
> assigning each important vector a unique handler and
> to make sure those handlers do not rely on the vector
> number at all.
>
Hi Alexander,
First of all, great job on the timing analysis. I believe this confirms
the concerns that I had about this technique.
Here is a prototype patch of the compressed IRQ stubs -- this patch
compresses them down to 7 stubs per 32-byte cache line (or part of cache
line) at the expense of a back-to-back jmp which has the potential of
being ugly on some pipelines (we can only get 4 stubs into 32 bytes
without that).
Would you be willing to run your timing test on this patch? This isn't
submission-quality since it commingles multiple changes, and it needs
some cleanup, but it should be useful for testing.
As a side benefit it eliminates some gratuitous differences between the
32- and 64-bit code.
-hpa
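
The stub counts quoted above (7 per 32 bytes with the back-to-back jmp, 4 without) can be reproduced with simple byte arithmetic. The sketch below assumes a stub layout that the message itself does not spell out: each stub pushes a one-byte immediate (2 bytes), and either jumps directly with a 32-bit displacement (5 bytes) or takes a short jmp (2 bytes) to a single shared 5-byte jmp at the end of the chunk. The instruction sizes are the standard x86 encodings; the layout and function names are hypothetical.

```python
PUSH_IMM8 = 2    # opcode 6A + 8-bit immediate
JMP_REL8 = 2     # opcode EB + 8-bit displacement
JMP_REL32 = 5    # opcode E9 + 32-bit displacement
CHUNK = 32       # bytes per cache line (or part of one), as in the message


def max_plain_stubs(chunk=CHUNK):
    """Every stub is a push followed by its own 32-bit jmp."""
    return chunk // (PUSH_IMM8 + JMP_REL32)


def max_chained_stubs(chunk=CHUNK):
    """The last stub carries the 32-bit jmp; earlier stubs short-jmp to it
    (assumed layout, giving the back-to-back jmp mentioned above)."""
    remaining = chunk - (PUSH_IMM8 + JMP_REL32)
    return 1 + remaining // (PUSH_IMM8 + JMP_REL8)


print(max_plain_stubs(), max_chained_stubs())
```

Under these assumptions the chained layout uses 31 of the 32 bytes, which is consistent with the figures in the message but does not prove that the patch uses exactly this encoding.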
-
Re: [PATCH RFC/RFB] x86_64, i386: interrupt dispatch changes
* Alexander van Heukelum wrote:
> Hi all,
>
> I have spent some time trying to find out how expensive the
> segment-switching patch was. I have only one computer available at
> the time: a "Sempron 2400+", 32-bit-only machine.
>
> Measured were timings of "hackbench 10" in a loop. The average was
> taken of more than 100 runs. Timings were done for two separate
> boots of the system.
hackbench is _way_ too noisy to measure such cycle-level differences
as irq entry changes cause. It also does not really stress interrupts
- it only stresses networking, the VFS and the scheduler.
a better test might have been to generate a ton of interrupts, but
even then it's _very_ hard to measure it properly. The best method is
what i've suggested to you early on: run a loop in user-space and
observe irq costs via RDTSC, as they happen. Then build a histogram
and compare the before/after histogram. Compare best-case results as
well (the first slot of the histogram), as those are statistically
much more significant than a noisy average.
Measuring such things in a meaningful way is really tricky business.
Using hackbench to measure IRQ entry micro-costs is like trying to
take a photo of a delicate flower at night, by using an atomic bomb as
the flash-light: you certainly get some sort of effect to report, but
there's not many nuances left in the picture to really look at ;-)
Ingo
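
The measurement method suggested above (read a timestamp in a tight user-space loop, histogram the gaps, compare the first slot) can be sketched roughly as follows. Python cannot issue RDTSC directly, so `time.perf_counter_ns()` stands in for it here; that substitution, the function names, and the bin width are assumptions for illustration and do not reproduce rdtsctest.c.

```python
import time
from collections import Counter


def sample_clock(count):
    """Read the clock back to back; a stand-in for a tight RDTSC loop."""
    return [time.perf_counter_ns() for _ in range(count)]


def delta_histogram(timestamps, bin_width=1):
    """Histogram of gaps between adjacent timestamps, keyed by bin start."""
    hist = Counter()
    for earlier, later in zip(timestamps, timestamps[1:]):
        hist[(later - earlier) // bin_width * bin_width] += 1
    return hist


def best_case(hist):
    """First occupied slot of the histogram, i.e. the smallest gap seen."""
    return min(hist)


hist = delta_histogram(sample_clock(100_000), bin_width=10)
print(best_case(hist), sum(hist.values()))
```

Gaps far above the best case correspond to something interrupting the loop; which of those are interrupts rather than scheduling or other noise is exactly the hard part discussed in the thread.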
-
Re: [PATCH RFC/RFB] x86_64, i386: interrupt dispatch changes
On Mon, 10 Nov 2008 09:58:46 +0100, "Ingo Molnar" said:
> * Alexander van Heukelum wrote:
> > Hi all,
> >
> > I have spent some time trying to find out how expensive the
> > segment-switching patch was. I have only one computer available at
> > the time: a "Sempron 2400+", 32-bit-only machine.
> >
> > Measured were timings of "hackbench 10" in a loop. The average was
> > taken of more than 100 runs. Timings were done for two separate
> > boots of the system.
Hi Ingo,
I guess you just stopped reading here?
> hackbench is _way_ too noisy to measure such cycle-level differences
> as irq entry changes cause. It also does not really stress interrupts
> - it only stresses networking, the VFS and the scheduler.
>
> a better test might have been to generate a ton of interrupts, but
> even then it's _very_ hard to measure it properly.
I should have presented the second benchmark as the first I
guess. I really just used hackbench as a workload. I gathered
it would give a good amount of exceptions like page faults and
maybe others. It would be nice to have a simple debug switch in
the kernel to make it generate a lot of interrupts, though.
> The best method is
> what i've suggested to you early on: run a loop in user-space and
> observe irq costs via RDTSC, as they happen. Then build a histogram
> and compare the before/after histogram. Compare best-case results as
> well (the first slot of the histogram), as those are statistically
> much more significant than a noisy average.
See the rest of the mail you replied to and its attachment. I've put
the programs I used and the histogram in
http://heukelum.fastmail.fm/irqstubs/
I think rdtsctest.c is pretty much what you describe.
Greetings,
Alexander
> Measuring such things in a meaningful way is really tricky business.
> Using hackbench to measure IRQ entry micro-costs is like trying to
> take a photo of a delicate flower at night, by using an atomic bomb as
> the flash-light: you certainly get some sort of effect to report, but
> there's not many nuances left in the picture to really look at ;-)
>
> Ingo
--
Alexander van Heukelum
heukelum@fastmail.fm
-
Re: [PATCH RFC/RFB] x86_64, i386: interrupt dispatch changes
* Alexander van Heukelum wrote:
> On Mon, 10 Nov 2008 09:58:46 +0100, "Ingo Molnar" said:
> > * Alexander van Heukelum wrote:
> > > Hi all,
> > >
> > > I have spent some time trying to find out how expensive the
> > > segment-switching patch was. I have only one computer available at
> > > the time: a "Sempron 2400+", 32-bit-only machine.
> > >
> > > Measured were timings of "hackbench 10" in a loop. The average was
> > > taken of more than 100 runs. Timings were done for two separate
> > > boots of the system.
>
> Hi Ingo,
>
> I guess you just stopped reading here?
yeah, sorry! You describe and did exactly the kind of histogram that i
wanted to see done ;-)
I'm not sure i can read out the same thing from the result though.
Firstly, it seems the 'after' histograms are better, because there the
histogram shifted towards shorter delays. (i.e. lower effective irq
entry overhead)
OTOH, unless i'm misreading them, it's a bit hard to compare them
visually: the integral of the histograms does not seem to be constant,
they don't seem to be normalized.
It should be made constant for them to be comparable. (i.e. the total
number of irq hits profiled should be equal - or should be normalized
with the sum after the fact)
Ingo
-
Re: [PATCH RFC/RFB] x86_64, i386: interrupt dispatch changes
Ingo Molnar wrote:
> * Alexander van Heukelum wrote:
>
> hackbench is _way_ too noisy to measure such cycle-level differences
> as irq entry changes cause. It also does not really stress interrupts
> - it only stresses networking, the VFS and the scheduler.
>
> a better test might have been to generate a ton of interrupts, but
> even then it's _very_ hard to measure it properly. The best method is
> what i've suggested to you early on: run a loop in user-space and
> observe irq costs via RDTSC, as they happen. Then build a histogram
> and compare the before/after histogram. Compare best-case results as
> well (the first slot of the histogram), as those are statistically
> much more significant than a noisy average.
>
For what it's worth, I tested this out, and I'm pretty sure you need to
run a uniprocessor configuration (or system) for it to make sense --
otherwise you end up missing too many of the interrupts. I first tested
this on an 8-processor system and, well, came up with nothing.
I'm going to try this later on a uniprocessor, unless Alexander beats me
to it.
-hpa
-
Re: [PATCH RFC/RFB] x86_64, i386: interrupt dispatch changes
On Mon, 10 Nov 2008 14:07:09 +0100, "Ingo Molnar" said:
> * Alexander van Heukelum wrote:
> > On Mon, 10 Nov 2008 09:58:46 +0100, "Ingo Molnar" said:
> > > * Alexander van Heukelum wrote:
> > > > Hi all,
> > > >
> > > > I have spent some time trying to find out how expensive the
> > > > segment-switching patch was. I have only one computer available at
> > > > the time: a "Sempron 2400+", 32-bit-only machine.
> > > >
> > > > Measured were timings of "hackbench 10" in a loop. The average was
> > > > taken of more than 100 runs. Timings were done for two separate
> > > > boots of the system.
> >
> > Hi Ingo,
> >
> > I guess you just stopped reading here?
>
> yeah, sorry! You describe and did exactly the kind of histogram that i
> wanted to see done ;-)
I thought so.
> I'm not sure i can read out the same thing from the result though.
> Firstly, it seems the 'after' histograms are better, because there the
> histogram shifted towards shorter delays. (i.e. lower effective irq
> entry overhead)
>
> OTOH, unless i'm misreading them, it's a bit hard to compare them
> visually: the integral of the histograms does not seem to be constant,
> they don't seem to be normalized.
The total number of measured intervals (between two almost-adjacent
rdtsc's) is exactly the same for all histograms (10^10). Almost all
measurements are of the "nothing happened" type, i.e., around 11
clock cycles on this machine. The user time spent inside the
rdtsctest program is almost independent of the load, but it
measures time spent outside of the program... But what should be
attributed to what effect is unclear to me at the moment.
> It should be made constant for them to be comparable. (i.e. the total
> number of irq hits profiled should be equal - or should be normalized
> with the sum after the fact)
Basically the difference between the "idle" and "hack10" versions
should indicate the effect of extra interrupts (timer) and additional
exceptions and cache effects due to context switching.
Thanks,
Alexander
> Ingo
--
Alexander van Heukelum
heukelum@fastmail.fm
-
Re: [PATCH RFC/RFB] x86_64, i386: interrupt dispatch changes
On Mon, 10 Nov 2008 07:39:22 -0800, "H. Peter Anvin" said:
> Ingo Molnar wrote:
> > * Alexander van Heukelum wrote:
> >
> > hackbench is _way_ too noisy to measure such cycle-level differences
> > as irq entry changes cause. It also does not really stress interrupts
> > - it only stresses networking, the VFS and the scheduler.
> >
> > a better test might have been to generate a ton of interrupts, but
> > even then it's _very_ hard to measure it properly. The best method is
> > what i've suggested to you early on: run a loop in user-space and
> > observe irq costs via RDTSC, as they happen. Then build a histogram
> > and compare the before/after histogram. Compare best-case results as
> > well (the first slot of the histogram), as those are statistically
> > much more significant than a noisy average.
> >
>
> For what it's worth, I tested this out, and I'm pretty sure you need to
> run a uniprocessor configuration (or system) for it to make sense --
> otherwise you end up missing too many of the interrupts. I first tested
> this on an 8-processor system and, well, came up with nothing.
>
> I'm going to try this later on a uniprocessor, unless Alexander beats me
> to it.
I did the rdtsctest again for the irqstubs patch you sent. The data
is at http://heukelum.fastmail.fm/irqstubs/ and the latency histogram
is http://heukelum.fastmail.fm/irqstubs/latency_hpa.png
Greetings,
Alexander
> -hpa
--
Alexander van Heukelum
heukelum@fastmail.fm
-
Re: [PATCH RFC/RFB] x86_64, i386: interrupt dispatch changes
Alexander van Heukelum wrote:
>>
>> OTOH, unless i'm misreading them, it's a bit hard to compare them
>> visually: the integral of the histograms does not seem to be constant,
>> they don't seem to be normalized.
>
> The total number of measured intervals (between two almost-adjacent
> rdtsc's) is exactly the same for all histograms (10^10). Almost all
> measurements are of the "nothing happened" type, i.e., around 11
> clock cycles on this machine. The user time spent inside the
> rdtsctest program is almost independent of the load, but it
> measures time spent outside of the program... But what should be
> attributed to what effect is unclear to me at the moment.
>
I believe you need to remove the obvious null events at the low end (no
interrupt happened) and renormalize to the same scale for the histograms
to make sense.
As it is, the difference in the number of events that actually matter
dominates the graphs; for example, there are 142187 events >= 12 in
hack10ticks, but 136533 in hack10ticks_hpa.
-hpa
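
The renormalization suggested above can be sketched as below: drop the null events at the low end, then divide each remaining bin by the number of events kept, so that two histograms with different event counts share a scale. The cutoff of 12 cycles follows the ">= 12" figure in the message; the function name and the dictionary representation of a histogram are assumptions for illustration.

```python
def renormalize(hist, min_cycles=12):
    """Drop bins below `min_cycles` and rescale the rest to sum to 1."""
    kept = {cycles: n for cycles, n in hist.items() if cycles >= min_cycles}
    total = sum(kept.values())
    if total == 0:
        return {}                      # nothing but null events
    return {cycles: n / total for cycles, n in kept.items()}


# Toy data: almost every interval is a ~11-cycle "nothing happened" event.
print(renormalize({11: 1000, 12: 3, 40: 1}))
```

After this step the bins of two data sets are fractions of the retained events, so a difference in raw event counts such as 142187 versus 136533 no longer dominates the comparison.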
-
Re: [PATCH RFC/RFB] x86_64, i386: interrupt dispatch changes
Alexander van Heukelum wrote:
>
> I did the rdtsctest again for the irqstubs patch you sent. The data
> is at http://heukelum.fastmail.fm/irqstubs/ and the latency histogram
> is http://heukelum.fastmail.fm/irqstubs/latency_hpa.png
>
Okay, I've stared at a bunch of different transformations of this data
and I'm starting to think that it's getting lost in the noise. The
difference between your "idleticks" and "idleticks2" data sets, for
example, is as big as the differences between any two data sets that I
can see.
Just for reference, see this graph where I have filtered out events
outside the [30..1000] cycle range and renormalized.
http://www.zytor.com/~hpa/hist.pdf
I don't know how to even figure out what a realistic error range looks
like, other than repeating each run something like 100+ times and doing an
"eye chart" kind of diagram.
-hpa
-
Re: [PATCH RFC/RFB] x86_64, i386: interrupt dispatch changes
Okay, after spending most of the day trying to get something that isn't
completely like white noise (interesting problem, otherwise I'd have
given up long ago) I did, eventually, come up with something that looks
like it's significant. I did a set of multiple runs, and am looking for
the "waterfall points" in the cumulative statistics.
http://www.zytor.com/~hpa/baseline-hpa-3000-3600.pdf
This particular set of data points was gathered on a 64-bit kernel, so I
didn't try the segment technique.
It looks to me that the collection of red lines is enough to the left of
the black ones that one can assume there is a significant effect,
probably by about a cache miss worth of time.
-hpa
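
One way to read the cumulative-statistics comparison above is to build an empirical cumulative distribution per run and look at where each curve crosses a fixed fraction. The sketch below does that; treating a "waterfall point" as such a crossing is an interpretation, and the function names, the 50% threshold, and the toy numbers are all assumptions rather than the analysis actually used for the linked graph.

```python
def cumulative(samples):
    """Empirical CDF as (value, fraction of samples <= value) pairs."""
    ordered = sorted(samples)
    n = len(ordered)
    return [(value, (i + 1) / n) for i, value in enumerate(ordered)]


def crossing(samples, fraction=0.5):
    """Smallest sample value at which the cumulative fraction reaches `fraction`."""
    for value, cum in cumulative(samples):
        if cum >= fraction:
            return value
    raise ValueError("no samples")


# Made-up cycle counts, only to show the comparison.
baseline = [3300, 3310, 3320, 3400]
patched = [3200, 3210, 3220, 3300]
print(crossing(baseline), crossing(patched))
```

Repeating this over many runs per kernel gives a bundle of crossings for each variant; if one bundle sits consistently to the left of the other, that matches the kind of shift described in the message.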
-
Re: [PATCH RFC/RFB] x86_64, i386: interrupt dispatch changes
* Alexander van Heukelum wrote:
> > OTOH, unless i'm misreading them, it's a bit hard to compare them
> > visually: the integral of the histograms does not seem to be
> > constant, they don't seem to be normalized.
>
> The total number of measured intervals (between two almost-adjacent
> rdtsc's) is exactly the same for all histograms (10^10). Almost all
> measurements are of the "nothing happened" type, i.e., around 11
> clock cycles on this machine. The user time spent inside the
> rdtsctest program is almost independent of the load, but it measures
> time spent outside of the program... But what should be attributed
> to what effect is unclear to me at the moment.
a high-pass filter should be applied in any case, to filter out the
"nothing happened" baseline. Eliminating every delta below 500-1000
cycles would do the trick i think, all IRQ costs are at least 1000
cycles.
then a low-pass filter should be applied to eliminate non-irq noise
such as scheduling effects or expensive irqs (which are both
uninteresting to such analysis).
and then _that_ double-filtered dataset should be normalized: the
number of events should be made the same. (just clip the larger
dataset to the length of the smaller dataset)
> > It should be made constant for them to be comparable. (i.e. the
> > total number of irq hits profiled should be equal - or should be
> > normalized with the sum after the fact)
>
> Basically the difference between the "idle" and "hack10" versions
> should indicate the effect of extra interrupts (timer) and
> additional exceptions and cache effects due to context switching.
i was only looking at before/after duos, for the same basic type of
workload. Idle versus hackbench is indeed apples to oranges.
Ingo
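
The three steps described above (high-pass filter, low-pass filter, then clipping both data sets to the same length) might look like the following sketch. The lower cutoff of 1000 cycles comes from the message; the upper cutoff of 10000 cycles is a placeholder, since no value is given, and the function names are hypothetical.

```python
def band_pass(deltas, low=1000, high=10000):
    """Keep only deltas between `low` and `high` cycles, inclusive.
    `high` is an assumed placeholder, not a value from the thread."""
    return [d for d in deltas if low <= d <= high]


def clip_to_common_length(before, after):
    """Truncate the longer data set to the length of the shorter one."""
    n = min(len(before), len(after))
    return before[:n], after[:n]


# Toy deltas in cycles: baseline noise, irq-sized events, one huge outlier.
before = band_pass([11, 11, 1200, 1500, 50000, 1300])
after = band_pass([11, 1100, 11, 1250])
print(clip_to_common_length(before, after))
```

Clipping keeps the earliest events of the larger data set, which is the simplest reading of "clip the larger dataset to the length of the smaller dataset"; sampling at random would be another option.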