On Tue, Apr 08, 2008 at 01:23:33PM -0700, Christoph Lameter wrote:
> It may also be useful to allow invalidate_start() to fail in some contexts
> (try_to_unmap f.e., maybe if a certain flag is passed). This may allow the
> device to get out of tight situations (pending I/O f.e. or time out if
> there is no response for network communications). But then that
> complicates the API.

That is also complicated by the fact that there can't be an spte mapped
while the pte is not mapped, or the spte would leak unswappable memory,
so a failure would have to re-establish the pte and undo the
ptep_clear_flush or equivalent... I think we can change the API later
if needed. This is an internal-only API invisible to userland, so it
can change and break at any time to make the whole kernel faster and
better (ask Greg about kernel internal APIs).

One important detail is that because the secondary mmu page fault can
happen concurrently with invalidate_page (there wasn't a range_begin
to block it), the secondary mmu page fault must ensure that the pte is
still established before establishing the spte (with proper locking
that will block a concurrent invalidate_page). Having a range_begin
before the ptep_clear_flush effectively makes life a bit easier, but
it's not needed, as those are locking issues that the driver can solve
(unlike range_begin being missed, now fixed by mm_lock), and this
allows for higher performance both when the lock is armed and
disarmed. I'm going to solve all the locking for kvm with spinlocks
and/or seqlocks to avoid any dependency on the patches that make the
mmu notifier sleep capable.
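
To make the ordering concrete, here is a rough userspace sketch of the
kind of sequence-count check I have in mind. It is not the kvm code:
struct toy_mmu and every toy_* function are made-up names, a pthread
mutex stands in for the driver spinlock, and a single pte/spte pair
stands in for the page tables. The fault snapshots the sequence, looks
up the pte without the lock, and only installs the spte if no
invalidate_page ran in between.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: one primary pte and the spte derived from it. */
struct toy_mmu {
	pthread_mutex_t lock;     /* stands in for the driver spinlock */
	unsigned long inv_seq;    /* bumped under lock by each invalidate */
	atomic_bool pte_present;  /* primary pte, owned by the "VM" */
	bool spte_present;        /* secondary pte, protected by lock */
};

static void toy_init(struct toy_mmu *m)
{
	pthread_mutex_init(&m->lock, NULL);
	m->inv_seq = 0;
	atomic_init(&m->pte_present, true);
	m->spte_present = false;
}

/* Driver callback: drop the spte and record that an invalidate ran. */
static void toy_invalidate_page(struct toy_mmu *m)
{
	pthread_mutex_lock(&m->lock);
	m->inv_seq++;
	m->spte_present = false;
	pthread_mutex_unlock(&m->lock);
}

/* VM side: clear the pte first (as ptep_clear_flush would), then notify. */
static void toy_unmap(struct toy_mmu *m)
{
	atomic_store(&m->pte_present, false);
	toy_invalidate_page(m);
}

/* Fault step 1: snapshot the sequence, then look up the pte unlocked.
 * The caller must not go on to commit if this returns false. */
static bool toy_fault_begin(struct toy_mmu *m, unsigned long *seq)
{
	pthread_mutex_lock(&m->lock);
	*seq = m->inv_seq;
	pthread_mutex_unlock(&m->lock);
	return atomic_load(&m->pte_present);
}

/* Fault step 2: install the spte only if no invalidate ran since step 1. */
static bool toy_fault_commit(struct toy_mmu *m, unsigned long seq)
{
	bool ok;

	pthread_mutex_lock(&m->lock);
	assert(m->inv_seq >= seq);   /* the count only moves forward here */
	ok = (seq == m->inv_seq);
	if (ok)
		m->spte_present = true;
	pthread_mutex_unlock(&m->lock);
	return ok;
}

/* True if an spte is mapped while the pte is not (the leak case). */
static bool toy_leaks_spte(struct toy_mmu *m)
{
	bool leak;

	pthread_mutex_lock(&m->lock);
	leak = m->spte_present && !atomic_load(&m->pte_present);
	pthread_mutex_unlock(&m->lock);
	return leak;
}
```

A fault whose commit fails simply retries from step 1. If the pte is
cleared after the lookup but before invalidate_page has run, the commit
can still succeed, and the invalidate_page that follows tears the spte
down again, so nothing stays mapped once the unmap has returned.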