diff options
Diffstat (limited to 'Documentation/vm/mmu_notifier.rst')
-rw-r--r-- | Documentation/vm/mmu_notifier.rst | 99 |
1 files changed, 0 insertions, 99 deletions
diff --git a/Documentation/vm/mmu_notifier.rst b/Documentation/vm/mmu_notifier.rst deleted file mode 100644 index df5d7777fc6b..000000000000 --- a/Documentation/vm/mmu_notifier.rst +++ /dev/null @@ -1,99 +0,0 @@ -.. _mmu_notifier: - -When do you need to notify inside page table lock ? -=================================================== - -When clearing a pte/pmd we are given a choice to notify the event through -(notify version of \*_clear_flush call mmu_notifier_invalidate_range) under -the page table lock. But that notification is not necessary in all cases. - -For secondary TLB (non CPU TLB) like IOMMU TLB or device TLB (when device use -thing like ATS/PASID to get the IOMMU to walk the CPU page table to access a -process virtual address space). There is only 2 cases when you need to notify -those secondary TLB while holding page table lock when clearing a pte/pmd: - - A) page backing address is free before mmu_notifier_invalidate_range_end() - B) a page table entry is updated to point to a new page (COW, write fault - on zero page, __replace_page(), ...) - -Case A is obvious you do not want to take the risk for the device to write to -a page that might now be used by some completely different task. - -Case B is more subtle. For correctness it requires the following sequence to -happen: - - - take page table lock - - clear page table entry and notify ([pmd/pte]p_huge_clear_flush_notify()) - - set page table entry to point to new page - -If clearing the page table entry is not followed by a notify before setting -the new pte/pmd value then you can break memory model like C11 or C++11 for -the device. - -Consider the following scenario (device use a feature similar to ATS/PASID): - -Two address addrA and addrB such that \|addrA - addrB\| >= PAGE_SIZE we assume -they are write protected for COW (other case of B apply too). - -:: - - [Time N] -------------------------------------------------------------------- - CPU-thread-0 {try to write to addrA} - CPU-thread-1 {try to write to addrB} - CPU-thread-2 {} - CPU-thread-3 {} - DEV-thread-0 {read addrA and populate device TLB} - DEV-thread-2 {read addrB and populate device TLB} - [Time N+1] ------------------------------------------------------------------ - CPU-thread-0 {COW_step0: {mmu_notifier_invalidate_range_start(addrA)}} - CPU-thread-1 {COW_step0: {mmu_notifier_invalidate_range_start(addrB)}} - CPU-thread-2 {} - CPU-thread-3 {} - DEV-thread-0 {} - DEV-thread-2 {} - [Time N+2] ------------------------------------------------------------------ - CPU-thread-0 {COW_step1: {update page table to point to new page for addrA}} - CPU-thread-1 {COW_step1: {update page table to point to new page for addrB}} - CPU-thread-2 {} - CPU-thread-3 {} - DEV-thread-0 {} - DEV-thread-2 {} - [Time N+3] ------------------------------------------------------------------ - CPU-thread-0 {preempted} - CPU-thread-1 {preempted} - CPU-thread-2 {write to addrA which is a write to new page} - CPU-thread-3 {} - DEV-thread-0 {} - DEV-thread-2 {} - [Time N+3] ------------------------------------------------------------------ - CPU-thread-0 {preempted} - CPU-thread-1 {preempted} - CPU-thread-2 {} - CPU-thread-3 {write to addrB which is a write to new page} - DEV-thread-0 {} - DEV-thread-2 {} - [Time N+4] ------------------------------------------------------------------ - CPU-thread-0 {preempted} - CPU-thread-1 {COW_step3: {mmu_notifier_invalidate_range_end(addrB)}} - CPU-thread-2 {} - CPU-thread-3 {} - DEV-thread-0 {} - DEV-thread-2 {} - [Time N+5] ------------------------------------------------------------------ - CPU-thread-0 {preempted} - CPU-thread-1 {} - CPU-thread-2 {} - CPU-thread-3 {} - DEV-thread-0 {read addrA from old page} - DEV-thread-2 {read addrB from new page} - -So here because at time N+2 the clear page table entry was not pair with a -notification to invalidate the secondary TLB, the device see the new value for -addrB before seeing the new value for addrA. This break total memory ordering -for the device. - -When changing a pte to write protect or to point to a new write protected page -with same content (KSM) it is fine to delay the mmu_notifier_invalidate_range -call to mmu_notifier_invalidate_range_end() outside the page table lock. This -is true even if the thread doing the page table update is preempted right after -releasing page table lock but before call mmu_notifier_invalidate_range_end(). |