From 8be7258aad44b5e25977a98db136f677fa6f4370 Mon Sep 17 00:00:00 2001 From: Jeff Xu Date: Mon, 15 Apr 2024 16:35:21 +0000 Subject: mseal: add mseal syscall MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The new mseal() is an syscall on 64 bit CPU, and with following signature: int mseal(void addr, size_t len, unsigned long flags) addr/len: memory range. flags: reserved. mseal() blocks following operations for the given memory range. 1> Unmapping, moving to another location, and shrinking the size, via munmap() and mremap(), can leave an empty space, therefore can be replaced with a VMA with a new set of attributes. 2> Moving or expanding a different VMA into the current location, via mremap(). 3> Modifying a VMA via mmap(MAP_FIXED). 4> Size expansion, via mremap(), does not appear to pose any specific risks to sealed VMAs. It is included anyway because the use case is unclear. In any case, users can rely on merging to expand a sealed VMA. 5> mprotect() and pkey_mprotect(). 6> Some destructive madvice() behaviors (e.g. MADV_DONTNEED) for anonymous memory, when users don't have write permission to the memory. Those behaviors can alter region contents by discarding pages, effectively a memset(0) for anonymous memory. Following input during RFC are incooperated into this patch: Jann Horn: raising awareness and providing valuable insights on the destructive madvise operations. Linus Torvalds: assisting in defining system call signature and scope. Liam R. Howlett: perf optimization. Theo de Raadt: sharing the experiences and insight gained from implementing mimmutable() in OpenBSD. Finally, the idea that inspired this patch comes from Stephen Röttger's work in Chrome V8 CFI. [jeffxu@chromium.org: add branch prediction hint, per Pedro] Link: https://lkml.kernel.org/r/20240423192825.1273679-2-jeffxu@chromium.org Link: https://lkml.kernel.org/r/20240415163527.626541-3-jeffxu@chromium.org Signed-off-by: Jeff Xu Reviewed-by: Kees Cook Reviewed-by: Liam R. Howlett Cc: Pedro Falcato Cc: Dave Hansen Cc: Greg Kroah-Hartman Cc: Guenter Roeck Cc: Jann Horn Cc: Jeff Xu Cc: Jonathan Corbet Cc: Jorge Lucangeli Obes Cc: Linus Torvalds Cc: Matthew Wilcox (Oracle) Cc: Muhammad Usama Anjum Cc: Pedro Falcato Cc: Stephen Röttger Cc: Suren Baghdasaryan Cc: Amer Al Shanawany Cc: Javier Carrasco Cc: Shuah Khan Signed-off-by: Andrew Morton --- include/linux/syscalls.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index e619ac10cd23..9104952d323d 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -821,6 +821,7 @@ asmlinkage long sys_process_mrelease(int pidfd, unsigned int flags); asmlinkage long sys_remap_file_pages(unsigned long start, unsigned long size, unsigned long prot, unsigned long pgoff, unsigned long flags); +asmlinkage long sys_mseal(unsigned long start, size_t len, unsigned long flags); asmlinkage long sys_mbind(unsigned long start, unsigned long len, unsigned long mode, const unsigned long __user *nmask, -- cgit v1.2.3