summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2022-01-08signal: Rename group_exit_task group_exec_taskEric W. Biederman3-14/+10
The only remaining user of group_exit_task is exec. Rename the field so that it is clear which part of the code uses it. Update the comment above the definition of group_exec_task to document how it is currently used. Link: https://lkml.kernel.org/r/20211213225350.27481-7-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-01-08coredump: Stop setting signal->group_exit_taskEric W. Biederman1-2/+0
Currently the coredump code sets group_exit_task so that signal_group_exit() will return true during a coredump. Now that the coredump code always sets SIGNAL_GROUP_EXIT there is no longer a need to set signal->group_exit_task. Link: https://lkml.kernel.org/r/20211213225350.27481-6-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-01-08signal: Remove SIGNAL_GROUP_COREDUMPEric W. Biederman3-5/+3
After the previous cleanups "signal->core_state" is set whenever SIGNAL_GROUP_COREDUMP is set and "signal->core_state" is tested whenver the code wants to know if a coredump is in progress. The remaining tests of SIGNAL_GROUP_COREDUMP also test to see if SIGNAL_GROUP_EXIT is set. Similarly the only place that sets SIGNAL_GROUP_COREDUMP also sets SIGNAL_GROUP_EXIT. Which makes SIGNAL_GROUP_COREDUMP unecessary and redundant. So stop setting SIGNAL_GROUP_COREDUMP, stop testing SIGNAL_GROUP_COREDUMP, and remove it's definition. With the setting of SIGNAL_GROUP_COREDUMP gone, coredump_finish no longer needs to clear SIGNAL_GROUP_COREDUMP out of signal->flags by setting SIGNAL_GROUP_EXIT. Link: https://lkml.kernel.org/r/20211213225350.27481-5-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-01-08signal: During coredumps set SIGNAL_GROUP_EXIT in zap_processEric W. Biederman1-3/+3
There are only a few places that test SIGNAL_GROUP_EXIT and are not also already testing SIGNAL_GROUP_COREDUMP. This will not affect the callers of signal_group_exit as zap_process also sets group_exit_task so signal_group_exit will continue to return true at the same times. This does not affect wait_task_zombie as the none of the threads wind up in EXIT_ZOMBIE state during a coredump. This does not affect oom_kill.c:__task_will_free_mem as sig->core_state is tested and handled before SIGNAL_GROUP_EXIT is tested for. This does not affect complete_signal as signal->core_state is tested for to ensure the coredump case is handled appropriately. Link: https://lkml.kernel.org/r/20211213225350.27481-4-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-01-08signal: Make coredump handling explicit in complete_signalEric W. Biederman1-1/+1
Ever since commit 6cd8f0acae34 ("coredump: ensure that SIGKILL always kills the dumping thread") it has been possible for a SIGKILL received during a coredump to set SIGNAL_GROUP_EXIT and trigger a process shutdown (for a second time). Update the logic to explicitly allow coredumps so that coredumps can set SIGNAL_GROUP_EXIT and shutdown like an ordinary process. Link: https://lkml.kernel.org/r/87zgo6ytyf.fsf_-_@email.froward.int.ebiederm.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-01-08signal: Have prepare_signal detect coredumps using signal->core_stateEric W. Biederman1-2/+2
In preparation for removing the flag SIGNAL_GROUP_COREDUMP, change prepare_signal to test signal->core_state instead of the flag SIGNAL_GROUP_COREDUMP. Both fields are protected by siglock and both live in signal_struct so there are no real tradeoffs here, just a change to which field is being tested. Link: https://lkml.kernel.org/r/20211213225350.27481-1-ebiederm@xmission.com Link: https://lkml.kernel.org/r/875yqu14co.fsf_-_@email.froward.int.ebiederm.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-01-08signal: Have the oom killer detect coredumps using signal->core_stateEric W. Biederman1-1/+1
In preparation for removing the flag SIGNAL_GROUP_COREDUMP, change __task_will_free_mem to test signal->core_state instead of the flag SIGNAL_GROUP_COREDUMP. Both fields are protected by siglock and both live in signal_struct so there are no real tradeoffs here, just a change to which field is being tested. Link: https://lkml.kernel.org/r/20211213225350.27481-3-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-01-08exit: Move force_uaccess back into do_exitEric W. Biederman1-9/+14
With kernel threads on architectures that still have set_fs/get_fs running as KERNEL_DS moving force_uaccess_begin does not appear safe. Calling force_uaccess_begin is a noop on anything people care about. Update the comment to explain why this code while looking like an obvious candidate for moving to make_task_dead probably needs to remain in do_exit until set_fs/get_fs are entirely removed from the kernel. Fixes: 05ea0424f0e2 ("exit: Move oops specific logic from do_exit into make_task_dead") Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Link: https://lkml.kernel.org/r/YdUxGKRcSiDy8jGg@zeniv-ca.linux.org.uk Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-01-08exit: Guarantee make_task_dead leaks the tsk when calling do_task_exitEric W. Biederman1-0/+2
Change the task state to EXIT_DEAD and take an extra rcu_refernce to guarantee the task will not be reaped and that it will not be freed. Link: https://lkml.kernel.org/r/YdUzjrLAlRiNLQp2@zeniv-ca.linux.org.uk Pointed-out-by: Al Viro <viro@zeniv.linux.org.uk> Fixes: 7f80a2fd7db9 ("exit: Stop poorly open coding do_task_dead in make_task_dead") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-01-08exit/xtensa: In arch/xtensa/entry.S:Linvalid_mask call make_task_deadEric W. Biederman1-1/+1
There have historically been two big uses of do_exit. The first is it's design use to be the guts of the exit(2) system call. The second use is to terminate a task after something catastrophic has happened like a NULL pointer in kernel code. The function make_task_dead has been added to accomidate the second use. The call to do_exit in Linvalidmask is clearly not a normal userspace exit. As failure handling there are two possible ways to go. If userspace can trigger the issue force_exit_sig should be called. Otherwise make_task_dead probably from the implementation of die is appropriate. Replace the call of do_exit in Linvalidmask with make_task_dead as I don't know xtensa and especially xtensa assembly language well enough to do anything else. Link: https://lkml.kernel.org/r/YdUmN7n4W5YETUhW@zeniv-ca.linux.org.uk Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-01-08csky: Fix function name in csky_alignment() and die()Nathan Chancellor2-2/+2
When building ARCH=csky defconfig: arch/csky/kernel/traps.c: In function 'die': arch/csky/kernel/traps.c:112:17: error: implicit declaration of function 'make_dead_task' [-Werror=implicit-function-declaration] 112 | make_dead_task(SIGSEGV); | ^~~~~~~~~~~~~~ The function's name is make_task_dead(), change it so there is no more build error. Fixes: 0e25498f8cd4 ("exit: Add and use make_task_dead.") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Guo Ren <guoren@kernel.org> Link: https://lkml.kernel.org/r/20211227184851.2297759-4-nathan@kernel.org Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2022-01-08h8300: Fix build errors from do_exit() to make_task_dead() transitionNathan Chancellor2-2/+3
When building ARCH=h8300 defconfig: arch/h8300/kernel/traps.c: In function 'die': arch/h8300/kernel/traps.c:109:2: error: implicit declaration of function 'make_dead_task' [-Werror=implicit-function-declaration] 109 | make_dead_task(SIGSEGV); | ^~~~~~~~~~~~~~ arch/h8300/mm/fault.c: In function 'do_page_fault': arch/h8300/mm/fault.c:54:2: error: implicit declaration of function 'make_dead_task' [-Werror=implicit-function-declaration] 54 | make_dead_task(SIGKILL); | ^~~~~~~~~~~~~~ The function's name is make_task_dead(), change it so there is no more build error. Additionally, include linux/sched/task.h in arch/h8300/kernel/traps.c to avoid the same error because do_exit()'s declaration is in kernel.h but make_task_dead()'s is in task.h, which is not included in traps.c. Fixes: 0e25498f8cd4 ("exit: Add and use make_task_dead.") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Link: https://lkml.kernel.org/r/20211227184851.2297759-3-nathan@kernel.org Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2022-01-08hexagon: Fix function name in die()Nathan Chancellor1-1/+1
When building ARCH=hexagon defconfig: arch/hexagon/kernel/traps.c:217:2: error: implicit declaration of function 'make_dead_task' [-Werror,-Wimplicit-function-declaration] make_dead_task(err); ^ The function's name is make_task_dead(), change it so there is no more build error. Fixes: 0e25498f8cd4 ("exit: Add and use make_task_dead.") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Link: https://lkml.kernel.org/r/20211227184851.2297759-2-nathan@kernel.org Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2022-01-08kthread: Generalize pf_io_worker so it can point to struct kthreadEric W. Biederman6-23/+13
The point of using set_child_tid to hold the kthread pointer was that it already did what is necessary. There are now restrictions on when set_child_tid can be initialized and when set_child_tid can be used in schedule_tail. Which indicates that continuing to use set_child_tid to hold the kthread pointer is a bad idea. Instead of continuing to use the set_child_tid field of task_struct generalize the pf_io_worker field of task_struct and use it to hold the kthread pointer. Rename pf_io_worker (which is a void * pointer) to worker_private so it can be used to store kthreads struct kthread pointer. Update the kthread code to store the kthread pointer in the worker_private field. Remove the places where set_child_tid had to be dealt with carefully because kthreads also used it. Link: https://lkml.kernel.org/r/CAHk-=wgtFAA9SbVYg0gR1tqPMC17-NYcs0GQkaYg1bGhh1uJQQ@mail.gmail.com Link: https://lkml.kernel.org/r/87a6grvqy8.fsf_-_@email.froward.int.ebiederm.org Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-12-23kthread: Never put_user the set_child_tid addressEric W. Biederman1-1/+1
Kernel threads abuse set_child_tid. Historically that has been fine as set_child_tid was initialized after the kernel thread had been forked. Unfortunately storing struct kthread in set_child_tid after the thread is running makes struct kthread being unusable for storing result codes of the thread. When set_child_tid is set to struct kthread during fork that results in schedule_tail writing the thread id to the beggining of struct kthread (if put_user does not realize it is a kernel address). Solve this by skipping the put_user for all kthreads. Reported-by: Nathan Chancellor <nathan@kernel.org> Link: https://lkml.kernel.org/r/YcNsG0Lp94V13whH@archlinux-ax161 Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-12-22kthread: Warn about failed allocations for the init kthreadEric W. Biederman1-1/+1
Failed allocates are not expected when setting up the initial task and it is not really possible to handle them either. So I added a warning to report if such an allocation failure ever happens. Correct the sense of the warning so it warns when an allocation failure happens not when the allocation succeeded. Oops. Reported-by: kernel test robot <oliver.sang@intel.com> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Link: https://lkml.kernel.org/r/20211221231611.785b74cf@canb.auug.org.au Link: https://lkml.kernel.org/r/CA+G9fYvLaR5CF777CKeWTO+qJFTN6vAvm95gtzN+7fw3Wi5hkA@mail.gmail.com Link: https://lkml.kernel.org/r/20211216102956.GC10708@xsang-OptiPlex-9020 Fixes: 40966e316f86 ("kthread: Ensure struct kthread is present for all kthreads") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-12-20fork: Rename bad_fork_cleanup_threadgroup_lock to bad_fork_cleanup_delayacctEric W. Biederman1-3/+3
I just fixed a bug in copy_process when using the label bad_fork_cleanup_threadgroup_lock. While fixing the bug I looked closer at the label and realized it has been misnamed since 568ac888215c ("cgroup: reduce read locked section of cgroup_threadgroup_rwsem during fork"). Fix the name so that fork is easier to understand. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-12-20fork: Stop protecting back_fork_cleanup_cgroup_lock with CONFIG_NUMAEric W. Biederman1-1/+1
Mark Brown <broonie@kernel.org> reported: > This is also causing further build errors including but not limited to: > > /tmp/next/build/kernel/fork.c: In function 'copy_process': > /tmp/next/build/kernel/fork.c:2106:4: error: label 'bad_fork_cleanup_threadgroup_lock' used but not defined > 2106 | goto bad_fork_cleanup_threadgroup_lock; > | ^~~~ It turns out that I messed up and was depending upon a label protected by an ifdef. Move the label out of the ifdef as the ifdef around the label no longer makes sense (if it ever did). Link: https://lkml.kernel.org/r/YbugCP144uxXvRsk@sirena.org.uk Fixes: 40966e316f86 ("kthread: Ensure struct kthread is present for all kthreads") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-12-15objtool: Add a missing comma to avoid string concatenationEric W. Biederman1-1/+1
Recently the kbuild robot reported two new errors: >> lib/kunit/kunit-example-test.o: warning: objtool: .text.unlikely: unexpected end of section >> arch/x86/kernel/dumpstack.o: warning: objtool: oops_end() falls through to next function show_opcodes() I don't know why they did not occur in my test setup but after digging it I realized I had accidentally dropped a comma in tools/objtool/check.c when I renamed rewind_stack_do_exit to rewind_stack_and_make_dead. Add that comma back to fix objtool errors. Link: https://lkml.kernel.org/r/202112140949.Uq5sFKR1-lkp@intel.com Fixes: 0e25498f8cd4 ("exit: Add and use make_task_dead.") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-12-14exit/kthread: Fix the kerneldoc comment for kthread_complete_and_exitEric W. Biederman1-1/+1
I misspelled kthread_complete_and_exit in the kernel doc comment fix it. Link: https://lkml.kernel.org/r/202112141329.KBkyJ5ql-lkp@intel.com Link: https://lkml.kernel.org/r/202112141422.Cykr6YUS-lkp@intel.com Reported-by: kernel test robot <lkp@intel.com> Fixes: cead18552660 ("exit: Rename complete_and_exit to kthread_complete_and_exit") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-12-13exit/kthread: Move the exit code for kernel threads into struct kthreadEric W. Biederman1-2/+5
The exit code of kernel threads has different semantics than the exit_code of userspace tasks. To avoid confusion and allow the userspace implementation to change as needed move the kernel thread exit code into struct kthread. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-12-13kthread: Ensure struct kthread is present for all kthreadsEric W. Biederman5-26/+29
Today the rules are a bit iffy and arbitrary about which kernel threads have struct kthread present. Both idle threads and thread started with create_kthread want struct kthread present so that is effectively all kernel threads. Make the rule that if PF_KTHREAD and the task is running then struct kthread is present. This will allow the kernel thread code to using tsk->exit_code with different semantics from ordinary processes. To make ensure that struct kthread is present for all kernel threads move it's allocation into copy_process. Add a deallocation of struct kthread in exec for processes that were kernel threads. Move the allocation of struct kthread for the initial thread earlier so that it is not repeated for each additional idle thread. Move the initialization of struct kthread into set_kthread_struct so that the structure is always and reliably initailized. Clear set_child_tid in free_kthread_struct to ensure the kthread struct is reliably freed during exec. The function free_kthread_struct does not need to clear vfork_done during exec as exec_mm_release called from exec_mmap has already cleared vfork_done. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-12-13exit: Rename complete_and_exit to kthread_complete_and_exitEric W. Biederman15-31/+43
Update complete_and_exit to call kthread_exit instead of do_exit. Change the name to reflect this change in functionality. All of the users of complete_and_exit are causing the current kthread to exit so this change makes it clear what is happening. Move the implementation of kthread_complete_and_exit from kernel/exit.c to to kernel/kthread.c. As this function is kthread specific it makes most sense to live with the kthread functions. There are no functional change. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-12-13exit: Rename module_put_and_exit to module_put_and_kthread_exitEric W. Biederman11-17/+17
Update module_put_and_exit to call kthread_exit instead of do_exit. Change the name to reflect this change in functionality. All of the users of module_put_and_exit are causing the current kthread to exit so this change makes it clear what is happening. There is no functional change. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-12-13exit: Implement kthread_exitEric W. Biederman3-4/+21
The way the per task_struct exit_code is used by kernel threads is not quite compatible how it is used by userspace applications. The low byte of the userspace exit_code value encodes the exit signal. While kthreads just use the value as an int holding ordinary kernel function exit status like -EPERM. Add kthread_exit to clearly separate the two kinds of uses. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-12-13exit: Stop exporting do_exitEric W. Biederman1-1/+0
Now that there are no more modular uses of do_exit remove the EXPORT_SYMBOL. Suggested-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-12-13exit: Stop poorly open coding do_task_dead in make_task_deadEric W. Biederman1-2/+1
When the kernel detects it is oops or otherwise force killing a task while it exits the code poorly attempts to permanently stop the task from scheduling. I say poorly because it is possible for a task in TASK_UINTERRUPTIBLE to be woken up. As it makes no sense for the task to continue call do_task_dead instead which actually does the work and permanently removes the task from the scheduler. Guaranteeing the task will never be woken up again. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-12-13exit: Move oops specific logic from do_exit into make_task_deadEric W. Biederman3-41/+41
The beginning of do_exit has become cluttered and difficult to read as it is filled with checks to handle things that can only happen when the kernel is operating improperly. Now that we have a dedicated function for cleaning up a task when the kernel is operating improperly move the checks there. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-12-13exit: Add and use make_task_dead.Eric W. Biederman39-56/+63
There are two big uses of do_exit. The first is it's design use to be the guts of the exit(2) system call. The second use is to terminate a task after something catastrophic has happened like a NULL pointer in kernel code. Add a function make_task_dead that is initialy exactly the same as do_exit to cover the cases where do_exit is called to handle catastrophic failure. In time this can probably be reduced to just a light wrapper around do_task_dead. For now keep it exactly the same so that there will be no behavioral differences introducing this new concept. Replace all of the uses of do_exit that use it for catastraphic task cleanup with make_task_dead to make it clear what the code is doing. As part of this rename rewind_stack_do_exit rewind_stack_and_make_dead. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-12-13exit/s390: Remove dead reference to do_exit from copy_threadEric W. Biederman1-1/+0
My s390 assembly is not particularly good so I have read the history of the reference to do_exit copy_thread and have been able to verify that do_exit is not used. The general argument is that s390 has been changed to use the generic kernel_thread and kernel_execve and the generic versions do not call do_exit. So it is strange to see a do_exit reference sitting there. The history of the do_exit reference in s390's version of copy_thread seems conclusive that the do_exit reference is something that lingers and should have been removed several years ago. Up through 8d19f15a60be ("[PATCH] s390 update (1/27): arch.") the s390 code made a call to the exit(2) system call when a kernel thread finished. Then kernel_thread_starter was added which branched directly to the value in register 11 when the kernel thread finshed. The value in register 11 was set in kernel_thread to "regs.gprs[11] = (unsigned long) do_exit" In commit 37fe5d41f640 ("s390: fold kernel_thread_helper() into ret_from_fork()") kernel_thread_starter was moved into entry.S and entry64.S unchanged (except for the syntax differences between inline assemly and in the assembly file). In commit f9a7e025dfc2 ("s390: switch to generic kernel_thread()") the assignment to "gprs[11]" was moved into copy_thread from the old kernel_thread. The helper kernel_thread_starter was still being used and was still branching to "%r11" at the end. In commit 30dcb0996e40 ("s390: switch to saner kernel_execve() semantics") kernel_thread_starter was changed to unconditionally branch to sysc_tracenogo instead to %r11 which held the value of do_exit. Unfortunately copy_thread was not updated to stop passing do_exit in "gprs[11]". In commit 56e62a737028 ("s390: convert to generic entry") kernel_thread_starter was replaced by __ret_from_fork. And the code still continued to pass do_exit in "gprs[11]" despite __ret_from_fork not caring in the slightest. Remove this dead reference to do_exit to make it clear that s390 is not doing anything with do_exit in copy_thread. Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Fixes: 30dcb0996e40 ("s390: switch to saner kernel_execve() semantics") History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-12-04Merge SA_IMMUTABLE-fixes-for-v5.16-rc2Eric W. Biederman11-19/+42
I completed the first batch of signal changes for v5.17 against v5.16-rc1 before the SA_IMMUTABLE fixes where completed. Which leaves me with two lines of development that I want on my signal development branch both rooted at v5.16-rc1. Especially as I am hoping to reach the point of being able to remove SA_IMMUTABLE. Linus merged my SA_IMUTABLE fixes as: 7af959b5d5c8 ("Merge branch 'SA_IMMUTABLE-fixes-for-v5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace") To avoid rebasing the development changes that are currently complete I am merging the work I sent upstream to Linus to make my life simpler. The SA_IMMUTABLE changes as they are described in Linus's merge commit. Pull exit-vs-signal handling fixes from Eric Biederman: "This is a small set of changes where debuggers were no longer able to intercept synchronous SIGTRAP and SIGSEGV, introduced by the exit cleanups. This is essentially the change you suggested with all of i's dotted and the t's crossed so that ptrace can intercept all of the cases it has been able to intercept the past, and all of the cases that made it to exit without giving ptrace a chance still don't give ptrace a chance" * 'SA_IMMUTABLE-fixes-for-v5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: signal: Replace force_fatal_sig with force_exit_sig when in doubt signal: Don't always set SA_IMMUTABLE for forced signals Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-11-19signal: Replace force_fatal_sig with force_exit_sig when in doubtEric W. Biederman11-12/+26
Recently to prevent issues with SECCOMP_RET_KILL and similar signals being changed before they are delivered SA_IMMUTABLE was added. Unfortunately this broke debuggers[1][2] which reasonably expect to be able to trap synchronous SIGTRAP and SIGSEGV even when the target process is not configured to handle those signals. Add force_exit_sig and use it instead of force_fatal_sig where historically the code has directly called do_exit. This has the implementation benefits of going through the signal exit path (including generating core dumps) without the danger of allowing userspace to ignore or change these signals. This avoids userspace regressions as older kernels exited with do_exit which debuggers also can not intercept. In the future is should be possible to improve the quality of implementation of the kernel by changing some of these force_exit_sig calls to force_fatal_sig. That can be done where it matters on a case-by-case basis with careful analysis. Reported-by: Kyle Huey <me@kylehuey.com> Reported-by: kernel test robot <oliver.sang@intel.com> [1] https://lkml.kernel.org/r/CAP045AoMY4xf8aC_4QU_-j7obuEPYgTcnQQP3Yxk=2X90jtpjw@mail.gmail.com [2] https://lkml.kernel.org/r/20211117150258.GB5403@xsang-OptiPlex-9020 Fixes: 00b06da29cf9 ("signal: Add SA_IMMUTABLE to ensure forced siganls do not get changed") Fixes: a3616a3c0272 ("signal/m68k: Use force_sigsegv(SIGSEGV) in fpsp040_die") Fixes: 83a1f27ad773 ("signal/powerpc: On swapcontext failure force SIGSEGV") Fixes: 9bc508cf0791 ("signal/s390: Use force_sigsegv in default_trap_handler") Fixes: 086ec444f866 ("signal/sparc32: In setup_rt_frame and setup_fram use force_fatal_sig") Fixes: c317d306d550 ("signal/sparc32: Exit with a fatal signal when try_to_clear_window_buffer fails") Fixes: 695dd0d634df ("signal/x86: In emulate_vsyscall force a signal instead of calling do_exit") Fixes: 1fbd60df8a85 ("signal/vm86_32: Properly send SIGSEGV when the vm86 state cannot be saved.") Fixes: 941edc5bf174 ("exit/syscall_user_dispatch: Send ordinary signals on failure") Link: https://lkml.kernel.org/r/871r3dqfv8.fsf_-_@email.froward.int.ebiederm.org Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Kees Cook <keescook@chromium.org> Tested-by: Kyle Huey <khuey@kylehuey.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-11-19signal: Don't always set SA_IMMUTABLE for forced signalsEric W. Biederman1-7/+16
Recently to prevent issues with SECCOMP_RET_KILL and similar signals being changed before they are delivered SA_IMMUTABLE was added. Unfortunately this broke debuggers[1][2] which reasonably expect to be able to trap synchronous SIGTRAP and SIGSEGV even when the target process is not configured to handle those signals. Update force_sig_to_task to support both the case when we can allow the debugger to intercept and possibly ignore the signal and the case when it is not safe to let userspace know about the signal until the process has exited. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Reported-by: Kyle Huey <me@kylehuey.com> Reported-by: kernel test robot <oliver.sang@intel.com> Cc: stable@vger.kernel.org [1] https://lkml.kernel.org/r/CAP045AoMY4xf8aC_4QU_-j7obuEPYgTcnQQP3Yxk=2X90jtpjw@mail.gmail.com [2] https://lkml.kernel.org/r/20211117150258.GB5403@xsang-OptiPlex-9020 Fixes: 00b06da29cf9 ("signal: Add SA_IMMUTABLE to ensure forced siganls do not get changed") Link: https://lkml.kernel.org/r/877dd5qfw5.fsf_-_@email.froward.int.ebiederm.org Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Kees Cook <keescook@chromium.org> Tested-by: Kyle Huey <khuey@kylehuey.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-11-17signal: requeuing undeliverable signalsEric W. Biederman3-23/+33
Kyle Huey recently reported[1] that rr gets confused if SIGKILL prevents ptrace_signal from delivering a signal, as the kernel setups up a signal frame for a signal that rr did not have a chance to observe with ptrace. In looking into it I found a couple of bugs and a quality of implementation issue. - The test for signal_group_exit should be inside the for loop in get_signal. - Signals should be requeued on the same queue they were dequeued from. - When a fatal signal is pending ptrace_signal should not return another signal for delivery. Kyle Huey has verified[2] an earlier version of this change. I have reworked things one more time to completely fix the issues raised, and to keep the code maintainable long term. I have smoke tested this code and combined with a careful review I expect this code to work fine. Kyle if you can double check that my last round of changes still works for rr I would appreciate it. Eric W. Biederman (3): signal: In get_signal test for signal_group_exit every time through the loop signal: Requeue signals in the appropriate queue signal: Requeue ptrace signals fs/signalfd.c | 5 +++-- include/linux/sched/signal.h | 7 ++++--- kernel/signal.c | 44 ++++++++++++++++++++++++++------------------ 3 files changed, 33 insertions(+), 23 deletions(-) [1] https://lkml.kernel.org/r/20211101034147.6203-1-khuey@kylehuey.com [2] https://lkml.kernel.org/r/CAP045ApAX725ZfujaK-jJNkfCo5s+oVFpBvNfPJk+DKY8K7d=Q@mail.gmail.com Tested-by: Kyle Huey <khuey@kylehuey.com> Link: https://lkml.kernel.org/r/87bl2kekig.fsf_-_@email.froward.int.ebiederm.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-11-17signal: Requeue ptrace signalsEric W. Biederman1-1/+2
Kyle Huey <me@kylehuey.com> writes: > rr, a userspace record and replay debugger[0], uses the recorded register > state at PTRACE_EVENT_EXIT to find the point in time at which to cease > executing the program during replay. > > If a SIGKILL races with processing another signal in get_signal, it is > possible for the kernel to decline to notify the tracer of the original > signal. But if the original signal had a handler, the kernel proceeds > with setting up a signal handler frame as if the tracer had chosen to > deliver the signal unmodified to the tracee. When the kernel goes to > execute the signal handler that it has now modified the stack and registers > for, it will discover the pending SIGKILL, and terminate the tracee > without executing the handler. When PTRACE_EVENT_EXIT is delivered to > the tracer, however, the effects of handler setup will be visible to > the tracer. > > Because rr (the tracer) was never notified of the signal, it is not aware > that a signal handler frame was set up and expects the state of the program > at PTRACE_EVENT_EXIT to be a state that will be reconstructed naturally > by allowing the program to execute from the last event. When that fails > to happen during replay, rr will assert and die. > > The following patches add an explicit check for a newly pending SIGKILL > after the ptracer has been notified and the siglock has been reacquired. > If this happens, we stop processing the current signal and proceed > immediately to handling the SIGKILL. This makes the state reported at > PTRACE_EVENT_EXIT the unmodified state of the program, and also avoids the > work to set up a signal handler frame that will never be used. > > [0] https://rr-project.org/ The problem is that while the traced process makes it into ptrace_stop, the tracee is killed before the tracer manages to wait for the tracee and discover which signal was about to be delivered. More generally the problem is that while siglock was dropped a signal with process wide effect is short cirucit delivered to the entire process killing it, but the process continues to try and deliver another signal. In general it impossible to avoid all cases where work is performed after the process has been killed. In particular if the process is killed after get_signal returns the code will simply not know it has been killed until after delivering the signal frame to userspace. On the other hand when the code has already discovered the process has been killed and taken user space visible action that shows the kernel knows the process has been killed, it is just silly to then write the signal frame to the user space stack. Instead of being silly detect the process has been killed in ptrace_signal and requeue the signal so the code can pretend it was simply never dequeued for delivery. To test the process has been killed I use fatal_signal_pending rather than signal_group_exit to match the test in signal_pending_state which is used in schedule which is where ptrace_stop detects the process has been killed. Requeuing the signal so the code can pretend it was simply never dequeued improves the user space visible behavior that has been present since ebf5ebe31d2c ("[PATCH] signal-fixes-2.5.59-A4"). Kyle Huey verified that this change in behavior and makes rr happy. Reported-by: Kyle Huey <khuey@kylehuey.com> Reported-by: Marko Mäkelä <marko.makela@mariadb.com> History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.gi Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/87tugcd5p2.fsf_-_@email.froward.int.ebiederm.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-11-17signal: Requeue signals in the appropriate queueEric W. Biederman3-12/+21
In the event that a tracer changes which signal needs to be delivered and that signal is currently blocked then the signal needs to be requeued for later delivery. With the advent of CLONE_THREAD the kernel has 2 signal queues per task. The per process queue and the per task queue. Update the code so that if the signal is removed from the per process queue it is requeued on the per process queue. This is necessary to make it appear the signal was never dequeued. The rr debugger reasonably believes that the state of the process from the last ptrace_stop it observed until PTRACE_EVENT_EXIT can be recreated by simply letting a process run. If a SIGKILL interrupts a ptrace_stop this is not true today. So return signals to their original queue in ptrace_signal so that signals that are not delivered appear like they were never dequeued. Fixes: 794aa320b79d ("[PATCH] sigfix-2.5.40-D6") History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.gi Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/87zgq4d5r4.fsf_-_@email.froward.int.ebiederm.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-11-17signal: In get_signal test for signal_group_exit every time through the loopEric W. Biederman1-10/+10
Recently while investigating a problem with rr and signals I noticed that siglock is dropped in ptrace_signal and get_signal does not jump to relock. Looking farther to see if the problem is anywhere else I see that do_signal_stop also returns if signal_group_exit is true. I believe that test can now never be true, but it is a bit hard to trace through and be certain. Testing signal_group_exit is not expensive, so move the test for signal_group_exit into the for loop inside of get_signal to ensure the test is never skipped improperly. This has been a potential problem since I added the test for signal_group_exit was added. Fixes: 35634ffa1751 ("signal: Always notice exiting tasks") Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/875yssekcd.fsf_-_@email.froward.int.ebiederm.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-11-15Linux 5.16-rc1Linus Torvalds1-2/+2
2021-11-15kconfig: Add support for -Wimplicit-fallthroughGustavo A. R. Silva2-5/+6
Add Kconfig support for -Wimplicit-fallthrough for both GCC and Clang. The compiler option is under configuration CC_IMPLICIT_FALLTHROUGH, which is enabled by default. Special thanks to Nathan Chancellor who fixed the Clang bug[1][2]. This bugfix only appears in Clang 14.0.0, so older versions still contain the bug and -Wimplicit-fallthrough won't be enabled for them, for now. This concludes a long journey and now we are finally getting rid of the unintentional fallthrough bug-class in the kernel, entirely. :) Link: https://github.com/llvm/llvm-project/commit/9ed4a94d6451046a51ef393cd62f00710820a7e8 [1] Link: https://bugs.llvm.org/show_bug.cgi?id=51094 [2] Link: https://github.com/KSPP/linux/issues/115 Link: https://github.com/ClangBuiltLinux/linux/issues/236 Co-developed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Co-developed-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-14Merge tag 'xfs-5.16-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds4-7/+12
Pull xfs cleanups from Darrick Wong: "The most 'exciting' aspect of this branch is that the xfsprogs maintainer and I have worked through the last of the code discrepancies between kernel and userspace libxfs such that there are no code differences between the two except for #includes. IOWs, diff suffices to demonstrate that the userspace tools behave the same as the kernel, and kernel-only bits are clearly marked in the /kernel/ source code instead of just the userspace source. Summary: - Clean up open-coded swap() calls. - A little bit of #ifdef golf to complete the reunification of the kernel and userspace libxfs source code" * tag 'xfs-5.16-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: sync xfs_btree_split macros with userspace libxfs xfs: #ifdef out perag code for userspace xfs: use swap() to make dabtree code cleaner
2021-11-14Merge tag 'for-5.16/parisc-3' of ↵Linus Torvalds5-6/+14
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull more parisc fixes from Helge Deller: "Fix a build error in stracktrace.c, fix resolving of addresses to function names in backtraces, fix single-stepping in assembly code and flush userspace pte's when using set_pte_at()" * tag 'for-5.16/parisc-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc/entry: fix trace test in syscall exit path parisc: Flush kernel data mapping in set_pte_at() when installing pte for user page parisc: Fix implicit declaration of function '__kernel_text_address' parisc: Fix backtrace to always include init funtion names
2021-11-14Merge tag 'sh-for-5.16' of git://git.libc.org/linux-shLinus Torvalds21-180/+78
Pull arch/sh updates from Rich Felker. * tag 'sh-for-5.16' of git://git.libc.org/linux-sh: sh: pgtable-3level: Fix cast to pointer from integer of different size sh: fix READ/WRITE redefinition warnings sh: define __BIG_ENDIAN for math-emu sh: math-emu: drop unused functions sh: fix kconfig unmet dependency warning for FRAME_POINTER sh: Cleanup about SPARSE_IRQ sh: kdump: add some attribute to function maple: fix wrong return value of maple_bus_init(). sh: boot: avoid unneeded rebuilds under arch/sh/boot/compressed/ sh: boot: add intermediate vmlinux.bin* to targets instead of extra-y sh: boards: Fix the cacography in irq.c sh: check return code of request_irq sh: fix trivial misannotations
2021-11-14Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-armLinus Torvalds2-13/+13
Pull ARM fixes from Russell King: - Fix early_iounmap - Drop cc-option fallbacks for architecture selection * tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm: ARM: 9156/1: drop cc-option fallbacks for architecture selection ARM: 9155/1: fix early early_iounmap()
2021-11-14Merge tag 'devicetree-fixes-for-5.16-1' of ↵Linus Torvalds107-277/+277
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull devicetree fixes from Rob Herring: - Two fixes due to DT node name changes on Arm, Ltd. boards - Treewide rename of Ingenic CGU headers - Update ST email addresses - Remove Netlogic DT bindings - Dropping few more cases of redundant 'maxItems' in schemas - Convert toshiba,tc358767 bridge binding to schema * tag 'devicetree-fixes-for-5.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: dt-bindings: watchdog: sunxi: fix error in schema bindings: media: venus: Drop redundant maxItems for power-domain-names dt-bindings: Remove Netlogic bindings clk: versatile: clk-icst: Ensure clock names are unique of: Support using 'mask' in making device bus id dt-bindings: treewide: Update @st.com email address to @foss.st.com dt-bindings: media: Update maintainers for st,stm32-hwspinlock.yaml dt-bindings: media: Update maintainers for st,stm32-cec.yaml dt-bindings: mfd: timers: Update maintainers for st,stm32-timers dt-bindings: timer: Update maintainers for st,stm32-timer dt-bindings: i2c: imx: hardware do not restrict clock-frequency to only 100 and 400 kHz dt-bindings: display: bridge: Convert toshiba,tc358767.txt to yaml dt-bindings: Rename Ingenic CGU headers to ingenic,*.h
2021-11-14Merge tag 'timers-urgent-2021-11-14' of ↵Linus Torvalds3-2/+20
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Thomas Gleixner: "A single fix for POSIX CPU timers to address a problem where POSIX CPU timer delivery stops working for a new child task because copy_process() copies state information which is only valid for the parent task" * tag 'timers-urgent-2021-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: posix-cpu-timers: Clear task::posix_cputimers_work in copy_process()
2021-11-14Merge tag 'irq-urgent-2021-11-14' of ↵Linus Torvalds8-28/+60
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Thomas Gleixner: "A set of fixes for the interrupt subsystem Core code: - A regression fix for the Open Firmware interrupt mapping code where a interrupt controller property in a node caused a map property in the same node to be ignored. Interrupt chip drivers: - Workaround a limitation in SiFive PLIC interrupt chip which silently ignores an EOI when the interrupt line is masked. - Provide the missing mask/unmask implementation for the CSKY MP interrupt controller. PCI/MSI: - Prevent a use after free when PCI/MSI interrupts are released by destroying the sysfs entries before freeing the memory which is accessed in the sysfs show() function. - Implement a mask quirk for the Nvidia ION AHCI chip which does not advertise masking capability despite implementing it. Even worse the chip comes out of reset with all MSI entries masked, which due to the missing masking capability never get unmasked. - Move the check which prevents accessing the MSI[X] masking for XEN back into the low level accessors. The recent consolidation missed that these accessors can be invoked from places which do not have that check which broke XEN. Move them back to he original place instead of sprinkling tons of these checks all over the code" * tag 'irq-urgent-2021-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: of/irq: Don't ignore interrupt-controller when interrupt-map failed irqchip/sifive-plic: Fixup EOI failed when masked irqchip/csky-mpintc: Fixup mask/unmask implementation PCI/MSI: Destroy sysfs before freeing entries PCI: Add MSI masking quirk for Nvidia ION AHCI PCI/MSI: Deal with devices lying about their MSI mask capability PCI/MSI: Move non-mask check back into low level accessors
2021-11-14Merge tag 'locking-urgent-2021-11-14' of ↵Linus Torvalds3-4/+14
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 static call update from Thomas Gleixner: "A single fix for static calls to make the trampoline patching more robust by placing explicit signature bytes after the call trampoline to prevent patching random other jumps like the CFI jump table entries" * tag 'locking-urgent-2021-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: static_call,x86: Robustify trampoline patching
2021-11-14Merge tag 'sched_urgent_for_v5.16_rc1' of ↵Linus Torvalds13-59/+96
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Borislav Petkov: - Avoid touching ~100 config files in order to be able to select the preemption model - clear cluster CPU masks too, on the CPU unplug path - prevent use-after-free in cfs - Prevent a race condition when updating CPU cache domains - Factor out common shared part of smp_prepare_cpus() into a common helper which can be called by both baremetal and Xen, in order to fix a booting of Xen PV guests * tag 'sched_urgent_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: preempt: Restore preemption model selection configs arch_topology: Fix missing clear cluster_cpumask in remove_cpu_topology() sched/fair: Prevent dead task groups from regaining cfs_rq's sched/core: Mitigate race cpus_share_cache()/update_top_cache_domain() x86/smp: Factor out parts of native_smp_prepare_cpus()
2021-11-14Merge tag 'perf_urgent_for_v5.16_rc1' of ↵Linus Torvalds3-6/+10
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Borislav Petkov: - Prevent unintentional page sharing by checking whether a page reference to a PMU samples page has been acquired properly before that - Make sure the LBR_SELECT MSR is saved/restored too - Reset the LBR_SELECT MSR when resetting the LBR PMU to clear any residual data left * tag 'perf_urgent_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/core: Avoid put_page() when GUP fails perf/x86/vlbr: Add c->flags to vlbr event constraints perf/x86/lbr: Reset LBR_SELECT during vlbr reset
2021-11-14Merge tag 'x86_urgent_for_v5.16_rc1' of ↵Linus Torvalds8-4/+71
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - Add the model number of a new, Raptor Lake CPU, to intel-family.h - Do not log spurious corrected MCEs on SKL too, due to an erratum - Clarify the path of paravirt ops patches upstream - Add an optimization to avoid writing out AMX components to sigframes when former are in init state * tag 'x86_urgent_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu: Add Raptor Lake to Intel family x86/mce: Add errata workaround for Skylake SKX37 MAINTAINERS: Add some information to PARAVIRT_OPS entry x86/fpu: Optimize out sigframe xfeatures when in init state