path: root/drivers/char/random.c
Age | Commit message | Author | Files | Lines
2022-03-22 | Revert "random: block in /dev/urandom" | Linus Torvalds | 1 | -17/+55
This reverts commit 6f98a4bfee72c22f50aedb39fb761567969865fe. It turns out we still can't do this. Way too many platforms that don't have any real source of randomness at boot and no jitter entropy because they don't even have a cycle counter. As reported by Guenter Roeck: "This causes a large number of qemu boot test failures for various architectures (arm, m68k, microblaze, sparc32, xtensa are the ones I observed). Common denominator is that boot hangs at 'Saving random seed:'" This isn't hugely unexpected - we tried it, it failed, so now we'll revert it. Link: https://lore.kernel.org/all/20220322155820.GA1745955@roeck-us.net/ Reported-and-bisected-by: Guenter Roeck <linux@roeck-us.net> Cc: Jason Donenfeld <Jason@zx2c4.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22 | Merge tag 'for-5.18/block-2022-03-18' of git://git.kernel.dk/linux-block | Linus Torvalds | 1 | -1/+1
Pull block updates from Jens Axboe:
 - BFQ cleanups and fixes (Yu, Zhang, Yahu, Paolo)
 - blk-rq-qos completion fix (Tejun)
 - blk-cgroup merge fix (Tejun)
 - Add offline error return value to distinguish it from an IO error on the device (Song)
 - IO stats fixes (Zhang, Christoph)
 - blkcg refcount fixes (Ming, Yu)
 - Fix for indefinite dispatch loop softlockup (Shin'ichiro)
 - blk-mq hardware queue management improvements (Ming)
 - sbitmap dead code removal (Ming, John)
 - Plugging merge improvements (me)
 - Show blk-crypto capabilities in sysfs (Eric)
 - Multiple delayed queue run improvement (David)
 - Block throttling fixes (Ming)
 - Start deprecating auto module loading based on dev_t (Christoph)
 - bio allocation improvements (Christoph, Chaitanya)
 - Get rid of bio_devname (Christoph)
 - bio clone improvements (Christoph)
 - Block plugging improvements (Christoph)
 - Get rid of genhd.h header (Christoph)
 - Ensure drivers use appropriate flush helpers (Christoph)
 - Refcounting improvements (Christoph)
 - Queue initialization and teardown improvements (Ming, Christoph)
 - Misc fixes/improvements (Barry, Chaitanya, Colin, Dan, Jiapeng, Lukas, Nian, Yang, Eric, Chengming)
* tag 'for-5.18/block-2022-03-18' of git://git.kernel.dk/linux-block: (127 commits)
  block: cancel all throttled bios in del_gendisk()
  block: let blkcg_gq grab request queue's refcnt
  block: avoid use-after-free on throttle data
  block: limit request dispatch loop duration
  block/bfq-iosched: Fix spelling mistake "tenative" -> "tentative"
  sr: simplify the local variable initialization in sr_block_open()
  block: don't merge across cgroup boundaries if blkcg is enabled
  block: fix rq-qos breakage from skipping rq_qos_done_bio()
  block: flush plug based on hardware and software queue order
  block: ensure plug merging checks the correct queue at least once
  block: move rq_qos_exit() into disk_release()
  block: do more work in elevator_exit
  block: move blk_exit_queue into disk_release
  block: move q_usage_counter release into blk_queue_release
  block: don't remove hctx debugfs dir from blk_mq_exit_queue
  block: move blkcg initialization/destroy into disk allocation/release handler
  sr: implement ->free_disk to simplify refcounting
  sd: implement ->free_disk to simplify refcounting
  sd: delay calling free_opal_dev
  sd: call sd_zbc_release_disk before releasing the scsi_device reference
  ...
2022-03-13 | random: check for signal and try earlier when generating entropy | Jason A. Donenfeld | 1 | -2/+3
Rather than waiting a full second in an interruptible waiter before trying to generate entropy, try to generate entropy first and wait second. While waiting one second might give an extra second for getting entropy from elsewhere, we're already pretty late in the init process here, and whatever else is generating entropy will still continue to contribute. This has implications on signal handling: we call try_to_generate_entropy() from wait_for_random_bytes(), and wait_for_random_bytes() always uses wait_event_interruptible_timeout() when waiting, since it's called by userspace code in restartable contexts, where signals can pend. Since try_to_generate_entropy() now runs first, if a signal is pending, it's necessary for try_to_generate_entropy() to check for signals, since it won't hit the wait until after try_to_generate_entropy() has returned. And even before this change, when entering a busy loop in try_to_generate_entropy(), we should have been checking to see if any signals are pending, so that a process doesn't get stuck in that loop longer than expected. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-03-13 | random: reseed more often immediately after booting | Jason A. Donenfeld | 1 | -3/+25
In order to chip away at the "premature first" problem, we augment our existing entropy accounting with more frequent reseedings at boot. The idea is that at boot, we're getting entropy from various places, and we're not very sure which of early boot entropy is good and which isn't. Even when we're crediting the entropy, we're still not totally certain that it's any good. Since boot is the one time (aside from a compromise) that we have zero entropy, it's important that we shepherd entropy into the crng fairly often. At the same time, we don't want a "premature next" problem, whereby an attacker can brute force individual bits of added entropy. In lieu of going full-on Fortuna (for now), we can pick a simpler strategy of just reseeding more often during the first 5 minutes after boot. This is still bounded by the 256-bit entropy credit requirement, so we'll skip a reseeding if we haven't reached that, but in case entropy /is/ coming in, this ensures that it makes its way into the crng rather rapidly during these early stages. Ordinarily we reseed if the previous reseeding is 300 seconds old. This commit changes things so that for the first 600 seconds of boot time, we reseed if the previous reseeding is uptime / 2 seconds old. That means that we'll reseed at the very least double the uptime of the previous reseeding. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
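The reseed schedule described in this message can be sketched in userspace C. This is only an illustration under stated assumptions: the names are hypothetical, time is counted in whole seconds rather than jiffies, the 300 and 600 second figures are taken from the message, and any minimum interval the real code may enforce is ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/* Figures from the commit message; the identifiers are hypothetical. */
#define SKETCH_RESEED_INTERVAL_SECS 300u
#define SKETCH_EARLY_BOOT_SECS (2u * SKETCH_RESEED_INTERVAL_SECS)

/* How old the previous seed may get before a reseed is due. */
static uint64_t sketch_reseed_interval_secs(uint64_t uptime_secs)
{
	/* During the first 600 seconds, the allowed age is uptime / 2. */
	if (uptime_secs < SKETCH_EARLY_BOOT_SECS)
		return uptime_secs / 2;
	/* Afterwards, fall back to the ordinary 300 second interval. */
	return SKETCH_RESEED_INTERVAL_SECS;
}

/* Assumes last_reseed_secs <= uptime_secs. */
static bool sketch_seed_is_old(uint64_t uptime_secs, uint64_t last_reseed_secs)
{
	uint64_t age = uptime_secs - last_reseed_secs;

	return age > sketch_reseed_interval_secs(uptime_secs);
}
```

With these rules a seed taken at 40 seconds of uptime counts as old at 100 seconds (age 60 against an allowed 50), so early reseeds land roughly at doubling uptimes.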
2022-03-13 | random: make consistent usage of crng_ready() | Jason A. Donenfeld | 1 | -12/+7
Rather than sometimes checking `crng_init < 2`, we should always use the crng_ready() macro, so that should we change anything later, it's consistent. Additionally, that macro already has a likely() around it, which means we don't need to open code our own likely() and unlikely() annotations. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
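A minimal sketch of the pattern this message argues for, written as userspace C. Every identifier carries a `sketch_` prefix to mark it as a stand-in, not the kernel's own definition, and `__builtin_expect` assumes GCC or Clang.

```c
#include <stdbool.h>

/* Branch-prediction hint, as provided by GCC and Clang. */
#define sketch_likely(x) __builtin_expect(!!(x), 1)

/* Hypothetical stand-in for the init state: 0, 1, or 2 (fully seeded). */
static int sketch_crng_init;

/* One macro answers "is the crng ready?" and already carries the hint. */
#define sketch_crng_ready() (sketch_likely(sketch_crng_init > 1))

static bool sketch_may_use_crng(void)
{
	/* No open-coded `sketch_crng_init < 2`, no extra likely()/unlikely(). */
	return sketch_crng_ready();
}
```

If the readiness condition ever changes, only the macro needs to be touched.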
2022-03-13 | random: use SipHash as interrupt entropy accumulator | Jason A. Donenfeld | 1 | -39/+55
The current fast_mix() function is a piece of classic mailing list crypto, where it just sort of sprung up by an anonymous author without a lot of real analysis of what precisely it was accomplishing. As an ARX permutation alone, there are some easily searchable differential trails in it, and as a means of preventing malicious interrupts, it completely fails, since it xors new data into the entire state every time. It can't really be analyzed as a random permutation, because it clearly isn't, and it can't be analyzed as an interesting linear algebraic structure either, because it's also not that. There really is very little one can say about it in terms of entropy accumulation. It might diffuse bits, some of the time, maybe, we hope, I guess. But for the most part, it fails to accomplish anything concrete. As a reminder, the simple goal of add_interrupt_randomness() is to simply accumulate entropy until ~64 interrupts have elapsed, and then dump it into the main input pool, which uses a cryptographic hash. It would be nice to have something cryptographically strong in the interrupt handler itself, in case a malicious interrupt compromises a per-cpu fast pool within the 64 interrupts / 1 second window, and then inside of that same window somehow can control its return address and cycle counter, even if that's a bit far fetched. However, with a very CPU-limited budget, actually doing that remains an active research project (and perhaps there'll be something useful for Linux to come out of it). And while the abundance of caution would be nice, this isn't *currently* the security model, and we don't yet have a fast enough solution to make it our security model. Plus there's not exactly a pressing need to do that. (And for the avoidance of doubt, the actual cluster of 64 accumulated interrupts still gets dumped into our cryptographically secure input pool.) 
So, for now we are going to stick with the existing interrupt security model, which assumes that each cluster of 64 interrupt data samples is mostly non-malicious and not colluding with an infoleaker. With this as our goal, we have a few more choices, simply aiming to accumulate entropy, while discarding the least amount of it. We know from <https://eprint.iacr.org/2019/198> that random oracles, instantiated as computational hash functions, make good entropy accumulators and extractors, which is the justification for using BLAKE2s in the main input pool. As mentioned, we don't have that luxury here, but we also don't have the same security model requirements, because we're assuming that there aren't malicious inputs. A pseudorandom function instance can approximately behave like a random oracle, provided that the key is uniformly random. But since we're not concerned with malicious inputs, we can pick a fixed key, which is not secret, knowing that "nature" won't interact with a sufficiently chosen fixed key by accident. So we pick a PRF with a fixed initial key, and accumulate into it continuously, dumping the result every 64 interrupts into our cryptographically secure input pool. For this, we make use of SipHash-1-x on 64-bit and HalfSipHash-1-x on 32-bit, which are already in use in the kernel's hsiphash family of functions and achieve the same performance as the function they replace. It would be nice to do two rounds, but we don't exactly have the CPU budget handy for that, and one round alone is already sufficient. As mentioned, we start with a fixed initial key (zeros is fine), and allow SipHash's symmetry breaking constants to turn that into a useful starting point. Also, since we're dumping the result (or half of it on 64-bit so as to tax our hash function the same amount on all platforms) into the cryptographically secure input pool, there's no point in finalizing SipHash's output, since it'll wind up being finalized by something much stronger. 
This means that all we need to do is use the ordinary round function word-by-word, as normal SipHash does. Simplified, the flow is as follows: Initialize: siphash_state_t state; siphash_init(&state, key={0, 0, 0, 0}); Update (accumulate) on interrupt: siphash_update(&state, interrupt_data_and_timing); Dump into input pool after 64 interrupts: blake2s_update(&input_pool, &state, sizeof(state) / 2); The result of all of this is that the security model is unchanged from before -- we assume non-malicious inputs -- yet we now implement that model with a stronger argument. I would like to emphasize, again, that the purpose of this commit is to improve the existing design, by making it analyzable, without changing any fundamental assumptions. There may well be value down the road in changing up the existing design, using something cryptographically strong, or simply using a ring buffer of samples rather than having a fast_mix() at all, or changing which and how much data we collect each interrupt so that we can use something linear, or a variety of other ideas. This commit does not invalidate the potential for those in the future. For example, in the future, if we're able to characterize the data we're collecting on each interrupt, we may be able to inch toward information theoretic accumulators. <https://eprint.iacr.org/2021/523> shows that `s = ror32(s, 7) ^ x` and `s = ror64(s, 19) ^ x` make very good accumulators for 2-monotone distributions, which would apply to timestamp counters, like random_get_entropy() or jiffies, but would not apply to our current combination of the two values, or to the various function addresses and register values we mix in. Alternatively, <https://eprint.iacr.org/2021/1002> shows that max-period linear functions with no non-trivial invariant subspace make good extractors, used in the form `s = f(s) ^ x`. 
However, this only works if the input data is both identical and independent, and obviously a collection of address values and counters fails; so it goes with theoretical papers. Future directions here may involve trying to characterize more precisely what we actually need to collect in the interrupt handler, and building something specific around that. However, as mentioned, the morass of data we're gathering at the interrupt handler presently defies characterization, and so we use SipHash for now, which works well and performs well. Cc: Theodore Ts'o <tytso@mit.edu> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
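The initialize / update / dump flow described in this message can be sketched in userspace C for the 64-bit case. This is an illustration, not the kernel's code: the `sketch_` names are hypothetical, the constants and round function are those of the public SipHash specification, and the "dump" simply copies half of the raw state where the kernel would feed it to BLAKE2s.

```c
#include <stdint.h>
#include <string.h>

struct sketch_sip_state { uint64_t v[4]; };

/* Half of the 32-byte state is handed on, as the message describes. */
enum { SKETCH_DUMP_BYTES = 2 * sizeof(uint64_t) };

/* Rotate left; only called with 0 < n < 64. */
static inline uint64_t sketch_rol64(uint64_t x, unsigned int n)
{
	return (x << n) | (x >> (64 - n));
}

/* One SipHash round over the four state words. */
static void sketch_sipround(uint64_t v[4])
{
	v[0] += v[1]; v[1] = sketch_rol64(v[1], 13); v[1] ^= v[0]; v[0] = sketch_rol64(v[0], 32);
	v[2] += v[3]; v[3] = sketch_rol64(v[3], 16); v[3] ^= v[2];
	v[0] += v[3]; v[3] = sketch_rol64(v[3], 21); v[3] ^= v[0];
	v[2] += v[1]; v[1] = sketch_rol64(v[1], 17); v[1] ^= v[2]; v[2] = sketch_rol64(v[2], 32);
}

/* All-zero key: only SipHash's symmetry-breaking constants remain. */
static void sketch_sip_init(struct sketch_sip_state *s)
{
	s->v[0] = 0x736f6d6570736575ULL;
	s->v[1] = 0x646f72616e646f6dULL;
	s->v[2] = 0x6c7967656e657261ULL;
	s->v[3] = 0x7465646279746573ULL;
}

/* Absorb one word with a single round (the "1" in SipHash-1-x). */
static void sketch_sip_update(struct sketch_sip_state *s, uint64_t word)
{
	s->v[3] ^= word;
	sketch_sipround(s->v);
	s->v[0] ^= word;
}

/* No finalization: the caller's stronger hash takes care of that. */
static void sketch_sip_dump(const struct sketch_sip_state *s,
			    uint8_t out[SKETCH_DUMP_BYTES])
{
	memcpy(out, s->v, SKETCH_DUMP_BYTES);
}
```

A caller would run `sketch_sip_update()` once per sampled word on each interrupt and call `sketch_sip_dump()` after 64 interrupts.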
2022-03-13 | random: provide notifier for VM fork | Jason A. Donenfeld | 1 | -0/+15
Drivers such as WireGuard need to learn when VMs fork in order to clear sessions. This commit provides a simple notifier_block for that, with a register and unregister function. When no VM fork detection is compiled in, this turns into a no-op, similar to how the power notifier works. Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-03-13 | random: replace custom notifier chain with standard one | Jason A. Donenfeld | 1 | -48/+19
We previously rolled our own randomness readiness notifier, which only has two users in the whole kernel. Replace this with a more standard atomic notifier block that serves the same purpose with less code. Also unexport the symbols, because no modules use it, only unconditional builtins. The only drawback is that it's possible for a notification handler returning the "stop" code to prevent further processing, but given that there are only two users, and that we're unexporting this anyway, that doesn't seem like a significant drawback for the simplification we receive here. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
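The drawback mentioned here, that a handler returning the "stop" code prevents later handlers from running, can be illustrated with a toy notifier chain in userspace C. This is a sketch under assumptions: it is not the kernel's notifier API, all names and return codes are hypothetical, and it does no locking.

```c
#include <stddef.h>

enum { SKETCH_NOTIFY_OK = 0, SKETCH_NOTIFY_STOP = 1 };

struct sketch_notifier {
	int (*call)(struct sketch_notifier *nb, unsigned long action, void *data);
	struct sketch_notifier *next;
};

/* Append at the tail so handlers run in registration order. */
static void sketch_register(struct sketch_notifier **head, struct sketch_notifier *nb)
{
	while (*head)
		head = &(*head)->next;
	nb->next = NULL;
	*head = nb;
}

/* Unlink nb if it is on the chain; otherwise do nothing. */
static void sketch_unregister(struct sketch_notifier **head, struct sketch_notifier *nb)
{
	for (; *head; head = &(*head)->next) {
		if (*head == nb) {
			*head = nb->next;
			nb->next = NULL;
			return;
		}
	}
}

/* Run handlers in order; a STOP return ends the walk early. */
static int sketch_call_chain(struct sketch_notifier *head, unsigned long action, void *data)
{
	int ret = SKETCH_NOTIFY_OK;

	for (; head; head = head->next) {
		ret = head->call(head, action, data);
		if (ret == SKETCH_NOTIFY_STOP)
			break;
	}
	return ret;
}

/* Example handlers: both count their invocation in *(int *)data. */
static int sketch_count_cb(struct sketch_notifier *nb, unsigned long action, void *data)
{
	(void)nb; (void)action;
	++*(int *)data;
	return SKETCH_NOTIFY_OK;
}

static int sketch_stop_cb(struct sketch_notifier *nb, unsigned long action, void *data)
{
	(void)nb; (void)action;
	++*(int *)data;
	return SKETCH_NOTIFY_STOP;
}
```

Registering `sketch_stop_cb` ahead of `sketch_count_cb` means the second handler never sees the event, which is exactly the trade-off the message accepts.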
2022-03-13 | random: do not export add_vmfork_randomness() unless needed | Jason A. Donenfeld | 1 | -0/+4
Since add_vmfork_randomness() is only called from vmgenid.o, we can guard it in CONFIG_VMGENID, similarly to how we do with add_disk_randomness() and CONFIG_BLOCK. If we ever have multiple things calling into add_vmfork_randomness(), we can add another shared Kconfig symbol for that, but for now, this is good enough. Even though add_vmfork_randomness() is a pretty small function, removing it means that there are only calls to crng_reseed(false) and none to crng_reseed(true), which means the compiler can constant propagate the false, removing branches from crng_reseed() and its descendants. Additionally, we don't even need the symbol to be exported if CONFIG_VMGENID is not a module, so conditionalize that too. Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-03-13 | random: add mechanism for VM forks to reinitialize crng | Jason A. Donenfeld | 1 | -15/+35
When a VM forks, we must immediately mix in additional information to the stream of random output so that two forks or a rollback don't produce the same stream of random numbers, which could have catastrophic cryptographic consequences. This commit adds a simple API, add_vmfork_randomness(), for that, by force reseeding the crng. This has the added benefit of also draining the entropy pool and setting its timer back, so that any old entropy that was there prior -- which could have already been used by a different fork, or generally gone stale -- does not contribute to the accounting of the next 256 bits. Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Jann Horn <jannh@google.com> Cc: Eric Biggers <ebiggers@google.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-03-13 | random: don't let 644 read-only sysctls be written to | Jason A. Donenfeld | 1 | -2/+9
We leave around these old sysctls for compatibility, and we keep them "writable" for compatibility, but even after writing, we should keep reporting the same value. This is consistent with how userspaces tend to use sysctl_random_write_wakeup_bits, writing to it, and then later reading from it and using the value. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
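The behaviour described here, a sysctl that accepts writes but keeps reporting the same value, might look roughly like the following userspace C sketch. The struct and function are hypothetical and do not reproduce the kernel's sysctl handler signature.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a compatibility-only sysctl entry. */
struct sketch_ro_sysctl {
	int value;
};

/*
 * Returns 0 on success. On a read, *io receives the stored value.
 * On a write, the input is accepted and silently discarded.
 */
static int sketch_ro_sysctl_access(const struct sketch_ro_sysctl *ctl,
				   bool write, int *io)
{
	if (write)
		return 0; /* report success, change nothing */
	*io = ctl->value;
	return 0;
}
```

A userspace tool that writes a value and later reads it back therefore sees no error, only the unchanged original value.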
2022-03-13 | random: give sysctl_random_min_urandom_seed a more sensible value | Jason A. Donenfeld | 1 | -2/+2
This isn't used by anything or anywhere, but we can't delete it due to compatibility. So at least give it the correct value of what it's supposed to be instead of a garbage one. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-03-13 | random: block in /dev/urandom | Jason A. Donenfeld | 1 | -55/+17
This topic has come up countless times, and usually doesn't go anywhere. This time I thought I'd bring it up with a slightly narrower focus, updated for some developments over the last three years: we finally can make /dev/urandom always secure, in light of the fact that our RNG is now always seeded. Ever since Linus' 50ee7529ec45 ("random: try to actively add entropy rather than passively wait for it"), the RNG does a haveged-style jitter dance around the scheduler, in order to produce entropy (and credit it) for the case when we're stuck in wait_for_random_bytes(). However you feel about the Linus Jitter Dance is beside the point: it's been there for three years and usually gets the RNG initialized in a second or so. As a matter of fact, this is what happens currently when people use getrandom(). It's already there and working, and most people have been using it for years without realizing. So, given that the kernel has grown this mechanism for seeding itself from nothing, and that this procedure happens pretty fast, maybe there's no point any longer in having /dev/urandom give insecure bytes. In the past we didn't want the boot process to deadlock, which was understandable. But now, in the worst case, a second goes by, and the problem is resolved. It seems like maybe we're finally at a point when we can get rid of the infamous "urandom read hole". The one slight drawback is that the Linus Jitter Dance relies on random_get_entropy() being implemented. The first lines of try_to_generate_entropy() are: stack.now = random_get_entropy(); if (stack.now == random_get_entropy()) return; On most platforms, random_get_entropy() is simply aliased to get_cycles(). The number of machines without a cycle counter or some other implementation of random_get_entropy() in 2022, which can also run a mainline kernel, and at the same time have both a broken and out-of-date userspace that relies on /dev/urandom never blocking at boot is thought to be exceedingly low.
And to be clear: those museum pieces without cycle counters will continue to run Linux just fine, and even /dev/urandom will be operable just like before; the RNG just needs to be seeded first through the usual means, which should already be the case now. On systems that really do want unseeded randomness, we already offer getrandom(GRND_INSECURE), which is in use by, e.g., systemd for seeding their hash tables at boot. Nothing in this commit would affect GRND_INSECURE, and it remains the means of getting those types of random numbers. This patch goes a long way toward eliminating a long overdue userspace crypto footgun. After several decades of endless user confusion, we will finally be able to say, "use any single one of our random interfaces and you'll be fine. They're all the same. It doesn't matter." And that, I think, is really something. Finally all of those blog posts and disagreeing forums and contradictory articles will all become correct about whatever they happened to recommend, and along with it, a whole class of vulnerabilities eliminated. With very minimal downside, we're finally in a position where we can make this change. Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Nick Hu <nickhu@andestech.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Michal Simek <monstr@monstr.eu> Cc: Borislav Petkov <bp@alien8.de> Cc: Guo Ren <guoren@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Joshua Kinard <kumba@gentoo.org> Cc: David Laight <David.Laight@aculab.com> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Eric Biggers <ebiggers@google.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Lennart Poettering <mzxreary@0pointer.de> Cc: Konstantin Ryabitsev <konstantin@linuxfoundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
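The two-read check quoted from try_to_generate_entropy() is the crux of the drawback discussed in this message: when the counter never advances, jitter-based seeding bails out immediately. A hedged userspace C sketch of that check follows; the counter sources are hypothetical stand-ins for random_get_entropy(), and all names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical counter source standing in for random_get_entropy(). */
typedef uint64_t (*sketch_counter_fn)(void);

/* Two back-to-back reads must differ for jitter collection to be viable. */
static bool sketch_counter_advances(sketch_counter_fn read_counter)
{
	uint64_t now = read_counter();

	return now != read_counter();
}

/* A platform without a cycle counter: every read returns the same value. */
static uint64_t sketch_stuck_counter(void)
{
	return 0;
}

/* A platform with a working counter: every read returns a new value. */
static uint64_t sketch_ticking_counter(void)
{
	static uint64_t ticks;

	return ++ticks;
}
```

On a machine that behaves like `sketch_stuck_counter()`, a blocking read has nothing to generate entropy from and must wait for the RNG to be seeded by other means.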
2022-02-28 | random: do crng pre-init loading in worker rather than irq | Jason A. Donenfeld | 1 | -46/+19
Taking spinlocks from IRQ context is generally problematic for PREEMPT_RT. That is, in part, why we take trylocks instead. However, a spin_trylock() is also problematic since another spin_lock() invocation can potentially PI-boost the wrong task, as the spin_trylock() is invoked from an IRQ-context, so the task on CPU (random task or idle) is not the actual owner. Additionally, by deferring the crng pre-init loading to the worker, we can use the cryptographic hash function rather than xor, which is perhaps a meaningful difference when considering this data has only been through the relatively weak fast_mix() function. The biggest downside of this approach is that the pre-init loading is now deferred until later, which means things that need random numbers after interrupts are enabled, but before workqueues are running -- or before this particular worker manages to run -- are going to get into trouble. Hopefully in the real world, this window is rather small, especially since this code won't run until 64 interrupts have occurred. Cc: Sultan Alsawaf <sultan@kerneltoast.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Eric Biggers <ebiggers@kernel.org> Cc: Theodore Ts'o <tytso@mit.edu> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-28 | random: unify cycles_t and jiffies usage and types | Jason A. Donenfeld | 1 | -29/+27
random_get_entropy() returns a cycles_t, not an unsigned long, which is sometimes 64 bits on various 32-bit platforms, including x86. Conversely, jiffies is always unsigned long. This commit fixes things to use cycles_t for fields that use random_get_entropy(), named "cycles", and unsigned long for fields that use jiffies, named "now". It's also good to mix in a cycles_t and a jiffies in the same way for both add_device_randomness and add_timer_randomness, rather than using xor in one case. Finally, we unify the order of these volatile reads, always reading the more precise cycles counter, and then jiffies, so that the cycle counter is as close to the event as possible. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-28 | random: cleanup UUID handling | Jason A. Donenfeld | 1 | -16/+13
Rather than hard coding various lengths, we can use the right constants. Strings should be `char *` while buffers should be `u8 *`. Rather than have a nonsensical and unused maxlength, just remove it. Finally, use snprintf instead of sprintf, just out of good hygiene. As well, remove the old comment about returning a binary UUID via the binary sysctl syscall. That syscall was removed from the kernel in 5.5, and actually, the "uuid_strategy" function and related infrastructure for even serving it via the binary sysctl syscall was removed with 894d2491153a ("sysctl drivers: Remove dead binary sysctl support") back in 2.6.33. Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
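The hygiene points in this message (named length constants, `u8` for buffers, `char` for strings, snprintf instead of sprintf) can be sketched in userspace C. The `SKETCH_` constants mirror the 16-byte binary and 36-character string lengths of a UUID; the function name is hypothetical and the formatting is done by hand, without any kernel helper.

```c
#include <stdint.h>
#include <stdio.h>

#define SKETCH_UUID_SIZE 16       /* bytes in a binary UUID */
#define SKETCH_UUID_STRING_LEN 36 /* characters, excluding the trailing NUL */

/*
 * Format a binary UUID as lowercase hex with dashes.
 * Returns the number of characters written, excluding the NUL.
 */
static int sketch_format_uuid(char out[SKETCH_UUID_STRING_LEN + 1],
			      const uint8_t uuid[SKETCH_UUID_SIZE])
{
	/* The size argument bounds the write, unlike plain sprintf(). */
	return snprintf(out, SKETCH_UUID_STRING_LEN + 1,
			"%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
			"%02x%02x%02x%02x%02x%02x",
			uuid[0], uuid[1], uuid[2], uuid[3],
			uuid[4], uuid[5], uuid[6], uuid[7],
			uuid[8], uuid[9], uuid[10], uuid[11],
			uuid[12], uuid[13], uuid[14], uuid[15]);
}
```

Sizing the output buffer from the same constant that bounds the snprintf call keeps the two from drifting apart.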
2022-02-24 | random: only wake up writers after zap if threshold was passed | Jason A. Donenfeld | 1 | -1/+1
The only time that we need to wake up /dev/random writers on RNDCLEARPOOL/RNDZAPPOOL is when we're changing from a value that is greater than or equal to POOL_MIN_BITS to zero, because if we're changing from below POOL_MIN_BITS to zero, the writers are already unblocked. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-24 | random: round-robin registers as ulong, not u32 | Jason A. Donenfeld | 1 | -3/+3
When the interrupt handler does not have a valid cycle counter, it calls get_reg() to read a register from the irq stack, in round-robin. Currently it does this assuming that registers are 32-bit. This is _probably_ the case, and probably all platforms without cycle counters are in fact 32-bit platforms. But maybe not, and either way, it's not quite correct. This commit fixes that to deal with `unsigned long` rather than `u32`. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
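A userspace C sketch of the round-robin read described here, stepping through a register block in `unsigned long` units so that it behaves the same on 32-bit and 64-bit targets. The three-word `sketch_regs` struct is a hypothetical stand-in for the saved register set, and the function name is illustrative.

```c
#include <stddef.h>

/* Hypothetical saved-register block; real ones are per-architecture. */
struct sketch_regs {
	unsigned long r[3];
};

/* Return the next register in round-robin order, advancing *reg_idx. */
static unsigned long sketch_get_reg(unsigned int *reg_idx,
				    const struct sketch_regs *regs)
{
	const unsigned long *ptr = (const unsigned long *)regs;
	unsigned int idx = *reg_idx;

	if (regs == NULL)
		return 0;
	/* Wrap based on how many unsigned longs fit, not how many u32s. */
	if (idx >= sizeof(*regs) / sizeof(unsigned long))
		idx = 0;
	ptr += idx++;
	*reg_idx = idx;
	return *ptr;
}
```

Indexing in `u32` units instead would read only half of each register on a 64-bit platform, which is the inaccuracy the message points out.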
2022-02-21 | random: clear fast pool, crng, and batches in cpuhp bring up | Jason A. Donenfeld | 1 | -15/+47
For the irq randomness fast pool, rather than having to use expensive atomics, which were visibly the most expensive thing in the entire irq handler, simply take care of the extreme edge case of resetting count to zero in the cpuhp online handler, just after workqueues have been reenabled. This simplifies the code a bit and lets us use vanilla variables rather than atomics, and performance should be improved. As well, very early on when the CPU comes up, while interrupts are still disabled, we clear out the per-cpu crng and its batches, so that it always starts with fresh randomness. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Sultan Alsawaf <sultan@kerneltoast.com> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-21 | random: check for crng_init == 0 in add_device_randomness() | Jason A. Donenfeld | 1 | -1/+1
This has no real functional change, as crng_pre_init_inject() (and before that, crng_slow_init()) always checks for == 0, not >= 2. So correct the outer unlocked check to reflect that. Before, this used crng_ready(), which was not correct. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-21 | random: unify early init crng load accounting | Jason A. Donenfeld | 1 | -54/+58
crng_fast_load() and crng_slow_load() have different semantics:
 - crng_fast_load() xors and accounts with crng_init_cnt.
 - crng_slow_load() hashes and doesn't account.
However add_hwgenerator_randomness() can afford to hash (it's called from a kthread), and it should account. Additionally, ones that can afford to hash don't need to take a trylock but can take a normal lock. So, we combine these into one function, crng_pre_init_inject(), which allows us to control these in a uniform way. This will make it simpler later to simplify this all down when the time comes for that. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-21 | random: do not take pool spinlock at boot | Jason A. Donenfeld | 1 | -3/+3
Since rand_initialize() is run while interrupts are still off and nothing else is running, we don't need to repeatedly take and release the pool spinlock, especially in the RDSEED loop. Reviewed-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-21 | random: defer fast pool mixing to worker | Jason A. Donenfeld | 1 | -14/+49
On PREEMPT_RT, it's problematic to take spinlocks from hard irq handlers. We can fix this by deferring to a workqueue the dumping of the fast pool into the input pool. We accomplish this with some careful rules on fast_pool->count: - When it's incremented to >= 64, we schedule the work. - If the top bit is set, we never schedule the work, even if >= 64. - The worker is responsible for setting it back to 0 when it's done. There are two small issues around using workqueues for this purpose that we work around. The first issue is that mix_interrupt_randomness() might be migrated to another CPU during CPU hotplug. This issue is rectified by checking that it hasn't been migrated (after disabling irqs). If it has been migrated, then we set the count to zero, so that when the CPU comes online again, it can requeue the work. As part of this, we switch to using an atomic_t, so that the increment in the irq handler doesn't wipe out the zeroing if the CPU comes back online while this worker is running. The second issue is that, though relatively minor in effect, we probably want to make sure we get a consistent view of the pool onto the stack, in case it's interrupted by an irq while reading. To do this, we don't reenable irqs until after the copy. There are only 18 instructions between the cli and sti, so this is a pretty tiny window. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Sultan Alsawaf <sultan@kerneltoast.com> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
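The three rules on fast_pool->count listed in this message can be sketched as a small single-threaded state machine in userspace C. Two points are assumptions and not stated in the message: that the interrupt path itself sets the top bit when it schedules the work, and the `sketch_` names. The sketch also uses a plain integer where this commit uses an atomic_t.

```c
#include <stdbool.h>

#define SKETCH_MIX_INFLIGHT (1u << 31) /* top bit: work already queued */
#define SKETCH_BATCH 64u               /* interrupts per dump */

struct sketch_fast_pool {
	unsigned int count;
};

/* Called per interrupt; returns true when the worker should be scheduled. */
static bool sketch_on_interrupt(struct sketch_fast_pool *pool)
{
	unsigned int new_count = ++pool->count;

	/* Top bit set: never schedule again, even if >= 64. */
	if (new_count & SKETCH_MIX_INFLIGHT)
		return false;
	if (new_count < SKETCH_BATCH)
		return false;
	/* Assumption: mark the work as in flight at scheduling time. */
	pool->count |= SKETCH_MIX_INFLIGHT;
	return true;
}

/* The worker resets the count once the pool has been dumped. */
static void sketch_worker_done(struct sketch_fast_pool *pool)
{
	pool->count = 0;
}
```

In this model the 64th interrupt schedules the worker exactly once, and further interrupts only accumulate until `sketch_worker_done()` runs.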
2022-02-21 | random: rewrite header introductory comment | Jason A. Donenfeld | 1 | -162/+21
Now that we've re-documented the various sections, we can remove the outdated text here and replace it with a high-level overview. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-21 | random: group sysctl functions | Jason A. Donenfeld | 1 | -6/+31
This pulls all of the sysctl-focused functions into the sixth labeled section. No functional changes. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-21 | random: group userspace read/write functions | Jason A. Donenfeld | 1 | -48/+77
This pulls all of the userspace read/write-focused functions into the fifth labeled section. No functional changes. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-21random: group entropy collection functionsJason A. Donenfeld1-164/+206
This pulls all of the entropy collection-focused functions into the fourth labeled section. No functional changes. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-21random: group entropy extraction functionsJason A. Donenfeld1-107/+109
This pulls all of the entropy extraction-focused functions into the third labeled section. No functional changes. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-21random: group crng functionsJason A. Donenfeld1-382/+410
This pulls all of the crng-focused functions into the second labeled section. No functional changes. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-21random: group initialization wait functionsJason A. Donenfeld1-161/+172
This pulls all of the readiness waiting-focused functions into the first labeled section. No functional changes. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-21random: remove whitespace and reorder includesJason A. Donenfeld1-2/+1
This is purely cosmetic. Future work involves figuring out which of these headers we need and which we don't. Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-21random: introduce drain_entropy() helper to declutter crng_reseed()Jason A. Donenfeld1-13/+23
In preparation for separating responsibilities, break out the entropy count management part of crng_reseed() into its own function. No functional changes. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-21random: deobfuscate irq u32/u64 contributionsJason A. Donenfeld1-21/+28
In the irq handler, we fill out 16 bytes differently on 32-bit and 64-bit platforms, and for 32-bit vs 64-bit cycle counters, which doesn't always correspond with the bitness of the platform. Whether or not you like this strangeness, it is a matter of fact. But it might not be a fact you well realized until now, because the code that loaded the irq info into 4 32-bit words was quite confusing. Instead, this commit makes everything explicit by having separate (compile-time) branches for 32-bit and 64-bit types. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-21random: add proper SPDX headerJason A. Donenfeld1-36/+1
Convert the current license into the SPDX notation of "(GPL-2.0 OR BSD-3-Clause)". This infers GPL-2.0 from the text "ALTERNATIVELY, this product may be distributed under the terms of the GNU General Public License, in which case the provisions of the GPL are required INSTEAD OF the above restrictions" and it infers BSD-3-Clause from the verbatim BSD 3 clause license in the file. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-21random: remove unused tracepointsJason A. Donenfeld1-27/+3
These explicit tracepoints aren't really used and show signs of aging. It's work to keep these up to date, and before I attempted to keep them up to date, they weren't up to date, which indicates that they're not really used. These days there are better ways of introspecting anyway. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-21random: remove ifdef'd out interrupt benchJason A. Donenfeld1-40/+0
With tools like kbench9000 giving more finegrained responses, and this basically never having been used ever since it was initially added, let's just get rid of this. There *is* still work to be done on the interrupt handler, but this really isn't the way it's being developed. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-21random: tie batched entropy generation to base_crng generationJason A. Donenfeld1-21/+8
Now that we have an explicit base_crng generation counter, we don't need a separate one for batched entropy. Rather, we can just move the generation forward every time we change crng_init state or update the base_crng key. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-21random: fix locking for crng_init in crng_reseed()Dominik Brodowski1-3/+6
crng_init is protected by primary_crng->lock. Therefore, we need to hold this lock when increasing crng_init to 2. As we shouldn't hold this lock for too long, only hold it for those parts which require protection. Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-21random: zero buffer after reading entropy from userspaceJason A. Donenfeld1-3/+8
This buffer may contain entropic data that shouldn't stick around longer than needed, so zero out the temporary buffer at the end of write_pool(). Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Reviewed-by: Jann Horn <jannh@google.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
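A minimal userspace sketch of the pattern described above: stage input in a temporary buffer, mix it, and wipe the buffer before returning. Everything here is an assumption for illustration: wipe_sketch() is a stand-in for the kernel's memzero_explicit() helper, mix_sketch() is a toy accumulator rather than the real pool mixer, and the scratch buffer is passed in by the caller only so that the wipe can be observed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for memzero_explicit(): volatile stores the compiler must keep. */
static void wipe_sketch(void *buf, size_t len)
{
	volatile unsigned char *p = buf;

	while (len--)
		*p++ = 0;
}

/* Hypothetical toy mixer; NOT the real pool mixing function. */
static void mix_sketch(uint32_t *acc, const unsigned char *data, size_t len)
{
	for (size_t i = 0; i < len; i++)
		*acc = (*acc * 31u) + data[i];
}

/* Feed src through scratch in chunks, then zero scratch at the end. */
static void write_pool_sketch(uint32_t *acc, const unsigned char *src,
			      size_t len, unsigned char *scratch,
			      size_t scratch_len)
{
	if (!scratch_len)
		return;

	while (len) {
		size_t n = len < scratch_len ? len : scratch_len;

		memcpy(scratch, src, n);
		mix_sketch(acc, scratch, n);
		src += n;
		len -= n;
	}
	wipe_sketch(scratch, scratch_len);
}
```

A plain memset() on a buffer that is never read again may be removed by the optimizer, which is why the sketch writes through a volatile pointer.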
2022-02-21random: remove outdated INT_MAX >> 6 check in urandom_read()Jason A. Donenfeld1-2/+1
In 79a8468747c5 ("random: check for increase of entropy_count because of signed conversion"), a number of checks were added around what values were passed to account(), because account() was doing fancy fixed point fractional arithmetic, and a user had some ability to pass large values directly into it. One of the things in that commit was limiting those values to INT_MAX >> 6. The first >> 3 was for bytes to bits, and the next >> 3 was for bits to 1/8 fractional bits. However, for several years now, urandom reads no longer touch entropy accounting, and so this check serves no purpose. The current flow is: urandom_read_nowarn()-->get_random_bytes_user()-->chacha20_block() Of course, we don't want that size_t to be truncated when adding it into the ssize_t. But we arrive at urandom_read_nowarn() in the first place either via ordinary fops, which limits reads to MAX_RW_COUNT, or via getrandom() which limits reads to INT_MAX. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Reviewed-by: Jann Horn <jannh@google.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
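The arithmetic behind the old INT_MAX >> 6 limit can be checked with a short sketch. The helper names are hypothetical; the only thing taken from the text above is that a byte count is shifted left by 3 to get bits and by another 3 to get 1/8 fractional bits, and that the result had to fit in an int.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* bytes -> bits is << 3, bits -> 1/8 fractional bits is another << 3. */
static long long to_fractional_bits(size_t nbytes)
{
	return (long long)nbytes << 6;
}

/* The old bound: the largest byte count whose << 6 still fits in an int. */
static bool fits_old_accounting(size_t nbytes)
{
	return nbytes <= (size_t)(INT_MAX >> 6);
}
```

One byte past the bound, to_fractional_bits() already exceeds INT_MAX, which is the signed overflow the removed check guarded against while reads still touched entropy accounting.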
2022-02-21random: make more consistent use of integer typesJason A. Donenfeld1-68/+55
We've been using a flurry of int, unsigned int, size_t, and ssize_t. Let's unify all of this into size_t where it makes sense, as it does in most places, and leave ssize_t for return values with possible errors. In addition, keeping with the convention of other functions in this file, functions that are dealing with raw bytes now take void * consistently instead of a mix of that and u8 *, because much of the time we're actually passing some other structure that is then interpreted as bytes by the function. We also take the opportunity to fix the outdated and incorrect comment in get_random_bytes_arch(). Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Reviewed-by: Jann Horn <jannh@google.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-21random: use hash function for crng_slow_load()Jason A. Donenfeld1-27/+15
Since we have a hash function that's really fast, and the goal of crng_slow_load() is reportedly to "touch all of the crng's state", we can just hash the old state together with the new state and call it a day. This way we don't need to reason about another LFSR or worry about various attacks there. This code is only ever used at early boot and then never again. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-21random: use simpler fast key erasure flow on per-cpu keysJason A. Donenfeld1-166/+229
Rather than the clunky NUMA full ChaCha state system we had prior, this commit is closer to the original "fast key erasure RNG" proposal from <https://blog.cr.yp.to/20170723-random.html>, by simply treating ChaCha keys on a per-cpu basis. All entropy is extracted to a base crng key of 32 bytes. This base crng has a birthdate and a generation counter. When we go to take bytes from the crng, we first check if the birthdate is too old; if it is, we reseed per usual. Then we start working on a per-cpu crng. This per-cpu crng makes sure that it has the same generation counter as the base crng. If it doesn't, it does fast key erasure with the base crng key and uses the output as its new per-cpu key, and then updates its local generation counter. Then, using this per-cpu state, we do ordinary fast key erasure. Half of this first block is used to overwrite the per-cpu crng key for the next call -- this is the fast key erasure RNG idea -- and the other half, along with the ChaCha state, is returned to the caller. If the caller desires more than this remaining half, it can generate more ChaCha blocks, unlocked, using the now detached ChaCha state that was just returned. Crypto-wise, this is more or less what we were doing before, but this simply makes it more explicit and ensures that we always have backtrack protection by not playing games with a shared block counter. The flow looks like this:

──extract()──► base_crng.key ◄──memcpy()───┐
                   │                       │
                   └──chacha()──────┬─► new_base_key
                                    └─► crngs[n].key ◄──memcpy()───┐
                                              │                    │
                                              └──chacha()───┬─► new_key
                                                            └─► random_bytes
                                                                      │
                                                                      └────►

There are a few hairy details around early init. Just as was done before, prior to having gathered enough entropy, crng_fast_load() and crng_slow_load() dump bytes directly into the base crng, and when we go to take bytes from the crng, in that case, we're doing fast key erasure with the base crng rather than the fast unlocked per-cpu crngs. 
This is fine as that's only the state of affairs during very early boot; once the crng initializes we never use these paths again. In the process of all this, the APIs into the crng become a bit simpler: we have get_random_bytes(buf, len) and get_random_bytes_user(buf, len), which both do what you'd expect. All of the details of fast key erasure and per-cpu selection happen only in a very short critical section of crng_make_state(), which selects the right per-cpu key, does the fast key erasure, and returns a local state to the caller's stack. So, we no longer have a need for a separate backtrack function, as this happens all at once here. The API then allows us to extend backtrack protection to batched entropy without really having to do much at all. The result is a bit simpler than before and has fewer foot guns. The init time state machine also gets a lot simpler as we don't need to wait for workqueues to come online and do deferred work. And the multi-core performance should be increased significantly, by virtue of having hardly any locking on the fast path. Cc: Theodore Ts'o <tytso@mit.edu> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Jann Horn <jannh@google.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
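The core "fast key erasure" step described above can be sketched in userspace C. This is a simplified illustration, not the kernel's crng_make_state(): the struct and function names are hypothetical, there is no per-cpu selection, locking, generation counter or reseeding, the counter and nonce are fixed at zero, and which half of the block becomes the next key is a choice made by this sketch. The block function follows the standard ChaCha20 round structure.

```c
#include <stdint.h>
#include <string.h>

#define ROTL32(v, n) (((v) << (n)) | ((v) >> (32 - (n))))
#define QR(a, b, c, d) do {                  \
	a += b; d ^= a; d = ROTL32(d, 16);   \
	c += d; b ^= c; b = ROTL32(b, 12);   \
	a += b; d ^= a; d = ROTL32(d, 8);    \
	c += d; b ^= c; b = ROTL32(b, 7);    \
} while (0)

/* One 64-byte ChaCha20 block with counter and nonce fixed at zero. */
static void chacha20_block_sketch(const uint32_t key[8], uint32_t out[16])
{
	uint32_t in[16] = { 0x61707865, 0x3320646e, 0x79622d32, 0x6b206574 };
	uint32_t x[16];
	int i;

	memcpy(&in[4], key, 32); /* words 12..15 stay zero */
	memcpy(x, in, sizeof(x));
	for (i = 0; i < 10; i++) {
		/* column round */
		QR(x[0], x[4], x[8], x[12]);
		QR(x[1], x[5], x[9], x[13]);
		QR(x[2], x[6], x[10], x[14]);
		QR(x[3], x[7], x[11], x[15]);
		/* diagonal round */
		QR(x[0], x[5], x[10], x[15]);
		QR(x[1], x[6], x[11], x[12]);
		QR(x[2], x[7], x[8], x[13]);
		QR(x[3], x[4], x[9], x[14]);
	}
	for (i = 0; i < 16; i++)
		out[i] = x[i] + in[i];
}

/* Hypothetical stand-in for one crng: just a 32-byte key. */
struct crng_sketch {
	uint32_t key[8];
};

/* Draw 32 bytes and immediately overwrite the key that produced them. */
static void fke_draw(struct crng_sketch *c, uint8_t out[32])
{
	uint32_t block[16];
	volatile uint32_t *w = block;
	int i;

	chacha20_block_sketch(c->key, block);
	memcpy(c->key, &block[0], 32); /* first half: key for the next call */
	memcpy(out, &block[8], 32);    /* second half: returned to the caller */
	for (i = 0; i < 16; i++)
		w[i] = 0;              /* wipe the temporary block */
}
```

Because the key is replaced before fke_draw() returns, a later compromise of the crng_sketch state does not reveal the key that generated earlier output, which is the backtrack protection the commit message refers to.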
2022-02-21random: absorb fast pool into input pool after fast loadJason A. Donenfeld1-0/+4
During crng_init == 0, we never credit entropy in add_interrupt_randomness(), but instead dump it directly into the primary_crng. That's fine, except for the fact that we then wind up throwing away that entropy later when we switch to extracting from the input pool and xoring into (and later in this series overwriting) the primary_crng key. The two other early init sites -- add_hwgenerator_randomness()'s use of crng_fast_load() and add_device_randomness()'s use of crng_slow_load() -- always additionally give their inputs to the input pool. But not add_interrupt_randomness(). This commit fixes that shortcoming by calling mix_pool_bytes() after crng_fast_load() in add_interrupt_randomness(). That's partially verboten on PREEMPT_RT, where it implies taking spinlock_t from an IRQ handler. But this also only happens during early boot and then never again after that. Plus it's a trylock so it has the same considerations as calling crng_fast_load(), which we're already using. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Reviewed-by: Eric Biggers <ebiggers@google.com> Suggested-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-21random: do not xor RDRAND when writing into /dev/randomJason A. Donenfeld1-12/+2
Continuing the reasoning of "random: ensure early RDSEED goes through mixer on init", we don't want RDRAND interacting with anything without going through the mixer function, as a backdoored CPU could presumably cancel out data during an xor, which it'd have a harder time doing when being forced through a cryptographic hash function. There's actually no need at all to be calling RDRAND in write_pool(), because before we extract from the pool, we always do so with 32 bytes of RDSEED hashed in at that stage. Xoring at this stage is needless and introduces a minor liability. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-21random: ensure early RDSEED goes through mixer on initJason A. Donenfeld1-11/+5
Continuing the reasoning of "random: use RDSEED instead of RDRAND in entropy extraction" from this series, at init time we also don't want to be xoring RDSEED directly into the crng. Instead it's safer to put it into our entropy collector and then re-extract it, so that it goes through a hash function with preimage resistance. As a matter of hygiene, we also order these now so that the RDSEED bytes are hashed in first, followed by the bytes that are likely more predictable (e.g. utsname()). Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-21random: inline leaves of rand_initialize()Jason A. Donenfeld1-57/+33
This is a preparatory commit for the following one. We simply inline the various functions that rand_initialize() calls that have no other callers. The compiler was doing this anyway before. Doing this will allow us to reorganize this after. We can then move the trust_cpu and parse_trust_cpu definitions a bit closer to where they're actually used, which makes the code easier to read. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-21random: get rid of secondary crngsJason A. Donenfeld1-172/+53
As the comment said, this is indeed a "hack". Since it was introduced, it's been a constant state machine nightmare, with lots of subtle early boot issues and a wildly complex set of machinery to keep everything in sync. Rather than continuing to play whack-a-mole with this approach, this commit simply removes it entirely. This commit is preparation for "random: use simpler fast key erasure flow on per-cpu keys" in this series, which introduces a simpler (and faster) mechanism to accomplish the same thing. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-21random: use RDSEED instead of RDRAND in entropy extractionJason A. Donenfeld1-13/+9
When /dev/random was directly connected with entropy extraction, without any expansion stage, extract_buf() was called for every 10 bytes of data read from /dev/random. For that reason, RDRAND was used rather than RDSEED. At the same time, crng_reseed() was still only called every 5 minutes, so there RDSEED made sense. Those olden days were also a time when the entropy collector did not use a cryptographic hash function, which meant most bets were off in terms of real preimage resistance. For that reason too it didn't matter _that_ much whether RDSEED was mixed in before or after entropy extraction; both choices were sort of bad. But now we have a cryptographic hash function at work, and with that we get real preimage resistance. We also now only call extract_entropy() every 5 minutes, rather than every 10 bytes. This allows us to do two important things. First, we can switch to using RDSEED in extract_entropy(), as Dominik suggested. Second, we can ensure that RDSEED input always goes into the cryptographic hash function with other things before being used directly. This eliminates a category of attacks in which the CPU knows the current state of the crng and knows that we're going to xor RDSEED into it, and so it computes a malicious RDSEED. By going through our hash function, it would require the CPU to compute a preimage on the fly, which isn't going to happen. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Suggested-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
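The xor attack described above is easy to see in a toy model. The sketch below is an illustration with hypothetical names and a 64-bit stand-in for the crng state; it shows only the xor side of the argument, not the hashed construction that replaces it.

```c
#include <stdint.h>

/* The old-style mixing step: hardware output is xored into the state. */
static uint64_t mix_by_xor(uint64_t state, uint64_t hw_output)
{
	return state ^ hw_output;
}

/*
 * A malicious CPU that knows the current state can pick its "random"
 * output so that the xor lands on any value it wants.
 */
static uint64_t malicious_hw_output(uint64_t known_state, uint64_t desired)
{
	return known_state ^ desired;
}
```

For any state s and target t, mix_by_xor(s, malicious_hw_output(s, t)) equals t. When the hardware output instead passes through a preimage-resistant hash together with other inputs, choosing an input that yields a given result would require computing a preimage, which is the point the commit message makes.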
2022-02-21random: fix locking in crng_fast_load()Dominik Brodowski1-2/+3
crng_init is protected by primary_crng->lock, so keep holding that lock when incrementing crng_init from 0 to 1 in crng_fast_load(). The call to pr_notice() can wait until the lock is released; this code path cannot be reached twice, as crng_fast_load() aborts early if crng_init > 0. Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>