summaryrefslogtreecommitdiff
path: root/Documentation/process/handling-regressions.rst
AgeCommit message (Collapse)AuthorFilesLines
2023-06-09docs: handling-regressions: rework section about fixing proceduresThorsten Leemhuis1-82/+126
This basically rewrites the 'Prioritize work on fixing regressions' section of Documentation/process/handling-regressions.rst for various reasons. Among them: some things were too demanding, some didn't align well with the usual workflows, and some apparently were not clear enough -- and of course a few things were missing that would be good to have in there. Linus for example recently stated that regressions introduced during the past year should be handled similarly to regressions from the current cycle, if it's a clear fix with no semantic subtlety. His exact wording[1] didn't fit well into the text structure, but the author tried to stick close to the apparent intention. It was a noble goal from the original author to state "[prevent situations that might force users to] continue running an outdated and thus potentially insecure kernel version for more than two weeks after a regression's culprit was identified"; this directly led to the goal "fix regression in mainline within one week, if the issue made it into a stable/longterm kernel", because the stable team needs time to pick up and prepare a new release. But apparently all that was a bit too demanding. That "one week" target for example doesn't align well with the usual habits of the subsystem maintainers, which normally send their fixes to Linus once a week; and it doesn't align too well with stable/longterm releases either, which often enter a -rc phase on Mondays or Tuesdays and then are released two to three days later. And asking developers to create, review, and mainline fixes within one week might be too much to ask for in general. Hence tone the general goal down to three weeks and use an approach that better aligns with the usual merging and release habits. While at it, also make the rules of thumb a bit easier to follow by grouping them by topic (e.g. generic things, timing, procedures, ...). Also add text for a few cases where recent discussions showed they need covering. Among them are multiple points that better explain the relations to stable and longterm kernels and the team that manages them; they and the group seperators are the primary reason why this whole section sadly grew somewhat in the rewrite. The group about those relations led to one addition the author came up with without any precedent from Linus: the text now tells developers to add a stable tag for any regression that made it into a proper mainline release during the past 12 months. This is meant to ensure the stable team will definitely notice any fixes for recent regressions. That includes those introduced shortly before a new mainline release and found right after it; without such a rule the stable team might miss the fix, which then would only reach users after weeks or months with later releases. Note, the aspect "Do not consider regressions from the current cycle as something that can wait till the cycle's end [...]" might look like an addition, but was kinda was in the old text as well -- but only indirectly. That apparently was too subtle, as many developers seem to assume waiting till the end of the cycle is fine (even for build fixes). In practice this was especially problematic when a cause of a regression made it into a proper release (either directly or through a backport). A revert performed by Linus shortly before the 6.3 release illustrated that[2], as the developer of the culprit had been willing to revert the culprit about three weeks earlier already -- but didn't do so when a fix came into sight and a maintainer suggested it can wait. Due to that the issue in the end plagued users of 6.2.y at least two weeks longer than necessary, as the fix in the end didn't become ready in time. This issue in fact could have been resolved one or two additional weeks earlier, if the developer had reverted the culprit shortly after it had been identified (which even the old version of the text suggest to do in such cases). [1] https://lore.kernel.org/all/CAHk-=wis_qQy4oDNynNKi5b7Qhosmxtoj1jxo5wmB6SRUwQUBQ@mail.gmail.com/ [2] https://lore.kernel.org/all/CAHk-=wgD98pmSK3ZyHk_d9kZ2bhgN6DuNZMAJaV0WTtbkf=RDw@mail.gmail.com/ CC: Linus Torvalds <torvalds@linux-foundation.org> CC: Greg KH <gregkh@linuxfoundation.org> CC: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Thorsten Leemhuis <linux@leemhuis.info> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/6971680941a5b7b9cb0c2839c75b5cc4ddb2d162.1684139586.git.linux@leemhuis.info Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-02-24docs: *-regressions.rst: explain how quickly issues should be handledThorsten Leemhuis1-0/+87
Add a section with a few rules of thumb about how quickly developers should address regressions to Documentation/process/handling-regressions.rst; additionally, add a short paragraph about this to the companion document Documentation/admin-guide/reporting-regressions.rst as well. The rules of thumb were written after studying the quotes from Linus found in handling-regressions.rst and especially influenced by statements like "Users are literally the _only_ thing that matters" and "without users, your program is not a program, it's a pointless piece of code that you might as well throw away". The author interpreted those in perspective to how the various Linux kernel series are maintained currently and what those practices might mean for users running into a regression on a small or big kernel update. That for example lead to the paragraph starting with "Aim to get fixes for regressions mainlined within one week after identifying the culprit, if the regression was introduced in a stable/longterm release or the devel cycle for the latest mainline release". Some might see this as pretty high bar, but on the other hand something like that is needed to not leave users out in the cold for too long -- which can quickly happen when updating to the latest stable series, as the previous one is normally stamped "End of Life" about three or four weeks after a new mainline release. This makes a lot of users switch during this timeframe. Any of them thus risk running into regressions not promptly fixed; even worse, once the previous stable series is EOLed for real, users that face a regression might be left with only three options: (1) continue running an outdated and thus potentially insecure kernel version from an abandoned stable series (2) run the kernel with the regression (3) downgrade to an earlier longterm series still supported This is better avoided, as (1) puts users and their data in danger, (2) will only be possible if it's a minor regression that doesn't interfere with booting or serious usage, and (3) might be regression itself or impossible on the particular machine, as the users might require drivers or features only introduced after the latest longterm series branched of. In the end this lead to the aforementioned "Aim to fix regression within one week" part. It's also the reason for the "Try to resolve any regressions introduced in the current development cycle before its end.". Signed-off-by: Thorsten Leemhuis <linux@leemhuis.info> CC: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Link: https://lore.kernel.org/r/a7b717b52c0d54cdec9b6daf56ed6669feddee2c.1644994117.git.linux@leemhuis.info Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-02-24docs: add two documents about regression handlingThorsten Leemhuis1-0/+659
Create two documents explaining various aspects around regression handling and tracking; one is aimed at users, the other targets developers. The texts among others describes the first rule of Linux kernel development and what it means in practice. They also explain what a regression actually is and how to report one properly. Both texts additionally provide a brief introduction to the bot the kernel's regression tracker uses to facilitate the work, but mention the use is optional. To sum things up, provide a few quotes from Linus in the document for developers to show how serious we take regressions. Signed-off-by: Thorsten Leemhuis <linux@leemhuis.info> Link: https://lore.kernel.org/r/34e56d3588f22d7e0b4d635ef9c9c3b33ca4ac04.1644994117.git.linux@leemhuis.info Signed-off-by: Jonathan Corbet <corbet@lwn.net>