Increase StartLimitIntervalSec to 240s

The DefaultTimeoutStartSec is 90s. If a service hits this timeout repeatedly, StartLimitIntervalSec must be set to cover this worst-case scenario so that the service which is timing out does not get restarted continuously. This means it needs to be set to:

  StartLimitBurst*DefaultTimeoutStartSec + StartLimitBurst*<worst case processing time> (30s)

which currently would be 2x90 + 2x30 = 240s.

Ref: systemd-system.conf

Tested: Verified that if the 90s timeout is hit in a service, it is no longer restarted after 2 attempts.

Resolves openbmc/openbmc#3379

(From meta-phosphor rev: ee52526c80eaca65a581c01bcf703861ec1a80b6)

Change-Id: I8ff4febeb46a746dd3e5e625c5bdc3735963799b
Signed-off-by: Andrew Geissler <geissonator@yahoo.com>
Signed-off-by: Brad Bishop <bradleyb@fuzziesquirrel.com>
author: Andrew Geissler <geissonator@yahoo.com> 2018-09-17 18:36:08 +0300
committer: Brad Bishop <bradleyb@fuzziesquirrel.com> 2018-09-24 14:43:49 +0300
commit: 0ee690fcb712718ca7ad179ec1a29a9803c80ed6 (patch)
tree: caad817235ecb34529d49913fa0c599b42baaafb /meta-phosphor
parent: edb619229fb988cd72a919e4068d820c632a7f5e (diff)
download: openbmc-0ee690fcb712718ca7ad179ec1a29a9803c80ed6.tar.xz
1 files changed, 9 insertions, 5 deletions
diff --git a/meta-phosphor/recipes-phosphor/systemd-policy/phosphor-systemd-policy/service-restart-policy.conf b/meta-phosphor/recipes-phosphor/systemd-policy/phosphor-systemd-policy/service-restart-policy.conf
index 54516c2d4..17c9e6bea 100644
--- a/meta-phosphor/recipes-phosphor/systemd-policy/phosphor-systemd-policy/service-restart-policy.conf
+++ b/meta-phosphor/recipes-phosphor/systemd-policy/phosphor-systemd-policy/service-restart-policy.conf
@@ -13,19 +13,23 @@
 # restarting once does the job or restarting all 5 times does not help
 # and we just end up hitting the 5 limit anyway.
 #
-# - Change the StartLimitIntervalSec to 30s
+# - Change the StartLimitIntervalSec to 240s
 # The BMC CPU performance is already challenged. When a service is
 # failing and a core dump is being generated and collected into a dump,
 # it's even more challenged. Recent failures have shown situations where
 # the service does not fail again until 15-20 seconds after the initial
 # failure which means the default of 10s for this results in the service
-# being restarted indefinitely. Change this to 30s to only allow a service
-# to be restarted StartLimitBurst times within a 30s interval before
-# being put in a permanent fail state.
+# being restarted indefinitely.
+# Another issue that has cropped up recently is that the DefaultTimeoutStartSec
+# is 90s. If a service is hitting this timeout repeatedly then there
+# is a similar issue as noted above. Because of this, the StartLimitIntervalSec
+# needs to be StartLimitBurst*DefaultTimeoutStartSec +
+# StartLimitBurst* worst case processing time (30s)
+# which currently would be 2x90 + 2x30
 #
 # See systemd-system.conf(5) for details on the conf files
 
 [Manager]
 DefaultRestartSec=1s
 DefaultStartLimitBurst=2
-DefaultStartLimitIntervalSec=30s
+DefaultStartLimitIntervalSec=240s
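
The sizing rule described in the commit can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the change itself: the function name and parameter names are hypothetical, and the 30s worst-case processing time is the figure quoted in the commit message.

```python
# Hypothetical helper illustrating the sizing rule from the commit message:
#   StartLimitBurst * DefaultTimeoutStartSec
#     + StartLimitBurst * <worst case processing time>
def start_limit_interval_sec(burst, timeout_start_sec, worst_case_processing_sec):
    """Return the interval (in seconds) that covers `burst` timed-out starts."""
    return burst * timeout_start_sec + burst * worst_case_processing_sec


# Values taken from the commit: DefaultStartLimitBurst=2,
# DefaultTimeoutStartSec=90s, worst case processing time=30s.
interval = start_limit_interval_sec(burst=2, timeout_start_sec=90,
                                    worst_case_processing_sec=30)
print(f"DefaultStartLimitIntervalSec={interval}s")
```

If DefaultStartLimitBurst or the timeout changes later, the interval would presumably need to be recomputed with the same rule.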