summaryrefslogtreecommitdiff
path: root/meta-ibm
diff options
context:
space:
mode:
authorAndrew Jeffery <andrew@aj.id.au>2021-09-14 07:48:19 +0300
committerAndrew Geissler <geissonator@yahoo.com>2021-09-14 15:07:12 +0300
commite6866cac1855ace301003166f997177706de9acf (patch)
tree6b59d98406c03a78d661e8b17f2eb867ebaa56c6 /meta-ibm
parenta63d7f98ea1bd2aa47ea056751a63f5e56c128e0 (diff)
downloadopenbmc-e6866cac1855ace301003166f997177706de9acf.tar.xz
Revert "Override pldm response time out value"
This reverts commit bcc5f6b0f24e8ad0b03b8217e88a19ff3002c084. bcc5f6b0f24e ("Override pldm response time out value") talks about timeouts due to the endpoint taking some time to respond. However, the net effect of the change is the response to a retried request races against the instance ID expiration interval because the retry interval is effectively equal to the instance ID expiration interval once we account for some timer slack. This is demonstrated by the following strace on pldmd, where we can see a retried request go out, followed by the report that the request failed, further followed by the response to the request coming in. Note the values are string-literal-escaped-octal, so the [ 0x80 0x00 0x03 ... ] byte encoding of the GetPLDMVersions request appears as "\200\0\3...": ``` ... 11:56:25.046173 socket(AF_UNIX, SOCK_SEQPACKET, 0) = 3 ... 11:56:25.183936 connect(3, {sa_family=AF_UNIX, sun_path=@"mctp-mux"}, 11) = 0 11:56:25.190994 write(3, "\1", 1) = 1 ... 11:56:25.195272 sendmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\t\1", iov_len=2}, {iov_base="\200\0\3\0\0\0\0\1\0", iov_len=9}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, 0) = 11 ... 11:56:30.202298 sendmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\t\1", iov_len=2}, {iov_base="\200\0\3\0\0\0\0\1\0", iov_len=9}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, 0) = 11 11:56:30.202820 gettid() = 1918 11:56:30.203029 timerfd_settime64(6, TFD_TIMER_ABSTIME, {it_interval={tv_sec=0, tv_nsec=0}, it_value={tv_sec=3848, tv_nsec=61124978641398328}}, NULL) = 0 11:56:30.203286 epoll_wait(4, [{EPOLLIN, {u32=14373240, u64=14373240}}], 14, 0) = 1 11:56:30.203509 clock_gettime64(CLOCK_REALTIME, {tv_sec=1629806190, tv_nsec=203587376}) = 0 11:56:30.203687 clock_gettime64(CLOCK_MONOTONIC, {tv_sec=3843, tv_nsec=523046301}) = 0 11:56:30.203844 clock_gettime64(CLOCK_BOOTTIME, {tv_sec=3843, tv_nsec=523206110}) = 0 11:56:30.204049 write(2, "Response not received for the re"..., 59) = 59 11:56:30.204427 write(2, " EID = ", 7) = 7 11:56:30.204745 write(2, "9", 1) = 1 11:56:30.205047 write(2, " INSTANCE_ID = ", 15) = 15 11:56:30.205389 write(2, "0", 1) = 1 11:56:30.205719 write(2, " TYPE = ", 8) = 8 11:56:30.205997 write(2, "0", 1) = 1 11:56:30.206266 write(2, " COMMAND = ", 11) = 11 11:56:30.206576 write(2, "3", 1) = 1 11:56:30.206893 write(2, "\n", 1) = 1 11:56:30.209402 write(2, "Failed to receive response for ", 31) = 31 11:56:30.209814 write(2, "getPLDMVersion command, Host see"..., 46) = 46 11:56:30.210969 gettid() = 1918 11:56:30.211171 timerfd_settime64(6, TFD_TIMER_ABSTIME, {it_interval={tv_sec=0, tv_nsec=8549172174085160960}, it_value={tv_sec=0, tv_nsec=8566510441663037440}}, NULL) = 0 11:56:30.211406 epoll_wait(4, [{EPOLLIN, {u32=14373240, u64=14373240}}], 14, 0) = 1 11:56:30.211640 clock_gettime64(CLOCK_REALTIME, {tv_sec=1629806190, tv_nsec=211720512}) = 0 11:56:30.211825 clock_gettime64(CLOCK_MONOTONIC, {tv_sec=3843, tv_nsec=531188829}) = 0 11:56:30.211983 clock_gettime64(CLOCK_BOOTTIME, {tv_sec=3843, tv_nsec=531335706}) = 0 11:56:30.212143 recv(3, NULL, 0, MSG_PEEK|MSG_TRUNC) = 15 11:56:30.212366 recv(3, "\t\1\0\0\3\0\0\0\0\0\5\361\361\360\0", 15, 0) = 15 ``` That is, at 11:56:30.202298 we send out the retry for the request initiated at 11:56:25.195272 and the reply arrives back at 11:56:30.212366, but in between we've already cancelled the request handler due to the instance ID interval timer expiring. Resolve this by removing the explicit configuration of the response-time-out build parameter setting the per-request response time to 4.8 seconds, setting its value back to the default of two seconds. Anecdotal testing of with the following shell script produced no failures (by inspection of the journal as the iterations executed): ``` for i in `seq 1 30`; do echo $i; ( systemctl stop pldmd mctp-demux && echo 1e78902c.kcs > /sys/bus/platform/drivers/ast-kcs-bmc/unbind && sleep 1 && echo 1e78902c.kcs > /sys/bus/platform/drivers/ast-kcs-bmc/bind && systemctl start pldmd && sleep 15 ) || break; done ``` Change-Id: Ide125d686e79376b412fca0105449c8bef722cfe Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Diffstat (limited to 'meta-ibm')
-rw-r--r--meta-ibm/recipes-phosphor/pldm/pldm_%.bbappend4
1 files changed, 0 insertions, 4 deletions
diff --git a/meta-ibm/recipes-phosphor/pldm/pldm_%.bbappend b/meta-ibm/recipes-phosphor/pldm/pldm_%.bbappend
index 273df37b0..95e3e1b09 100644
--- a/meta-ibm/recipes-phosphor/pldm/pldm_%.bbappend
+++ b/meta-ibm/recipes-phosphor/pldm/pldm_%.bbappend
@@ -1,10 +1,6 @@
PACKAGECONFIG += "oem-ibm"
PACKAGECONFIG[oem-ibm] = "-Doem-ibm=enabled, -Doem-ibm=disabled, , squashfs-tools"
-EXTRA_OEMESON:append = " \
- -Dresponse-time-out=4800 \
-"
-
SYSTEMD_SERVICE:${PN} += "${@bb.utils.contains('PACKAGECONFIG', 'oem-ibm', \
'pldm-create-phyp-nvram.service \
pldm-create-phyp-nvram-cksum.service \