dbus-sensors: nvmesensor: add 1s timeout for mi worker cv.wait The main thread `workerCv.notify_all` function may be called when mi worker thread is not in waiting state because there is a small time window between the `io.run()` and the `std::unique_lock<std::mutex> lock(mtx)` statement. This corner case is not a big issue after a subsystem is fully started because there is periodical health poll to notify the mi worker thread. However, for the `Status::Initializing` state, this corner case will cause mi worker thread never wake up to execute handler. There exist many solutions to fix this problem. The simplest (one line) fix should be adding 1 second timeout for the `cv.wait`. Tested: Reproduce script: https://paste.googleplex.com/5229001098919936 Before the fix, if we keep restarting nvmesensor for hundreds of loops, we will find that at least one subsystem keeps on printing: subsystem is intiatilzing, cancel the health poll processing health data has been cancelled After the fix, the same restarting test passed hundreds of loops without printing the message above. Google-Bug-Id: 310663382 Change-Id: I6493e87d1ee555a8fba581470da503b0c67bec27 Signed-off-by: Jinliang Wang <jinliangw@google.com> (cherry picked from commit d65f03ce6549a6be9cb0f373460a14bcca689fed)

commit: 90aea0cda731207d6b1664f670191bfacca2a623 [log] [tgz]
author: Jinliang Wang <jinliangw@google.com> Fri Nov 17 23:38:47 2023 -0800
committer: Jinliang Wang <jinliangw@google.com> Sun Nov 19 23:56:18 2023 -0800
tree: caa5dcd1c32c595fa084f0d665521e6485d55c9f
parent: 1944300d2965dfe43a84363cc4b78e03564bb9e6 [diff]
diff --git a/recipes-phosphor/sensors/dbus-sensors/0023-nvmesensor-add-1s-timeout-for-mi-worker-condition_va.patch b/recipes-phosphor/sensors/dbus-sensors/0023-nvmesensor-add-1s-timeout-for-mi-worker-condition_va.patch
new file mode 100644
index 0000000..6ab27c7
--- /dev/null
+++ b/recipes-phosphor/sensors/dbus-sensors/0023-nvmesensor-add-1s-timeout-for-mi-worker-condition_va.patch

@@ -0,0 +1,84 @@
+From 5747098fd523b5bf170efef4cc9a18aab1a885d0 Mon Sep 17 00:00:00 2001
+From: Jinliang Wang <jinliangw@google.com>
+Date: Fri, 17 Nov 2023 22:33:20 -0800
+Subject: [PATCH] nvmesensor: add 1s timeout for mi worker condition_variable
+ wait
+
+The main thread `workerCv.notify_all` function may be called
+when mi worker thread is not in waiting state because there is a
+small time window between the `io.run()` and the
+`std::unique_lock<std::mutex> lock(mtx)` statement.
+
+This corner case is not a big issue after a subsystem is fully
+started because there is periodical health poll to notify the
+mi worker thread.  However, for the `Status::Initializing` state,
+this corner case will cause mi worker thread never wake up to
+execute handler.
+
+There exist many solutions to fix this problem. The simplest one line
+fix should be adding 1 second timeout for the `cv.wait`.
+
+Tested:
+Before the fix, if we keep restarting nvmesensor for hundreds of loops,
+we will find at least one subsystem keeps on printing:
+
+  subsystem is intiatilzing, cancel the health poll
+  processing health data has been cancelled
+
+After the fix, the same restarting test passed hundreds of loops
+without printing the message above.
+
+Google-Bug-Id: b/310663382
+Change-Id: I5e1daa8b89f3dbd083a79b50c6c5093614791936
+Signed-off-by: Jinliang Wang <jinliangw@google.com>
+---
+ src/NVMeMi.cpp     | 3 ++-
+ src/NVMeSubsys.cpp | 4 ++--
+ 2 files changed, 4 insertions(+), 3 deletions(-)
+
+diff --git a/src/NVMeMi.cpp b/src/NVMeMi.cpp
+index babcc3c..d50d041 100644
+--- a/src/NVMeMi.cpp
++++ b/src/NVMeMi.cpp
+@@ -10,6 +10,7 @@
+ #include <cerrno>
+ #include <fstream>
+ #include <iostream>
++#include <chrono>
+ 
+ std::map<int, std::weak_ptr<NVMeMi::Worker>> NVMeMi::workerMap{};
+ 
+@@ -205,7 +206,7 @@ NVMeMi::Worker::Worker()
+             io.restart();
+             {
+                 std::unique_lock<std::mutex> lock(mtx);
+-                cv.wait(lock);
++                cv.wait_for(lock, std::chrono::seconds(1));
+                 if (stop)
+                 {
+                     // exhaust all tasks and exit
+diff --git a/src/NVMeSubsys.cpp b/src/NVMeSubsys.cpp
+index 1b889ea..96339cd 100644
+--- a/src/NVMeSubsys.cpp
++++ b/src/NVMeSubsys.cpp
+@@ -1112,7 +1112,7 @@ void NVMeSubsystem::attachCtrlVolume(uint16_t c, uint32_t ns)
+         return;
+     }
+     attached[c].insert(ns);
+-    std::cout << name << " attached insert " << c << " " << ns << "\n";
++    std::cerr << name << " attached insert " << c << " " << ns << std::endl;
+     controllers[c].first->updateAssociation();
+ }
+ 
+@@ -1129,7 +1129,7 @@ void NVMeSubsystem::detachCtrlVolume(uint16_t c, uint32_t ns)
+         return;
+     }
+     attached[c].erase(ns);
+-    std::cout << name << " attached erase " << c << " " << ns << "\n";
++    std::cerr << name << " attached erase " << c << " " << ns << std::endl;
+     controllers[c].first->updateAssociation();
+ }
+ 
+-- 
+2.43.0.rc0.421.g78406f8d94-goog
+

diff --git a/recipes-phosphor/sensors/dbus-sensors_%.bbappend b/recipes-phosphor/sensors/dbus-sensors_%.bbappend
index 0b78e04..eaa07d4 100644
--- a/recipes-phosphor/sensors/dbus-sensors_%.bbappend
+++ b/recipes-phosphor/sensors/dbus-sensors_%.bbappend

@@ -91,6 +91,7 @@
   file://0020-nvmesensor-add-primary-info-for-plugins.patch \
   file://0021-nvmesensor-use-processSecondaryControllerList-to-rep.patch \
   file://0022-nvmesensor-refactor-createVolume-and-createNamespace.patch \
+  file://0023-nvmesensor-add-1s-timeout-for-mi-worker-condition_va.patch \
 "
 PACKAGECONFIG[nvmesensor] = "-Dnvme=enabled, -Dnvme=disabled, libnvme"
 SYSTEMD_SERVICE:${PN} += "${@bb.utils.contains('PACKAGECONFIG', 'nvmesensor', \
commit	90aea0cda731207d6b1664f670191bfacca2a623	[log] [tgz]
author	Jinliang Wang <jinliangw@google.com>	Fri Nov 17 23:38:47 2023 -0800
committer	Jinliang Wang <jinliangw@google.com>	Sun Nov 19 23:56:18 2023 -0800
tree	caa5dcd1c32c595fa084f0d665521e6485d55c9f
parent	1944300d2965dfe43a84363cc4b78e03564bb9e6 [diff]