| # Sensor Debug Guide |
| |
| go/tlbmc-sensor-debug |
| |
| <!--* |
| # Document freshness: For more information, see go/fresh-source. |
| freshness: { owner: 'tlbmc-dev' reviewed: '2025-07-08' } |
| *--> |
| |
| By default, tlBMC store creation fails if any sensor fails to be created. With |
| traditional dbus-sensors daemons, by contrast, sensor creation failures are |
| silent: the affected objects are simply not created. This results in partial |
| data being reported and hides missing sensors in the Redfish tree unless a |
| client queries them directly. tlBMC's behavior makes it possible to verify that |
| all TlbmcOwned sensors declared in Entity Manager (EM) config files are |
| successfully created and served by tlBMC. |
| |
| Note that once tlBMC is disabled, readings for a sensor may still be missing if |
| `SkipDbusRead` is configured for that sensor in the EM config files. This guide |
| describes common methods for finding out which sensors fail to be created in |
| this situation. |
| |
| [TOC] |
| |
| ## Check tlBMC Store Creation Failure Logs |
| |
| To verify that tlBMC failed while creating a sensor, check the bmcweb logs. Run |
| the following command and look for output similar to: |
| |
| ``` |
| root@HOSTNAME:~# journalctl -u bmcweb | grep -i tlbmc |
| ... |
| Jul 06 09:09:54 {HOSTNAME} bmcweb[1955026]: E0706 09:09:54.072554 1955026 webserver_main_setup.hpp:232] Cannot create tlBMC store!! Error: INTERNAL: Failed to find hwmon under /sys/bus/i2c/devices/i2c-1/1-001a - Disabling tlBMC |
| ``` |
| |
| This log indicates that tlBMC failed to create a sensor object, likely because |
| the sensor could not be initialized, which may point to a real hardware failure. |
| |
| To identify the failing sensor, take the bus and address from the path in the |
| log, which has the form `/sys/bus/i2c/devices/i2c-{BUS}/{BUS}-00{ADDRESS}`. In |
| the example above, `i2c-1/1-001a` means bus 1, address `0x1a`. Then look up the |
| sensor configured at that bus/address in the EM config files for this platform. |
| Verifying that the bus/address is correct for the intended sensor is an |
| important first step in debugging. |
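
As a rough illustration of the lookup described above, the helper below splits a device path from the log into its bus and address. The function name `parse_i2c_path` is hypothetical, and the sketch assumes the path ends in the `{BUS}-00{ADDRESS}` form shown in this guide.

```shell
# Hypothetical helper: split a sysfs I2C device path into bus and address.
# Assumes the last path component looks like "{BUS}-00{ADDRESS}".
parse_i2c_path() {
  dev="${1##*/}"      # last path component, e.g. "1-001a"
  bus="${dev%%-*}"    # text before the dash, e.g. "1"
  addr="${dev#*-}"    # text after the dash, e.g. "001a"
  addr="${addr#00}"   # drop the "00" prefix, leaving e.g. "1a"
  echo "${bus} 0x${addr}"
}
```

For the log line in the example, `parse_i2c_path /sys/bus/i2c/devices/i2c-1/1-001a` prints `1 0x1a`, which can then be searched for in the platform's EM config files.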
| |
| ## Verify hwmon File Presence and Value |
| |
| The expected hwmon files must be present on the machine for sensor readings to |
| work as intended. To check, run the following command; the output should be |
| similar to: |
| |
| ``` |
| root@HOSTNAME:~# ls /sys/bus/i2c/devices/i2c-{BUS}/{BUS}-00{ADDRESS} |
| driver hwmon modalias name of_node pec subsystem uevent |
| ``` |
| |
| If the `hwmon` directory shown above is missing, there may be a larger issue, |
| such as a real hardware failure (see "Verify Real Hardware Failures" below). |
| |
| If the `hwmon` directory is present, list its contents to find the files that |
| hold the sensor's label and value: |
| |
| ``` |
| root@HOSTNAME:~# ls /sys/bus/i2c/devices/i2c-{BUS}/{BUS}-00{ADDRESS}/hwmon/hwmon{*}/ |
| curr1_crit curr3_input in1_label in3_lcrit_alarm power2_label temp2_crit_alarm |
| curr1_crit_alarm curr3_label in1_lcrit in4_crit power3_input temp2_input |
| curr1_input curr3_max in1_lcrit_alarm in4_crit_alarm power3_label temp2_lcrit |
| curr1_label curr3_max_alarm in1_max in4_input power4_input temp2_lcrit_alarm |
| curr1_max curr4_crit in1_max_alarm in4_label power4_label temp2_max |
| curr1_max_alarm curr4_crit_alarm in1_min in4_lcrit subsystem temp2_max_alarm |
| curr2_crit curr4_input in1_min_alarm in4_lcrit_alarm temp1_crit temp3_crit |
| curr2_crit_alarm curr4_label in2_input name temp1_crit_alarm temp3_crit_alarm |
| curr2_input curr4_max in2_label of_node temp1_input temp3_input |
| curr2_label curr4_max_alarm in3_crit power1_alarm temp1_lcrit temp3_lcrit |
| curr2_max device in3_crit_alarm power1_input temp1_lcrit_alarm temp3_lcrit_alarm |
| curr2_max_alarm in1_crit in3_input power1_label temp1_max temp3_max |
| curr3_crit in1_crit_alarm in3_label power2_alarm temp1_max_alarm temp3_max_alarm |
| curr3_crit_alarm in1_input in3_lcrit power2_input temp2_crit uevent |
| ``` |
| |
| To find the file that corresponds to the sensor you are interested in, `cat` |
| the `{sensor_type}{*}_label` files and look for the one that matches the label |
| in the EM config. For instance, if the sensor name in the EM config is |
| `vout1_Name`, the value `vout1` will be present in one of `in1_label`, |
| `in2_label`, `in3_label`, or `in4_label`. The sensor reading is in the |
| corresponding `in{*}_input` file. Verify that it contains a well-formed numeric |
| value, as expected for the sensor. |
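
Checking each label by hand gets tedious on devices with many channels. The sketch below prints every label next to its raw reading for one hwmon directory. The function name `dump_hwmon_labels` is hypothetical, and the sketch assumes each `*_label` file has a sibling `*_input` file, as in the listing above.

```shell
# Hypothetical helper: print "<label file>: <label> = <raw reading>" for every
# labelled channel under the hwmon directory given as the first argument.
dump_hwmon_labels() {
  hwmon_dir="$1"
  for label in "$hwmon_dir"/*_label; do
    [ -e "$label" ] || continue            # the glob matched nothing
    input="${label%_label}_input"          # e.g. in1_label -> in1_input
    if value="$(cat "$input" 2>/dev/null)"; then
      echo "${label##*/}: $(cat "$label") = ${value}"
    else
      echo "${label##*/}: $(cat "$label") = <read failed>"
    fi
  done
}
```

A possible invocation is `dump_hwmon_labels /sys/bus/i2c/devices/i2c-{BUS}/{BUS}-00{ADDRESS}/hwmon/hwmon*`. A `<read failed>` entry marks a channel whose input file is missing or returns an error on read. Values are raw sysfs readings, which the kernel hwmon interface reports in scaled units such as millivolts for `in*_input`.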
| |
| ## Enable allow_sensor_creation_failure in tlBMC Configuration |
| |
| The method described above can only diagnose a single sensor at a time. To |
| surface all failures at once, the tlBMC central configuration provides the |
| `allow_sensor_creation_failure` setting, which bypasses sensor creation |
| failures while still providing useful debugging information. |
| |
| With this setting, the tlBMC store is created regardless of sensor creation |
| failures. Valid sensors are still served by tlBMC, and debug information can |
| be obtained from tlBMC debug paths such as: |
| |
| ``` |
| root@HOSTNAME:~# curl localhost/redfish/tlbmc/AllSensors |
| { |
| ... |
| "error": { |
| "@Message.ExtendedInfo": [ |
| { |
| "@odata.type": "#Message.v1_1_1.Message", |
| "Message": "Sensor temperature_{SENSOR_NAME} is not ready in tlBMC Store: Failed to read from input device: No such device or address; input device path: /sys/bus/i2c/devices/i2c-{BUS}/{BUS}-00{ADDRESS}/hwmon/hwmon{*}/temp{*}_input", |
| "MessageId": "Base.1.13.0.InternalError" |
| }, |
| { |
| "@odata.type": "#Message.v1_1_1.Message", |
| "Message": "Sensor voltage_{SENSOR_NAME} is not ready in tlBMC Store: Read data can't be converted to a number: Invalid argument", |
| "MessageId": "Base.1.13.0.InternalError" |
| }, |
| ... |
| ], |
| "code": "Base.1.8.GeneralError", |
| "message": "A general error has occurred. See Resolution for information on how to resolve the error." |
| } |
| } |
| ``` |
| |
| All sensor creation errors encountered during tlBMC store creation are combined |
| in the AllSensors response following the |
| [Redfish error message spec](https://redfish.dmtf.org/schemas/DSP0266_1.19.0.html#error-responses). |
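
When many sensors fail, it can help to reduce the response to one line per error. The filter below is a minimal sketch that pulls the `"Message"` strings out of the JSON on stdin using only `grep` and `sed`. The function name `list_sensor_errors` is hypothetical, and the sketch assumes messages contain no escaped double quotes; a real JSON parser would be more robust where one is available.

```shell
# Hypothetical filter: read a Redfish error response on stdin and print each
# "Message" string on its own line. "MessageId" and the lowercase top-level
# "message" are not matched. Assumes no escaped double quotes in messages.
list_sensor_errors() {
  grep -o '"Message": *"[^"]*"' | sed -e 's/^"Message": *"//' -e 's/"$//'
}
```

It could be used as `curl localhost/redfish/tlbmc/AllSensors | list_sensor_errors`.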
| |
| To enable the `allow_sensor_creation_failure` feature, make a change similar to |
| https://gbmc-private-review.git.corp.google.com/c/meta-google-private/+/35823. |
| Add or modify the entry for the desired platform to include: |
| |
| ``` |
| sensor_collector_module { enabled: true allow_sensor_creation_failure: true } |
| ``` |
| |
| Build and flash a bmcweb binary including this change to have the central config |
| take effect. |
| |
| ## Verify Real Hardware Failures |
| |
| For additional information to diagnose sensor failures, it may be helpful to |
| check the kernel logs using `dmesg`. Run the following command and look for |
| logs similar to: |
| |
| ``` |
| root@HOSTNAME:~# dmesg | grep "Failed to register" |
| [ 92.520183] i2c i2c-{BUS}: Failed to register i2c client {DRIVER} at 0x{ADDRESS} (-16) |
| ``` |
| |
| This indicates a failure to set up the device. The cause may be a real hardware |
| failure, or the device may have been occupied by another script or service |
| during boot (the `-16` in the example is `EBUSY`, "Device or resource busy"). |
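
To compare these failures against the EM config, it may help to reduce each log line to its bus, driver, address, and error code. The filter below is a sketch that assumes the exact log format shown above; the function name `summarize_register_failures` is hypothetical.

```shell
# Hypothetical filter: read dmesg output on stdin and print one summary line
# per "Failed to register i2c client" entry. Lines in any other format are
# dropped.
summarize_register_failures() {
  sed -n 's/.*i2c i2c-\([0-9]*\): Failed to register i2c client \([^ ]*\) at 0x\([0-9a-fA-F]*\) (\(-[0-9]*\)).*/bus=\1 driver=\2 addr=0x\3 errno=\4/p'
}
```

It could be used as `dmesg | summarize_register_failures`. Each output line has the form `bus=... driver=... addr=0x... errno=...`.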
| |
| Potential *short term* solutions in this case could be to: |
| |
| - Manually bind the device using: |
| |
| ``` |
| root@HOSTNAME:~# echo "{BUS}-00{ADDRESS}" > /sys/bus/i2c/drivers/{DRIVER}/bind |
| ``` |
| |
| If the device was temporarily occupied during boot, this may correctly set |
| up the device. |
| |
| - Power cycle the machine: rebooting has fixed sensor instantiation in some |
| cases. |
| |
| If either approach above is used, a bug should still be filed and the issue |
| should be reproduced. This flakiness in sensor creation could mask |
| underlying problems, e.g. b/428930642. |
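
The manual bind step above can be wrapped so that it does nothing when the device already has a driver. This is only a sketch: the function name `bind_i2c_device` and its optional third argument (an alternate sysfs root, used here to try the logic outside a BMC) are hypothetical, and it assumes a bound device exposes a `driver` entry under `/sys/bus/i2c/devices/{BUS}-00{ADDRESS}`.

```shell
# Hypothetical helper: bind an I2C device to a driver unless it is already
# bound. Arguments: device name ("{BUS}-00{ADDRESS}"), driver name, and an
# optional sysfs root (defaults to /sys/bus/i2c).
bind_i2c_device() {
  dev="$1"
  driver="$2"
  root="${3:-/sys/bus/i2c}"
  if [ -e "$root/devices/$dev/driver" ]; then
    echo "$dev is already bound to a driver"
    return 0
  fi
  # Same write as the manual command; a failed write gives a non-zero status.
  echo "$dev" > "$root/drivers/$driver/bind"
}
```

On a machine this would be called as `bind_i2c_device {BUS}-00{ADDRESS} {DRIVER}`. A non-zero return status means the bind did not succeed and the failure should be investigated further.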
| |
| Note: In some cases, some `Failed to register i2c client` logs are expected, |
| for instance when sensors for second-source boards are configured in the EM |
| config. These sensors are expected to fail to be created if the FRU on the |
| machine is not the second-source FRU. Also note that such expected dmesg error |
| logs are only possible when these sensors are not supported by tlBMC. If tlBMC |
| were to support the second-source board sensors, a separate config would have |
| to be made to logically separate these sensors and probe accordingly. |