This is a standalone gRPC server running on all arena management nodes (Baseboard Management Controllers, or BMCs) of a data center machine. Its expected clients are telemetry collectors in the cloud, as shown in the diagram below (or any other networked client with valid credentials):
The collector depends only on the Voyager Telemetry gRPC proto definition to interact with this server (the server's northbound API). The server polls telemetry sources only when clients subscribe to them.
It defines a generic “TelemetrySource” API for adding new types of sources under its management (the server's southbound API). There is no explicit dependency on existing OpenBMC services on the arena management node; for example, you can implement I2C sensors as a source type by reading I2C sysfs files directly, or MCTP sockets as a source type for telemetry from host CPUs.
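As an illustration, a source type might look like the following Rust sketch. The trait name matches the “TelemetrySource” concept above, but the method names and types here are assumptions for illustration, not the repository's actual API.

use std::time::SystemTime;

// One telemetry sample, modeled loosely after a Redfish Sensor reading.
pub struct Sample {
    pub odata_id: String, // e.g. "/redfish/v1/Chassis/ChassisOne/Sensors/fantach_fan4_tach"
    pub value: f64,
    pub timestamp: SystemTime,
}

// Hypothetical southbound contract: anything that can be polled for samples
// can be registered with the server, with no D-Bus involvement required.
pub trait TelemetrySource {
    // Stable identifier that subscription queries are matched against.
    fn odata_id(&self) -> &str;
    // Poll the underlying hardware once, e.g. by reading an I2C hwmon sysfs
    // file directly or by receiving over an MCTP socket from a host CPU.
    fn read(&mut self) -> std::io::Result<Sample>;
}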
OpenBMC uses Server-Sent Events (SSE) for server-pushed events, which can be used for machine telemetry. However, in our real-world use cases, especially with the rise of ML hardware in data centers, we found that this interface doesn't optimally satisfy our requirements. Here's why:
The OpenBMC solution relies on existing telemetry source services, such as the PSUSensor systemd service, to poll the hardware and provide telemetry data via D-Bus object property change events. This approach has several limitations:
Our new telemetry server addresses these issues by providing dynamically adjustable, millisecond-level telemetry sampling, instead of OpenBMC's fixed, second-level polling rate.
gRPC naturally provides a streaming interface for telemetry. By removing the assumption that telemetry must go through the Redfish interface, we can eliminate layers of indirection and improve performance:
We still use the Redfish specification as our data model, ensuring compatibility with existing telemetry data consumers.
The current Redfish-based BMC telemetry solution requires multiple transactions between client and server to select the desired telemetry sources, and it needs additional rules and tuning effort to collect telemetry efficiently. Our new telemetry server simplifies this process by:
This allows a collector client to use an XPath-like syntax or a simple server configuration name to subscribe to all telemetry sources of interest, with the desired parameters, in a single gRPC call.
For a more meaningful telemetry solution, we want data collection to focus on sources that need more attention. We achieve this through threshold-controlled telemetry sampling rates:
This approach ensures that critical data is collected more frequently when needed, while conserving resources during normal operation.
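A minimal sketch of the idea, assuming a configuration that maps value ranges to polling intervals; the range and interval fields below are illustrative, and the actual schema lives in the server's textproto configuration.

use std::time::Duration;

struct ThresholdRange {
    lower: f64,
    upper: f64,
    interval: Duration, // how often to poll while the value is in this range
}

// Pick the polling interval for the range the latest value falls into,
// falling back to a default interval when no range matches.
fn next_interval(value: f64, ranges: &[ThresholdRange], default: Duration) -> Duration {
    ranges
        .iter()
        .find(|r| r.lower <= value && value < r.upper)
        .map(|r| r.interval)
        .unwrap_or(default)
}

fn main() {
    // Illustrative policy: a fan tach below 10,000 RPM needs attention, so
    // poll it every 5 ms; otherwise a relaxed 1 s interval conserves resources.
    let ranges = [ThresholdRange {
        lower: 0.0,
        upper: 10_000.0,
        interval: Duration::from_millis(5),
    }];
    assert_eq!(next_interval(11_430.0, &ranges, Duration::from_secs(1)), Duration::from_secs(1));
    assert_eq!(next_interval(9_500.0, &ranges, Duration::from_secs(1)), Duration::from_millis(5));
}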
The core concept of this telemetry server is the Telemetry Source Manager. For more details, please see the Telemetry Source Manager README.
A subscribe call from a collector client creates a bi-directional stream:
A client can now subscribe to a set of telemetry sources using one of the following request forms:
Server configuration name:
TelemetryRequest {
    req_id: "req_repairability".into(),
    req_config_group: "repairability_basic_cfg_group".into(),
    ..Default::default()
}
XPath-like query:
Fqp {
    specifier: "/redfish/v1/Chassis/{ChassisId}/Sensors/{SensorId}".into(),
    identifiers: HashMap::from([
        ("ChassisId".into(), "*".into()),
        ("SensorId".into(), "*".into()),
    ]),
    r#type: FqpType::NotSet as i32,
    ..Default::default()
}
Redfish data type:
Fqp {
    specifier: "#Sensor.v1_2_0.Sensor".into(),
    r#type: FqpType::RedfishResource as i32,
    ..Default::default()
}
Each response message in the stream, an Update, contains a vector of Datapoints, each representing one telemetry sample from a source:
0th datapoint, @odata.id: /redfish/v1/Chassis/ChassisTwo/Sensors/fantach_fan5_tach, value: 11565, timestamp: 2024:10:09:02:20:56.411
1th datapoint, @odata.id: /redfish/v1/Chassis/ChassisTwo/Sensors/fantach_fan5_tach, value: 11565, timestamp: 2024:10:09:02:20:56.417
...
45th datapoint, @odata.id: /redfish/v1/Chassis/ChassisTwo/Sensors/fantach_fan5_tach, value: 11565, timestamp: 2024:10:09:02:20:56.687
46th datapoint, @odata.id: /redfish/v1/Chassis/ChassisTwo/Sensors/fantach_fan5_tach, value: 11565, timestamp: 2024:10:09:02:20:56.695
47th datapoint, @odata.id: /redfish/v1/Chassis/ChassisTwo/Sensors/fantach_fan5_tach, value: 11430, timestamp: 2024:10:09:02:20:56.704
48th datapoint, @odata.id: /redfish/v1/Chassis/ChassisTwo/Sensors/fantach_fan5_tach, value: 11430, timestamp: 2024:10:09:02:20:56.710
...
79th datapoint, @odata.id: /redfish/v1/Chassis/ChassisTwo/Sensors/fantach_fan5_tach, value: 11430, timestamp: 2024:10:09:02:20:56.903
80th datapoint, @odata.id: /redfish/v1/Chassis/ChassisTwo/Sensors/fantach_fan5_tach, value: 11430, timestamp: 2024:10:09:02:20:56.909
This time series captures telemetry source value changes with millisecond precision.
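Putting the pieces together, here is a hedged sketch of a collector-side subscribe loop, assuming tonic-generated bindings for the Voyager Telemetry proto (a TelemetryClient plus the TelemetryRequest, Update, and Datapoint messages shown above) and the tokio-stream crate; the module path, type names, and Subscribe signature are assumptions, not the repository's actual generated API.

use tonic::Request;
// Assumed tonic-generated module for the Voyager Telemetry proto.
use telemetry::{telemetry_client::TelemetryClient, TelemetryRequest};

async fn watch(addr: &'static str) -> Result<(), Box<dyn std::error::Error>> {
    let mut client = TelemetryClient::connect(addr).await?;
    // Subscribe is bi-directional: the client sends a stream of requests and
    // receives a stream of Update messages back, all in one gRPC call.
    let requests = tokio_stream::iter(vec![TelemetryRequest {
        req_id: "req_repairability".into(),
        req_config_group: "repairability_basic_cfg_group".into(),
        ..Default::default()
    }]);
    let mut updates = client.subscribe(Request::new(requests)).await?.into_inner();
    while let Some(update) = updates.message().await? {
        for dp in update.datapoints {
            // Each Datapoint carries the Redfish @odata.id, a value, and a timestamp.
            println!("{} value: {} timestamp: {:?}", dp.odata_id, dp.value, dp.timestamp);
        }
    }
    Ok(())
}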
Download the code to your Linux workstation
~/workspace$ git clone https://github.com/google/streaming-telemetry-server -b main; cd streaming-telemetry-server/streaming_telemetry
Install the Rust cross-build tool if you have not already done so; it requires Docker to be installed first.
~$ cargo install -f cross
Build the target for the ASPEED AST2600 SoC
~/workspace/streaming-telemetry-server/streaming_telemetry$ cross build --no-default-features --release --target armv7-unknown-linux-gnueabihf
~/workspace/streaming-telemetry-server/streaming_telemetry$ file ../target/armv7-unknown-linux-gnueabihf/release/streaming_telemetry_server
../target/armv7-unknown-linux-gnueabihf/release/streaming_telemetry_server: ELF 32-bit LSB pie executable, ARM, EABI5 version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-armhf.so.3, for GNU/Linux 3.2.0, BuildID[sha1]=8b7ef3cef9da4c72110cb1b12c9bad135c2b2c60, with debug_info, not stripped
Copy the binary and server configuration to the target BMC board
~/workspace/streaming-telemetry-server/streaming_telemetry$ sshpass -p 0penBmc scp ../target/armv7-unknown-linux-gnueabihf/release/streaming_telemetry_server ../yocto/meta-my-machine/recipes-google/streaming-telemetry-server-systemd/files/streaming_telemetry_server_config.textproto root@bmc:/tmp/
Run the gRPC telemetry server
root@bmc:~# /tmp/streaming_telemetry_server --port 50051 --insecure --config /tmp/streaming_telemetry_server_config.textproto --emconfig /usr/share/entity-manager/configurations/BMCBoard.json,/usr/share/entity-manager/configurations/MotherBoard.json
Running in insecure mode on port 50051...
Set up an SSH tunnel from the Linux workstation to the target BMC
user1@workstation:~$ sshpass -p 0penBmc ssh -L 50051:localhost:50051 root@bmc
Build and run the test gRPC client from the Linux workstation
user1@workstation:~/workspace/streaming-telemetry-server/streaming_telemetry$ cargo build --release --features=build-client
user1@workstation:~/workspace/streaming-telemetry-server/streaming_telemetry$ ../target/release/streaming_telemetry_client --port 50051 --insecure
Assume we have a threshold configuration entry defined for a sensor “fantach_fan4_tach” in the server_config.
Run the same command as above, but only monitor the fan tach sensor “fantach_fan4_tach”:
user1@workstation:~/workspace/streaming-telemetry-server/streaming_telemetry$ ../target/release/streaming_telemetry_client --port 50051 --insecure | grep fantach_fan4_tach
Then adjust the sensor value manually from the BMC console; the sensor polling rate will vary with the new sensor value (depending on which range of its threshold configuration the value falls into):
# Ensure the Fan Zone modes are all "Manual" or "Disabled"
root@bmc:~# curl localhost/redfish/v1/Managers/bmc#/Oem/OpenBmc/Fan/FanZones/Zone_0 | grep FanMode
# If not, disable the PID control service:
root@bmc:~# systemctl stop phosphor-pid-control
# Get baseline values
root@bmc:~# curl http://localhost/redfish/v1/Chassis/ChassisOne/Sensors/fantach_fan4_tach
root@bmc:~# curl http://localhost/redfish/v1/Chassis/ChassisOne/Sensors/fanpwm_fan4_pwm
# Change fanpwm_fan4_pwm
root@bmc:~# curl -X PATCH http://localhost/redfish/v1/Chassis/ChassisOne/Sensors/fanpwm_fan4_pwm -d '{"Reading": 30.0}'
# Read back the fan tach to confirm
root@bmc:~# curl http://localhost/redfish/v1/Chassis/ChassisOne/Sensors/fantach_fan4_tach
Build from Yocto:
mTLS-related dependencies can currently only be built from Yocto.
To build from Yocto, use the BitBake recipes under the yocto folder.
user1@workstation:/var/bmc/build/my_machine$ bitbake -c compile streaming-telemetry-server
user1@workstation:/var/bmc/build/my_machine$ sshpass -p 0penBmc scp ../../meta-my_machine/recipes-google/streaming-telemetry-server-systemd/files/streaming_telemetry_server_config.textproto root@bmc:/tmp/
user1@workstation:/var/bmc/build/my_machine$ sshpass -p 0penBmc scp tmp/work/armv7ahf-vfpv3d16-openbmc-linux-gnueabi/streaming-telemetry-server/0.1.0/build/target/armv7-openbmc-linux-gnueabihf/release/telemetry_server root@bmc:/tmp/
Prepare test keys and test mTLS policy:
Run the gRPC telemetry server on the BMC:
root@bmc:~# LD_LIBRARY_PATH=/tmp /tmp/streaming_telemetry_server \
    --cert /tmp/test-server-cert.pem \
    --key /tmp/test-server-key.pem \
    --cacert /tmp/test-cacert.pem \
    --crls /tmp/crls \
    --policy /tmp/test-mtls.policy \
    --config /tmp/streaming_telemetry_server_config.textproto \
    --emconfig /usr/share/entity-manager/configurations/BMCBoard.json,/usr/share/entity-manager/configurations/MotherBoard.json \
    --port 50051
Run the gRPC client from the Linux workstation:
user1@workstation:~/workspace/streaming-telemetry-server/streaming_telemetry$ ../target/release/streaming_telemetry_client \
    --cert test-client-cert.pem \
    --key test-client-key.pem \
    --cacert test-cacert.pem \
    --server_dns target_bmc_dns_name \
    --port 50051