| .. SPDX-License-Identifier: GPL-2.0 |
| |
| Diagnostic Concept for Investigating Twisted Pair Ethernet Variants at OSI Layer 1 |
| ================================================================================== |
| |
| Introduction |
| ------------ |
| |
| This documentation is designed for two primary audiences: |
| |
| 1. **Users and System Administrators**: For those dealing with real-world |
| Ethernet issues, this guide provides a practical, step-by-step |
| troubleshooting flow to help identify and resolve common problems in Twisted |
| Pair Ethernet at OSI Layer 1. If you're facing unstable links, speed drops, |
| or mysterious network issues, jump right into the step-by-step guide and |
| follow it through to find your solution. |
| |
| 2. **Kernel Developers**: For developers working with network drivers and PHY |
| support, this documentation outlines the diagnostic process and highlights |
| areas where the Linux kernel’s diagnostic interfaces could be extended or |
| improved. By understanding the diagnostic flow, developers can better |
| prioritize future enhancements. |
| |
| Step-by-Step Diagnostic Guide from Linux (General Ethernet) |
| ----------------------------------------------------------- |
| |
| This diagnostic guide covers common Ethernet troubleshooting scenarios, |
| focusing on **link stability and detection** across different Ethernet |
| environments, including **Single-Pair Ethernet (SPE)** and **Multi-Pair |
| Ethernet (MPE)**, as well as power delivery technologies like **PoDL** (Power |
| over Data Line) and **PoE** (Clause 33 PSE). |
| |
| The guide is designed to help users diagnose physical layer (Layer 1) issues on |
| systems running **Linux kernel version 6.11 or newer**, utilizing **ethtool |
| version 6.10 or later** and **iproute2 version 6.4.0 or later**. |
| |
| In this guide, we assume that users may have **limited or no access to the link |
| partner** and will focus on diagnosing issues locally. |
| |
| Diagnostic Scenarios |
| ~~~~~~~~~~~~~~~~~~~~ |
| |
| - **Link is up and stable, but no data transfer**: If the link is stable but |
| there are issues with data transmission, refer to the **OSI Layer 2 |
| Troubleshooting Guide**. |
| |
| - **Link is unstable**: Link resets, speed drops, or other fluctuations |
| indicate potential issues at the hardware or physical layer. |
| |
| - **No link detected**: The interface is up, but no link is established. |
| |
| Verify Interface Status |
| ~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| Begin by verifying whether the Ethernet interface is administratively up. |
| `ethtool` reports link and PHY status, but it does not show the |
| **administrative state** of the interface. To check this, use the `ip` |
| command, which lists the interface state flags within the angle brackets |
| `"<>"` in its output. |
| |
| For example, in the output `<NO-CARRIER,BROADCAST,MULTICAST,UP>`, the important |
| keywords are: |
| |
| - **UP**: The interface is in the administrative "UP" state. |
| - **NO-CARRIER**: The interface is administratively up, but no physical link is |
| detected. |
| |
| If the output shows `<BROADCAST,MULTICAST>`, this indicates the interface is in |
| the administrative "DOWN" state. |
| |
| - **Command:** `ip link show dev <interface>` |
| |
| - **Expected Output:** |
| |
| .. code-block:: bash |
| |
| 4: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 ... |
| link/ether 88:14:2b:00:96:f2 brd ff:ff:ff:ff:ff:ff |
| |
| - **Interpreting the Output:** |
| |
| - **Administrative UP State**: |
| |
| - If the output contains **"UP"**, the interface is administratively up, |
| and the system is trying to establish a physical link. |
| |
| - If you also see **"NO-CARRIER"**, it means the physical link has not been |
| detected, indicating potential Layer 1 issues like a cable fault, |
| misconfiguration, or no connection at the link partner. In this case, |
| proceed to the **Inspect Link Status and PHY Configuration** section. |
| |
| - **Administrative DOWN State**: |
| |
| - If the output lacks **"UP"** and shows only states like |
| **"<BROADCAST,MULTICAST>"**, it means the interface is administratively |
| down. In this case, bring the interface up using the following command: |
| |
| .. code-block:: bash |
| |
| ip link set dev <interface> up |
| |
| - **Next Steps**: |
| |
| - If the interface is **administratively up** but shows **NO-CARRIER**, |
| proceed to the **Inspect Link Status and PHY Configuration** section to |
| troubleshoot potential physical layer issues. |
| |
| - If the interface was **administratively down** and you have brought it up, |
| **repeat this verification step** to confirm the new state of the |
| interface before proceeding. |
| |
| - **If the interface is up and the link is detected**: |
| |
| - If the output shows **"UP"** and there is **no `NO-CARRIER`**, the |
| interface is administratively up, and the physical link has been |
| successfully established. If everything is working as expected, the Layer |
| 1 diagnostics are complete, and no further action is needed. |
| |
| - If the interface is up and the link is detected but **no data is being |
| transferred**, the issue is likely beyond Layer 1, and you should proceed |
| with diagnosing the higher layers of the OSI model. This may involve |
| checking Layer 2 configurations (such as VLANs or MAC address issues), |
| Layer 3 settings (like IP addresses, routing, or ARP), or Layer 4 and |
| above (firewalls, services, etc.). |
| |
| - If the **link is unstable** or **frequently resetting or dropping**, this |
| may indicate a physical layer issue such as a faulty cable, interference, |
| or power delivery problems. In this case, proceed with the next step in |
| this guide. |
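The flag interpretation described in this step can be scripted. The following is a minimal sketch and not part of any existing tool: the function name `classify_link_flags` and its output labels are made up for illustration. It reads `ip link show dev <interface>` output on standard input and looks only at the flags inside the angle brackets.

```shell
# classify_link_flags: hypothetical helper, reads `ip link show` output on stdin.
# Prints one of: admin-down, no-carrier, link-up, unknown
classify_link_flags() {
    # Take the flag list between the first '<' and '>' of the first matching line
    flags=$(sed -n 's/^[^<]*<\([^>]*\)>.*/\1/p' | head -n 1)
    if [ -z "$flags" ]; then
        echo "unknown"
        return 1
    fi
    # Wrap in commas so that "UP" does not match inside "LOWER_UP"
    case ",$flags," in
        *,UP,*) ;;
        *) echo "admin-down"; return 0 ;;
    esac
    case ",$flags," in
        *,NO-CARRIER,*) echo "no-carrier" ;;
        *) echo "link-up" ;;
    esac
}
```

For example, `ip link show dev eth0 | classify_link_flags` prints `no-carrier` for the sample output shown in this step, which means the next stop is the **Inspect Link Status and PHY Configuration** section.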
| |
| Inspect Link Status and PHY Configuration |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| Use `ethtool -I` to check the link status, PHY configuration, supported link |
| modes, and additional statistics such as the **Link Down Events** counter. This |
| step is essential for diagnosing Layer 1 problems such as speed mismatches, |
| duplex issues, and link instability. |
| |
| For both **Single-Pair Ethernet (SPE)** and **Multi-Pair Ethernet (MPE)** |
| devices, you will use this step to gather key details about the link. **SPE** |
| links generally support a single speed and mode without autonegotiation (with |
| the exception of **10BaseT1L**), while **MPE** devices typically support |
| multiple link modes and autonegotiation. |
| |
| - **Command:** `ethtool -I <interface>` |
| |
| - **Example Output for SPE Interface (Non-autonegotiation)**: |
| |
| .. code-block:: bash |
| |
| Settings for spe4: |
| Supported ports: [ TP ] |
| Supported link modes: 100baseT1/Full |
| Supported pause frame use: No |
| Supports auto-negotiation: No |
| Supported FEC modes: Not reported |
| Advertised link modes: Not applicable |
| Advertised pause frame use: No |
| Advertised auto-negotiation: No |
| Advertised FEC modes: Not reported |
| Speed: 100Mb/s |
| Duplex: Full |
| Auto-negotiation: off |
| master-slave cfg: forced slave |
| master-slave status: slave |
| Port: Twisted Pair |
| PHYAD: 6 |
| Transceiver: external |
| MDI-X: Unknown |
| Supports Wake-on: d |
| Wake-on: d |
| Link detected: yes |
| SQI: 7/7 |
| Link Down Events: 2 |
| |
| - **Example Output for MPE Interface (Autonegotiation)**: |
| |
| .. code-block:: bash |
| |
| Settings for eth1: |
| Supported ports: [ TP MII ] |
| Supported link modes: 10baseT/Half 10baseT/Full |
| 100baseT/Half 100baseT/Full |
| Supported pause frame use: Symmetric Receive-only |
| Supports auto-negotiation: Yes |
| Supported FEC modes: Not reported |
| Advertised link modes: 10baseT/Half 10baseT/Full |
| 100baseT/Half 100baseT/Full |
| Advertised pause frame use: Symmetric Receive-only |
| Advertised auto-negotiation: Yes |
| Advertised FEC modes: Not reported |
| Link partner advertised link modes: 10baseT/Half 10baseT/Full |
| 100baseT/Half 100baseT/Full |
| Link partner advertised pause frame use: Symmetric Receive-only |
| Link partner advertised auto-negotiation: Yes |
| Link partner advertised FEC modes: Not reported |
| Speed: 100Mb/s |
| Duplex: Full |
| Auto-negotiation: on |
| Port: Twisted Pair |
| PHYAD: 10 |
| Transceiver: internal |
| MDI-X: Unknown |
| Supports Wake-on: pg |
| Wake-on: p |
| Link detected: yes |
| Link Down Events: 1 |
| |
| - **Next Steps**: |
| |
| - Record the output provided by `ethtool`, particularly noting the |
| **master-slave status**, **speed**, **duplex**, and other relevant fields. |
| This information will be useful for further analysis or troubleshooting. |
| Once the **ethtool** output has been collected and stored, move on to the |
| next diagnostic step. |
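When recording the fields mentioned above, a small filter can pull single values out of the `ethtool` output. This is only a sketch under the assumption that each field appears as a `Name: value` line, as in the examples in this step; the helper name `ethtool_field` is hypothetical.

```shell
# ethtool_field: hypothetical helper, reads ethtool output on stdin and prints
# the value of the first line whose name matches $1 exactly.
ethtool_field() {
    awk -v key="$1" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)          # drop leading indentation
            prefix = key ":"
            # Match only at the start of the line, so "Auto-negotiation"
            # does not pick up "Advertised auto-negotiation"
            if (!found && index(line, prefix) == 1) {
                val = substr(line, length(prefix) + 1)
                sub(/^[ \t]+/, "", val)
                print val
                found = 1
            }
        }'
}
```

A possible use is `ethtool -I eth0 | ethtool_field "Link Down Events"`, or the same with `"Speed"`, `"Duplex"`, or `"master-slave status"`, storing each value alongside a timestamp.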
| |
| Check Power Delivery (PoDL or PoE) |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| If it is known that **PoDL** or **PoE** is **not implemented** on the system, |
| or the **PSE** (Power Sourcing Equipment) is managed by proprietary user-space |
| software or external tools, you can skip this step. In such cases, verify power |
| delivery through alternative methods, such as checking hardware indicators |
| (LEDs), using multimeters, or consulting vendor-specific software for |
| monitoring power status. |
| |
| If **PoDL** or **PoE** is implemented and managed directly by Linux, follow |
| these steps to ensure power is being delivered correctly: |
| |
| - **Command:** `ethtool --show-pse <interface>` |
| |
| - **Expected Output Examples**: |
| |
| 1. **PSE Not Supported**: |
| |
| If no PSE is attached or the interface does not support PSE, the following |
| output is expected: |
| |
| .. code-block:: bash |
| |
| netlink error: No PSE is attached |
| netlink error: Operation not supported |
| |
| 2. **PoDL (Single-Pair Ethernet)**: |
| |
| When PoDL is implemented, you might see the following attributes: |
| |
| .. code-block:: bash |
| |
| PSE attributes for eth1: |
| PoDL PSE Admin State: enabled |
| PoDL PSE Power Detection Status: delivering power |
| |
| 3. **PoE (Clause 33 PSE)**: |
| |
| For standard PoE, the output may look like this: |
| |
| .. code-block:: bash |
| |
| PSE attributes for eth1: |
| Clause 33 PSE Admin State: enabled |
| Clause 33 PSE Power Detection Status: delivering power |
| Clause 33 PSE Available Power Limit: 18000 |
| |
| - **Adjust Power Limit (if needed)**: |
| |
| - Sometimes, the available power limit may not be sufficient for the link |
| partner. You can increase the power limit (specified in milliwatts) as |
| needed. |
| |
| - **Command:** `ethtool --set-pse <interface> c33-pse-avail-pw-limit <limit>` |
| |
| Example: |
| |
| .. code-block:: bash |
| |
| ethtool --set-pse eth1 c33-pse-avail-pw-limit 18000 |
| ethtool --show-pse eth1 |
| |
| **Expected Output** after adjusting the power limit: |
| |
| .. code-block:: bash |
| |
| Clause 33 PSE Available Power Limit: 18000 |
| |
| |
| - **Next Steps**: |
| |
| - **PoE or PoDL Not Used**: If **PoE** or **PoDL** is not implemented or used |
| on the system, proceed to the next diagnostic step, as power delivery is |
| not relevant for this setup. |
| |
| - **PoE or PoDL Controlled Externally**: If **PoE** or **PoDL** is used but |
| is not managed by the Linux kernel's **PSE-PD** framework (i.e., it is |
| controlled by proprietary user-space software or external tools), this part |
| is out of scope for this documentation. Please consult vendor-specific |
| documentation or external tools for monitoring and managing power delivery. |
| |
| - **PSE Admin State Disabled**: |
| |
| - If the `PSE Admin State:` is **disabled**, enable it by running one of |
| the following commands: |
| |
| .. code-block:: bash |
| |
| ethtool --set-pse <devname> podl-pse-admin-control enable |
| |
| # or, for Clause 33 PSE (PoE): |
| |
| ethtool --set-pse <devname> c33-pse-admin-control enable |
| |
| - After enabling the PSE Admin State, return to the start of the **Check |
| Power Delivery (PoDL or PoE)** step to recheck the power delivery status. |
| |
| - **Power Not Delivered**: If the `Power Detection Status` shows something |
| other than "delivering power" (e.g., `over current`), troubleshoot the |
| **PSE**. Check for potential issues such as a short circuit in the cable, |
| insufficient power delivery, or a fault in the PSE itself. |
| |
| - **Power Delivered but No Link**: If power is being delivered but no link is |
| established, proceed with further diagnostics by performing **Cable |
| Diagnostics** or reviewing the **Inspect Link Status and PHY |
| Configuration** steps to identify any underlying issues with the physical |
| link or settings. |
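The decision points in this step can be condensed into a short filter over the `ethtool --show-pse` output. The sketch below assumes the attribute lines look like the examples in this step; the function name `pse_summary` and the labels it prints are illustrative only.

```shell
# pse_summary: hypothetical helper, reads `ethtool --show-pse` output on stdin.
# Prints: no-pse-info | admin-<state> | delivering | not-delivering: <status>
pse_summary() {
    awk '
        /PSE Admin State:/ {
            sub(/^.*PSE Admin State:[ \t]*/, ""); sub(/[ \t]+$/, "")
            admin = $0
        }
        /Power Detection Status:/ {
            sub(/^.*Power Detection Status:[ \t]*/, ""); sub(/[ \t]+$/, "")
            status = $0
        }
        END {
            if (admin == "")                        print "no-pse-info"
            else if (admin != "enabled")            print "admin-" admin
            else if (status == "delivering power")  print "delivering"
            else                                    print "not-delivering: " status
        }'
}
```

With `ethtool --show-pse eth1 | pse_summary`, a result of `admin-disabled` points to the admin control command, while `not-delivering: ...` points to troubleshooting the PSE itself.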
| |
| Cable Diagnostics |
| ~~~~~~~~~~~~~~~~~ |
| |
| Use `ethtool` to test for physical layer issues such as cable faults. The test |
| results can vary depending on the cable's condition, the technology in use, and |
| the state of the link partner. The results from the cable test will help in |
| diagnosing issues like open circuits, shorts, impedance mismatches, and |
| noise-related problems. |
| |
| - **Command:** `ethtool --cable-test <interface>` |
| |
| The following are the typical outputs for **Single-Pair Ethernet (SPE)** and |
| **Multi-Pair Ethernet (MPE)**: |
| |
| - **For Single-Pair Ethernet (SPE)**: |
| |
| - **Expected Output (SPE)**: |
| |
| .. code-block:: bash |
| |
| Cable test completed for device eth1. |
| Pair A, fault length: 25.00m |
| Pair A code Open Circuit |
| |
| This indicates an open circuit or cable fault at the reported distance, but |
| results can be influenced by the link partner's state. Refer to the |
| **"Troubleshooting Based on Cable Test Results"** section for further |
| interpretation of these results. |
| |
| - **For Multi-Pair Ethernet (MPE)**: |
| |
| - **Expected Output (MPE)**: |
| |
| .. code-block:: bash |
| |
| Cable test completed for device eth0. |
| Pair A code OK |
| Pair B code OK |
| Pair C code Open Circuit |
| |
| Here, Pair C is reported as having an open circuit, while Pairs A and B are |
| functioning correctly. However, if autonegotiation is in use on Pairs A and |
| B, the cable test may be disrupted. Refer to the **"Troubleshooting Based on |
| Cable Test Results"** section for a detailed explanation of these issues and |
| how to resolve them. |
| |
| For detailed descriptions of the different possible cable test results, please |
| refer to the **"Troubleshooting Based on Cable Test Results"** section. |
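To reduce a cable test report to the pairs that need attention, the output can be filtered as sketched below. The sketch assumes the `Pair X code ...` and `Pair X, fault length: ...` line formats shown in the examples in this section; `cable_test_faults` is a hypothetical name.

```shell
# cable_test_faults: hypothetical helper, reads `ethtool --cable-test` output on
# stdin and prints one line per pair whose result code is not "OK".
cable_test_faults() {
    awk '
        { sub(/^[ \t]+/, "") }                    # drop leading indentation
        /^Pair [A-D], fault length:/ {
            pair = substr($2, 1, 1)               # "A," -> "A"
            len[pair] = $NF                       # e.g. "25.00m"
        }
        /^Pair [A-D] code / {
            pair = $2
            code = $0
            sub(/^Pair [A-D] code /, "", code)
            if (code != "OK") { order[++n] = pair; fault[pair] = code }
        }
        END {
            for (i = 1; i <= n; i++) {
                p = order[i]
                printf "Pair %s: %s", p, fault[p]
                if (p in len) printf " at %s", len[p]
                printf "\n"
            }
        }'
}
```

Running `ethtool --cable-test eth0 | cable_test_faults` prints nothing when every pair reports OK. Any line it does print still needs the interpretation caveats that apply to cable test results.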
| |
| Troubleshooting Based on Cable Test Results |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| |
| After running the cable test, the results can help identify specific issues in |
| the physical connection. However, it is important to note that **cable testing |
| results heavily depend on the capabilities and characteristics of both the |
| local hardware and the link partner**. The accuracy and reliability of the |
| results can vary significantly between different hardware implementations. |
| |
| In some cases, this can introduce **blind spots** in the current cable testing |
| implementation, where certain results may not accurately reflect the actual |
| physical state of the cable. For example: |
| |
| - An **Open Circuit** result might not only indicate a damaged or disconnected |
| cable but also occur if the cable is properly attached to a powered-down link |
| partner. |
| |
| - Some PHYs may report a **Short within Pair** if the link partner is in |
| **forced slave mode**, even though there is no actual short in the cable. |
| |
| To help users interpret the results more effectively, it could be beneficial to |
| extend the **kernel UAPI** (User API) to provide additional context or |
| **possible variants** of issues based on the hardware’s characteristics. Since |
| these quirks are often hardware-specific, the **kernel driver** would be an |
| ideal source of such information. By providing flags or hints related to |
| potential false positives for each test result, users would have a better |
| understanding of what to verify and where to investigate further. |
| |
| Until such improvements are made, users should be aware of these limitations |
| and manually verify cable issues as needed. Physical inspections may help |
| resolve uncertainties related to false positive results. |
| |
| The results can be one of the following: |
| |
| - **OK**: |
| |
| - The cable is functioning correctly, and no issues were detected. |
| |
| - **Next Steps**: If you are still experiencing issues, it might be related |
| to configuration problems, such as duplex mismatches or speed negotiation, |
| rather than to the cable itself. See the **Verify Link Partner PHY |
| Configuration** section. |
| |
| - **Special Case for `BaseT1` (1000/100/10BaseT1)**: In `BaseT1` systems, an |
| "OK" result typically also means that the link is up and likely in **slave |
| mode**, since cable tests usually only pass in this mode. For some |
| **10BaseT1L** PHYs, an "OK" result may occur even if the cable is too long |
| for the PHY's configured range (for example, when the range is configured |
| for short-distance mode). |
| |
| - **Open Circuit**: |
| |
| - An **Open Circuit** result typically indicates that the cable is damaged or |
| disconnected at the reported fault length. Consider these possibilities: |
| |
| - If the link partner is in **admin down** state or powered off, you might |
| still get an "Open Circuit" result even if the cable is functional. |
| |
| - **Next Steps**: Inspect the cable at the fault length for visible damage |
| or loose connections. Verify the link partner is powered on and in the |
| correct mode. |
| |
| - **Short within Pair**: |
| |
| - A **Short within Pair** indicates an unintended connection within the same |
| pair of wires, typically caused by physical damage to the cable. |
| |
| - **Next Steps**: Replace or repair the cable and check for any physical |
| damage or improperly crimped connectors. |
| |
| - **Short to Another Pair**: |
| |
| - A **Short to Another Pair** means the wires from different pairs are |
| shorted, which could occur due to physical damage or incorrect wiring. |
| |
| - **Next Steps**: Replace or repair the damaged cable. Inspect the cable for |
| incorrect terminations or pinched wiring. |
| |
| - **Impedance Mismatch**: |
| |
| - **Impedance Mismatch** indicates a reflection caused by an impedance |
| discontinuity in the cable. This can happen when a part of the cable has |
| abnormal impedance (e.g., when different cable types are spliced together |
| or when there is a defect in the cable). |
| |
| - **Next Steps**: Check the cable quality and ensure consistent impedance |
| throughout its length. Replace any sections of the cable that do not meet |
| specifications. |
| |
| - **Noise**: |
| |
| - **Noise** means that the Time Domain Reflectometry (TDR) test could not |
| complete due to excessive noise on the cable, which can be caused by |
| interference from electromagnetic sources. |
| |
| - **Next Steps**: Identify and eliminate sources of electromagnetic |
| interference (EMI) near the cable. Consider using shielded cables or |
| rerouting the cable away from noise sources. |
| |
| - **Resolution Not Possible**: |
| |
| - **Resolution Not Possible** means that the TDR test could not detect the |
| issue due to the resolution limitations of the test or because the fault is |
| beyond the distance that the test can measure. |
| |
| - **Next Steps**: Inspect the cable manually if possible, or use alternative |
| diagnostic tools that can handle greater distances or higher resolution. |
| |
| - **Unknown**: |
| |
| - An **Unknown** result may occur when the test cannot classify the fault or |
| when a specific issue is outside the scope of the tool's detection |
| capabilities. |
| |
| - **Next Steps**: Re-run the test, verify the link partner's state, and inspect |
| the cable manually if necessary. |
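The result list above can be kept at hand as a small lookup. This is an illustrative sketch only: `cable_code_hint` is a hypothetical helper, the hint texts are shortened from this section, and the code names are matched case-insensitively because the exact spelling printed by a given `ethtool` version is not assumed here.

```shell
# cable_code_hint: hypothetical helper, prints a short next-step hint for a
# cable test result code given as $1 (names as used in this guide).
cable_code_hint() {
    code=$(printf '%s' "$1" | tr 'A-Z' 'a-z')    # ignore case differences
    case "$code" in
        "ok")
            echo "no cable fault seen; check PHY configuration" ;;
        "open circuit")
            echo "inspect cable at fault length; check link partner power and admin state" ;;
        "short within pair")
            echo "check cable and connectors; rule out link partner in forced slave mode" ;;
        "short to another pair")
            echo "check terminations and pinched wiring" ;;
        "impedance mismatch")
            echo "check for spliced or mixed cable types" ;;
        "noise")
            echo "look for EMI sources; consider shielded cable or rerouting" ;;
        "resolution not possible")
            echo "inspect manually or use a tool with more range or resolution" ;;
        *)
            echo "re-run the test; verify link partner state; inspect manually" ;;
    esac
}
```

The hints for **Open Circuit** and **Short within Pair** deliberately include the link partner checks, since those two results are the false positives called out earlier in this section.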
| |
| Verify Link Partner PHY Configuration |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| If the cable test passes but the link is still not functioning correctly, it’s |
| essential to verify the configuration of the link partner’s PHY. Mismatches in |
| speed, duplex settings, or master-slave roles can cause connection issues. |
| |
| Autonegotiation Mismatch |
| ^^^^^^^^^^^^^^^^^^^^^^^^ |
| |
| - If both link partners support autonegotiation, ensure that autonegotiation is |
| enabled on both sides and that all supported link modes are advertised. A |
| mismatch can lead to connectivity problems or suboptimal performance. |
| |
| - **Quick Fix:** Reset autonegotiation to the default settings, which will |
| advertise all default link modes: |
| |
| .. code-block:: bash |
| |
| ethtool -s <interface> autoneg on |
| |
| - **Command to check configuration:** `ethtool <interface>` |
| |
| - **Expected Output:** Ensure that both sides advertise compatible link modes. |
| If autonegotiation is off, verify that both link partners are configured for |
| the same speed and duplex. |
| |
| The following example shows a case where the local PHY advertises fewer link |
| modes than it supports. This will reduce the number of overlapping link modes |
| with the link partner. In the worst case, there will be no common link modes, |
| and the link will not be created: |
| |
| .. code-block:: bash |
| |
| Settings for eth0: |
| Supported link modes: 1000baseT/Full, 100baseT/Full |
| Advertised link modes: 1000baseT/Full |
| Speed: 1000Mb/s |
| Duplex: Full |
| Auto-negotiation: on |
| |
| Combined Mode Mismatch (Autonegotiation on One Side, Forced on the Other) |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| |
| - One possible issue occurs when one side is using **autonegotiation** (as in |
| most modern systems), and the other side is set to a **forced link mode** |
| (e.g., older hardware with single-speed hubs). In such cases, modern PHYs |
| will attempt to detect the forced mode on the other side. If the link is |
| established, you may notice: |
| |
| - **No or empty "Link partner advertised link modes"**. |
| |
| - **"Link partner advertised auto-negotiation:"** will be **"no"** or not |
| present. |
| |
| - This type of detection does not always work reliably: |
| |
| - Typically, the modern PHY will default to **Half Duplex**, even if the link |
| partner is actually configured for **Full Duplex**. |
| |
| - Some PHYs may not work reliably if the link partner switches from one |
| forced mode to another. In this case, only a down/up cycle may help. |
| |
| - **Next Steps**: Set both sides to the same fixed speed and duplex mode to |
| avoid potential detection issues. |
| |
| .. code-block:: bash |
| |
| ethtool -s <interface> speed 1000 duplex full autoneg off |
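The symptoms listed in this section (local autonegotiation on, link up, but no autonegotiation reported from the link partner) can be checked with a simple filter. This is a heuristic sketch, not a definitive test: `forced_partner_suspected` is a hypothetical name, and the field names are taken from the `ethtool` examples in this guide.

```shell
# forced_partner_suspected: hypothetical helper, reads `ethtool <interface>`
# output on stdin and reports whether the link partner looks forced.
forced_partner_suspected() {
    awk '
        { sub(/^[ \t]+/, "") }                                    # drop leading indentation
        /^Auto-negotiation:/                         { an = $2 }
        /^Link partner advertised auto-negotiation:/ { lp = $NF }
        /^Link detected:/                            { link = $3 }
        /^Duplex:/                                   { duplex = $2 }
        END {
            # Local autoneg on and link up, but the partner did not report autoneg
            if (an == "on" && link == "yes" && lp != "Yes")
                print "suspect forced link partner (local duplex: " duplex ")"
            else
                print "no sign of forced link partner"
        }'
}
```

If `ethtool eth0 | forced_partner_suspected` reports a suspected forced partner together with `Half` duplex, compare that with the partner's actual setting where possible, since a Half/Full mismatch is the typical outcome described above.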
| |
| Master/Slave Role Mismatch (BaseT1 and 1000BaseT PHYs) |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| |
| - In **BaseT1** systems (e.g., 1000BaseT1, 100BaseT1), link establishment |
| requires that one device is configured as **master** and the other as |
| **slave**. A mismatch in this master-slave configuration can prevent the link |
| from being established. However, **1000BaseT** also supports configurable |
| master/slave roles and can face similar issues. |
| |
| - **Role Preference in 1000BaseT**: The **1000BaseT** specification allows link |
| partners to negotiate master-slave roles or role preferences during |
| autonegotiation. Some PHYs have hardware limitations or bugs that prevent |
| them from functioning properly in certain roles. In such cases, drivers may |
| force these PHYs into a specific role (e.g., **forced master** or **forced |
| slave**) or try a weaker option by setting preferences. If both link partners |
| have the same issue and are forced into the same mode (e.g., both forced into |
| master mode), they will not be able to establish a link. |
| |
| - **Next Steps**: Ensure that one side is configured as **master** and the |
| other as **slave** to avoid this issue, particularly when hardware |
| limitations are involved, or try the weaker **preferred** option instead of |
| **forced**. Check for any driver-related restrictions or forced modes. |
| |
| - **Command to force master/slave mode**: |
| |
| .. code-block:: bash |
| |
| ethtool -s <interface> master-slave forced-master |
| |
| or: |
| |
| .. code-block:: bash |
| |
| ethtool -s <interface> master-slave forced-master speed 1000 duplex full autoneg off |
| |
| |
| - **Check the current master/slave status**: |
| |
| .. code-block:: bash |
| |
| ethtool <interface> |
| |
| Example Output: |
| |
| .. code-block:: bash |
| |
| master-slave cfg: forced master |
| master-slave status: master |
| |
| - **Hardware Bugs and Driver Forcing**: If a known hardware issue forces the |
| PHY into a specific mode, it’s essential to check the driver source code or |
| hardware documentation for details. Ensure that the roles are compatible |
| across both link partners, and if both PHYs are forced into the same mode, |
| adjust one side accordingly to resolve the mismatch. |
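When the configuration of both ends is known, for example from the `master-slave cfg` lines collected on each side, the forced-role conflict described above can be checked mechanically. The sketch below is illustrative: `role_pair_check` and `normalize_role` are hypothetical helpers, and both the `forced-master` and `forced master` spellings are accepted.

```shell
# normalize_role: lower-case the role and turn hyphens into spaces
normalize_role() {
    printf '%s' "$1" | tr 'A-Z-' 'a-z '
}

# role_pair_check: hypothetical helper, $1 = local cfg, $2 = link partner cfg.
# Prints "conflict: ..." when both sides are forced into the same role.
role_pair_check() {
    l=$(normalize_role "$1")
    p=$(normalize_role "$2")
    case "$l/$p" in
        "forced master/forced master"|"forced slave/forced slave")
            echo "conflict: both sides $l" ;;
        "forced master/forced slave"|"forced slave/forced master")
            echo "compatible" ;;
        *)
            # At least one side is not forced (preferred, unknown, ...)
            echo "no forced conflict" ;;
    esac
}
```

For example, `role_pair_check "forced-master" "forced master"` reports a conflict, in which case one side has to be reconfigured as described in this section.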
| |
| Monitor Link Resets and Speed Drops |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| If the link is unstable, showing frequent resets or speed drops, this may |
| indicate issues with the cable, PHY configuration, or environmental factors. |
| While there is still no completely unified way in Linux to directly monitor |
| downshift events or link speed changes via user space tools, both the Linux |
| kernel logs and `ethtool` can provide valuable insights, especially if the |
| driver supports reporting such events. |
| |
| - **Monitor Kernel Logs for Link Resets and Speed Drops**: |
| |
| - The Linux kernel will print link status changes, including downshift |
| events, in the system logs. These messages typically include speed changes, |
| duplex mode, and downshifted link speed (if the driver supports it). |
| |
| - **Command to monitor kernel logs in real-time:** |
| |
| .. code-block:: bash |
| |
| dmesg -w | grep "Link is Up\|Link is Down" |
| |
| - Example Output (if a downshift occurs): |
| |
| .. code-block:: bash |
| |
| eth0: Link is Up - 100Mbps/Full (downshifted) - flow control rx/tx |
| eth0: Link is Down |
| |
| This indicates that the link has been established but has downshifted from |
| a higher speed. |
| |
| - **Note**: Not all drivers or PHYs support downshift reporting, so you may |
| not see this information for all devices. |
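For a longer observation period, the log lines can be counted rather than watched. The following sketch assumes kernel messages of the form shown in the example output in this step; `link_event_summary` is a hypothetical name and the interface name is passed as its argument.

```shell
# link_event_summary: hypothetical helper, reads kernel log text on stdin and
# counts link up, link down, and downshifted events for interface $1.
link_event_summary() {
    awk -v ifc="$1" '
        index($0, ifc ": Link is Up") {
            up++
            if (index($0, "(downshifted)")) ds++
        }
        index($0, ifc ": Link is Down") { down++ }
        END { printf "up=%d down=%d downshifted=%d\n", up, down, ds }'
}
```

A possible invocation is `dmesg | link_event_summary eth0`. A `down` count that keeps growing between runs is a sign of the link instability this section is about.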
| |
| - **Monitor Link Down Events Using `ethtool`**: |
| |
| - With the kernel and `ethtool` versions assumed by this guide, you can track |
| **Link Down Events** using the `ethtool -I` command. This provides a |
| counter for link drops, helping to diagnose link instability issues if |
| supported by the driver. |
| |
| - **Command to monitor link down events:** |
| |
| .. code-block:: bash |
| |
| ethtool -I <interface> |
| |
| - Example Output (if supported): |
| |
| .. code-block:: bash |
| |
| Settings for eth1: |
| Link Down Events: 5 |
| |
| This indicates that the link has dropped 5 times. Frequent link down events |
| may indicate cable or environmental issues that require further |
| investigation. |
| |
| - **Check Link Status and Speed**: |
| |
| - Even though downshift counts or events are not easily tracked, you can |
| still use `ethtool` to manually check the current link speed and status. |
| |
| - **Command:** `ethtool <interface>` |
| |
| - **Expected Output:** |
| |
| .. code-block:: bash |
| |
| Speed: 1000Mb/s |
| Duplex: Full |
| Auto-negotiation: on |
| Link detected: yes |
| |
| Any deviation from the expected speed or duplex setting could indicate an |
| issue. |
| |
| - **Disable Energy-Efficient Ethernet (EEE) for Diagnostics**: |
| |
| - **EEE** (Energy-Efficient Ethernet) can be a source of link instability due |
| to transitions in and out of low-power states. For diagnostic purposes, it |
| may be useful to **temporarily** disable EEE to determine if it is |
| contributing to link instability. This is **not a generic recommendation** |
| for disabling power management. |
| |
| - **Next Steps**: Disable EEE and monitor whether the link becomes stable. |
| |
| - **Command:** |
| |
| .. code-block:: bash |
| |
| ethtool --set-eee <interface> eee off |
| |
| - **Important**: If disabling EEE resolves the instability, the issue should |
| be reported to the maintainers as a bug, and the driver should be corrected |
| to handle EEE properly without causing instability. Disabling EEE |
| permanently should not be seen as a solution. |
| |
| - **Monitor Error Counters**: |
| |
| - While some NIC drivers and PHYs provide error counters, there is no unified |
| set of PHY-specific counters across all hardware. Additionally, not all |
| PHYs provide useful information related to errors like CRC errors, frame |
| drops, or link flaps. Therefore, this step is dependent on the specific |
| hardware and driver support. |
| |
| - **Next Steps**: Use `ethtool -S <interface>` to check if your driver |
| provides useful error counters. In some cases, counters may provide |
| information about errors like link flaps or physical layer problems (e.g., |
| excessive CRC errors), but results can vary significantly depending on the |
| PHY. |
| |
| - **Command:** `ethtool -S <interface>` |
| |
| - **Example Output (if supported)**: |
| |
| .. code-block:: bash |
| |
| rx_crc_errors: 123 |
| tx_errors: 45 |
| rx_frame_errors: 78 |
| |
| - **Note**: If no meaningful error counters are available or if counters are |
| not supported, you may need to rely on physical inspections (e.g., cable |
| condition) or kernel log messages (e.g., link up/down events) to further |
| diagnose the issue. |
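Absolute counter values are hard to judge on their own; what usually matters is whether they grow while the problem is being observed. The sketch below compares two saved `ethtool -S` snapshots and prints only the counters that increased. The helper name `counter_deltas` and the snapshot file names are assumptions for illustration, and counter names differ between drivers.

```shell
# counter_deltas: hypothetical helper, $1 = earlier snapshot, $2 = later snapshot,
# both saved from `ethtool -S <interface>`. Prints counters that increased.
#
# Possible use:
#   ethtool -S eth0 > before.txt; sleep 60; ethtool -S eth0 > after.txt
#   counter_deltas before.txt after.txt
counter_deltas() {
    awk '
        {
            line = $0
            sub(/^[ \t]+/, "", line)
            sep = index(line, ": ")
            if (sep == 0) next                     # header or unrelated line
            name = substr(line, 1, sep - 1)
            val = substr(line, sep + 2)
            if (val !~ /^[0-9]+$/) next            # keep numeric counters only
            if (FILENAME == ARGV[1]) { before[name] = val; next }
            if ((name in before) && val + 0 > before[name] + 0)
                printf "%s: +%d\n", name, val - before[name]
        }' "$1" "$2"
}
```

Counters that stay flat during the observation window are unlikely to be related to the current problem, even if their absolute values are large.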
| |
| When All Else Fails... |
| ~~~~~~~~~~~~~~~~~~~~~~ |
| |
| So you've checked the cables, monitored the logs, disabled EEE, and still... |
| nothing? Don’t worry, you’re not alone. Sometimes, Ethernet gremlins just don’t |
| want to cooperate. |
| |
| But before you throw in the towel (or the Ethernet cable), take a deep breath. |
| It’s always possible that: |
| |
| 1. Your PHY has a unique, undocumented personality. |
| |
| 2. The problem is lying dormant, waiting for just the right moment to magically |
| resolve itself (hey, it happens!). |
| |
| 3. Or, it could be that the ultimate solution simply hasn’t been invented yet. |
| |
| If none of the above bring you comfort, there’s one final step: contribute! If |
| you've uncovered new or unusual issues, or have creative diagnostic methods, |
| feel free to share your findings and extend this documentation. Together, we |
| can hunt down every elusive network issue - one twisted pair at a time. |
| |
| Remember: sometimes the solution is just a reboot away, but if not, it’s time to |
| dig deeper - or report that bug! |
| |