Linux: Troubleshooting network connectivity issues

Troubleshooting network connectivity issues in Linux involves identifying and diagnosing the root cause of the problem by checking the network components and configuration one at a time. Here’s a systematic approach to troubleshooting network connectivity issues in Linux:

  1. Check Physical Connections:
    • Ensure that all network cables are securely connected and that the link lights on the Ethernet port are lit. For Wi-Fi, confirm that the adapter is enabled and connected to the intended network.
  2. Verify Network Interface Status:
    • Use the ip command, or ifconfig on systems that still have the older net-tools package installed, to check the status of network interfaces:
        ip addr show
      or
        ifconfig -a
    • Ensure that the network interface is up (UP state) and has an IP address assigned.
  3. Check IP Configuration:
    • Use ip addr show (or ifconfig) to verify the IP address and subnet mask of the network interface, ip route show to verify the default gateway, and the /etc/resolv.conf file to verify the DNS server settings.
    • Ensure that the IP configuration is correct and matches the network configuration of your environment.
  4. Verify DNS Resolution:
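    • Use the ping command to test DNS resolution by pinging a domain name:
        ping example.com
      An error such as "Name or service not known" or "Temporary failure in name resolution" points to DNS. If the name resolves to an address but the replies time out, the problem lies elsewhere.
    • If DNS resolution fails, check the /etc/resolv.conf file for correct DNS server configurations and try using alternative DNS servers. On systems running systemd-resolved, /etc/resolv.conf may list only the local stub resolver (127.0.0.53); in that case resolvectl status shows the upstream DNS servers actually in use.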
  5. Test Local Network Connectivity:
    • Use the ping command to test connectivity to other devices on the local network by pinging their IP addresses:
        ping <IP_address>
    • If local pings fail, check the network configuration of the local device, including IP address, subnet mask, and gateway settings.
  6. Check Firewall Settings:
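    • List the active firewall rules first (e.g., sudo ufw status verbose for Uncomplicated Firewall). If the rules look like a possible cause, disable the firewall temporarily using the appropriate command for your firewall software (e.g., sudo ufw disable), and only on a host where it is safe to do so.
    • If network connectivity improves after disabling the firewall, re-enable it (e.g., sudo ufw enable) and adjust the firewall rules to allow the necessary network traffic.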
  7. Inspect Routing Table:
    • Use the ip route command to view the routing table and ensure that the default gateway is configured correctly:
        ip route show
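    • If necessary, add or modify routing entries using the ip route add command, for example:
        sudo ip route add default via <gateway_IP>
      Routes added this way do not persist across reboots.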
  8. Check Network Services:
    • Verify that essential network services (such as DHCP client, network manager, and DNS resolver) are running using the systemctl command:
        systemctl status NetworkManager
        systemctl status systemd-resolved
    • Restart or troubleshoot network services as needed.
  9. Review System Logs:
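    • Check system logs (e.g., /var/log/syslog, /var/log/messages) for any network-related errors or warnings that may provide clues about the issue:
        tail -n 50 /var/log/syslog
      On systemd-based distributions that do not write these files, journalctl -n 50 shows the most recent journal entries instead.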
  10. Test Connectivity to External Resources:
    • Use the ping or traceroute command to test connectivity to external servers and websites:
        ping google.com
        traceroute google.com
    • If external pings or traceroutes fail, check for network issues outside your local network, such as ISP problems or internet service disruptions.
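
For steps 2 and 3, the brief output format of ip (ip -brief addr show) is easier to scan than the full listing. The following is a minimal sketch, not a definitive tool: flag_interfaces is a hypothetical helper name, and it assumes the usual three-column brief layout (interface name, state, addresses).

```shell
# flag_interfaces: read `ip -brief addr show` output on stdin and print one
# line for each non-loopback interface that is not UP or has no address.
# Assumption: columns are <name> <state> [addresses...]. Some virtual
# interfaces legitimately report UNKNOWN, so treat the output as a hint.
flag_interfaces() {
    awk '
        $1 == "lo" { next }                           # skip loopback
        $2 != "UP" { print $1 ": state " $2; next }   # link is not up
        NF < 3     { print $1 ": no IP address assigned" }
    '
}
```

On a live system it would be used as ip -brief addr show | flag_interfaces; no output means every interface is up and has an address.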

By following these steps and systematically checking network components and configurations, you can effectively troubleshoot and resolve network connectivity issues in Linux.
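
Several of the checks above (default route, gateway reachability, DNS resolution, external reachability) can be chained so that the first failure shows where to look. This is a rough sketch under stated assumptions: the function names check, default_gateway, and run_checks are hypothetical, the default route is assumed to have the common "default via <IP>" form, and example.com is only a placeholder target.

```shell
# check NAME COMMAND...: run COMMAND quietly, print PASS or FAIL with NAME,
# and return non-zero on failure.
check() {
    local name="$1"
    shift
    if "$@" >/dev/null 2>&1; then
        printf 'PASS  %s\n' "$name"
    else
        printf 'FAIL  %s\n' "$name"
        return 1
    fi
}

# default_gateway: read `ip route show` output on stdin and print the
# gateway address of the first "default via <IP>" route, if any.
default_gateway() {
    awk '$1 == "default" && $2 == "via" { print $3; exit }'
}

# run_checks [HOST]: stop at the first failing check.
run_checks() {
    local target_host="${1:-example.com}"   # placeholder target (assumption)
    local gw
    gw=$(ip route show default | default_gateway)
    check "default route present" test -n "$gw" || return 1
    check "gateway $gw reachable" ping -c 2 -W 2 "$gw" || return 1
    check "DNS resolves $target_host" getent hosts "$target_host" || return 1
    check "$target_host reachable" ping -c 2 -W 2 "$target_host" || return 1
}
```

Calling run_checks (optionally with a hostname) prints one PASS or FAIL line per check. Note that some gateways and hosts drop ICMP, so a failed ping is a prompt for further investigation rather than proof of an outage.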

MTTR Definition

MTTR stands for Mean Time To Recovery. It is a key performance indicator (KPI) used to measure the average time it takes to restore a service or system to normal operation after a failure or incident occurs. MTTR is an important metric in incident management and is used to assess the efficiency of an organization’s response and resolution processes.

The formula to calculate MTTR is:

MTTR = Total Downtime / Number of Incidents

Where:

  • Total Downtime: The cumulative duration of time during which a service or system was unavailable or degraded due to incidents.
  • Number of Incidents: The total number of incidents that occurred during a specific period.

For example, if a service experiences three incidents in a month, with respective downtime durations of 2 hours, 3 hours, and 4 hours, the total downtime would be 2 + 3 + 4 = 9 hours. If we divide this total downtime by the number of incidents (3), we would get an MTTR of 3 hours.
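
The same arithmetic can be scripted when the downtime figures come from an incident log. This is a minimal sketch: mttr is a hypothetical helper name, and it assumes each argument is one incident's downtime in hours.

```shell
# mttr: print the mean of the given downtime durations (in hours),
# rounded to two decimal places. Each argument is one incident.
mttr() {
    if [ "$#" -eq 0 ]; then
        echo "mttr: need at least one downtime value" >&2
        return 1
    fi
    # One value per line; awk sums them and divides by the incident count (NR).
    printf '%s\n' "$@" | awk '{ total += $1 } END { printf "%.2f\n", total / NR }'
}

mttr 2 3 4   # the three incidents from the example: prints 3.00
```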

A lower MTTR indicates that incidents are being resolved quickly, minimizing the impact on users and the business. Organizations strive to continuously reduce their MTTR by improving incident detection, response, and resolution processes, implementing automation, and investing in proactive monitoring and preventive measures. By reducing MTTR, organizations can improve service reliability, minimize downtime, and enhance overall customer satisfaction.