
From time to time, Linux and Unix users face various network problems. Many of these problems are discussed here and on other troubleshooting forums, but those discussions are very specific and carry a lot of additional technical detail, so it is sometimes hard to see the main point and the real cause of the buggy system behavior.

By asking this question, my intention is to start a community wiki page that generalizes our network troubleshooting and debugging experience. I hope Linux and Unix users will be able to recognize and solve ("divide and conquer") their network problems more easily using this page.

The parent of this page should be Best practise to diagnose problems. But here we should focus on troubleshooting network problems from user space and kernel space.

I suppose, if you:

  1. share information about some great network diagnostic tool, with concrete usage examples and examples of the network bugs it helps to catch,
  2. share a link to a great network tutorial connected with this subject,
  3. tell us about a general method or recipe for tackling some class of network problems, or
  4. share information about your tool set for network debugging and troubleshooting

then it perfectly fits this topic.


I'll begin by sharing the link to various diagnostic tools and a 12-year-old simple tutorial. Also, this Arch Linux tutorial seems to have up-to-date information about our subject. And for diving into Linux networking, we definitely need to visit the Linux Networking-HOWTO.

Matthias Braun
asked Oct 6, 2012 at 13:55

3 Answers


I think the general principles of network troubleshooting are:

  1. Find out at what level of the TCP/IP stack (or some other stack) the problem occurs.
  2. Understand what the correct system behavior is and how the current state deviates from it.
  3. Try to express the problem in one sentence or in a few words.
  4. Using the information obtained from the buggy system, your own experience, and the experience of other people (Google, various forums, etc.), try to solve the problem until you succeed (or fail).
  5. If you fail, ask other people for help or advice.
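As a rough illustration of the first principle, here is a minimal Python sketch that walks a list of checks from the bottom of the stack upward and reports the first layer that fails. The function name first_failing_layer and the layer labels are hypothetical, chosen only for this example; the actual checks are supplied by the caller.

```python
def first_failing_layer(checks):
    """Run (layer_name, check) pairs in order; return the first failing layer.

    Each check is a zero-argument callable that returns a truthy value on
    success. An exception raised by a check counts as a failure.
    Returns None when every check passes.
    """
    for layer, check in checks:
        try:
            ok = bool(check())
        except Exception:
            ok = False
        if not ok:
            # Everything below this layer passed, so start digging here.
            return layer
    return None
```

Ordering the checks bottom-up (link, then IP, then DNS, then the application port) means the first failure also tells you which layers you can stop suspecting.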

As for me, I usually gather all the required information with whatever tools are needed and try to match it against my experience. Deciding which level of the network stack contains the bug helps to rule out unlikely causes. Using other people's experience helps to solve problems quickly, but it often leads to a situation where I solve a problem without understanding it, and if the problem occurs again, I cannot tackle it without the Internet.

And in general, I don't know how I solve network problems. It seems that there is some magic function in my brain named SolveNetworkProblem(information_about_system_state, my_experience, people_experience), which sometimes returns exactly the right answer and sometimes fails (like here: TCP dies on a Linux laptop).

I usually use utils from this set for network debugging:

  • ifconfig (or ip link, ip addr) - for obtaining information about network interfaces
  • ping - for checking whether the target host is reachable from my machine. ping can also be used for basic DNS diagnostics: ping a host by its IP address and then by its hostname to see whether DNS works at all. Then use traceroute, tracepath, or mtr to see what is going on along the way.
  • dig - diagnose everything DNS
  • dmesg | less or dmesg | tail or dmesg | grep -i error - to see what the Linux kernel reports about the problem.
  • netstat -antp | grep smth - my most common use of the netstat command, which shows information about TCP connections. I often filter the output with grep. See also the newer ss command (from iproute2, the newer standard suite of Linux networking tools) and lsof, as in lsof -ai tcp -c some-cmd.
  • telnet <host> <port> - very useful for talking to various TCP services (e.g. over SMTP or HTTP); it also lets us check whether we can connect to a given TCP port at all.
  • iptables-save (on Linux) - to dump the full iptables tables
  • ethtool - get all the network interface card parameters (status of the link, speed, offload parameters...)
  • socat - the Swiss army tool to test all network protocols (UDP, multicast, SCTP...). Especially useful (more so than telnet) with a few -d options.
  • iperf - to test bandwidth availability
  • openssl (s_client, ocsp, x509...) - to debug all SSL/TLS/PKI issues.
  • wireshark - the powerful tool for capturing and analyzing network traffic, which allows you to analyze and catch many network bugs.
  • iftop - show big users on the network/router.
  • iptstate (on Linux) - current view of the firewall's connection tracking.
  • arp (or the newer ip neigh on Linux) - show the ARP table status.
  • route or the newer (on Linux) ip route - show the routing table status.
  • strace (or truss, dtrace, or tusc, depending on the system) - a useful tool that shows which system calls the problematic process performs. It also shows error codes (errno) when system calls fail. This information is often enough to understand the system behavior and solve the problem. Alternatively, setting breakpoints on some networking functions in gdb lets you find out when they are called and with which arguments.
  • to investigate firewall issues on Linux: iptables -nvL shows how many packets are matched by each rule (iptables -Z zeroes the counters). The LOG target inserted in the firewall chains is useful to see which packets reach them and how they have already been transformed when they get there. To go further, NFLOG (associated with ulogd) will log the full packet.
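The telnet <host> <port> connectivity check from the list above can also be scripted. The following is a minimal Python sketch, not a replacement for any of the tools listed; the function name check_tcp_port and the default timeout are assumptions made for this example. On failure it reports the symbolic errno name, which is the same kind of information strace would show for the failing connect() call.

```python
import errno
import socket


def check_tcp_port(host, port, timeout=3.0):
    """Return (True, "open") if a TCP connection succeeds, else (False, reason)."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True, "open"
    except socket.gaierror as exc:
        # Name resolution failed before any packet was sent to the target.
        return False, "DNS: " + str(exc)
    except OSError as exc:
        # Map the numeric errno to its symbolic name (e.g. ECONNREFUSED).
        # Timeouts may carry no errno, so fall back to the exception name.
        return False, errno.errorcode.get(exc.errno, type(exc).__name__)
```

For example, check_tcp_port("127.0.0.1", 22) tells you whether something is listening on the local SSH port. A refused connection usually means nothing is listening or a firewall rejects the connection, while a timeout more often points at a firewall silently dropping packets or at a routing problem.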
Matthias Braun
answered Oct 6, 2012 at 13:55
  • Geez, talk about thorough! Commented Apr 1, 2017 at 1:02
  • I'd add nmap. The profile of open ports on a machine can quickly give you hints as to whether you are looking at a Linux or Windows server, for example. Commented Apr 21, 2017 at 18:54
  • I'd add tcpdump, as it's the standard packet analyzer for TCP. Commented May 23, 2018 at 13:39
  • tcpdump isn't TCP-only. It's the standard packet analyzer. Commented Feb 28, 2020 at 10:58

A surprising number of "network problems" boil down to DNS problems of one kind or another. Initial troubleshooting should use ping -n w.x.y.z to leave out DNS resolution of a hostname and just check IP connectivity. After that, use route -n to check the default IP route without DNS resolution.

After verifying IP connectivity and routing, nslookup, host, and dig can yield information. Remember that a command that appears to "lock up" can indicate that DNS timeouts are occurring.
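The order of checks described above (IP connectivity first, DNS second) can be sketched in Python. This is only an illustration under stated assumptions: it uses a TCP connection to a known port as a stand-in for ping, because sending ICMP from Python normally requires raw-socket privileges, and the name classify_failure and its parameters are made up for this example.

```python
import socket


def classify_failure(hostname, ip, port, resolve=socket.gethostbyname,
                     connect=socket.create_connection, timeout=3.0):
    """Return 'network', 'dns', or 'ok' for a host known by name and by IP."""
    # Step 1: bypass DNS entirely by connecting to the literal IP address.
    try:
        connect((ip, port), timeout).close()
    except OSError:
        return "network"
    # Step 2: IP connectivity works, so test name resolution on its own.
    try:
        resolve(hostname)
    except OSError:
        return "dns"
    return "ok"
```

The resolve and connect parameters default to the real socket functions and exist only so that the logic can be exercised without a network. A result of "dns" means the host is reachable by address but its name does not resolve, which is the situation this answer warns about.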

Don't forget to check the existence and the contents of /etc/resolv.conf. DHCP clients change that file with every lease, and sometimes they get it wrong, or if disk space is tight, an update might not happen.
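A quick way to script that check is sketched below in Python. It assumes the usual resolv.conf layout of one "nameserver <address>" entry per line, with comment lines starting with # or ;. The function names are hypothetical.

```python
import os


def parse_nameservers(resolv_conf_text):
    """Return the list of nameserver addresses found in resolv.conf text."""
    servers = []
    for raw in resolv_conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line[0] in "#;":
            continue
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def read_nameservers(path="/etc/resolv.conf"):
    """Return the nameservers from the file, or None if the file is missing."""
    if not os.path.exists(path):
        return None
    with open(path) as fh:
        return parse_nameservers(fh.read())
```

A result of None (file missing) or an empty list (no nameserver lines) both suggest that the DHCP client did not write the file correctly.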

Matthias Braun
answered Oct 6, 2012 at 16:25

Cabling problems can exist. If you have access to the hardware, ensure that the cables are all plugged in and mechanically engaged. If you can see routers or ethernet interfaces, ensure that the link lights are on.

Remotely, you have to depend on ethtool and mii-tool.

[root@flask ~]# ethtool eth0
Settings for eth0:
        Supported ports: [ TP MII ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
        Supported pause frame use: No
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
        Advertised pause frame use: Symmetric
        Advertised auto-negotiation: Yes
        Speed: 10Mb/s
        Duplex: Half
        Port: MII
        PHYAD: 24
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: g
        Wake-on: d
        Current message level: 0x00000001 (1)
                               drv
        Link detected: yes

"Link detected: yes" is good, but 10Mb/s and Half duplex are not good, as the NIC on that computer can do better. I need to figure out if the NIC is goofed up or the cable is. Another computer plugged into the same router says 100Mb/s, Full duplex.
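The comparison made above by eye (negotiated speed and duplex versus what the NIC supports) can be automated. Here is a minimal Python sketch that parses ethtool output in the format shown in this answer; the function names are hypothetical, and other driver or ethtool versions may print fields this sketch does not handle.

```python
import re


def parse_ethtool(text):
    """Parse `ethtool <iface>` output into a {field: value} dict."""
    fields, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if ":" in line:
            key, value = line.split(":", 1)
            current = key.strip()
            fields[current] = value.strip()
        elif line and current is not None:
            # Continuation line, e.g. the second row of "Supported link modes".
            fields[current] += " " + line
    return fields


def link_is_degraded(text):
    """True if the negotiated speed/duplex is below the best supported mode.

    Returns None when the output lacks the information needed to decide.
    """
    fields = parse_ethtool(text)
    modes = re.findall(r"(\d+)base\w+/(Half|Full)",
                       fields.get("Supported link modes", ""))
    speed = re.match(r"(\d+)Mb/s", fields.get("Speed", ""))
    if not modes or speed is None:
        return None
    # Compare (speed, is_full_duplex) tuples; a higher speed wins first.
    best = max((int(s), d == "Full") for s, d in modes)
    current = (int(speed.group(1)), fields.get("Duplex") == "Full")
    return current < best
```

Fed the output above, link_is_degraded reports a degraded link, because 10Mb/s half duplex is below the supported 100baseT/Full. It cannot tell whether the NIC or the cable is at fault; that still requires swapping one of them.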

answered Oct 6, 2012 at 23:46
