LODES Diagnosis Examples


Cooperative Troubleshooting

Suppose that the system administrator using this expert system receives the following complaint.

I logged onto host P, then tried to log onto host Q from P using TELNET, but the connection could not be established. Host Q is attached to a remote network (so the path from P to Q passes through a number of IP routers).

In this case, the possible causes include a problem in P or Q, an IP router that is not functioning, or a busy network on the way to Q. The following trace shows the main diagnostic steps actually taken by the local and remote LODESs.

(-,-) is the pair of local and remote status numbers, which express each LODES's certainty that "I can diagnose this." Actions flush with the left margin are local LODES actions; indented ones are actions of a remote LODES in the network to which Q is attached.

1. Send an ICMP echo request packet to P and Q.                 (0,-)
2. An ICMP echo reply from P is found.                          (0,-)
3. An ICMP echo reply from Q cannot be found.                   (-1,-)
4. Connection to a remote LODES is established.                 (-1,0)
5. Send user's report and current results to a remote LODES.    (-1,0)

    6. LODES receives the host information about Q.             (-1,0)
    7. Send an ICMP echo request packet to Q.                   (-1,0)
    8. An ICMP echo reply from Q is found.                      (-1,0)
    9. Ready to collect ICMP packets (by NOBS).                 (-1,0)
    10. Send an R-job request to the local LODES.               (-1,0)

11. Receive an R-job request.                                   (-1,0)
12. Send an ICMP echo request packet to Q.                      (-1,0)

    13. An ICMP echo request packet is found.                   (-1,0)
    14. An ICMP echo reply packet cannot be found.              (-1,2)
    15. LODES suspects that the routing table in Q is damaged.  (-1,2)
    16. Send 14 and 15 to the local LODES.                      (-1,2)

17. Set status number = -2.                                     (-2,2)

    18. Check the routing table in Q if possible.               (-2,2)
    19. Find the cause and send it to the local LODES.          (-2,2)

20. Output the result to the system administrator.              (-2,2)

The cause of the problem is an incorrect routing table in host Q. At 3, LODES suspects that its own network and host P do not have a problem, and at 4, that the IP routers along the way have no routing problems. Items 4 and 5 are communicated by IIC. At 6, the remote LODES looks in the Host Information Database to check whether Q always sees ICMP echo request packets, and the answer is yes. The job request in 10 asks the local LODES to send an ICMP echo request packet to Q. At 14, LODES becomes aware that host Q has a problem; the two probable causes are that the routing definition in host Q is wrong or that the wrong network interface hardware name is assigned in Q.

While the remote LODES is operating, the local LODES also keeps working as long as LSN (local status number) = -1. At 3 there is indeed no ICMP echo reply, but Q may be a host, such as a personal computer, that does not look at ICMP packets. While 6 to 10 are being performed, the local LODES looks into host P, for example. Two possible causes there are that P has an incorrect routing table or, if P is a SUN machine, that it is in single-user mode or that "inetd" is not working.
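The decision points in the trace can be sketched in a few lines of Python. This is only an illustration of the reasoning at steps 1 to 5 and 13 to 15, not the LODES implementation: the function names, the action strings, and the `ping` callable are hypothetical, and the branches not shown in the trace are assumptions.

```python
from typing import Callable, Optional, Tuple

# (local status number, remote status number); None stands for "-" in the trace.
Status = Tuple[int, Optional[int]]


def local_first_steps(ping: Callable[[str], bool],
                      src: str, dst: str) -> Tuple[str, Status]:
    """Mirror trace steps 1-5 on the local side.

    `ping` is a hypothetical callable that returns True when an ICMP
    echo reply from the named host is observed.
    """
    src_ok, dst_ok = ping(src), ping(dst)
    if src_ok and not dst_ok:
        # Steps 3-5: local status drops to -1 and a remote LODES is contacted.
        return "ask_remote_lodes", (-1, 0)
    if not src_ok:
        # Not covered by the trace: assumed outcome, for illustration only.
        return "inspect_source_host", (0, None)
    # Also not covered by the trace: both hosts answered the echo request.
    return "hosts_reachable", (0, None)


def remote_inference(request_seen: bool, reply_seen: bool,
                     status: Status) -> Tuple[str, Status]:
    """Mirror trace steps 13-15 on the remote side."""
    local, remote = status
    if request_seen and not reply_seen:
        # The request arrived on the remote network but no reply was
        # observed, so the target host itself is suspected.
        return "suspect_routing_table_in_target", (local, 2)
    return "no_conclusion", (local, remote)


if __name__ == "__main__":
    replies = {"P": True, "Q": False}  # what the local LODES observes
    action, status = local_first_steps(lambda host: replies[host], "P", "Q")
    print(action, status)
    print(remote_inference(True, False, status))
```

Passing the probe as a callable keeps the sketch runnable without raw-socket privileges; a real probe would have to send and capture ICMP packets.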

Note that, in this case, the routing table in Q can be corrected by sending an ICMP redirect packet from LODES, but this sometimes causes another problem: host Q may be intentionally hidden from another network. Thus LODES sends a redirect packet only to hosts which are allowed to receive it.

Autonomous Diagnosis

The following packet data are collected by NOBS and show that there are two ARP replies. These data are sent to LND and are analyzed automatically.

TCP nuesun.telnet > tip-1.62487 ...PA. L:1 S:56447674 A:1304470673 W:4096
TCP tip-1.63505 > ntt-20.telnet ....A. L:0 S:1307336283 A:-1036775238 W:1519
TCP golis.telnet > ntt-20.33729 ...PA. L:1 S:1211245137 A:-1023803300 W:2048
ARP 8:0:2b:6:e4:52 ff:ff:ff:ff:ff:ff nttgoso looks for the ether address of lucifer
TCP ntt-20.telnet > tip-1.63505 ...PA. L:1 S:-1036775238 A:1307336283 W:1330
ARP 8:0:20:0:7:76 8:0:2b:6:e4:52 lucifer replies to nttgoso its ether address is 8:0:20:0:7:76
ARP 8:0:46:0:1d:dc 8:0:2b:6:e4:52 lucifer replies to nttgoso its ether address is 8:0:46:0:1d:dc

LND checks the IP addresses and MAC numbers to determine whether these are proxy ARP replies and, based on the MAC number database in LODES, which MAC number is correct. If they are proxy ARP replies, LND recommends to the network manager that only one of the IP routers should reply to proxy ARP. If an incorrect MAC number is seen, LND checks the internal table to determine whether the host has been changed.
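The first part of this check, spotting a host that answers one ARP request with more than one MAC number, can be sketched as follows. The sketch assumes the one-line text format of the packet data shown above; the regular expression and the function names are hypothetical and are not part of LND. The comparison against the MAC number database is reduced to a single `registered_mac` argument that the caller must supply.

```python
import re
from collections import defaultdict

# Matches lines such as:
# "ARP <src> <dst> lucifer replies to nttgoso its ether address is <mac>"
ARP_REPLY = re.compile(
    r"^ARP\s+\S+\s+\S+\s+(?P<host>\S+) replies to (?P<asker>\S+) "
    r"its ether address is (?P<mac>\S+)$"
)


def conflicting_arp_replies(lines):
    """Return {host: sorted MACs} for hosts seen replying with more than one MAC."""
    seen = defaultdict(set)
    for line in lines:
        match = ARP_REPLY.match(line.strip())
        if match:
            seen[match.group("host")].add(match.group("mac").lower())
    return {host: sorted(macs) for host, macs in seen.items() if len(macs) > 1}


def unexpected_macs(macs, registered_mac):
    """Return the MACs that differ from the one registered for the host."""
    return [mac for mac in macs if mac != registered_mac.lower()]


trace = [
    "ARP 8:0:2b:6:e4:52 ff:ff:ff:ff:ff:ff nttgoso looks for the ether address of lucifer",
    "ARP 8:0:20:0:7:76 8:0:2b:6:e4:52 lucifer replies to nttgoso its ether address is 8:0:20:0:7:76",
    "ARP 8:0:46:0:1d:dc 8:0:2b:6:e4:52 lucifer replies to nttgoso its ether address is 8:0:46:0:1d:dc",
]

print(conflicting_arp_replies(trace))
# {'lucifer': ['8:0:20:0:7:76', '8:0:46:0:1d:dc']}
```

Deciding whether the two replies come from proxy ARP routers or from a misconfigured host still needs the IP address and MAC number information held by LODES, which this sketch does not model.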



Updated on July 23rd, 1998.