Design Concept of LODES


Local Management

A basic concept of LODES is local management: a copy of the system is attached to each network segment and is made responsible for local events. (Each such autonomous copy is referred to as an agent in the following sections.) Centralized systems that monitor all network components use network management protocols such as ICMP (Internet Control Message Protocol) and SNMP to determine the state of the entire network. The packets these protocols use to obtain management information must traverse a number of segments, and so they add traffic. The larger the network, the heavier this extra traffic load. Furthermore, a network problem may itself make the management information impossible to obtain. In the LODES concept, each agent obtains and analyzes only the information needed to manage the local network. This not only reduces the non-local traffic load but also enables the agent to fully understand the state of the local network, as described below.

Passive Management

Local management reduces the agent's work, so passive management becomes practical. By "passive management" we mean that a system acquires network management information and manages the network in real time by analyzing the data in packets captured in promiscuous mode, without sending or receiving additional management packets. For example, there is no need to send "ping" (ICMP echo request) packets to hosts that are already communicating normally with other hosts. Moreover, LODES can create a model of the local network configuration and features by observing packets. This model covers local hosts, routers, and servers (such as DNS and DHCP/BOOTP servers) as well as network states. It is also used to recognize network activity such as the current activity state of each host (from IP and TCP packets), changes in network states (from traffic and other statistics), new and replaced hosts (from IP and MAC addresses), partial routing information (from routing protocol packets), the services provided by local hosts (from the port numbers in TCP and UDP headers), and the IP addresses of DNS servers and routers (from DNS and routing packets). SNMP, ICMP, and other test packets are used to obtain the complementary information needed to build the local network model and to diagnose network problems.
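
The following is a minimal sketch of how a model of local hosts could be built from observed packets alone. It is not the LODES implementation. The record layout, the event names, and the rule that a source port below 1024 suggests a server are all assumptions made for illustration, and the packets are assumed to be already parsed and to originate on the local segment.

```python
from dataclasses import dataclass, field


@dataclass
class HostRecord:
    """What an agent has learned about one local host (hypothetical layout)."""
    ip: str
    mac: str
    last_seen: float
    services: set = field(default_factory=set)  # (protocol, port) pairs


def observe(model, pkt):
    """Update the model from one captured packet and return the noted events.

    `pkt` is a hypothetical pre-parsed record with the keys
    time, src_ip, src_mac, proto and src_port.
    """
    events = []
    host = model.get(pkt["src_ip"])
    if host is None:
        host = HostRecord(pkt["src_ip"], pkt["src_mac"], pkt["time"])
        model[pkt["src_ip"]] = host
        events.append(("new_host", pkt["src_ip"]))
    elif host.mac != pkt["src_mac"]:
        # Same IP address, different MAC address: the host may have been replaced.
        events.append(("replaced_host", pkt["src_ip"]))
        host.mac = pkt["src_mac"]
    host.last_seen = pkt["time"]
    # Assumption: a well-known source port suggests the host answers as a server.
    if pkt.get("proto") in ("tcp", "udp") and pkt.get("src_port", 65535) < 1024:
        host.services.add((pkt["proto"], pkt["src_port"]))
    return events
```

Calling observe for every captured packet keeps the model current without the agent sending a single packet of its own.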

Problem Detection

Passive management also enables the system to detect problems (semi-)automatically. When the network has a problem, special kinds or forms of packets, unusual packet-flow patterns, and/or abnormal statistical data are often observed. Matching these observations against the problems known to produce them can point to the cause. For example, a packet whose destination IP address indicates another network segment but whose destination MAC address is the broadcast address, or is not that of the router, will be lost or duplicated. This phenomenon suggests an incorrect routing definition in the source host. As another example, multiple ARP (Address Resolution Protocol) reply packets from different hosts for the same IP address indicate a duplicate IP address assignment; this usually makes communication impossible. Not all problems can be detected in this manner. For example, a host that cannot send packets (because of a hardware failure or a software bug) will not be detected because there are no packets to observe. Such problems must be reported to LODES by network managers or host users.
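
The two examples above can be written as simple rules. This is a sketch under stated assumptions, not the LODES rule base: the function names and the packet record keys are hypothetical, MAC addresses are assumed to be lowercase strings, and IP broadcast and multicast destinations (which would need separate handling) are ignored.

```python
import ipaddress

BROADCAST_MAC = "ff:ff:ff:ff:ff:ff"


def misrouted(pkt, local_net, router_macs):
    """True if an off-segment packet is not addressed to a known router."""
    dst = ipaddress.ip_address(pkt["dst_ip"])
    if dst in ipaddress.ip_network(local_net):
        return False  # local delivery, so the rule does not apply
    return pkt["dst_mac"] == BROADCAST_MAC or pkt["dst_mac"] not in router_macs


def duplicate_ips(arp_replies):
    """Return the IP addresses claimed by more than one MAC address."""
    claimed = {}
    for reply in arp_replies:
        claimed.setdefault(reply["ip"], set()).add(reply["mac"])
    return {ip for ip, macs in claimed.items() if len(macs) > 1}
```

A positive result from either rule is only a symptom; the agent would still have to relate it to a hypothesis such as an incorrect routing definition in the source host.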

Automatic problem detection also makes packet capture more effective. When a problem or an indication of a problem is observed, the relevant packets are stored promptly for detailed analysis. This reduces the amount of data to be analyzed and thus reduces the inference time. Intermittent and potential problems can also be found automatically. For example, a "broadcast storm" may not exhibit any symptoms in a network segment if only a few hosts are attached, but symptoms may arise as more hosts are added. The agent can advise the network managers of this situation.
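
One way to store packets only around a detected symptom is a rolling window that is flushed when a detector fires. The sketch below illustrates that idea; the class name, the window sizes, and the suspicious flag are assumptions, and the text does not say how LODES itself buffers packets.

```python
from collections import deque


class TriggeredCapture:
    """Keep a short rolling window and store packets only around a trigger."""

    def __init__(self, before=100, after=100):
        self.recent = deque(maxlen=before)  # packets seen just before a trigger
        self.after = after                  # packets to keep after a trigger
        self.remaining = 0                  # packets still to store
        self.stored = []                    # packets kept for detailed analysis

    def feed(self, pkt, suspicious=False):
        if suspicious:
            # Keep the context leading up to the symptom, then the symptom itself.
            self.stored.extend(self.recent)
            self.recent.clear()
            self.stored.append(pkt)
            self.remaining = self.after
        elif self.remaining > 0:
            self.stored.append(pkt)
            self.remaining -= 1
        else:
            self.recent.append(pkt)
```

Packets that never come near a trigger fall out of the window and are never analyzed, which is what keeps the stored data small.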

Cooperative Monitoring and Diagnosis

Local management limits the cognitive area of an agent. Cooperation among agents overcomes this limitation for non-local problem diagnosis. Although each agent has a detailed management model of its local network segment, some problems require coordinated activities such as requesting tasks and/or getting information about the problem symptoms and non-local network models.

In general, actions are coordinated if their results will be used by other agents and/or if doing so will help balance the load. Coordinated actions are determined based on each agent's world model, which is subjective but contains a partial global perspective. Thus, they usually include non-local data (such as the inference process and planning states of other agents and the domain-level data in other agents) as well as local data.

In LODES, coordination is used for more specific purposes.

Non-Local Problem Detection: Symptoms of a problem may be observed in a network segment at a distance from the segment in which the problem originated.

Problem Notification: It is often necessary for an agent not only to detect a problem but also to notify the agent in the segment where the problem originated. This is not simple because the remote agent often cannot observe any symptoms of the problem. It is thus necessary to send evidence of the problem to the remote agent so that it will revise the hypothesis and plan it currently holds. Again, in the ISR problem example, the routing function is never used in local communications; all tests in the local segment (i.e., those run by agent2) show that the local host (host2) is working correctly. Thus, agent1 must send evidence of the problem to agent2.

Network Model Comparison: Agents exchange the model of the local network and a partial (incomplete) model of the non-local network to help them understand the configuration of the non-local network segments and/or to verify the models they have generated. For example, the ISR problem is observed only by agent1, while agent2 thinks that host2 can communicate with any other host because it does not observe any malfunction. Even if agent1 sends evidence that suggests host2 is not working correctly, the evidence is meaningless if agent2 does not analyze the data appropriately. In such situations, a comparison of the network models may generate a sensible suggestion or an answer. In this example, the models of the hosts differ. By identifying the differing parts and applying diagnostic knowledge about them, LODES can associate the differences with a number of specific hypotheses, one of which is the ISR problem.
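
Identifying the differing parts of two models amounts to a structural diff. The sketch below assumes, purely for illustration, that each agent's model is a dictionary mapping a host name to a dictionary of attributes; the attribute names are hypothetical, and hosts known to only one agent are skipped because the exchanged models are partial.

```python
def diff_models(model_a, model_b):
    """Return {host: {attribute: (value_in_a, value_in_b)}} for each disagreement."""
    differences = {}
    for host in sorted(model_a.keys() & model_b.keys()):
        a, b = model_a[host], model_b[host]
        changed = {
            attr: (a.get(attr), b.get(attr))
            for attr in sorted(a.keys() | b.keys())
            if a.get(attr) != b.get(attr)
        }
        if changed:
            differences[host] = changed
    return differences
```

Each differing attribute would then be looked up in the agent's diagnostic knowledge to propose hypotheses; that lookup is not shown here.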

Task and Data Requests: Tasks can be run to balance the load and to select and verify hypotheses. Requests for data needed to solve a problem may lead to new tasks in the receiving agents. In the ISR problem, for example, agent2 may think that the route between itself and agent1 is temporarily congested, making it impossible for the requesting service to receive reply packets. To verify this, agent2 asks agent1 to "ping to host2" and "ping to other hosts in NS2". Note that the results of network observation often differ between agents, so agents may perform the same actions even though doing so may be redundant.

Problem Reproduction: Reproducing a problem is one way to convince an agent that a problem exists. It can also enable an agent to capture timely data. Reproducing and observing a problem, however, often requires coordinated activities. Again, in the ISR problem example, agent2 never observes the problem, so it does not recognize that its local host has a problem. In this case, host1 (or agent1) sends test packets, and agent2 looks at them and at the packets sent in reply to them. This coordinated reproduction and observation will convince agent2 of the existence of a problem in host2. These actions can also be seen as synchronization, a type of coordination activity. Note that the help of human managers or users is sometimes necessary for reproduction.

In LODES, the agents are homogeneous even though they are situated in different environments. Their automatic management and troubleshooting capabilities are identical; only their local network data is different. Because they look at a problem differently, they generate different hypotheses about what is causing the problem and develop different plans about what to do next.


Updated on July 22nd, 1998.