


















*This research was conducted at the MIT Laboratory for Information and Decision Systems, and the Center for Intelligent Control Systems.
July 1987 LIDS-P-1543 (revised)
Abstract

An algorithm is presented which allows each node in a computer network to maintain a correct view of the network topology despite link and node failures. Reliability is achieved without transmitting any information other than the operational status of links. Messages are only sent in response to topological changes: periodic retransmission is not required.

*This research was conducted at the MIT Laboratory for Information and Decision Systems, and the Center for Intelligent Control Systems. It was supported in part by a National Science Foundation Graduate Fellowship, by the National Science Foundation under Grant NSF-ECS-8310698, by the Defense Advanced Research Projects Agency under Grant ONR/N00014-84-K-0357, and by the Army Research Office under Grant ARO DAAL03-86-K-0171.
1 Introduction
At any time while a store-and-forward computer network is operating, one or more of its communication links or processing nodes may malfunction or be put back into service. The recovery of the network from such a change in its topology is an essential part of providing reliable data communication. This paper is concerned with the problem of keeping each network node informed of the entire network topology when that topology occasionally changes over time. Any network which uses decentralized adaptive routing needs to address this problem. The classic example of this is the ARPANET, where each node maintains a map of the entire network and uses it in making routing decisions [8]. Even networks which use some form of hierarchical routing need to solve this problem within some level of the hierarchy.

Topological changes may occur at any time. Since all messages sent in the network are subject to delay, a node can never be certain that it knows the correct topology at some instant of time. However, distributed algorithms can be designed to guarantee that each node is made aware of the correct status of each link to which it has a physical path, provided that the topology does not change for a sufficient but finite time. Algorithms which accomplish this are called topology algorithms.

A topology algorithm is a set of rules governing the topology information stored at a node, as well as the contents, transmission, and reception of algorithm messages. These messages are called topology updates or update messages. They usually contain an indication of the operational status (up or down) of one or more network links.

When a network is started or reinitialized, the algorithm must determine the initial network topology and communicate it to every node. Thereafter, the topology
[1,4,5,7,8,9,11,12,13,14,15,16]. The problem of broadcasting topology changes has been formulated in many different ways, and is often solved as part of a distributed routing algorithm. A common approach is to include auxiliary information such as message counters, sequence numbers, or age fields in update messages along with the topology information itself. The sequence numbers are used to distinguish between old and new information and to stop the flooding of messages. Timers at the nodes may be used to enforce some minimum or maximum time interval between the transmission of topology information. Age fields or time stamps may also be used as auxiliary information in algorithm messages to ensure that old topology information is eventually deleted. The ARPANET algorithm [8] uses a combination of sequence numbers, age fields, and timers. It also requires periodic retransmission of topology information. Perlman [9] presents improvements to this algorithm which, among other things, make it less timer dependent. Other algorithms [1,4,5,7,11,12,14,15,16] do not use timers, age fields, or time stamps, and send messages only in response to receiving other messages, or in response to topological changes in adjacent links. We call this class of algorithm event driven. Such algorithms may still rely on bounded message counters to distinguish between old and new information. They may respond to topological changes by rebuilding the entire network topology as in [4], or by modifying an existing topology to reflect the change as in [11]. Several different types of messages are sometimes used for broadcasting information, collecting acknowledgements, and terminating the algorithm.

The four issues mentioned earlier can make it quite difficult to prove the correctness of topology algorithms, especially when the algorithms are complex. For example, it was recently shown in [14] that the topology algorithms in [4] and [12] can fail to operate properly in some unusual circumstances. The problem with the algorithm in [4] went unnoticed for nearly ten years. This argues for topology algorithms which can be easily shown to be correct.

In this paper we take a rather unconventional approach to solving the topology problem. The algorithm which we present, called the Shortest Path Topology Algorithm (SPTA), uses no auxiliary information at all. The update messages consist only of topological information, i.e. link status information. The algorithm is purely event driven in that nodes transmit messages only in response to receiving a topology update message from a neighbor, or to detecting a status change in an adjacent link. It does not rely on periodic retransmission of messages, or use timers, counters, or clocks of any kind. The update messages are used by a node to modify its existing topology, and there are no special cases for reconnection of disconnected parts of the network. The simple message structure allows a theoretical proof of correctness that is rather straightforward.

There are two main motivations for constructing an algorithm with the above characteristics. Although the use of auxiliary information is a logical way to deal with the four difficulties mentioned earlier, it also introduces additional problems [9]. For example, if sequence numbers are used, the finite bit field used to store them may eventually wrap around. While this can be avoided by choosing a large bit field, some provisions should still be made for resetting the numbers [14]. The introduction of auxiliary information into update messages usually leads to increasing complexity. An algorithm which avoids auxiliary information entirely also avoids the complexities and difficulties associated with it. Apart from this, there is
The correct operation of any topology algorithm depends strongly on the way in which link status changes (failures and repairs) are detected by the network nodes. Each link is considered to be bidirectional. There is a data link control protocol operating at each end node of a link which decides whether the link is up (operating) or down (not operating). Due to communication delay, the decisions made by the two end nodes of a link may be nonsimultaneous. The following assumptions are made about the operation of the data link control protocol.

A1) When a link is down at one end node, it must eventually be called down at the other end node, before either end node can call it up again.
A2) If a link is called up at one end node, then within finite time, either the opposite end node must call it up or the first must call it down again.
A3) If a message sent by a node $i$ on link $(i,j)$ does not arrive correctly at node $j$ within a finite time, then link $(i,j)$ will be called down by both $i$ and $j$ in a finite time.
A4) Links preserve the order of transmitted messages.

The following assumptions are made about the operation of the network nodes.

A5) A node failure is represented by the (perhaps nonsimultaneous) failure of the links adjacent to the node.
A6) While they are operating, nodes maintain the integrity of the data and messages stored in their memory.
Data link control protocols for achieving assumptions A1 through A4 are nontrivial. Examples of such protocols may be found in [2]. Section 3.1 discusses the operation of SPTA when the above assumptions are violated.
2.2 SPTA Data Structures and Rules

Each node $i$ in the network maintains a topology table $T^i$ called its main topology table. A topology table is a list of the operational status of each link in the network. We refer to the single bidirectional link between nodes $m$ and $n$ as either $(m,n)$ or $(n,m)$, whichever is more convenient. $T^i$ contains an entry $T^i(m,n)$ for each link $(m,n)$, and reflects node $i$'s current best estimate of the network topology. It is the official topology that would be used by a routing algorithm operating at node $i$.

In addition to its main topology table, node $i$ maintains a port topology table $T^i_j$ associated with each neighboring node $j$. The entry in this table for link $(m,n)$ is denoted $T^i_j(m,n)$. The information stored in table $T^i_j$ is the latest topology information received by node $i$ from node $j$. The tables stored at node $i$ are shown in Figure 1. When a node's main topology table changes, it sends a message notifying each of its neighbors of the change. Therefore, $T^i_j$ is merely a delayed version of node $j$'s main topology table, $T^j$.

Most algorithm messages consist of a single link name $(m,n)$ together with the link's status (up or down). However, when a link becomes operational, its end nodes exchange their entire main topology tables. The contents of a table are sent as a single message.

SPTA consists of a set of rules for sending messages and updating the topology tables described above. The following rules are followed by each network node.
lists the link on which the message was received as up in its main topology table.
R6) When the application of rules R3, R4, or R5 causes an entry in a node's main or port topology tables to change, the node updates its main topology table by using the main topology update algorithm described below.
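The tables described in this section can be pictured with a small sketch. The following Python fragment is only an illustration under stated assumptions: the names `NodeTables`, `link`, and `record_update` are hypothetical and do not appear in the paper, and treating a missing port table entry as down is an assumption made for convenience. It shows one main topology table and one port topology table per neighbor, with a link stored under a single canonical name so that $(m,n)$ and $(n,m)$ refer to the same entry.

```python
from dataclasses import dataclass, field

UP, DOWN = "up", "down"


def link(m, n):
    """Canonical name for the bidirectional link between m and n."""
    return (m, n) if m <= n else (n, m)


@dataclass
class NodeTables:
    """Tables kept at one node i (hypothetical layout, not from the paper)."""
    node_id: int
    # Main topology table T^i: link -> status.
    main: dict = field(default_factory=dict)
    # Port topology tables: neighbor j -> (link -> status), i.e. T^i_j.
    ports: dict = field(default_factory=dict)

    def record_update(self, neighbor, m, n, status):
        """Store a status reported by `neighbor` in its port table.

        Returns True when the stored entry changed. A missing entry is
        treated as DOWN (an assumption of this sketch).
        """
        table = self.ports.setdefault(neighbor, {})
        key = link(m, n)
        changed = table.get(key, DOWN) != status
        table[key] = status
        return changed
```

A caller could use the boolean returned by `record_update` to decide whether the main topology table needs to be recomputed, which is the situation rule R6 describes.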
The following algorithm is used by each node $i$ to construct its main topology table $T^i$ based on its knowledge of the status of adjacent links, and the information stored in its port topology tables. It consists of iterations which are very similar to those of Dijkstra's shortest path algorithm [3] when all the links are taken to have a length of 1. The following variables are used by the algorithm at each node $i$:
$P_k$: (for $k \ge 1$) the set of nodes whose shortest hop path to node $i$ has $k$ links, using only links which are up in topology $T^i$.
$L_k$: (for $k \ge 1$) the set of links $(m,n)$ such that the shortest hop path from node $i$ to the closer end node of $(m,n)$ has $k$ links, using only links which are up in topology $T^i$.
$N(m)$: a neighbor of node $i$ that is the first node on a shortest hop path from node $i$ to node $m$, using only links which are up in topology $T^i$. $N(m)$ is referred to as the label of node $m$.
$s(i,m)$: node $i$'s current operational status for adjacent link $(i,m)$. This is provided by the data link control algorithm operating at node $i$.
The purpose of the $k$th iteration of the algorithm is to enter into $T^i$ the status of those links contained in $L_k$. This is done by selecting, for each link $(m,n) \in L_k$, a port topology table to believe concerning link $(m,n)$'s status. The status entry for link $(m,n)$ in this port table is then entered in $T^i$.

At the start of the 1st iteration of the algorithm, $T^i$ contains the status of each link adjacent to $i$. For completeness, we define $P_0 = \{i\}$. $P_1$ is the set of neighbors of $i$ connected by working links. In general, at the start of the $k$th iteration the sets $P_k$ and $P_{k-1}$ have already been defined. The set $L_k$ can be constructed from $P_k$ and $P_{k-1}$ by choosing those links which have at least one end node in $P_k$, but no end node in $P_{k-1}$. For each link $(m,n) \in L_k$, we choose an end node $m$ such that $m \in P_k$. Then $N(m)$ is a neighbor of $i$ which lies on a shortest hop path from $i$ to link $(m,n)$. The port topology table associated with this neighbor is the one that will be believed concerning the status of link $(m,n)$. If link $(m,n)$ is up and $n \notin P_k$ then $n \in P_{k+1}$, and $m$ is on a shortest path from $i$ to $n$. We can set $N(n)$ equal to $N(m)$, and construct the set $P_{k+1}$ in preparation for the $(k+1)$st iteration. The algorithm terminates when $P_k = \emptyset$. This indicates that the status of each link connected to node $i$ has been entered in $T^i$.¹

In the following more formal presentation of the main topology update algorithm, all of the sets $P_k$ are assumed to be initially empty.

Main Topology Update Algorithm at node $i$

$$T^i(m,n) = \begin{cases} s(m,n) & \text{for each link adjacent to } i \\ \text{down} & \text{for other links} \end{cases}$$

¹ A link $l$ is connected to a node $i$ if there is a path of operating links connecting $i$ with one of the end nodes of $l$.
link $(m,n)$'s status. The way in which ties are resolved depends upon the order in which the members of $L_k$ are processed, and upon the end node of $(m,n)$ which is selected if both $m$ and $n$ are in $P_k$. The tie-breaking rule used does not affect the correctness of SPTA.
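The iterations described in this section can be sketched in code. The following Python function is a minimal illustration written from the prose description only, not a transcription of the paper's formal steps. The function name, argument layout, the choice to treat missing port table entries as down, and the use of sorted order to break ties are all assumptions of this sketch. It starts from the adjacent link statuses, then proceeds outward one hop at a time, copying each link's status from the port table of the label $N(m)$ of its closer end node.

```python
UP, DOWN = "up", "down"


def link(m, n):
    """Canonical name for the bidirectional link between m and n."""
    return (m, n) if m <= n else (n, m)


def main_topology_update(i, adjacent_status, port_tables, all_links):
    """Rebuild node i's main topology table (hypothetical sketch).

    adjacent_status: neighbor m -> UP/DOWN, playing the role of s(i, m).
    port_tables:     neighbor j -> {link: status}, the port tables at i.
    all_links:       iterable of (m, n) pairs naming every network link.
    """
    links = {link(m, n) for m, n in all_links}
    incident = {}
    for a, b in links:
        incident.setdefault(a, []).append((a, b))
        incident.setdefault(b, []).append((a, b))

    # Initialization: adjacent links take s(i, m); all other links are down.
    main = {l: DOWN for l in links}
    processed = set()
    for m, status in adjacent_status.items():
        main[link(i, m)] = status
        processed.add(link(i, m))

    # P_1: neighbors reached over working links; each is its own label.
    reached = {i}
    label = {}
    current = []
    for m in sorted(adjacent_status):
        if adjacent_status[m] == UP:
            label[m] = m
            reached.add(m)
            current.append(m)

    # Iteration k: process the links in L_k and build P_{k+1}.
    while current:
        next_level = []
        for m in current:
            for l in sorted(incident.get(m, [])):
                if l in processed:
                    continue
                processed.add(l)
                # Believe the port table of the label N(m).
                status = port_tables.get(label[m], {}).get(l, DOWN)
                main[l] = status
                n = l[1] if l[0] == m else l[0]
                if status == UP and n not in reached:
                    reached.add(n)
                    label[n] = label[m]
                    next_level.append(n)
        current = next_level
    return main
```

As a usage illustration on a hypothetical four-node ring with links (1,2), (2,3), (3,4), and (1,4): if node 1 sees link (1,2) up and link (1,4) down, the entries for (2,3) and (3,4) are taken from the port table for neighbor 2, and whatever the port table for neighbor 4 contains is ignored because node 4 is only reached through neighbor 2.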
Assume that at some time $t_s$ a network is in steady state. This means that for each node within a connected component of the network, the main topology tables are correct for each link which is adjacent to a member of the component. In addition, no algorithm messages are being transmitted. Note that a node is in steady state immediately after being reinitialized. Between time $t_s$ and some later time $t_0$ an arbitrary number of link topology changes occur. For a sufficient but finite time interval after $t_0$, assume that no further topology changes occur. The required length of this interval will be addressed in section 3. We wish to show that at some later time $t_f > t_0$ steady state has been reestablished. Let $T^*$ be the correct network topology that an omniscient observer would see upon examining the network after $t_0$. We say that node $i$ "knows the correct topology" if its main topology table $T^i$ agrees with $T^*$ for all links that are connected to $i$. Node $j$ is called an active neighbor of node $i$ if link $(i,j)$ is operating according to $T^*$. We begin by showing the following theorem.

Theorem 1. SPTA works correctly in the sense that, under the preceding assumptions, there is a finite time $t_f$ after which each node knows the correct topology.

Proof. In what follows, we say that a link $l$ is at distance $n$ away from $i$ if in the graph defined by $T^*$ the shortest path from $i$ to the closest end node of $l$ is $n$ hops long. We will show by induction that for each integer $n \ge 0$ there is a time $t_n$ after which, for each node $i$, $T^i$ agrees with $T^*$ for all links at a distance of $n$ or less from $i$. The induction hypothesis is clearly true for $n = 0$ since each node $i$ knows the correct status of its adjacent links and records them in its main topology table $T^i$. We first establish the following lemma.

Lemma 1. Assume that the induction hypothesis is true for time $t_n$. Then there is a time $t_n^{+} > t_n$ after which the port topology table $T^i_j$, for each active neighbor $j$ of node $i$, agrees with $T^*$ for each link at a distance $n$ or less from $j$.

Proof. Consider waiting a sufficient time after $t_n$ for all messages which were sent from $j$ to $i$ before $t_n$ to arrive. By rules R1, R2, R4, and R5 of the algorithm, $T^i_j$ agrees with $T^j$ for all links which are not adjacent to $i$. Therefore, by the induction hypothesis $T^i_j$ agrees with $T^*$ for each link at a distance of $n$ or less from $j$ which is not adjacent to $i$. By rules R3 and R5 the correct status of links adjacent to $i$ is recorded in $T^i_j$. Therefore, $T^i_j$ also agrees with $T^*$ for all links adjacent to $i$. This proves the lemma.

To complete the proof of Theorem 1 we must show that there is a time $t_{n+1} > t_n^{+}$ such that for all $t > t_{n+1}$ and nodes $i$, $T^i$ agrees with $T^*$ for each link $l$ which is at a distance $n + 1$ from $i$. Consider the first time that link $l$ is processed by the main topology update algorithm after the conditions of Lemma 1 are satisfied. Then, link $l$ will belong to the set $L_{n+1}$. Also, the closest end node of $l$ to node $i$ will belong to the set $P_{n+1}$, and will have a label which is one of the active neighbors of $i$ (say $j$) that is at distance $n$ from $l$. By Lemma 1, the entry of $T^i_j$ for link $l$, which will be copied into $T^i$ when link $l$ is processed, will agree with the corresponding entry in $T^*$. Since Lemma 1 holds for all time $t > t_n^{+}$, the entry for link $l$ in $T^i$
notion of time complexity described in [6]. The time complexity of an algorithm is the number of units of time required if each communication of a message over a link requires at most one time unit and computation requires negligible time. We are interested in the amount of time that the algorithm requires to terminate following a set of status changes involving $K$ links. It is shown in Appendix A that, subject to a few assumptions on the operation of link transmission queues, the time complexity, $T$, for SPTA is $O(N + K)$, and in fact $T \le 2(N + K)$. Since the final status of each of $K$ links must be communicated across the diameter of the network, this is an optimal result for the order of $T$. For single link topology changes, this is the same result as for the ARPANET update algorithm [8].

In the correctness proof of SPTA it was assumed that no topology changes occurred for a "sufficient but finite time." It can be seen that the algorithm terminates in a time which is roughly equivalent to the message propagation time across the network. Assuming that no status changes occur during this time is equivalent to assuming that the average time between status changes is much larger than the message propagation time across the network.

The communication complexity of an algorithm is the sum, over all network links, of the number of messages sent on each link. For single link topology changes, this can be established by examining rule R1 of SPTA. When a node's main topology table changes, it sends a message on each of its adjacent links. This results in $2L$ messages being sent on an $L$ link network and gives $O(L)$ communication complexity. This is the same result as for the ARPANET update algorithm.

When multiple link status changes occur over a short period of time, the calculation of communication complexity for SPTA is complicated. A general bound on communication will not be presented, but it is shown by example in Appendix B that in some situations the number of messages sent by a node can grow exponentially in the number of status changes. Such behavior is clearly undesirable, but it can be obtained only by very carefully choosing the message and status change timing. In practical situations, the ability to remove obsolete messages from queues, the low probability of many nearly simultaneous status changes, and the stochastic variation in message transmission time would tend to reduce the number of messages sent.

The amount of processing required by the algorithm can be calculated by examining the main topology update algorithm. Since each link is processed exactly once by this algorithm, its computational requirement is $O(L)$. As stated in section 2.2, a node runs the main topology update algorithm each time a message is received, or a status change is detected in an adjacent link. In many situations this is unnecessary, since the algorithm only modifies the main topology table when status information is received over a shortest hop path. For the case of a single link topology change, if the main topology update algorithm is only run when a message arrives on a shortest path (or when an adjacent link changes status) then each node runs the algorithm exactly once. This gives $O(L)$ computations per node, and is similar to the requirements of the ARPANET update procedure. The methods described in [8] for reducing the computational requirements of the ARPANET algorithm by maintaining a shortest path tree can also be applied to the main topology update algorithm of SPTA.

The memory requirement of SPTA at each node $i$ is $O(L B_i)$ where $B_i$ is the number of neighbors of node $i$. $B_i$ is usually a small integer, but can be as large as
before. Therefore, when periodic retransmission is used, SPTA will converge to the correct topology a finite time after its operating assumptions are valid. Because it does not use sequence numbers or age fields, SPTA is immune to the type of problem which has occurred on the ARPANET [10].
Appendix A: Time Complexity

Consider some sequence of status changes involving at most $K$ links. Each of the $K$ links may change status one or more times at either or both end nodes. We wish to show that if all status changes cease by time 0, then all nodes will know the correct topology by time $2(K + N)$. To obtain this result, we make the following assumptions:
There are two cases to consider:

Case 1: $h(j,l') = h(j,l)$. Then the link number of $l'$ is less than the link number of $l$. We have $h(i,l') \le h(j,l') + 1 = h(j,l) + 1 = h(i,l)$, so (A1) is satisfied.