

























































SELF CHECKING NETWORK PROTOCOLS: A MONITOR BASED APPROACH
A Thesis
Submitted to the Faculty
of
Purdue University
by
Gunjan Khanna
In Partial Fulfillment of the
Requirements for the Degree
of
Master of Science
December 2003
ACKNOWLEDGMENTS
Special thanks to Prof. Saurabh Bagchi, without whose efforts and input this thesis would not have made its way to the Graduate Office. He gave me the inspiration to strive, think, and achieve what stands in the form of this thesis. I would also like to thank Dale Talcot and Casey Carlson from the ITAP department at Purdue, whose help in installing and running TRAM was priceless. A special thanks to Padma and John for their immense help in the project. The inputs from Prof. R. K. Iyer and Zbigniew Kalbarczyk of the University of Illinois at Urbana-Champaign were helpful in giving the right shape to the project and in avoiding some pitfalls. I would also like to thank Prof. Ness Shroff and Prof. Rudolf Eigenmann for taking the time to read the thesis.
ABSTRACT
Khanna, Gunjan. Master of Science, Purdue University, December 2003. Self Checking Network Protocols: A Monitor Based Approach. Major Professor: Saurabh Bagchi.
The wide deployment of high-speed computer networks has made distributed systems ubiquitous in today’s connected world. These systems are affected by disruptions, i.e., errors within the protocol or intrusions. This motivates the need for building distributed systems that can tolerate disruptions and provide highly available and correctly functioning services. The machines on which the applications are hosted are heterogeneous, the applications often run legacy code whose source is unavailable, the systems are very large (on the order of tens of thousands of protocol participants), and the systems often have soft real-time guarantees. While it may be possible to devise highly optimized and targeted solutions for individual distributed applications, such approaches are not very interesting from a research standpoint because of their limited applicability.
In this thesis we focus on Monitor-based detection of disruptions in a distributed environment. The Monitor detects disruptions by observing only the external message exchanges, without looking at the internal transitions of the monitored entity. It runs asynchronously to the application, thus avoiding a performance bottleneck. We have chosen a black-box Monitor approach suitable for any generic protocol. By developing the "Monitor Based Detection Approach", we aim to provide higher reliability and dependability. We propose a Hierarchical Monitoring approach that places a hierarchy of Local and Global Monitors in the system. A Local Monitor monitors only a set of local nodes, while a Global Monitor can have several Local Monitors reporting local interactions to it. This provides increased coverage and accuracy of detection. The main components of the Monitor are a Rule Classifier, Data Capture, and a Matching Engine. The Rule Classifier classifies the rules into Local and Global rules. The Matching Engine consists of fast matching algorithms, one each for Temporal and Combinatorial rules. The Monitor is tested on a distributed reliable multicast protocol called TRAM by injecting faults into the running protocol using a Fault Injector.
1 INTRODUCTION
The wide deployment of high-speed computer networks has made distributed systems ubiquitous in today’s connected world. Distributed middleware such as CORBA, DCOM, and GLOBE; distributed file systems such as NFS and XFS; distributed coordination-based systems such as publish-subscribe systems; distributed network protocols such as reliable multicast; and, above all, the distributed infrastructure of the World Wide Web form the backbone of much of the world’s information technology infrastructure today. This infrastructure, however, increasingly faces dependability outages. The outages result both from naturally occurring failures and from malicious attacks. The naturally occurring failures can be crash failures (the server halts but works correctly until it halts), omission failures on send or receive (the server fails to send or receive messages), timing failures (the server’s response falls outside the acceptable time bound), response failures (the server’s response is incorrect because of a value failure or an incorrect flow of control), or, in the worst case, arbitrary or Byzantine failures (the server may produce arbitrary responses at arbitrary times). The potential causes of downtime are manifold: hardware failures, system or application software failures, operational failures (operator errors), maintenance (such as backups and software upgrades), and environmental problems (such as power outages and communication lines being down).
An example from recent memory is AT&T’s ATM network outage in February 2001, which caused 4 hours of downtime for 7% of its ATM customers and was caused by a misconfiguration in its WAN switch that resulted in a firestorm of system management messages [2]. Outages may also be caused by malicious intruders who attack the infrastructure by sending viruses, worms, malformed network packets, etc., designed to exploit vulnerabilities in the hardware or software design of the systems. A recent example is the distributed denial of service (DDoS) attack that brought down 9 of the 13 root DNS servers that control internet traffic. We refer to the combination of failures and intrusions as disruptions in the rest of this thesis. The consequences of downtime of distributed systems are catastrophic. A survey of 450 Fortune 1000 companies found that the mean loss of revenue due to an hour of network outage was $82,500, with financial institutions at the higher end of the curve with downtime costs of $6M/hour [32]. Failures of distributed systems employed in
temporal and combinatorial logic format. The monitor is designed to take the rule base as input and partition it into local and global rules according to the deployment.
The Monitor should have low detection latency. This is ensured by partitioning the rules into local and global rules, which reduces the number of rules to be matched at each monitor. Speed-optimized matching algorithms are designed for the temporal and the combinatorial rules. The matching algorithm uses multiple threads and can therefore exploit any concurrency available on the host.
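The partitioning step can be illustrated with a minimal sketch. It assumes, purely for illustration, that each rule is described by the set of nodes it refers to and that the deployment maps each node to the local monitor covering it; a rule whose nodes all sit under one local monitor is treated as local, and any other rule as global. The function name `partition_rules`, the rule names, and the data layout are hypothetical and are not taken from the Monitor implementation.

```python
def partition_rules(rules, deployment):
    """Split rules into per-monitor local rules and global rules.

    rules:      dict mapping rule name -> set of node ids the rule refers to
                (hypothetical representation)
    deployment: dict mapping node id -> id of the local monitor covering it
    Returns (local, global_rules): local maps monitor id -> list of rule
    names; global_rules is a list of rule names.
    """
    local, global_rules = {}, []
    for name, nodes in rules.items():
        # Which local monitors cover the nodes this rule talks about?
        monitors = {deployment[n] for n in nodes}
        if len(monitors) == 1:
            # All nodes sit under one local monitor: match the rule there.
            local.setdefault(monitors.pop(), []).append(name)
        else:
            # The rule spans several monitors: it needs a global view.
            global_rules.append(name)
    return local, global_rules


rules = {"ack_in_time": {"a", "b"}, "beacon_seen": {"a", "c"}}
deployment = {"a": "LM1", "b": "LM1", "c": "LM2"}
local, global_rules = partition_rules(rules, deployment)
```

With this layout each local monitor matches only the rules listed under its own id, which is the reduction in per-monitor rule count described above.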
We extend the Monitor approach by developing a Hierarchical Monitor based detection system. In the hierarchical structure, several Monitors are placed at different logical levels in the system, forming a hierarchy of Local, Intermediate, and Global Monitors. The Local Monitors directly snoop on the messages exchanged between the protocol participants. The Intermediate and Global Monitors are invoked for interactions that span beyond the hosts monitored by a single Local Monitor. They observe only messages forwarded by the Local Monitors and therefore perform rule matching on a subset (ideally a small one) of the messages. Each Intermediate or Global Monitor has several Local Monitors beneath it, each working on a subset of the nodes that it monitors. Dividing the entire space among several Local and Global Monitors reduces the number of rules matched at each level. This reduces detection latency because the load on each Monitor is lower, and it improves the coverage of the detection system because not all interaction patterns in distributed systems are local.
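The forwarding behaviour of the hierarchy can be sketched as follows. This is a simplified two-level illustration, not the Monitor's actual code: the class names, the `(source, destination, kind)` message tuple, and the test "both endpoints are under this local monitor" for deciding what stays local are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class GlobalMonitor:
    """Sees only the messages that local monitors forward to it."""
    seen: list = field(default_factory=list)

    def observe(self, msg):
        self.seen.append(msg)


@dataclass
class LocalMonitor:
    """Snoops on the messages of the nodes it is responsible for."""
    nodes: set
    parent: GlobalMonitor
    matched_locally: list = field(default_factory=list)

    def observe(self, msg):
        src, dst, _kind = msg
        if src in self.nodes and dst in self.nodes:
            # Interaction is entirely local: handle it here.
            self.matched_locally.append(msg)
        else:
            # Interaction spans other hosts: forward it up the hierarchy.
            self.parent.observe(msg)


gm = GlobalMonitor()
lm = LocalMonitor(nodes={"a", "b"}, parent=gm)
lm.observe(("a", "b", "DATA"))  # stays at the local monitor
lm.observe(("a", "c", "ACK"))   # forwarded to the global monitor
```

Because only the second message reaches `gm`, the global monitor performs rule matching on a subset of the traffic, as the paragraph above describes.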
The Monitor-based approach is demonstrated on two real-world distributed applications: a reliable multicast protocol called TRAM and a control protocol for managing sessions called SIP. In the next chapter we explain the TRAM and SIP protocols, their characteristics, and how they suffer from attacks. In chapter 3 we demonstrate an orthogonal way of making a protocol robust: augmenting the protocol with carefully designed extensions and embedding the extensions within the protocol with non-trivial code additions. We demonstrate the methodology on TRAM and arrive at a new protocol called TRAM++, which is resilient to malicious and slow receivers and also reduces the buffer requirement of the system. Chapter 4 discusses the monitor-based detection approach and describes the Monitor architecture and rule classification. Chapter 5 presents the implementation and results, including system details. Chapter 6 discusses related research. Finally, we conclude in chapter 7 and provide directions for future work.
2 SYSTEM DESCRIPTION
Networked systems form an integral part of human lives, causing increased reliance on distributed computing. Several support systems, and even critical life systems, rely on distributed protocols. Hence the need to make these underlying protocols robust is imperative and not just necessary. We look at two protocols that are in wide use in building information technology infrastructures, are deployed in critical environments, and have a distributed nature. The two protocols are the Session Initiation Protocol (SIP) and the reliable multicast protocol called TRAM. These are described below.
2.1 Session Initiation Protocol (SIP)
SIP is an application-layer control protocol for creating, modifying, and terminating sessions involving internet telephone calls, multimedia distribution, and conferences between two or more participants. SIP is an initiation protocol that helps the clients agree on the characterization of the session that will exist between them. It is not a vertically integrated system, nor does it have any network reservation capabilities. It does, however, provide some security mechanisms such as DoS prevention, authentication, and encryption. SIP supports five facets, namely:
message from a particular domain, CUA should not resend the same request without modifying request and changing CSeq.
$\forall T \in (t_N, t_N + k) \Rightarrow V_T \Rightarrow U_q,\ q \in (t_I, t_I + b)$, where $V_T$ stands for receipt of a 2XX response and $U_q$ stands for passing on the response upstream.
$\forall t \in (t_i, t_{i+k}):\ L \le |V_t| \Rightarrow L' \le |B_q| \le (n-1)\ \ \forall q \in (t_i, t_{i+k})$, where $V_t$ is the requests generated in the time $t_i$ to $t_{i+k}$ and $B$ is the 482 responses generated.
These are some of the vulnerabilities that SIP inherits because of a design which is not robust.
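As an illustration of how a temporal rule of the first kind might be checked against a captured message trace, the sketch below tests "if event A occurs inside one open time window, then event B must occur inside another". It is a simplified reading of the rule, not the thesis's matching algorithm: the event names `"2XX"` and `"FORWARD_UP"`, the `(timestamp, name)` trace format, and the function names are hypothetical.

```python
def in_window(t, start, length):
    """True if t lies in the open interval (start, start + length)."""
    return start < t < start + length


def temporal_rule_holds(events, pre, pre_window, post, post_window):
    """Check: if `pre` occurs in pre_window, `post` must occur in post_window.

    events:      list of (timestamp, event_name) pairs from a message trace
    pre_window:  (start, length) of the window for the triggering event
    post_window: (start, length) of the window for the required event
    """
    triggered = any(
        name == pre and in_window(t, *pre_window) for t, name in events
    )
    if not triggered:
        return True  # rule is vacuously satisfied
    return any(
        name == post and in_window(t, *post_window) for t, name in events
    )


trace_ok = [(1.0, "2XX"), (1.5, "FORWARD_UP")]
trace_bad = [(1.0, "2XX")]
ok = temporal_rule_holds(trace_ok, "2XX", (0.0, 2.0), "FORWARD_UP", (1.0, 1.0))
bad = temporal_rule_holds(trace_bad, "2XX", (0.0, 2.0), "FORWARD_UP", (1.0, 1.0))
```

Here `ok` is true because the upstream forward falls inside its window, while `bad` is false because the 2XX response was never passed on.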
2.2 Tree-Based Reliable Multicast Protocol (TRAM)
IP Multicast is the basic multicasting framework that exists in the internet. It is unreliable, and it earned a bad name because of early problems such as heavy network bandwidth usage and the lack of an underlying support mechanism. Reliable multicast protocols, on the other hand, are an important class of protocols that reliably disseminate information from a sender to multiple receivers in the face of node and link failures. The guarantee of packet delivery is what makes them preferable to plain unreliable IP Multicast. The Tree-based Reliable Multicast Protocol (TRAM) provides scalable reliable multicast by grouping receivers into hierarchical repair groups and using a selective acknowledgment mechanism. A detailed description of TRAM can be found in [3][4]. TRAM is distributed as part of the Java Reliable Multicast Service (JRMS) by Sun Microsystems [10]. JRMS is a set of libraries and services for building multicast-aware applications.
2.2.1 TRAM Protocol Features
TRAM ensures reliable packet delivery in the presence of network and node failures, as long as the sender has sent the packet and at least one node has received it. It ensures this reliability by placing Repair Heads (RHs) at intermediate locations in each LAN for local repair. If a receiver in a particular LAN loses a packet, it asks its local Repair Head to repair that packet. The entire structure forms a tree with the sender as the root. RHs form the intermediate repair nodes, with receivers or further RHs below each. Each RH is responsible for local repair in its region. This makes the protocol scalable, as each RH is responsible only for a small group of receivers. The RHs are chosen dynamically: in each LAN a node, called the LAN Head, is chosen to serve as RH for the local nodes. If an RH dies or becomes dysfunctional, another node from the same LAN is made the RH.
2.2.1.1 Ack Implosion
The local repair heads are also responsible for processing the acks of the nodes under them. Each repair head sends a cumulative ack for all the nodes under it to the RH above it, and so on until the ack reaches the sender. The sender deletes a packet from its buffer only when all the nodes one level below it have acked the packet. All RHs follow the same policy: a packet is not deleted from an RH’s local buffer until all the nodes under that RH have acked it. Because receivers do not all send acks directly to the sender, the sender is not bogged down by too many acks, and TRAM thereby avoids the ack implosion problem.
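The buffer-release policy described above can be sketched for a single node of the tree (the sender or an RH). The sketch is a simplification under stated assumptions: it tracks acks per packet rather than TRAM's cumulative acks, and the class `RepairNode` and its methods are hypothetical names, not part of the JRMS API.

```python
class RepairNode:
    """One tree node (sender or RH) that buffers packets for its children."""

    def __init__(self, children):
        self.children = set(children)  # nodes one level below
        self.buffer = {}               # sequence number -> payload
        self.acked = {}                # sequence number -> children that acked

    def store(self, seq, payload):
        self.buffer[seq] = payload
        self.acked.setdefault(seq, set())

    def on_ack(self, child, seq):
        """Record an ack from a child.

        Returns True exactly when every child has acked `seq`; at that
        point the packet is released from the buffer and a single
        aggregate ack can be sent one level up.
        """
        if child not in self.children or seq not in self.buffer:
            return False
        self.acked[seq].add(child)
        if self.acked[seq] == self.children:
            del self.buffer[seq]
            del self.acked[seq]
            return True
        return False


rh = RepairNode(["r1", "r2"])
rh.store(7, b"payload")
first = rh.on_ack("r1", 7)   # one child still missing: keep the packet
second = rh.on_ack("r2", 7)  # all children acked: release and ack upward
```

Each node therefore emits at most one upward ack per packet regardless of how many receivers sit below it, which is how the tree keeps the number of acks reaching the sender small.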
2.2.1.2 Tree Formation
TRAM has various options that can be set to control tree formation. One option is optimized LAN formation: if the sender is in one LAN and the receivers are in another, a node in the receivers’ LAN is chosen to serve as repair head for the local receivers. TRAM tries to place RHs optimally so that inter-LAN traffic is minimized. A suitable LAN can be chosen by looking at the TTL fields of the packets. TRAM uses different tree formation for unidirectional and bidirectional multicast. For our study we have chosen the unidirectional multicast mechanism, which assumes that only the sender can multicast to the receivers and not the other way round. The sender initiates tree formation by sending a Beacon message. The nodes interested in the data respond by sending a Head Bind message. The sender (or repair head) responds by sending an
repair group has a receiver that functions as a group head; the rest function as group members, which are said to be affiliated with their head. All members receive data multicast by the sender. The group members report lost and successfully received messages to the group head using a selective acknowledgement mechanism. The RHs maintain a high and a low water mark for monitoring cache occupancy. If the amount of buffer occupied by packets goes beyond the high water mark, an attempt is made to purge the cache. Failure to do so is taken as an indication of congestion in the network. Each RH aggregates acks from all its members and sends an aggregate ack up to the sender to avoid the problem of ack implosion. The data rate of the sender is bounded by maximum and minimum rates configured at the sender. Receivers that cannot keep up with the minimum data rate can be pruned from the repair tree.
2.2.1.4 Flow Control
TRAM incorporates rate-based flow control to regulate the flow of packets and prevent cascading effects. Each RH and the sender has a high and a low water mark. When buffer occupancy reaches the high water mark, any new packets are discarded and acks are demanded from the receivers below in order to empty the buffer. Entities below also report congestion upward to the sender by setting the congestion bit in their flags. Congestion is detected on the basis of missing packets.
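A minimal sketch of a buffer governed by two water marks is given below. The text specifies the behaviour at the high water mark (new packets are discarded); the use of the low water mark to clear the congestion flag is an assumption made for this example, and the class `WatermarkBuffer` and its methods are hypothetical names rather than TRAM's implementation.

```python
class WatermarkBuffer:
    """Packet buffer with a high and a low water mark (counts of packets)."""

    def __init__(self, low, high):
        if not 0 <= low < high:
            raise ValueError("need 0 <= low < high")
        self.low, self.high = low, high
        self.packets = {}       # sequence number -> payload
        self.congested = False  # would be reported upward via a flag bit

    def offer(self, seq, payload):
        """Try to buffer a packet; return False if it is discarded."""
        if len(self.packets) >= self.high:
            # At the high water mark: drop new packets and signal
            # congestion (acks would be demanded from receivers here).
            self.congested = True
            return False
        self.packets[seq] = payload
        return True

    def ack(self, seq):
        """Release an acked packet; clear congestion at the low mark."""
        self.packets.pop(seq, None)
        if len(self.packets) <= self.low:
            # Assumption: occupancy at or below the low mark ends congestion.
            self.congested = False


buf = WatermarkBuffer(low=1, high=3)
stored = [buf.offer(s, b"d") for s in (1, 2, 3, 4)]  # fourth is discarded
was_congested = buf.congested
buf.ack(1)
still_congested = buf.congested  # occupancy 2 is above the low mark
buf.ack(2)
cleared = not buf.congested      # occupancy 1 reaches the low mark
```

Using two marks instead of one threshold gives hysteresis, so the congestion flag does not flip on every single ack once the buffer is nearly full.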
Figure 2: State diagrams for (a) the receiving module of a Repair Head and (b) the state transitions caused by Hello messages.
All of the above disruptions make TRAM non-dependable and motivate a generic solution to improve dependability.
3 MAKING PROTOCOL ROBUST: TRAM++
Making a protocol reliable and fail-safe requires knowing the vulnerabilities from which it suffers. In the previous chapter we looked at two example protocols, SIP and TRAM, which are widely deployed and popular in the research community because of their standardization and the availability of source code. We described a few potential threats to these protocols that hamper their deployment in critical applications. In this chapter we analyze TRAM for one vulnerability and one inefficiency and propose an augmented protocol called TRAM++ [33].
3.1 TRAM++
TRAM++ builds upon TRAM with the following two goals
3.1.1 Buffer management at RH:
The design point in TRAM++ is that the RHs may be spread over a wide area and have constraints on available buffer, while the sender has higher, though not infinite, buffer capacity. TRAM++ optimizes the buffer requirement at the RHs by pruning old messages even if they have not been acknowledged by all of an RH’s receivers. The advantage is that this frees up buffer resources at the RH to accommodate new messages, which the well-behaved receivers need in order to make progress. Consequently, a nack from a receiver may not always be satisfied locally at the immediate RH. A message is not discarded from the sender’s storage until it has been acked by all the receivers. Therefore, a nack can always be satisfied by the sender. When an RH cannot satisfy a nack, it instructs the receiver to initiate a temporary re-affiliation with an RH at a higher level. This is shown by the dotted arrow in Figure 3(b), where the receiver re-affiliates temporarily to recover the messages its RH does not have. Reaffiliation is transient