Statistical Model of Lossy Links
in Wireless Sensor Networks
Alberto Cerpa, Jennifer L. Wong, Louane Kuang, Miodrag Potkonjak, Deborah Estrin
Computer Science Department, University of California, Los Angeles, CA 90095
Abstract—Recently, several landmark wireless sensor network deployment studies clearly demonstrated a large discrepancy between experimentally observed communication properties and properties produced by widely used simulation models. Our first goal is to provide sound foundations for conclusions drawn from these studies by extracting the relationship between pairs of location (e.g., distance) and communication properties (e.g., reception rate) using non-parametric statistical techniques and by calculating intervals of confidence for all claims. The objective is to determine not only the most likely value of one feature for a given value of another feature, but also to establish a complete characterization of the relationship by providing a probability density function (PDF). The PDF provides the likelihood that any particular value of one feature is associated with a given value of another feature. Furthermore, we study not only individual link properties, but also their correlation with respect to common transmitters and receivers and their geometric location.

The second objective is to develop a series of wireless network simulation environments that produce networks of an arbitrary size, under arbitrary deployment rules, with realistic communication properties. For this task we use an iterative improvement-based optimization procedure to generate instances of the network that are statistically similar to empirically observed networks. We evaluate the accuracy of the conclusions drawn using the proposed model, and therefore the comprehensiveness of the considered properties, on a set of standard communication tasks, such as connectivity maintenance and routing.

Index terms: sensor networks, wireless channel modeling, simulations, network measurements, experimentation with real networks/testbeds, statistics.
I. INTRODUCTION
It is well known that the performance of many pro-
tocols and localized algorithms for wireless multi-hop
sensor networks greatly depend on the underlying com-
munication channel. Hence, to evaluate performance in
simulation, we must have an accurate communication
model. Until recently, only two approaches have been in
widespread use in the sensor network community: unit
disk modelling and empirical data traces.
The unit disk model states that the reception rate between two nodes is solely a function of the distance between them, and that communication is conducted without any loss of packets if the nodes are closer than a specified communication range. The unit disk model has a number of notable features. It is ideally suited for theoretical analysis, for the derivation of lower bounds, and for the development of optimization algorithms. Moreover, it is straightforward to develop a wireless network simulator that generates network instances that follow this communication model. However, the unit disk model also has a large number of serious drawbacks. For example, it implies complete correlation between the properties of geometric space and the topology of the network, a property refuted by numerous experiments in real-life deployments [1], [6], [14].
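As a reference point, the unit disk rule just described fits in a few lines; this sketch is ours (the function name and coordinate representation are illustrative, not from the paper):

```python
import math

def unit_disk_reception(tx, rx, comm_range):
    """Unit disk model: lossless reception (rate 1.0) whenever the
    transmitter and receiver are within comm_range of each other,
    and no reception (rate 0.0) otherwise."""
    distance = math.hypot(tx[0] - rx[0], tx[1] - rx[1])
    return 1.0 if distance <= comm_range else 0.0

print(unit_disk_reception((0, 0), (3, 4), comm_range=10))    # 1.0 (distance 5)
print(unit_disk_reception((0, 0), (30, 40), comm_range=10))  # 0.0 (distance 50)
```

The drawbacks discussed above follow directly from this form: reception is a deterministic step function of distance, so the network topology is fully determined by node geometry.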
At the other end of the spectrum are networks and communication patterns that are empirical traces of deployed systems. These networks are, of course, completely accurate samples of real-life wireless communication. However, it is difficult and expensive to create a large number of large networks that are properly characterized. Therefore, neither probabilistic nor statistical analysis of large networks is feasible. In addition, since a given trace is the result of communication over a specific topology, such a trace does not permit a simulator to reposition nodes. Finally, without validated communication analysis for these cases, theoretical analysis is not possible.
In an effort to address a middle ground, there have recently been a number of efforts to empirically capture communication patterns in wireless sensor networks. In particular, there have been several studies that use different low power, narrow band radio transceiver chipsets [3], [11] to deduce properties of communication links in wireless networks in several environments, such as open spaces and laboratories. These hybrid models introduce empirically observed factors that modify the communication patterns based on the unit disk communication model.

While these models are a significant step forward with respect to the unit disk model, they are only an initial step in the exploration of the space. These initial models do
not capture many important features of communication links in empirically observed networks. For example, they do not address the correlation in communication reception rates between links that originate at the same transmitter, or differences in the quality of transmitters. Moreover, these hybrid models were not subjected to statistical validation tests against empirical data. Intervals of confidence on how well the hybrid models actually match the empirical data are also lacking.

Our goal is to develop accurate simulations of sensor network communication environments that are statistically compatible with respect to several features that impact network protocols and algorithms in real networks. To generate these simulated environments, we construct a set of models that map communication properties such as absolute physical location, relative physical proximity, and radio transmission power into probability density functions describing packet reception likelihood. For all of these models, we calculate an interval of confidence. These models not only serve to generate simulated environments; they themselves have lent support to many hypotheses relating to variation in communication link quality [1], [6]. We note that the analysis of MAC impact (e.g., contention, throughput) and of the temporal properties of the links is part of future work.

II. RELATED WORK

There is a large body of literature on mobile radio propagation models that has influenced this work. The emphasis has been on large-scale path loss models that predict the average received signal strength at a given distance from the transmitter, and the variability of the signal strength in proximity to a particular location [10]. Furthermore, these models are used to predict the coverage area of a transmitter. In addition, small-scale fading models are used for modeling changes over localized time durations (a few microseconds) and space locations (usually one meter). All these models are based on the Friis free space equation and indicate that reception quality decays with the inverse of distance raised to a small power [10]. Differences between the classical models and our approach are numerous: we have different modeling objectives (reception rate of packets vs. signal strength); our radios have different features (e.g., communication range in meters instead of kilometers); we capture phenomena that are not addressed by the classical channel models (asymmetry, different quality of receivers and transmitters, correlations between reception rates of links); we use different modeling techniques (assumption-free, non-parametric vs. parametric); and we use unique evaluation techniques (resubstitution and evaluation of multi-hop routing). We believe existing and new techniques have complementary objectives, tools, and applications.

More recently there have been many studies of significant-scale deployments in several environments. The majority of these studies used the TR1000 [11] and CC1000 [3] low power RF transceivers (used by the Mica 1 [7] and Mica 2 [4] motes, respectively). For example, Ganesan et al. [6] performed large-scale experiments with 150 Mica 1 motes in order to capture features of the link, MAC, and application layers in wireless communication. Their data firmly established empirical evidence that radio connectivity is not isotropic and that there is a significant percentage of asymmetric links. In a series of studies conducted at Berkeley, Woo et al. [13] used experimental measurements to derive a packet loss model based on aggregate statistical measures such as the mean and standard deviation of reception rates. Using these models, they analyzed the consequences for a number of communication protocols and neighborhood management policies. More recently, Zhao et al. [14] demonstrated heavy variability in packet reception rate for a wide range of distances between a transmitter and receiver. Furthermore, Cerpa et al. [1] used heterogeneous hardware platforms consisting of Mica 1 and Mica 2 motes in three different environments to collect comprehensive data about the dependency of reception rates on a variety of parameters and conditions of operation. They also provided substantial empirical evidence that the main cause of asymmetric links is differences in hardware calibration, as was previously hypothesized. Another notable work in this area was performed earlier this year by Zhou et al. [15], who studied radio symmetry in wireless communication networks. They postulated that variance in received signal strength is mainly random, and used empirical data to develop a new radio model (RAM).
Using RAM, asymmetry is generated due to the variance in RF sending power and the differences in path losses for different directions of propagation. Zhou et al. used parametric goodness-of-fit statistical testing to determine that a Weibull distribution of the variance of received signal strength has the maximal likelihood of matching their experimental data.

There are three major differences between the models developed in this paper and the previously published models. The first is that we study the impact of a significantly larger number of factors that affect reception rate, and attempt to model not only isolated pairs of transmitters and receivers, but also the correlation between different pairs and different subsets of links. The second major difference is that our goal is not only to establish a model, but also to establish statistically sound measures

Fig. 3. Application of sliding window on distance and reception rate axes. (a) Window placement. (b) Weighted data points. (c) Distance and reception rate window placement. (d) Reception rate window on weighted distance data.

One way to address these difficulties is to use the smoothness principle.

B. Methodology

The global flow of our approach is shown in Figure 1 in pseudo-code format. The starting point for the procedure is exploratory data analysis. As the first step of this phase, we examine a scatter plot of all available data points. Specifically, we position each communication link in a two-dimensional space, placing distance on the x-axis and reception rate on the y-axis. The goal is to identify whether there are any specific trends in the data and to determine whether it is advantageous to split the data into two or more subsets that have specific features.

Figure 2(a) illustrates a scatter plot of distance versus reception rate at medium power for the outdoor case. Figure 2(b) shows a zoomed version of a subset of data which was examined during exploratory data analysis.

Phase two consists of three steps, shown in Figure 1 in lines 4-8. In the step shown in line 4, we sort all available data according to distance to identify data points that are similar with respect to this parameter. Next, we use a sliding window for all points which are within a similarity range of a given point (distance). Each point within this range is weighted according to its quantified similarity to the given point. Note that both the scope of the window and the weighting function can be defined in an arbitrary fashion, so long as the monotonicity property imposed by the smoothing principle is maintained.

Figures 3(a) and 3(b) show how the data is weighted after the application of a triangular window to the set of points around distance 20 meters with window size ±5 meters. The intuition behind the weighting function is that data points closer to the center of the window are better approximations for the point at the center of the window.

For each distance of interest we also build another system of sliding windows, this time along the y-axis, corresponding to the reception rate. The points within the window are weighted as the product of the weight factors of both the distance window and the reception rate window. Exactly the same techniques used to determine the parameters of the distance window are applicable to, and used for, the shaping and sizing of the reception rate window. Figures 3(c) and 3(d) show a window of data for the point centered at distance 20 m and reception rate 33%.

Once the first eleven lines of the pseudo-code are executed, we have enough information to begin building a PDF that indicates how likely a particular reception rate is for a given distance. For this task we used quadratic least squares fitting for a particular pair of distance and reception rate. Once the model is built, the next step is PDF normalization, which ensures that for a given distance the integral of the function below the PDF mapping function is equal to one. Figure 4 illustrates how the normalized reception rate PDF changes with respect to distance.

Fig. 4. PDF for distance versus reception rate.
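A minimal sketch of the two-axis sliding-window weighting described above, using a triangular kernel; the function names, grid, and default window sizes are our illustrative assumptions, not the authors' code:

```python
def tri_weight(x, center, half_width):
    """Triangular window: weight 1 at the center, falling linearly
    to 0 at center +/- half_width (monotone, as the smoothness
    principle requires)."""
    return max(0.0, 1.0 - abs(x - center) / half_width)

def reception_rate_pdf(links, distance, rr_grid, d_win=5.0, rr_win=0.1):
    """Discrete PDF of reception rate at a given distance.
    links: (distance, reception_rate) pairs.
    Each data point contributes the product of its distance-window
    and reception-rate-window weights; the result is normalized so
    the values sum to one (the PDF normalization step)."""
    density = [sum(tri_weight(d, distance, d_win) * tri_weight(r, rr, rr_win)
                   for d, r in links)
               for rr in rr_grid]
    total = sum(density)
    return [v / total for v in density] if total > 0 else density

# Window centered at 20 m with +/- 5 m scope, as in Figure 3(a):
links = [(18.0, 0.9), (20.0, 0.3), (22.5, 0.35), (40.0, 0.1)]
pdf = reception_rate_pdf(links, 20.0, [i / 10 for i in range(11)])
```

The paper fits a quadratic least squares model on top of these weighted points; the raw weighted density above is shown only to make the windowing step concrete.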

C. Evaluation

The final step of our procedure is the evaluation of the quality of the developed statistical model for the PDF. The evaluation procedure itself has three components: Monte Carlo sampling, resubstitution, and the establishment of intervals of confidence. Monte Carlo sampling selects k randomly selected pairs of distance and reception rate points (in our experimentation we use k = 200).

Fig. 5. PDF values of the different random points as a function of the confidence interval/PDF value ratio. Outdoor urban, 90% conf. level.

Resubstitution is the process where a statistical model is built using the exact same procedure (same kernel window scope and weight function) on randomly selected subsets of data. Specifically, in our simulations, we select 70% of the available data to build a model on each resubstitution run. For each resubstitution run we record the value of the PDF function at each of the k selected points. After conducting m resubstitution runs (in our experimentation m was 100), we are ready to establish an interval of confidence for our statistical PDF model. This is performed in two stages. We first establish an interval of confidence for each point individually, and then, by combining information from all local intervals of confidence, we establish a global interval of confidence. Figure 5 shows the relationship between the different confidence intervals for each random sample tuple (reception rate and distance) and the highest likelihood PDF value for different confidence levels. Each point in the graph shows the highest likelihood PDF value with its confidence interval. For example, the top left point in the graph of Figure 5 corresponds to a sample point at distance 52 meters and reception rate 0%, with highest likelihood PDF value of 0.2 ± 0.0001953 at a confidence level of 90%.

The final step of resubstitution is to build a global measure of the model's accuracy. To build a global interval of confidence we use the following procedure. First, for each separate point in k, we use the highest likelihood PDF value and normalize all other values against this value. After that, we combine all data from all sampling points into one set of size k × m. Finally, we calculate the confidence intervals of the normalized global array. Table I shows the overall interval of confidence for indoors and outdoors with different confidence levels. In general, the global highest likelihood PDF values are centered around one, which is a good sign of the statistical soundness of the model.

TABLE I
GLOBAL EVALUATION RESULTS

Environment   Conf. Level   H.L. PDF Value   Conf. Intervals
Indoor        90            0.997627         ±0.
Outdoor       90            1.064365         ±0.
Indoor        95            1.023886         ±0.
Outdoor       95            1.022372         ±0.
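The resubstitution loop described above can be sketched as follows. The percentile construction of the per-point interval is our assumption (the text does not spell out how each local interval is computed), and `build_model` stands in for the kernel-window fit of Section III-B:

```python
import random

def resubstitution_intervals(data, build_model, sample_points,
                             m=100, frac=0.7, conf=0.90):
    """Rebuild the model m times on random frac-sized subsets of the
    data and, for each Monte Carlo sample point, report a two-sided
    interval of the model's value at that point."""
    runs = []
    for _ in range(m):
        subset = random.sample(data, int(frac * len(data)))
        model = build_model(subset)  # same window scope and weights
        runs.append([model(pt) for pt in sample_points])
    alpha = (1 - conf) / 2
    out = []
    for i, pt in enumerate(sample_points):
        vals = sorted(run[i] for run in runs)
        lo = vals[int(alpha * (len(vals) - 1))]
        hi = vals[int((1 - alpha) * (len(vals) - 1))]
        out.append((pt, lo, hi))
    return out
```

With m = 100 runs and k = 200 sample points, as in the text, this yields the per-point intervals that are then normalized and pooled into the k × m global array behind Table I.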

IV. EXPERIMENTAL DATA COLLECTION

In this section we discuss the methodology used for our experimental data collection. We used an existing data set and performed additional experiments using the SCALE wireless measuring tool [1]. The basic data collection experiments work as follows. Each node transmits a certain number of packet probes in a round-robin fashion (one transmitter at a time). Each probe packet contains the sender's node id and a sequence number. The rest of the nodes record the packets received from each neighbor and keep updated connectivity statistics, using the sequence numbers to detect packet losses. We refer to [1] for further details about the tool.

The topology used for our experiments consisted of 16 nodes distributed in an ad-hoc manner in different environments. We also used up to 55 nodes for our indoor experiments, distributed in the ceiling of our lab. For outdoor experiments, nodes were placed in a variety of different positions, such as near the ground or elevated off the ground, with or without line of sight (LOS) between them, and with different levels of obstructions (furniture, walls, trees, etc.). The placement of the nodes also took into account the distance between them, in order to create a rich set of links at distances varying from 2 to 50 meters and in multiple different directions from any particular sender.^1 In most of our experiments, each node sends up to 200 packets per round, transmitting 2 packets per second.^2

Using this setup, we varied five factors in our experiments: the choice of environments, the radio type (and frequency), the output transmit power settings, the packet size settings, and the antenna height. The first factor we

^1 There are algorithms to find the optimal placement of nodes to get a uniform range of distances in the area of interest. We did not perform that optimization in our experiments.
^2 We have left for future work the evaluation of how accurate an average reception rate is.
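The connectivity bookkeeping described above reduces to counting distinct sequence numbers per sender; this tally is our sketch, not the SCALE tool's code:

```python
def reception_rates(logs, probes_per_round):
    """logs: mapping sender_id -> iterable of sequence numbers a
    receiver logged from that sender during one round.
    Duplicates are collapsed with set() so a double-logged probe
    is not counted twice."""
    return {sender: len(set(seqs)) / probes_per_round
            for sender, seqs in logs.items()}

# Node 3 heard every other probe; node 7 heard all 200:
rates = reception_rates({3: range(0, 200, 2), 7: range(200)}, 200)
# rates == {3: 0.5, 7: 1.0}
```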

Fig. 7. PDFs for asymmetric link and temporal variability features. (a) Asymmetric links vs. distance. (b) Asymmetric links vs. reception rate. (c) Temporal variation vs. reception rate.

to receiver A, but recently several empirical studies demonstrated this is not the case [1], [6]. Our goal is to quantitatively capture how frequently there is asymmetry in reception rates as a function of distance.

  • Dependency of asymmetric reception rate as a function of reception rate. This property is an example where we study functional dependencies between two communication properties. Our goal is to identify whether it is more likely that high asymmetry happens when links have high, low, or medium reception rates. For example, we are interested in whether it is more likely to have a pair of nodes with reception rates of 95% and 75%, or with 30% and 10%.

  • Dependency of reception rate standard deviation as a function of the average reception rate. The final property is an example where we study temporal dependencies between two communication properties. An empirical study [1] has shown that such correlation exists. Our goal is to quantitatively capture this relationship and provide some initial results on how this property affects the link estimation algorithms used for online quality estimation.

In addition to the listed properties, we also studied link quality dependency on angle, but were not able to identify any statistically interesting patterns with significantly strong intervals of confidence.

In Section III, Figure 4, we illustrated how the reception rate changes as a function of distance. Figure 6 shows normalized PDFs for three typical distances: 8.75, 25, and 46.5 meters. These results confirm the findings of several studies in the literature showing that there is a significant percentage of the radio range where links are highly variable, with similar probabilities of having very high or very low reception rates. In addition, we show that even for very short distances, the probability of having very low reception rate links is not zero, and it starts growing fast as distance increases. More importantly, it is clear from the graph in Figure 6(b) that the average and standard deviation of reception rate are insufficient parameters to model reception rate as a function of distance. While the average reception rate is around 50% in this case, most of the links have either very high or very low reception rates. A communication model built using only the average and standard deviation of reception rate may not represent the underlying communication properties found in real environments.

Figures 7(a) and 7(b) show the PDFs of how asymmetric reception rate depends on distance and on average reception rate. Figure 7(a) shows that there is no clear correlation between link asymmetries and distance. Figure 7(b) shows an interesting pattern: links with very high or very low reception rates tend to be highly symmetrical, as can be observed from the two peaks in the PDFs. Links with medium reception rates tend to be much less symmetrical.

Figure 7(c) shows the temporal variability of the links as a function of the reception rate. We clearly see that links with very low or very high reception rates tend to be more stable over time (smaller standard deviation), while links with intermediate values of reception rate tend to be more unstable (higher standard deviation). One interesting question we wanted to answer is how long a node needs to measure the communication channel in order to get an accurate estimate of reception rate with a certain confidence interval. This has a profound impact on the design of algorithms for topology control, which need to measure the channel as little as possible in order to save energy by periodically turning the radio off. To evaluate this, we took long time series of reception rate data and picked k window sizes.
For each window size, we took p (set to 100) initial random points of measurement from the time series, generating a reception rate estimate for each of the p points using only a window of size k (ranging from 30 seconds to 64 minutes) of data from the starting point. Then we compared the absolute difference between each of the p × k estimates and the absolute reception rate calculated using the entire time series of data. Figure 8 shows the results of this analysis on two qualitatively different types of links. Figure 8(a) shows that links with very high reception rates need very short window sizes to get an accurate estimate of the reception rate, and they converge quite fast to an accurate estimate (low reception rate links show similar behavior). Figure 8(b) shows that links with intermediate reception rates need much larger window sizes to converge to accurate estimate values. We have left for future work the issue of optimal on-line link characterization using statistical methods.

Fig. 8. Time series for on-line link quality estimation. (a) High reception rate. (b) Medium reception rate. (Both panels plot the absolute difference between the window estimation and the average reception rate against window size, in units of 30 seconds.)

From the spatial, asymmetrical, and temporal properties presented in Figures 4, 7(b), and 7(c), an interesting pattern has emerged. For a large range of distances there is a low but non-zero probability of links with medium reception rates. These links are also the ones that exhibit the most pronounced asymmetry and temporal variability. We believe these links may introduce serious stability and convergence problems for several routing algorithms, and it might be useful to design mechanisms to detect these types of links and filter them out. Another interesting point is that reliable, highly symmetrical, and stable links exist even at very long distances. Online detection and use of these types of links could affect algorithm design.
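The window-size experiment above amounts to comparing short-window estimates against the long-run rate. A sketch, under the assumption that the series holds one reception-rate sample per 30-second interval (names are ours):

```python
def mean_window_error(series, window, starts):
    """Mean absolute difference between reception-rate estimates
    taken over short windows and the rate over the whole series.
    series: reception-rate samples; window: window length in
    samples; starts: trial start indices, each leaving room for a
    full window."""
    long_run = sum(series) / len(series)
    errors = [abs(sum(series[s:s + window]) / window - long_run)
              for s in starts]
    return sum(errors) / len(errors)
```

For a stable high-reception-rate link this error is near zero even for tiny windows; intermediate links need far larger windows before the estimate settles, which is the behavior shown in Figure 8.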

VI. GROUP LINK PROPERTIES

Group link properties are joint properties of related subsets of links. These links may be links that originate from the same transmitter or are received by the same receiver, are processed by the same radio, or are communicated by nodes that are geometrically close. These properties are of crucial importance for any analysis, and they answer frequently asked fundamental questions about the reasons for particular behavior of communication patterns. These questions include whether the performance of a particular node as a transmitter mainly depends on the quality of its radio or on its geometric position. Another frequently asked question is whether asymmetry is a consequence of different radio properties between two nodes or of their location. However, with the exception of the property which examines pairs of links between two nodes, group link properties have rarely been studied, due to their perceived complexity.

The first question we answer is to what extent the quality of transmitters and receivers on different nodes is uniform. We normalized the quality of each link at each node against the average link quality at the corresponding distance in terms of reception rate. After that, we calculated the geometric mean of all links that originate or end at a particular node.

Figures 9(a) and 9(b) show the PDFs for normalized transmitter and receiver quality of nodes in indoor and outdoor environments. We see that a large percentage of nodes are either significantly better transmitters or receivers than average, particularly in the outdoor environment. The result is important because if nodes were completely uniform, deployment strategy would be based solely on topology. However, our results show that further communication optimization can be made by considering transmitting/receiving quality.

The second question we answer is whether there is correlation between the quality of the transmitter and receiver on the same node. Figures 9(c) and 9(d) show a summary of our results. In this case, stronger statistical proof was established when considering the arithmetic mean between the transmitter and receiver correlations. All studies indicate that there is a significant positive correlation between the transmitting and receiving capability of the nodes.
Therefore, a designer could optimize the placement of a few very good transmitters and/or receivers in strategic locations (in particular for broadcast and multicast routing) at the cost of fairness (those few nodes may be over-utilized).

Once we conclude that some nodes are much better transmitters or receivers than other nodes, the natural question is to what extent they are uniformly better transmitters or receivers across all of their links. To answer this question, we calculated the correlation between all transmitting (receiving) links related to the same node. Table II shows the correlation value r, the t-test value, the degrees of freedom (DoF), which equal the number of samples reduced by 2, and the probability that this correlation is accidental. For both indoor and outdoor environments we see essentially very small or no correlation, with very high confidence (the probability of this result being accidental is lower than 0.1% for the indoor case). This essentially means that no node has perfectly

or very low close to -1. The Spearman t-test indicates that all covariance values have a probability of occurring by chance well below 0.1%. In other words, a group of nodes in a particular distance range can communicate with each other significantly better than other groups of nodes in the same distance range. Identifying these groups of nodes could be important for tree-based routing algorithms: it would be convenient for at least one node in each such group to join the tree, since it can communicate with the other nodes in the group better than any other node can.
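The normalization and geometric-mean computation described above can be sketched as follows. Note that `node_quality`, the `avg_rate_at` baseline function, and the link-tuple format are hypothetical names introduced here for illustration; the paper does not publish its code.

```python
import math
from collections import defaultdict

def node_quality(links, avg_rate_at, role="tx"):
    """Per-node normalized link quality (illustrative sketch).

    links: iterable of (tx, rx, distance, reception_rate) tuples.
    avg_rate_at: function mapping distance -> mean reception rate
                 observed at that distance (estimated from all links).
    role: "tx" scores nodes as transmitters, "rx" as receivers.
    Returns {node: geometric mean of its normalized link qualities}.
    """
    ratios = defaultdict(list)
    for tx, rx, dist, rate in links:
        node = tx if role == "tx" else rx
        baseline = avg_rate_at(dist)
        if baseline > 0 and rate > 0:
            # Normalize each link against the distance-matched average.
            ratios[node].append(rate / baseline)
    # Geometric mean via the mean of logs, as in the paper's analysis.
    return {n: math.exp(sum(map(math.log, rs)) / len(rs))
            for n, rs in ratios.items()}
```

A score above 1 marks a node as a better-than-average transmitter (or receiver) for its link distances; a score below 1 marks it as worse than average.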

VII. WIRELESS NETWORKS GENERATORS

Using the knowledge gained from the analysis of single and multiple link properties, we have built a series of wireless multi-hop network instance generators to be used in simulation environments. In this section, we present three models of increasing complexity that create communication links for an arbitrary network that are statistically similar to the observed networks. The basic model assigns communication links based solely on the relationship between reception rate and distance. To build the more complex models, we introduce an iterative-improvement-based procedure for creating communication links that abide by multiple link properties. The starting point for all models is the generation of a specified number of nodes in the given area. We allow the user either to specify exact locations for each node or to specify the distribution that the placement of the nodes in the given area must follow.

A. Probabilistic Disk

The basic model, the probabilistic disk, considers only the dependency between distance and reception rate. An instance is created by generating, for each calculated distance between two nodes, a randomly selected reception rate according to the PDF (Figure 4) for the respective distance. We first translate the PDF into the corresponding cumulative distribution function (CDF) and use a uniform random generator between 0 and 1 to generate a value of the CDF. The resulting value is then mapped back to the corresponding reception rate.
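The inverse-CDF sampling step can be sketched as follows. This is a minimal illustration; the `cdf_points` representation of the empirical CDF at a given distance is an assumption of this sketch, not the paper's data format.

```python
import bisect
import random

def sample_reception_rate(cdf_points, rng=random):
    """Draw a reception rate by inverse-CDF sampling (sketch).

    cdf_points: list of (reception_rate, cumulative_prob) pairs,
    sorted by cumulative probability, derived from the empirical
    PDF at one distance.
    """
    u = rng.random()                      # uniform draw in [0, 1)
    probs = [p for _, p in cdf_points]
    i = bisect.bisect_left(probs, u)      # first bin with CDF >= u
    i = min(i, len(cdf_points) - 1)       # guard the top bin
    return cdf_points[i][0]
```

In the generator, one such CDF would be prepared per distance bin, and each node pair would be sampled from the CDF matching its computed distance.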

B. Bi-Directional Correlated Probabilistic Disk Model

In this model, we consider two functional dependencies. In addition to the dependency between reception rate and distance (Figure 4), we also consider the dependency between asymmetric reception rate and reception rate (Figure 7(b)). Our goal is to generate an instance of the network where all communication links follow the PDF for the corresponding distance and where, for any given pair of nodes, the reception rates between transmitter A and receiver B and between transmitter B and receiver A follow the PDF for asymmetric reception rate. An instance of the network that follows this model is generated as follows. We first generate, for each pair of nodes, the reception rate between the transmitter of node A and the receiver of node B, where the labeling of nodes A and B for a given pair is assigned randomly. Next, we generate the reception rate between the transmitter of node B and the receiver of node A by applying a probabilistically selected asymmetric rate to the first reception rate, using one of the previously mentioned methods. One can prove using Bayes' rule that a network generated with this procedure follows both PDFs.
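One possible reading of this two-step generation is sketched below. It assumes `sample_rate` draws from the reception-rate-vs-distance PDF and `sample_asym` draws an asymmetry from the asymmetry PDF conditioned on the first draw; both are hypothetical helpers, and the additive form of the asymmetry is an assumption of this sketch.

```python
import random

def generate_link_pair(dist, sample_rate, sample_asym, rng=random):
    """Generate one correlated bidirectional link (sketch).

    dist: distance between the two nodes.
    sample_rate(dist): draw from the reception-rate-vs-distance PDF.
    sample_asym(rate): draw an asymmetry value conditioned on the
        first direction's reception rate.
    Returns (rate A->B, rate B->A).
    """
    r_ab = sample_rate(dist)
    delta = sample_asym(r_ab)
    # Clamp the second direction into the valid [0, 1] range.
    r_ba = min(1.0, max(0.0, r_ab - delta))
    # Randomize which endpoint plays the role of A, as in the paper.
    if rng.random() < 0.5:
        r_ab, r_ba = r_ba, r_ab
    return r_ab, r_ba
```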

C. Non-parametric Statistical Model

While Bayes' rule is powerful enough to generate instances of the network that follow one, and in some cases two, PDFs, it is easy to see that it does not provide an adequate solution when a large number of statistical measures must be followed. For example, it is not clear how to simultaneously generate a network that follows the PDFs for reception rate versus distance, asymmetric reception rate versus distance, and non-uniform quality of transmitters and receivers. To overcome this difficulty, we have developed an iterative-improvement-based algorithm that generates an instance of the network that approximately, or arbitrarily closely, follows an arbitrary number of interacting PDFs defined on arbitrary pairs of network and communication properties. The key idea is to first separately generate an instance of the network that follows each of the considered PDFs and to randomly select one of them as a starting point for the iterative improvement procedure. At each step of the procedure, we attempt all possible changes at all possible pairs of nodes A and B and select the one that most reduces the overall discrepancy between the parameters of the candidate network and those of the originally generated networks, each of which follows the PDF of a single property. Similarity between PDFs is measured using the standard L2 norm. The procedure is repeated until no further improvement can be found. To improve the quality of the generated network, one can perform restarts or employ probabilistic mechanisms for escaping local minima (e.g., simulated annealing). If a restart is performed, there is the option to probabilistically select one of the final solutions for the restart according to its maximum likelihood expectation. These expectations are generated from the space that contains all networks that follow all the specified PDFs.
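A greatly simplified, single-property version of this iterative improvement loop might look as follows. The helper names, the greedy (hill-climbing) acceptance rule, and the restriction to one histogram-valued property are assumptions of this sketch, not the paper's implementation.

```python
import random

def l2(h1, h2):
    """L2 distance between two equal-length histograms."""
    return sum((a - b) ** 2 for a, b in zip(h1, h2)) ** 0.5

def histogram(values, bins=10):
    """Normalized histogram of values in [0, 1]."""
    h = [0.0] * bins
    for v in values:
        h[min(int(v * bins), bins - 1)] += 1.0 / len(values)
    return h

def refine(rates, target_hists, candidates, steps=1000, rng=random):
    """Greedy iterative improvement toward target PDFs (sketch).

    rates: dict {(a, b): reception_rate}, the starting instance.
    target_hists: target histograms the instance should match
        (here all defined over the single reception-rate property).
    candidates: values a link may be reassigned to.
    A change is kept only if the total L2 discrepancy decreases.
    """
    def cost(r):
        h = histogram(list(r.values()))
        return sum(l2(h, t) for t in target_hists)

    best = cost(rates)
    for _ in range(steps):
        link = rng.choice(list(rates))
        old = rates[link]
        rates[link] = rng.choice(candidates)   # try one local change
        new = cost(rates)
        if new < best:
            best = new                         # keep improving moves
        else:
            rates[link] = old                  # revert rejected moves
    return rates, best
```

The full algorithm differs in that it evaluates all candidate changes per step and handles several interacting PDFs over different property pairs, but the accept-if-discrepancy-shrinks structure is the same.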

TABLE VI COMPARISON OF FOUR STATISTICAL MODELS USING THE FLOYD-WARSHALL ALL-PAIRS SHORTEST PATH ALGORITHM.

      Unit     Unit-Real  Prob.    Prob-Real  Assym    Assym-Real  Statistical  Statistical-Real
MIN   2        2          2.0079   2.0079     2.00188  2.00188     2.00002      2.
MAX   26       20.0569    41.881   43.354     45.9964  44.1535     42.99        42.
AVE   6.87574  5.78918    14.687   15.002     14.8176  14.6217     14.6991      14.

(a) Empirical Network Data.

(b) Probabilistic Model Generated Network.

(c) Large Network with Empirical Data

Fig. 10. Illustration of three phases for large network generation.

D. Generation of Large Network Instances

Scalability is one of the key issues in wireless sensor networks, both during deployment and during protocol and algorithm development. In particular, for the demonstration of localized algorithms it is important to have instances of the wireless network with a large number of nodes. Unfortunately, it is both expensive and time-consuming to deploy large networks solely for the purpose of building a model or developing a localized protocol. Therefore, there is a need for a methodology that creates and validates networks of an arbitrary size. While their creation is often rather straightforward, since all that is needed is to assign the communication between each pair of nodes according to our statistical model, we cannot compare their statistical similarity to existing networks directly. At first glance, it seems that empirical evaluation of the accuracy of a large generated network is an infeasible task.

However, we have developed a perturbation-based analysis that facilitates sound statistical validation of networks of an arbitrary size with respect to experimentally available and characterized networks. The key idea is to begin by creating an instance of the network using a specific communication model and to run the algorithm or protocol of interest on the network instance. Next, we replace randomly selected subparts of the instance with instances of data from actually deployed networks. Figure 10 illustrates how the replacement is conducted. Figure 10(a) shows small actually deployed networks. For the sake of clarity, we show only the area in which nodes are deployed and represent each node by a dot. Figure 10(b) shows an instance generated by one of the models using the same format. Finally, Figure 10(c) presents the perturbation-based compound instance ready for evaluation. The last step of the procedure is to generate, using the selected statistical model, the connectivity between the nodes in the patches of the real networks and the neighboring nodes from the generator. After the procedure is completed, we compare the initial and perturbed networks with respect to the results they produce on a task of interest. To the extent that the results are similar, the large instance is representative of real-life networks. Figure 11 illustrates the similarity, in terms of all-pairs shortest paths, between two large network examples of 400 nodes built using the asymmetric link model.

[Figure 11 plot: distribution of path weights (weight units, 0-200) for the Asymm 400 and Real 400 networks.]

Fig. 11. Similarity between path weights in large networks.

We compared four models using the perturbation-based method: the unit disk model, the probabilistic disk model, the asymmetric probabilistic disk model, and the non-parametric statistical model. For this purpose, we compare the lengths of all-pairs shortest paths for an instance with 400 nodes. Ta-
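The all-pairs shortest path comparison behind Table VI and Figure 11 can be reproduced in miniature with a textbook Floyd-Warshall. Treating each link as an edge whose weight reflects link quality (for example, the reciprocal of reception rate) is an assumption of this sketch, since the weight function is not specified here.

```python
def floyd_warshall(n, weight):
    """All-pairs shortest paths over a dense weight matrix.

    weight[i][j]: edge cost between nodes i and j, or float('inf')
    when no usable link exists. Returns the distance matrix.
    """
    INF = float("inf")
    d = [[0.0 if i == j else weight[i][j] for j in range(n)]
         for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

def path_weight_distribution(dist):
    """Sorted finite, nonzero path weights, for comparing two
    networks' distributions as in Figure 11."""
    return sorted(w for row in dist for w in row
                  if 0 < w < float("inf"))
```

Running this on the original and perturbed instances and comparing the two resulting distributions (e.g., their MIN/MAX/AVE, as in Table VI) is one way to quantify how representative a generated network is.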