
Performance Engineering
Looking at Random Data & A Simulation Example

Goals:

1. Look at the nature of random data. What happens as random data is used in multiple operations?
2. Look at how network arrivals really work: are arrivals random, or do they follow some other pattern?
3. Use our simulation techniques to study these patterns (so this is really an example of simulation usage).
4. Determine the difference in behavior as a result of network arrival patterns.

Random Data

1. Let's take a very simple piece of code:

   if ( random() >= 0.5 )        /* random() here denotes a uniform [0,1) value */
       HeadsGreaterThanTails++;
   else
       HeadsGreaterThanTails--;

2. When we run the program, we collect the value of the variable every 100 million iterations, and do it for a total of 1 billion iterations.

3. Here's a sample run.

Iterations      Proc 0
100,000,000     -10299
200,000,000     -4245
300,000,000     5141
400,000,000     3197
500,000,000     -1313
600,000,000     -25941
700,000,000     -24093
800,000,000     -24661
900,000,000     -27123
1,000,000,000   -23997

After 400 million iterations, there were 3,197 more "heads" than "tails".
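The slides show only the inner if/else; a minimal self-contained sketch of the whole experiment, assuming drand48() as the uniform [0,1) generator and an arbitrary seed, might look like this:

/*
 * Sketch of the coin-flip experiment described above.
 * drand48() and the seed are our assumptions, not from the slides.
 */
#include <stdio.h>
#include <stdlib.h>

int main( void )
{
    long long HeadsGreaterThanTails = 0;
    long long Iteration;

    srand48( 12345 );                          /* arbitrary seed */
    for ( Iteration = 1; Iteration <= 1000000000LL; Iteration++ ) {
        if ( drand48() >= 0.5 )
            HeadsGreaterThanTails++;
        else
            HeadsGreaterThanTails--;
        if ( Iteration % 100000000LL == 0 )    /* report every 100 million flips */
            printf( "%13lld  %8lld\n", Iteration, HeadsGreaterThanTails );
    }
    return 0;
}

Each run produces one column like the "Proc 0" column above. The running count is a one-dimensional random walk, so its typical deviation after n flips grows like the square root of n rather than shrinking toward zero.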

Random Data

1. Now let's do the same thing for 8 processes; a hypothetical launcher is sketched after the table below.

2. What do you think will happen to the numbers?
   • Will some process always have more heads than tails?
   • Will the difference between results for processes depend on how many iterations have been done?

3. Here's the result for 8 processes:

Iterations     Proc 0   Proc 1   Proc 2   Proc 3   Proc 4   Proc 5   Proc 6   Proc 7
100,000,000    -10299   -9319    -1063    6743     8633     -4421    8123     -
200,000,000    -4245    -10227   3657     -23059   24885    -26655   25865    -
300,000,000    5141     -6819    255      -20175   14469    -33389   27077    -
400,000,000    3197     -8155    -5379    -6633    27387    -50509   24531    2339
500,000,000    -1313    -10547   -153     -14679   29335    -51963   23097    -
600,000,000    -25941   -29847   -26371   5027     32857    -49505   27089    -
700,000,000    -24093   -26331   -43401   13153    24471    -26899   4561     -
800,000,000    -24661   -35315   -31233   41       20425    -11861   13837    -
900,000,000    -27123   -33049   -44461   -11769   -3283    -12477   15865    -
1,000,000,000  -23997   -15483   -44535   22889    -8447    -13671   15743    6023
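The slides don't show how the 8 processes are launched; here is a minimal hypothetical sketch using fork() with a per-process seed so each stream is independent (all names and constants here are ours):

/* Hypothetical launcher: fork 8 children, each running the coin-flip loop. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

#define PROCESSES 8

int main( void )
{
    int p;

    for ( p = 0; p < PROCESSES; p++ ) {
        if ( fork() == 0 ) {                    /* child process */
            long long Count = 0, i;
            srand48( getpid() );                /* per-process seed */
            for ( i = 1; i <= 1000000000LL; i++ ) {
                Count += ( drand48() >= 0.5 ) ? 1 : -1;
                if ( i % 100000000LL == 0 )
                    printf( "Proc %d  %13lld  %8lld\n", p, i, Count );
            }
            exit( 0 );
        }
    }
    for ( p = 0; p < PROCESSES; p++ )
        wait( NULL );                           /* reap all children */
    return 0;
}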

Random Data

As you can see from the numbers above, the statistics are terrible; it's hard to determine a pattern from individual runs. So the program was run 10,000 times, and the minimum and maximum counts were taken at each time interval across those 10,000 runs.

[Figure: The Max and Min values in All Runs. X axis: Iterations, 0 to 1,200,000,000; Y axis: Max/Min + 125000, 0 to 250,000. Two series: Min of all runs, Max of all runs.]

Random Data

But what happens if the processes doing random events interact with each other? This is the case if the programs are all accessing the same disk: we randomly choose which block in a large file is being written to, but each process must compete for the file lock and for disk access. Here's the behavior of 10 disk-writing processes over 10,000 seconds (a contention sketch follows the table). The numbers represent each process's cumulative disk writes up to that time.

Secs Proc 0 Proc 1 Proc 2 Proc 3 Proc 4 Proc 5 Proc 6 Proc 7 Proc 8 Proc 9

1000 21660 21650 21810 21800 21790 21720 21850 21740 21640 21730

2000 43000 42960 43080 43120 43220 42960 43190 43110 42900 43080

3000 64790 64650 64850 64930 65060 64680 64900 64860 64770 64940

4000 86610 86450 86620 86680 86750 86530 86640 86660 86560 86690

5000 108450 108280 108370 108450 108520 108410 108480 108380 108400 108580

6000 130010 129860 129990 129950 129980 130050 130090 130010 129910 130080

7000 151730 151600 151710 151730 151730 151770 151750 151820 151750 151800

8000 173340 173340 173400 173640 173480 173400 173520 173660 173470 173500

9000 194950 195050 195010 195300 195090 195000 195230 195440 195130 195150

10000 216760 216880 216780 217140 216860 216740 216990 217240 216880 216960
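The slides don't show the disk-contention program; here is a minimal hypothetical sketch using threads and a single mutex to model the file lock. The thread count, block range, and all names are our assumptions; compile with -lpthread.

/* Sketch: 10 writers competing for one lock (models the file lock). */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define WRITERS     10
#define WRITES_EACH 100000

static pthread_mutex_t DiskLock = PTHREAD_MUTEX_INITIALIZER;
static long WritesByThread[WRITERS];

static void *DiskWriter( void *Arg )
{
    long         Id   = (long)Arg;
    unsigned int Seed = 1234 + (unsigned int)Id;   /* per-thread RNG state */
    int          i;

    for ( i = 0; i < WRITES_EACH; i++ ) {
        int Block = rand_r( &Seed ) % 1000000;     /* random block in a large file */
        pthread_mutex_lock( &DiskLock );           /* compete for the file lock */
        (void)Block;                               /* a real program would write Block here */
        WritesByThread[Id]++;
        pthread_mutex_unlock( &DiskLock );
    }
    return NULL;
}

int main( void )
{
    pthread_t Tid[WRITERS];
    long      t;

    for ( t = 0; t < WRITERS; t++ )
        pthread_create( &Tid[t], NULL, DiskWriter, (void *)t );
    for ( t = 0; t < WRITERS; t++ )
        pthread_join( Tid[t], NULL );
    for ( t = 0; t < WRITERS; t++ )
        printf( "Proc %ld: %ld writes\n", t, WritesByThread[t] );
    return 0;
}

Because every writer serializes on the same lock, the per-thread totals stay tightly bunched, which is exactly the behavior the table above shows.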

Random Data

Comparing the 10 processes. This is the spread: the maximum minus the minimum access count across the processes.

[Figure: Disk Access Rates With Time (Max Access - Min Access). X axis: Time (seconds), 0 to 12,000; Y axis: Difference in Accesses, 0 to 600.]

Random Data

Comparing the 10 processes. Here's how their relative performance varies over time. Note that no one process is always the minimum or the maximum performer.

[Figure: Process Writes, and how they deviate from the minimum value. X axis: Time (seconds), 0 to 10,000; Y axis: Process writes compared to the minimum process, 0 to 600. One series per process, Proc 0 through Proc 9.]

Another Numerical Example

[Figure: two overlapping circles for the problem below; labels X, A, and B mark the centers and the intersection points.]

Another Numerical Example

//////////////////////////////////////////////////////////////////////////
// We're trying to solve the following problem:
// Given two circles, how close should the centers of the circles be such
// that the area subtended by the arcs of the two circles is exactly one
// half the total area of the circle?
//
// See example 2.3.8 in Leemis & Park.
// We use the book's definition for Uniform - see 2.3.
// Here's how this works. Try a number of different distances between
// the two circle centers. Then, for the ones that are most successful,
// zoom in to do them in more detail.
//////////////////////////////////////////////////////////////////////////
#include <math.h>
#include <stdlib.h>

#define PI    3.14159265358979
#define TRUE  1
#define FALSE 0

// Prototypes
double GetRandomNumber( void );
void   InitializeRandomNumber( void );
double ModelTwoCircles( double, int );

double Uniform( double min, double max )
{
    return( min + (max - min) * GetRandomNumber() );
}

double ModelTwoCircles( double Distance, int NumberOfSamples )
{
    double HitsInOneCircle = 0, HitsInTwoCircles = 0;
    double x, y, SecondDistance;
    int    Samples;

    for ( Samples = 0; Samples < NumberOfSamples; Samples++ ) {
        do {
            x = Uniform( -1, 1 );
            y = Uniform( -1, 1 );
        } while ( (x * x) + (y * y) >= 1 );   // Loop until the point falls in the first circle

        HitsInOneCircle++;
        SecondDistance = sqrt( (x - Distance) * (x - Distance) + (y * y) );
        if ( SecondDistance < 1.0 ) {
            HitsInTwoCircles++;
            // printf( "Samples: Second Distance = %8.6f\n", SecondDistance );
        }
    }   // End of for
    return( HitsInTwoCircles / HitsInOneCircle );
}
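The slides omit the driver and the random-number routines. Here is a minimal hypothetical harness, with GetRandomNumber() backed by the C library's rand() purely so the sketch runs, that sweeps the center distance looking for an overlap fraction of 0.5:

// Hypothetical driver for ModelTwoCircles() above; not from the slides.
#include <stdio.h>
#include <stdlib.h>

double ModelTwoCircles( double, int );     // defined above

double GetRandomNumber( void )             // assumed uniform [0,1) generator
{
    return (double)rand() / ((double)RAND_MAX + 1.0);
}

void InitializeRandomNumber( void )
{
    srand( 12345 );                        // arbitrary seed
}

int main( void )
{
    int d;

    InitializeRandomNumber();
    // Sweep center-to-center distances; we want the distance where the
    // overlap fraction crosses 0.5.
    for ( d = 0; d <= 10; d++ ) {
        double Distance = d / 10.0;
        printf( "Distance %4.2f: overlap fraction = %8.6f\n",
                Distance, ModelTwoCircles( Distance, 1000000 ) );
    }
    return 0;
}

Once the coarse sweep brackets the crossing, the same loop can be rerun over a narrower range of distances, which is the "zoom in" step the comments describe.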

Network Arrivals

1. In our queueing analysis, we've assumed random arrivals (a Poisson arrival process, with exponentially distributed inter-arrival times).

2. This leads to our analysis of M/M/1 queues, with
   • Utilization U = Service Time / Inter-Arrival Time, and
   • Queue Length = U / ( 1 - U ).

3. We generated uniformly distributed random numbers and, based on those, derived the exponential inter-arrival times and Poisson distributions (see the sketch below).
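The derivation alluded to in step 3 is the inverse-CDF (inverse transform) method: if u is Uniform(0,1), then t = -mean * log(1 - u) is exponentially distributed with the given mean. A minimal sketch, where the name Exponential is ours:

// Inverse-CDF sketch: turn a uniform [0,1) sample into an
// exponentially distributed inter-arrival time.
#include <math.h>

double GetRandomNumber( void );   // assumed uniform [0,1) generator

double Exponential( double mean )
{
    return( -mean * log( 1.0 - GetRandomNumber() ) );
}

Summing successive Exponential(mean) samples gives the arrival instants of a Poisson process with rate 1/mean.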

But is this how networks behave?

Random Arrivals

What Did Leland et al. Measure?

Millions of packets from many workstations, as recorded on Bellcore internal networks.

What Did Leland et al. Measure?

Significance of self-similarity

  • The nature of traffic generated by individual Ethernet users. The aggregate traffic study provides insights into the traffic generated by individual users. The nature of congestion produced by self-similar models differs drastically from that predicted by standard formal models. We will show this by the simulation we perform here.

Why is Ethernet traffic self-similar?

  • A plausible physical explanation of self-similarity in Ethernet traffic: people don't generate traffic randomly. They come to work at the same time, get tired at the same time, etc.

Mathematical Result

  • The superposition of many ON/OFF sources whose ON-periods and OFF-periods have high variability (infinite variance) produces aggregate network traffic that is self-similar, or long-range dependent. (Infinite variance here means that some samples have a very long inter-arrival time; lunch hour is a very long time!) A construction is sketched below.
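To make the ON/OFF result concrete, here is a minimal sketch (our construction, not code from the slides or from Leland et al.) that superposes ON/OFF sources whose period lengths are Pareto-distributed with shape 1 < ALPHA < 2, so they have a finite mean but infinite variance:

// Sketch: superpose ON/OFF sources with heavy-tailed period lengths.
// SOURCES, TICKS, and ALPHA are illustrative choices.
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

#define SOURCES 100
#define TICKS   10000
#define ALPHA   1.5      /* 1 < ALPHA < 2: finite mean, infinite variance */

static double Uniform01( void )
{
    return ( rand() + 1.0 ) / ( (double)RAND_MAX + 2.0 );   /* never 0 */
}

/* Inverse-CDF Pareto sample: heavy-tailed period length, minimum 1. */
static double ParetoPeriod( void )
{
    return pow( Uniform01(), -1.0 / ALPHA );
}

int main( void )
{
    static int Traffic[TICKS];       /* packets per tick, all sources */
    int s, t;

    srand( 12345 );
    for ( s = 0; s < SOURCES; s++ ) {
        int On = rand() & 1;         /* random initial ON/OFF state */
        t = 0;
        while ( t < TICKS ) {
            int Length = (int)ceil( ParetoPeriod() );
            int End    = ( t + Length < TICKS ) ? t + Length : TICKS;
            for ( ; t < End; t++ )
                if ( On )
                    Traffic[t]++;    /* one packet per tick while ON */
            On = !On;                /* toggle ON <-> OFF */
        }
    }
    for ( t = 0; t < TICKS; t += 1000 )
        printf( "tick %5d: %d packets\n", t, Traffic[t] );
    return 0;
}

Plotting Traffic[] at several aggregation scales should show the burstiness persisting rather than smoothing out, which is the self-similar signature; replacing ParetoPeriod() with an exponential period makes the aggregate smooth out quickly, as the standard Poisson models predict.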