Fault-Tolerance in Borealis Distributed Stream Processing System: Lecture 15 - Prof. David, Study notes of Computer Science

A lecture note from the CS 410/510 Data Streams course, focusing on fault-tolerance in the Borealis distributed stream processing system. The note covers fault-tolerant stream processing, replication, availability vs. tentative tuples, and techniques such as checkpoint/redo and undo/redo. It also discusses the role of SUnion in handling ordering problems and ensuring consistency.

Typology: Study notes

Pre 2010

Uploaded on 08/18/2009

11/15/2007 Data Streams: Lecture 15
CS 410/510 Data Streams
Lecture 15: Fault-Tolerance in the Borealis Distributed Stream Processing System
Kristin Tufte, David Maier


Borealis Fault-Tolerance

- Fault-tolerant stream processing
- Configurable trade-off between availability and consistency
- Make progress even when some inputs are unavailable
- Handle node failures, network failures, and network partitions
- Use replication: run multiple copies of the query network on distinct nodes

Availability vs. Tentative Tuples

- Guarantee that input data is processed and results are produced within a user-specified threshold
- Minimize the number of tentative tuples processed, to avoid the overhead of processing tentative tuples
- Change the threshold to trade off availability (threshold) against consistency (number of tentative tuples)
- After a failure heals, re-run the computation on the correct input streams
- Ensure eventual consistency
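
As a rough illustration of the threshold trade-off, the sketch below counts tentative tuples under a simplified model. The model is an assumption made here, not something the slides specify: tuples arrive at a constant rate during a failure, and each tuple may be held for at most `threshold` seconds before it must be processed. The function name and its parameters are hypothetical.

```python
def tentative_tuples(failure_duration: float, threshold: float, rate: float) -> float:
    """Estimate how many tuples are processed tentatively during one failure.

    Assumed model (not from the lecture): tuples arrive at `rate` per second,
    and a tuple may wait at most `threshold` seconds for the failure to heal.
    Tuples arriving in the last `threshold` seconds of the failure can wait
    for stable input; earlier ones must be processed tentatively.
    """
    return rate * max(0.0, failure_duration - threshold)


# A larger threshold masks more of the failure but delays results longer.
for threshold in (1.0, 5.0, 10.0):
    print(threshold, tentative_tuples(failure_duration=8.0, threshold=threshold, rate=100.0))
```

Under this model, a threshold at least as long as the failure yields no tentative tuples at all, at the cost of holding results back for the whole failure.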

Example Distributed Query Network

[Figure: example distributed query network. Figure courtesy: Fault-Tolerance in the Borealis Distributed Stream Processing System, Balazinska et al.]

Failure Assumptions

- Handles:
  - Node failures (e.g., software crashes)
  - Network failures
  - Network partitions
- N replicas: handles up to N-1 node failures
- Sources and clients (not just the Aurora system) must implement fault-tolerance protocols
  - Data sources or proxies log input tuples persistently, which ensures all replicas eventually see the same input tuples
- Designed for a low level of replication and low failure frequency
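
The role of the persistent input log can be sketched as follows. This is a minimal illustration, not the Borealis protocol: the class and method names are hypothetical, tuple ids are assumed to be positions in the log, and an in-memory list stands in for persistent storage.

```python
from dataclasses import dataclass, field


@dataclass
class LoggingSource:
    """Hypothetical source (or proxy) that logs every tuple it emits."""
    log: list = field(default_factory=list)  # stand-in for a persistent log

    def emit(self, value):
        tuple_id = len(self.log)             # assumption: ids are log positions
        self.log.append((tuple_id, value))
        return tuple_id

    def replay_after(self, last_seen_id):
        """Return all logged tuples with an id greater than last_seen_id."""
        return self.log[last_seen_id + 1:]


@dataclass
class Replica:
    received: list = field(default_factory=list)

    def catch_up(self, source):
        """Fetch every tuple this replica has not seen yet."""
        last_seen = self.received[-1][0] if self.received else -1
        self.received.extend(source.replay_after(last_seen))


source = LoggingSource()
a, b = Replica(), Replica()
source.emit("x")
a.catch_up(source)   # replica b is unreachable for now
source.emit("y")
source.emit("z")
a.catch_up(source)
b.catch_up(source)   # b reconnects and replays everything it missed
print(a.received == b.received)
```

Because the source keeps the log, a replica that was cut off can still obtain exactly the tuples the other replica saw, in the same order.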

Design Goals

- Ensure that, for each node, any data tuple on an input stream is processed within a specified time bound (regardless of failures on other input streams)
- Try to produce the fewest tentative tuples
- As long as a non-blocking path of operators is available from source to client, the client will receive results
- Once failures heal, replicas converge to a consistent state

State Diagram

[Figure: state diagram. Figure courtesy: Fault-Tolerance in the Borealis Distributed Stream Processing System, Balazinska et al.]

Choices for UPSTREAM-FAILURE

- Suspend processing until the failure heals and the node starts receiving stable data
  - Favors consistency: does not produce tentative tuples; used only for short failures
- Delay new tuples for a short period of time before processing
- Process each new tuple without any delay
- The last two options produce tentative tuples
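
The three choices can be written as a small decision function. This is only a sketch: the enum, the function, the parameter names, and the string return values are hypothetical, and the rule that a healed failure always yields stable processing is a simplifying assumption.

```python
from enum import Enum


class FailurePolicy(Enum):
    SUSPEND = "suspend"   # wait for stable data; never emit tentative output
    DELAY = "delay"       # hold each tuple briefly, then process it
    PROCESS = "process"   # process each tuple immediately


def handle_tuple(policy, waited, max_delay, healed):
    """Decide what to do with one tuple that arrived during UPSTREAM-FAILURE.

    Returns "stable" (process as stable data), "tentative" (process now and
    mark the results tentative), or "hold" (keep buffering the tuple).
    """
    if healed:
        return "stable"
    if policy is FailurePolicy.SUSPEND:
        return "hold"
    if policy is FailurePolicy.DELAY and waited < max_delay:
        return "hold"
    return "tentative"


print(handle_tuple(FailurePolicy.SUSPEND, waited=10.0, max_delay=1.0, healed=False))
print(handle_tuple(FailurePolicy.DELAY, waited=0.5, max_delay=1.0, healed=False))
print(handle_tuple(FailurePolicy.DELAY, waited=2.0, max_delay=1.0, healed=False))
print(handle_tuple(FailurePolicy.PROCESS, waited=0.0, max_delay=1.0, healed=False))
```

Only the delay and process policies ever return "tentative", which matches the last bullet of the slide.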

Borealis Tuples

- (tuple_type, tuple_id, tuple_time, a_1, …, a_m)
- tuple_type: indicates the type of the tuple
- tuple_id: uniquely identifies the tuple in the stream
- tuple_time: tuple timestamp
- The paper adds two new types of tuples: TENTATIVE and UNDO
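
For readers who prefer code, the tuple layout can be modeled as a small record type. The class names and the field types (integer id, numeric timestamp) are assumptions for illustration; the slides give only the field names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class TupleType(Enum):
    STABLE = "STABLE"
    TENTATIVE = "TENTATIVE"
    UNDO = "UNDO"


@dataclass(frozen=True)
class StreamTuple:
    tuple_type: TupleType
    tuple_id: int            # assumed integer; unique within its stream
    tuple_time: float        # assumed numeric timestamp
    attributes: Tuple = ()   # the data attributes a_1, ..., a_m


t = StreamTuple(TupleType.TENTATIVE, tuple_id=7, tuple_time=12.5,
                attributes=("sensor-3", 21.4))
print(t.tuple_type.name, t.tuple_id, t.attributes)
```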

New Tuple Types

- TENTATIVE tuple
  - Tuple produced while processing a portion of the inputs; may subsequently be replaced with a stable tuple
- UNDO tuple
  - Indicates that a suffix of tuples on a stream should be deleted and the associated state rolled back
  - Contains the tuple_id of the last tuple not to be undone
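
The effect of an UNDO tuple on a buffered stream can be sketched as below. The function name and the (tuple_id, payload) representation are hypothetical, and the sketch covers only the deletion of the suffix, not the rollback of operator state.

```python
def apply_undo(buffer, last_kept_id):
    """Drop the suffix of `buffer` that follows the tuple with id `last_kept_id`.

    `buffer` is a list of (tuple_id, payload) pairs in stream order.
    """
    for index, (tuple_id, _) in enumerate(buffer):
        if tuple_id == last_kept_id:
            return buffer[: index + 1]
    raise ValueError(f"tuple_id {last_kept_id} not found in buffer")


stream = [(1, "a"), (2, "b"), (3, "c-tentative"), (4, "d-tentative")]
corrected = apply_undo(stream, last_kept_id=2)
print(corrected)  # tuples 3 and 4 are gone; stable replacements can follow
```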

New Data Stream Tuple Types

Tuple Type    Description
STABLE        Regular tuple
TENTATIVE     Tuple that results from processing a subset of inputs and may be corrected later
UNDO          Suffix of tuples should be rolled back
BOUNDARY      All following tuples will have a timestamp ≥ the one indicated
UNDO_START    Control message from the runtime to SUnion to trigger undo-based recovery
REC_DONE      Indicates end of reconciliation

New Control Stream Tuple Types

Tuple Type    Description
UP_FAILURE    Entering inconsistent state
REC_DONE      Input was corrected, can reconcile state

Signals from SUnion (are these control stream tuple types signals from SUnion? Is that what is meant?)

Query Plan with Replicas

[Diagram: source, Node 1 (J_A, O_A); source, Node 2 (J_B, O_B)]

- J_A and J_B are joins; O_A and O_B are other (unary) ops
- J_B is a replica of J_A; O_B is a replica of O_A
- J_A and J_B must process data in the same order

Tuple Ordering Problem

- Joins may produce different results if they receive inputs in different orders
  - Many stream join algorithms assume input streams are ordered and synchronized
  - A tuple with timestamp t always arrives before tuples with timestamps > t (on either input stream)
- Two types of ordering problems
  - Disorder within a stream (i.e., the stream produced by an op)
  - Streams are interleaved differently (by different operator replicas)
  - A TCP connection can guarantee order on a stream, but can't guarantee synchronization of two streams
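
The interleaving problem, and one way to remove it, can be shown with a toy example. Everything here is an illustrative assumption: the operator, the (timestamp, stream, value) representation, and the tie-breaking rule of sorting by timestamp and then stream name. It also sorts a finite batch, whereas a real operator must first know that no earlier tuples can still arrive, which is the information BOUNDARY tuples carry.

```python
def order_sensitive_op(tuples):
    """Toy operator whose output depends on arrival order: it labels each
    value with its arrival position."""
    return [(position, value) for position, (_, _, value) in enumerate(tuples)]


def deterministic_order(tuples):
    """Sort by (timestamp, stream name), an assumed tie-breaking rule."""
    return sorted(tuples, key=lambda t: (t[0], t[1]))


# Each tuple is (timestamp, stream, value). Both replicas receive the same
# tuples, but the two input streams are interleaved differently.
replica_a = [(1, "s1", "x"), (1, "s2", "p"), (2, "s1", "y"), (2, "s2", "q")]
replica_b = [(1, "s2", "p"), (1, "s1", "x"), (2, "s2", "q"), (2, "s1", "y")]

# Raw arrival order: the replicas disagree.
print(order_sensitive_op(replica_a) == order_sensitive_op(replica_b))

# After imposing the same deterministic order on both: they agree.
print(order_sensitive_op(deterministic_order(replica_a))
      == order_sensitive_op(deterministic_order(replica_b)))
```

Each stream is internally ordered in both replicas, as TCP would guarantee; only the interleaving differs, and that alone is enough to make the replicas diverge.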