Tester for Nearly-Sortedness
and its Applications in Databases
Arie Matsliah
Sagi Ben-Moshe
Eldar Fischer
Yaron Kanza

Outline

  • New definition: “nearly-sorted”
  • Tolerant tests for the property of being nearly-sorted

  • Applications in Databases

Example (data)

Apartments for rent (rows are records, columns are attributes):

Address           Fee ($)   Size (m^2)   …
Main rd 29, …     500       70           …
First ave 3, …    850       120          …
…                 …         …            …

Example (query)

Query: select all apartments in NY with a size between 70 and 80 m^2, presented sorted by monthly fee
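For concreteness, the query can be phrased in SQL. The sketch below runs it against an in-memory SQLite table through Python's standard `sqlite3` module. The table name `apartments`, the `city` column and every row are hypothetical, since the slide shows only Address, Fee and Size. The `ORDER BY fee` clause is what forces the engine to sort the selected records.

```python
import sqlite3

# Hypothetical schema and rows; "city" is assumed so that "in NY" can be expressed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE apartments (address TEXT, city TEXT, fee INTEGER, size INTEGER)")
conn.executemany(
    "INSERT INTO apartments VALUES (?, ?, ?, ?)",
    [
        ("Elm st 1", "NY", 500, 70),
        ("Oak ave 7", "NY", 850, 120),
        ("Pine rd 4", "NY", 650, 75),
        ("Lake dr 12", "Boston", 400, 72),
        ("Birch ln 5", "NY", 450, 78),
    ],
)

# Filter on city and size, then sort the surviving records by monthly fee.
rows = conn.execute(
    """
    SELECT address, fee, size
    FROM apartments
    WHERE city = 'NY' AND size BETWEEN 70 AND 80
    ORDER BY fee
    """
).fetchall()
print(rows)  # [('Birch ln 5', 450, 78), ('Elm st 1', 500, 70), ('Pine rd 4', 650, 75)]
```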

Facts

  • Some queries/operations can be processed more efficiently if the data is ordered (sorted according to some attribute)
  • Additional examples: natural join, intersection, union, except, …
  • However, in many cases the query processor cannot assume that the data is ordered
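As an illustration of why order helps: when both inputs are already sorted on the same attribute, their intersection takes a single merge pass, with no hashing or re-sorting. A minimal sketch over plain integer lists (the name `intersect_sorted` is hypothetical; a real engine works on records and join keys):

```python
from typing import List

def intersect_sorted(a: List[int], b: List[int]) -> List[int]:
    """Intersect two ascending lists in one merge pass, O(len(a) + len(b)).

    Set semantics (as in SQL INTERSECT): each common value is reported once.
    """
    out: List[int] = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            i += 1          # a[i] cannot appear later in b
        elif a[i] > b[j]:
            j += 1          # b[j] cannot appear later in a
        else:
            if not out or out[-1] != a[i]:  # skip duplicate values
                out.append(a[i])
            i += 1
            j += 1
    return out

print(intersect_sorted([1, 3, 3, 5, 8], [3, 4, 5, 5, 9]))  # [3, 5]
```

On unsorted inputs the same operator would first need a sort or a hash table, which is the cost that knowing the data is (nearly) ordered can save.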

Observation (from experiments)

  • Monitoring the “sort” function of a DB-management system (PostgreSQL)
  • In many cases the data is already “nearly sorted” before it is sorted
  • Idea:
    1. test whether the data is “nearly sorted”
    2. if it is, use a sorting algorithm tailored to nearly-sorted data
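A minimal sketch of the two-step idea, assuming “nearly sorted” means k-sorted (f(i) ≤ f(j) whenever i ≤ j-k, as defined below) with no outlier set E. The test here is an exact O(n) scan rather than the sublinear tolerant tester developed in this talk, and the tailored sort is a standard sliding min-heap. Neither is claimed to be the authors' method, and all function names are hypothetical.

```python
import heapq
from typing import List

def is_k_sorted(f: List[int], k: int) -> bool:
    """Exact O(n) check that f[i] <= f[j] whenever i <= j - k (k >= 1)."""
    best = float("-inf")  # running max of f[0 .. j-k]
    for j in range(k, len(f)):
        best = max(best, f[j - k])
        if best > f[j]:
            return False
    return True

def sort_k_sorted(f: List[int], k: int) -> List[int]:
    """Sort a k-sorted list with a sliding min-heap in O(n log k).

    When the t-th value (0-based) is output, positions 0..k+t have been pushed
    and t values popped, so some position i <= t is still in the heap. Every
    unpushed position j has j - i > k, hence f[i] <= f[j]: the heap minimum
    is the smallest remaining value.
    """
    heap = list(f[:k])
    heapq.heapify(heap)
    out: List[int] = []
    for x in f[k:]:
        out.append(heapq.heappushpop(heap, x))  # push x, then pop the minimum
    while heap:
        out.append(heapq.heappop(heap))
    return out

def sort_maybe_nearly_sorted(f: List[int], k: int) -> List[int]:
    """Step 1: test; step 2: choose the sorting algorithm."""
    if is_k_sorted(f, k):
        return sort_k_sorted(f, k)  # tailored path
    return sorted(f)                # general-purpose fallback

print(sort_maybe_nearly_sorted([2, 1, 3, 5, 4, 6], 2))  # [1, 2, 3, 4, 5, 6]
```

The heap never holds more than k+1 values, so the tailored path costs O(n log k) instead of O(n log n).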

Definition: Nearly-Sorted

  • f: [n] → R
  • R: attribute values, with a total order (<, ≤, >, ≥)
  • f is sorted if for all i < j, f(i) ≤ f(j)
  • f is k-sorted if for all i, j: i ≤ j-k ⇒ f(i) ≤ f(j)
  • 1-sorted ⇔ sorted
  • f is e-close to being sorted if for some E ⊆ [n] with |E| ≤ en, f|[n]\E is sorted
  • f is (e,k)-nearly-sorted if for some E ⊆ [n] with |E| ≤ en, f|[n]\E is k-sorted
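For the e-close notion, the smallest possible E can be computed exactly: the positions that remain must form a non-decreasing subsequence, so the minimum |E| is n minus the length of a longest non-decreasing subsequence. A short sketch using the standard patience-sorting technique (function names are hypothetical; e is passed as a float):

```python
from bisect import bisect_right
from typing import List

def distance_to_sorted(f: List[int]) -> int:
    """Smallest |E| such that f restricted to [n] minus E is sorted.

    The kept positions must form a non-decreasing subsequence, so the
    answer is n minus the length of a longest non-decreasing subsequence.
    """
    # tails[m] = smallest last value of any non-decreasing subsequence
    # of length m + 1 found so far; tails itself stays non-decreasing.
    tails: List[int] = []
    for x in f:
        pos = bisect_right(tails, x)  # bisect_right lets equal values extend
        if pos == len(tails):
            tails.append(x)
        else:
            tails[pos] = x
    return len(f) - len(tails)

def is_e_close_to_sorted(f: List[int], e: float) -> bool:
    """f is e-close to sorted iff the smallest E has |E| <= e * n."""
    return distance_to_sorted(f) <= e * len(f)

print(distance_to_sorted([1, 2, 9, 3, 4, 0, 5]))  # 2
```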

Example 1

  • 1/n-close (e=1/n), n-sorted (k=n)

[Figure: plot of f(i) against i]
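The figure for Example 1 did not survive extraction. As a stand-in, the hypothetical sequence 2, 3, 4, 5, 1 (n = 5) fits the description: deleting the single last element leaves a sorted sequence, so it is 1/n-close, yet the first and last positions form a violation at distance n-1, so the smallest k for which it is k-sorted is n. The sketch computes that smallest k as one plus the widest gap j - i over pairs with f(i) > f(j), using prefix maxima and binary search; `smallest_k` is a hypothetical name and indices are 0-based.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List

def smallest_k(f: List[int]) -> int:
    """Smallest k >= 1 such that f is k-sorted: one more than the widest
    gap j - i over pairs with i < j and f(i) > f(j)."""
    prefix_max = list(accumulate(f, max))  # non-decreasing by construction
    widest = 0
    for j, v in enumerate(f):
        # first position whose prefix maximum exceeds v = first i with f(i) > v
        i = bisect_right(prefix_max, v)
        if i < j:
            widest = max(widest, j - i)
    return widest + 1

f = [2, 3, 4, 5, 1]              # hypothetical instance of Example 1, n = 5
print(smallest_k(f))             # 5, i.e. k = n
print(f[:-1] == sorted(f[:-1]))  # True: removing one element (e = 1/n) sorts it
```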

Example 3

  • (1/5,2)-nearly-sorted

[Figure: plot of f(i) against i]
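Example 3's figure is also lost. The sketch below instead checks a hypothetical 10-element sequence against the (1/5,2) parameters, given a candidate set E: it verifies |E| ≤ en and that every pair i ≤ j-k outside E is in order. Two assumptions: k is measured on the original positions of f (not on the shortened list), and E is supplied by the caller, since finding a smallest E is not attempted here. Indices are 0-based and the function names are hypothetical.

```python
from typing import List, Set

def is_k_sorted_outside(f: List[int], E: Set[int], k: int) -> bool:
    """Brute-force check that f restricted to [n] minus E is k-sorted,
    with k measured on the original positions (assumption)."""
    keep = [i for i in range(len(f)) if i not in E]
    return all(f[i] <= f[j] for i in keep for j in keep if i <= j - k)

def witnesses_nearly_sorted(f: List[int], E: Set[int], e: float, k: int) -> bool:
    """True if the given E certifies that f is (e, k)-nearly-sorted."""
    return len(E) <= e * len(f) and is_k_sorted_outside(f, E, k)

f = [1, 3, 2, 9, 4, 5, 7, 6, 8, 10]               # hypothetical, n = 10
print(is_k_sorted_outside(f, set(), 2))           # False: f[3] = 9 > f[5] = 5
print(witnesses_nearly_sorted(f, {3}, 1 / 5, 2))  # True
```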

Next

  • (Tolerant) Test for nearly-sortedness
  • Algorithm for sorting nearly-sorted functions

  • Experiments

k-Violations and (δ,k)-Active Indices

[Figure: a k-violation (i,j) with f(i) = 50 and f(j) = 5, at distance ≥ k]

  • (i,j) is a k-violation if i≤j-k and f(i)>f(j)
  • i is (δ,k)-active if, for some j > i, at least δ(j-i) indices h ∈ [i,j] make (i,h) a k-violation
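A brute-force reading of these two definitions, 0-indexed. The slide leaves the quantifier on j implicit; the sketch assumes i is active when some j > i satisfies the count condition. `is_active` scans j once, so it costs O(n) per index. The names and the sample values are hypothetical.

```python
from typing import List

def is_k_violation(f: List[int], i: int, j: int, k: int) -> bool:
    """(i, j) is a k-violation if i <= j - k and f(i) > f(j)."""
    return i <= j - k and f[i] > f[j]

def is_active(f: List[int], i: int, delta: float, k: int) -> bool:
    """True if some j > i has at least delta*(j - i) indices h in [i, j]
    such that (i, h) is a k-violation (assumed reading of the definition)."""
    count = 0  # k-violations (i, h) with h in [i, j], grown as j advances
    for j in range(i + 1, len(f)):
        if is_k_violation(f, i, j, k):
            count += 1
        if count >= delta * (j - i):
            return True
    return False

f = [50, 10, 20, 5, 60]  # hypothetical values
print([i for i in range(len(f)) if is_active(f, i, 0.5, 2)])  # [0, 1]
```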

k-Violations and (δ,k)-Active Indices

[Figure: a k-violation between the values 50 and 5, annotated “δ(j-i) values < 50”]

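  • For every k-violation (i,j), either i or j must be (½-k/(j-i),k)-active: each h ∈ [i+k, j-k] forms a k-violation with i (if f(h) < f(i)) or with j (if f(h) ≥ f(i) > f(j))

⇒ If f is not (e,k)-nearly-sorted, the number of (½-k/(j-i),k)-active indices is ≥ en, since removing all active indices leaves no k-violation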
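The claim can be spot-checked by brute force on random inputs. For j, the sketch assumes the mirrored notion: count the h ∈ [i,j] for which (h,j) is a k-violation. The bound (½-k/(j-i))(j-i) is written as (j-i)/2 - k to avoid floating-point rounding. This is a sanity check under those assumptions, not a proof; the function names are hypothetical.

```python
import random
from typing import List

def viol_with_left(f: List[int], i: int, j: int, k: int) -> int:
    """How many h in [i, j] make (i, h) a k-violation."""
    return sum(1 for h in range(i, j + 1) if i <= h - k and f[i] > f[h])

def viol_with_right(f: List[int], i: int, j: int, k: int) -> int:
    """How many h in [i, j] make (h, j) a k-violation (assumed mirror for j)."""
    return sum(1 for h in range(i, j + 1) if h <= j - k and f[h] > f[j])

def claim_holds(f: List[int], k: int) -> bool:
    """For every k-violation (i, j), i or j sees >= (j - i)/2 - k violations."""
    n = len(f)
    for i in range(n):
        for j in range(i + k, n):
            if f[i] > f[j]:  # (i, j) is a k-violation
                best = max(viol_with_left(f, i, j, k), viol_with_right(f, i, j, k))
                if best < (j - i) / 2 - k:
                    return False
    return True

rng = random.Random(0)
trials = [[rng.randint(0, 20) for _ in range(30)] for _ in range(50)]
print(all(claim_holds(f, k) for f in trials for k in (1, 2, 3)))  # True
```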

k-Violations and (δ,k)-Active Indices

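[Figure: a k-violation between the values 50 and 5, with margins of width k at both ends and (j-i)-2k indices in between]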

Towards tolerant testing based on [ACCL]

Lemma

  • If f is (e,k)-nearly-sorted, then the number of (1/4,k)-active indices is ≤ 5en
  • If f is not (6e,6k)-nearly-sorted, then the number of (1/3,k)-active indices is ≥ 6en
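One way to turn the Lemma into a test: every (1/3,k)-active index is also (1/4,k)-active, so the fraction of (1/4,k)-active indices is at most 5e when f is (e,k)-nearly-sorted and at least 6e when f is not (6e,6k)-nearly-sorted. Sampling indices and thresholding at the midpoint 5.5e separates the two cases once enough indices are sampled. The sketch is hypothetical: the sample size, seed and threshold are arbitrary choices, activeness follows the assumed reading "for some j > i", and each activeness check is an exact O(n) scan, so unlike a real tester it is not sublinear.

```python
import random
from typing import List

def is_active(f: List[int], i: int, delta: float, k: int) -> bool:
    """Exact O(n) check: some j > i has >= delta*(j - i) k-violations (i, h), h in [i, j]."""
    count = 0
    for j in range(i + 1, len(f)):
        if i <= j - k and f[i] > f[j]:
            count += 1
        if count >= delta * (j - i):
            return True
    return False

def tolerant_test(f: List[int], e: float, k: int, samples: int = 200, seed: int = 0) -> bool:
    """Hypothetical tester: accept (True) iff the sampled fraction of
    (1/4, k)-active indices is below the midpoint 5.5e between 5e and 6e."""
    rng = random.Random(seed)
    hits = sum(is_active(f, rng.randrange(len(f)), 0.25, k) for _ in range(samples))
    return hits / samples < 5.5 * e

print(tolerant_test(list(range(200)), 0.05, 2))        # True: sorted input
print(tolerant_test(list(range(200))[::-1], 0.05, 2))  # False: reversed input
```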