International Journal of Software Engineering and Knowledge Engineering 2(3):375-402, September 1992.
Implementing a Domain Model for Data Structures^1,2

Don Batory, Vivek Singhal, and Marty Sirkin

Department of Computer Sciences

The University of Texas

Austin, Texas 78712

Abstract: We present a model of the data structure domain that is expressed in terms of the GenVoca domain modeling concepts [Bat91]. We show how familiar data structures can be encapsulated as realms of plug-compatible, symmetric, and reusable components, and we show how complex data structures can be formed from their composition. The target application of our research is a precompiler for specifying and generating customized data structures.

Keywords: software building-blocks, domain modeling, software reuse, data structures.

1.0 Introduction

A fundamental goal of software engineering is to understand how software components fit together to form complex systems. Domain modeling is a means to achieve this goal; it is the study of a domain of similar software systems to identify the primitive and reusable components of that domain and to show how compositions of components not only explain existing systems but also predict families of yet-unbuilt systems that have interesting and novel properties. In essence, domain models can be blueprints for the as-yet-to-be-achieved software building-block technologies.

Domain modeling is presently an immature discipline [Pri91]. Besides the general skepticism that characteristically accompanies new areas of research, domain modelers face three difficult barriers to the development and popularization of their ideas:

- Domain models must be expressed in terms of constructs and software organization principles that are domain-independent. Domain-specific constructs are, by definition, not applicable to other domains. Models based exclusively on domain-specific ideas are difficult to understand and contribute little to helping other researchers understand how other domains can be modeled.
- In order to understand a domain model, one must be intimately familiar with the domain itself. It is often hard for non-experts (and even experts) to appreciate the difficulty of a domain modeling effort and its contributions.
- Domain models are nontrivial. One cannot accept a domain model at face value; it is essential that there be an accompanying implementation for model validation. Unfortunately, because of the time and expense involved, few models are validated. Without validation, however, the significance of a model is questionable.

In this paper, we attempt to cross all three barriers (or, alternatively, we show why each is hard to cross!). First, we review the GenVoca domain modeling concepts, which have been validated on the complex and disparate domains of database and network software [Bat91].^3 Next, we apply these concepts to develop a model of data structures, a domain that should be familiar to all readers. Finally, we explain how we are validating our domain model by describing the mechanics of a precompiler for data structures. Because we cover a broad sweep of issues, the introduction of every section presents the “big picture” of what we are about to do.

  1. This research was supported in part by grants from Texas Instruments and Digital Equipment Corporation.
  2. This paper also appeared in International Journal of Software Engineering and Knowledge Engineering, 2(3):375-402, September 1992.

We believe our research makes two contributions. First, our domain model outlines the beginnings of a technology for synthesizing customized object base software from prewritten building-blocks. Persistency is but one of many options (i.e., building-blocks) that can be selected. Our paper stresses the nonpersistent (i.e., data structure) aspects of this technology. Second, we see our work as an example of the types of activities and problems that are commonly encountered in a domain modeling effort. Other researchers may benefit in their domain modeling efforts by following our approach.

2.0 The GenVoca Domain-Modeling Concepts

As mentioned earlier, it is important for domain models to be expressed in terms of domain-independent concepts. Object-oriented software design models, for example, offer a collection of concepts (e.g., entities, attributes, relationships, and inheritance) that software engineers can use to express the designs of their systems. These concepts are domain-independent, as they have been shown to be applicable to a wide variety of problems in very different domains [Rum91, Boo87, Teo86]. The ER model is a common graphical representation of object-oriented concepts [Che76, Kor91]. We will use the ER model as the vehicle for discussing OO concepts in this paper.

The concepts of the ER model are necessary, but not sufficient, to explain important kinds of software components and to express hierarchical software systems as compositions of these components. To do so requires additional concepts that transcend traditional ER/object-oriented models. We set the stage for introducing these concepts in Section 2.1, where we explain the relationship of ER/OO models to hierarchical system designs. We then introduce the concepts of components, realms, and type expressions for modeling hierarchical systems in Section 2.2. We conclude in Section 2.3 by explaining how our notion of parameterized components differs from traditional notions of parameterized types.

2.1 Hierarchical Software Systems and Domain Modeling

Hierarchical software systems are designed in terms of levels, where the interface of each level is a virtual machine. All operations on level i+1 are expressed in terms of operations of level i. A layer is the mapping of operations between levels. The idea is to localize or encapsulate specific complexities of a system within layers, thereby simplifying overall system design [Dij68, Hab76].
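To illustrate levels and layers, the following C++ sketch models a toy two-level system. The names and the choice of a stack are ours, not the paper's. `Memory` is a hypothetical level-i machine, an array of integer cells. `Stack` is a hypothetical level-(i+1) machine. The layer is the `Stack` code itself, because every stack operation is expressed only through `Memory` operations.

```cpp
#include <cstddef>
#include <vector>

// Level i (hypothetical virtual machine): a flat array of int cells.
class Memory {
 public:
  explicit Memory(std::size_t cells) : cells_(cells, 0) {}
  int read(std::size_t addr) const { return cells_.at(addr); }
  void write(std::size_t addr, int value) { cells_.at(addr) = value; }
  std::size_t size() const { return cells_.size(); }

 private:
  std::vector<int> cells_;
};

// Level i+1 (hypothetical virtual machine): a stack of ints.
// The layer is this mapping: each Stack operation uses only Memory
// operations; cell 0 holds the current stack height.
class Stack {
 public:
  explicit Stack(Memory& mem) : mem_(mem) { mem_.write(0, 0); }

  bool empty() const { return mem_.read(0) == 0; }

  bool push(int v) {
    std::size_t h = static_cast<std::size_t>(mem_.read(0));
    if (h + 1 >= mem_.size()) return false;  // level i has no free cell
    mem_.write(h + 1, v);
    mem_.write(0, static_cast<int>(h + 1));
    return true;
  }

  int pop() {  // precondition: !empty()
    int h = mem_.read(0);
    int v = mem_.read(static_cast<std::size_t>(h));
    mem_.write(0, h - 1);
    return v;
  }

 private:
  Memory& mem_;
};
```

You could replace `Memory` with any other class that offers the same `read`, `write`, and `size` operations, and `Stack` would not change. That is the encapsulation the layering is meant to provide.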

Object-oriented programming languages and design methodologies have shown that interface specifications should encompass both objects and operations, rather than operations alone. An object model or object-oriented virtual machine (OOVM) is, in general, a set of classes and their interrelationships. ER diagrams depict object models using boxes to represent classes and arrows to represent relationships between classes. Figure 1 shows two object models: Model R exhibits classes A, B, and C with relationships A-C and B-C; Model S exhibits classes D and E with relationship D-E.

A hierarchical system design, from an object-oriented perspective, is a set of object-oriented virtual machines, one for each level of a system. Figure 2 shows the object model interfaces for levels i and i+1 in a hierarchical system; the interface for level i+1 is object model R, and the interface for level i is object model S.

  3. GenVoca is the model that resulted from the merging of the design philosophies of two different domain-specific software system generators, Genesis and Avoca.

Because every component within a realm exports exactly the same interface, all components of a realm are plug-compatible and interchangeable. Moreover, components may have parameters. Consider component f[x:Realm_S]. f[ ] exports interface R (because f[ ] belongs to Realm_R) and imports interface S (because f[ ] has parameter x:Realm_S). Furthermore, f[ ] translates objects and operations of OOVM R to objects and operations of OOVM S without depending on how OOVM S is implemented. How OOVM S is implemented is specified by parameter x.

Two useful results follow. First, a software system of potentially enormous complexity is represented as a type expression (i.e., a composition of components). The following three systems present OOVM R as their interface:

```
system1 = f[r]
system2 = f[t]
system3 = g[t]
```

The above syntax was borrowed from the parameterized type notation in [Gog84, Gog86]. While the syntax is similar, we will show in Section 2.3 that the semantics of our components are rather different.

Second, component reuse corresponds to common subexpressions. Whenever different systems (different expressions) reference a common subsystem (subexpression), that subsystem (and its components) is being reused. Thus, system1 and system2 reuse component f[ ], and system2 and system3 reuse component t.
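C++ templates offer one way to approximate realms and parameterized components. This sketch assumes that interface S is a single static function `op_s()` and that interface R is a single static function `op_r()`; both names are hypothetical. Components `r` and `t` belong to Realm_S. Components `f` and `g` belong to Realm_R and import S through their template parameter. Note that C++ does not check realm membership here: any type with a matching `op_s()` would be accepted.

```cpp
#include <string>

// Realm_S (hypothetical interface S): every member exports op_s().
struct r { static std::string op_s() { return "r"; } };
struct t { static std::string op_s() { return "t"; } };

// Realm_R (hypothetical interface R): every member exports op_r().
// f and g import interface S through template parameter X and map their
// R operation onto X's S operation without knowing which X they get.
template <class X>
struct f { static std::string op_r() { return "f[" + X::op_s() + "]"; } };

template <class X>
struct g { static std::string op_r() { return "g[" + X::op_s() + "]"; } };

// Type expressions: three systems that all present interface R.
using system1 = f<r>;
using system2 = f<t>;
using system3 = g<t>;
```

`system2` and `system3` are both instantiated over `t`. That shared subexpression corresponds to the reuse of component t described above, and `system1` and `system2` likewise share the template `f`.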

A fundamental concept of GenVoca is symmetric components, i.e., components that can be composed in arbitrary orders. More precisely, a symmetric component of realm W has a parameter of type W. Components m and n of realm W below are symmetric, while components p and q are not:

```
W = { m[x:W], n[x:W], p, q[y:R], ... }
```

Since m[ ] and n[ ] are symmetric, compositions m[n[...]] and n[m[...]] are possible. As a general rule, different composition orders have different meanings. While symmetric components seem strange, readers are probably familiar with a classical example of symmetric components: UNIX file filters. File filters present the same byte-stream interface for both their input and output; composing UNIX file filters together in a pipe is an example of composing symmetric components. As in the general case, the order in which one composes file filters (symmetric components) makes a difference in both semantics and performance.
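The following small C++ sketch, loosely modeled on file filters, shows how composition order affects symmetric components. It assumes a hypothetical realm W whose interface is a single static function `put`, which transforms a string and hands it to the component beneath. Component `p` has no W parameter and ends a composition. Components `m` (reverse) and `n` (append `!`) are symmetric because their parameter is itself a W component.

```cpp
#include <algorithm>
#include <string>

// Realm W (hypothetical interface): every member exports put(s).
// p is not symmetric: it has no W parameter and terminates a composition.
struct p {
  static std::string put(const std::string& s) { return s; }
};

// Symmetric components: each takes a W component as its parameter, like a
// filter whose input and output share one interface.
template <class X>
struct m {  // reverses its input, then passes it down
  static std::string put(std::string s) {
    std::reverse(s.begin(), s.end());
    return X::put(s);
  }
};

template <class X>
struct n {  // appends '!', then passes it down
  static std::string put(const std::string& s) { return X::put(s + "!"); }
};
```

`m<n<p>>::put("ab")` yields `"ba!"`, while `n<m<p>>::put("ab")` yields `"!ba"`. The same two components, composed in different orders, produce different results.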

The concepts of type expressions and realms of plug-compatible and parameterized components are not part of the ER/OO models. These concepts are domain-independent extensions to these models and have been validated in two rather different domains: databases and network software [Bat85-91, Oma90, Hut91]. The domain models for both databases and networks had their own sets of realms of interchangeable components. As this paper develops, we will show that the domain of data structures has a similar organization.

2.3 On the Relationship of Components to Parameterized Types

Our type expression notation was borrowed from research on parameterized types [Gog84]. However, the semantics of the components that we examine in this paper are very different from those of traditional constructs (e.g., templates in C++ [Str91], Ada generics [Ghe82], OBJ sorts [Gog86]).

Common to all parameterized types (both ours and traditional) is the concept of a container , i.e., a set of generic objects. Stacks, lists, binary trees, queues, etc. are specialized containers whose algorithms are defined independently of the types of objects that are to be stored.

Traditional. Traditional parameterized types take the container idea a step further. An instance of a traditional parameterized type (TPT) (e.g., stack, queue) is itself an object (i.e., a container). The essential idea is that a TPT maps a container of objects to a single object.

Consider the LIST[] and BINARY_TREE[] parameterized types. An instance of LIST[T] is a single object (a list) that represents a list of objects of type T. An instance of BINARY_TREE[LIST[T]] is a single object (a binary tree), where each tree node references a list object (which itself is a container of objects of type T). Figure 3 shows how a container of three objects of type T, namely { t1, t2, t3 }, is mapped by BINARY_TREE[LIST[T]] to a binary tree with a single node. (We depict containers whose implementation is partially specified as a set of objects/records within an enclosed boundary - i.e., as in Figure 3a. When a container implementation is fully specified, the container’s objects are drawn without an enclosed boundary, as in Figure 3b).
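The TPT reading of BINARY_TREE[LIST[T]] can be sketched in C++ as nested containers. This sketch uses `std::list` for LIST and a minimal unbalanced tree for BINARY_TREE; both are our choices, not the paper's. An instance of each type is a single object. So inserting one LIST of three values produces a tree with a single node, as in Figure 3.

```cpp
#include <list>
#include <memory>
#include <string>
#include <utility>

// TPT sketch: an instance of LIST<T> is one object (a list) holding T values.
template <class T>
using LIST = std::list<T>;

// TPT sketch (hypothetical minimal tree): an instance of BINARY_TREE<T> is
// one object; each node holds one T value, which may itself be a container.
template <class T>
struct BINARY_TREE {
  struct Node {
    T value;
    std::unique_ptr<Node> left, right;
  };
  std::unique_ptr<Node> root;
  int node_count = 0;

  // Unbalanced insertion ordered by a caller-supplied comparison.
  template <class Less>
  void insert(T v, Less less) {
    std::unique_ptr<Node>* slot = &root;
    while (*slot) slot = less(v, (*slot)->value) ? &(*slot)->left : &(*slot)->right;
    *slot = std::make_unique<Node>(Node{std::move(v), nullptr, nullptr});
    ++node_count;
  }
};
```

After `LIST<std::string>{"t1", "t2", "t3"}` is inserted into a `BINARY_TREE<LIST<std::string>>`, `node_count` is 1 and the root's list holds all three values. The whole container of T objects has collapsed into one tree object.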

Nontraditional. We propose different semantics for parameterization; a nontraditional parameterized type (NPT) maps a container of abstract objects to one or more containers of concrete objects. The container abstraction and interface remain unchanged by the mapping.

Consider the LIST’[] and BINARY_TREE’[] NPTs. LIST’[T] maps the objects $\{t_1, \ldots, t_n\}$ of container T to the set $\{(t_1, next_1), \ldots, (t_n, next_n)\}$, where each $next_i$ is a pointer to the object $(t_{i+1}, next_{i+1})$. BINARY_TREE’[T] maps the objects $\{t_1, \ldots, t_n\}$ of container T to the set $\{(left_1, t_1, right_1), \ldots, (left_n, t_n, right_n)\}$, where $left_i$ and $right_i$ are pointers to the left and right tree objects of $t_i$.

BINARY_TREE’[LIST’[T]] maps a container of T objects to a container of 4-tuples of the form $(left_i, t_i, next_i, right_i)$ (see Figure 4a). Figure 4b shows how the container of Figure 3a is mapped by BINARY_TREE’[LIST’[T]] to a container of objects that are independently interconnected by both a binary tree and a list. Note that the resulting container can be implemented in a variety of ways (e.g., the 4-tuples could be stored by the LIST[] or BINARY_TREE[] TPTs). It is worth noting that the situation

[Figure 3: Traditional Parameterized Type Mappings. (a) BINARY_TREE[LIST[T]] Mapping; (b) Resulting Data Structure: a single binary_tree node referencing a list of t1, t2, t3.]
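By contrast, an NPT can be sketched as a C++ mixin that adds link attributes to the element type it receives. This sketch makes several assumptions. `LIST_P` and `BINARY_TREE_P` are hypothetical names for LIST’ and BINARY_TREE’. A `key` attribute is used for tree ordering. A fixed-capacity `std::vector` serves as the storage underneath, standing in for whatever TPT actually stores the 4-tuples. Each stored `Elem` carries left, next, and right links, so the same objects are threaded both by a list (in insertion order) and by a binary tree (in key order), in the spirit of Figure 4b.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct T { std::string key; };  // hypothetical abstract element type

// LIST' sketch: maps each element e to (e, next).
template <class E>
struct LIST_P : E { LIST_P* next = nullptr; };

// BINARY_TREE' sketch: maps each element e to (left, e, right).
template <class E>
struct BINARY_TREE_P : E {
  BINARY_TREE_P* left = nullptr;
  BINARY_TREE_P* right = nullptr;
};

// BINARY_TREE'[LIST'[T]]: each concrete object is a (left, t, next, right) tuple.
using Elem = BINARY_TREE_P<LIST_P<T>>;

// One container whose objects are linked by both a list and a tree.
class Container {
 public:
  explicit Container(std::size_t capacity) { store_.reserve(capacity); }
  Container(const Container&) = delete;             // links point into store_
  Container& operator=(const Container&) = delete;

  bool insert(const std::string& key) {
    if (store_.size() == store_.capacity()) return false;  // keep pointers stable
    store_.emplace_back();
    Elem& e = store_.back();
    e.key = key;
    // LIST' part: append in insertion order.
    if (tail_) tail_->next = &e; else head_ = &e;
    tail_ = &e;
    // BINARY_TREE' part: unbalanced insertion in key order.
    Elem** slot = &root_;
    while (*slot) slot = key < (*slot)->key ? &(*slot)->left : &(*slot)->right;
    *slot = &e;
    return true;
  }

  std::vector<std::string> list_order() const {
    std::vector<std::string> out;
    for (const LIST_P<T>* p = head_; p; p = p->next) out.push_back(p->key);
    return out;
  }

  std::vector<std::string> tree_order() const {
    std::vector<std::string> out;
    inorder(root_, out);
    return out;
  }

 private:
  static void inorder(const Elem* n, std::vector<std::string>& out) {
    if (!n) return;
    inorder(n->left, out);
    out.push_back(n->key);
    inorder(n->right, out);
  }

  std::vector<Elem> store_;  // assumed storage for the 4-tuples
  LIST_P<T>* head_ = nullptr;
  LIST_P<T>* tail_ = nullptr;
  Elem* root_ = nullptr;
};
```

If `t2`, `t3`, and `t1` are inserted in that order, the list order is `t2, t3, t1` and the in-order tree traversal is `t1, t2, t3`. Both traversals run over the same stored objects, and the container interface (`insert`) is unchanged by the mapping.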

Two different containers C1 and C2 of T objects would be declared as instances of a C++ template [Str91]:

```cpp
list_container C1, C2;
```

That is, the objects of both C1 and C2 are stored in list_container data structures, which are parameterized by the type T of objects that they store. Containers are first-class objects and may themselves be stored in containers. Recalling the example of Figure 4, the container Directory is a binary tree container of list containers:

```cpp
binary_tree_container<list_container> Directory;
```

A run-time mechanism called an iterator or cursor [Ghe82, Kor91, ACM91] is used to enumerate subsets of objects in a container. Cursors for the above containers are declared by:

```cpp
cursor<list_container> c1(C1), c2(C2);
cursor<binary_tree_container<list_container>> c3(Directory);
```

That is, cursors c1 and c2 are of the same type; both traverse containers of type list_container. Furthermore, cursor c1 is bound to container C1 and cursor c2 is bound to container C2. Cursor c3 has a similar interpretation. We'll see in the next section that programming with cursors is straightforward.

3.1.1 Programming with Cursors

In general, operations on objects inside a container must be performed through cursors. In order to update an attribute of an object, for example, one must first position a cursor on the object before the update operation is issued. Figure 6 summarizes operations that can be performed on a cursor c. This list is not intended to be comprehensive; it represents our first attempt to define a standardized interface for containers.

  1. Recall that a class variable is effectively a shared attribute; there is only one copy of each class variable and all instances of the class share that copy [Str86].

Figure 5: A General Definition of a C++ Class

```cpp
class T {
private:
  // private class variables, functions and instance variables
  ...
public:
  // public instance variables and functions
  T1 A1;   // T1 is the type for attribute A1
  T2 A2;
  ...
  Tn An;
  T();     // constructor
  ~T();    // destructor
  ...      // other public functions
};
```

To make our discussions concrete, consider the emp class and a container EMP:

```cpp
class emp {
public:
  char name[12];
  int age;
  char city[15];
  char department[25];
  // emp operations
  ...
};
typedef list_container Emp_Container;
Emp_Container EMP;
```

The following program fragments illustrate how cursors are used.

Example 1. Insert an object for ‘Batory’ into container EMP.

```cpp
cursor<Emp_Container> e(EMP);  // constructor #1
e.newobj();                    // create new EMP object
e.name = "Batory";             // fill in attribute values
e.age = 38;
e.city = "Austin";
e.department = "Software";
```

Figure 6: Operations on Cursors

| Operation | Meaning |
|---|---|
| cursor c(C) | constructor #1 for a cursor c on objects in container C of type K. |
| cursor c(C, predicate) | constructor #2 for a cursor c on objects in container C of type K; c ranges only over objects that satisfy predicate. |
| ~cursor() | destroy a cursor. |
| c.reset() | c is positioned prior to the first object that satisfies predicate. |
| c++ | advance cursor c to the next object of container C that satisfies the cursor predicate; c.status = EOR (end of retrieval) or OK, depending on whether a next object was found. |
| c.status | returns the status OK or EOR of the last advance (++) operation on c. |
| c.ref(Ai) | attribute Ai of the object referenced by c. |
| c.Ai | syntactic sugar for c.ref(Ai). |
| c.newobj() | create a new object in the container referenced by cursor c; c is positioned over the new object. |
| c.del() | delete the object referenced by c. |
| c.oid | the identifier of the object referenced by c. |
| c.pos(oid) | reposition cursor c over an object whose identifier is oid. |
| c.upd(Ai, value) | update attribute Ai of the object referenced by c to be value. |
| c.Ai = value | syntactic sugar for c.upd(Ai, value). |
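The cursor interface of Figure 6 can be prototyped in a few lines of C++. This is an assumption-laden stand-in, not the paper's generated code. The `container` stores objects in a `std::vector`. Only `newobj`, `++`, `reset`, `status`, and attribute access are modeled, with `operator->` playing the role of the `c.Ai` and `c.Ai = value` sugar. As in Figure 8, the cursor is parameterized by its container type K. The employees other than Batory are made up.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical element class, mirroring emp with std::string attributes.
struct emp {
  std::string name;
  int age = 0;
  std::string city;
  std::string department;
};

// Hypothetical container: objects live in a std::vector (an assumption)
// and are reachable only through cursors.
template <class E>
class container {
 public:
  using element_type = E;

 private:
  template <class K>
  friend class cursor;
  std::vector<E> objs_;
};

// Sketch of a subset of Figure 6; the cursor is parameterized by container K.
template <class K>
class cursor {
 public:
  using E = typename K::element_type;
  enum Status { OK, EOR };
  Status status = EOR;

  // Constructors #1 and #2: the default predicate accepts every object.
  explicit cursor(K& k, std::function<bool(const E&)> pred = [](const E&) { return true; })
      : k_(k), pred_(std::move(pred)) {}

  // c.reset(): position the cursor before the first object.
  void reset() { pos_ = -1; }

  // c++: advance to the next object satisfying the predicate; set status.
  cursor& operator++() {
    while (++pos_ < static_cast<long>(k_.objs_.size())) {
      if (pred_(k_.objs_[static_cast<std::size_t>(pos_)])) {
        status = OK;
        return *this;
      }
    }
    status = EOR;
    return *this;
  }

  // c.newobj(): create a new object and position the cursor over it.
  void newobj() {
    k_.objs_.emplace_back();
    pos_ = static_cast<long>(k_.objs_.size()) - 1;
    status = OK;
  }

  // Stands in for the c.Ai and c.Ai = value sugar.
  E* operator->() { return &k_.objs_.at(static_cast<std::size_t>(pos_)); }

 private:
  K& k_;
  std::function<bool(const E&)> pred_;
  long pos_ = -1;
};
```

The Example 1 insertion then reads `e.newobj(); e->name = "Batory"; e->age = 38;`. A cursor built with the predicate `age > 30` visits only the qualifying objects, and its `status` becomes `EOR` after the last one.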

element instances. That is, all operations on objects within a container are now accomplished through operations on cursors (i.e., newobj is called for object creation instead of the T constructor T(), and so on).

We emphasize that the definitions in Figure 8 are not in their final form. When data structures are viewed as a composition of components, each component may transform each of these class definitions by adding or removing attributes and possibly creating new container, cursor, and element types. As none of these transformations are particularly obvious, we need to consider specific examples of data structure components to show that these class definitions are indeed modified. This is one of the subjects of the next section.

Figure 8: Definition of Class T and its Cursor Definition

```cpp
template <class T> class container {
  // cursors are friends
  friend class cursor<container>;
private:
  // nothing initially
public:
  container();   // constructor
  ~container();  // destructor
};
```

(a) Definition of container

```cpp
template <class T> class element {
  // cursors on element are friends
  friend class container<T>;
private:
  // private class variables,
  // functions, & instance variables
  ...
  // formerly public instance
  // variables and functions
  T1 A1; T2 A2; ... Tn An;
  element();   // constructor
  ~element();  // destructor
  ...          // user-defined functions
  // no public variables, functions
};
```

(b) Definition of element

```cpp
template <class K> class cursor  // parameterized by container
{
  // cursors have access to private
  // members of their containers and
  // container elements
  // note K = container
  friend class K;
  friend class element;
public:
  // instance variables
  K *k;                   // the container
  enum {OK, EOR} status;  // cursor status

  // public cursor operations
  cursor(container);             // constructor 1
  cursor(container, predicate);  // constructor 2
  ~cursor();                     // destructor
  newobj();                      // new instance
  del();                         // delete instance
  operator++();                  // advance cursor
  pos(oid);                      // position cursor
  upd(attr, value);              // update instance

  // operations to reference instance
  // variables of elements in container
  A1;  // attribute A1
  A2;  // attribute A2
  ...
  An;  // attribute An

  // previously user-defined functions
  ...
};
```

(c) Definition of cursor

3.2 The DS Realm

Let DS be the realm of components that implement our container OOVM of Figure 7 and Figure 8. Each DS component encapsulates a primitive data structure. Clearly there is an enormous number of such components; listing them all is impossible. Rather than enumerating realm membership, we will review specific examples and discuss the basic paradigm that accounts for a wide spectrum of useful data structures. Having done so, it is a simple intuitive leap to recognize the scope of the DS realm. The specific members of DS that we will examine are:

```
DS = { DEL_FLAG[x:DS], ARRAY, MALLOC, DLIST[x:DS], BINTREE[x:DS], SEGMENT[p,s:DS], INDEX[d,i:DS], ... }
```

The ‘big picture’ that we want to convey in this section is that DS components are NPTs. Compositions of DS components implement TPTs. The container types list_container and binary_tree_container that we encountered in Section 3.1 are TPTs that can be constructed from DS components. In general, users can specify their own container types using the container_def statement, which defines TPTs as a composition of NPTs (DS components). Three examples are shown below:

```
container_def Simple_Array = DEL_FLAG[ARRAY];
container_def list_container = DLIST[MALLOC];
container_def binary_tree_container = BINTREE[DEL_FLAG[ARRAY]];
```

Of the three container types, Simple_Array is the most elementary. We will first examine a monolithic implementation (i.e., one without layering) of Simple_Array as implementations of the element<>, container<>, and cursor<> classes. Then we will show how this implementation is actually a composition of the two independently definable components, DEL_FLAG[] and ARRAY.

3.2.1 A Close Look At A Simple (But Composite) Data Structure: Simple_Array

Suppose a container is implemented by a preallocated array. Every object that it contains is augmented with a boolean attribute DF (delete flag). When an object is inserted, it is placed at the end of the array and its delete flag is set to false. When an object is deleted, its delete flag is set to true and the space the object occupies is not reclaimed. When a user requests the retrieval of all objects that satisfy a predicate P, the predicate that is actually applied to objects is (P ∧ ¬DF). (That is, retrieved objects must both satisfy the user's request and must not be deleted.) The figure below shows both the abstract and concrete repre- sentations of a small container of objects. Note that two of the concrete objects have been deleted.
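Here is a sketch of the Simple_Array behavior just described, written as one monolithic C++ class with no layering. The class name `SimpleArray`, the fixed capacity `N`, and the member names are hypothetical, and `remove` stands in for the cursor's `del`. The sketch is meant to show three things: insertion appends at `free_index`, deletion only sets DF, and retrieval applies (P ∧ ¬DF) rather than P alone.

```cpp
#include <array>
#include <cstddef>

// Monolithic sketch of Simple_Array (hypothetical names): a preallocated
// array whose rows carry a delete flag DF.
template <class E, std::size_t N>
class SimpleArray {
 public:
  // Insert at the end with DF = false. Returns the row, or -1 if full.
  int insert(const E& obj) {
    if (free_index_ >= N) return -1;
    rows_[free_index_] = Row{obj, false};
    return static_cast<int>(free_index_++);
  }

  // Logical delete: set DF; the row's space is never reclaimed.
  void remove(int row) { rows_.at(static_cast<std::size_t>(row)).DF = true; }

  // Retrieval applies (P && !DF) instead of the user's P alone.
  template <class Pred>
  std::size_t count_matching(Pred P) const {
    std::size_t hits = 0;
    for (std::size_t i = 0; i < free_index_; ++i)
      if (P(rows_[i].obj) && !rows_[i].DF) ++hits;
    return hits;
  }

  // Deleted rows still occupy space.
  std::size_t rows_used() const { return free_index_; }

 private:
  struct Row {
    E obj;
    bool DF;
  };
  std::array<Row, N> rows_{};
  std::size_t free_index_ = 0;
};
```

Suppose three rows are inserted and the middle one is deleted. `rows_used()` stays at 3, while a predicate that every object satisfies matches only 2 rows.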

Recall that we said in Section 3.1.2 that the definitions of the container<>, element<>, and cursor<> classes are transformed by container implementations. Simple_Array specializes the container<> class definition with the additional attributes array (which is the actual array) and

[Figure: abstract representation of a small container of objects, and its concrete representation as an array whose objects carry the delete flags false, true, true, false, false, ...]

tations of a small set of objects. Note that two of the concrete objects have been logically deleted. The DEL_FLAG component is parameterized (x:DS) because its abstract-to-concrete mappings are not in any way dependent on how its concrete objects are stored.

Figure 10 shows the class specializations and methods of DEL_FLAG. Note that we have taken two liberties with our C++ notation. Abstract functions have the name of the component (_DEL_FLAG) appended; concrete functions (i.e., calls to the component beneath DEL_FLAG) have the name _conc appended. Our motivation for doing so is to clearly distinguish functions on abstract objects from functions on concrete objects. Later we will see that this distinction is useful when components are composed.

3.2.3 ARRAY : DS

ARRAY is the component that encapsulates the concept of a preallocated array. ARRAY appends new objects to the end of an array; objects cannot be deleted (as there is no way to distinguish deleted objects from nondeleted objects). As ARRAY makes no calls to lower-level components, it has no DS parameters.

[Figure: abstract representation of three objects and the concrete representation produced by DEL_FLAG, in which each concrete object carries a true/false delete flag.]

```cpp
// new attribute for element
boolean DF;  // boolean delete flag

newobj_DEL_FLAG() {
  // make new concrete instance
  // and flag it as not deleted
  newobj_conc();
  upd_conc(DF, FALSE);
}

del_DEL_FLAG() {
  // flag instance deleted
  upd_conc(DF, TRUE);
}

upd_DEL_FLAG(attr, val) { upd_conc(attr, val); }

ref_DEL_FLAG(attr) { return(ref_conc(attr)); }

cursor_DEL_FLAG(p) {
  // modify selection predicate
  cursor_conc((p) && !DF);
}

operator++_DEL_FLAG() { ++_conc; }

pos_DEL_FLAG(newoid) { pos_conc(newoid); }

reset_DEL_FLAG() { reset_conc(); }
```

Figure 10: Class Specializations and Methods for DEL_FLAG

ARRAY specializes the definition of the container class with the addition of attributes array and free_index, which are the preallocated array and the index to the first free row. It also specializes the definition of cursor with the additional attributes oid and pred, which are the row number of the current object and the selection predicate. Figure 11 summarizes these specializations and lists the methods for each major cursor operation.

3.2.4 The Composition DEL_FLAG[ARRAY]

Keeping explicit interfaces to components is a bad idea for data structures: component methods are often so small that the overhead of function calls can seriously degrade performance. Compositions of DS components must be accomplished primarily by function renaming and inline expansion. The monolithic code for Simple_Array can be generated from DEL_FLAG and ARRAY by the following steps.
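A layered counterpart can be sketched with C++ templates standing in for the precompiler. In this sketch, `ARRAY` and `DEL_FLAG<X>` are separate components, and `DEL_FLAG<ARRAY>` is their composition. The sketch relies on the C++ compiler's inlining of small member functions, which is a different mechanism from the renaming and inline expansion the paper describes. The nested `layer` template, the `Item` type, and the `next` operation (which merges the cursor's predicate with `++`) are our assumptions. Note that DEL_FLAG transforms the element type by adding DF before handing it to ARRAY.

```cpp
#include <cstddef>

struct Item { int value = 0; };  // hypothetical abstract element

// ARRAY: no DS parameter; a preallocated array with no delete operation.
struct ARRAY {
  template <class E, std::size_t N = 16>
  struct layer {
    E array[N];
    std::size_t free_index = 0;

    int newobj() { return free_index < N ? static_cast<int>(free_index++) : -1; }  // -1: full
    E& ref(int oid) { return array[oid]; }

    // Next row after oid whose element satisfies pred, or -1 for EOR.
    template <class Pred>
    int next(int oid, Pred pred) const {
      for (std::size_t i = static_cast<std::size_t>(oid + 1); i < free_index; ++i)
        if (pred(array[i])) return static_cast<int>(i);
      return -1;
    }
  };
};

// DEL_FLAG[x:DS]: independent of how its concrete objects are stored.
template <class X>
struct DEL_FLAG {
  template <class E>
  struct layer {
    // Element transformation: concrete objects gain the DF attribute.
    struct conc_elem : E { bool DF = false; };
    typename X::template layer<conc_elem> conc;  // the component beneath

    int newobj() {
      int oid = conc.newobj();
      if (oid >= 0) conc.ref(oid).DF = false;  // new objects are not deleted
      return oid;
    }
    void del(int oid) { conc.ref(oid).DF = true; }  // space is not reclaimed
    E& ref(int oid) { return conc.ref(oid); }

    // The cursor predicate P becomes (P && !DF) before being passed down.
    template <class Pred>
    int next(int oid, Pred P) const {
      return conc.next(oid, [&](const conc_elem& e) { return P(static_cast<const E&>(e)) && !e.DF; });
    }
  };
};

// container_def Simple_Array = DEL_FLAG[ARRAY];
using Simple_Array = DEL_FLAG<ARRAY>;
```

Suppose three objects with values 10, 20, and 30 are created in a `Simple_Array::layer<Item>` and the second is deleted. Iterating with a predicate that every object satisfies then visits only 10 and 30, which is the same behavior as the monolithic version.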

```cpp
// new attributes for container objects
element array[MAX_ARRAY_SIZE];  // array for class objects
int free_index = 0;             // index to first free row

// new attributes for cursor objects
predicate pred;                 // selection predicate
int oid;                        // index of current object

newobj_ARRAY() {
  // find first free row and remember ref to object.
  // no array bounds error checking
  oid = k->free_index++;
}

del_ARRAY(c) {
  // not implemented by this component
}

upd_ARRAY(attr, value) {
  // direct update of attr
  attr = value;
}

cursor_ARRAY(p) {
  // remember the predicate
  pred = p;
}

ref_ARRAY(attr) { return(attr); }

operator++_ARRAY() {
  element *t;
  // examine next object; return EOR if end of array
  // or OK if object is found that satisfies predicate
  loop {
    if (++oid >= k->free_index) { status = EOR; return; }
    t = &k->array[oid];
    if (pred(t)) { status = OK; return; }
  }
}

pos_ARRAY(newoid) {
  // no error checking on oid
  oid = newoid;
}

reset_ARRAY() { oid = -1; }
```

Figure 11: Class Specializations and Methods for ARRAY

Note that there are many other ways objects could be linked together on lists; each could be encapsulated as a distinct DS component.

3.2.5.3 INDEX[data, indx : DS] : DS

INDEX is the component that maps a nonindexed container to an indexed container; that is, it creates secondary indices over selected attributes of stored objects. For each attribute that is to be indexed, a container of index objects is created. The attributes to index are conveyed to the INDEX component by a special class variable called indexed_attr, which is assigned the list of names of attributes to be indexed. As an illustration, the emp class below would have two index containers created: one for the name attribute and another for age.

```cpp
class emp {
public:
  static ATTR_LIST indexed_attr = {name, age};
  char name[12];
  int age;
  char city[15];
  char department[25];
};
```

There is an index object in an index container for every distinct value that the indexed attribute assumes. (Thus, indexing a boolean attribute would yield a container of two index objects.) There are many possible structures that can be used to interconnect an index object with its data objects. The method illustrated below shows each index object at the head of a list of data objects that have the same attribute value as the index object. Note that only one index container is shown in this figure. Also note that this is the first example of a component that creates new containers (i.e., index containers) as part of its class definition transformation.
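Here is a sketch of INDEX's mapping for a single indexed attribute. It assumes a trimmed-down `emp` with only `name` and `age`, an index on `age`, and a `std::map` standing in for the index container; in the model, that container could itself be any DS composition. Each index object heads a singly linked list of the data objects that share its value. The class and member names are hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct emp { std::string name; int age = 0; };  // hypothetical, trimmed-down

// Concrete data object: the abstract attributes plus a link (added by the
// INDEX mapping) to the next data object with the same age.
struct emp_data : emp { emp_data* same_age_next = nullptr; };

// Concrete index object: one per distinct age, heading its list.
struct age_index { int age; emp_data* head = nullptr; std::size_t count = 0; };

class IndexedEmps {
 public:
  explicit IndexedEmps(std::size_t capacity) { data_.reserve(capacity); }
  IndexedEmps(const IndexedEmps&) = delete;  // lists point into data_
  IndexedEmps& operator=(const IndexedEmps&) = delete;

  bool insert(const emp& e) {
    if (data_.size() == data_.capacity()) return false;  // keep pointers stable
    data_.push_back(emp_data{e});
    emp_data* d = &data_.back();
    // Find or create the index object for this value, then link d in front.
    auto it = index_.try_emplace(e.age, age_index{e.age}).first;
    d->same_age_next = it->second.head;
    it->second.head = d;
    ++it->second.count;
    return true;
  }

  // Retrieval by age walks one index object's list, not the whole container.
  std::vector<std::string> with_age(int age) const {
    std::vector<std::string> out;
    auto it = index_.find(age);
    if (it == index_.end()) return out;
    for (const emp_data* d = it->second.head; d; d = d->same_age_next) out.push_back(d->name);
    return out;
  }

  std::size_t index_objects() const { return index_.size(); }

 private:
  std::vector<emp_data> data_;      // the data container (assumed storage)
  std::map<int, age_index> index_;  // the index container (assumed storage)
};
```

With two employees aged 38 and one aged 25, there are two index objects. A lookup of age 38 walks a single list of two data objects.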

[Figures: abstract and concrete representations of a doubly linked list (next and prior pointers, with a list head), and of INDEX, in which index objects for attribute value a head lists of data objects whose attribute value is a.]

3.2.5.4 SEGMENT[p, s : DS] : DS

SEGMENT is the component that partitions each abstract object of a container into a pair of interconnected concrete objects that are stored in separate concrete containers. Figure 12 shows an abstract class of five attributes plus an additional class variable part. The transformation of the element definition performed by SEGMENT results in two concrete element classes. The primary concrete element class contains all attributes up to but not including the attribute identified by class variable part. (In the figure, this is attribute A4.) The secondary concrete element class contains the remaining abstract attributes. Moreover, an additional pointer ptr is stored in each primary object to connect it to its corresponding secondary object.

Segmenting records in database applications is occasionally useful [Bat78]; one need not retrieve from disk both halves of a record if only one half is needed. The performance advantages for SEGMENT in a data structure context are different. An example (from Genesis) is the partitioning of buffer header attributes from the page-buffer itself. Users write programs using a BUFFER class, which contains all attributes (headers + page buffer) of a buffer. For increased performance, page buffers should be aligned along page boundaries to minimize page faults when reading and writing buffers. Since the length of a BUFFER instance is greater than the size of a page, this alignment can only be guaranteed if the page-buffer attribute is stored separately from buffer header attributes. SEGMENT hides the ugly details of this partitioning, thereby simplifying the development of user programs [Roy91].
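The split shown in Figure 12 can be sketched with two vectors standing in for the primary (p) and secondary (s) concrete containers. The sketch assumes concrete attribute types: `int` for A1 to A3 and `std::string` for A4 and A5. It also stores `ptr` as a row number in the secondary container rather than as a C++ pointer, so that vector growth cannot invalidate it. `SegmentedContainer` and its member functions are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Abstract element (assumed types); part = A4 splits it after A3.
struct element { int A1, A2, A3; std::string A4, A5; };

struct element_secondary { std::string A4, A5; };
// ptr: row of the matching object in the secondary container (an assumption).
struct element_primary { int A1, A2, A3; std::size_t ptr; };

class SegmentedContainer {
 public:
  // SEGMENT mapping: one abstract object becomes two linked concrete objects.
  std::size_t insert(const element& e) {
    secondary_.push_back(element_secondary{e.A4, e.A5});
    primary_.push_back(element_primary{e.A1, e.A2, e.A3, secondary_.size() - 1});
    return primary_.size() - 1;
  }

  // Reads of A1..A3 touch only the primary container.
  int A1(std::size_t oid) const { return primary_.at(oid).A1; }

  // Reassemble the abstract object that users program against.
  element get(std::size_t oid) const {
    const element_primary& p = primary_.at(oid);
    const element_secondary& s = secondary_.at(p.ptr);
    return element{p.A1, p.A2, p.A3, s.A4, s.A5};
  }

 private:
  std::vector<element_primary> primary_;      // stands in for container p:DS
  std::vector<element_secondary> secondary_;  // stands in for container s:DS
};
```

Reading `A1` never visits the secondary container, which mirrors the motivation above for keeping the rarely needed half of an object separate. `get` rebuilds the unsegmented view.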

3.2.5.5 More Components

By now it should be evident that the DS realm contains a very large number of components. Every DS component maps a container of abstract objects to one or more concrete containers of concrete objects. Classical data structures (BINARY_TREE[x:DS], AVL_TREE[x:DS], HASH[x:DS]) add extra attributes to objects to interlink them and provide fast retrieval for certain types of queries. (Primary key information for these structures would be conveyed using a class variable key, similar to the way attributes that are to be indexed are conveyed.) They are parameterized in that their mappings do not depend on how their concrete objects are stored.
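The parameterization of BINARY_TREE[x:DS] can be approximated with a C++ template template parameter: the tree adds left and right links to each object but delegates storage to whatever lower layer it is given. This is a sketch, not the paper's implementation; Malloc is a hypothetical stand-in for the MALLOC component, and the KeyOf functor plays the role that the class variable key plays in the text.

```cpp
#include <deque>
#include <string>

// Hypothetical storage layer standing in for MALLOC. std::deque never
// invalidates references to existing elements when appending at the back,
// so the returned addresses stay valid.
template <class Node>
class Malloc {
public:
    Node* allocate(const Node& n) {
        pool_.push_back(n);
        return &pool_.back();
    }
private:
    std::deque<Node> pool_;
};

// Hypothetical BINARY_TREE layer: augments each object with left/right
// links and orders it by KeyOf. It never stores nodes itself; it asks the
// lower layer (Lower) to allocate them.
template <class Obj, class KeyOf, template <class> class Lower>
class BinaryTree {
    struct Node {
        Obj obj;
        Node* left = nullptr;   // augmented attribute
        Node* right = nullptr;  // augmented attribute
    };
public:
    void insert(const Obj& o) {
        Node** slot = &root_;
        while (*slot != nullptr)
            slot = KeyOf{}(o) < KeyOf{}((*slot)->obj) ? &(*slot)->left : &(*slot)->right;
        *slot = lower_.allocate(Node{o});
    }

    // Return a pointer to an object with the given key, or nullptr.
    template <class K>
    const Obj* find(const K& k) const {
        for (Node* n = root_; n != nullptr;) {
            if (k < KeyOf{}(n->obj)) n = n->left;
            else if (KeyOf{}(n->obj) < k) n = n->right;
            else return &n->obj;
        }
        return nullptr;
    }

private:
    Node* root_ = nullptr;
    Lower<Node> lower_;
};

// Hypothetical abstract object and key extractor (key = age).
struct Emp { std::string name; int age; };
struct ByAge { int operator()(const Emp& e) const { return e.age; } };
```

Swapping Malloc for another storage layer with the same allocate signature would leave BinaryTree unchanged, which is the sense in which the mapping does not depend on how concrete objects are stored.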

There are other broad classes of mappings that are not normally recognized as ‘data structures’. Perhaps the most important of these are components that realize persistent storage (e.g., [Mos88]). That is, abstract objects appear to be main-memory resident, while their concrete counterparts reside in persistent

Figure 12: Class Definition Mapping of SEGMENT

    class element {
        static ATTR part = { A4 };
        T1 A1; T2 A2; T3 A3; T4 A4; T5 A5;
    };

(a) Abstract Element Definition

    class element_primary {
        static ATTR part = { A4 };
        T1 A1; T2 A2; T3 A3;
        element_secondary *ptr;
    };

    class element_secondary {
        T4 A4; T5 A5;
    };

(b) Concrete Element Definitions

segment objects, as mapped by D3, are listed in Figure 13. Keep in mind that each of the class (actually element) definitions of Figure 13 would be created by a preprocessor and are hidden from users. Users always write programs based on unmapped class definitions.

3.3.2 Reverse Engineering

Reverse engineering assumes that designers are not familiar with components. Rather, designers outline the concrete class definitions and the algorithms for basic cursor operations. The problem is to infer which composition of DS components yields this target data structure. An example of this type of problem is given below.

Consider Figure 14, which shows a container of objects of type R. A binary tree maintains objects in key order; the nodes of the tree store pointers to objects rather than the objects themselves.

    class element_primary {
    public:
        static ATTR_LIST indexed_attr = {age};
        static ATTR part = city;
        // augmented attributes
        element_primary   *next_index;  // INDEX ptr
        element_primary   *next;        // DLIST ptr
        element_primary   *prior;       // DLIST ptr
        element_secondary *ptr;         // SEGMENT ptr
        boolean            DF;          // DF delflag
        // original emp attributes
        char name[12];
        int  age;
    };

(a) element_primary Class Definition

    class element_secondary {
    public:
        // augmented attributes
        element_secondary *next;        // DLIST ptr
        element_secondary *prior;       // DLIST ptr
        boolean            DF;          // DF delflag
        // original emp attributes
        char city[15];
        char department[25];
    };

(b) element_secondary Class Definition

    class age_index {
    public:
        int age;
        element_primary *ptr;           // INDEX list head
    };

(c) age_index Class Definition

Figure 13: Mappings of the D3 Data Structure

Binary tree nodes are dynamically allocated. What is the DS type expression for this data structure/container?
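The kind of concrete outline a designer might start from can be sketched in C++: a binary tree ordered on ABC whose dynamically allocated nodes hold pointers to R objects, plus an in-order walk standing in for a cursor's "next" operation. The names TreeNode, insert, and in_order are hypothetical, and the enum types follow the attribute domains of the class R given in step 1 below.

```cpp
#include <memory>
#include <vector>

// Attribute domains of the abstract 3-tuples of type R.
enum class ABC { a, b, c };
enum class RST { r, s, t };
enum class XYZW { x, y, z, w };

struct R {
    ABC abc; RST rst; XYZW xyzw;
};

// Tree node: stores a pointer to an R object, not the object itself.
// Children are dynamically allocated.
struct TreeNode {
    R* obj;
    std::unique_ptr<TreeNode> left, right;
};

// Insert in ABC order; equal keys go to the right subtree.
void insert(std::unique_ptr<TreeNode>& root, R* obj) {
    if (!root) {
        root.reset(new TreeNode{obj, nullptr, nullptr});
        return;
    }
    insert(obj->abc < root->obj->abc ? root->left : root->right, obj);
}

// In-order traversal: collects object pointers in ABC order.
void in_order(const TreeNode* n, std::vector<const R*>& out) {
    if (!n) return;
    in_order(n->left.get(), out);
    out.push_back(n->obj);
    in_order(n->right.get(), out);
}
```

An outline like this describes only the target layout; the reverse engineering steps that follow are about recovering a DS type expression that would generate it.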

There are three steps in reverse engineering.

  1. Identify the abstract class and its objects that are to be stored.

In this example, abstract 3-tuples are being stored. The class definition for these tuples is shown below. (For the moment, ignore the class variables in the definition.)

    class R {
        // class variables
        static ATTR key = ABC;
        static ATTR part = ABC;
        enum {a, b, c} ABC;
        enum {r, s, t} RST;
        enum {x, y, z, w} XYZW;
    };

  2. Recognize the mappings/components that are used.

From the informal description, we suspect that the mappings of MALLOC, BINARY_TREE[x:DS], and SEGMENT[x,y:DS] are referenced. Since the binary tree maintains objects in attribute ABC order, we convey key information to BINARY_TREE through the key = ABC class variable. Also, since all attributes appear to be stored in secondary objects of a SEGMENT mapping, the part = ABC class variable must also be added.

  3. Propose a combination of the components identified in the previous step, and confirm or refute that the combination produces the desired implementation.

As a first guess, consider the expression:

    E1 = SEGMENT[BINARY_TREE[MALLOC], MALLOC]
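A type expression such as E1 can be mirrored as a composition of C++ class templates, one per component, with each template argument filling a realm parameter. The structs below are hypothetical stand-ins that record only the shape of the composition (SEGMENT's two parameters become primary and secondary); they implement no storage and make no claim about whether E1 is the correct answer.

```cpp
#include <string>
#include <type_traits>

// Hypothetical component stand-ins; each records only its structure so a
// type expression can be written down, inspected, and printed.
struct MALLOC {
    static std::string name() { return "MALLOC"; }
};

template <class X>
struct BINARY_TREE {
    static std::string name() { return "BINARY_TREE[" + X::name() + "]"; }
};

template <class P, class S>
struct SEGMENT {
    using primary = P;    // realizes the container of primary objects
    using secondary = S;  // realizes the container of secondary objects
    static std::string name() {
        return "SEGMENT[" + P::name() + ", " + S::name() + "]";
    }
};

// The first-guess expression from step 3.
using E1 = SEGMENT<BINARY_TREE<MALLOC>, MALLOC>;

// Primary objects go through BINARY_TREE over MALLOC.
static_assert(std::is_same<E1::primary, BINARY_TREE<MALLOC>>::value,
              "primary objects are stored by BINARY_TREE[MALLOC]");
```

Writing the composition as a type makes the layering explicit: the structure can be checked at compile time, and each candidate expression proposed in step 3 is just another type alias to compare against the target design.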

Figure 14: A Layered Data Structure (tuple attributes: ABC, RST, XYZW)