
















Austin, Texas 78712.
Abstract: We present a model of the data structure domain that is expressed in terms of the GenVoca domain modeling concepts [Bat91]. We show how familiar data structures can be encapsulated as realms of plug-compatible, symmetric, and reusable components, and we show how complex data structures can be formed from their composition. The target application of our research is a precompiler for specifying and generating customized data structures.
Keywords: software building-blocks, domain modeling, software reuse, data structures.
A fundamental goal of software engineering is to understand how software components fit together to form complex systems. Domain modeling is a means to achieve this goal; it is the study of a domain of similar software systems to identify the primitive and reusable components of that domain and to show how compositions of components not only explain existing systems but also predict families of yet unbuilt systems that have interesting and novel properties. In essence, domain models can be blueprints for the as-yet-to-be-achieved software building-block technologies.
Domain modeling is presently an immature discipline [Pri91]. Besides the general skepticism that characteristically accompanies new areas of research, domain modelers face three difficult barriers to the development and popularization of their ideas:
- Domain models must be expressed in terms of constructs and software organization principles that are domain-independent. Domain-specific constructs are, by definition, not applicable to other domains. Models based exclusively on domain-specific ideas are both difficult to understand and contribute little to helping other researchers understand how other domains can be modeled.
- In order to understand a domain model, one must be intimately familiar with the domain itself. It is often hard for non-experts (and even experts) to appreciate the difficulty of a domain modeling effort and its contributions.
- Domain models are nontrivial. One cannot accept a domain model at face value; it is essential that there be an accompanying implementation for model validation. Unfortunately, because of the time and expense involved, few models are validated. Without validation, however, the significance of a model is questionable.

In this paper, we attempt to cross all three barriers (or alternatively, we show why each is hard to cross!). First, we review the GenVoca domain modeling concepts, which have been validated on the complex and disparate domains of database and network software [Bat91]. Next, we apply these concepts to develop a model of data structures, a domain that should be familiar to all readers. Finally, we explain how we are validating our domain model by explaining the mechanics of a precompiler for data structures. As we are covering a broad sweep of issues, we present in the introduction of every section the “big picture” of what we are about to do.
We believe our research makes two contributions. First, our domain model outlines the beginnings of a technology for synthesizing customized object base software from prewritten building-blocks. Persistency is but one of many options (i.e., building-blocks) that can be selected. Our paper will stress the nonpersistent (i.e., data structure) aspects of this technology. Second, we see our work as an example of the types of activities and problems that are commonly encountered in a domain modeling effort. Other researchers may benefit in their domain modeling efforts by following our approach.
2.0 The GenVoca Domain-Modeling Concepts
As mentioned earlier, it is important for domain models to be expressed in terms of domain-independent concepts. Object-oriented software design models, for example, offer a collection of concepts (e.g., entities, attributes, relationships, and inheritance) that software engineers can use to express the designs of their systems. These concepts are domain-independent as they have been shown to be applicable to a wide variety of problems in very different domains [Rum91, Boo87, Teo86]. The ER model is a common graphical representation of object-oriented concepts [Che76, Kor91]. We will use the ER model as the vehicle for discussing OO concepts in this paper.
The concepts of the ER model are necessary, but not sufficient, to explain important kinds of software components and to express hierarchical software systems as compositions of these components. To do so requires additional concepts that transcend traditional ER/object-oriented models. We set the stage for introducing these concepts in Section 2.1, where we explain the relationship of ER/OO models with hierarchical system designs. We then introduce the concepts of components, realms, and type expressions for modeling hierarchical systems in Section 2.2. We conclude the section by explaining how our notion of parameterized components is different from traditional notions of parameterized types.
Hierarchical software systems are designed in terms of levels, where the interface of each level is a virtual machine. All operations on level i+1 are expressed in terms of operations of level i. A layer is the mapping of operations between levels. The idea is to localize or encapsulate specific complexities of a system within layers, thereby simplifying overall system design [Dij68, Hab76].
Object-oriented programming languages and design methodologies have shown that interface specifications should encompass both objects and operations, rather than operations alone. An object model or object-oriented virtual machine (OOVM) is, in general, a set of classes and their interrelationships. ER diagrams depict object models using boxes to represent classes and arrows to represent relationships between classes. Figure 1 shows two object models: Model R exhibits classes A, B, and C with relationships A-C and B-C; Model S exhibits classes D and E with relationship D-E.
A hierarchical system design, from an object-oriented perspective, is a set of object-oriented virtual machines, one for each level of a system. Figure 2 shows the object model interfaces for levels i and i+1 in a hierarchical system; object model R is the interface for level i+1 and object model S is the interface for level i.
Because every component within a realm exports exactly the same interface, all components are plug-compatible and interchangeable. Moreover, components may have parameters. Consider component f[x:Realm_S]. f[ ] exports interface R (because f[ ] belongs to Realm_R) and imports interface S (because f[ ] has parameter x:Realm_S). Moreover, f[ ] translates operations and objects of OOVM R to objects and operations of OOVM S without depending on how OOVM S is implemented. How OOVM S is implemented is specified by parameter x.
Two useful results follow. First, a software system of potentially enormous complexity is represented as a type expression (i.e., a composition of components). The following three systems present OOVM R as their interface:
    system1 = f[r]
    system2 = f[t]
    system3 = g[t]
The above syntax was borrowed from the parameterized type notation in [Gog84, Gog86]. While the syntax is similar, we will show in Section 2.3 that the semantics of our components are rather different.
Second, component reuse corresponds to common subexpressions. Whenever different systems (different expressions) reference a common subsystem (subexpression), then that subsystem (and its components) is being reused. Thus, system1 and system2 reuse component f[], and system2 and system3 reuse component t.
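The type expressions above can be mimicked with C++ class templates. The following is only a minimal sketch under stated assumptions: the `store()` and `describe()` members are hypothetical stand-ins for interfaces S and R, which the paper does not spell out at this point, and templates are used purely to illustrate composition (the semantics of GenVoca components differ from those of C++ templates, as discussed in Section 2.3).

```cpp
#include <cassert>
#include <string>

// Realm S (hypothetical interface: a static store() function).
// r and t are interchangeable because they export the same interface.
struct r { static std::string store() { return "r"; } };
struct t { static std::string store() { return "t"; } };

// Realm R (hypothetical interface: a static describe() function).
// f[x:S] and g[x:S] export R and import S through parameter X.
template <class X>
struct f { static std::string describe() { return "f[" + X::store() + "]"; } };

template <class X>
struct g { static std::string describe() { return "g[" + X::store() + "]"; } };

// Each system is a type expression, i.e., a composition of components.
using system1 = f<r>;
using system2 = f<t>;   // reuses f with system1
using system3 = g<t>;   // reuses t with system2
```

In this sketch, reuse shows up exactly as the text describes: `f` appears in both `system1` and `system2`, and `t` appears in both `system2` and `system3`.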
A fundamental concept of GenVoca is symmetric components, i.e., components that can be composed in arbitrary orders. More precisely, a symmetric component of realm W has a parameter of type W. Components m and n of realm W below are symmetric, while components p and q are not:
    W = { m[x:W], n[x:W], p, q[y:R], ... }

Since m[ ] and n[ ] are symmetric, compositions m[n[...]] and n[m[...]] are possible. As a general rule, different composition orders have different meanings. While symmetric components may seem strange, readers are probably familiar with a classical example: UNIX file filters. File filters present the same byte-stream interface for both their input and output; composing UNIX file filters in a pipe is an example of composing symmetric components. As in the general case, the order in which one composes file filters (symmetric components) makes a difference in both semantics and performance.
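To illustrate why composition order matters for symmetric components, here is a small sketch in the spirit of the UNIX filter analogy. All names (`Source`, `DropSpaces`, `Head3`, `run`) are hypothetical and chosen for illustration; each component exports and imports the same string-in, string-out interface, so the two parameterized components can be stacked in either order.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Non-parameterized component of the (hypothetical) realm: passes input through.
struct Source {
    static std::string run(const std::string& in) { return in; }
};

// Symmetric component: removes whitespace from whatever X produces.
template <class X>
struct DropSpaces {
    static std::string run(const std::string& in) {
        std::string out;
        for (char c : X::run(in))
            if (!std::isspace(static_cast<unsigned char>(c))) out += c;
        return out;
    }
};

// Symmetric component: keeps at most the first three characters X produces.
template <class X>
struct Head3 {
    static std::string run(const std::string& in) {
        return X::run(in).substr(0, 3);
    }
};

// Two legal compositions of the same components, in different orders.
using HeadOfStripped = Head3<DropSpaces<Source>>;
using StrippedHead   = DropSpaces<Head3<Source>>;
```

For the input `"a b c d"`, `HeadOfStripped` yields `"abc"` while `StrippedHead` yields `"ab"`: the same components, composed in a different order, give a different result.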
The concepts of type expressions and realms of plug-compatible and parameterized components are not part of the ER/OO models. These concepts are domain-independent extensions to these models and have been validated in two rather different domains: databases and network software [Bat85-91, Oma90, Hut91]. The domain models for both databases and networks had their own sets of realms of interchangeable components. As this paper develops, we will show that the domain of data structures has a similar organization.
Our type expression notation was borrowed from research on parameterized types [Gog84]. However, the semantics of the components that we examine in this paper are very different from those of traditional constructs (e.g., templates in C++ [Str91], Ada generics [Ghe82], OBJ sorts [Gog86]).
Common to all parameterized types (both ours and traditional) is the concept of a container, i.e., a set of generic objects. Stacks, lists, binary trees, queues, etc. are specialized containers whose algorithms are defined independently of the types of objects that are to be stored.
Traditional. Traditional parameterized types take the container idea a step further. An instance of a traditional parameterized type (TPT) (e.g., stack, queue) is itself an object (i.e., a container). The essential idea is that a TPT maps a container of objects to a single object.
Consider the LIST[] and BINARY_TREE[] parameterized types. An instance of LIST[T] is a single object (a list) that represents a list of objects of type T. An instance of BINARY_TREE[LIST[T]] is a single object (a binary tree), where each tree node references a list object (which itself is a container of objects of type T). Figure 3 shows how a container of three objects of type T, namely { t1, t2, t3 }, is mapped by BINARY_TREE[LIST[T]] to a binary tree with a single node. (We depict containers whose implementation is partially specified as a set of objects/records within an enclosed boundary - i.e., as in Figure 3a. When a container implementation is fully specified, the container’s objects are drawn without an enclosed boundary, as in Figure 3b).
Nontraditional. We propose different semantics for parameterization; a nontraditional parameterized type (NPT) maps a container of abstract objects to one or more containers of concrete objects. The container abstraction and interface remain unchanged by the mapping.
Consider the LIST’[] and BINARY_TREE’[] NPTs. LIST’[T] maps the objects $\{t_1, \ldots, t_n\}$ of container T to the set $\{(t_1, next_1), \ldots, (t_n, next_n)\}$, where each $next_i$ is a pointer to the object $(t_{i+1}, next_{i+1})$. BINARY_TREE’[T] maps the objects $\{t_1, \ldots, t_n\}$ of container T to the set $\{(left_1, t_1, right_1), \ldots, (left_n, t_n, right_n)\}$, where $left_i$ and $right_i$ are pointers to the left and right tree objects of $t_i$.
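The NPT mappings can be approximated in C++ by layers that each add their own link fields to an element type. This is a hedged sketch, not the precompiler's actual output: `ListLayer`, `BinTreeLayer`, `Node`, and `build` are hypothetical names, the element type `T` with a single `value` field is an assumption, and the list order (insertion order) and tree order (by `value`) are chosen only for illustration. Stacking both layers gives objects that carry tree links, the original data, and a list link at once.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct T { int value; };            // hypothetical abstract object type

// LIST'[.]: augments each object with a 'next' pointer.
template <class E>
struct ListLayer : E {
    ListLayer* next = nullptr;
};

// BINARY_TREE'[.]: augments each object with 'left' and 'right' pointers.
template <class E>
struct BinTreeLayer : E {
    BinTreeLayer* left = nullptr;
    BinTreeLayer* right = nullptr;
};

// BINARY_TREE'[LIST'[T]]: one container of (left, t, next, right) objects.
using Node = BinTreeLayer<ListLayer<T>>;

// Builds one container whose objects are linked both as a list
// (insertion order) and as an unbalanced binary search tree (by value).
std::vector<Node> build(const std::vector<int>& values) {
    std::vector<Node> nodes(values.size());   // sized once, so pointers stay valid
    for (std::size_t i = 0; i < values.size(); ++i) {
        nodes[i].value = values[i];
        if (i == 0) continue;                  // first object is the tree root
        nodes[i - 1].next = &nodes[i];         // list link
        Node* cur = &nodes[0];
        for (;;) {                             // tree link
            Node*& child = values[i] < cur->value ? cur->left : cur->right;
            if (!child) { child = &nodes[i]; break; }
            cur = child;
        }
    }
    return nodes;
}
```

Note that there is still only one container of objects; the two layers interconnect the same objects independently, which is the point of the nontraditional semantics.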
BINARY_TREE’[LIST’[T]] maps a container of T objects to a container of 4-tuples of the form $(left_i, t_i, next_i, right_i)$ (see Figure 4a). Figure 4b shows how the container of Figure 3a is mapped by BINARY_TREE’[LIST’[T]] to a container of objects that are independently interconnected by both a binary tree and a list. Note that the resulting container can be implemented in a variety of ways (e.g., the 4-tuples could be stored by the LIST[] or BINARY_TREE[] TPTs). It is worth noting that the situation
Figure 3: Traditional Parameterized Type Mappings. (a) BINARY_TREE[LIST[T]] Mapping. (b) Resulting Data Structure.
Two different containers C1 and C2 of T objects would be declared as instances of a C++ template [Str91]:
list_container
binary_tree_container<list_container
cursor<list_container
In general, operations on objects inside a container must be performed through cursors. In order to update an attribute of an object, for example, one must first position a cursor on the object before the update operation is issued. Figure 6 summarizes operations that can be performed on a cursor c. This list is not intended to be comprehensive; it represents our first attempt to define a standardized interface for containers.
Figure 5: A General Definition of a C++ Class
    class T {
    private:
        // private class variables, functions and instance variables
        ...
    public:
        // public instance variables and functions
        T1 A1;    // T1 is the type for attribute A1
        T2 A2;
        ...
        Tn An;
        T ();     // constructor
        ~T ();    // destructor
        ...       // other public functions
    };
To make our discussions concrete, consider the emp class and a container EMP:
The following program fragments illustrate how cursors are used.
Example 1. Insert an object for ‘Batory’ into container EMP.
    cursor<Emp_Container> e(EMP);   // constructor #
    e.newobj();                     // create new EMP object
    e.name = "Batory";              // fill in attribute values
    e.age = 38;
    e.city = "Austin";
    e.department = "Software";
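For readers who want to experiment, Example 1 can be approximated in standard C++ without a precompiler. The sketch below rests on several assumptions: the `container` and `cursor` templates are simplified stand-ins rather than the definitions of Figure 8, attributes are reached through `operator->` instead of the paper's direct `e.name` notation, `std::string` replaces the fixed-size character arrays, and `insert_batory` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified stand-in for the emp class (std::string instead of char arrays).
struct emp {
    std::string name;
    int age = 0;
    std::string city;
    std::string department;
};

// Simplified container: just a growable sequence of objects.
template <class T>
struct container {
    std::vector<T> objects;
};

// Simplified cursor: all access to objects goes through it.
template <class T>
class cursor {
public:
    explicit cursor(container<T>& c) : k(&c) {}

    // Create a new object and position the cursor on it.
    void newobj() {
        k->objects.emplace_back();
        pos = k->objects.size() - 1;
    }

    // Reference the attributes of the current object.
    T* operator->() {
        assert(pos < k->objects.size());
        return &k->objects[pos];
    }

private:
    container<T>* k;       // the container being traversed
    std::size_t pos = 0;   // position of the current object
};

// Example 1, restated against the simplified classes.
void insert_batory(container<emp>& EMP) {
    cursor<emp> e(EMP);
    e.newobj();
    e->name = "Batory";
    e->age = 38;
    e->city = "Austin";
    e->department = "Software";
}
```

The design point carried over from the paper is that user code never touches the container's storage directly; it only positions a cursor and reads or writes through it.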
Figure 6: Operations on Cursors
cursor
constructor #2 for a cursor c on objects in container C of type K. c ranges only over objects that satisfy predicate. ~cursor
    class emp {
    public:
        char name[12];
        int age;
        char city[15];
        char department[25];
        // emp operations
        ...
    };
    typedef list_container
element
We emphasize that the definitions in Figure 8 are not in their final form. When data structures are viewed as a composition of components, each component may transform each of these class definitions by adding or removing attributes and possibly creating new container, cursor, and element types. As none of these transformations are particularly obvious, we need to consider specific examples of data structure components to show that these class definitions are indeed modified. This is one of the subjects of the next section.
Figure 8: Definition of Class T and its Cursor Definition
template
private: // nothing initially
public: container
private: // private class variables, // functions, & instance variables ...
// formerly public instance // variables and functions T1 A1; T2 A2; ... Tn An;
element
// no public variables, functions }; (b) Definition of element
template
public: // instance variables K *k; // the container enum {OK, EOR} status; // cursor status
// public cursor operations cursor
// operations to reference instance
// variables of elements in container
A1; // attribute A1
A2; // attribute A2
...
An; // attribute An

// previously user-defined functions
...
};
(c) Definition of cursor
Let DS be the realm of components that implement our container OOVM of Figure 7 and Figure 8. Each DS component encapsulates a primitive data structure. Clearly there is an enormous number of such components; listing them all is impossible. Rather than enumerating realm membership, we will review specific examples and discuss the basic paradigm that accounts for a wide spectrum of useful data structures. Having done so, it is a simple intuitive leap to recognize the scope of the DS realm. The specific members of DS that we will examine are:
    DS = { DEL_FLAG[x:DS], ARRAY, MALLOC, DLIST[x:DS], BINTREE[x:DS], SEGMENT[p,s:DS], INDEX[d,i:DS], ... }

The ‘big picture’ that we want to convey in this section is that DS components are NPTs. Compositions of DS components implement TPTs. The container types list_container and binary_tree_container that we encountered in Section 3.1 are TPTs that can be constructed from DS components. In general, users can specify their own container types using the container_def statement, which defines TPTs as a composition of NPTs (DS components). Three examples are shown below:
    container_def Simple_Array = DEL_FLAG[ARRAY];
    container_def list_container = DLIST[MALLOC];
    container_def binary_tree_container = BINTREE[DEL_FLAG[ARRAY]];

Of the three container types, Simple_Array is the most elementary. We will first examine a monolithic implementation (i.e., one without layering) of Simple_Array as implementations of the element<>, container<>, and cursor<> classes. Then we will show how this implementation is actually a composition of the two independently definable components, DEL_FLAG[] and ARRAY.
Suppose a container is implemented by a preallocated array. Every object that it contains is augmented with a boolean attribute DF (delete flag). When an object is inserted, it is placed at the end of the array and its delete flag is set to false. When an object is deleted, its delete flag is set to true and the space the object occupies is not reclaimed. When a user requests the retrieval of all objects that satisfy a predicate P, the predicate that is actually applied to objects is (P ∧ ¬DF). (That is, retrieved objects must both satisfy the user's request and must not be deleted.) The figure below shows both the abstract and concrete representations of a small container of objects. Note that two of the concrete objects have been deleted.
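The monolithic behavior just described can be sketched as a single C++ class. This is an illustration under assumptions, not the paper's element<>, container<>, and cursor<> specializations: the class name `SimpleArray`, its member names, and the use of object identifiers instead of cursors are all hypothetical simplifications.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Monolithic sketch: preallocated array plus a delete flag per object.
template <class T, std::size_t N>
class SimpleArray {
public:
    // Insert at the end of the array; the delete flag starts out false.
    std::size_t insert(const T& obj) {
        assert(free_index < N);
        rows[free_index] = Row{obj, false};
        return free_index++;
    }

    // Logical deletion: set the flag, do not reclaim the space.
    void remove(std::size_t oid) {
        assert(oid < free_index);
        rows[oid].df = true;
    }

    // Retrieve all objects satisfying (P and not DF).
    std::vector<T> retrieve(const std::function<bool(const T&)>& p) const {
        std::vector<T> out;
        for (std::size_t i = 0; i < free_index; ++i)
            if (p(rows[i].obj) && !rows[i].df) out.push_back(rows[i].obj);
        return out;
    }

    // Number of array rows consumed, including logically deleted ones.
    std::size_t slots_used() const { return free_index; }

private:
    struct Row { T obj; bool df; };
    std::array<Row, N> rows{};
    std::size_t free_index = 0;
};
```

After inserting three objects and deleting one, a retrieval with an always-true predicate returns two objects while `slots_used()` still reports three, mirroring the statement that deleted space is not reclaimed.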
Recall that we said in Section 3.1.2 that the definitions of the container<>, element<>, and cursor<> classes are transformed by container implementations. Simple_Array specializes the container<> class definition with the additional attributes array (which is the actual array) and
(Figure: abstract representation and concrete representation of the container; in the concrete array, each object is stored with a true/false delete flag.)
tations of a small set of objects. Note that two of the concrete objects have been logically deleted. The DEL_FLAG component is parameterized (x:DS) because its abstract-to-concrete mappings are not in any way dependent on how its concrete objects are stored.
Figure 10 shows the class specializations and methods of DEL_FLAG. Note that we have taken two liberties with our C++ notation. Abstract functions have the name of the component (_DEL_FLAG) appended; concrete functions (i.e., calls to the component beneath DEL_FLAG) have the name _conc appended. Our motivation for doing so is to clearly distinguish functions on abstract objects from functions on concrete objects. Later we will see that this distinction is useful when components are composed.
ARRAY is the component that encapsulates the concept of a preallocated array. ARRAY appends new objects to the end of an array; objects cannot be deleted (as there is no way to distinguish deleted objects from nondeleted objects). As ARRAY makes no calls to lower-level components, it has no DS parameters.
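The separation of DEL_FLAG from ARRAY can be approximated with C++ templates. The sketch below is an assumption-laden illustration rather than the precompiler's mechanism: `Array`, `Flagged`, `DelFlag`, and the object-identifier interface are hypothetical, and ordinary template instantiation stands in for the function renaming and inline expansion that the paper uses. What it tries to show is that the upper layer transforms the element definition (by adding DF) and calls the lower layer without knowing how objects are stored.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// ARRAY: appends objects to a preallocated array; no deletion, no parameters.
template <class E, std::size_t N>
class Array {
public:
    std::size_t newobj() { assert(free_index < N); return free_index++; }
    E& ref(std::size_t oid) { assert(oid < free_index); return rows[oid]; }
    std::size_t size() const { return free_index; }
private:
    std::array<E, N> rows{};        // the preallocated array
    std::size_t free_index = 0;     // index of the first free row
};

// Element transformation performed by DEL_FLAG: add the DF attribute.
template <class T>
struct Flagged {
    T obj{};
    bool df = false;
};

// DEL_FLAG[x:DS]: works over any lower component with newobj/ref/size.
template <class Lower>
class DelFlag {
public:
    std::size_t newobj() {
        std::size_t oid = lower.newobj();   // concrete newobj
        lower.ref(oid).df = false;          // flag it not deleted
        return oid;
    }
    void del(std::size_t oid) { lower.ref(oid).df = true; }
    bool live(std::size_t oid) { return !lower.ref(oid).df; }
    auto& ref(std::size_t oid) { return lower.ref(oid).obj; }
    std::size_t size() const { return lower.size(); }
private:
    Lower lower;
};

// Counterpart of: container_def Simple_Array = DEL_FLAG[ARRAY];
template <class T, std::size_t N>
using Simple_Array = DelFlag<Array<Flagged<T>, N>>;
```

Because every method here is a short inline member function, an optimizing compiler can typically collapse the two layers into code resembling the monolithic version, although that is a property of the compiler rather than a guarantee.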
(Figure: abstract representation and concrete representation; each concrete object carries a true/false delete flag.)
// new attribute for element
newobj_DEL_FLAG() {
    // make new concrete instance
    // and flag it not deleted
    newobj_conc();
    upd_conc(DF, FALSE);
}

del_DEL_FLAG() {
    // flag instance deleted
    upd_conc(DF, TRUE);
}

upd_DEL_FLAG(attr, val) {
    upd_conc(attr, val);
}

ref_DEL_FLAG(attr) {
    return(ref_conc(attr));
}
cursor
Figure 10: Class Specializations and Methods for DEL_FLAG
ARRAY specializes the definition of the container class with the addition of attributes array and free_index, which are the preallocated array and the index to the first free row. It also specializes the definition of cursor with the additional attributes oid and pred, which are the row number of the current object and the selection predicate. Figure 11 summarizes these specializations and lists the methods for each major cursor operation.
Keeping explicit interfaces to components is a bad idea for data structures: component methods are often so small that the overhead of function calls can seriously degrade performance. Compositions of DS components must be accomplished primarily by function renaming and inline expansion. The monolithic code for Simple_Array can be generated from DEL_FLAG and ARRAY by the following steps.
newobj_ARRAY() {
    // find first free row and remember ref to object.
    // no array bounds error checking
    oid = k->free_index++;
}

del_ARRAY(c) {
    // not implemented by this component
}

upd_ARRAY(attr, value) {
    // direct update of attr
    attr = value;
}
cursor
ref_ARRAY(attr) {
    return(attr);
}
operator++_ARRAY() { element
// examine next object; return EOR
// if end of array or OK if object
// is found that satisfies predicate
loop {
    // advance first, then test, so that the first
    // free row is never examined as an object
    if (++oid >= k->free_index) {
        status = EOR;
        return;
    }
    t = &k->array[oid];
    if (pred(t)) {
        status = OK;
        return;
    }
}
}

pos_ARRAY(newoid) {
    // no error checking on oid
    oid = newoid;
}

reset_ARRAY() {
    oid = -1;
}
// new attributes for container
Figure 11: Class Specializations and Methods for ARRAY
Note that there are many other ways objects could be linked together on lists; each could be encapsulated as a distinct DS component.
INDEX is the component that maps a nonindexed container to an indexed container; that is, it creates secondary indices over selected attributes of stored objects. For each attribute that is to be indexed, a container of index objects is created. The attributes to index are conveyed to the INDEX component by a special class variable called indexed_attr, which is assigned the list of names of attributes to be indexed. As an illustration, the emp class below would have two index containers created: one for the name attribute and another for age.
    class emp {
    public:
        static ATTR_LIST indexed_attr = {name, age};
        char name[12];
        int age;
        char city[15];
        char department[25];
    };

There is an index object in an index container for every distinct value that the indexed attribute assumes. (Thus, indexing a boolean attribute would yield a container of two index objects.) There are many possible structures that can be used to interconnect an index object with its data objects. The method illustrated below shows each index object at the head of a list of data objects that have the same attribute value as the index object. Note that only one index container is shown in this figure. Also note that this is the first example of a component that creates new containers (i.e., index containers) as part of its class definition transformation.
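A rough C++ sketch of the INDEX mapping follows. It is an illustration only: the class `IndexedEmp` and its members are hypothetical, `std::map` and `std::vector` stand in for the index containers and the per-value lists, the indexed attributes are hard-coded rather than read from an indexed_attr class variable, and the emp class is reduced to its two indexed attributes.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Reduced emp class: only the two indexed attributes are kept.
struct emp {
    std::string name;
    int age;
};

// One data container plus one index container per indexed attribute.
class IndexedEmp {
public:
    std::size_t insert(const emp& e) {
        data.push_back(e);
        std::size_t oid = data.size() - 1;
        by_name[e.name].push_back(oid);   // index object for this name value
        by_age[e.age].push_back(oid);     // index object for this age value
        return oid;
    }

    // Data objects (by position) whose age equals the given value.
    const std::vector<std::size_t>& with_age(int age) const {
        static const std::vector<std::size_t> none;
        auto it = by_age.find(age);
        return it == by_age.end() ? none : it->second;
    }

    // One index object exists per distinct attribute value.
    std::size_t distinct_ages() const { return by_age.size(); }

private:
    std::vector<emp> data;                                     // data objects
    std::map<std::string, std::vector<std::size_t>> by_name;   // name index
    std::map<int, std::vector<std::size_t>> by_age;            // age index
};
```

Each map entry plays the role of an index object heading a list of the data objects that share its attribute value, so two employees with the same age produce a single index object with a two-element list.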
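(Figure: abstract representation and concrete representation of a container whose concrete objects are linked by next and prior pointers from a head.)
(Figure: abstract representation and concrete representation of an indexed container, showing index objects and data objects for an indexed attribute a.)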
SEGMENT is the component that partitions each abstract object of a container into a pair of interconnected concrete objects that are stored in separate concrete containers. Figure 12 shows an abstract class of five attributes plus an additional class variable part. The transformation of the element definition performed by SEGMENT results in two concrete element classes. The primary concrete element class contains all attributes up to but not including the attribute identified by class variable part. (In the figure, this is attribute A4.) The secondary concrete element class contains the remaining abstract attributes. Moreover, an additional pointer ptr is stored in each primary object to connect it to its corresponding secondary object.
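The SEGMENT mapping can be sketched in C++ as follows. This is a simplified illustration under assumptions: the attribute types (`int` and `std::string`) are invented placeholders for T1 through T5, the split point is fixed at A4 instead of being read from the part class variable, `ptr` is an index into the secondary container rather than a raw pointer, and both concrete containers are plain vectors where the paper would allow any DS component.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Abstract element: what users program against (types are placeholders).
struct Abstract {
    int A1; int A2; int A3;
    std::string A4; std::string A5;
};

// Concrete elements produced by the mapping, with part = A4.
struct Secondary { std::string A4; std::string A5; };
struct Primary   { int A1; int A2; int A3; std::size_t ptr; };

class Segmented {
public:
    // Split one abstract object into a linked primary/secondary pair.
    std::size_t insert(const Abstract& a) {
        secondary.push_back(Secondary{a.A4, a.A5});
        primary.push_back(Primary{a.A1, a.A2, a.A3, secondary.size() - 1});
        return primary.size() - 1;
    }

    // Reassemble the abstract object by following ptr.
    Abstract get(std::size_t oid) const {
        const Primary& p = primary.at(oid);
        const Secondary& s = secondary.at(p.ptr);
        return Abstract{p.A1, p.A2, p.A3, s.A4, s.A5};
    }

    // Reading a primary attribute never touches the secondary container.
    int a1(std::size_t oid) const { return primary.at(oid).A1; }

private:
    std::vector<Primary> primary;       // primary concrete container
    std::vector<Secondary> secondary;   // secondary concrete container
};
```

The accessor `a1` hints at why such a split can pay off: operations that need only primary attributes avoid the secondary objects entirely.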
Segmenting records in database applications is occasionally useful [Bat78]; one need not retrieve from disk both halves of a record if only one half is needed. The performance advantages for SEGMENT in a data structure context are different. An example (from Genesis) is the partitioning of buffer header attributes from the page-buffer itself. Users write programs using a BUFFER class, which contains all attributes (headers + page buffer) of a buffer. For increased performance, page buffers should be aligned along page boundaries to minimize page faults when reading and writing buffers. Since the length of a BUFFER instance is greater than the size of a page, this alignment can only be guaranteed if the page-buffer attribute is stored separately from buffer header attributes. SEGMENT hides the ugly details of this partitioning, thereby simplifying the development of user programs [Roy91].
By now, it should be evident that the cardinality of the DS realm is very large. Every DS component maps a container of abstract objects to one or more concrete containers of concrete objects. Classical data structures (BINARY_TREE[x:DS], AVL_TREE[x:DS], HASH[x:DS]) add extra attributes to objects to interlink them and thereby provide fast retrievals for certain types of queries. (Primary key information for these structures would be conveyed using a class variable key, similar to the way attributes that are to be indexed are conveyed.) They are parameterized in that their mappings do not depend on how their concrete objects are stored.
There are other broad classes of mappings which are not normally recognized as ‘data structures’. Perhaps the most important of these are components that realize persistent storage (e.g., [Mos88]). That is, abstract objects appear to be main-memory resident, while their concrete counterparts reside in persistent
Figure 12: Class Definition Mapping of SEGMENT
class element
T1 A1; T2 A2; T3 A3; T4 A4; T5 A5; };
(a) Abstract Element Definition
class element_primary
T1 A1; T2 A2; T3 A3; element_secondary
class element_secondary
segment objects, as mapped by D3, are listed in Figure 13. Keep in mind that each of the class (actually element) definitions of Figure 13 would be created by a preprocessor and would be hidden from users. Users always write programs based on unmapped class definitions.
Reverse engineering assumes that designers are not familiar with components. Rather, designers outline the concrete class definitions and the algorithms for basic cursor operations. The problem is to infer what composition of DS components yields this target data structure. An example of this type of problem is given below.
Consider Figure 14, which shows a container of objects of type R. A binary tree maintains objects in key order, where nodes of the tree store pointers to objects rather than containing the objects themselves.
class element_primary
Figure 13: Mappings of the D3 Data Structure
Binary tree nodes are dynamically allocated. What is the DS type expression for this data structure/container?
There are three steps in reverse engineering.
In this example, abstract 3-tuples are being stored. The class definition for these tuples is shown below. (For the moment, ignore the class variables in the definition).
    class R {
        static ATTR key = ABC;     // class variables
        static ATTR part = ABC;
        enum {a, b, c} ABC;
        enum {r, s, t} RST;
        enum {x, y, z, w} XYZW;
    };
From the informal description, we suspect that the mappings of MALLOC, BINARY_TREE[x:DS], and SEGMENT[x,y:DS] are referenced. Since the binary tree maintains objects in attribute ABC order, we convey key information to BINARY_TREE through the key = ABC class variable. Also, since all attributes appear to be stored in secondary objects of a SEGMENT mapping, the part = ABC class variable must also be added.
Figure 14: A Layered Data Structure