









Peter Hawkins, Alex Aiken, Kathleen Fisher, Martin Rinard, and Mooly Sagiv
Stanford University, AT&T Labs Research, MIT, Tel Aviv University
Abstract. We consider the problem of specifying data structures with complex sharing in a manner that is both declarative and results in provably correct code. In our approach, abstract data types are specified using relational algebra and functional dependencies; a novel fuse operation on relational indexes specifies where the underlying physical data structure representation has sharing. We permit the user to specify different concrete shared representations for relations, and show that the semantics of the relational specification are preserved.
Consider the data structure used in an operating system kernel to represent the set of available file systems. There are two kinds of objects: file systems and files. Each file system has a list of its files, and each file may be in one of two states, either currently in use or currently unused. Figure 1 sketches the data structure typically used:¹ each file system is the head of a linked list of its files, and two other linked lists maintain the set of files in use and files not in use. Thus, every file participates in two lists: the list of files in its file system, and one of the in-use or not-in-use lists.

A characteristic feature of this example is the sharing: the files participate in multiple data structures. Sharing usually implies that there are non-trivial high-level invariants to be maintained when the structure is updated. For example, in Figure 1, if a file is removed from a file system, it should be removed from the in-use or not-in-use list as well. A second characteristic is that the structure is highly optimized for a particular expected usage pattern. In Figure 1, it is easy to enumerate all of the files in a file system, but without adding a parent pointer to the file objects we have only a very slow way to discover which file system owns a particular file.

We are interested in the problem of how to support high-level, declarative specification of complex data structures with sharing while also achieving efficient and safe low-level implementations. Existing languages provide at most one or the other. Modern functional languages provide excellent support for inductive data structures, which are all essentially trees of some flavor. When multiple such data structures overlap (i.e., when there is more than one inductive structure and they are not separate), functional languages do not provide any support beyond what is available in conventional object-oriented and procedural languages. All of these languages require the programmer to build and maintain mutable structures with sharing by using explicit pointers or reference cells.
¹ This example is a simplified version of the file system representation in Linux, where file systems are called superblocks and files are inodes.
Fig. 1. File objects simultaneously participate in multiple circular lists. Different line types denote different lists.
While the programmer can get exactly the desired representation, there is no support for maintaining or even describing invariants of the data structure. Languages built on relations, such as SQL and logic programming languages, provide much higher-level support. We could encode the example above using the relation

  file(filesystem : int, fileid : int, inuse : bool)

Here integers suffice as unique identifiers for file systems and files, and a boolean records whether or not the file is in use. Using standard query facilities we can conveniently find for a file system fs all of its files, file(fs, _, _), as well as all of the files not in use, file(_, _, false). Even better, using functional dependencies we can specify important high-level invariants, such as that every file is part of exactly one file system, and every file is either in use or not; i.e., the fileid functionally determines the filesystem and inuse fields. Thus, there is only one tuple in the relation per fileid, and when the tuple with a fileid is deleted all trace of that file is provably removed from the relation. Finally, relations are general; since pointers are just relationships between objects, any pointer data structure can be described by a set of relations. Adding relations to general-purpose programming languages is a well-accepted idea. Missing from existing proposals is the ability to provide highly specialized implementations of relations, and in particular to take advantage of the potential for mutable data structures with sharing.

Our vision is a programming language where low-level pointer data structures are specified using high-level relations. Furthermore, because of the high-level specification, the language system can produce code that is correct by construction; even in cases where the implementation has complex sharing and destructive update, the implementation is guaranteed to be a faithful representation of the relational specification. In this paper, we take only the first step in realizing this plan, focusing on the core problem of what it means to represent a given high-level relation by a low-level representation (possibly with sharing) that is provably correct. We do not address in this paper the design of a surface syntax for integrating relational operations into a full programming language (there are many existing proposals). This paper is organized into several parts, each of which highlights a separate contribution of our work.
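For concreteness, the relational encoding just sketched can be written down directly. The following is a minimal OCaml sketch, not the interface developed in this paper: the type and function names are ours, and a plain list of tuples stands in for the relation.

(* Illustrative only: the file relation as a set of tuples, with the
   functional dependency fileid -> {filesystem, inuse} enforced by
   keeping at most one tuple per fileid. *)
type file_tuple = { filesystem : int; fileid : int; inuse : bool }

type file_relation = file_tuple list

(* file(fs, _, _): all files belonging to file system fs. *)
let files_of_filesystem (r : file_relation) (fs : int) : file_tuple list =
  List.filter (fun t -> t.filesystem = fs) r

(* file(_, _, false): all files not currently in use. *)
let unused_files (r : file_relation) : file_tuple list =
  List.filter (fun t -> not t.inuse) r

(* Inserting a tuple replaces any existing tuple with the same fileid, so
   removing that fileid later removes all trace of the file. *)
let insert_file (t : file_tuple) (r : file_relation) : file_relation =
  t :: List.filter (fun t' -> t'.fileid <> t.fileid) r

let remove_file (id : int) (r : file_relation) : file_relation =
  List.filter (fun t -> t.fileid <> id) r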
empty_d  : unit → (α1, . . . , αk) relation_d
insert_d : α1 ∗ · · · ∗ αk → (α1, . . . , αk) relation_d → unit
remove_d : α1 ∗ · · · ∗ αk → (α1, . . . , αk) relation_d → unit
query_d  : (α1, . . . , αk) relation_d → α1 option ∗ · · · ∗ αk option → (α1 ∗ · · · ∗ αk) list

Fig. 2. Primitive operations on logical relations.
records the list of successors and predecessors of each vertex v ∈ V. In ML, we might represent a graph via adjacency lists as the type
type g = (v, (v ∗ int) list) btree ∗ (v, (v ∗ int) list) btree,
assuming v is the type of vertices, and (α, β) btree is a binary tree mapping keys of type α to values of type β. Here the graph is represented as two collaborating data structures, namely a binary tree mapping each vertex to a list of its successors, together with the corresponding edge weights, and a binary tree mapping each vertex to a list of its predecessors, and the corresponding edge weights.

One problem with our proposed ML representation is that the successor and predecessor data structures represent the same set of edges; however, it is the programmer's responsibility to ensure that the two data structure representations remain consistent. Another problem is that with only tree-like data structures there is no natural place to put the edge weight—we can place it in either the successor data structure or the predecessor data structure, increasing the time complexity of certain queries, or we can duplicate the weight, as we have here, which increases the space cost and introduces the possibility of inconsistencies.

Instead, we can use a relation. We represent the edges of our directed graph as a relation g with three columns (src, dst, weight), in which each tuple represents the source, destination, and weight of an edge. The graph shown in Figure 5(a) can be represented as the relation {⟨1, 2, 17⟩, ⟨1, 3, 42⟩}. We call the usual mathematical view of a relation as a set of tuples the logical representation.

We extend ML with a new type constructor (α1, . . . , αk) relation which represents relations of arity k, together with a set of primitive operations to manipulate relations. Relations are mutable data structures conceptually similar to (α1 ∗ · · · ∗ αk) list ref, but with a very different representation. The primitives with which the client programmer manipulates relations, shown in Figure 2, are creating an empty relation, operations to insert and remove tuples from a relation, and query, which returns the list of tuples matching a tuple pattern, a tuple in which some fields are missing. We describe a minimal interface to make proofs easier; a practical implementation should provide a richer set of primitives, such as an interface along the lines of LINQ [15].
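To make the interface of Figure 2 concrete, here is a rough OCaml sketch specialized to the three-column edge relation of this example. It is an assumption of ours, not the paper's implementation: a real relation would be arity-generic and represented by an index as described below; a mutable list of tuples stands in for it here.

module EdgeRel = struct
  (* A (src, dst, weight) relation, modelled as a mutable list of tuples. *)
  type t = (int * int * int) list ref

  let empty () : t = ref []

  let insert ((s, d, w) : int * int * int) (r : t) : unit =
    r := (s, d, w) :: !r

  let remove ((s, d, w) : int * int * int) (r : t) : unit =
    r := List.filter (fun e -> e <> (s, d, w)) !r

  (* query: each field of the pattern is None (wildcard) or Some value. *)
  let query (r : t) ((ps, pd, pw) : int option * int option * int option) :
      (int * int * int) list =
    let matches p v = match p with None -> true | Some x -> x = v in
    List.filter (fun (s, d, w) -> matches ps s && matches pd d && matches pw w) !r
end

(* The graph of Figure 5(a): the relation {<1, 2, 17>, <1, 3, 42>}. *)
let g = EdgeRel.empty ()
let () = EdgeRel.insert (1, 2, 17) g
let () = EdgeRel.insert (1, 3, 42) g

(* Successors of vertex 1, with weights: pattern (Some 1, None, None). *)
let successors_of_1 = EdgeRel.query g (Some 1, None, None)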
2.2 Indices and Tree Decompositions
The data structure designer describes how to represent a logical relation using an index, which specifies how to decompose the relation into a collection of nested map and join operations over unit relations containing individual tuples. Different decompositions lead to different operations being particularly efficient. We do not maintain an underlying list of tuples; the only representation of a relation is that described by an index.
d ::= unit(c) | map(ψ, c, d′) | join(d1, d2, L)        indices
ψ ::= option | slist | dlist | btree                   data structures
l ∈ L ::= (fuse, z1, z2) | (link, z1, z2)              cross-links
z ∈ contour ::= {m, l, r}*                             static contours
y ∈ dcontour ::= {m_v, l, r}*                          dynamic contours

Fig. 3. Syntax of indices.
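For readers who prefer code, the grammar of Figure 3 can be transcribed as an OCaml datatype. This is a sketch under our own naming, not the paper's definition; the contour helper anticipates the d.z notation introduced below.

(* Index syntax of Figure 3, as an OCaml datatype (names are ours). *)
type column = string

type datastruct = Option | Slist | Dlist | Btree        (* psi *)

type step = M | L | R                                   (* letters of a static contour *)
type contour = step list                                (* z in {m, l, r}* *)

type link =
  | Fuse of contour * contour                           (* (fuse, z1, z2) *)
  | Link of contour * contour                           (* (link, z1, z2) *)

type index =
  | Unit of column list                                 (* unit(c) *)
  | Map of datastruct * column list * index             (* map(psi, c, d') *)
  | Join of index * index * link list                   (* join(d1, d2, L) *)

(* d.z: the sub-index of d reached by following static contour z. *)
let rec subindex (d : index) (z : contour) : index =
  match z, d with
  | [], _ -> d
  | M :: z', Map (_, _, d') -> subindex d' z'
  | L :: z', Join (d1, _, _) -> subindex d1 z'
  | R :: z', Join (_, d2, _) -> subindex d2 z'
  | _ -> invalid_arg "subindex: contour does not match index"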
Beyond the index definition, programmers can remain oblivious to the details of how relations are represented. Every relation r has an associated index d describing how to decompose the relation into a tree and how to lay that tree out in memory; Figure 3 shows the syntax of indices. Given an index d and a relation r we can form a tree decomposition ρ whose structure is governed by d; Figure 4 defines the syntax of tree decompositions. There are three kinds of index that we can use to decompose a relation, each of which has a corresponding kind of tree-decomposition node:
Static Contours We annotate each term in the index with a unique name called a static contour. Formally, a static contour z is a path in an index d which identifies a specific sub-index d′. A static contour z is drawn from the set {m, l, r}∗, where m means “move to the child index of a map index”, l means “move to the left sub-index of a join index”, and r means “move to the right sub-index of a join index”. We write d.z to denote the sub-index of d identified by a contour z. In our directed graph we want to find the set of successors and find the set of predecessors of a vertex efficiently. One index that satisfies this constraint is
Fig. 5. Representations of a weighted directed graph: (a) an example graph, and its representation as a relation, (b) a tree decomposition of the relation in (a), with fused data structures shown as conjoined nodes, and (c) a diagram of the memory state that represents (b).
In the graph example, we would like to share the weight of each edge between the two representations. Observe that given a (src, dst) pair, the weight is the same whether we traverse the links in the left or the right tree. That is, there is a functional dependency: any (src, dst) pair determines a unique weight, and it does not matter whether we visit the src or the dst first. Hence instead of replicating the weight, we can share it between the two trees, specified here by the fuse declaration. The declaration says that the data structure we get after looking up a src and then a dst in the left tree should be fused with the data structure we get by looking up a dst and then a src in the right tree.

Each join index takes an argument L which is a set of cross-linking declarations (link, z1, z2) and fusion declarations (fuse, z1, z2). A cross-linking declaration (link, z1, z2) states that a pointer should be maintained from each object with static contour z1 to the corresponding object with static contour z2. Similarly, a fusion declaration (fuse, z1, z2) states that objects with static contour z1 should be placed adjacent to the corresponding object with static contour z2. By "corresponding" object we mean the object with static contour z2 whose column values are drawn from the set bound by following static contour z1.

In the graph example, the contour rmm names the data structure we get by looking in the right component of the join (r) and then navigating down two map indices (mm), i.e., looking in the right tree and then following first the dst and then the src links. The contour lmm names the corresponding location in the left tree. The fuse declaration indicates these two nodes should be merged, with the weight data structure from the left tree being fused with the empty data structure from the right tree. Figure 5(b) depicts the index structure after fusion. Figure 5(c) graphically depicts the resulting physical memory state that represents the graph of Figure 5(b). The conjoined nodes in the figure are placed at a constant field offset from one another on the heap.
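Using the index datatype sketched after Figure 3, the graph index discussed here might be written as the following value. This is illustrative only: the choice of btree and slist for the individual maps is our assumption, but the overall shape and the fuse declaration follow the text.

(* Left side: src, then dst, then the edge weight (contours l, lm, lmm).
   Right side: dst, then src, then an empty unit (contours r, rm, rmm).
   The fuse merges the node at rmm with the corresponding node at lmm,
   so the weight is stored once and shared by both trees. *)
let graph_index : index =
  Join
    ( Map (Btree, ["src"], Map (Slist, ["dst"], Unit ["weight"])),
      Map (Btree, ["dst"], Map (Slist, ["src"], Unit [])),
      [ Fuse ([R; M; M], [L; M; M]) ] )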
2.4 Process Scheduler
As another example, suppose we want to represent the data for a simple operating system process scheduler (as in [13]). The scheduler maintains a list of live processes.
(TWfEmp)    {} |=T unit(c)

(TWfUnit)   |v| = |c|  ⟹  {v} |=T unit(c)

(TWfMap)    ∀i ∈ I. |vi| = |c|    ∀i ∈ I. ρi |=T d    ∀i ∈ I. αt(ρi, d) ≠ ∅
            ⟹  {vi ↦ ρi}i∈I |=T map(ψ, c, d)

(TWfJoin)   ρ1 |=T d1    ρ2 |=T d2
            αt(ρ1, d1) |= dom d1 ∩ dom d2 → dom d1 \ dom d2
            π_{dom d1 ∩ dom d2} αt(ρ1, d1) = π_{dom d1 ∩ dom d2} αt(ρ2, d2)
            ⟹  (ρ1, ρ2) |=T join(d1, d2, L)

Fig. 6. Well-formed tree decompositions: ρ |=T d.
A live process can be in any one of a number of states, e.g., running or sleeping. The scheduler also maintains a list of possible process states; for each state we maintain a tree of processes with that state. We represent the scheduler's data by a relation live(pid, state, uid, walltime, cputime), and the index
join_·( map_l(btree, [pid], unit_lm([uid, walltime, cputime])),
        map_r(dlist, [state], map_rm(btree, [pid], unit_rmm([]))),
        {(fuse, rmm, lm)} )
The index allows us both to efficiently find the information associated with the pid of a particular process, and to manipulate the set of processes with any given state and their associated data. In this case the fuse construct allows us to jump directly between the pid entry in a per-state binary tree and the data such as walltime and cputime associated with the process.
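Rendered with the same sketched OCaml datatype (again our own notation, not the paper's), the scheduler index reads as follows; the Minesweeper index of Section 2.5 differs only in its column lists and in using Link in place of Fuse.

(* live(pid, state, uid, walltime, cputime):
   left:  pid |-> (uid, walltime, cputime)     contours l, lm
   right: state |-> pid |-> ()                 contours r, rm, rmm
   fuse(rmm, lm): the per-state pid entry is placed with the process data. *)
let scheduler_index : index =
  Join
    ( Map (Btree, ["pid"], Unit ["uid"; "walltime"; "cputime"]),
      Map (Dlist, ["state"], Map (Btree, ["pid"], Unit [])),
      [ Fuse ([R; M; M], [L; M]) ] )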
2.5 Minesweeper
Another example is motivated by the game of Minesweeper. A Minesweeper board consists of a 2-dimensional matrix of cells. Each cell may or may not have a mine; each cell may also be concealed or exposed. Every cell starts off in the unexposed state; the goal of the game is to expose all of the cells that do not have mines without exposing a cell containing a mine. Some implementations of Minesweeper also implement a “peek” cheat code that iterates over the set of unexposed cells, temporarily displaying them as exposed. We represent a board by the relation board(x, y, ismined , isexposed ), with the index:
join_·( map_l(btree, [x], map_lm(btree, [y], unit_lmm([ismined, isexposed]))),
        map_r(slist, [isexposed], map_rm(btree, [x, y], unit_rmm([]))),
        {(link, rmm, lmm)} )
In this example, the index specifies a cross-link rather than a fusion. Cross-linking adds a pointer from one object in a tree decomposition to another object, providing a "short-cut" from one data structure to another.
In this and subsequent sections we give the details of how we can specify data structures with sharing at a high level using relations and then faithfully translate those specifications into efficient low-level representations. There are two
(LAUnit)    ∆ ⊢fd ∅ → c  ⟹  c; ∆ ⊢l unit(c)

(LAMap)     C2; ∆/c1 ⊢l d  ⟹  c1 ⊎ C2; ∆ ⊢l map(ψ, c1, d)

(LAJoin)    ∆ ⊢fd C1 → C2    C1 ∪ C2; ∆ ⊢l d1    C1 ∪ C3; ∆ ⊢l d2
            ⟹  C1 ⊎ C2 ⊎ C3; ∆ ⊢l join(d1, d2, L)

where ∆/C = …

Fig. 7. Rules for logical adequacy C; ∆ ⊢l d.

f ∈ {link(z1, z2), fuse(z1, z2), . . .}        field names
A = Z × f*                                     addresses
μ : A → A ∪ V                                  memory
Λ : dcontour → A                               layout
Fig. 8. Heaps
We say that an index d is adequate for a class of relations R if for every relation r ∈ R there is some tree decomposition ρ such that αt(ρ, d) = r. Figure 7 lists inference rules for a judgment C; ∆ ⊢l d that is a sufficient condition for an index to be adequate for the class of relations with columns C that satisfy a set of FDs ∆. The inference rules enforce two properties. Firstly, the (LAUnit) and (LAMap) rules ensure that every column of a relation must be represented by the index; every column must appear in a unit or map index. Secondly, in order to split a relation into two parts using a join index, the (LAJoin) rule requires a functional dependency to prevent anomalies such as spurious tuples. We have the following lemma:

Lemma 1 (Soundness of Adequacy Judgement). If C; ∆ ⊢l d, then for each relation r with columns C such that r |= ∆ there is some ρ such that ρ |=T d and αt(ρ, d) = r.
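As a rough sanity check, the adequacy judgment can be approximated in code. The sketch below is our own, not the paper's algorithm: it checks that an index covers exactly the columns C and that every join is justified by a functional dependency as in (LAJoin), leaving the FD entailment test as an assumed oracle (e.g., the standard attribute-closure algorithm), here stubbed out.

module S = Set.Make (String)

type fd = S.t * S.t                      (* a functional dependency A -> B *)

(* Assumed oracle for Delta |-fd a -> b; stubbed here. *)
let fd_entails (_delta : fd list) (_a : S.t) (_b : S.t) : bool = true

(* dom d: the columns mentioned by an index. *)
let rec dom (d : index) : S.t =
  match d with
  | Unit cs -> S.of_list cs
  | Map (_, cs, d') -> S.union (S.of_list cs) (dom d')
  | Join (d1, d2, _) -> S.union (dom d1) (dom d2)

(* Every join must satisfy the FD premise of (LAJoin): the shared columns
   determine the columns private to the left sub-index. *)
let rec joins_ok (delta : fd list) (d : index) : bool =
  match d with
  | Unit _ -> true
  | Map (_, _, d') -> joins_ok delta d'
  | Join (d1, d2, _) ->
      let shared = S.inter (dom d1) (dom d2) in
      fd_entails delta shared (S.diff (dom d1) shared)
      && joins_ok delta d1 && joins_ok delta d2

(* Approximation of C; Delta |-l d: column coverage plus the join checks. *)
let adequate (c : S.t) (delta : fd list) (d : index) : bool =
  S.equal c (dom d) && joins_ok delta d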
3.3 Physical Representation
Heaps. Figure 8 defines the syntax for our model of memory. We represent the heap as a function μ from a set of heap locations to a set of heap values. Our model of a heap location is based on C structs, except that we abstract away the layout of fields within each heap object. Heap locations are drawn from an infinite set A, and consist of a pair (n, f) of an integer address identifying a heap object, together with a string of field offsets. Each integer location notionally has an infinite number of field slots, although we only ever use a small and bounded number, which can then be laid out in consecutive memory locations. The contents of each heap cell can either be a value drawn from V or an address drawn from A; we assume that the two sets are disjoint. The set of columns that are bound by following a static contour z is given by the function bound(z, d), defined as
bound(·, d) = ∅
bound(mz, map(ψ, c, d)) = c ∪ bound(z, d)
bound(lz, join(d1, d2, L)) = bound(z, d1)
bound(rz, join(d1, d2, L)) = bound(z, d2)
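Transcribed over the index datatype sketched after Figure 3 (so the types are our assumption, but the equations are exactly the ones above), bound can be written as:

(* bound(z, d): the columns bound by following static contour z in index d. *)
let rec bound (z : contour) (d : index) : column list =
  match z, d with
  | [], _ -> []                                       (* bound(., d) = {} *)
  | M :: z', Map (_, cs, d') -> cs @ bound z' d'      (* c U bound(z, d) *)
  | L :: z', Join (d1, _, _) -> bound z' d1
  | R :: z', Join (_, d2, _) -> bound z' d2
  | _ -> invalid_arg "bound: contour does not match index"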
Layouts. We use dynamic contours to name positions in a tree. A layout function Λ is a mapping from the dynamic contours of a tree to addresses from A.
(PAUnit)    ∆; Φ ⊢p unit(c)

(PAMap)     ∆/c1; {x | mx ∈ Φ} ⊢p d  ⟹  ∆; Φ ⊢p map(ψ, c1, d)

(PAJoin)    ∀l ∈ L. ∆; Φ ⊢p d; l    Φ′ = Φ ∪ {z | (fuse, z, z′) ∈ L}
            ∆; {x | lx ∈ Φ′} ⊢p d1    ∆; {x | rx ∈ Φ′} ⊢p d2
            ⟹  ∆; Φ ⊢p join(d1, d2, L)

(PALink)    bound(rz1m, d) ⊇ bound(lz2, d)  ⟹  ∆; Φ ⊢p d; (link, rz1m, lz2)

(PAFuse)    rz1m ∉ Φ    bound(rz1m, d) = bound(lz2, d)  ⟹  ∆; Φ ⊢p d; (fuse, rz1m, lz2)

Fig. 9. Rules for physical adequacy ∆; Φ ⊢p d [; l].
Layout functions allow us to translate from semantic names for memory locations to a more machine-level description of the heap; the extra layer of indirection allows us to ignore details of memory managers and layout policies, and to describe fusion and cross-linking succinctly. All layouts must be injective; that is, different tree locations must map to different physical locations. We define operators that strip and add prefixes to the domain of a layout:

Λ/x = {y ↦ a | (xy ↦ a) ∈ Λ}    and    Λ × x = {xy ↦ a | (y ↦ a) ∈ Λ}.
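A small OCaml sketch of these two operators, over an association-list model of layouts; the representation, and the modelling of dynamic-contour letters and addresses, are assumptions of ours.

type dstep = Mv of int | Ld | Rd          (* m_v, l, r; key values modelled as ints *)
type dcontour = dstep list
type addr = int * string list             (* (object id, field path), as in Fig. 8 *)
type layout = (dcontour * addr) list      (* a finite map dcontour -> addr *)

(* Lambda / x: keep the entries whose contour has prefix x, stripping x. *)
let strip_prefix (lam : layout) (x : dcontour) : layout =
  let rec chop prefix y =
    match prefix, y with
    | [], rest -> Some rest
    | p :: prefix', y0 :: y' when p = y0 -> chop prefix' y'
    | _ -> None
  in
  List.filter_map
    (fun (y, a) -> match chop x y with Some rest -> Some (rest, a) | None -> None)
    lam

(* Lambda * x: prepend the prefix x to every contour in the layout. *)
let add_prefix (lam : layout) (x : dcontour) : layout =
  List.map (fun (y, a) -> (x @ y, a)) lam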
Data Structures. In our present implementation, a map index can be represented by an option type (option), a singly-linked list (slist), a doubly-linked list (dlist), or a binary tree (btree). It is straightforward to extend the set of data structures by implementing a common data structure interface—we present this particular selection merely for concreteness. The common interface views each data structure as a set of key-value pairs, which is a good fit to many, but not all, possible data structures. Each data structure must provide low-level functions: pempty_ψ a, which creates a new structure with its root pointer located at address a; pisempty_ψ a, which tests emptiness of the structure rooted at a; plookup_ψ a v, which returns the address a′ of the entry with value v, if any; pscan_ψ a, which returns the set of all (a′, v) pairs of a value v and its address a′; pinsert_ψ a v a′, which inserts a new value v, whose entry is located at address a′, into the data structure rooted at a; and premove_ψ a v a′, which removes the value v at address a′ from the data structure rooted at a. Typical implementations can be found in the tech report [10].

For cross-linking and fusion to be well-defined in an index d, we need d to be physically adequate. This condition ensures that for cross-linking and fusion operations between static contours z1 and z2, the mapping from z1 to z2 is a function for each cross-link declaration and an injective function for each fusion declaration. Further, as fusions constrain the location of an object in memory, we require that any object is fused at most once for feasibility. We use the judgment form ∆; Φ ⊢p d and the associated rules in Figure 9 to indicate that index d is physically adequate for functional dependencies ∆, where Φ denotes the set of static contours that have already been fused. The (PALink) and (PAFuse) rules ensure a suitable mapping by requiring the set of fields bound by the target contour of a link to be a subset of the set of fields bound by the source contour; in the case of a fusion we require equality. The rule (PAFuse) ensures that no contour is fused twice. We assume that all indices are physically adequate.
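The common data-structure interface described above might be captured by an OCaml signature along the following lines; the function names mirror the text, while the types (the heap being implicit, addresses as in the sketch above) are our assumptions.

(* One module of this signature per data structure psi (option, slist,
   dlist, btree); the heap itself is left implicit. *)
module type DATASTRUCT = sig
  type value                                       (* the values v stored in entries *)

  val pempty   : addr -> unit                      (* create a structure rooted at a *)
  val pisempty : addr -> bool                      (* is the structure rooted at a empty? *)
  val plookup  : addr -> value -> addr option      (* address a' of the entry for v, if any *)
  val pscan    : addr -> (addr * value) list       (* all (a', v) pairs in the structure *)
  val pinsert  : addr -> value -> addr -> unit     (* insert v, whose entry lives at a' *)
  val premove  : addr -> value -> addr -> unit     (* remove the value v at address a' *)
end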
returns the natural join of the two results. The qrjoin(q1, q2) operator is similar, but executes the two queries in the opposite order. Both joins produce identical results; however, the computational complexity may differ.

Fuse Join. The qfusejoin(z0, l, q1, q2) operator switches the current index data structure by following a fuse or cross-link l and executes query q2; it then switches back to the original location and executes q1. The result is the natural join of the two sub-queries. Parameter z0 identifies the join index that contains l; position y must be an instantiation of the source of l.

For example, suppose in the directed graph example of Section 2.1 we want to find the set of successors of graph vertex 1, together with their edge weights. Figure 10 depicts one possible, albeit inefficient, query plan q consisting of the operations
q = qrjoin(qnone, qscan(qlookup(qfusejoin(·, (fuse, rmm, lmm), qunit, qunit)))).
Intuitively, to execute this plan we use the right-hand side of the join to iterate over every possible value for the dst field. For each dst value we check to see whether there is a src value that matches the query input, and if so we use a fuse join to jump over to the left-hand side of the join and retrieve the corresponding weight. (A better query plan would look up the src on the left-hand side of the join first, and then iterate over the set of corresponding dst nodes and their weights, but our goal here is to demonstrate the role of the qfusejoin operator.)
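The query-plan operators named in this section can be pictured as an OCaml datatype; the datatype itself is our sketch (the paper defines plans precisely in the full report [10]), but the plan q above is transcribed directly.

type qplan =
  | Qnone                                          (* contribute nothing on this side *)
  | Qunit                                          (* read the columns of a unit node *)
  | Qscan of qplan                                 (* iterate over every entry of a map *)
  | Qlookup of qplan                               (* look up the key bound in the input pattern *)
  | Qljoin of qplan * qplan                        (* execute the left side, then the right *)
  | Qrjoin of qplan * qplan                        (* execute the right side, then the left *)
  | Qfusejoin of contour * link * qplan * qplan    (* qfusejoin(z0, l, q1, q2) *)

(* q = qrjoin(qnone, qscan(qlookup(qfusejoin(., (fuse, rmm, lmm), qunit, qunit)))) *)
let q : qplan =
  Qrjoin
    ( Qnone,
      Qscan (Qlookup (Qfusejoin ([], Fuse ([R; M; M], [L; M; M]), Qunit, Qunit))) )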
To find successors using query plan q, we start with the state (⟨src ↦ 1⟩, ·). Since the left branch of the join is qnone, the join reduces to a recursive execution of the query qscan(· · ·) with input (⟨src ↦ 1⟩, r). The qscan recursively invokes qlookup on each of the states (⟨src ↦ 1, dst ↦ 2⟩, rm_2) and (⟨src ↦ 1, dst ↦ 3⟩, rm_3).
Fig. 10. A possible query plan for the graph example of Section 2.1.
The qlookup operator in turn recursively invokes the qfusejoin operator on the states (⟨src ↦ 1, dst ↦ 2⟩, rm_2m_1) and (⟨src ↦ 1, dst ↦ 3⟩, rm_3m_1). To execute its second query argument the fuse join maps each instantiation of contour rmm to the corresponding instantiation of contour lmm; we are guaranteed that exactly one such contour instantiation exists by index adequacy. The fuse join produces the states (⟨src ↦ 1, dst ↦ 2⟩, lm_1m_2) and (⟨src ↦ 1, dst ↦ 3⟩, lm_1m_3). Finally the invocations of qunit on each state produce the tuples
{⟨src ↦ 1, dst ↦ 2, weight ↦ 17⟩, ⟨src ↦ 1, dst ↦ 3, weight ↦ 42⟩}.
We need a criterion for determining whether a particular query plan does in fact return all of the tuples that match a pattern. We say a query plan is valid, written d, z, X ⊢q q, Y, if q correctly answers queries in index d at dynamic instantiations of contour z, where X is the set of columns bound in the input tuple pattern t and Y is the set of columns bound in the output tuples (see the technical report [10]).
In this section we describe implementations for the primitive relation operators for the tree-decomposition and physical representations of a relation, and we prove our main result: that these primitive operators are sound with respect to their higher-level specification. Complete code is given in the tech report [10].
5.1 Operators on the Tree Decomposition
We implement queries over tree decompositions by a function tquery d t ρ, which finds tuples matching pattern t over tree decomposition ρ under index d. The core routine is a function tqexec ρ d q t y which, given a tree decomposition ρ, index d, and a tuple pattern t, executes plan q at the position of the dynamic contour y. Creation and update are handled by tempty d, which constructs a new empty relation with index d, tinsert d t ρ, which inserts a tuple t into a tree-decomposed relation ρ with index d, and tremove d t ρ, which removes a tuple t from a tree-decomposed relation ρ with index d. It is the client's responsibility to ensure that functional dependencies are not violated; the implementation contains dynamic checks that abort if the client fails to comply. These checks can be removed if there is an external guarantee that the client will never violate the dependencies.

To show that the primitive operations on tree decompositions faithfully implement the corresponding primitive operations on logical relations, we first show that executing valid queries over tree decompositions soundly implements logical tuple-pattern queries. We then prove a soundness result by induction.
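For orientation, the tree-decomposition operations just described can be summarized as an OCaml signature. The abstract types and the purely functional style are our assumptions; the actual definitions are given in the tech report [10].

module type TREEDEC_OPS = sig
  type tuple                                   (* a tuple t *)
  type pattern                                 (* a tuple pattern: some fields missing *)
  type treedec                                 (* a tree decomposition rho *)

  val tempty  : index -> treedec                           (* empty relation with index d *)
  val tinsert : index -> tuple -> treedec -> treedec       (* insert t; may abort on an FD violation *)
  val tremove : index -> tuple -> treedec -> treedec       (* remove t *)
  val tquery  : index -> pattern -> treedec -> tuple list  (* tuples matching the pattern *)
  val tqexec  : treedec -> index -> qplan -> pattern -> dcontour -> tuple list
  (* execute plan q at the position named by dynamic contour y *)
end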
Lemma 2 (Tree Decomposition Query Soundness). For all ρ, r, d such that ρ |=T d and αt(ρ, d) = r, if d, ·, dom t ⊢q q, dom d for a tuple pattern t and query plan q, we have tqexec ρ d q t · = query r t.
Theorem 1 (Tree Decomposition Soundness). Suppose a sequence of insert and remove operators starting from the empty relation produce a relation r. The corresponding sequence of tinsert and tremove operators given tempty d as input either produce ρ such that ρ |=T d and αt(ρ, d) = r, or abort with an error.
5.2 Physical Representation Operators
In this section we describe implementations of each of the primitive relation operations that operate over the physical representation of a relation. We prove soundness of the physical implementation with respect to the tree decomposition. For space reasons we omit the code for the physical operators, but we give a brief synopsis of each function; for a complete definition see the full paper [10].

We execute physical queries via a query execution function pqexec d q t (z, a). Function pqexec is structurally very similar to the query execution function tqexec over tree decompositions. Instead of a tree decomposition ρ the physical function accesses the heap, and in place of a dynamic contour y the physical function represents a position in the data structure by a pair (z, a) of a static contour z and an address a. The main difference in implementation is that the qfusejoin case follows a fusion or cross-link simply by performing pointer arithmetic or a pointer dereference, respectively, rather than traversing the index. Creation and update are handled by pempty d a (creates an empty relation with index d rooted at address a) and pinsert d t a (inserts tuple t into a relation with index d rooted at address a).
Inferring Shared Representations. Some static analysis algorithms infer sharing between data structures in low-level code [13; 12]. In contrast, we allow the programmer to specify sharing in a concise way, and we guarantee consistency assuming only that functional dependencies are maintained. Functional dependencies or their equivalent are an essential invariant for any shared data structure.
Verification Approaches. The Hob system uses abstract sets of objects to specify and verify properties that characterize how multiple data structures share objects [14]. Monotonic typestates enable aliased objects to monotonically change their typestates in the presence of sharing without violating type safety [9]. Researchers have developed systems to mechanically verify data structures (e.g., hash tables) that implement binary relational interfaces [22; 5]. The relation implementation presented here is more general, allowing relations of arbitrary arity and substantially more sophisticated data structures than previous research.
We have presented a system for specifying and operating on data structures at a high level as relations while implementing those relations as the composition of low-level pointer data structures. Most unusually we can express, and prove correct, the use of complex sharing in the low-level representation, allowing us to express many practical examples beyond the capabilities of previous techniques.
References

[1] C. Beeri, R. Fagin, and J. H. Howard. A complete axiomatization for functional and multivalued dependencies in database relations. In SIGMOD, pages 47–61. ACM, 1977.
[2] J. Berdine, C. Calcagno, B. Cook, D. Distefano, P. O'Hearn, T. Wies, and H. Yang. Shape analysis for composite data structures. In CAV, pages 178–192, 2007.
[3] G. Bierman and A. Wren. First-class relationships in an object-oriented language. In ECOOP, volume 3586 of LNCS, pages 262–286, 2005.
[4] J. Cai and R. Paige. "Look ma, no hashing, and no arrays neither". In POPL, pages 143–154, 1991.
[5] A. J. Chlipala, J. G. Malecha, G. Morrisett, A. Shinnar, and R. Wisnesky. Effective interactive proofs for higher-order imperative programs. In ICFP, pages 79–90, 2009.
[6] E. F. Codd. A relational model of data for large shared data banks. Commun. ACM, 13(6):377–387, 1970.
[7] R. B. K. Dewar, A. Grand, S.-C. Liu, J. T. Schwartz, and E. Schonberg. Programming by refinement, as exemplified by the SETL representation sublanguage. ACM Trans. Program. Lang. Syst., 1(1):27–49, 1979.
[8] D. Distefano and M. J. Parkinson. jStar: towards practical verification for Java. In OOPSLA, pages 213–226, 2008.
[9] M. Fahndrich and R. Leino. Heap monotonic typestates. In Int. Work. on Alias Confinement and Ownership, July 2003.
[10] P. Hawkins, A. Aiken, K. Fisher, M. Rinard, and M. Sagiv. Data structure fusion (full), 2010. URL http://theory.stanford.edu/~hawkinsp/papers/rel-full.pdf.
[11] N. Klarlund and M. I. Schwartzbach. Graph types. In POPL, pages 196–205, Charleston, South Carolina, 1993. ACM.
[12] J. Kreiker, H. Seidl, and V. Vojdani. Shape analysis of low-level C with overlapping structures. In Proceedings of VMCAI, volume 5044 of LNCS, pages 214–230, 2010.
[13] V. Kuncak, P. Lam, and M. Rinard. Role analysis. In POPL, pages 17–32, 2002.
[14] P. Lam, V. Kuncak, and M. C. Rinard. Generalized typestate checking for data structure consistency. In VMCAI, pages 430–447, 2005.
[15] E. Meijer, B. Beckman, and G. Bierman. LINQ: Reconciling objects, relations and XML in the .NET framework. In SIGMOD, page 706. ACM, 2006.
[16] C. Olston et al. Pig Latin: A not-so-foreign language for data processing. In SIGMOD, June 2008.
[17] R. Paige and F. Henglein. Mechanical translation of set theoretic problem specifications into efficient RAM code. J. Sym. Com., 4(2):207–232, 1987.
[18] J. C. Reynolds. Separation logic: A logic for shared mutable data structures. In LICS, 2002. Invited paper.
[19] T. Rothamel and Y. A. Liu. Efficient implementation of tuple pattern based retrieval. In PEPM, pages 81–90. ACM, 2007.
[20] E. Schonberg, J. T. Schwartz, and M. Sharir. Automatic data structure selection in SETL. In POPL, pages 197–210, 1979.
[21] O. Shacham, M. Vechev, and E. Yahav. Chameleon: adaptive selection of collections. In PLDI, pages 408–418, 2009.
[22] K. Zee, V. Kuncak, and M. C. Rinard. Full functional verification of linked data structures. In PLDI, pages 349–361, 2008.