
COMPILER CONSTRUCTION OF IDEMPOTENT REGIONS AND APPLICATIONS IN ARCHITECTURE DESIGN

Marc A. de Kruijf

A dissertation submitted in partial fulfillment of

the requirements for the degree of

Doctor of Philosophy

(Computer Sciences)

at the

UNIVERSITY OF WISCONSIN–MADISON

2012

Date of final oral examination: 07/20/12

The dissertation is approved by the following members of the Final Oral Committee:

Karthikeyan Sankaralingam, Assistant Professor, Computer Sciences

Mark Hill, Professor, Computer Sciences

Gurindar Sohi, Professor, Computer Sciences

Somesh Jha, Professor, Computer Sciences

Mikko Lipasti, Professor, Electrical and Computer Engineering


Acknowledgments

This research product owes many things to many people. First is my advisor, Karu, who was instrumental in many ways. More than simply mentoring me in research, he has helped me to frame my own life: to strive to live happily, peacefully, and positively. His tireless work ethic, combined with his profound respect for life balance, is something that I admire greatly. Thanks almost entirely to him, my graduate career was rarely frustrating, full of interesting and exciting work, and very rewarding. I do not know what the future holds, but thanks to you, Karu, I feel more prepared than ever before.

The other committee member who deserves special thanks is Mark, my other professional role model. When I arrived at UW-Madison, Mark was my initial mentor and my CS 552 instructor, and he was the one to invite me to my first Computer Architecture Affiliates meeting only a month after my arrival. Mark never wavered in his support or his willingness to offer guidance. He may not know it, but Mark made me a computer architect. Thank you, Mark. Among the other members of my committee, Guri forced me to think critically about my own ideas while always remaining supportive; Somesh was a crucial resource in developing key pieces of my research (he taught me to think in very precise terms, and his jovial and spirited nature was always an inspiration to me); and Mikko was an excellent resource for technical discussions in addition to being an all-around great person. Thank you, Guri, Somesh, and Mikko.

There is no shortage of fellow students to thank. First, the Vertical group. From the days when I was the only student of the group (with an office all to myself), the group is now over ten students strong. Thanks to everyone, with special thanks to my officemates, Venkat and Tony, who always provided great fuel for discussion and distraction. Among the other members, Emily, Chen-han, Jai, Raghu, Zach, Ryan, and Chris also deserve special mention for their support and camaraderie.


Abstract

In the field of computer architecture today, out-of-order execution is important to maximize architectural efficiency, the shadow of unreliable hardware is ever-looming, and, with the emergence of mainstream parallel hardware, programmability is once again an important and fundamental challenge. Traditionally, hardware checkpointing and buffering techniques are used to assist with each of these problems. However, these techniques introduce overheads, add complexity to the hardware, and often save more state than necessary. With today’s renewed focus on energy efficiency, and with the commercial importance of reduced hardware complexity in today’s processor market, the efficacy of these techniques is no longer absolute.

This thesis develops a novel compiler-based technique to efficiently support a range of hardware features without the need for checkpoints or buffers. The technique breaks programs into idempotent regions (regions that can be freely re-executed) to enable recovery by simple re-execution. The thesis observes that programs can be executed entirely as sequences of idempotent regions, and builds a classification framework to concretely reason about different interpretations of idempotence that apply in the context of computer architecture. It develops static analysis and compiler code generation algorithms and techniques to construct idempotent regions and subsequently demonstrates low overheads and potentially large region sizes for an LLVM-based compiler implementation. Finally, it demonstrates applicability across a range of modern architecture designs in addressing a variety of problems.

The thesis presents several findings. First, it finds that inherently large idempotent regions, in the range of tens to hundreds of instructions, exist across entire programs. It also finds that a compiler algorithm for constructing the largest possible regions, through careful allocation of function-local state, is capable of constructing regions close to these sizes. Various algorithms are demonstrated that are able to sub-divide these regions into smaller regions to optimize for specific constraints. In the end, however, code generation of small idempotent regions forces relatively high compiler-induced run-time overheads in the range of 10-20% (often increasing register pressure by over 50%), while, for larger regions, this overhead quickly approaches zero as region size grows beyond a few tens of instructions. Consequently, the compiler-induced costs of constructing small regions often outweigh any benefits, and optimization trade-offs generally favor constructing regions that are a few tens of instructions or more. This optimization goal determines which architecture domains idempotence-based recovery suits best; this thesis specifically considers architecture design and evaluation for general exception support in GPUs, out-of-order retirement in general-purpose processors, and hardware fault tolerance in emerging processor designs.

Correctness: As transistor technology continues to scale to smaller feature sizes, hardware is becoming increasingly unreliable [19]. To allow programs to continue to operate correctly even in the face of transient or permanent hardware faults, some form of recovery support is increasingly needed. To reconcile the need to reduce processor overheads with the desire to support program recovery, this thesis develops idempotence to support efficient recovery by re-execution. Idempotence is the property that re-execution has no side-effects; that is, an operation can be executed multiple times with the same effect as executing it only once. At the coarsest granularity, any application whose inputs do not change during execution is idempotent. At the finest granularity, every instruction that does not modify its source operands is also idempotent. In both cases, re-executing the operation does not change the effect of the initial execution.
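The instruction-granularity case can be illustrated with a small C sketch. The functions `add_into` and `increment` are hypothetical and are not drawn from the thesis; they only contrast an operation that preserves its inputs with one that overwrites them.

```c
/* Idempotent: reads a and b and writes only *c. Neither input is modified,
 * so executing it again with the same inputs leaves *c unchanged. */
static void add_into(int a, int b, int *c) {
    *c = a + b;
}

/* Not idempotent: reads *x and then overwrites it, destroying its own input.
 * A second execution starts from a different value and increments again. */
static void increment(int *x) {
    *x = *x + 1;
}
```

Calling `add_into(2, 3, &c)` twice leaves `c` at 5, exactly as a single call would; calling `increment(&x)` twice on `x = 0` leaves `x` at 2 rather than 1, because the first execution destroyed the input the second one needed.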

Why Idempotence?

An operation is idempotent if its inputs do not change over the course of its execution. Hence, idempotence can be thought of as implicitly forming a checkpoint with respect to the inputs of the operation. In this manner, idempotence over a region of code can render traditional hardware checkpointing techniques unnecessary; in the event of failure, idempotence can be used to correct the state of the system by simple re-execution. Moving from explicit hardware checkpointing to an implicit checkpointing model built upon idempotence can benefit computer systems at multiple levels. First, at the microarchitecture level, the absence of hardware buffering and/or checkpointing reduces interdependencies between processor structures, reduces power and area, and allows existing hardware resources to be used more efficiently in the absence of contention. Second, at the circuit level, lower threshold voltages and tighter noise margins on transistors make hardware design and verification increasingly difficult; hence, less functionality in hardware implies substantially lower hardware design and verification effort. Finally, at the program level, checkpoints can be inflexible. This inflexibility is not only inconvenient, but it can also hurt overall efficiency if the checkpoints are overly conservative.
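To make the implicit-checkpoint argument concrete, the following C sketch models recovery by re-execution under simplifying assumptions: the names `scale_region` and `run_with_recovery` are hypothetical, the fault is injected in software, and fault detection is assumed to be reported by the region itself (standing in for detection hardware, which is not modeled). Because the region never overwrites its inputs, recovery is just a jump back to the region entry, with no state saved beforehand.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical idempotent region: reads in[0..n) and writes out[0..n).
 * It never writes its inputs, so they serve as an implicit checkpoint.
 * If `fault` is true, one result is corrupted to model a transient hardware
 * fault, and the function returns true to model the fault being detected. */
static bool scale_region(const int *in, int *out, size_t n, int k, bool fault) {
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] * k;
    if (fault && n > 0) {
        out[0] ^= 0x40; /* flip one bit of the first result */
        return true;    /* assumption: the fault is detected before region exit */
    }
    return false;
}

/* Recovery by re-execution: on a detected fault, control simply returns to
 * the region entry and runs the region again. No copy of `out` (or of any
 * other state) is saved beforehand. The first `faulty_attempts` executions
 * are forced to fault. Returns the total number of executions. */
static int run_with_recovery(const int *in, int *out, size_t n, int k,
                             int faulty_attempts) {
    int attempts = 0;
    for (;;) {
        bool fault = attempts < faulty_attempts;
        attempts++;
        if (!scale_region(in, out, n, k, fault))
            return attempts;
    }
}
```

For example, with inputs {1, 2, 3}, k = 10, and two faulty attempts, `run_with_recovery` executes the region three times and leaves `out` holding {10, 20, 30}, while `in` is untouched. A region that wrote back into `in` would not recover this way, which is why non-idempotent code needs an explicit checkpoint.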

1.1 Contributions

This thesis observes that applications can fully decompose into idempotent regions of code, and that these regions can be used to recover from a range of failure scenarios. The size and arrangement of idempotent regions is configurable during compilation, and a compiler can construct idempotent regions that are usefully large at the expense of only a small amount of run-time overhead. This thesis makes the following specific contributions:

Idempotence in computer architecture: It presents the first comprehensive analysis of idempotence and its implications for architecture and compiler design. In particular:

it is the first work to observe that programs can decompose entirely into idempotent regions;
it observes that idempotence can be used to recover from a variety of failure conditions by simple re-execution;
it identifies a variety of idempotence “models” that apply in the context of computer architecture and presents a taxonomy to concretely reason about them;
it analyzes the potential sizes of idempotent regions that arise from the space of idempotence models and finds that the regions can be large; and
it converges on two idempotence models that have the desirable properties of allowing (1) the decomposition of programs entirely into idempotent regions and (2) the construction of idempotent regions of maximal size with respect to the common case of data-race-free multi-threaded execution.

Compiler design: It develops a complete end-to-end compiler design and implementation for the automated formation and code generation of idempotent regions considering a range of design trade-offs. In particular:
it describes a static analysis algorithm to uncover the minimal set of idempotent regions in a function given semantic constraints;

Compiler design – code generation: Code generation of small (semantically-constrained) idempotent regions commonly forces performance overheads of over 10%. For larger regions, this overhead approaches zero in the limit as region size grows beyond a few tens of instructions.

Compiler design – ISA sensitivity: Among three ways in which the ISA could affect the run-time overheads of idempotence-based compilation, none appears significant. Independent of the ISA, small (semantically-constrained) idempotent regions increase register pressure by approximately 60%. For larger regions, register pressure effects approach zero in the limit as region size grows beyond a few tens of instructions.

Architecture design: GPUs can support general exceptions cleanly using idempotence with run-time overheads of less than 2% (for traditional GPU workloads). CPUs can be simplified to support exceptions with out-of-order retirement with typical run-time overheads of 10%. Adding support for efficient branch-misprediction recovery using idempotence on CPUs increases the typical run-time overheads to 20%. Finally, architectures can use idempotence to support hardware fault recovery with run-time overheads of roughly 10%, assuming low-latency fault detection capability.

1.3 Organization

The core of the dissertation is organized into three parts: idempotence models in computer architecture , compiler design & evaluation , and architecture design & evaluation. These three parts span Chapters 2-6, with the closing Chapters 7-8 presenting related work and conclusions.

Idempotence Models in Computer Architecture

Chapter 2 explores and analyzes the concept of idempotence as it applies to computer architecture. As background, it presents examples of idempotence applied in computer science and subsequently develops a taxonomy to reason about idempotence specifically as it applies to computer architecture. Leveraging this taxonomy, it performs an empirical study of the sizes of idempotent regions that could be attained for different idempotence models arising from the taxonomy given semantic program constraints. Finally, it identifies the two idempotence models, architectural and contextual idempotence, that are developed in the remainder of the dissertation.

Compiler Design & Evaluation

Chapters 3-5 present the static analysis, code generation, and evaluation of a compiler design that constructs idempotent regions in programs, optimizing for architectural and contextual idempotence, across a range of application and environmental constraints. Chapter 3 develops a static analysis for identifying the largest idempotent regions given semantic program constraints. Chapter 4 develops support for sub-dividing regions and preserving the idempotence property of these regions as they are compiled down to machine instructions. Finally, Chapter 5 presents a comprehensive evaluation of a full, end-to-end compiler implementation.

Architecture Design & Evaluation

Chapter 6 motivates and develops the architecture support to utilize idempotence for recovery across a range of architecture designs. The overall architectural vision is one where the analysis of idempotence occurs in software (e.g. in a compiler), and the hardware consumes the output of this analysis to enable hardware design simplification and flexibility. Specifically, the applications to GPU, CPU, and emerging fault-tolerant architecture designs are explored and evaluated. In contrast to the rigorous compiler implementation evaluation of Chapter 5, the individual architecture evaluations are more abstract, using simulation-based evaluation. Detailed microarchitecture design and implementation is left as a topic for follow-on work.

1.4 A Note on Experimental Methodology

All three parts of the dissertation are empirically grounded, with a largely common experimental methodology used throughout. However, there are differences as the experimental purpose varies. Table 1.1 highlights the primary differences.

Regarding benchmarks, the benchmark suites we study throughout are SPEC 2006 [99], a suite targeted at conventional single-threaded workloads, PARSEC [16], a suite targeted at emerging

Prior Work | Topic | Chapters
PLDI 2012 [29] | Static analysis and compiler design | 3, 5, 6
ISCA 2012 [70] | Application to GPU architecture | 6
MICRO 2011 [28] | Application to CPU architecture | 5, 6

Table 1.2: The relation of the author’s prior work to the dissertation material.

compiler implementation that balances the execution overheads associated with smaller idempotent regions against those potentially associated with larger regions. Namely, Chapter 2, Chapter 4, and parts of Chapter 5 are largely unique to this thesis and are not part of previously published work.

2 Idempotence in Computer Architecture

This chapter analyzes idempotence and idempotence-based recovery specifically in the context of application programs executed as sequences of instructions. It develops a framework for the analysis of idempotence in this context and develops a taxonomy to reason about a spectrum of idempotence models. It subsequently offers empirical and qualitative analysis to identify two specific models, architectural and contextual idempotence, that are deemed meaningful for exploration in subsequent chapters.

Parts of this chapter are heavy on formalism; with an understanding of certain specific characteristics of architectural and contextual idempotence, the impatient reader is free to skip this chapter and continue on to the remaining chapters of this dissertation. The relevant characteristics are as follows. Both models allow the construction of idempotent regions of maximal size with respect to the common case of data-race-free multi-threaded execution. Importantly, both models assume invariable control flow semantics upon re-execution with respect to non-local memory state. Where the two models differ is in what they assume with respect to other (local) state: while architectural idempotence again assumes invariable control flow, contextual idempotence allows for variable control flow semantics.

The chapter is organized as follows. Section 2.1 presents the intuition behind the taxonomy it develops, presenting example idempotence models over sequences of instructions. Section 2.2 then formally defines key terms, and Section 2.3 presents the taxonomy, identifying three axes of variation within it. Selecting one point along each of these axes forms an idempotence model. Section 2.4 analyzes the space of idempotence models and then distills the space to two models, architectural and contextual idempotence, that are deemed most meaningful. Section 2.5 presents a summary and conclusions.

processing state of switches used to deliver the request over the network. Finally, a load or store instruction causing an exception may invoke an operating system service routine that updates some system-internal state (e.g. page table entries). From this discussion, it is evident that the power of idempotence applied over a system lies in part in how that system is defined. Considering the architecture underlying the execution of an application program as the system, there are multiple definitions, or models, of idempotence that are meaningful. This chapter develops a formal taxonomy to concretely reason about these different models as they emerge from assumptions about the architecture environment. The discussion below builds intuition by presenting example models.

Example Idempotence Models

As stated earlier, an operation is idempotent if the effect of executing it multiple times is the same as the effect of executing it only once. This property is achieved if the operation’s inputs are preserved throughout its execution; with the same inputs, the operation will produce the same outputs each time it executes. However, what it means to “preserve an input” is subject to interpretation, and many different interpretations make sense depending on the context. This section presents four different example interpretations (models) that are all meaningful in the context of programs executed as sequences of instructions. A region is considered the unit of operation, and the following definitions are assumed:

Region: A region is defined as a collection of instructions uniquely identified by the single instruction that forms its entry point. A region contains the set of instructions reachable by control flow from its entry point up to its exit points.

Live-in: A variable is live-in to a region if the variable may hold a value that is (a) defined (written) before entry to the region and (b) potentially used (read) after entry to the region.
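For straight-line code, the live-in definition reduces to a single forward scan: a register is live-in if the region reads it before writing it. The following C sketch assumes a hypothetical three-address `Insn` encoding over registers R0-R3 and ignores memory and control flow; regions containing branches would instead need a dataflow analysis to account for values that are only "potentially used".

```c
#include <stddef.h>

/* Hypothetical three-address instruction over registers R0-R3.
 * dst == -1 means no register is written (e.g. a store);
 * a source of -1 means that operand slot is unused. */
typedef struct {
    int dst;
    int src[2];
} Insn;

/* Bit r is set when register Rr belongs to the set. */
typedef unsigned RegSet;

/* Live-in set of the straight-line region insns[0..n): a register is
 * live-in if the region reads it before writing it, so the value read
 * must have been defined before entry to the region. */
static RegSet live_in(const Insn *insns, size_t n) {
    RegSet defined = 0, live = 0;
    for (size_t i = 0; i < n; i++) {
        /* Sources are read before the destination is written. */
        for (int s = 0; s < 2; s++) {
            int r = insns[i].src[s];
            if (r >= 0 && !(defined & (1u << r)))
                live |= 1u << r;
        }
        if (insns[i].dst >= 0)
            defined |= 1u << insns[i].dst;
    }
    return live;
}
```

For the sequence R2 = R0 + R1; R3 = R2; store R3 to [R0], `live_in` returns {R0, R1} (bit mask 0x3): R2 and R3 are defined inside the region before they are read, so they are not live-in.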

The code of the function shown in Figure 2.1, written in the C programming language, is used as an example of inherently non-idempotent code that can be divided into idempotent regions. The function, list_push, checks a list for overflow and then pushes an integer element onto the end of the list.

Figure 2.1: Example source code.

The left side of Figure 2.2 shows the function compiled to a stylized assembly code organized into basic blocks, with arrows connecting the control flow between basic blocks. The code assumes four registers are available, R0-R3, with function arguments held in registers R0 and R1, and with R0 also serving as the return register. In the discussion that follows, the effect of a given idempotence model is measured by forming the set of maximally-sized idempotent regions found by greedily scanning and incrementally adding instructions to a region until doing so would render the region non-idempotent, at which point a new idempotent region is formed starting at the next instruction that is itself idempotent. In practice, identifying idempotent regions, in particular semantically idempotent regions, requires a more sophisticated analysis (see Chapter 3); this algorithm is assumed for illustration purposes only.
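The greedy scan described above can be sketched in C for the simplest case: straight-line code, registers only, and no memory. This is an illustrative model under those assumptions, not the region-formation analysis of Chapter 3. A region is cut when the next instruction would overwrite one of the region's live-in registers, and an instruction that overwrites its own live-in source (such as R1 = R1 + 1) is left outside any region, so the next region starts at the following instruction. The `Insn` encoding and all function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical three-address instruction over registers R0-R3:
 * dst == -1 means no register is written; a source of -1 is unused
 * (immediate operands are not modeled). */
typedef struct {
    int dst;
    int src[2];
} Insn;

/* Registers read by `in` that the region has not yet defined; each such
 * register is live-in to the region. */
static unsigned reads_live_in(const Insn *in, unsigned defined) {
    unsigned reads = 0;
    for (int s = 0; s < 2; s++)
        if (in->src[s] >= 0 && !(defined & (1u << in->src[s])))
            reads |= 1u << in->src[s];
    return reads;
}

/* Adding `in` to a region (described by `defined` and `live`) makes the
 * region non-idempotent if `in` writes a register that is live-in. */
static int clobbers_live_in(const Insn *in, unsigned defined, unsigned live) {
    unsigned live_after = live | reads_live_in(in, defined);
    return in->dst >= 0 && (live_after & (1u << in->dst)) != 0;
}

/* Greedy scan over insns[0..n). Writes the index of each region's entry
 * instruction to starts[] (which must have room for n entries) and returns
 * the number of regions formed. */
static size_t form_regions(const Insn *insns, size_t n, size_t *starts) {
    size_t count = 0;
    int open = 0; /* is a region currently being grown? */
    unsigned defined = 0, live = 0;
    for (size_t i = 0; i < n; i++) {
        const Insn *in = &insns[i];
        if (open && clobbers_live_in(in, defined, live))
            open = 0; /* adding `in` would break idempotence: cut here */
        if (!open) {
            if (clobbers_live_in(in, 0, 0))
                continue; /* not idempotent even on its own: skip it */
            starts[count++] = i; /* a new region begins at `in` */
            open = 1;
            defined = live = 0;
        }
        live |= reads_live_in(in, defined);
        if (in->dst >= 0)
            defined |= 1u << in->dst;
    }
    return count;
}
```

On the sequence R2 = R0 + R1; R0 = R2; R3 = R0 + R1; R1 = R1 + 1; R2 = R3, the scan forms three regions, starting at instructions 0, 1, and 4: the second instruction overwrites the live-in R0 and so opens a new region, and the fourth instruction is skipped because it clobbers its own input.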
