






Defect Tolerance in VLSI Circuits

Topics covered: duplication with comparison, duplication with complementary logic, Boolean duality, time redundancy, alternating logic, recomputing with shifted operands (RESO), and self-checking logic.
We will consider several redundancy techniques to tolerate defects in VLSI circuits.

The first technique, duplication with comparison, duplicates a given module and compares the outputs of the resulting two modules. As long as the comparator works correctly, a failure of either module is detected. The problems with duplication with comparison are twofold: (1) the comparator itself may fail, and (2) the approach assumes that only one of the two modules fails at any given time; that is, it ignores common-mode failures, which cause both modules to fail in the same fashion at the same time. The design must therefore be modified to minimize the effect of common-mode failures.
One technique for tackling common-mode failures in VLSI circuits is complementary logic, where one circuit uses positive logic (logic 1) while the other uses negative logic (logic 0). If we know the Boolean function realized by a circuit using positive logic, we can easily determine the function realized by the same circuit using negative logic through the concept of duality.
Recall from Boolean algebra that the dual of a Boolean function is formed by replacing AND operations with OR operations, OR operations with AND operations, 1s with 0s, and 0s with 1s; the variables and complement operations are unchanged. For example, consider the function

$f(x_1, x_2, x_3) = x_1 \bar{x}_2 + x_3$

The dual of the function f is given by

$f_d(x_1, x_2, x_3) = (x_1 + \bar{x}_2)\, x_3$

We can use the dual function $f_d$ to obtain the complement of f by replacing each variable in $f_d$ with its complement:

$\bar{f}(x_1, x_2, x_3) = f_d(\bar{x}_1, \bar{x}_2, \bar{x}_3) = (\bar{x}_1 + x_2)\, \bar{x}_3$
Let $X = (x_1, x_2, \ldots, x_n)$ be a vector of n input bits. If we apply X to an arbitrary Boolean function f and apply $\bar{X} = (\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n)$ to the function $f_d$, where f and $f_d$ are duals, the resulting outputs are complementary. That is, $f_d(\bar{X}) = \bar{f}(X)$.
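This duality identity can be checked exhaustively for the example function above. The following sketch (the function names are ours, not from the notes) enumerates all eight input vectors and confirms that $f_d(\bar{X}) = \bar{f}(X)$:

```python
# Check the duality property f_d(X complement) == complement of f(X)
# for the example f(x1, x2, x3) = x1*NOT(x2) + x3 and its dual
# f_d(x1, x2, x3) = (x1 + NOT(x2)) * x3.
from itertools import product

def f(x1, x2, x3):
    # f = (x1 AND NOT x2) OR x3
    return (x1 and not x2) or x3

def f_dual(x1, x2, x3):
    # Dual: AND and OR swapped; variables and complements unchanged.
    return (x1 or not x2) and x3

# For every input vector X, applying the bitwise complement of X to the
# dual must yield the complement of f(X).
for x in product([False, True], repeat=3):
    x_comp = tuple(not b for b in x)
    assert f_dual(*x_comp) == (not f(*x))
```

Since the check runs over the full truth table, it constitutes a proof of the identity for this particular function.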
∗These notes are adapted from: B. W. Johnson, Design and Analysis of Fault Tolerant Digital Systems, Addison-Wesley.
Fig. 1: Implementation of the function $f(x_1, x_2, x_3) = x_1 x_2 \bar{x}_3 + \bar{x}_1 \bar{x}_2 x_3$ and its dual.
Complementary logic can be used to implement a duplication with comparison approach to fault detection. Rather than using exact replicas of each module, the modules are designed as duals of each other. One module operates using positive logic and the other operates using negative logic. If both modules are operating properly, the outputs will be complementary.
There are three advantages of using complementary logic: (1) The use of dual implementations forces the use of separate masks to create the two modules. The possibility of common-mode failures resulting from design mistakes or mask problems is reduced. (2) The voltage transitions on the corresponding lines in the two modules are in opposite directions, and so, the possibility of faults that are sensitive to voltage transitions producing identical effects is reduced. (3) Corresponding lines in the two modules are always at different voltage levels, and so, a short between two such lines always results in one of the two lines having an erroneous value and the other line having the correct value. Consequently, the fault can be detected.
Let us apply the duplicate-and-compare scheme and the concept of complementary logic to realize the function

$f(x_1, x_2, x_3) = x_1 x_2 \bar{x}_3 + \bar{x}_1 \bar{x}_2 x_3$

The dual of f is given by

$f_d(x_1, x_2, x_3) = (x_1 + x_2 + \bar{x}_3)(\bar{x}_1 + \bar{x}_2 + x_3)$
Fig. 1 shows the logic diagrams of the circuits that realize f and fd, respectively. The original function and its dual are now operated in parallel using complementary input combinations, as shown in Fig. 2. Logic values on corresponding lines in the two modules are complementary. The outputs, in the fault-free case, will also be complements, and can be compared to detect faults.
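The scheme can be simulated directly: feed X to the true-logic module, $\bar{X}$ to the dual module, and flag a fault whenever the two outputs fail to be complementary. In the sketch below the injected fault (the dual module's output stuck at 0) is an illustrative assumption, not a fault model from the notes:

```python
# Duplication with complementary logic for the example function
# f(x1,x2,x3) = x1*x2*NOT(x3) + NOT(x1)*NOT(x2)*x3.
from itertools import product

def f(x1, x2, x3):
    return (x1 and x2 and not x3) or (not x1 and not x2 and x3)

def f_dual(x1, x2, x3):
    # Dual of f: ANDs and ORs swapped, literals unchanged.
    return (x1 or x2 or not x3) and (not x1 or not x2 or x3)

def check(x, dual_stuck_at_0=False):
    """Return True (fault-free indication) iff the outputs are complementary."""
    out_true = f(*x)
    out_dual = False if dual_stuck_at_0 else f_dual(*(not b for b in x))
    return out_true != out_dual

# Fault-free: every input vector passes the complement check.
assert all(check(x) for x in product([False, True], repeat=3))
# With the dual module's output stuck at 0, any input with f(X) = 0 yields
# two identical 0 outputs, so the comparator detects the fault.
assert not check((False, False, False), dual_stuck_at_0=True)
```

Note that the stuck output escapes detection only on inputs where f(X) = 1, which is why detection latency depends on the applied input sequence.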
One of the problems with the duplicate-and-compare approach is the penalty paid in extra hardware. Time redundancy is a way to decrease the hardware overhead needed to achieve fault detection (or fault tolerance), at the expense of additional time. The basic concept of time redundancy is to repeat computations in a way that allows faults (both transient and permanent) to be detected. To detect permanent faults, the computations must be modified when they are performed the second time; one such approach is alternating logic.
Fig. 4: If time redundancy is used to tolerate permanent faults, the computations must be modified when they are performed the second time.
A combinational circuit is said to be self-dual if $f(X) = \bar{f}(\bar{X})$, where f is the Boolean expression for the circuit and X is the input vector. In other words, a combinational circuit is self-dual if its output for the input vector X is the complement of its output for the input vector $\bar{X}$. For a self-dual circuit, therefore, the application of an input X followed by the input $\bar{X}$ produces outputs that alternate between 1 and 0. The key to detecting a fault is that at least one input combination exists for which the fault does not result in alternating outputs. The full-adder circuit shown in Fig. 5 is a self-dual circuit.
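The full adder's self-duality can be verified exhaustively, since both outputs (sum and carry) must satisfy the self-dual property:

```python
# Exhaustive check that the full adder is self-dual: for every input X,
# each output for X must be the complement of that output for NOT(X).
from itertools import product

def full_adder(a, b, cin):
    s = a ^ b ^ cin                         # sum bit
    cout = (a & b) | (b & cin) | (a & cin)  # carry-out (majority function)
    return s, cout

for bits in product([0, 1], repeat=3):
    comp = tuple(1 - b for b in bits)
    s1, c1 = full_adder(*bits)
    s2, c2 = full_adder(*comp)
    assert s1 == 1 - s2 and c1 == 1 - c2    # outputs alternate
```

Both XOR and the 3-input majority function are self-dual, which is why the full adder needs no extra control input.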
Any combinational circuit with n inputs can be transformed into a self-dual circuit with no more than n + 1 input variables. The dual $f_d$ of an n-variable function f is given by

$f_d = \bar{f}(\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n)$

The function given by

$f_{sd} = x_{n+1} f + \bar{x}_{n+1} f_d$

is a self-dual function because when $x_{n+1} = 1$, that is, when $X = (x_1, x_2, \ldots, x_n, 1)$, the value of $f_{sd}$ is f, and when we provide $\bar{X} = (\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n, 0)$, the value of $f_{sd}$ is $f_d$. Thus, $x_{n+1}$ is a control line that determines which of the two functions, f or $f_d$, appears on the output line.
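The construction can be demonstrated on a small function that is not itself self-dual; here we take $f(x_1, x_2) = x_1 x_2$ (our choice of example, not one from the notes):

```python
# The self-dual construction f_sd = x3*f + NOT(x3)*f_d applied to
# f(x1, x2) = x1 AND x2, whose dual is x1 OR x2. Here x3 plays the
# role of the control input x_{n+1}.
from itertools import product

def f(x1, x2):
    return x1 and x2

def f_dual(x1, x2):
    return x1 or x2          # dual of AND is OR

def f_sd(x1, x2, x3):
    # Control line x3 selects f (x3 = 1) or f_d (x3 = 0).
    return (x3 and f(x1, x2)) or (not x3 and f_dual(x1, x2))

# f_sd is self-dual: f_sd(X) is the complement of f_sd(NOT X) for all X.
for x in product([False, True], repeat=3):
    assert f_sd(*x) != f_sd(*(not b for b in x))
```

The cost of the transformation is one extra input and the hardware to realize both f and its dual.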
Alternating logic detects a set of faults if, for every fault in the set, there is at least one input combination that produces non-alternating outputs. Fig. 6 shows the resulting truth table for the various stuck-at-1 and stuck-at-0 faults in the full-adder circuit of Fig. 5. As we can see, each stuck-type fault results in at least one set of non-alternating outputs for complementary inputs at either the carry or the sum output.
When using alternating logic, it is important to note that faults may not be detected immediately. For example, suppose the full adder contains a stuck-at-0 fault on line D. As the truth table shows, the sum output is not affected by this fault, so we must depend on the carry output to detect it. The carry output, however, still alternates for the complementary input pairs (000)/(111) and (001)/(110). The fault D/0 is therefore not detected until the input pair (010)/(101) or (011)/(100) is applied to the circuit. Depending on when these combinations actually occur, the time elapsed before the fault is detected can be significant.
Fig. 5: A full-adder is a self-dual circuit. Complementary inputs produce complementary outputs.
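The detection mechanism can be sketched in simulation. The internal line D from Fig. 5 is not reproduced here; instead, as an illustrative stand-in, we inject a stuck-at-0 fault on primary input a and ask which complementary input pairs fail to alternate:

```python
# Alternating-logic fault detection on a full adder. The injected fault
# (input line a stuck at 0) is an illustrative assumption; the notes
# discuss a fault on an internal line D instead.
from itertools import product

def full_adder(a, b, cin, a_stuck_at_0=False):
    if a_stuck_at_0:
        a = 0                                # injected stuck-at-0 fault
    s = a ^ b ^ cin
    cout = (a & b) | (b & cin) | (a & cin)
    return s, cout

def non_alternating_inputs(fault):
    """Inputs X for which the pair (X, NOT X) fails to alternate on some output."""
    bad = []
    for x in product([0, 1], repeat=3):
        xc = tuple(1 - b for b in x)
        s1, c1 = full_adder(*x, a_stuck_at_0=fault)
        s2, c2 = full_adder(*xc, a_stuck_at_0=fault)
        if s1 == s2 or c1 == c2:             # an output failed to alternate
            bad.append(x)
    return bad

# Fault-free: every complementary pair alternates (the circuit is self-dual).
assert non_alternating_inputs(False) == []
# With the fault present, some pairs do not alternate, exposing the fault.
assert non_alternating_inputs(True) != []
```

As in the D/0 discussion above, the fault remains latent until one of the exposing pairs actually appears in the input stream.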
Another form of time redundancy, called recomputing with shifted operands (RESO), was developed as a method to detect errors in arithmetic logic units (ALUs). (RESO is discussed on page 160 of the textbook.)
We will illustrate RESO using the example of an n-bit ripple-carry adder. Suppose that the i-th full-adder cell (or bit slice) is faulty and produces an erroneous output at that bit position. During the first computation, when the operands are not shifted, the i-th output of the circuit is erroneous. When the input operands are shifted left by one bit, the faulty bit slice operates on, and may corrupt, the (i − 1)-th bit. When the result is shifted back to the right, the two results (the first with unshifted operands and the second with shifted operands) are either both correct, or they disagree in the i-th bit, the (i − 1)-th bit, or both.
Suppose we compute R = A + B and the i-th full adder is faulty. When the operands are unshifted,

$R_{\text{fault-free}} = r_n r_{n-1} \ldots r_i r_{i-1} \ldots r_1 r_0$
$R_{\text{faulty}} = r_n r_{n-1} \ldots r_i^{*} r_{i-1} \ldots r_1 r_0$   (1)

where $r_i^{*}$ is the erroneous result bit produced by the faulty bit slice. A faulty bit slice can have one of three effects: the sum bit can be stuck at 0 or 1, the carry bit can be stuck at 0 or 1, or both the sum bit and the carry bit may be in error. The following table shows the effect of each possible error on the sum R.
Fig. 7: The ALU structure using RESO.
Summarizing, when the operands are shifted by two, the result will be incorrect by one of $\{0, \pm 2^{i-2}, \pm 2^{i-1}, \pm 3 \cdot 2^{i-2}\}$. Comparing the two tables, we see that the results of the two computations (the unshifted one and the one with operands shifted by two) cannot agree unless both are correct.
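A simplified RESO run can be simulated in software. The widths and the injected fault (the sum bit of one slice stuck at 0) are illustrative assumptions; the point is that the unshifted and shifted computations disagree whenever the fault corrupts either result:

```python
# Simplified RESO sketch on a ripple-carry adder with one faulty bit slice
# (its sum output stuck at 0, an illustrative fault model).
N = 8             # operand width (assumed)
FAULTY_SLICE = 3  # index of the faulty full-adder cell (assumed)

def ripple_add(a, b, width):
    """Ripple-carry addition; the sum bit of FAULTY_SLICE is stuck at 0."""
    carry, result = 0, 0
    for i in range(width):
        abit, bbit = (a >> i) & 1, (b >> i) & 1
        s = abit ^ bbit ^ carry
        if i == FAULTY_SLICE:
            s = 0                                    # injected stuck-at-0 fault
        carry = (abit & bbit) | (bbit & carry) | (abit & carry)
        result |= s << i
    return result

def reso_add(a, b):
    r1 = ripple_add(a, b, N + 2)                     # unshifted computation
    r2 = ripple_add(a << 2, b << 2, N + 4) >> 2      # shifted by two, realigned
    return r1, r2, r1 == r2

# 8 + 0 excites the faulty slice in the unshifted pass only, so the two
# results disagree and the fault is detected.
r1, r2, agree = reso_add(0b1000, 0b0000)
assert not agree
```

Shifting moves the operand bits away from the faulty slice, so the same physical fault lands on different result bits in the two passes, which is exactly why agreement implies correctness.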
The structure of an ALU that uses the RESO technique is shown in Fig. 7. The additional hardware required consists of the three shifters, the storage register that holds the result of the first computation, and the comparator. Also, the ALU must be extended by 2 bits so that the two-bit arithmetic shift can be performed without overflow.
The primary issues with the RESO approach are the additional hardware required and the lack of coverage provided for faults in the shifters and the comparator.
Self-checking logic is needed to tackle the "checking the checker" problem. In duplicate-and-compare approaches, it is necessary to compare the outputs of two modules, so the basic problem is to ensure that the comparator is fault-free, or to design a comparator that can detect its own faults, that is, a self-checking comparator. First, we define several terms that are important for understanding self-checking technology.
A circuit is said to be self-checking if it has the ability to detect the existence of a fault without the need for any externally applied stimulus (like what is done in circuit testing). In other words, a self-checking circuit determines if it contains a fault during the normal course of its operation. Self-checking logic is typically designed using coding techniques where the basic idea is to design a circuit that, when fault free and presented with a valid input code word, will produce the correct output code word. If a fault exists, however, the circuit should produce an invalid output code word so that the fault can be detected.
Fig. 8: Basic structure of a TSC circuit.
A circuit is said to be fault secure for a set of faults if, for every fault in the set, it responds to any valid input code word by either producing the correct code word or producing a non-code word. In other words, if the circuit is fault secure, then a fault either has no effect on the output or affects it in such a way that the output becomes an invalid code word.
The general structure of a totally self-checking (TSC) circuit is shown in Fig. 8. During normal operation, coded inputs are applied to the circuit and coded outputs are produced. Note that, rather than a single-bit output providing a "faulty" or "not faulty" indication, the checker's output consists of two bits that are: (1) complementary if the input to the checker is a valid code word and the checker is fault-free, or (2) non-complementary if the input to the checker is not a valid code word or the checker contains a fault. An obvious reason for using two checker outputs is to overcome the problem of a checker output becoming stuck at either the logic 0 or the logic 1 value.
The most common TSC checker is the two-rail checker shown in Fig. 9. The two-rail checker is used to compare two words that would normally be complementary. If the words are complementary and the checker itself is fault-free, the outputs of the checker should also be complementary. If the two words are not complementary or the checker contains a fault, the outputs of the checker should not be complementary.
A simple design of a 2-bit TSC two-rail checker is shown in Fig. 9, where each of the two input words is two bits. The first input word is $(x_0, x_1)$ and the second is $(y_0, y_1)$. Valid code words on the inputs have $x_0 = \bar{y}_0$ and $x_1 = \bar{y}_1$. From the logic of the circuit, we see that

$e_1 = x_0 y_1 + y_0 x_1$
$e_2 = x_0 x_1 + y_0 y_1$   (3)
Fig. 10: An 8-bit TSC checker using 2-bit TSC checkers as building blocks.
If the inputs are not complementary, for example when $y_0 = x_0$ and $y_1 = x_1$, the outputs of the checker become

$e_1 = x_0 x_1 + x_0 x_1 = x_0 x_1$
$e_2 = x_0 x_1 + x_0 x_1 = x_0 x_1$   (7)

which are identical, and so the invalid input is detected.
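The code-space property of Eq. (3) can be verified by enumerating all sixteen input combinations: the outputs are complementary exactly when $(y_0, y_1)$ is the bitwise complement of $(x_0, x_1)$.

```python
# Exhaustive check of the 2-bit two-rail checker of Eq. (3):
# e1 = x0*y1 + y0*x1, e2 = x0*x1 + y0*y1.
from itertools import product

def two_rail(x0, x1, y0, y1):
    e1 = (x0 and y1) or (y0 and x1)
    e2 = (x0 and x1) or (y0 and y1)
    return e1, e2

for x0, x1, y0, y1 in product([False, True], repeat=4):
    valid = (y0 == (not x0)) and (y1 == (not x1))
    e1, e2 = two_rail(x0, x1, y0, y1)
    # Complementary outputs if and only if the input is a valid code word.
    assert (e1 != e2) == valid
```

This confirms the code-disjoint behavior: every non-code input drives the outputs to identical values, which is the checker's error indication.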
We can also show that the TSC circuit is fault secure with respect to single stuck-line faults, and also satisfies the self-testing property. The proof is left to the reader.
Finally, it is possible to create TSC two-rail checkers with a larger number of input bits using the circuit in Fig. 9 as the basic building block. Fig. 10 shows a hierarchical construction of an 8-bit TSC checker from 2-bit TSC checkers. The notation $e_i^j$ denotes the i-th error signal from the j-th checker, and $e_1$ and $e_2$ denote the primary error-signal outputs. The four checkers at the first level of the hierarchy each compare 2 bits from the 8-bit operands and each produce two error signals. Checkers at the second and third levels of the hierarchy verify that the error signals from the checkers at the level below are indeed complementary.
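The tree structure of Fig. 10 can be sketched as follows. Each level pairs up the $(e_1, e_2)$ outputs of the level below, which in the fault-free case form valid two-rail inputs themselves (the function names are ours):

```python
# Hierarchical two-rail checking in the style of Fig. 10: 2-bit checkers
# at the first level compare the operand bits; higher levels compare the
# error-signal pairs, which are themselves two-rail encoded.
def two_rail(x0, x1, y0, y1):
    e1 = (x0 and y1) or (y0 and x1)
    e2 = (x0 and x1) or (y0 and y1)
    return e1, e2

def check_words(x, y):
    """Compare an n-bit word x against its supposed complement y (n even)."""
    # Level 1: one 2-bit checker per pair of operand bits.
    pairs = [two_rail(x[i], x[i + 1], y[i], y[i + 1]) for i in range(0, len(x), 2)]
    # Higher levels: merge two (e1, e2) pairs per checker until one remains.
    while len(pairs) > 1:
        pairs = [two_rail(pairs[i][0], pairs[i + 1][0],
                          pairs[i][1], pairs[i + 1][1])
                 for i in range(0, len(pairs), 2)]
    return pairs[0]                      # primary error signals (e1, e2)

x = [1, 0, 1, 1, 0, 0, 1, 0]
y = [1 - b for b in x]
e1, e2 = check_words(x, y)               # complementary: no error
```

A non-code condition at any first-level checker drives its outputs to identical values, and that identical pair is itself a non-code input to the next level, so the error indication propagates to the primary outputs.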
A natural feature of the two-rail checker is the requirement that the two input operands be complementary in the fault-free case. If we simply use duplication with comparison, the output from one of the modules must be inverted before the checking process.