




























































































Abstract
This report describes and analyzes the MD6 hash function and is part of our submission package for MD6 as an entry in the NIST SHA-3 hash function competition^1. Significant features of MD6 include:
A cryptographic hash function h maps an input M (a bit string of arbitrary length) to an output string h(M) of some fixed bit-length d. Cryptographic hash functions have many applications; for example, they are used in digital signatures, time-stamping methods, and file modification detection methods. To be useful in such applications, the hash function h must not only provide fixed-length outputs, but also satisfy some (informally stated) cryptographic properties:
A history of cryptographic hash functions can be found in Menezes et al. [63, Ch. 9]; a more recent survey is provided by Preneel [80]. The purpose of this report, however, is to describe and analyze the MD6 hash function, not to survey the prior art (which is considerable). Some readers may find it helpful to begin their introduction to MD6 by reviewing the PowerPoint slides:
http://group.csail.mit.edu/cis/md6/Rivest-TheMD6HashFunction.ppt
from Rivest’s CRYPTO’08 invited talk.
1.1 NIST SHA-3 competition
This document is part of our submission of MD6 to NIST for the SHA-3 competition [70]. We have attempted to respond to all of the requirements and requests for information given in the request for candidate SHA-3 algorithm nominations. This report does not contain computer code implementing MD6 or other documents relevant to our submission. These can all be found in our submission package to NIST, and on our web site:
http://groups.csail.mit.edu/cis/md6
Updated versions of this report, and other MD6-related materials, may also be available on the MD6 web site.
1.2 Overview
This report is organized as follows.
Chapter 2 gives a careful description of the MD6 hash function, including its compression function and mode of operation.
Chapter 3 describes the design rationale for MD6.
Chapter 4 describes efficient software implementations of MD6, including parallel implementations on multi-core processors and on graphics processing units.
Chapter 5 describes efficient hardware implementations of MD6 on FPGAs, special-purpose multi-core chips, and ASICs.
Chapter 6 analyzes the security of the MD6 compression function.
Chapter 7 analyzes the security of the MD6 mode of operation.
Chapter 8 discusses issues of compatibility with existing standards and applications.
Chapter 9 discusses variations on the MD6 hash function; that is, how MD6 can be “re-parameterized” easily to give new hash functions in the “MD6 family”.
Appendices A and B describe the constants Q and S used in the MD6 computation.
Appendix C gives some sample computations of MD6.
Appendix D summarizes our notations.
Appendix E describes the additional documents we are submitting with this proposal.
Appendix F gives information about each of the MD6 team members, including contact information.
use of W above is an exception.) We let both A[i] and A_i denote the i-th element of A. We use 0-origin indexing. The notation A[i..j] (or A_{i..j}) denotes the subarray of A from A[i] to A[j], inclusive. MD6 is defined in a big-endian way: the high-order byte of a word is defined to be the “first” (leftmost) byte. This matches the SHA hash functions, but differs from MD5. Big-endian is also frequently known as “network order,” as Internet network protocols normally use big-endian byte ordering. We number the bytes of a word starting with byte 0 as the high-order byte, and similarly number the bits of a byte or word so that bit 0 is the most-significant bit. (The underlying hardware may be little-endian or big-endian; this doesn’t matter to us.) Other more-or-less standard notation we may use includes:
⊕: denotes the bitwise “XOR” operator on words.
∧: denotes the bitwise “AND” operator on words.
∨: denotes the bitwise “OR” operator on words.
¬x: denotes the bitwise negation of word x.
x << b: denotes x left-shifted by b bits (zeros shifting in).
x >> b: denotes x right-shifted by b bits (zeros shifting in).
x <<< b: denotes x rotated left by b bits.
x >>> b: denotes x rotated right by b bits.
||: denotes concatenation.
0x...: denotes a hexadecimal constant.
Additional notation can be found in Appendix D.
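The notation above can be made concrete with a short sketch. It assumes 64-bit words (consistent with the 16-word, 128-byte chunks used elsewhere in this report); the helper names (rotl, rotr, word_bytes, bit) are illustrative only and are not taken from the MD6 reference code.

```python
# Illustrative sketch of the word-level notation, assuming 64-bit words.
# All helper names here are hypothetical, chosen only for this example.
W = 64                      # assumed word size in bits
MASK = (1 << W) - 1         # keeps results within one word


def shl(x, b):
    """x << b: left shift, zeros shifting in, truncated to W bits."""
    return (x << b) & MASK


def shr(x, b):
    """x >> b: right shift, zeros shifting in."""
    return (x & MASK) >> b


def rotl(x, b):
    """x <<< b: rotate left by b bits."""
    b %= W
    return ((x << b) | (x >> (W - b))) & MASK


def rotr(x, b):
    """x >>> b: rotate right by b bits."""
    b %= W
    return ((x >> b) | (x << (W - b))) & MASK


def word_bytes(x):
    """Big-endian byte view of a word: byte 0 is the high-order byte."""
    return x.to_bytes(W // 8, "big")


def bit(x, i):
    """Bit i of a word, where bit 0 is the most-significant bit."""
    return (x >> (W - 1 - i)) & 1
```

Because Python integers are unbounded, every operation that can grow a value is masked back to W bits; in a language with native 64-bit unsigned words the masks would be unnecessary.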
2.2 MD6 Inputs
This section describes the inputs to MD6. Two inputs are mandatory, while the other three inputs are optional.
M – the message to be hashed (mandatory).
d – message digest length desired, in bits (mandatory).
K – key value (optional).
L – mode control (optional).
r – number of rounds (optional).
The only mandatory inputs are the message M to be hashed and the desired message digest length d. Each optional input takes a default value when no value is supplied. We let H denote the MD6 hash function; subscripts may be used to indicate MD6 parameters.
The first mandatory input to MD6 is the message M to be hashed, which is a sequence of bits of some finite length m, where
0 ≤ m < 2^64.
In accordance with the NIST requirements, the length m of the input message M is measured in bits, not bytes, even though in practice an input will typically consist of some integral number (m/8) of bytes. The length m does not need to be known before MD6 hashing can begin. The NIST API for SHA-3^1 provides the input message sequentially in an arbitrary number of pieces, each of arbitrary size, through an Update routine. A call to Final then signals that the input has ended and that the final hash value is desired. MD6 is tree-based and highly parallelizable. If the entire message M is available initially, then a number of different processors may begin hashing operations at a variety of starting points within the message; their results may then be combined.
The second input to MD6 is the desired bit-length d of the hash function output, where 0 < d ≤ 512. The value d must be known at the beginning of the hash computation, as it not only determines the length of the final MD6 output, but also affects the MD6 computation at every intermediate operation. Changing the value of d should result in an “entirely different” hash function: not only will the output now have a different length, but its value should appear to be unrelated to hash values computed for the same message for other values of d. MD6 naturally supports the digest lengths required for SHA-3: d = 224, 256, 384 and 512 bits, as they are within the allowable range for d.
Often it is desirable to work with a family {H_{d,K}} of hash functions, indexed not only by the digest size d but also by a key K drawn from some finite set. These instances should appear to be unrelated: the behavior of H_{d,K} should not have any discernible relation to that of H_{d,K′}, for K ≠ K′. The MD6 user may provide a K of keylen bytes, for any key length keylen, where 0 ≤ keylen ≤ 64.
(^1) http://csrc.nist.gov/groups/ST/hash/sha-3/Submission_Reqs/crypto_API.html
the default in order to represent a value “sufficiently large” that the sequential mode of operation is never invoked. Section 2.4 gives more details on MD6’s mode of operation.
The MD6 compression function f has a controllable number r of rounds. Roughly speaking, each round corresponds to one clock cycle in a typical hardware implementation, or 16 steps in a software implementation. The default value of r is
r = 40 + ⌊d/4⌋ ; (2.1)
so H_{d,K,L} = H_{d,K,L,40+⌊d/4⌋}. For d = 160, MD6 thus has a default of r = 80 rounds; for d = 512, MD6 has a default of r = 168 rounds. One may increase r for increased security, or decrease r for improved performance, trading off security for performance. However, we also require that r ≥ 80 when MD6 is used in keyed mode. This provides protection for the key, even when the desired output is short (as it might be for a MAC). Thus, when MD6 has a nonempty key, the default value of r is
r = max(80, 40 + ⌊d/4⌋). (2.2)
The round parameter r is exposed in the MD6 API, so it may be explicitly varied by the user. This is done since reduced-round versions of MD6 may be of interest for security analysis, or for applications with tight timing constraints and reduced security requirements. Or, one could increase r above the default to accommodate various levels of paranoia. Also, if there is a key, but it is non-secret, then fewer than 80 rounds could be specified if desired. Arguably, the current effort to develop a new hash function standard might have been unnecessary had the API for the prior standards included a variable number of rounds.
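The default-round rules (2.1) and (2.2) can be sketched as a small function. This is only an illustration of the arithmetic; the function name and the range checks on d and keylen (taken from the input descriptions in Section 2.2) are assumptions of this example, not part of the MD6 API.

```python
def default_rounds(d, keylen=0):
    """Default number of rounds r for digest length d (bits) and key length keylen (bytes).

    Hypothetical helper: follows r = 40 + floor(d/4), raised to at least 80
    when a nonempty key is present.
    """
    if not 0 < d <= 512:
        raise ValueError("digest length d must satisfy 0 < d <= 512")
    if not 0 <= keylen <= 64:
        raise ValueError("key length must satisfy 0 <= keylen <= 64")
    r = 40 + d // 4            # equation (2.1)
    if keylen > 0:
        r = max(80, r)         # equation (2.2): keyed mode needs r >= 80
    return r
```

For example, default_rounds(160) gives 80 and default_rounds(512) gives 168, matching the values quoted above; a short keyed digest such as d = 128 with a 16-byte key is raised from 72 to 80 rounds.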
There are other parameters to the MD6 hash function that could also be varied, at least in principle (e.g. w, Q, c, t_0 ... t_5, r_i, ℓ_i, S_i for 0 ≤ i < rc). For the purpose of defining what “MD6” means, these quantities should all be considered fixed with default values as described herein. But variant MD6 hash functions that use other settings for these parameters could be considered and studied. See Chapter 9 for a description of how these parameters might be varied.
We suggest the following approach for naming various versions of MD6. In the simplest case, we only need to specify the digest size: MD6-d specifies the version of MD6 having digest size d. This version also has the zero-length key nil, L = 64 (i.e. fully hierarchical operation), and a number r of rounds that is the default for that digest size. These are the MD6 versions most relevant for SHA-3:
MD6-224 MD6-256 MD6-384 MD6-512.
Some of our experiments also consider MD6-160, as it is comparable to SHA-1. Software implementations of MD6 typically use the lower-case version of the name MD6, as in “md6sum”. Naming non-standard variants of MD6 is discussed in Section 9.1. The CRYPTO 2008 invited talk by Rivest also called MD6 the “pumpkin hash”, noting that the due date for SHA-3 submissions is Halloween 2008. One could thus also label the MD6 variants as PH-256, etc. ...
2.3 MD6 Output
The output of MD6 is a bit string D of exactly d bits in length:
D = H_{d,K,L,r}(M) ;
D is the hash value of input message M. It is also often called a “message digest.” The “MD” in the name “MD6” reflects this terminology. In some contexts, the MD6 output may be defined to include other parameters. For example, with digital signatures, a hash function needs to be applied once by the sender, and once again by the recipient, to the same message. These computations should yield the same result. For this to work, the recipient needs to know not only the message M and the message digest length d, but also the values of any of the parameters K, L, and r that have non-default values. In such applications, these parameters (other than K) could be considered as part of the hash function output. At least, they need to be communicated to the receiver along with the hash function value D, communicated in some other way from sender to receiver, or agreed in advance to have some particular non-default settings.
2.4 MD6 Mode of Operation
A hash function is typically constructed from a “compression function”, which maps fixed-length inputs to (shorter) fixed-length outputs. A “mode of operation” then specifies how the compression function can be used repeatedly to enable hashing inputs of arbitrary nonnegative length to produce a fixed-length output. To describe a hash function, one thus needs to describe:
factor of four, and then performs (if necessary) a single sequential pass to finish up. Since the input size must be less than 2^64 bits and the final compression function produces an output of 2^10 = 1024 bits (before final truncation to d bits), there will be at most 27 such parallel passes (since 27 = log_4(2^64 / 2^10)). The default value L = 64, since it is greater than 27, ensures that by default MD6 will be fully hierarchical.
Graphically, MD6 creates a sequence of 4-ary trees of height at most L, each containing 4^L leaf chunks (of c = 16 words each), then combines the values produced at their roots (if there is more than one) in a sequential Merkle-Damgård-like manner. If 4^L is larger than the number of 16-word chunks in the input message, then only one tree is created, and MD6 becomes a purely tree-based method. On the other hand, if L = 0, then no trees are created, and the input is divided into 48-word (three-chunk) data blocks to be combined in a sequential Merkle-Damgård-like manner. (There are now only three data chunks in a data block, since one chunk is the chaining variable from the previous compression operation at the node immediately to the left.) For intermediate values of L, we trade off tree height (and thus minimum memory requirements) for opportunities for parallelism (and thus perhaps greater speed).
Figure 2.4 gives the top-level procedure for the MD6 mode of operation, which is described in a bottom-up, level-by-level manner. First, all of the compression operations on level ℓ = 1 are performed. Then all of the compression operations on level ℓ = 2 are performed, etc. MD6 is described in this manner for maximum clarity. A practical implementation may be organized somewhat differently (but, of course, in a way that computes the same function). For example, operations at different levels may be intermixed, with a compression operation being performed as soon as its inputs are available. See Chapter 4 for some discussion of implementation issues.
Each such compression operation is by default performed with the PAR operation, described in Figure 2.5. The PAR operation may be implemented in parallel (hence its name). Given the data on level ℓ − 1, it produces the data on level ℓ, which will be only one-fourth as large. This is repeated until a level is reached where the remaining data is only 16 words long. This data is truncated to become the final hash output.
Figure 2.6 describes SEQ, which is very similar to Merkle-Damgård in operation. It works sequentially through the input data on the last level and produces the final hash output.
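The factor-of-four reduction performed by PAR, and the resulting bound of 27 levels, can be checked with a short sketch. It assumes the default fully hierarchical setting (L large enough that SEQ is never used) and that even an empty message occupies one padded chunk; the function names are hypothetical and the code counts chunks only, without computing any hash values.

```python
CHUNK_BITS = 1024  # one chunk = 16 words of 64 bits


def num_chunks(m_bits):
    """Number of 16-word chunks on level 0 (assumed to be at least one)."""
    return max(1, -(-m_bits // CHUNK_BITS))  # ceiling division


def par_level_sizes(m_bits):
    """Chunk counts on levels 1, 2, ... until a single 16-word chunk remains.

    Each PAR pass groups four chunks into one compression input and emits
    one chunk, so a level is one-fourth as large as the one below (rounded up).
    """
    sizes = []
    n = num_chunks(m_bits)
    while True:
        n = -(-n // 4)       # one chunk out for every four chunks in
        sizes.append(n)
        if n == 1:           # root reached: this chunk is truncated to d bits
            return sizes
```

A message of 16 full chunks gives par_level_sizes(16 * 1024) == [4, 1], a tree of height 2. For the largest permitted input, len(par_level_sizes(2**64 - 1)) is 27, in agreement with the bound derived above.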
MD6’s mode of operation formats the input to the compression function f in the following way. There are n = 89 words, formatted as follows with the default sizes. See Figure 2.7. The first four items Q, K, U , V , are “auxiliary inputs”, while the last item B is the data payload.
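As a rough illustration of this layout, the sketch below lists the five items with their default sizes and computes their word offsets within the 89-word input. The individual sizes (Q = 15, K = 8, U = 1, V = 1, B = 64 words) and the one-line descriptions are taken from the MD6 specification's default parameters rather than from the text above, so treat them as an assumption of this example; the names LAYOUT and field_offsets are hypothetical.

```python
# Default compression-input layout, in 64-bit words (sizes assumed from the
# MD6 specification's defaults; they sum to n = 89).
LAYOUT = [
    ("Q", 15),  # fixed constant
    ("K", 8),   # key (64 bytes, zero-padded)
    ("U", 1),   # unique node identifier
    ("V", 1),   # control word
    ("B", 64),  # data payload: four 16-word chunks
]


def field_offsets(layout=LAYOUT):
    """Map each field name to its (start, end) word range, end exclusive."""
    offsets, pos = {}, 0
    for name, size in layout:
        offsets[name] = (pos, pos + size)
        pos += size
    return offsets


N_WORDS = sum(size for _, size in LAYOUT)  # total input length in words
```

With these sizes the payload B occupies words 25 through 88, and the 8-word key field matches the 64-byte upper bound on keylen given in Section 2.2.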
Figure 2.1: Structure of the standard MD6 mode of operation (L = 64). Computation proceeds from bottom to top: the input is on level 0, and the final hash value is output from the root of the tree. Each edge between two nodes represents a 16-word (128-byte or 1024-bit) chunk. Each small black dot on level 0 corresponds to a 16-word chunk of input message. The grey dot on level 0 corresponds to a last partial chunk (less than 16 words) that is padded with zeros until it is 16 words long. A white dot (on any level) corresponds to a padding chunk of all zeros. Each medium or large black dot above level zero corresponds to an application of the compression function. The large black dot represents the final compression operation; here it is at the root. The final MD6 hash value is obtained by truncating the value computed there.
Figure 2.2: Structure of the MD6 sequential mode of operation (L = 0). Computation proceeds from left to right only; level 1 represents processing by SEQ. The hash function output is produced by the rightmost node on level 1. This is similar to standard Merkle-Damgård processing. The white circle at the left on level 1 is the 1024-bit all-zero initialization vector for the sequential computation at that level. Each node has four 1024-bit inputs: one from the left, and three from below; the effective “message block size” is thus 384 bytes, since 128 bytes of the 512-byte compression function input are used for the chaining variable.
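The block-size arithmetic in the caption of Figure 2.2 can be sketched as follows. The sketch assumes L = 0, byte-aligned input, and that an empty message still requires one compression call (an assumption of this example); the names are hypothetical and no hashing is performed.

```python
CHUNK_BYTES = 128                            # one 16-word chunk of 64-bit words
PAYLOAD_BYTES = 4 * CHUNK_BYTES              # 512-byte compression payload
SEQ_BLOCK_BYTES = PAYLOAD_BYTES - CHUNK_BYTES  # 384 bytes of new message per call


def seq_compressions(m_bytes):
    """Number of SEQ compression calls for an m_bytes-long message when L = 0.

    Each call consumes one 128-byte chaining value from the left plus up to
    384 bytes (three chunks) of message from below.
    """
    return max(1, -(-m_bytes // SEQ_BLOCK_BYTES))  # ceiling division, at least 1
```

For instance, a 1 MiB message (1,048,576 bytes) needs 2731 sequential compression calls under these assumptions, each of which must wait for the chaining value of its left neighbour.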