NIST Hash Workshop Liveblogging (5)

The afternoon started with three brand new hash functions: FORK-256, DHA-256, and VSH. VSH (Very Smooth Hash) was the interesting one; it’s based on factoring and the discrete logarithm problem, like public-key encryption, and not on bit-twiddling like symmetric encryption. I have no idea if it’s any good, but it’s cool to see something so different.

I think we need different. So many of our hash functions look pretty much the same: MD4, MD5, SHA-0, SHA-1, RIPE-MD, HAVAL, SHA-256, SHA-512. And everything is basically a block cipher in Davies-Meyer mode. I want some completely different designs. I want hash functions based on stream ciphers. I want more functions based on number theory.
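
For reference, the Davies-Meyer construction turns a block cipher $E$ into a compression function by using the message block as the key and XORing the chaining value back into the output:

$$H_i = E_{m_i}(H_{i-1}) \oplus H_{i-1},$$

with $H_0$ a fixed initial value and the final $H_i$ (after padding and length-strengthening) taken as the hash.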

The final session was an open discussion about what to do next. There was much debate about how soon we need a new hash function, how long we should rely on SHA-1 or SHA-256, etc.

Hashing is hard. At the ultra-high-level hand-waving level, it takes a lot more clock cycles per message byte to hash than it does to encrypt. No one has any theory to account for this, but it seems like the lack of any secrets in a hash function makes it a harder problem. This may be an artifact of our lack of knowledge, but I think there’s a grain of fundamental truth buried here.

And hash functions are used everywhere. Hash functions are the workhorse of cryptography; they’re sprinkled all over security protocols. They’re used all the time, in all sorts of weird ways, for all sorts of weird purposes. We cryptographers think of them as good hygiene, kind of like condoms.

So we need a fast answer for immediate applications.

We also need “SHA2,” whatever that will look like. And a design competition is the best way to get a SHA2. (Niels Ferguson pointed out that the AES process was the best cryptographic invention of the past decade.)

Unfortunately, we’re in no position to have an AES-like competition to replace SHA right now. We simply don’t know enough about designing hash functions. What we need is research, random research all over the map. Designs beget analyses beget designs beget analyses…. Right now we need a bunch of mediocre hash function designs. We need a posse of hotshot graduate students breaking them and making names for themselves. We need new tricks and new tools. Hash functions are a hot area of research right now, but anything we can do to stoke that will pay off in the future.

NIST is thinking of hosting another hash workshop right after Crypto next year. That would be a good thing.

I need to get to work on a hash function based on Phelix.

Posted on November 1, 2005 at 3:43 PM

Comments

Nicholas Weaver November 1, 2005 4:07 PM

Any comments on Whirlpool, Rijmnen’s (damn, can’t spell his name) hash function with its AES-like structure?

Hal November 1, 2005 4:58 PM

I would put the issue of the speed of hash vs. encryption algorithms differently. Actually, I don’t think there is a substantial, observable speed difference between the broad family of hash functions and encryption functions currently in existence. Now, AES has been heavily optimized for various architectures, and it may be faster than most hashes. But if you look at the benchmark page at http://www.eskimo.com/~weidai/benchmarks.html you will see a great deal of overlap between the hash and encryption algorithm benchmarks. Among the hashes we have SHA-1 at 70, SHA-256 at 40, Tiger at 40, RIPE-MD160 at 50 (all figures in MB/s rounded to the nearest 10). We’ll ignore MD5 at 220(!) because it’s too broken. Among ciphers we have DES at 20, IDEA at 20, Blowfish at 60, CAST-128 at 40, Twofish at 30, Serpent at 20, and AES-128 at 60 (I think faster versions exist).
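
For anyone who wants to sanity-check figures like these on their own machine, here is a rough, hypothetical timing sketch. It is not the Crypto++ benchmark suite linked above; the hashes come from Python’s hashlib, the AES comes from the third-party pyca/cryptography package, and the absolute numbers will vary with CPU and library.

```python
# Rough throughput comparison of a few hashes vs. AES on a large buffer.
# Illustrative only; numbers depend heavily on hardware and library versions.
import time
import hashlib
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

DATA = b"\x00" * (16 * 1024 * 1024)  # 16 MB test buffer

def throughput(label, fn):
    start = time.perf_counter()
    fn()
    elapsed = time.perf_counter() - start
    print(f"{label:12s} {len(DATA) / (1024 * 1024) / elapsed:7.1f} MB/s")

throughput("SHA-1", lambda: hashlib.sha1(DATA).digest())
throughput("SHA-256", lambda: hashlib.sha256(DATA).digest())
throughput("SHA-512", lambda: hashlib.sha512(DATA).digest())

aes = Cipher(algorithms.AES(bytes(16)), modes.CBC(bytes(16))).encryptor()
throughput("AES-128-CBC", lambda: aes.update(DATA))
```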

IMO there is no real evidence that ciphers are faster than hashes; however, I would turn Bruce’s comment around and say that there are hand-wavey theoretical reasons why they should be faster, just as he states: the hash function’s job is harder, because it doesn’t have any secrets which it can exploit.

It may be that when all is said and done and we have good hash and cipher algorithms, it will turn out that the hashes are much slower. Maybe our hashes today are generally weaker than our ciphers, and when we design new hashes with enough rounds and complexity to be as strong as ciphers, they will be much slower. But that is just speculation at this point.

Desi November 2, 2005 1:19 AM

I’m not all that versed in hashes, but it seems to me that a criterion for what constitutes a ‘good enough’ hash depends not just on the cryptographic security, but also on how long it needs to last. If you need a hash to live, say, 10 years, then it might be worthwhile to burn a few extra cycles on a ‘better’ hash. I was wondering whether creating a longer-lived hash by simply concatenating two (or more) hashes, each created by a different algorithm, would be any good from a cryptographic point of view. After all, modifying a document while matching one hash may be relatively easy, but modifying a document while keeping two (or more) hashes invariant seems quite another thing. Or is the hashing problem we’re facing not big enough that this amount of extra effort is justified?
Note that you could easily make a relatively cheap cryptographic chip that could calculate such double (or triple) hashes in the same time it takes to calculate a hash as we know it today. That would be an investment on the order of $100 in hardware in a PC (if the chip existed).
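
A minimal sketch of the concatenation idea, assuming nothing beyond Python’s standard hashlib. The particular pair of algorithms is an arbitrary example, and whether the combination actually buys extra security is exactly the question being asked above.

```python
# Concatenate digests from two different algorithms so a forger has to keep
# both invariant at once. SHA-256 and BLAKE2b are arbitrary stand-ins here
# (BLAKE2 postdates this 2005 discussion); substitute any two functions.
import hashlib

def double_hash(data: bytes) -> bytes:
    """Return SHA-256(data) || BLAKE2b(data) as one combined digest."""
    return hashlib.sha256(data).digest() + hashlib.blake2b(data).digest()

combined = double_hash(b"the document to be signed")
print(len(combined), combined.hex())  # 32 + 64 = 96 bytes of digest
```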

Pascal Junod November 2, 2005 1:20 AM

“So many of our hash functions look pretty much the same: RC4, RC5, SHA-0, SHA-1, RIPE-MD, HAVAL, SHA-256, SHA-512.”

Hmmm, find the two intruders in that list…

Clive Robinson November 2, 2005 7:47 AM

@Bruce

“it takes a lot more clock cycles per message byte to hash than it does to encrypt. No one has any theory to account”

At a hand-waving level 😉

If you think of all block encryption as a (real) map, and each step in the algorithm as the steps of a drunk, you have to guess the bar/pub they started in and how many steps they have taken to get to where they currently are. The more steps, the more difficult it becomes to find the drunk.

A hash, however, is like knowing which bar the drunk does their drinking in; their steps are therefore much more predictable. So the drunk needs to take many, many more steps to get to the same level of “being lost”.

If you also think of multiplication etc. as consisting of little steps, you will probably find that VSH needs a similar increase in the number of steps over any other log-based encryption system…

It would be nice to find out that this is not the case, but my gut feeling says otherwise.

Mike Sherwood November 2, 2005 7:56 AM

Reading the VSH paper raises an interesting question: do we really need hashes to be generic? It’s more convenient if hash(foo) always has the same value, but hash(foo,key) adds a nice level of complexity to the problem. Unless the algorithm has exploitable weaknesses, it forces an attacker to identify a specific target. Including something like a public key with the hash output doesn’t seem like too big of an imposition.
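
As a concrete (and purely illustrative) reading of hash(foo, key): HMAC is one standard, well-analyzed way to mix a key into a hash. It is not what VSH does and not necessarily what the comment has in mind; it is just the most common instantiation of a keyed hash, with a made-up placeholder key.

```python
# A minimal sketch contrasting hash(foo) with one possible hash(foo, key).
# HMAC is used here purely as an illustrative keyed-hash construction.
import hashlib
import hmac

document = b"the document to protect"
key = b"per-recipient or per-application key"  # hypothetical key material

plain = hashlib.sha256(document).hexdigest()                 # hash(foo)
keyed = hmac.new(key, document, hashlib.sha256).hexdigest()  # hash(foo, key)

print("hash(foo)      =", plain)
print("hash(foo, key) =", keyed)
```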

I like the idea of a digital signature that doesn’t have the same vulnerabilities of the underlying hash function. I don’t know if VSH has the potential to be that, but it seems like a start in the right direction.

Zooko November 2, 2005 10:44 AM

Don’t forget to look at the amd64 benchmarks. The amd64 architecture is fast becoming standard on the desktop and server.

http://www.eskimo.com/~weidai/amd64-benchmarks.html

Here’s Hal’s summary with amd64 numbers:

SHA-1 at 100, SHA-256 at 60, Tiger at 210 (!!!), RIPE-MD160 at 70 (all figures in MB/s rounded to nearest 10). We’ll ignore MD5 at 150(!) because it’s too broken. Among ciphers we have DES at 25, IDEA at 24, Blowfish at 35, CAST-128 at 51, Twofish at 56, Serpent at 36, and AES-128 at 50

Look at Tiger! Too bad that 64-bit machines aren’t the machines on which I encounter crypto performance bottlenecks. 😉 But I wonder why Tiger is being overlooked in the current hash function fest. Its dramatically superior performance on amd64 and its relatively advanced age should mark it out for special attention.

Zooko November 2, 2005 10:55 AM

Down With Hash Functions!

Here’s my contribution: it would be good for practitioners to stop using hash functions as powerful all-purpose primitives. Probably the most common use of hash functions in the wild today is for MACing. Well, nowadays we have better, faster, safer ways to do MACs. (I’m thinking of Carter-Wegman MACs, e.g. UMAC and Poly1305-AES.)

Another very widespread use of crypto hashes is for content-addressing — peer-to-peer networks, decentralized file storage, integrity checkers, decentralized revision control systems, and Sun’s new ZFS filesystem.

Linus Torvalds’ “git” was recently created from scratch, and Linus chose to use SHA-1 over the objections of the cryptographically inclined (Linus was following the example of the “monotone” decentralized revision control system). Sun’s new ZFS filesystem uses, I think, SHA-256. A perhaps underappreciated fact is that pre-image resistance is of little or no value in these applications, so a concatenation of hashes is almost certainly safer than a single hash. The peer-to-peer networks, decentralized revision control tools, and decentralized file store applications usually have under-utilized CPU which they can use for multiple hashes. The actual filesystem ZFS almost certainly doesn’t.
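
As a toy illustration of content-addressing (nothing like a real git, DHT, or ZFS implementation): objects are stored and fetched by the hash of their contents, and the hash is re-checked on read.

```python
# A toy content-addressed store: the key of each object is the hash of its
# contents. SHA-256 is an arbitrary choice here; the real systems mentioned
# above made their own choices (git: SHA-1; ZFS: SHA-256, per the comment).
import hashlib

store = {}

def put(data: bytes) -> str:
    key = hashlib.sha256(data).hexdigest()
    store[key] = data
    return key

def get(key: str) -> bytes:
    data = store[key]
    # Integrity check on read: the content must still hash to its address.
    assert hashlib.sha256(data).hexdigest() == key
    return data

addr = put(b"some file contents")
print(addr, get(addr))
```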

Perhaps cryptographers would see some new insights if they set out to design a hash function which had collision-resistance but not necessarily pre-image resistance.

What about the original raison d’etre of hash functions — public key signatures? I don’t know — I haven’t thought much about that. From my perspective hash functions are rarely used for this — their original purpose.

Alice November 2, 2005 11:23 AM

“Perhaps cryptographers would see some new insights if they set out to design a hash function which had collision-resistance but not necessarily pre-image resistance.” Zooko wrote.

I’m confused… As far as I know, collision-resistance implies other resistances.

Zooko November 2, 2005 2:25 PM

Collision-resistance implies second-pre-image resistance, but not pre-image resistance. Consider, for example, the identity function, which is collision-free and second-pre-image free, but has no pre-image resistance. The Handbook of Applied Cryptography by Menezes, van Oorschot, and Vanstone has an excellent treatment of this. It is well worth the cover price, but it is also freely available online in PDF form:

http://www.cacr.math.uwaterloo.ca/hac/

Chapter 9
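
Roughly stated (paraphrasing the standard definitions, not quoting HAC):

Pre-image resistance: given a digest y, it should be hard to find any m with H(m) = y.
Second-pre-image resistance: given m, it should be hard to find m’ ≠ m with H(m’) = H(m).
Collision resistance: it should be hard to find any pair m ≠ m’ with H(m) = H(m’).

The identity function has no collisions at all, so the last two hold trivially, but every digest is its own pre-image, so the first fails completely.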

Gargantua November 2, 2005 7:23 PM

@Bruce:
Hashing is hard. At the ultra-high-level hand-waving level, it takes a lot more clock cycles per message byte to hash than it does to encrypt. No one has any theory to account for this, but it seems like the lack of any secrets in a hash function makes it a harder problem.

I’m not so sure the lack of secrets makes hashing harder.

Consider that any cipher in a feedback mode can be turned into a hashing function like so:

  1. Always initialize the cipher with a fixed key. This is a known public value, and is arbitrary as long as it isn’t a weak key for the cipher algorithm.
  2. Digest bytes by encrypting them. Feed back the ciphertext output, in CBC or another feedback mode (pick one that’s fast for the given cipher).
  3. To finish, digest the padded length, and return the resulting ciphertext block as the hash value. The length of the hash value is always equal to the cipher’s block length, or a multiple thereof. (A rough sketch in code follows this list.)
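
Here is a rough sketch, in Python with the third-party pyca/cryptography package, of the three steps above: a fixed, public AES key, CBC feedback, and the final ciphertext block as the digest. It is illustrative only; whether the result is actually a good hash is exactly the open question posed below.

```python
# Hash-from-a-block-cipher sketch: fixed public key, CBC chaining, last
# ciphertext block returned as the digest. Illustrative only, not a
# recommendation that this is a secure hash.
import struct
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

FIXED_KEY = bytes(16)   # step 1: arbitrary, publicly known key
FIXED_IV = bytes(16)    # fixed starting chaining value
BLOCK = 16              # AES block size in bytes

def cbc_hash(message: bytes) -> bytes:
    # Step 3: append the message length, then zero-pad to a full block.
    data = message + struct.pack(">Q", len(message))
    if len(data) % BLOCK:
        data += bytes(BLOCK - len(data) % BLOCK)
    # Step 2: encrypt in CBC mode; the last ciphertext block is the digest.
    encryptor = Cipher(algorithms.AES(FIXED_KEY), modes.CBC(FIXED_IV)).encryptor()
    ciphertext = encryptor.update(data) + encryptor.finalize()
    return ciphertext[-BLOCK:]

print(cbc_hash(b"hello world").hex())
```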

Clearly, a feedback mode like CBC is preferable to all non-feedback modes, because it ensures that the output from each block is dependent on all blocks that came before. Otherwise the hash value only depends on the last block hashed, which is useless as a hash.

For the speed analysis, assume the output is a good hash (whatever “good” means). Clearly, if the cipher enciphers quickly, then the hash will also hash just as quickly. And there are just as many secrets here as in any other hash function, i.e. none, because the key is a fixed non-secret value.

But is the output always a good hash? Good question.

My gut reaction is that it should be at least a pretty good hash, otherwise the cipher itself wouldn’t be very good, for the usual cipher-analysis reasons about predictable ciphertext output and such.

However, my gut reaction could easily be wrong. If so, what does the cryptanalysis say in regard to the general strategy? That is, what characteristics of a cipher algorithm would cause it to be weak when used like this?

Anonymous November 3, 2005 6:37 AM

“And hash functions are used everywhere. Hash functions are the workhorse of cryptography; they’re sprinkled all over security protocols. They’re used all the time, in all sorts of weird ways, for all sorts of weird purposes. We cryptographers think of them as good hygiene, kind of like condoms.”

Comparing hash functions with condoms? Might I propose a new hash function: SHA-G…

Hal November 3, 2005 12:43 PM

Two more points:

As far as Niels Ferguson’s comment about the AES process being the best cryptographic invention of the past decade, I think this evaluation is a bit premature. It may be that those involved had a good feeling about the process, that it did succeed in getting our best cryptographers to work on the problem, but the bottom line is the result. Are we sitting here saying, gee, I’m sure glad that Rijndael was what was chosen as AES, what a great success that process was? I don’t think so! I detect considerable nervousness and finger-crossing that Rijndael will hold up. If anything, my perception is that the end result of the process was perhaps not as strong as it could have been. Hopefully the few hints of possible weaknesses in AES will prove to be unimportant, but it is not something we can rule out at present.

The other point was with regard to whether hashing or encryption is naturally easier. Bruce suggested and I agreed that encryption might be easier because it has a secret input, so it is not as “exposed” as a hash. However there is one big advantage a hash has. A hash should be a random function, while an encryption has to be a random permutation. Random functions greatly outnumber random permutations. If we identify the subset of functions with an appropriate domain and range to be a hash or an encryption function, and we pick one at random, it will probably be a good hash function, but it will almost certainly not be a good encryption function. In this sense hashes ought to be easier to find than encryptions.
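
To put rough numbers on that last point (a standard back-of-the-envelope count, restricted to functions from n-bit blocks to n-bit blocks for a like-for-like comparison):

$$\#\{\text{functions}\} = (2^n)^{2^n}, \qquad \#\{\text{permutations}\} = (2^n)! \approx \left(\tfrac{2^n}{e}\right)^{2^n}\sqrt{2\pi \cdot 2^n},$$

so permutations make up only about an $e^{-2^n}$ fraction of all such functions; a function chosen at random is essentially never a permutation.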

Bruce Schneier November 3, 2005 5:20 PM

“comparing Hash functions with condoms? Might I propose a new hash function: SHA-G…”

We’ll come up with an algorithm that’s ribbed for your security.

Ilya November 3, 2005 7:53 PM

“I want some completely different designs. I want hash functions based on a stream ciphers. I want more functions based on number theory.”

Yes indeed it would be really interesting.


Offtopic: Your Web site seems to be inaccessible from some IP ranges in Singapore. It is not totally blocked, but a few common ports are filtered out (including the HTTP one) while the rest are not. Out of curiosity: is it you doing this intentionally, or is Singtel just upset about you? 🙂

Bruce Schneier November 3, 2005 8:57 PM

“Offtopic: Your Web site seems to be inaccessible from some IP ranges in Singapore. It is not totally blocked, but a few common ports are filtered out (including the HTTP one) while the rest are not. Out of curiosity: is it you doing this intentionally, or is Singtel just upset about you? :)”

This is all news to me. I have no idea what’s going on, but it’s nothing on my end.

Anonymous November 3, 2005 10:38 PM

I’m not a cryptographer, but from what I have read, I think that whatever the new “SHA2” algorithm turns out to be, in the speed vs. security trade-off (where security is roughly the length of the digest) the emphasis should be mostly on security.

People have shown remarkable skill at optimizing algorithms for speed (look at all the various AES optimizations), and note that the strength of attacks goes up but the speed of hardware goes up as well. If we choose an algorithm optimized for security, one that has no mathematical or algorithmic weakness (a big IF, I know), we know that hardware speeds and programming optimizations will make speed less of an issue over time, while at the same time we can rest easy knowing that it will probably not be brute-forced or birthday-attacked in our lifetimes.

Agree? Disagree?

Clive Robinson November 4, 2005 5:36 AM

@Mike Sherwood

“It’s more convenient if hash(foo) always has the same value, but hash(foo,key) adds a nice level of complexity”

Sorry, it actually makes things a lot, lot worse.

With hash(foo) there is no degree of freedom in the hash function; therefore the result either matches or does not match when the document foo is hashed.

With hash(foo,key) you have introduced a lovely bit of freedom into the equation that anyone who wants to replace foo with snaf can exploit, i.e.

hash(foo,key1) = hash(snaf,key2)

Whereas with hash(foo) I had to spend considerable time looking for

hash(foo) = hash(snaf)

Now I have this extra degree of freedom, which would open up a whole new class of attacks that would almost certainly produce a nice new shortcut or five 🙁

I can think of a couple of ways that you might gain an advantage in message/key trade-offs to reduce the search task. This would probably mean that you would have to double the bit size of your hash function, which means you would lose any advantage you might have gained.

Also, on a more practical point, you would have to protect the integrity of your “key” value in the overall message to prevent other kinds of attack, so there is more complexity there.

Paul Crowley November 7, 2005 10:50 AM

“Sorry, it actually makes things a lot, lot worse.”

Surprised no one’s responded to this. The usual assumption is that the attacker controls both M and M’, and they have to find a pair such that H(M) == H(M’). This gives the attacker all the freedom in the world. By contrast, if you throw in a unique key when you hash, that key is not under the attacker’s control: they can choose M, but they get back H(M, K) for a K they did not choose, and have to find an M’ such that H(M’, K’) matches that.

If K is transmitted alongside H(M, K) then life is even harder, but of course that also makes the hash large; it’s not clear to me that it’s worth it, but this is not well explored territory.

pandian December 23, 2009 6:47 AM

I am a master’s engineering student. I am going to do a project on cost-efficient SHA hardware accelerators, so kindly give me some guidance; it would be a big help in finishing my project work.
