New Cryptanalytic Results Against SHA-1

Xiaoyun Wang, one of the team of Chinese cryptographers that successfully broke SHA-0 and SHA-1, along with Andrew Yao and Frances Yao, announced new results against SHA-1 yesterday at Crypto’s rump session. (Actually, Adi Shamir announced the results in their name, since she and her student did not receive U.S. visas in time to attend the conference.)

Shamir presented few details—and there’s no paper—but the time complexity of the new attack is 263. (Their previous result was 269; brute force is 280.) He did say that he expected Wang and her students to improve this result over the next few months. The modifications to their published attack are still new, and more improvements are likely over the next several months. There is no reason to believe that 263 is anything like a lower limit.

But an attack that’s faster than 264 is a significant milestone. We’ve already done massive computations with complexity 264. Now that the SHA-1 collision search is squarely in the realm of feasibility, some research group will try to implement it. Writing working software will both uncover hidden problems with the attack, and illuminate hidden improvements. And while a paper describing an attack against SHA-1 is damaging, software that produces actual collisions is even more so.

The story of SHA-1 is not over. Again, I repeat the saying I’ve heard comes from inside the NSA: “Attacks always get better; they never get worse.”

Meanwhile, NIST is holding a workshop in late October to discuss what the security community should do now. The NIST Hash Function Workshop should be interesting, indeed. (Here is one paper that examines the effect of these attacks on S/MIME, TLS, and IPsec.)

EDITED TO ADD: Here are Xiaoyun Wang’s two papers from Crypto this week: “Efficient Collision Search Attacks on SHA-0” and “Finding Collisions in the Full SHA-1Collision Search Attacks on SHA1.” And here are the rest of her papers.

Posted on August 17, 2005 at 2:06 PM66 Comments

Comments

clvrmnky August 17, 2005 2:49 PM

I’m a bit confused. If there is still no paper, and all we have to go on is (albiet rather weighty) opinion and a few announcements, how do we know how accurate this news is?

What if the math is wrong? Is Shamir’s opinion enough to count as reasonable peer review? I’m certainly in no position to dispute Schneier or Shamir on this, but if there is anything I’ve learned from reading about crypto it’s to maintain a skeptical eye. How many people have actually got a chance to challenge these results?

While I’m glad NIST is working on making better controlled substances, er, hashing standards, I’m a little cloudy on how accurate this particular dire news is.

Bruce Schneier August 17, 2005 3:08 PM

I’m not sure what you mean by “still no paper.” The paper on SHA-i and SHA-0 that were presented at Crypto are in the proceedings. This new result is way too new for a paper, but I’m sure one is forthcoming.

greg August 17, 2005 4:16 PM

Ok, well lets assume that this attack works. as far as i care anything less than 2^64 is really a total brake. But…

So is this the attack? find any X,Y where X != Y such that H(X)=H(Y).

Or (more damaging). Givin H(X) find Y such that H(X)=H(Y)?

If its the latter –we have a problem or two. but still a lot a ppl use sha256 (well i do). But i understood that sha1 and sha256 are very similar. Will this attack lead to attacks on sha256?.

I’m thinking of using hashes based on (constructed from) symetric block ciphpers… Since we seem to be better at designing them.

One thing is for sure. We need better hashes.

greg

Bruce Schneier August 17, 2005 5:00 PM

These new results have not been peer reviewed at all. Shamir presented them at the rump session, but he only received the presentation the previous day — and there are very few details.

The two papers presented at Crypto — the attack against SHA-0 2^69 result against SHA-1 — have been public for months. They have been peer reviewed.

Wang is a respected cryptographer; I trust her enough to accept her claims. And a paper will be forthcoming, of course.

Graham Hughes August 17, 2005 8:24 PM

Greg, the fact that Bruce is talking about 2^80 being brute force implies this is your first problem (that is, find X, Y so that X != Y & SHA-1(X) = SHA-1(Y)).

clvrmnky August 17, 2005 10:36 PM

I meant this: http://www.schneier.com/blog/archives/2005/02/sha1_broken.html

This is the second time I’ve seen you comment on SHA-x being proven weak by the same people, and each time it is made clear that there was no peer-reviewed paper [at the time of writing.]

You did go out of your way to state “Shamir presented few details — and there’s no paper…” It sounds like a caveat of some sort to me. I’ve made it a habit of “wait and see” until all the math is peer-reviewed and results well accepted.

That’s all. I see you have a link to the original research that was presented. Since I pretty much get all my crypto news from here, and rarely follow-up on the research, I may have been under the mistaken assumption that the original research was still not peer-reviewed.

Bruce Schneier August 17, 2005 11:05 PM

What’s going on is that we’re seeing the results before we see the papers. This is not unusual, although there was a long time between when we learned about the first SHA-1 results and when we saw the paper I linked to above.

While I agree that any result needs to be peer-reviewed before it is believed, Wang has a good reputation and I am predisposed to take her word at face value.

Rump August 17, 2005 11:12 PM

It might help to explain what a rump session is, for those that don’t know.

Generally, rump sessions are open to anyone who signs up during the conference to give a 5 minute talk about “something on his mind”: the latest results, an interesting discussion they had in the corridors today, etc.

Rump sessions are great for hearing the bleeding edge stuff.
Generally the presenter will indicate if s/he is presenting real results too new to publish, stuff that just looks good so far, or the next idea on which s/he wants to work.

jammit August 18, 2005 12:26 AM

I view this as more of a ‘breaking news’ sort of thing and not an ‘old hat’. The breaking of the hash is something new, and not enough information has been compiled/sorted yet to make any reasonable paper (with only minor spelling errors). Right now it seems that the attack is just starting to show promise, and so far it looks pretty neat. Just in case this works really well, what are everybody’s plans for developing a new unbreakable crypto?

krypto August 18, 2005 12:59 AM

@Graham,

$ echo X | sha1sum
b5fec669aca110f1505934b1b08ce2351072d16b
$ echo Y | sha1sum
c881daf1af41bc5aa1e14da108fe062784470ccb

clearly, SHA-1(X) != SHA-1(Y).

Clive Robinson August 18, 2005 5:41 AM

@greg

“I’m thinking of using hashes based on (constructed from) symetric block ciphpers… Since we seem to be better at designing them.”

Be careful which block ciphers you pick, some make very bad hashs, and in one case the implementation has been publicaly broken and proved embarising for the implementer (TEA / X-Box / Microsoft).

But I agree with your point we most definatly need better Hashes, but we need the research to get them 😉

Dave August 18, 2005 7:40 AM

@Greg

I think we already have ‘better hashes’ in the form of Tiger and Whirlpool. Then again, if you want to steer clear of MD5 and the entire SHA family, Tiger and Whirlpool are pretty much the only decent (i.e. unbroken, so far) alternatives.

Spammy McGriddle August 18, 2005 7:52 AM

In practical terms, until there is automated software to crunch the numbers at will, SHA is secure for the average Joe. Big business and governments who have other big businesses and governments as adversaries need to begin evaluating alternatives.

Ed T. August 18, 2005 12:25 PM

@clvrmnky, krypto,

Well, I certainly don’t feel less stupid today. What are you (krypto) trying to demonstrate by the commands you posted? This one is over my head…

Chris W August 18, 2005 5:53 PM

What are the odds that two texts will result in the same MD5 hash and the same SHA-1 hash without being identical texts? You could just mingle the two (one character from one, one from the other) and call that a new hash. And at only twice the computing power!

Bruce Schneier August 18, 2005 6:35 PM

MD5 is a 128-bit hash; the odds of two texts having identical MD5 hashes are 1 in 2^64. SHA-1 is a 160-bit hash; the odds of two texts having identical SHA-1 hashes are one in 2^80. Assuming the two hash functions are independent — a reasonable assumption — then the odds of two texts having identical MD5 and SHA-1 hashes are 1 in 2^144.

The universe will either collapse on itself or the galaxies will drift apart into nothingness before you find such a pair of texts.

SM August 18, 2005 6:58 PM

@Ed T

It was just a little joke. What krypto did was get the sha1sum for the letters X and Y (as opposed to two “items” X and Y…numbers, strings, etc.). Since they’re different values, SHA-1(X) != SHA-1(Y).

Dscho August 18, 2005 7:37 PM

@ Bruce:

2^144 sounds impressive. But if the attack is enhanced to find all Y for a given X such that SHA1(X)=SHA1(Y), and an equivalent attack is found for MD5, then 2^144 is no longer safe (because the effective search space is much, much smaller).

jsnow August 18, 2005 8:07 PM

Maybe I’m missing something obvious here, but it seems that the odds of any two arbitrarily chosen texts having the same md5sum would be 1 in 2^128, or of having the same sha1 would be 1 in 2^160.

On the other hand (if I understand the birthday paradox correctly), if you were to chose 2^64 arbitrary texts you would have on average one md5sum collision somewhere amongst those 2^64 texts, and if you chose 2^80 arbitrary texts, you would have on average one sha1 collision (which is what I think you were trying to say, but maybe I’m confused).

Curious August 18, 2005 8:50 PM

Would it be silly of me to use either the hash127 or UMAC schemes for authenticating cookies passed back and forth between server and browser? Are these considered ‘safe’?

Bruce Schneier August 18, 2005 9:07 PM

@ jsnow

No, you don’t understand the birthday paradox correctly. The odds of one text having any particular hash is 2^n. The odds of two texts having the same hash is 2^(n/2), where n is the block length.

Rob August 18, 2005 9:13 PM

Just wondering, but FIPS186-2 specifies the use of SHA-1 to post process random numbers. At what point does SHA-1 become ineffective for this purpose?

Bruno August 18, 2005 9:21 PM

Are you sure about that?
Isn’t 2^(n/2) the approximate number of texts
where the odds of any two in the set having
matching hashes is1/2?
This isn’t the same thing as the odds of two
random texts having the same hash.

Kramer August 18, 2005 9:30 PM

Oh, isn’t the best way to hash by doing a

substr(XOR(mystring),32)

That would give you a 32 byte hash, right!? Isn’t that secure enough!

Jeff Jancula August 18, 2005 10:05 PM

Bruce,

I don’t believe you understand the birthday attack (and paradox) correctly. If one input (X) produces a hash H(X), the odds that any next input (Y) will also produce H(X) is equal to 1 in 2^n, where n is the hash length.

JSnow and Bruno have it right: The birthday paradox/attack says that you need 2^(n/2) inputs to find a case where the two hashes are equal.

So, in the case of MD5 (n=128 bits). The odds that any two inputs will have the same hash = 1 in 2^128. However, the odds that 2^64 inputs will result in at least one duplicate hash are almost certain.

Jeff Jancula August 18, 2005 10:41 PM

Follow-up on the birthday paradox:

The birthday paradox comes from the scenario that two randomly chosen people in a classroom of 30 students will have the same birthday (excluding year) is almost certain.

Bruce, the way you’re stating the birthday paradox is that if you choose the first student in the class, the odds that one of the remaining 29 will have the same birthday as the first are almost certain – which is false.

Dan Kaminsky August 18, 2005 11:45 PM

Bruce,

Can you identify a 2^64 work effort task that’s been completed? Remember, MD5 is thrashable without the Wang result if 2^64 is doable.

What’s the mechanism for making a birthday paradox attack not require as much storage as it requires computation? Creating a 2^m store, with m<n and 2^m feasible, and recomputing hashes that fall within that range?

–Dan

Edward J. Huff August 19, 2005 4:10 AM

Proposed revision of software signing protocols to allow for hash collisions.

Google for a25f7f0b29ee0b3968c860738533a4b9 OR a25f7f0b, an example of how to exploit a hash collision.

To avoid this exploit, the signer needs to make an unpredictable modification to the document prior to signing.

Alice and Bob use hash function H and signature function F to validate documents. Alice signs a message M by finding signature S such that F(S) = H(M). Bob accepts (M,S) as signed by Alice. Eve can calculate H(M) and F(S) but cannot find S given M.

Eve creates documents M1 and M2 which have the same hash, and asks Alice to sign M1. If Alice uses the expected hash, Eve can substitute M2 and Bob will accept (M2,S) as signed by Alice.

So Alice and Bob need to agree to a safer method. When Eve presents M1 for signature, Alice generates a new random key K, encrypts M1 with K and hashes, giving H(K(M1)). She then signs this by finding S such that F(S) = (K,H(K(M1))), and gives (S,K) to Eve.

Eve sends (M2,S,K) to Bob. Bob checks the signature by calculating F(S) and comparing it to (K,H(K(M2))).

Eve is foiled, because H(K(M1)) != H(K(M2)), even though H(M1) = H(M2).

Thus, immediately, the protocols for signing software distributions should be adjusted so that the signing authority Alice generates random key K and signs H(K(M)) instead of just H(M). The automatic software update agent Bob checks the signature (S,K) by calculating F(S) and (K,H(K(M))).

This will prevent substitution of a useful M2 for M1 by Eve, who is allowed to present an arbitrary M1 for testing and signature. Eve cannot predict K, and hence even if she can find M2 after the fact, such that H(K(M1)) = H(K(M2)), she is not allowed to change M1 after K is chosen. Hence M2 will not be useful.

Bob can safely install M1 automatically, confident that if M1 passes a simple syntax check (which M2 cannot possibly satisfy), it is the same program which Alice accepted for signature, even if hash collisions can be found.

Ulrich August 19, 2005 6:53 AM

With a complexitiy of 2^{63}, SHA-1 is (currently!) about as secure as MD5 should have been…

Brian August 19, 2005 7:42 AM

@ Dscho

There are [most likely] an infinite number of Y’s such that SHA1(X) = SHA1(Y), so it wouldn’t be possible to find all Y’s for a given X. But your point is still valid — if you have a way to find Y’s which is better than 2^80 then you should be able to find a Y such that X & Y have identical SHA1 and MD5 hashes in better than 2^144. If this new attack is 2^64, then finding SHA1 and MD5 should be 2^128.

Daniel K. August 19, 2005 7:56 AM

I agree with Chris W – in fact I was just discussing such a scheme with my colleagues earlier this week. By using two (relatively complex) hash functions with completely different algorithms (eg md5 and sha-1), the odds of two different texts producing the same hash in each algorithm must surely pale into insignificance as any weaknesses found in one are unlikely to exist in the other.

Now my question is: There is very little new under the sun and it is unlikely that Chris W. and I are the first two people to think of it. Does prior art exist in this area?

Put another way, if one wanted to find two algorithms that were sufficiently different to make the odds of the above described collision no greater than one with key length equal to the sum of each, which would you choose? Would the calculation of each of those algorithms be faster than the calculation of a similar hash function of the longer key length?

I’m sorry for the hand-waving, I’m sure someone would be able to formalise such a concept much better than I…

Dscho August 19, 2005 8:31 AM

@Brian

Sorry, I meant “all Y of a certain length”.

And I still do not follow the reasoning of the complexity of SHA1 mixed with MD5 being the product of the single complexities.

It should be possible to filter the search space for one hash by finding (partial) matches for the other first, i.e. if there is an efficient technique to look for messages Y which (partially) match the SHA1 of X, then you certainly do not have to start all over again for MD5.

Remy August 19, 2005 8:57 AM

@Daniel K:

“any weaknesses found in one are unlikely to exist in the other”

This is a very bad assumption, as both digest algorithms are based on MD4. There may be structural weaknesses that could be exploited.

DarkFire August 19, 2005 9:59 AM

Ladies & gents, please forgive my relative ignorance in this particular subject, but what (in laymans terms) is the definition of the “complexity” of a hash function?

And what, for that matter, is a hash function?

Cheers, DF.

JohnE August 19, 2005 11:48 AM

If I understand the SHA-1 weakness correctly, it is now within the realm of possibility (practicality) to find synonyms, and possibly use that as a means to spoof a signature. But is SHA-1 now reversible? Is it still strong enough to use as an anonymizer? For example, a 20 byte identifier would hash down to 16 bytes under SHA-1, so would be irreversible because information has been lost, right?

Ricardo Barreira August 19, 2005 12:33 PM

To JohnE:

Not necessarily, since it is possible that you know some restrictions to the 20 bytes identifier which would make it unique on the context you mentioned. An example known restriction would be – all the bytes are printable ASCII characters.

Ricardo Barreira August 19, 2005 12:34 PM

To JohnE again:

BTW, you’re misunderstanding this attack. This is not a pre-image attack, this is a collision attack (I think those are the correct names). With google you can find information about those two classes of attacks on hash functions.

JohnE August 19, 2005 12:43 PM

Thanks, Ricardo, for your comments. I think your point about knowledge of the original key says that information may not have been lost, so that it’s theoretically possible to reverse the hash. But it still leaves open the question: is SHA-1 a good choice for anonymizing?

averros August 19, 2005 8:45 PM

Deny visas to cryptographers, but give
them to terrorists. I get it.

Terrorists are performing a useful service to the government – their actions provide a justification for the governments to expand their powers and increase confiscation of wealth of citizens.

Cryptographers, on the other hand, work on making citizens more immune from government snooping and, potentially, able to hide all or some wealth from the eye of taxmen.

So it is entirely reasonable that the government would want to let terrorists in while keeping cryptographers out.

SumDude August 21, 2005 4:45 PM

JohnE: SHA-1 should still suffice as an “anonimizer”, though, if you can help it, it’s best to simply use a random-numbered token, and remember the association between identity and token server-side. With a one-way hash you still need to hang on to the original identity in order to be able to hash one value to check it (you don’t want to have to hash all userIDs to check just one cookie!), so it doesn’t add overhead. If at all possible, just stick to the bog standard session functionality provided by your application platform (by which I mean the likes of j2ee,.net,php, etc.).

Ricardo Barreira August 22, 2005 11:28 AM

JohnE: Yep, on my first post I forgot to reply to that part, but check out the following post (also from me) 🙂

averros August 22, 2005 10:41 PM

clvrmnky – I do indeed have an agenda, namely I have my political opinions which grew from my first-hand experience with dealing with governments in different countries, including communist. My experience says that people in Western democracies tend to underestimate the similarities between their governments and those of the communist countries – apparently because while they know intellectually that the communism is bad, they don’t know how it looks in practice (and, therefore, have no idea of what to fear).

So, no, this is not humour. Not at all. In fact, one of the lessons of living in a communist country is that a lot of well-meaning people can together create a monster, given the “right” kind of ethics. The US security apparatus is a good illustration – while I do not think there’s a lot of scoundrels working for the goverment, the collective behaviour of nice and helpful government employees (reflecting their belief into the moral permissibility of imposing their will on non-consenting people for “their own good”) creates exactly the kind of moronic self-destructive policies we observe.

The solution is not to try to educate the government, and not to take over it by some political process – because it merely replaces one set of well-meaning people with another, with no change in the ethics whatsoever – but to expose the idea of coercive collectivist power for the moral outrage it is, and disband the collective monstrosity – and, importantly, – refrain from building another one in its place, no matter how tempting it seems to have “our monster” to protect us.

The greatest danger to the personal security of US citizens comes from their own government, this is as simple as that. Never forget that the present terrorist problems are the direct result of previous actions of the government (such as financing and arming “our bastards”), and that our own collectivist retoric is exactly what allows pissed-off arabs to plausibly lump random civilians together with the US government into the category of legitimate targets.

Clive Robinson August 24, 2005 12:19 PM

Folks,

The argument about the aplicability of the birthday paradox is unfortunatly a bit of a red herring, and actually probably does not apply in the practical case.

The reason for this is the assumption that two “random” texts have slightly better than 0.5 probability after 2^(n/2) trials.

The problam with this is that from a theoretical point of view “random” texts are fine, in the real world however they need to be “meaningful” texts. No matter what coding method you use, meaningfull texts have their own statistics, that are very far from random.

I actualy have experiance of the Birthday paradox failing woefully as a party trick (this is due to the skewed birth statistics due to spring 😉

Now I am not aware of anybody having looked into this in any depth so I do not know in which way the 0.5 probability will move but I am confident that it is going to be atleast a couple of bits or more different.

I suspect that for the “brute force” method things will be worse (for the attacker). However for the “crafted” aproach using weaknesses in the algorithum I have a “gut fealing” it is going to be better.

As I said it’s a gut fealing (which even on a good day are not very reliable), however in the absence of “reasond” evidence I will go with it as it errs on the side of caution.

chinarocks August 24, 2005 1:11 PM

It is very simple, organize the next conference in China. The Chinese cryptographers will certainly be there. That’s where the action is anyway these days.

Duncan Simpson August 27, 2005 7:03 AM

If I understood it corerctly the birthday paradox was about the probabiltiy of 2 or more people having the same birthday (the probability of exactly 2 people is a smaller). I will agree that for hash functions the odds are too small for practical use.

Just being able to find random text is good enough, as some researchers demonstrated with a pair of postscript documents (random text+two different printed documents depending on which random text was present). I think the hash involved was MD5 but the same “theory” would work for almost any hash function.

Jason September 8, 2005 2:50 PM

OK, I tried to get Gnupg to use SHA256 instead of SHA1 but it complains that DSA requires a 160 bit hash. Is RIPEMD160 a good alternative ?

Dan Kaminsky September 8, 2005 7:42 PM

The problem with “meaninglessness” is that file format exist, and there’s always somewhere you can shove 64 bytes of random data. In two different documents, you can perturb in two different places, and if the birthday odds are too small eventually you’ll cause the documents to both possess the same hash and be fully readable.

Joey J September 21, 2005 1:58 PM

As far as the birthday paradox is concerned, it seems like something is missing.

The birthday paradox only applies to the day and month only; it does not apply to the complete birthday. It seems to me we’re comparing the odds of an incomplete set of data (partial birthday) being duplicated versus the odds a complete set of data (full hash value) will be duplicated.

It seems like the Birthday Paradox would be more applicable if you were concerned with the odds of 2/3rds of a hash value being duplicated…

Michael Richardson January 22, 2007 1:35 PM

Bruce, the previous 2^69 result was “protected” before by the HMAC-construct,
is the 2^63 result also protected?

Second, is this a different 2^63 result,
or is the 2^69 + 6 more bits of weakness?

whmurray January 23, 2007 4:52 PM

As a general rule, one should not take security advice from a cryptographer. As a general rule Schneier is the exception that proves the rule.

The cryptographer asks, “Is the attack feasible?” That is, could it be done with available or finite resources in available or finite time. The security person asks, “is the attack efficient?” Is the value of success greater than the cost of the attack?

In response, the cryptographer will craft an example in which the attack might be efficient and rests his case. The security person asks whether the attack is efficient in the average or general case?

Courtney’s first law tell us that “Nothing useful can be said about the security of a mechanism except in the context of a specific application and environment.”

Alex June 3, 2007 1:44 PM

Hi,
I am new to cryptography. I know that birthday paradox means that finding two random messages that hash to a same result is surprisingly low (2^m/2), but I don’t know why that is important.
So let’s say someone (eve) finds two messages that hash to the same result. Why is that important, and how can she do malicious stuff knowing two messages that hash to the same value?

Thanks,
Alex

Bruce Schneier June 3, 2007 4:26 PM

@Alex

Let’s say I have two messages that hash to the same value: “Alex is a geat guy” and “Alex owes Bruce Schneier a million dollars.” I get you to sign the hash of the first, which is how digital signature algorithms work, and then take you to court because you refuse to acknowledge the second.

Sun October 7, 2007 3:15 AM

Hi, I don’t know anything about cryptography, but I was just curious about bittorrent and its use of SHA1. Let’s say I have a 2MB torrent piece. How much CPU time would it take to generate another 2MB file (can be gibberish) with the same SHA1? Would it be feasible with a modern data center to produce a file with the same SHA1 within, say one week? I’m curious because RIAA wants to prevent pirating via bittorrent and poisoning the swarm with bad torrent pieces could be one method…

Rune K. Svendsen August 21, 2011 11:49 AM

Regarding BitTorrent (two comments up from this one), this is actually why I became interested in cryptographic hash functions as well.

To answer your question; no, the attack presented here does not allow BitTorrent downloads to be compromised. The attack you describe (and what would compromise the way that current BitTorrent works) is what is called a “second pre-image attack”. The attack presented here is called a “collision attack”.
The difference between these two attacks is that a “collision attack” enables us to find ANY two (different) blocks of data that have the same hash value. In case of the “second pre-image” attack we have to find one block that has the same hash value as another given block. Ie., the difference between the two attacks is that in one case we can find any two (different) data sets which hash to the same value. In the second case we’re given one of the data sets (one of the blocks in a BitTorrent file) and we’re asked to find a second set of data that hashes to the same as the first (given) set of data.

Leave a comment

Login

Allowed HTML <a href="URL"> • <em> <cite> <i> • <strong> <b> • <sub> <sup> • <ul> <ol> <li> • <blockquote> <pre> Markdown Extra syntax via https://michelf.ca/projects/php-markdown/extra/

Sidebar photo of Bruce Schneier by Joe MacInnis.