I was doing some performance comparisons between Nim and Python for sha1.
Nim:
import strutils
import std/sha1

var
  buf = " ".repeat(1_000_000)

for i in 0..<1000:
  discard secureHash(buf)
echo $secureHash(buf)
Python:
import hashlib

buf = 1000000*' '
sha = hashlib.sha1
for i in xrange(1000):
    sha(buf).digest()
print sha(buf).hexdigest()
Python runs this in 1.44s, but Nim with -d:release took 21.89s. With -d:danger, Nim runs in 2.42s.
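The Python snippet above is written for Python 2 (xrange, print statement, str buffers). For anyone repeating the comparison today, here is a minimal Python 3 sketch of the same loop. The function names sha1_hex and time_sha1 are made up for illustration, and the loop count in the demo is deliberately smaller than the 1000 used in the post, so the numbers it prints are not comparable to the timings quoted here.

```python
import hashlib
import time


def sha1_hex(buf: bytes) -> str:
    """Return the SHA-1 digest of buf as lowercase hex."""
    return hashlib.sha1(buf).hexdigest()


def time_sha1(buf_size: int, loops: int) -> float:
    """Hash a buffer of spaces `loops` times and return elapsed seconds."""
    buf = b" " * buf_size  # hashlib needs bytes in Python 3, not str
    start = time.perf_counter()
    for _ in range(loops):
        hashlib.sha1(buf).digest()
    return time.perf_counter() - start


if __name__ == "__main__":
    # Small loop count so the sketch finishes quickly; the post used
    # a 1,000,000-byte buffer and 1000 loops.
    elapsed = time_sha1(buf_size=1_000_000, loops=20)
    print(f"20 loops over 1,000,000 bytes: {elapsed:.3f}s")
    print(sha1_hex(b" " * 1_000_000))
```

Timing inside the process with time.perf_counter leaves out interpreter startup, which /usr/bin/time would include, so the two measurements will differ slightly.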
I may not trust my own code as much as Nim std library code, so would probably stick with a release build. But, I wouldn't want sha1 to take 10x longer either. My hacky solution is to add {.checks:off, optimization:speed.} to std/sha1.nim, but shouldn't that be the default for well-tested Nim libraries? It seems like this could be a big turn-off to someone kicking the tires on Nim if they didn't realize what was happening.
If you really care about performance, you should use gcrypt's sha1 implementation. Here are the benchmark results.
That's a nice library, thanks for the link! Today I have a static build of HashBackup, so the LGPL license is painful. But it's nice to know an upper performance bound. The non-gcrypt versions (not vectorized) seem to be in the same ballpark as Nim's std sha1.
I ran some more tests with smaller buffers:
Nim @ 256K, 10K loops:
ms:nim jim$ /usr/bin/time -l ./sha
D4609D41B91CFD2FD481EDB1C8198EEAC705EF81
6.19 real 6.17 user 0.00 sys
Python @ 256K, 10K loops:
ms:nim jim$ /usr/bin/time -l py sha.py
d4609d41b91cfd2fd481edb1c8198eeac705ef81
3.73 real 3.71 user 0.00 sys
Nim @ 8K, 100K loops:
ms:nim jim$ /usr/bin/time -l ./sha
48F4BB1ECCB77E2A1E6C2B48288F1D8458DFF8C3
2.04 real 2.03 user 0.00 sys
Python @ 8K, 100K loops:
ms:nim jim$ /usr/bin/time -l py sha.py
48f4bb1eccb77e2a1e6c2b48288f1d8458dff8c3
1.38 real 1.37 user 0.00 sys
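To put the timings in this thread side by side, here is a small sketch that computes how many times slower Nim was than Python in each run. The "real" seconds are copied from the posts; the row labels and the helper name slowdown are mine, and the posts do not say which build flags were used for the 256K and 8K runs.

```python
# Wall-clock seconds copied from the posts in this thread.
# Build flags for the 256K and 8K runs are not stated in the posts.
timings = {
    "1M x 1K loops, -d:release": {"nim": 21.89, "python": 1.44},
    "1M x 1K loops, -d:danger": {"nim": 2.42, "python": 1.44},
    "256K x 10K loops": {"nim": 6.19, "python": 3.73},
    "8K x 100K loops": {"nim": 2.04, "python": 1.38},
}


def slowdown(nim_seconds: float, python_seconds: float) -> float:
    """How many times longer the Nim run took than the Python run."""
    return nim_seconds / python_seconds


def all_slowdowns(table: dict) -> dict:
    """Map each run label to its Nim/Python time ratio."""
    return {
        label: slowdown(row["nim"], row["python"])
        for label, row in table.items()
    }


if __name__ == "__main__":
    for label, ratio in all_slowdowns(timings).items():
        print(f"{label}: Nim took {ratio:.1f}x as long as Python")
```

By this arithmetic the release build comes out at roughly 15x Python's time, while the other three runs fall between about 1.5x and 1.7x.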
I'm okay with Nim's sha1 performance with no checks. I think there may be a lot of people like myself who are looking at Nim as a faster alternative to Python, and to keep them interested, it's important that Nim is competitive on these small tests of general features: db access, dicts, hashing, string operations, file I/O, etc.
Can we hide SHA1 or MD5?
Those should be used by the compiler only but we shouldn't expose those to the public especially under the very misleading name secureHash.
Regarding perf, at first glance I don't see why the Nim stdlib with danger would be that much slower than Python, unless Python is vectorized.
Given that Nimcrypto will undergo an audit (of SHA2, not SHA1), I strongly suggest using Nimcrypto for hashes: https://github.com/cheatfate/nimcrypto/blob/master/nimcrypto/sha.nim
Compatibility and transitions are a thing. If you want to add a note to the sha1 page that Google spent $110K to find one sha1 collision, fine. md5 is much less secure than sha1, yet there are still uses for it today. You can't just say "from today on, this hash can never be used again, anywhere". Hashes get stored. We have to deal with that.
Some details about Google's collision from their blog post:
We then leveraged Google’s technical expertise and cloud infrastructure to compute the collision which is one of the largest computations ever completed. Here are some numbers that give a sense of how large scale this computation was:
I agree with you about the secureHash name. I never liked it because it's too vague. I realize that it is Nim style to not use module names, but to me, sha1.hash(data) is a lot easier to read than secureHash(data) and knowing that secureHash comes from the sha1 module.
Does this mean that achieving SHA-1 collisions is now within the grasp of most attackers? No, but it's certainly within the capabilities of nation-states.
That's enough to make it unsafe for any commercial use.
But I don't think these functions should be "hidden". Perhaps an easily suppressible warning could be added.
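As a rough illustration of what an "easily suppressible warning" could look like, here is a sketch in Python rather than Nim, using the standard warnings module. Everything named here (WeakHashWarning, to_sha1, to_sha1_quiet) is hypothetical and is not a proposal for the actual Nim API; it only shows the shape of the idea: the function stays available, warns by default, and a caller with a legitimate legacy use can opt out.

```python
import hashlib
import warnings


class WeakHashWarning(UserWarning):
    """Hypothetical warning category for collision-prone hashes."""


def to_sha1(data: bytes) -> str:
    """Hypothetical wrapper: SHA-1 hex digest plus a suppressible warning."""
    warnings.warn(
        "SHA-1 is not collision resistant; use it only for legacy data",
        WeakHashWarning,
        stacklevel=2,
    )
    return hashlib.sha1(data).hexdigest()


def to_sha1_quiet(data: bytes) -> str:
    """Same digest, with the warning silenced for a known-legacy use."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", WeakHashWarning)
        return to_sha1(data)


if __name__ == "__main__":
    print(to_sha1(b"stored legacy value"))        # warns on stderr
    print(to_sha1_quiet(b"stored legacy value"))  # silent
```

The point of the separate warning category is that it can be silenced on its own, without hiding unrelated warnings.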
OTOH, secureHash is a terrible name and should be deprecated, and changed to toSha1 a la toMd5. And
type
  Sha1Digest* = array[0 .. Sha1DigestSize-1, uint8]
  SecureHash* = distinct Sha1Digest
should be
type
  Sha1DigestRepr = array[0 .. Sha1DigestSize-1, uint8]
  Sha1Digest* = distinct Sha1DigestRepr
  SecureDigest* {.deprecated.} = Sha1Digest