Here is my as-fast-as-C implementation: https://gist.github.com/treeform/900f55d4bc08e57fe2257360b5f9fa68
It looks like you can go faster if you use SIMD stuff, like this C library: https://github.com/aklomp/base64
It has more features, but it does not seem to handle errors?
Since base64.nim says "unstable API" we could take your code... :-)
I am happy to take over the stdlib base64 API and make it stable if you guys agree with my proposals below:
This is the place where stdlib does not handle errors:
https://github.com/nim-lang/Nim/blob/master/lib/pure/base64.nim#L123
It needs to throw an exception saying "invalid base64 encoding" instead of setting the value to 63. The RFC says implementations must reject characters outside the alphabet: https://tools.ietf.org/html/rfc4648#section-3.3
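To make the point concrete, here is a rough sketch of what a strict decoder could look like. The names strictDecode and makeDecodeTable are made up for illustration; this is not the stdlib code or the gist, just one way to reject bad input with a ValueError instead of silently mapping it to 63.

```nim
# Sketch only: strictDecode and makeDecodeTable are hypothetical names,
# not part of the stdlib base64 module.
const alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

proc makeDecodeTable(): array[256, int8] =
  ## Maps each byte to its 6-bit value, or -1 if it is outside the alphabet.
  for i in 0 ..< 256:
    result[i] = -1
  for i, c in alphabet:
    result[ord(c)] = int8(i)

const decodeTable = makeDecodeTable() # evaluated at compile time

proc strictDecode(s: string): string =
  ## Decodes padded base64. Raises ValueError on a bad length or on any
  ## character outside the alphabet.
  if s.len == 0:
    return ""
  if s.len mod 4 != 0:
    raise newException(ValueError, "invalid base64 length: " & $s.len)
  # Only one or two trailing '=' count as padding; a '=' anywhere else
  # falls through to the table lookup and is rejected there.
  var pad = 0
  if s[^1] == '=':
    pad = 1
    if s[^2] == '=':
      pad = 2
  let dataLen = s.len - pad
  result = newStringOfCap(s.len div 4 * 3)
  var acc = 0  # bit accumulator
  var bits = 0 # number of valid bits currently in acc
  for i in 0 ..< dataLen:
    let v = decodeTable[ord(s[i])]
    if v < 0:
      raise newException(ValueError, "invalid base64 character at index " & $i)
    acc = ((acc shl 6) or int(v)) and 0xFFFF
    bits += 6
    if bits >= 8:
      bits -= 8
      result.add(char((acc shr bits) and 0xFF))
```

A caller would wrap strictDecode in try/except ValueError where malformed input is expected. The sketch does not check that leftover padding bits are zero, which a stricter decoder might also want to do.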
https://github.com/nim-lang/Nim/blob/master/lib/pure/base64.nim#L47
The features it should not support are lineLen and setting a custom newLine. They make the code slower.
Python's b64encode does not support a custom line length or newline either: https://docs.python.org/2/library/base64.html
I think I have seen this in Objective-C. I think this is only needed when used in emails?
As I read it, the RFC says not to add line feeds unless the referring spec asks for them (https://tools.ietf.org/html/rfc4648#section-3.1), so this should be left up to the email MIME spec. A MIME (Multipurpose Internet Mail Extensions) wrapper should do this part. See: https://tools.ietf.org/html/rfc2045
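For what it's worth, a MIME layer could do the wrapping as a separate step on already encoded text, something like the sketch below. wrapLines is a hypothetical helper name, not an existing stdlib proc; the defaults of 76 characters and CRLF follow the line limit in RFC 2045.

```nim
# Sketch only: wrapLines is a hypothetical helper, not a stdlib proc.
proc wrapLines(encoded: string, lineLen = 76, newLine = "\r\n"): string =
  ## Splits already encoded base64 text into lines of at most lineLen
  ## characters, joined by newLine. No trailing newLine is added.
  if lineLen <= 0:
    return encoded # nothing sensible to do, return the input unchanged
  var i = 0
  while i < encoded.len:
    if i > 0:
      result.add(newLine)
    let stop = min(i + lineLen, encoded.len)
    result.add(encoded[i ..< stop])
    i = stop
```

This keeps the encoder's hot loop free of any line-length bookkeeping, and only callers that actually produce email bodies pay for the extra pass.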
What makes my code faster is the lookup table and dropping support for MIME stuff which should not be there.
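Roughly, the idea looks like the sketch below. This is not the code from the gist or the PR, just a minimal illustration under my own naming (encodeSketch and encodeTable are made-up names): a constant table indexed by each 6-bit value, three input bytes turned into four output characters per step, and no line-length handling at all.

```nim
# Sketch only: encodeSketch and encodeTable are hypothetical names.
const encodeTable = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

proc encodeSketch(s: string): string =
  ## Encodes s as padded base64 with no line breaks.
  if s.len == 0:
    return ""
  # The output size is known up front, so allocate once and write by index.
  result = newString((s.len + 2) div 3 * 4)
  var i = 0 # read position in s
  var o = 0 # write position in result
  while i + 2 < s.len:
    # Pack three bytes into 24 bits, then emit four 6-bit lookups.
    let n = (ord(s[i]) shl 16) or (ord(s[i + 1]) shl 8) or ord(s[i + 2])
    result[o] = encodeTable[(n shr 18) and 63]
    result[o + 1] = encodeTable[(n shr 12) and 63]
    result[o + 2] = encodeTable[(n shr 6) and 63]
    result[o + 3] = encodeTable[n and 63]
    i += 3
    o += 4
  let rest = s.len - i
  if rest == 1:
    # One leftover byte: two characters plus two '=' of padding.
    let n = ord(s[i]) shl 16
    result[o] = encodeTable[(n shr 18) and 63]
    result[o + 1] = encodeTable[(n shr 12) and 63]
    result[o + 2] = '='
    result[o + 3] = '='
  elif rest == 2:
    # Two leftover bytes: three characters plus one '=' of padding.
    let n = (ord(s[i]) shl 16) or (ord(s[i + 1]) shl 8)
    result[o] = encodeTable[(n shr 18) and 63]
    result[o + 1] = encodeTable[(n shr 12) and 63]
    result[o + 2] = encodeTable[(n shr 6) and 63]
    result[o + 3] = '='
```

Whether this matches C speed in practice depends on the compiler and flags, so treat it as an illustration of the approach rather than a benchmarked implementation.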
if unlikely(str.len == 0): return "" # For eg. encode("")
Is this a special-case fast return for the empty string?
I am working on a PR.
The gist is just a proof of concept.
You are right. I need to check for "" otherwise my code breaks. I just added that in. Thanks!
benchmarks are a game.
I agree, but in a very positive interpretation of that phrase.
Competitive games are essential, both to individual human development and to software projects. They are a feedback mechanism that challenges potential complacency, and helps bring out the best that is within us.
Even if benchmarks don't have a perfect correlation with every real-world performance scenario, they have a strong correlation with many. Participating in, tuning for, and winning benchmarks shows that Nim has a community that cares about its success.
just gotta say, "cool!" and Thanks, all! 3 hours from "... noticed that..." to "PR up:"!
and love that compile time lookup table goodness!
I use Nim heavily on my projects. I hope to get some bits polished enough to give back at some point.
I think the big difference between making Python faster vs making Nim faster ... is that in Nim you just use better algorithms like precomputed tables ... something natural. While in Python you use a different language, like C modules or Cython. It's just not the same. Why not always just use C and Cython then?
In Nim the optimization path seems like a natural extension of what you do anyway.
Back when I used Cython more, I did always use Cython and just gradually type in things as required (or for some things do the equivalent of #include some C for some SSE intrinsics type codes). I agree it is not generally "popular", but it is a very legitimate mode of usage. Adding more and more cdef's etc. brings your code closer and closer to C semantically while staying in Cython syntax. Partly, I had better profiling tools for such code than "pure python" style.
They even have a nice --annotate mode supporting this usage style that spits out an HTML page with clickable generated code colorized by how C-like it is. That kind of source to "assembly" visualization tool would be nice for Nim as well, or even just gcc/clang going all the way to "real" assembly. It's got a bit more "oomph" for things spanning orders of magnitude of efficiency like pure Py and C, though, and the CPython API calls Cython generates may be a little easier to read than real assembly, much like the C code Nim generates.
Now, why good modes of usage (of almost anything) are not more popular... well, that's some question for the ages. People are imitative. "It's not popular because it's not popular." How to bootstrap something catching on is just... tricky.