nimforum mirror - Winning the Base64 benchmarks.

treeform [2019-10-16T17:49:47+02:00]

I was looking at the Nim benchmarks here: https://github.com/kostya/benchmarks#base64 , and noticed that Nim's base64 is far behind the simple C implementation. I took the plain C algorithm and ported it to Nim without using any crazy C pointers etc., and it's just as fast as C on my computer. I learned 3 things:

1. benchmarks are a game.

2. you can always beat or come close to C with Nim.

3. Nim's base64 standard library implementation is slow. (it has more features but does not handle errors?)

Here is my as-fast-as-C implementation: https://gist.github.com/treeform/900f55d4bc08e57fe2257360b5f9fa68

It looks like you can go faster if you use SIMD stuff, like this C library: https://github.com/aklomp/base64

Araq [2019-10-16T17:59:17+02:00]

Quoting treeform: "it has more features but does not handle errors?"

Since base64.nim says "unstable API" we could take your code... :-)

treeform [2019-10-16T18:15:32+02:00]

I am happy to take over the stdlib base64 API and make it stable if you guys agree with my proposals below:

This is the place where stdlib does not handle errors:

https://github.com/nim-lang/Nim/blob/master/lib/pure/base64.nim#L123

It needs to throw an exception reporting an invalid base64 encoding instead of setting the value to 63. The RFC says to do so: https://tools.ietf.org/html/rfc4648#section-3.3
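As a rough illustration of the strict behaviour proposed here, the sketch below decodes with a 256-entry inverse table built at compile time and raises ValueError for any byte outside the alphabet instead of substituting 63. The names buildDecodeTable, decodeTable and decodeStrict are made up for this example; this is not the code from the gist or from the stdlib, and it only handles padded input.

```nim
const alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

proc buildDecodeTable(): array[256, int8] =
  ## Maps every byte to its 6-bit value, or -1 if it is not in the alphabet.
  for i in 0 .. 255:
    result[i] = int8(-1)
  for i, c in alphabet:
    result[ord(c)] = int8(i)

# Assigning to a const forces evaluation at compile time.
const decodeTable = buildDecodeTable()

proc decodeStrict(s: string): string =
  ## Hypothetical strict decoder: raises ValueError on a bad length,
  ## misplaced padding, or any character outside the alphabet.
  if s.len mod 4 != 0:
    raise newException(ValueError, "invalid base64 length: " & $s.len)
  var i = 0
  while i < s.len:
    var n = 0
    var pad = 0
    for j in 0 .. 3:
      let c = s[i + j]
      # '=' is only accepted in the last two slots of the final group.
      if c == '=' and i + 4 == s.len and j >= 2:
        inc pad
        n = n shl 6
      else:
        let v = decodeTable[ord(c)]
        if v < 0 or pad > 0:
          raise newException(ValueError,
            "invalid base64 character at index " & $(i + j))
        n = (n shl 6) or int(v)
    result.add chr((n shr 16) and 255)
    if pad < 2: result.add chr((n shr 8) and 255)
    if pad < 1: result.add chr(n and 255)
    i += 4
```

With this shape, decodeStrict("TWFu") gives "Man", while an input such as "TW*u" raises ValueError rather than silently decoding to something else.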

https://github.com/nim-lang/Nim/blob/master/lib/pure/base64.nim#L47

The features it should not support are lineLen and setting a custom newLine. They make the code slower.

Python does not support this either: https://docs.python.org/2/library/base64.html

I think I have seen this in Objective-C. I think this is only needed when used in emails?

The RFC (https://tools.ietf.org/html/rfc4648#section-3.1) says not to support it? Leave it up to the email MIME spec. The Multipurpose Internet Mail Extensions wrapper should do this part. See: https://tools.ietf.org/html/rfc2045
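To make the separation concrete, here is a minimal sketch of line wrapping done as a separate step on already-encoded text, so the encoder itself never has to think about line breaks. The helper name wrapLines is hypothetical; the parameter names lineLen and newLine are borrowed from the stdlib options mentioned above, and the defaults (76 characters, CRLF) follow the MIME rules in RFC 2045.

```nim
proc wrapLines(encoded: string, lineLen = 76, newLine = "\r\n"): string =
  ## Hypothetical helper: inserts `newLine` between chunks of at most
  ## `lineLen` characters. It knows nothing about base64 itself.
  doAssert lineLen > 0
  var i = 0
  while i < encoded.len:
    if i > 0:
      result.add newLine
    let last = min(i + lineLen, encoded.len) - 1
    result.add encoded[i .. last]
    i += lineLen

# Example: wrap a short encoded string every 4 characters.
echo wrapLines("TWFuTWFuTWE=", lineLen = 4, newLine = "\n")
```

A MIME layer could call something like this on the encoder's output, which keeps the hot encoding loop free of per-character line-length bookkeeping.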

What makes my code faster is the lookup table and dropping support for the MIME stuff, which should not be there.
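For readers who want to see the general shape of a table-driven encoder with no line-wrapping options, here is a small sketch. It is an assumption about the approach, not the code from the gist: the names alphabet and encodeSketch are invented for this example, and which table the gist actually relies on is not stated in this thread.

```nim
const alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

proc encodeSketch(s: string): string =
  ## Encodes `s` as standard padded base64 with no line breaks.
  if s.len == 0:
    return ""
  result = newStringOfCap((s.len + 2) div 3 * 4)
  var i = 0
  # Each full 3-byte group becomes four table lookups.
  while i + 2 < s.len:
    let n = (ord(s[i]) shl 16) or (ord(s[i + 1]) shl 8) or ord(s[i + 2])
    result.add alphabet[(n shr 18) and 63]
    result.add alphabet[(n shr 12) and 63]
    result.add alphabet[(n shr 6) and 63]
    result.add alphabet[n and 63]
    i += 3
  # One or two leftover bytes are padded with '='.
  let rest = s.len - i
  if rest == 1:
    let n = ord(s[i]) shl 16
    result.add alphabet[(n shr 18) and 63]
    result.add alphabet[(n shr 12) and 63]
    result.add "=="
  elif rest == 2:
    let n = (ord(s[i]) shl 16) or (ord(s[i + 1]) shl 8)
    result.add alphabet[(n shr 18) and 63]
    result.add alphabet[(n shr 12) and 63]
    result.add alphabet[(n shr 6) and 63]
    result.add '='

echo encodeSketch("Man")  # TWFu
```

The inner loop does nothing but shifts, masks and indexing into a constant string, which is the kind of code a C compiler optimizes well once the line-length bookkeeping is gone.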

Araq [2019-10-16T18:30:41+02:00]

Sounds good.

juancarlospaco [2019-10-16T19:06:03+02:00]

Where's the PR? :P

if unlikely(str.len == 0): return "" # For eg. encode("")

Special-case a fast return for the empty string?

treeform [2019-10-16T20:16:18+02:00]

I am working on a PR.

The gist is just a proof of concept.

You are right. I need to check for "" otherwise my code breaks. I just added that in. Thanks!

treeform [2019-10-16T21:27:52+02:00]

PR up: https://github.com/nim-lang/Nim/pull/12436

Libman [2019-10-17T03:05:52+02:00]

Quoting treeform: "benchmarks are a game."

I agree, but in a very positive interpretation of that phrase.

Competitive games are essential, both to individual human development as well as software projects. They are a feedback mechanism that challenges potential complacency, and helps bring out the best that is within us.

Even if benchmarks don't have a perfect correlation with every real-world performance scenario, they have a strong correlation with many. Participating, tuning, and winning benchmarks shows that Nim has a community that cares about its success.

refaqtor [2019-10-17T03:55:45+02:00]

just gotta say, "cool!" and Thanks, all! 3 hours from "... noticed that..." to "PR up:"!

and love that compile time lookup table goodness!

I use nim heavily on my projects. I hope to get some bits polished enough to give back at some point.

dom96 [2019-10-17T12:43:29+02:00]

Couldn't render post #33591.

torarinvik [2019-10-17T17:00:46+02:00]

Truth as it's told! Wise post indeed. I hear people talking about speed of the implementation rather than the language? I always heard people separating between CPython the implementation and the Python language. If one had a super powerful AI optimizer I would assume that all languages would be almost equally fast. The AI would do the heavy lifting optimizing algorithms generating beautiful machine code and people would do human friendly programming and higher concepts and ideas. People would program assembly in their spare time like other people chop wood for recreation :D :D

treeform [2019-10-17T22:21:53+02:00]

I think the big difference between making Python faster vs making Nim faster ... is that in Nim you just use better algorithms like precomputed tables ... something natural. While in Python you use a different language, like C modules or Cython. It's just not the same. Why not always just use C and Cython then?

In Nim the optimization path seems a natural extension of what you do anyway.

cblake [2019-10-17T23:53:34+02:00]

Back when I used Cython more, I did always use Cython and just gradually type in things as required (or for some things do the equivalent of #include some C for some SSE intrinsics type codes). I agree it is not generally "popular", but it is a very legitimate mode of usage. Adding more and more cdef's etc. brings your code closer and closer to C semantically while staying in Cython syntax. Partly, I had better profiling tools for such code than "pure python" style.

They even have a nice --annotate mode supporting this usage style that spits out an HTML page with clickable generated code colorized by how C-like it is. That kind of source to "assembly" visualization tool would be nice for Nim as well, or even just gcc/clang going all the way to "real" assembly. It's got a bit more "oomph" for things spanning orders of magnitude of efficiency like pure Py and C, though, and the CPython API calls Cython generates may be a little easier to read than real assembly, much like the C code Nim generates.

Now, why good modes of usage (of almost anything) are not more popular... well, that's some question for the ages. People are imitative. "It's not popular because it's not popular." How to bootstrap something catching on is just... tricky.

Mirror of forum.nim-lang.org

5363 :: Winning the Base64 benchmarks.