Hello all,
Everybody here wants to make Nim the best language ever made. The question is however, can we take code from other projects to use in Nim? The reason i am asking this is because i found a website that contains allot of cool tricks to make Nim faster. They are using the MIT license, so i am not sure if my contribution helps or not.
So perhaps somebody can tell me what to do.
So what can we expect:
import times
proc lowerAscii8(c: char): char =
# Branchless tolower, based on the input-rotating trick described
# at http://www.azillionmonkeys.com/qed/asmexample.html
#
# This algorithm depends on an observation: each uppercase
# ASCII character can be converted to its lowercase equivalent
# by adding 0x20.
# Step 1: Clear the high order bit. We'll deal with it in Step 5.
var rotated = uint(c) and 0x7f
#echo $rotated
# Currently, the value of rotated, as a function of the original c is:
# below 'A': 0- 64
# 'A'-'Z': 65- 90
# above 'Z': 91-127
# Step 2: Add 0x25 (37)
rotated += 0x25
#echo $rotated
# Now the value of rotated, as a function of the original c is:
# below 'A': 37-101
# 'A'-'Z': 102-127
# above 'Z': 128-164
# Step 3: clear the high order bit
rotated = rotated and 0x7f
#echo "3 ", $rotated
# below 'A': 37-101
# 'A'-'Z': 102-127
# above 'Z': 0- 36
# Step 4: Add 0x1a (26)
rotated += 0x1a
#echo "4 ",$rotated
# below 'A': 63-127
# 'A'-'Z': 128-153
# above 'Z': 25- 62
# At this point, note that only the uppercase letters have been
# transformed into values with the high order bit set (128 and above).
# Step 5: Shift the high order bit 2 spaces to the right: the spot
# where the only 1 bit in 0x20 is. But first, how we ignored the
# high order bit of the original c in step 1? If that bit was set,
# we may have just gotten a false match on a value in the range
# 128+'A' to 128+'Z'. To correct this, need to clear the high order
# bit of rotated if the high order bit of c is set. Since we don't
# care about the other bits in rotated, the easiest thing to do
# is invert all the bits in c and bitwise-and them with rotated.
#rotated &= ~c <-- original code
rotated = rotated or rotated # is this correct?
#echo "5 ",$rotated
rotated = rotated shr 2
#echo "6 ",$rotated
# Step 6: Apply a mask to clear everything except the 0x20 bit
# in rotated.
rotated = rotated and 0x20
#echo "7 ",$rotated
# At this point, rotated is 0x20 if c is 'A'-'Z' and 0x00 otherwise
# Step 7: Add rotated to c
result = char(uint(c)+rotated)
var
t0 = epochTime()
s = "This document is a tutorial for the programming language Nim. This tutorial assumes that you are familiar with basic programming concepts like variables, types or statements but is kept very basic. The manual contains many more examples of the advanced language features. All code examples in this tutorial, as well as the ones found in the rest of Nim's documentation, follow the Nim style guide."
r = newString(len(s))
t0 = epochTime()
for x in 1..1000000:
for i in 0..len(s) - 1:
r[i] = lowerAscii8(s[i])
echo "Duration (toLower bitwise) : ", epochTime() - t0
echo r
Result: Duration (toLower bitwise) : 0.35502028465271
import times
import strutils
var
t0 = epochTime()
s = "This document is a tutorial for the programming language Nim. This tutorial assumes that you are familiar with basic programming concepts like variables, types or statements but is kept very basic. The manual contains many more examples of the advanced language features. All code examples in this tutorial, as well as the ones found in the rest of Nim's documentation, follow the Nim style guide."
r = newString(len(s))
t0 = epochTime()
for x in 1..1000000:
for i in 0..len(s) - 1:
r[i] = toLowerAscii(s[i])
echo "Duration (toLower Nim) : ", epochTime() - t0
echo r
Result: Duration (toLower Nim) : 0.8650496006011963
The code must be compiled with the -d:release option or else the duration of the bitwise version is longer. This code runs on Intel processors very fast, AMD processors won't see a big difference. I have not fully test this code, my main concerns are the licensing.
Greetings, Dippo
What website is it?
Normally I would say that it depends on the website and the license. If the website doesn't have a license for the code, I would look into contacting whoever administrates it.
If it's just small snippets and optimizations, then it's probably fine to translate the code, provided you put some attribution.
The source is from Facebook, it can be found here: https://github.com/facebook/folly
I see now it uses Apache 2.0 license, not MIT.
Dippo, your observations is interesting.
It is very obvious that something strange is going on...
For your test code the implementation of toLowerAscii() should have nearly no impact, as that operation is really trivial and gcc is smart.
Note that it is generally recommended to put all your code in a proc if speed is a concern -- that can make a noticeable difference in some cases.
I took the original proc from stringutils and copied it in your example. The surprise is that the plain copy makes it more than ten times faster. The inline pragma is not really necessary and even seems to make it a bit slower.
import times
import strutils
proc toLowerAscii*(c: char): char = # {.inline.}=
if c in {'A'..'Z'}:
result = chr(ord(c) + (ord('a') - ord('A')))
else:
result = c
proc main =
var
t0 = epochTime()
s = "This document is a tutorial for the programming language Nim. This tutorial assumes that you are familiar with basic programming concepts like variables, types or statements but is kept very basic. The manual contains many more examples of the advanced language features. All code examples in this tutorial, as well as the ones found in the rest of Nim's documentation, follow the Nim style guide."
r = newString(len(s))
t0 = epochTime()
for x in 1..1000000:
for i in 0..len(s) - 1:
r[i] = strutils.toLowerAscii(s[i]) # this is ten times slower
#r[i] = toLowerAscii(s[i])
echo "Duration (toLower Nim) : ", epochTime() - t0
echo r
main()
I have currently no idea what the real problem is. But note that such bit fiddling tricks are generally not necessary, as compilers are very smart today.
[EDIT]
Well, the problem may be that strutils.toLowerAscii() is not inlined. With this
$ cat nim.cfg
path:"$projectdir"
nimcache:"/tmp/$projectdir"
gcc.options.speed = "-march=native -O3 -flto -fstrict-aliasing"
it becommes very fast. -flto enables inlining due to link time optimization.
rotated = rotated or rotated # is this correct?
I doubt this is correct. For any x: expression x or x gives just x (always), so this line is equivalent to
rotated = rotated
which is pointless and does nothing (unlike the original rotated &= ~c;).