Hello, I have the following piece of code that I want to translate to Nim:
int utf8str_codepoint_len( char const* s, int utf8len ) {
    int codepointLen = 0;
    unsigned char m4 = 128 + 64 + 32 + 16;
    unsigned char m3 = 128 + 64 + 32;
    unsigned char m2 = 128 + 64;
    for ( int i = 0; i < utf8len; ++ i, ++ codepointLen ) {
        char c = s[i];
        if ( ( c & m4 ) == m4 ) {
            i += 3;
        } else if ( ( c & m3 ) == m3 ) {
            i += 2;
        } else if ( ( c & m2 ) == m2 ) {
            i += 1;
        }
    }
    return ( codepointLen );
}
This is what I have so far, but it doesn't compile:
proc utf8str_codepoint_len*(s: cstring; utf8len: cint): cint =
  var codepointLen: cint = 0
  var m4: cuchar = 128 + 64 + 32 + 16
  var m3: cuchar = 128 + 64 + 32
  var m2: cuchar = 128 + 64
  var i: cint = 0
  while i < utf8len:
    var c: char = s[i]
    if (c and m4) == m4:
      inc(i, 3)
    elif (c and m3) == m3:
      inc(i, 2)
    elif (c and m2) == m2:
      inc(i, 1)
    inc(i)
    inc(codepointLen)
  return codepointLen
Should I keep it in C and import it into Nim instead?
It would help others if you stated your exact problem: why does it not compile, and why can't you solve it? Your snippet above has type-conversion problems because it mixes Nim integers with C unsigned chars. Fixing this is not too hard:
proc utf8str_codepoint_len*(s: cstring, utf8len: cint): cint =
  var codepointLen: cint = 0
  let m4 = 128 + 64 + 32 + 16
  let m3 = 128 + 64 + 32
  let m2 = 128 + 64
  var i: cint = 0
  while i < utf8len:
    var c = s[i].int
    if (c and m4) == m4:
      inc(i, 3)
    elif (c and m3) == m3:
      inc(i, 2)
    elif (c and m2) == m2:
      inc(i, 1)
    inc(i)
    inc(codepointLen)
  return codepointLen
As an alternative, you might look into Nim's unicode module, which offers the runeLen() proc; it does almost exactly what your code above does.
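A minimal sketch of that alternative, assuming the input is already a Nim `string` holding UTF-8 and the whole string should be counted (so no separate byte-length argument is passed). `runeLen` from `std/unicode` returns the number of runes, i.e. code points:

```nim
import std/unicode

# 'é' takes 2 bytes and '€' takes 3 bytes in UTF-8.
let s = "héllo €"
echo s.len      # byte length: 10
echo s.runeLen  # code point count: 7
```

If only the first `utf8len` bytes should be counted, one option is to slice first, e.g. `s[0 ..< utf8len].runeLen`, provided the slice does not cut a multi-byte sequence in half.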
Your code is mostly fine; the reason it doesn't compile is that Nim has a much stricter type system. For other visitors, this is the error message:
in.nim(3, 34) Error: type mismatch: got but expected 'cuchar = Char'
You're probably better off not using the types prefixed with c; they are mostly meant for compatibility with C code.
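To make the type issue concrete, here is a small sketch (the byte value and names are illustrative, not from the question): `char` has no bitwise `and`, so the byte is converted to `uint8` (which `byte` is an alias of) before the mask test, and no `c`-prefixed types are needed.

```nim
const m3 = 0b1110_0000'u8     # same value as 128 + 64 + 32

let c = '\xE2'                # first byte of the UTF-8 encoding of '€'
# let bad = c and m3          # would not compile: `and` is not defined for char
let b = uint8(c)              # explicit conversion from char to uint8
let isThreeByteLead = (b and m3) == m3   # top three bits set: 0xE2 & 0xE0 == 0xE0
echo isThreeByteLead          # true
```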
This is how I would translate your function into more idiomatic Nim:
proc utf8str_codepoint_len*(s: string, utf8len: int): int =
  const
    m4 = 128'u8 + 64 + 32 + 16
    m3 = 128'u8 + 64 + 32
    m2 = 128'u8 + 64
  var i = 0
  while i < utf8len:
    # In Nim, char and uint8/byte are separate types; char is only used to represent 8-bit
    # characters, so arithmetic and bitwise operators are defined for neither it nor cchar.
    var c = uint8(s[i])
    if (c and m4) == m4:
      i += 3
    elif (c and m3) == m3:
      i += 2
    elif (c and m2) == m2:
      i += 1
    inc i       # step past the lead byte itself, like ++i in the C loop
    inc result  # result is the implicit return value; like all variables in Nim, it starts out as 0
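One possible variant, sketched under the assumption that the whole string should always be counted: the hypothetical `codepointCount` below drops the `utf8len` parameter in favour of `s.len`, uses the same lead-byte masks written as binary literals, and is cross-checked against `runeLen` from `std/unicode`.

```nim
import std/unicode

# Hypothetical variant: counts the whole string, so no utf8len parameter.
proc codepointCount(s: string): int =
  const
    m4 = 0b1111_0000'u8   # top 4 bits set: lead byte of a 4-byte sequence
    m3 = 0b1110_0000'u8   # top 3 bits set: 3-byte lead (4-byte leads are tested first)
    m2 = 0b1100_0000'u8   # top 2 bits set: 2-byte lead
  var i = 0
  while i < s.len:
    let c = uint8(s[i])
    if (c and m4) == m4:
      i += 3              # skip the 3 continuation bytes
    elif (c and m3) == m3:
      i += 2
    elif (c and m2) == m2:
      i += 1
    inc i                 # step past the lead byte itself
    inc result            # one code point per lead byte

for s in ["", "abc", "héllo €", "😀 ok"]:
  # On valid UTF-8 the hand-written loop should agree with std/unicode.
  echo s.len, " bytes, ", codepointCount(s), " code points, runeLen = ", s.runeLen
```

Like the original, this loop trusts each lead byte, so on malformed UTF-8 it is not guaranteed to agree with `runeLen`.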