nimforum mirror - [SOLVED] What am i doing wrong? (unicode library)

Renaud [2015-06-25T22:54:20+02:00]

Hi,

My first message here, as a new user. I discovered Nim a while ago, but just started "playing seriously" with it a few days ago. So let me start with a big THANK YOU to Andreas and the whole community.

To the point...

I wrote a simple test using the unicode library:


import unicode

const word = "Méthode"

echo word
echo "-------"

for i in 0..word.len-1 :
    echo i , " : " , word[i]

echo "-----"

for i in 0..word.runeLen-1 :
    echo i , " : " , word.runeAt(i)

And the result surprises me:

Méthode
-------
0 : M
1 :
2 : �
3 : t
4 : h
5 : o
6 : d
7 : e
-----
0 : M
1 : é
2 : ©
3 : t
4 : h
5 : o
6 : d

For the last part, I was expecting:

0 : M
1 : é
2 : t
3 : h
4 : o
5 : d
6 : e

Where's my mistake?

Thanks...

def [2015-06-25T23:01:53+02:00]

In UTF-8, runes occupy a variable number of bytes. runeAt takes a byte position as its parameter, not a character index. You could combine runeLenAt with runeAt, or use fastRuneAt or the runes iterator instead. One of these may be closer to what you want:

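import unicode

const word = "Méthode"

# runeAt + runeLenAt: i is a byte position
var i = 0
while i < word.len:
  echo i, " : ", word.runeAt(i)
  i += word.runeLenAt(i)

# fastRuneAt: j is a byte position; fastRuneAt reads the rune
# starting at j into r and advances j past it
var j = 0
var r: Rune
while j < word.len:
  let start = j
  word.fastRuneAt(j, r)
  echo start, " : ", r

# runes iterator: k counts runes, not bytes
var k = 0
for rune in word.runes:
  echo k, " : ", rune
  inc k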
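A small illustration of the byte/rune distinction, assuming a Nim version whose unicode module provides toRunes (current releases do). The first loop dumps the raw bytes of "Méthode": "é" (U+00E9) is encoded as the two bytes C3 A9, so the string is 8 bytes long but holds 7 runes. Byte 2 (A9) is a UTF-8 continuation byte rather than the start of a rune, which is presumably why runeAt(2) printed "©" (U+00A9) in the original post. The second part sketches one way to get the character-indexed loop from the question: decode the string once with toRunes and index the resulting seq[Rune].

```nim
import unicode, strutils

const word = "Méthode"

# 1. The UTF-8 bytes behind the string, in hex. "é" (U+00E9) takes
#    two bytes (C3 A9), so word.len is 8 while word.runeLen is 7.
for i in 0 ..< word.len:
  echo i, " : ", toHex(ord(word[i]), 2)

# 2. Character-indexed access: decode the whole string once (one O(N)
#    pass), then index the seq[Rune] by character position.
let chars = word.toRunes()
for i in 0 ..< chars.len:
  echo i, " : ", chars[i]
```

The second loop prints exactly the list Renaud expected (0 : M through 6 : e). The trade-off is the extra seq[Rune] allocation; for a single front-to-back pass, the byte-position loops in def's reply avoid it.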

jibal [2015-06-25T23:11:49+02:00]

The documentation is quite clear:

returns the unicode character in s at byte index i

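This makes sense and is as it should be: runeAt is designed for walking through the UTF-8 encoded runes of a string by adding each rune's byte length to the current index, which makes a full traversal O(N). If runeAt took a character index, every call would first have to scan from the start of the string to find the matching byte offset, so a full traversal would be O(N*N).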
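To make the complexity argument concrete, here is a minimal sketch. runeAtCharIndex is a hypothetical helper written only for this illustration (it is not part of the unicode module): it can only locate rune n by decoding every rune before it, so calling it for every index repeats that scan, whereas stepping by byte position, as runeAt is meant to be used, visits each byte once.

```nim
import unicode

# Hypothetical helper (NOT part of std/unicode): returns the rune at
# *character* index n. The byte offset of rune n is only known after
# decoding the n runes before it, so each call costs O(n).
proc runeAtCharIndex(s: string, n: int): Rune =
  var pos = 0
  for _ in 0 ..< n:
    pos += s.runeLenAt(pos)
  result = s.runeAt(pos)

const word = "Méthode"

# Traversal by character index: N calls, call i scans i runes,
# so the total work grows as O(N*N).
var slow: seq[Rune]
for i in 0 ..< word.runeLen:
  slow.add(word.runeAtCharIndex(i))

# Traversal by byte index, as runeAt is designed: a single O(N) pass.
var fast: seq[Rune]
var pos = 0
while pos < word.len:
  fast.add(word.runeAt(pos))
  pos += word.runeLenAt(pos)

doAssert slow == fast
echo "both traversals found ", fast.len, " runes"
```

Both loops produce the same runes; only the amount of repeated decoding differs, which is invisible on a 7-rune word but dominates on long strings.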

Renaud [2015-06-25T23:25:06+02:00]

Indeed, the documentation is quite clear, but my English is less so :-)

That's very logical.

Thanks, def and jibal; you were both very helpful.

Mirror of forum.nim-lang.org

1368 :: [SOLVED] What am i doing wrong? (unicode library)