I was trying a pattern that works in other languages:
import re
echo "中文".match(re"[\x{4e00}-\x{9fa5}]+")
And I got an error:
/Users/chen/.choosenim/toolchains/nim-1.4.0/lib/impure/re.nim(102) re
/Users/chen/.choosenim/toolchains/nim-1.4.0/lib/impure/re.nim(78) rawCompile
/Users/chen/.choosenim/toolchains/nim-1.4.0/lib/impure/re.nim(70) raiseInvalidRegex
Error: unhandled exception: character value in \x{} or \o{} is too large
[\x{4e00}-\x{9fa5}]
From reading the docs, I thought the maximum value should be 7FFFFFFF?
https://nim-lang.org/docs/re.html
"After \x, from zero to two hexadecimal digits are read (letters can be in upper or lower case). In UTF-8 mode, any number of hexadecimal digits may appear between \x{ and }, but the value of the character code must be less than 2**31 (that is, the maximum hexadecimal value is 7FFFFFFF)."
What's happening here, and how do I correct the code?
This is outside the scope of this thread, but my original issue was that str.escape() handles Chinese characters differently, so I have to detect Chinese characters and fall back to a slower function that operates on runes.
The key text in your quote is "In UTF-8 mode", which is never explained.
There may also be a flag you can pass for this, but going by the pcreunicode documentation, putting the option at the start of the pattern works:
import re
echo "中文".match(re"(*UTF)[\x{4e00}-\x{9fa5}]+")
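A slightly fuller sketch of the same idea, in case it helps. It assumes the PCRE library that Nim's re module loads at runtime accepts the (*UTF8) start-of-pattern option; as far as I know (*UTF) is a generic synonym that only newer PCRE 8.x releases recognise, so (*UTF8) may be the safer spelling. The name cjk is only illustrative.

```nim
import re

# Assumption: the linked PCRE build honours the (*UTF8) option, which
# switches the pattern into UTF-8 mode so that \x{...} may exceed 0xFF.
let cjk = re"(*UTF8)[\x{4e00}-\x{9fa5}]+"

# match() is anchored at the start of the string.
echo "中文".match(cjk)
echo "abc".match(cjk)

# contains() searches anywhere in the string.
echo "abc中文".contains(cjk)
```

Without the option, PCRE treats the subject as single bytes, which is why any \x{} value above 0xFF is rejected as "too large".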
Thanks. (*UTF8) fixes the code.
I actually glanced at that in https://nim-lang.org/docs/nre.html#options but didn't realize it was related.