nimforum mirror - how to detect Chinese character with regex?

jiyinyiyong (original) [2021-01-18T19:57:54+01:00]

I was trying a pattern that I have used in other languages:

echo "中文".match(re"[\x{4e00}-\x{9fa5}]+")

And I got an error:


/Users/chen/.choosenim/toolchains/nim-1.4.0/lib/impure/re.nim(102) re
/Users/chen/.choosenim/toolchains/nim-1.4.0/lib/impure/re.nim(78) rawCompile
/Users/chen/.choosenim/toolchains/nim-1.4.0/lib/impure/re.nim(70) raiseInvalidRegex
Error: unhandled exception: character value in \x{} or \o{} is too large
[\x{4e00}-\x{9fa5}]

From reading the docs, I thought the maximum value should be 7FFFFFFF?

https://nim-lang.org/docs/re.html

After \x, from zero to two hexadecimal digits are read (letters can be in upper or lower case). In UTF-8 mode, any number of hexadecimal digits may appear between \x{ and }, but the value of the character code must be less than 2**31 (that is, the maximum hexadecimal value is 7FFFFFFF).

What's happening here? How can I correct the code?
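
A possible reading, offered as a sketch rather than a confirmed answer: the 7FFFFFFF limit in the quoted docs applies only in UTF-8 mode, and outside that mode PCRE rejects \x{...} values above FF, which matches the error shown. The snippet below assumes the PCRE library loaded by Nim's re module was built with UTF-8 support, so that the PCRE-level (*UTF8) prefix is accepted at the start of the pattern. The name hanPattern is only illustrative.

```nim
import std/re

# Assumption: the installed PCRE accepts the (*UTF8) start-of-pattern option.
# With UTF-8 mode on, \x{4e00} and \x{9fa5} are read as code points, not bytes.
let hanPattern = re"(*UTF8)[\x{4e00}-\x{9fa5}]+"

# match() is anchored at the start of the string and returns a bool.
echo "中文".match(hanPattern)
echo "abc".match(hanPattern)
```

If the prefix is rejected on a given system, the PCRE build there probably lacks UTF-8 support, and this approach will not apply.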

This is outside the scope of this thread, but my original issue was that str.escape() handles Chinese characters differently, so I have to detect Chinese characters and fall back to another, slower function that operates on runes.
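
For comparison, here is a minimal sketch of the detection step done without a regex, using only std/unicode. It checks the same U+4E00..U+9FA5 range as the pattern at the top of the post. The proc name hasChinese is hypothetical, and whether this is fast enough for the escape use case has not been measured here.

```nim
import std/unicode

proc hasChinese(s: string): bool =
  ## Hypothetical helper: true if any rune of `s` lies in U+4E00..U+9FA5.
  for r in s.runes:
    let cp = int(r)  # Rune is a distinct int32, so it converts to an int
    if cp >= 0x4E00 and cp <= 0x9FA5:
      return true
  false

echo hasChinese("中文")
echo hasChinese("plain ascii")
```

The loop stops at the first matching rune, so strings that contain Chinese characters early on are handled without decoding the rest.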

Mirror of forum.nim-lang.org

7399 :: how to detect Chinese character with regex?