Mirror of forum.nim-lang.org

1173 :: How does one use "UTF-8 mode" for regular expressions?

[2015-05-01T10:56:37+02:00]

rspeer (original) [2015-05-01T10:56:37+02:00]

The documentation on the re module describes how things match in UTF-8 mode, in which one can presumably write expressions that match ranges of Unicode characters. However, I can't figure out how to enable UTF-8 mode. It doesn't seem to be one of the possible flags, for example. So far, I've had to write more complex expressions where characters are broken down into their component bytes.

Does this mode actually exist, and if so, how do you use it?

BlaXpirit (original) [2015-05-02T02:31:30+02:00]

PCRE supports Unicode, and you can pass PCRE options in the regex string itself. However, the re module will break for most uses, because it was not written with UTF-8 in mind.

Just use nre instead: https://github.com/flaviut/nre
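
For illustration, here is a minimal sketch of what "passing PCRE options in the regex string" could look like with nre. It assumes a recent Nim in which nre ships in the standard library as std/nre (in 2015 it was the separate package linked above) and a PCRE build with UTF-8 support. The Greek character range and the sample text are made up for the example and do not come from the thread.

```nim
import std/[nre, options]

# The leading (*UTF8) is a PCRE start-of-pattern option: it asks PCRE to
# treat both the pattern and the subject as UTF-8, so the class below is a
# range of code points (U+03B1..U+03C9) rather than a set of raw bytes.
let greek = re"(*UTF8)[α-ω]+"

# Hypothetical sample text, chosen only for this sketch.
let text = "abc λογος xyz"

# find returns an Option[RegexMatch]; check it before unwrapping.
let m = text.find(greek)
if m.isSome:
  echo m.get.match
else:
  echo "no match"
```

Without the (*UTF8) prefix, each multi-byte character in the class is read as separate bytes, which is why the workaround of spelling characters out byte by byte was needed.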