The documentation on the re module describes how things match in UTF-8 mode, in which one can presumably write expressions that match ranges of Unicode characters. However, I can't figure out how to enable UTF-8 mode. It doesn't seem to be one of the possible flags, for example. So far, I've had to write more complex expressions where characters are broken down into their component bytes.
Does this mode actually exist, and if so, how do you use it?
PCRE supports Unicode. You can pass PCRE options in the regex string itself. However, the re module will break for most uses, because it was not designed with UTF-8 in mind.
Just use nre instead: https://github.com/flaviut/nre
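As a rough sketch of what passing a PCRE option inside the pattern can look like: PCRE accepts (*UTF8) at the very start of a pattern, which tells it to treat both the pattern and the subject as UTF-8. The snippet below assumes the later version of nre that ships with Nim as std/nre and returns a standard Option; the sample text and the Greek character range are made up for illustration.

```nim
import std/nre      # assumption: the nre version bundled with recent Nim
import std/options

# (*UTF8) at the start of the pattern switches PCRE into UTF-8 mode,
# so [α-ω] is a range of code points rather than of individual bytes.
let greekWord = re"(*UTF8)[α-ω]+"

# Hypothetical sample input, chosen only for illustration.
let found = "abc λογος xyz".find(greekWord)

if found.isSome:
  echo found.get.match   # the matched substring
else:
  echo "no match"
```

Without the (*UTF8) prefix the same character class would be interpreted byte by byte, which is the situation that forces the byte-level workarounds described in the question.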
Okay, cool.
Can I see an example of some code that uses nre? I'm not sure what to do with an Option[RegexMatch] object.
I've seen the part of the docs that says:
Usually seen as Option[RegexMatch], it represents the result of an execution. On failure, it is None[RegexMatch], but if you want automated derefrence, import optional_t.nonstrict
...but I don't understand how to use it based on that.
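For what it's worth, here is a minimal sketch of one way to unpack an Option[RegexMatch]. It assumes the later nre that ships with Nim as std/nre and uses the standard options module (isSome, get), not the optional_t module quoted above; the pattern and the input string are made up for illustration.

```nim
import std/nre      # assumption: the nre version bundled with recent Nim
import std/options

# find returns Option[RegexMatch]: a wrapped match on success, none on failure.
let m = "key=value".find(re"(\w+)=(\w+)")

if m.isSome:
  let rm = m.get            # unwrap to a plain RegexMatch
  echo rm.match             # the whole matched text
  echo rm.captures[0]       # first capture group
  echo rm.captures[1]       # second capture group
else:
  echo "no match"
```

Checking isSome before calling get matters here, because get on an empty Option raises an error instead of returning an empty match.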