I'm working with Wikipedia text in various languages and would like to capture text that contains Unicode characters. For example:
This works (plain ASCII):
import re
let t = "{{Cite book|test=}}"
echo $(findBounds(t, re("(*UTF8)[{]{2}Cite book[|][^}]+}}", {}) ))
This does not work (Unicode):
import re
let t = "{{Сite book|ссылка=|автор=Виноградов В. Б., Бараниченко Н. Н.}}"
echo $(findBounds(t, re("(*UTF8)[{]{2}Cite book[|][^}]+}}", {}) ))
No luck with nre either:
import nre
let t = "{{Сite book|ссылка=|автор=Виноградов В. Б., Бараниченко Н. Н.}}"
for found in t.findIter(re("(*UTF8)(?s)[{]{2}Cite book[|][^}]+}}")):
  echo $found
How can I capture Unicode text?
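You might want to perform some sort of Unicode normalization first, to map the look-alike Unicode "C" in your test string (it appears to be a non-ASCII letter, possibly Cyrillic U+0421) to ASCII "C", etc. The pattern spells "Cite" with an ASCII "C", so a look-alike letter from another script will not match it, with or without (*UTF8).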
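As a rough sketch of that idea, assuming the first letter of "Сite" in the test string really is the Cyrillic capital Es (U+0421): the snippet below replaces that one letter with ASCII "C" before matching. `foldLookalikes` is a hypothetical helper written for this example, not a standard library proc, and it handles only this single character. The string is written with a `\u0421` escape so the look-alike letter is explicit.

```nim
import re, strutils

# Hypothetical helper (not part of any library): maps the Cyrillic capital
# Es (U+0421), which looks like Latin "C", to ASCII "C".
# Assumption: this is the only look-alike letter that matters here.
proc foldLookalikes(s: string): string =
  s.replace("\u0421", "C")

# Same text as in the question, with the first letter spelled as an escape.
let t = "{{\u0421ite book|ссылка=|автор=Виноградов В. Б., Бараниченко Н. Н.}}"
let pattern = re("(*UTF8)[{]{2}Cite book[|][^}]+}}", {})

echo findBounds(t, pattern)                  # original text: no match
echo findBounds(foldLookalikes(t), pattern)  # folded text: matches from index 0
```

Note that the bounds returned for the folded string are byte offsets into the folded copy, not into the original, because the Cyrillic letter takes two bytes in UTF-8 and the ASCII "C" takes one. If offsets into the original text matter, another option is to leave the text alone and let the pattern accept either letter, for example `(?:C|\x{421})ite book` with (*UTF8) enabled.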