Hello,
I'm confused about the behaviour of findBounds from std/re. It seems to only ever return the first match in the matches parameter and leave the rest empty.
Here is a minimal example:
```nim
import std/re

let temp = findAll("Hello World", re"(\w+)")
echo temp
var matches = newSeq[string](2)
let (first, last) = findBounds("Hello World", re"(\w+)", matches)
echo matches
```
prints:
```
@["Hello", "World"]
@["Hello", ""]
```
I would have expected both printed lines to be the same.
findBounds <https://nim-lang.org/docs/re.html#findBounds%2Cstring%2CRegex%2Cint> has an optional start argument that defaults to 0. It returns the bounds of the first match found when scanning the string from the start offset.
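A minimal sketch of the start argument, using only the `findBounds(s, pattern, start)` overload linked above. Searching "Hello World" for `\w+` from offset 5 skips "Hello" and returns the bounds of "World". The bounds are inclusive, so the matched text is `s[first..last]`.

```nim
import std/re

let s = "Hello World"
let word = re"\w+"

# Scanning from offset 0 finds "Hello" at indices 0..4.
let firstHit = s.findBounds(word)
# Scanning from offset 5 (the space) skips "Hello" and finds "World".
let secondHit = s.findBounds(word, 5)

echo firstHit                               # (first: 0, last: 4)
echo secondHit                              # (first: 6, last: 10)
echo s[secondHit.first .. secondHit.last]   # World
```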
To return all matches, I wrote this iterator:
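```nim
import std/re

iterator iterAllFoundBounds(s: string; pattern: Regex; start = 0): tuple[first, last: int] =
  ## Yields the inclusive (first, last) bounds of every non-overlapping
  ## match of `pattern` in `s`, scanning from `start`.
  var found: tuple[first, last: int] = (start, 0)
  while found.first < s.len:
    found = s.findBounds(pattern, found.first)
    if found.first < 0:
      break  # no further match
    yield found
    # Resume after this match; always advance at least one char so that
    # an empty match (last == first - 1) cannot loop forever.
    found.first = max(found.last + 1, found.first + 1)
```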
HTH
That's not the findBounds I'm talking about; sorry, I should have specified. I mean this one: https://nim-lang.org/docs/re.html#findBounds%2Cstring%2CRegex%2CopenArray%5Bstring%5D%2Cint
It takes an array of strings as a parameter, which the documentation says it will fill.
My bad, I should have deduced which findBounds you were referring to from the type of the arguments passed to it.
Nevertheless, the findBounds you are using only returns the bounds of the first matching span of the string passed as its first argument. The matches sequence receives the capture groups defined in your regular expression, in order from left to right.
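To illustrate the ordering: capture groups are numbered by the position of their opening parenthesis, so with nested groups the outer group lands in `matches[0]` and the inner one in `matches[1]`. A minimal sketch; the pattern `((\w)\w*)` is only an illustrative choice, not taken from the question.

```nim
import std/re

# Two capture groups: the outer ((\w)\w*) opens first, the inner (\w) second.
var groups = newSeq[string](2)
let bounds = findBounds("Hello World", re"((\w)\w*)", groups)

echo bounds   # (first: 0, last: 4), i.e. only the first match, "Hello"
echo groups   # @["Hello", "H"]
```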
In your case there is only one group, (\w+), so only the first element of the sequence is populated. Defining two groups, as below, returns each matched word as a separate element of the matches sequence.
```nim
import std/re

let temp = findAll("Hello World", re"(\w+)")
echo temp
var matches = newSeq[string](2)
# Two capture groups, so both elements of matches are filled.
let (first, last) = findBounds("Hello World", re"(\w+)\s(\w+)", matches)
echo matches
```
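If the goal is the original one, getting the capture groups of every match rather than only the first, one option is to combine the two answers: call the matches overload of findBounds in a loop and restart the search just past each match. A minimal sketch; `allCaptures` is a hypothetical helper, and the caller passes the number of groups explicitly because this sketch does not assume std/re exposes a capture count.

```nim
import std/re

proc allCaptures(s: string; pattern: Regex; groupCount: int): seq[seq[string]] =
  ## Hypothetical helper: returns the capture groups of every
  ## non-overlapping match of `pattern` in `s`, one seq per match.
  var start = 0
  while start <= s.len:
    var groups = newSeq[string](groupCount)
    let (first, last) = s.findBounds(pattern, groups, start)
    if first < 0:
      break  # no further match
    result.add groups
    # Continue after this match; step at least one char past an
    # empty match (last == first - 1) so the loop always advances.
    start = max(last + 1, first + 1)

echo allCaptures("Hello World", re"(\w+)", 1)   # @[@["Hello"], @["World"]]
```

Each inner seq holds the groups of one match, so with the single-group pattern from the question you get one word per match.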