nimforum mirror - re.findBounds doesn't return all matches

Charles [2023-12-03T11:33:30+01:00]

Hello,

I'm confused about the behaviour of findBounds from std/re. It seems to only ever return the first match in the matches parameter and leave the rest empty.

Here is a minimal example:

import std/re

let temp = findAll("Hello World", re"(\w+)")
echo temp

var matches = newSeq[string](2)
let (first, last) = findBounds("Hello World", re"(\w+)", matches)
echo matches

prints:

@["Hello", "World"]
@["Hello", ""]

I would have expected both printed lines to be the same.

dxb [2023-12-03T13:47:39+01:00]

findBounds <https://nim-lang.org/docs/re.html#findBounds%2Cstring%2CRegex%2Cint> has an optional start argument (default 0) and returns the bounds of the first match found when scanning the string from that offset.

In order to return all matches I wrote this iterator:

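import std/re

iterator iterAllFoundBounds(s: string; pattern: Regex; start = 0): tuple[first, last: int] =
  ## Yields the bounds of every match of `pattern` in `s`, scanning from `start`.
  var found: tuple[first, last: int] = (start, 0)
  while found.first < s.len:
    found = s.findBounds(pattern, found.first)
    if found.first < 0:
      break  # findBounds returns (-1, 0) when nothing matches
    yield found
    # Continue after this match. An empty match has last == first - 1, so
    # advance at least one character to avoid looping forever.
    found.first = max(found.last, found.first) + 1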

HTH

Charles [2023-12-03T13:56:45+01:00]

That's not the findBounds I'm talking about; sorry, I should have specified. I mean this overload: https://nim-lang.org/docs/re.html#findBounds%2Cstring%2CRegex%2CopenArray%5Bstring%5D%2Cint

It takes an array of strings as a parameter, which the documentation says it will fill.

dxb [2023-12-03T14:13:40+01:00]

My bad, I should have deduced which findBounds you were referring to from the type of the arguments passed to it.

Nevertheless, the findBounds you are using only returns the bounds of the first matching span of the string passed as its first argument. The matches sequence contains the captured groups defined in your regular expression (in order from left to right).

In your case there is only one group, (\w+), so only the first element of the sequence is populated. Defining two groups, as below, returns each matched word as a separate element of the matches sequence.

import std/re

let temp = findAll("Hello World", re"(\w+)")
echo temp

var matches = newSeq[string](2)
let (first, last) = findBounds("Hello World", re"(\w+)\s(\w+)", matches)
echo matches
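
If what you want is the captured group from every match (rather than several groups from one match), one option is to call this findBounds overload in a loop, moving the start offset past each match. This is only a minimal sketch, assuming a pattern with a single capture group; allFirstGroups is a hypothetical helper name, not something provided by std/re.

```nim
import std/re

proc allFirstGroups(s: string; pattern: Regex): seq[string] =
  ## Hypothetical helper: collects capture group 1 from every match in `s`.
  ## Assumes `pattern` contains exactly one capture group.
  var groups = newSeq[string](1)  # one slot per capture group
  var start = 0
  while start < s.len:
    let (first, last) = s.findBounds(pattern, groups, start)
    if first < 0:
      break  # no further match
    result.add groups[0]
    # Resume after this match; advance at least one character so an
    # empty match (last == first - 1) cannot loop forever.
    start = max(last, first) + 1

echo allFirstGroups("Hello World", re"(\w+)")  # @["Hello", "World"]
```

For a pattern with more groups, the groups sequence would need one slot per group, and each iteration could add all of them instead of only groups[0].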

Charles [2023-12-03T14:21:48+01:00]

Oooooh I see that makes sense, thank you.

Mirror of forum.nim-lang.org
