nimforum mirror - How to use proc findbounds?

drifter [2014-10-19T19:37:10+02:00]

Hello,

I am trying to use the proc findbounds as recommended, but I am running into a problem because I have no example to work from:


proc findBounds(s: string; pattern: TRegex; matches: var openArray[string];
                start = 0): tuple[first, last: int] {.raises: [], tags: [],
    uses: [].}

If I try something like:


import parseutils
import strutils
import re
import pcre
import unicode

var myresults: tuple = findbounds(currentline, re"chapter")

I get the following error:


Error: internal error: GetUniqueType
No stack traceback available

I'm not sure what I am doing wrong. If someone could point it out to me, it would be much appreciated.

Thanks!

Stefan_Salewski [2014-10-19T20:39:08+02:00]

If you are only interested in using proc findbounds(), you may try something like

import parseutils
import strutils
import re
import pcre
import unicode

var currentline = "xyz"

#var myresults: tuple = findbounds(currentline, re"chapter")
var (s, e) = findbounds(currentline, re"chapter")

echo s, e

Compiles fine with 0.9.4; the output is -1 and 0 (no match).

drifter [2014-10-19T21:46:42+02:00]

Hello Stefan,

Thank you for your example. Actually I do need the match results. The description of the proc indicates that it stores matches in the matches variable.

But I'm not certain whether this is a variable that I must define and pass to the proc or a built-in variable. When I try to output the capture by referencing matches[0], it doesn't work.


var currentline: string = "[chapter Uno] and {style} [chapter dos]."

#This is to test findbounds
var (start, e) = findbounds(currentline, re"\[chapter(\s+)(.*?)\]")
echo start, e, matches[0]

I also tried declaring a matches array var, but this also did not work.

I apologize for the questions, very much learning the Nim way of doing things...

gradha [2014-10-19T22:36:52+02:00]

Here is a possible findBounds example:


import re

let
  currentline = "[chapter Uno] and {style} [chapter dos]."
  regex = re"\[chapter(\s+)(.*?)\]"

proc testStrings() =
  var matches: seq[string] = @["", ""]
  let (start, e) = currentline.findbounds(regex, matches)
  echo "testStrings"
  echo "start: ", start, " end: ", e, " matches: ", matches.repr

proc testIndices() =
  var matches: seq[tuple[first, last: int]]
  matches.newSeq(2)
  let (start, e) = currentline.findbounds(regex, matches)
  echo "testIndices"
  echo "start: ", start, " end: ", e, " matches: ", matches.repr

when isMainModule:
  testStrings()
  testIndices()

You seem to be confused about the matches array because Nimrod supports procedure overloading: there are three possible ways to call findBounds. You used the simplest one. The example above shows how to use the other two versions. One of them captures the strings; the other captures the first and last index of each capture, so you can slice the original string with them. Here is the output on my machine:


testStrings
start: 0 end: 12 matches: 0x10f4ec050[0x10f4ed078" ", 0x10f4ed0a0"Uno"]

testIndices
start: 0 end: 12 matches: 0x10f4ef050[[Field0 = 8,
Field1 = 8], [Field0 = 9,
Field1 = 11]]

The matches array should be a variable you create yourself, with enough pre-allocated space to hold all the regex groups. That's why I'm initialising it once with empty strings and once with the newSeq() proc.
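
To illustrate the remark about slicing the original string, here is a minimal sketch. It assumes the indices overload used in testIndices above and that a match is found; the names groups, whole and title are made up for this illustration.

```nim
import re

let
  currentline = "[chapter Uno] and {style} [chapter dos]."
  regex = re"\[chapter(\s+)(.*?)\]"

# One slot per capture group in the regex (two groups here).
var groups: array[2, tuple[first, last: int]]
let (first, last) = currentline.findBounds(regex, groups)

var
  whole = ""
  title = ""

if first >= 0:
  # Both bounds are inclusive, so they can be used directly in a slice.
  whole = currentline[first .. last]
  # groups[1] holds the bounds of the second capture group, (.*?).
  title = currentline[groups[1].first .. groups[1].last]
  echo "whole match: ", whole
  echo "second group: ", title
```

The check on first matters because findBounds reports a failed search with a negative first index, in which case the group bounds should not be used.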

Stefan_Salewski [2014-10-19T23:17:43+02:00]

> Here is a possible findBounds example:

Thanks. Indeed, it is not very easy to guess the usage.

I tried a simplified example a few minutes before your reply. My most stupid error was that I used no () in the regex to indicate captures, so I got valid start and end positions but the matches variable was still unchanged.

Guessing that the matches variable needs to be filled with empty strings was not easy for me either, but I managed it myself...

I think instead of

var matches: seq[string] = @["", ""]

it would be better to use an array

var matches: array[2, string]

This is only my feeling, and it seems to work. To me it does not make much sense to use a sequence when findbounds() does not extend it dynamically.

drifter [2014-10-20T00:14:20+02:00]

I think I am just not getting it, and not really sure why.

gradha, from your example, I would expect to get 2 matches for the regex. Actually, let's make it simpler and use:


let
  currentline = "[chapter] and {style} [chapter]."
  regex = re"(\[chapter\])"

I would expect to see 2 matches when I echo matches, but instead there is only 1.

Trying to find a way around it, I found proc replacef.

But here again I am getting frustrated and not getting anywhere:


proc replacef(s: string; sub: TRegex; by: string): string {.
    raises: [EInvalidValue], tags: [], uses: [].}
    
    Replaces sub in s by the string by. Captures can be accessed in by with the notation $i and $# (see strutils.`%`). Examples:
    
    "var1=key; var2=key2".replacef(re"(\w+)=(\w+)", "$1<-$2$2")
    
    Results in:
    
    "var1<-keykey; var2<-key2key2"

In the end I'm simply trying to do some "search and replace" using regex and captures. So, using the example I tried:


currentline.replacef(regex, ".CHAPTER")

But instead this generates the following error:

Error: value of type 'string' has to be discarded

At first glance things look simple enough, and maybe things will get easier once I get the hang of Nim, but right now it's a very frustrating experience :)

gradha [2014-10-20T01:00:01+02:00]

The matches array stores the regular expression's groups. In the case you highlight:


let
  currentline = "[chapter] and {style} [chapter]."
  regex = re"(\[chapter\])"

The regular expression regex contains a single capture group matching the string [chapter]. As soon as the first match is found the proc returns, leaving the rest of the string unprocessed. If you suspect there are more matches and you want them, you need to repeat the search on the remainder of the string, starting just past the end index of the previous match, until the whole string is exhausted.
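
A rough sketch of such a repeated search follows. Instead of building substrings it assumes the start parameter from the findBounds signature quoted at the top of the thread can be used to resume the search; the names captures, found and pos are invented for this example.

```nim
import re

let
  currentline = "[chapter] and {style} [chapter]."
  regex = re"(\[chapter\])"

var
  captures: array[1, string]              # one slot for the single group
  found: seq[tuple[first, last: int]] = @[]
  pos = 0

while pos < currentline.len:
  let bounds = currentline.findBounds(regex, captures, pos)
  if bounds.first < 0:
    break                                 # no further match
  found.add(bounds)
  echo "first: ", bounds.first, " last: ", bounds.last,
       " capture: ", captures[0]
  # Resume just past this match; the max() guards against an
  # empty match, which would otherwise never advance pos.
  pos = max(bounds.last + 1, pos + 1)
```

The returned indices refer to the whole string rather than to the part after start, so they can be collected as they are.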

This has been done for the proc/iterator findAll dealing with just strings. You could request or propose a similar implementation for the indices version which would effectively return you a sequence of all the start/end pairs for all matches in the input string.
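
For the search and replace you describe, a small sketch may help, assuming findAll and replacef behave as their documentation says. The error "value of type 'string' has to be discarded" most likely means only that replacef returns a new string which the call did not use; assigning the result should avoid it. The names everyMatch and replaced, and the regex with a single capture, are chosen for this example.

```nim
import re

let
  currentline = "[chapter Uno] and {style} [chapter dos]."
  regex = re"\[chapter\s+(.*?)\]"

# findAll returns every matching substring, not only the first one.
let everyMatch = currentline.findAll(regex)
echo everyMatch

# replacef does not modify currentline; it returns the new string,
# so the result has to be stored (or passed on) instead of dropped.
# $1 in the replacement refers to the first capture group.
let replaced = currentline.replacef(regex, ".CHAPTER $1")
echo replaced
```

If the result really is not needed, writing discard in front of the call is the other way to silence the error.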

Mirror of forum.nim-lang.org
