Hello,
I am trying to use the proc findBounds as recommended, but I am running into a problem because I have no example to work from:
proc findBounds(s: string; pattern: TRegex; matches: var openArray[string];
start = 0): tuple[first, last: int] {.raises: [], tags: [],
uses: [].}
If I try something like:
import parseutils
import strutils
import re
import pcre
import unicode
var myresults: tuple = findbounds(currentline, re"chapter")
I get the following error:
Error: internal error: GetUniqueType
No stack traceback available
I'm not sure what I am doing wrong. If someone could point it out to me, it would be much appreciated.
Thanks!
import parseutils
import strutils
import re
import pcre
import unicode
var currentline = "xyz"
#var myresults: tuple = findbounds(currentline, re"chapter")
var (s, e) = findbounds(currentline, re"chapter")
echo s, e
This compiles fine with 0.9.4, and the output is -1 and 0.
Hello Stefan,
Thank you for your example. Actually I do need the match results. The description of the proc indicates that it stores the matches in the matches variable.
But I'm not certain whether this is a variable that I must define and pass to the proc, or a built-in variable. When I try to output the capture by referencing matches[0], it doesn't work.
var currentline: string = "[chapter Uno] and {style} [chapter dos]."
#This is to test findbounds
var (start, e) = findbounds(currentline, re"\[chapter(\s+)(.*?)\]")
echo start, e, matches[0]
I also tried declaring a matches array var, but this also did not work.
I apologize for the questions, very much learning the Nim way of doing things...
Here is a possible findBounds example:
import re
let
  currentline = "[chapter Uno] and {style} [chapter dos]."
  regex = re"\[chapter(\s+)(.*?)\]"
proc testStrings() =
  var matches: seq[string] = @["", ""]
  let (start, e) = currentline.findbounds(regex, matches)
  echo "testStrings"
  echo "start: ", start, " end: ", e, " matches: ", matches.repr
proc testIndices() =
  var matches: seq[tuple[first, last: int]]
  matches.newSeq(2)
  let (start, e) = currentline.findbounds(regex, matches)
  echo "testIndices"
  echo "start: ", start, " end: ", e, " matches: ", matches.repr
when isMainModule:
  testStrings()
  testIndices()
The reason you seem to be confused about the matches array is that Nimrod supports procedure overloading. There are three ways to call findBounds. You used the simplest one, which takes no matches argument. The example above shows how to use the other two versions. One of them captures the strings; the other captures the start and end indices of each group so you can slice the original string with them. Here is the output on my machine:
testStrings
start: 0 end: 12 matches: 0x10f4ec050[0x10f4ed078" ", 0x10f4ed0a0"Uno"]
testIndices
start: 0 end: 12 matches: 0x10f4ef050[[Field0 = 8,
Field1 = 8], [Field0 = 9,
Field1 = 11]]
The matches array should be a variable you create yourself, with enough pre-allocated space to hold all the regex groups (that's why I'm initialising it once with empty strings, and once with the newSeq() proc).
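To illustrate the slicing mentioned above, here is a minimal sketch that uses the indices version and cuts the captured text out of the original string. It assumes the re module still offers the overload taking a sequence of (first, last) tuples, as in the example above; the names groups, matchStart and matchEnd are only illustrative.

```nim
import re

let
  currentline = "[chapter Uno] and {style} [chapter dos]."
  regex = re"\[chapter(\s+)(.*?)\]"

# One pre-allocated slot per capture group in the regex.
var groups = newSeq[tuple[first, last: int]](2)
let (matchStart, matchEnd) = currentline.findBounds(regex, groups)

if matchStart >= 0:
  # Both bounds are inclusive, so they can be used directly in a slice.
  let whole = currentline[matchStart .. matchEnd]
  let title = currentline[groups[1].first .. groups[1].last]
  echo "whole match: ", whole
  echo "second group: ", title
```

With the bounds shown in the output above (0 to 12 for the match, 9 to 11 for the second group), the slices are "[chapter Uno]" and "Uno".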
Thanks. Indeed it is not very easy to guess usage.
I tried a simplified example a few minutes before your reply -- my most stupid error was that I used no () in the regex to indicate captures, so I got valid start and end positions but the matches variable was still unchanged.
Guessing that the matches variable needs to be filled with empty strings was not easy for me either, but I managed it myself...
I think instead of
var matches: seq[string] = @["", ""]
we should better use an array
var matches: array[2, string]
It is only my feeling, and it seems to work; to me it does not make much sense to use a sequence when findbounds() is not dynamically extending it.
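For completeness, a small sketch of the array variant suggested above, reusing the test string from the earlier example. It assumes only that a fixed-size array can be passed where the proc expects var openArray[string], which matches the signature quoted at the top of the thread; matchStart and matchEnd are illustrative names.

```nim
import re

let currentline = "[chapter Uno] and {style} [chapter dos]."

# Fixed-size array: one slot per capture group in the regex.
var matches: array[2, string]
let (matchStart, matchEnd) =
  currentline.findBounds(re"\[chapter(\s+)(.*?)\]", matches)

echo "start: ", matchStart, " end: ", matchEnd
echo "first group: '", matches[0], "' second group: '", matches[1], "'"
```

The array size still has to cover every group in the regex, exactly as with the sequence.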
I think I am just not getting it, and not really sure why.
gradha, from your example, I would expect to get 2 matches for the regex. Actually, let's make it simpler and use:
let
  currentline = "[chapter] and {style} [chapter]."
  regex = re"(\[chapter\])"
I would expect to see 2 matches when I echo matches, but instead there is only 1.
Trying to find a way around it, I found proc replacef.
But here again I am getting frustrated, not getting anywhere:
proc replacef(s: string; sub: TRegex; by: string): string {.
raises: [EInvalidValue], tags: [], uses: [].}
Replaces sub in s by the string by. Captures can be accessed in by with the notation $i and $# (see strutils.`%`). Examples:
"var1=key; var2=key2".replacef(re"(\w+)=(\w+)", "$1<-$2$2")
Results in:
"var1<-keykey; var2<-key2key2"
In the end I'm simply trying to do some "search and replace" using regex and captures. So, using the example I tried:
currentline.replacef(regex, ".CHAPTER")
But instead this generates the following error:
Error: value of type 'string' has to be discarded
At first glance things look simple enough, and maybe things will get easier once I get the hang of Nim, but right now it's a very frustrating experience :)
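A note on the error above: replacef does not change currentline in place. It returns a new string, and the compiler refuses a statement whose value is neither used nor explicitly discarded. A minimal sketch, assuming the same currentline and regex as in the example (the name replaced is only illustrative):

```nim
import re

let
  currentline = "[chapter] and {style} [chapter]."
  regex = re"(\[chapter\])"

# Bind the returned string to a name (or pass it straight to echo).
let replaced = currentline.replacef(regex, ".CHAPTER")
echo replaced

# If the result really is not needed, say so explicitly:
discard currentline.replacef(regex, ".CHAPTER")
```

Since replacef works through the whole input, both occurrences of [chapter] end up replaced in the returned string.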
The matches array will store the regular expression's groups. In the case you highlight:
let
  currentline = "[chapter] and {style} [chapter]."
  regex = re"(\[chapter\])"
The regular expression regex contains a single capture group matching the string [chapter]. As soon as the first match is found, the rest of the string is left unprocessed and the proc returns. If you suspect there are more matches, and you want them, you will need to repeat the search, starting each time from the end index of the previous match, until the whole string is exhausted.
This has been done for the proc/iterator findAll dealing with just strings. You could request or propose a similar implementation for the indices version which would effectively return you a sequence of all the start/end pairs for all matches in the input string.
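The repeated search described above can be sketched as follows. Instead of taking substrings, this uses the start parameter from the findBounds signature, so the returned indices stay relative to the original string. The variables found and pos are illustrative and not part of the re module; the loop is a sketch, not how findAll is implemented.

```nim
import re

let
  currentline = "[chapter] and {style} [chapter]."
  regex = re"(\[chapter\])"

# Collect the (first, last) bounds of every match, not just the first one.
var found: seq[tuple[first, last: int]] = @[]
var pos = 0
while pos < currentline.len:
  let bounds = currentline.findBounds(regex, start = pos)
  if bounds.first < 0: break      # no further match
  found.add(bounds)
  # Continue after this match; always advance at least one character
  # so an empty match cannot loop forever.
  pos = max(bounds.last + 1, pos + 1)

for b in found:
  echo b.first, "..", b.last, ": ", currentline[b.first .. b.last]

# For the matched strings alone, findAll already does the looping.
let allStrings = currentline.findAll(regex)
echo allStrings
```

For this input the loop finds two matches, at 0..8 and 22..30, and findAll returns the two "[chapter]" strings.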