nimforum mirror - re.match() problem

AxBen [2015-01-22T09:06:48+01:00]

What's wrong with this piece of code:

import re;

var
    tokens: seq[string]
    
    timeLine = re"(..):(.*)"

if "00:00:36,443".match(timeLine, tokens):
   echo ">>> ", tokens[0]

which results in:

Traceback (most recent call last)
sample.nim(8)            sample
SIGSEGV: Illegal storage access. (Try to compile with -d:useSysAssert -d:useGcAssert for details.)

but IMHO it shouldn't. If match() returns true, then tokens shouldn't be empty.

Araq [2015-01-22T10:25:13+01:00]

You need to either use an array[2, string] for tokens or do tokens = newSeq[string](2).
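A minimal sketch of the second option, reusing the pattern from the first post. The slot count of 2 is chosen here to match the pattern's two capture groups; the second token is echoed only for illustration.

```nim
import re

# The pattern has two capture groups, so two slots are allocated up front.
# match() fills existing slots; it does not grow the sequence.
var tokens = newSeq[string](2)
let timeLine = re"(..):(.*)"

if "00:00:36,443".match(timeLine, tokens):
  echo ">>> ", tokens[0], " | ", tokens[1]
```

Declaring `var tokens: array[2, string]` instead should work the same way, since the slots exist from the start.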

AxBen [2015-01-22T11:11:31+01:00]

@Araq

Thank you for the info, this works.

However, since match() accepts sequences, wouldn't it make more sense for match() (and similar functions) to grow the sequence itself? At a minimum, I would expect the compiler to print a warning.

AxBen [2015-01-22T14:07:02+01:00]

Pattern matching seems to be buggy:

1) Ignoring characters


import re;

var tokens: array[8, string]
let timeLine = re"(\d\d):(\d\d):(\d\d),(\d\d+) (.*)"

if "00:00:03,009 --> 00:00:08,009".match(timeLine, tokens):
   echo "[", tokens[0], "] [", tokens[1], "] [", tokens[2], "] [", tokens[3], "] [", tokens[4], "]"

prints "[00] [00] [03] [009] [ --> 00:00:08,009]"; note the space before the arrow, which should not be matched.

2) Not matching at all


import re;

var tokens: array[8, string]
let timeLine = re"(\d\d):(\d\d):(\d\d),(\d\d+) --> (\d\d):(\d\d):(\d\d),(\d\d+)"

if "00:00:03,009 --> 00:00:08,009".match(timeLine, tokens):
   echo "[", tokens[0], "] [", tokens[1], "] [", tokens[2], "] [", tokens[3], "] [", tokens[4], "] [", tokens[5], "] [", tokens[6], "] [", tokens[7], "]"

prints nothing

whereas the corresponding Perl script works as intended:


@tokens = "00:00:03,009 --> 00:00:08,009" =~ /(\d\d):(\d\d):(\d\d),(\d\d+) --> (\d\d):(\d\d):(\d\d),(\d\d+)/;

print "[", $tokens[0], "] [", $tokens[1], "] [", $tokens[2], "] [", $tokens[3], "] [", $tokens[4], "] [", $tokens[5], "] [", $tokens[6], "] [", $tokens[7], "]";

which prints: "[00] [00] [03] [009] [00] [00] [08] [009]"

Araq [2015-01-22T14:13:02+01:00]

proc re*(s: string, flags = {reExtended, reStudy}): Regex

The default is extended regex syntax, in which unescaped whitespace in the pattern is ignored, so whitespace is available to make the regexes more readable. But yes, this should be in big FAT letters in the docs. PRs are welcome, as usual.
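To illustrate the effect of reExtended on the subtitle pattern from the previous post, here is a small sketch. The flags are passed explicitly so that it does not rely on whatever default the installed version of the re module uses; the names `pat`, `escapedPat` and the three result variables are made up for this example.

```nim
import re

let line = "00:00:03,009 --> 00:00:08,009"
const pat = r"(\d\d):(\d\d):(\d\d),(\d\d+) --> (\d\d):(\d\d):(\d\d),(\d\d+)"
var tokens: array[8, string]

# With reExtended, the unescaped spaces around "-->" are dropped from the
# pattern, so the literal spaces in `line` have nothing to match against.
let extendedOk = line.match(re(pat, {reExtended, reStudy}), tokens)

# Without reExtended, a space in the pattern matches a literal space.
let plainOk = line.match(re(pat, {reStudy}), tokens)

# Extended syntax can be kept if every significant space is escaped.
const escapedPat = r"(\d\d):(\d\d):(\d\d),(\d\d+)\ -->\ (\d\d):(\d\d):(\d\d),(\d\d+)"
let escapedOk = line.match(re(escapedPat, {reExtended, reStudy}), tokens)

echo extendedOk, " ", plainOk, " ", escapedOk
```

The same reasoning would explain the first example: with the space before `(.*)` ignored, the last group is free to capture the leading space.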

AxBen [2015-01-22T16:08:21+01:00]

@Araq

Again, thanks for the info.

Where would I file a PR?

def [2015-01-22T16:32:31+01:00]

> Where would I file a PR?

On GitHub: http://github.com/Araq/Nim/pulls

BlaXpirit [2015-01-22T17:11:51+01:00]

Pleeease just use NRE http://forum.nim-lang.org/t/771

AxBen [2015-01-22T18:22:46+01:00]

@BlaXpirit

I wouldn't mind using NRE, but for now I'd like to stick with "The Standard". Maybe RE will be NRE in the future?

BlaXpirit [2015-01-22T19:17:04+01:00]

I sure hope RE is deprecated in favor of NRE. I won't be using RE, that's for sure.

AxBen [2015-01-23T09:03:23+01:00]

@def: Is this really the correct place for CRs? (Haven't seen any user requests there)

axben [2015-03-02T20:23:11+01:00]

As I understand it, Nim currently depends on pcre.dll, which seems to prevent match() from filling or growing a seq[string] automatically. But do we really have to initialize the underlying array ourselves?

import re

var
   tokens: array[2, string]  # seq[string] would be nicer...
   tests = ["-n345", "-n", "--test345"]

for test in tests:
   tokens = ["", ""]  # OMITTING THIS GIVES US "WRONG" RESULTS
   if test.match(re"-([[:alpha:]])\s?(.+)?", tokens):
      echo tokens[0], ", ", tokens[1]
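One plausible reading of the "wrong" results (an assumption, not confirmed in this thread) is that match() only writes the slots of groups that took part in the match, so an optional group that did not participate keeps whatever the previous iteration left there. Under that assumption, the reset can be wrapped in a small helper; `matchFresh` and `optPattern` are hypothetical names and not part of the re module.

```nim
import re

# Hypothetical helper (not part of the re module):
# clear every slot first, then delegate to re.match.
proc matchFresh(s: string, pattern: Regex,
                matches: var openArray[string]): bool =
  for i in low(matches) .. high(matches):
    matches[i] = ""
  result = s.match(pattern, matches)

var tokens: array[2, string]
let optPattern = re"-([[:alpha:]])\s?(.+)?"

for test in ["-n345", "-n", "--test345"]:
  if matchFresh(test, optPattern, tokens):
    echo tokens[0], ", ", tokens[1]
```

Compiling the pattern once outside the loop also avoids rebuilding the regex on every iteration.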

GravityWell [2015-03-06T13:56:47+01:00]

@axben: I found it necessary to initialize first. A snippet of my working code:

var matchesidx = newSeq[tuple[first: int, last: int]](5)
var spos = findBounds(fstr, regpat, matchesidx, 0)
