I'm working on a Nim port of the Lua pattern matching code. Lua patterns are a bit like regular expressions, but simpler. The implementation is small compared to a regular expression engine, and the port is pure Nim: unlike the re and nre modules, it does not depend on a shared library such as PCRE.
The current code is available at https://github.com/zevv/nimpat/
The basic pattern matching is functional. A single proc, 'match', is implemented and can be used like this:
import strutils  # provides the % format operator used below

proc match(src: string, pat: string): seq[string]

let src = "3 foxes: 0x44111"
let pat = "(%d+) ([^:]+): 0x(%x+)"
let caps = src.match(pat)
echo "$1 $2 $3" % [caps[0], caps[1], caps[2]]
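The pattern "(%d+) ([^:]+): 0x(%x+)" translates to: one or more digits (captured), a space, one or more characters other than a colon (captured), the literal text ": 0x", and one or more hexadecimal digits (captured).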
I am now looking for a friendly API for the library, as getting the returned matches in a seq is a bit cumbersome to use. Ideally I would like to do pattern captures like in Lua allowing direct assignment to variables from a proc and/or iterator:
local a, b, c = src:match(pat)
for a, b, c in src:gmatch(pat) do
  ...
end
In Nim that would be something like this, using tuples.
let (a, b, c) = src.match(pat)
for a, b, c in src.gmatch(pat):
  ...
I have been trying to implement this in Nim, but I cannot get it to work because (quoting Araq) "overloading doesn't look at the return types".
So this does not work:
iterator gmatch(src, pat: string): (string) =
  let c = src.match(pat)
  yield (c[0])

iterator gmatch(src, pat: string): (string, string) =
  let c = src.match(pat)
  yield (c[0], c[1])

iterator gmatch(src, pat: string): (string, string, string) =
  let c = src.match(pat)
  yield (c[0], c[1], c[2])

nimpat.nim(357, 14) Error: ambiguous call; both nimpat.gmatch(src: string, pat: string)[declared in nimpat.nim(348, 9)] and nimpat.gmatch(src: string, pat: string)[declared in nimpat.nim(352, 9)] match for: (string, string)
Araq's next remark was to "write a macro", but my Nim-fu is not up to that yet, I'm afraid.
So, would it be possible to get this to work, and how would I proceed from here?
Thanks,
Ico
Regarding the overloading, why not call these gmatch1, gmatch2 and gmatch3 and call it a day?
You could also unpack your seq using some of the answers proposed here: https://stackoverflow.com/questions/31948131/unpack-multiple-variables-from-sequence
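For what it's worth, the unpacking can also be done with a small macro that declares one variable per capture. This is only a rough sketch: the name `unpack` is made up for this example and is not part of nimpat, and the seq literal stands in for whatever `src.match(pat)` returns.

```nim
import macros

# Hypothetical helper, not part of nimpat: `unpack(caps, a, b, c)` expands to
#   let a = caps[0]
#   let b = caps[1]
#   let c = caps[2]
macro unpack(src: untyped, names: varargs[untyped]): untyped =
  result = newStmtList()
  for i, name in names:
    # Build `let <name> = <src>[i]` for every identifier passed in.
    result.add newLetStmt(name, newTree(nnkBracketExpr, src, newLit(i)))

# Stand-in for the captures that `src.match(pat)` would return.
let caps = @["3", "foxes", "44111"]
unpack(caps, a, b, c)
echo a, " ", b, " ", c
```

Note that the first argument is repeated once per variable in the expansion, so it is better to pass a variable than a call, and indexing fails at run time if the seq holds fewer captures than names.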
Well, if you need gmatch3 then it won't do. But if gmatch1 and gmatch2 are enough, use a C++-like approach: make gmatch a proc that returns a structure with items and pairs iterators. ;)
type MatchWrap = distinct seq[string]

proc gmatch(src, pat: string): MatchWrap =
  let c = src.match(pat)
  MatchWrap(c)

iterator items(wrap: MatchWrap): string =
  yield seq[string](wrap)[0]

iterator pairs(wrap: MatchWrap): (string,string) =
  yield (seq[string](wrap)[0], seq[string](wrap)[1])

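To show how this is meant to be used: a for loop with one variable picks the items iterator and a loop with two variables picks pairs, which is why it stops at two captures. The sketch below repeats the wrapper so it compiles on its own; the `match` proc in it is a stand-in that returns fixed captures, not the real nimpat implementation.

```nim
# Stand-in for nimpat's match: ignores its arguments and returns two fixed captures.
proc match(src, pat: string): seq[string] =
  @["foxes", "1234"]

type MatchWrap = distinct seq[string]

proc gmatch(src, pat: string): MatchWrap =
  MatchWrap(src.match(pat))

iterator items(wrap: MatchWrap): string =
  yield seq[string](wrap)[0]

iterator pairs(wrap: MatchWrap): (string, string) =
  yield (seq[string](wrap)[0], seq[string](wrap)[1])

var one: seq[string] = @[]
var two: seq[string] = @[]

# One loop variable: the compiler calls `items`.
for a in gmatch("3 foxes: 0x1234", "(%a+).*0x(%x+)"):
  one.add a

# Two loop variables: the compiler calls `pairs`.
for a, b in gmatch("3 foxes: 0x1234", "(%a+).*0x(%x+)"):
  two.add a
  two.add b
```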
Yeah, I guess providing multiple functions would work, since in practice patterns are usually limited to a handful of captures.
Thanks,
This is also usable, although it requires the variables to receive the captures to be declared ahead of time:
proc match(src: string, pat: string, c0: var string): bool =
  let caps = src.match(pat)
  if caps.len == 1:
    c0 = caps[0]
    return true

proc match(src: string, pat: string, c0, c1: var string): bool =
  let caps = src.match(pat)
  if caps.len == 2:
    c0 = caps[0]
    c1 = caps[1]
    return true

proc match(src: string, pat: string, c0, c1, c2: var string): bool =
  let caps = src.match(pat)
  if caps.len == 3:
    c0 = caps[0]
    c1 = caps[1]
    c2 = caps[2]
    return true

let src = "3 foxes: 0x1234"
var a, b: string
if src.match("(%a+).*0x(%x+)", a, b):
  echo "a = " & a
  echo "b = " & b
For the gmatch definition, how about making it generic? You could define it like
iterator gmatch[R: seq | array | tuple](src, pat: string): R
Of course you need to match the result based on the type given, not sure how to proceed for that :P
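One possible way to fill the result based on the given type, at least for tuples, is to walk its fields with the built-in `fields` iterator. The sketch below uses a proc instead of an iterator to keep it short; `matchAs` is a made-up name and `match` is a stand-in that returns fixed captures. The tuple type has to be spelled out at the call site, because the generic parameter is not inferred from the left-hand side.

```nim
# Stand-in for nimpat's match: ignores its arguments and returns fixed captures.
proc match(src, pat: string): seq[string] =
  @["3", "foxes", "44111"]

# Hypothetical generic wrapper: copies the captures into the fields of R.
# Every field of R must be a string, otherwise the assignment does not compile.
proc matchAs[R: tuple](src, pat: string): R =
  let caps = src.match(pat)
  var i = 0
  for field in result.fields:
    # Fields without a corresponding capture keep their default value "".
    if i < caps.len:
      field = caps[i]
    inc i

let (a, b, c) = matchAs[(string, string, string)](
  "3 foxes: 0x44111", "(%d+) ([^:]+): 0x(%x+)")
echo a, " ", b, " ", c
```

This still repeats the arity in the type, so it is not quite the Lua experience, but it avoids the gmatch1/gmatch2/gmatch3 naming.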
@zevv, your library doesn't support utf-8. IMO, this is a big disadvantage.
BTW, have you seen nim-regex library?
True, it does not support UTF-8, just as the original Lua patterns do not. I guess having proper support for UTF-8 would probably include supporting Unicode as well and would complicate things a lot. For example, matching any Unicode upper case character with %u would not be trivial.
BTW: Yes, I've seen nim-regex, which is also very nice. The Lua pattern port is mainly for my personal use, because I'm leaving more than 10 years of Lua behind, but I dearly miss the patterns. The advantage is that they are much simpler than regular expressions, which often results in something that I can read and understand a year after I wrote it, which is often not the case with full-fledged PCRE regular expressions.
(Personally, I'd prefer pure Nim regex handling in the Nim stdlib over a PCRE based solution which requires external libs, would nim-regex not be a nice candidate?)
it requires the variables to receive the captures to be declared ahead of time:
This might work (untested):
proc makeDiscardable[T](a: T): T {.discardable.} = a

template match(src: string, pat: string, c0, c1: untyped): bool =
  when compiles(c0):
    c0 = ""
  else:
    var c0 = ""
  when compiles(c1):
    c1 = ""
  else:
    var c1 = ""
  let caps = src.match(pat)
  if caps.len == 2:
    c0 = caps[0]
    c1 = caps[1]
    makeDiscardable(true)
  else:
    makeDiscardable(false)

let src = "3 foxes: 0x1234"
if test.match(src, "(%a+).*0x(%x+)", a, b):
  echo "a = " & a
  echo "b = " & b
untyped parameters only work with templates (and macros), but templates won't return discardable, hence the makeDiscardable proc as a workaround. compiles() checks whether the variable was already declared; if not, the template declares it.