I have just uploaded my first stab at a Nim module: https://github.com/Henry/Nimbol
Documentation:
http://henry.github.com/Nimbol/doc/nimbol.html
Needs odarrays (based on rtarrays):
http://henry.github.com/Nimbol/doc/odarrays.html
It isn't perfect and needs more testing and examples, but it is a start. Currently it matches strings in a manner similar to PEG, but I plan to create a generic version in which pattern matching may be applied to sequences of any type that provides comparison operators.
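Here is a rough sketch of what the generic direction might look like. It assumes only that the element type provides `==`. A literal-match primitive can then be written once over `openArray[T]` and used on strings and on seqs of other element types. `matchLiteral` is a hypothetical name for illustration and is not part of Nimbol.

```nim
## Hypothetical generic primitive, not Nimbol's API: match a literal
## sequence at a cursor position in any openArray whose elements support `==`.

proc matchLiteral[T](subject, lit: openArray[T], cursor: int): int =
  ## Returns the cursor just past `lit` if it occurs at `cursor`, else -1.
  if cursor < 0 or cursor + lit.len > subject.len:
    return -1
  for i in 0 ..< lit.len:
    if subject[cursor + i] != lit[i]:   # `!=` is derived from T's `==`
      return -1
  cursor + lit.len
```

The same proc accepts a `string` (as `openArray[char]`) or a `seq[int]`, which is the property a generic pattern matcher would need.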
Feedback welcome.
Really cool! I've been toying with some ideas for writing a pattern matching library myself (something more akin to the PEG module) and this looks like it could be a big help (not to mention very useful as a replacement for hand-written parsers). The documentation is well-written and comprehensive, and the source code is very readable.
That being said, why the odarrays dependency? It appears that the odarray type is only used in one place, as part of the matching state stack. I'm also a tad uncomfortable with the use of operator procedures acting on bare strings, but that's probably my irrational side talking.
I quickly knocked up OdArray, based on RtArray (which Araq wrote for similar reasons for re.nim), for the backtracking stack to avoid unnecessary heap allocations for small patterns. Basically it uses an array if the stack is small and only allocates a seq if a larger stack is required. If you would rather not use OdArray, RtArray could be used, but setLen functions would have to be added. Alternatively, if you know the maximum stack size you will need, you could replace OdArray with an array, or, if you are not bothered about the modest overhead of allocating the seq, you could just use a seq. I could put all of these options behind compile-time switches. However, I rather like OdArray and would be happy to see its functionality used in, or added to, RtArray.
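The "array when small, seq when large" idea can be sketched as a stack with a compile-time inline capacity. This variant keeps the first `N` items in a fixed array and spills any further items into a seq. OdArray itself may switch storage differently. `SmallStack`, `push`, `pop` and `usesHeap` are hypothetical names for illustration and are not the odarrays or RtArray API.

```nim
## Hypothetical small-buffer stack, not the odarrays API: the first N items
## live in an inline array; only items beyond N go into a heap-allocated seq.

type
  SmallStack[N: static int, T] = object
    fixed: array[N, T]   # inline storage, no heap allocation
    spill: seq[T]        # heap storage, used only beyond N items
    count: int           # total number of items on the stack

proc push[N: static int, T](s: var SmallStack[N, T], x: T) =
  if s.count < N:
    s.fixed[s.count] = x
  else:
    s.spill.add(x)
  inc s.count

proc pop[N: static int, T](s: var SmallStack[N, T]): T =
  doAssert s.count > 0, "pop from an empty stack"
  dec s.count
  if s.count < N:
    result = s.fixed[s.count]
  else:
    result = s.spill.pop()       # overflow items come off the seq first

proc len[N: static int, T](s: SmallStack[N, T]): int = s.count

proc usesHeap[N: static int, T](s: SmallStack[N, T]): bool =
  ## True while any items live in the spill seq.
  s.spill.len > 0
```

For shallow backtracking the seq is never touched, so small patterns avoid heap allocation entirely. Deep backtracking still works, at the cost of one growing seq.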
I'm also a tad uncomfortable with the use of operator procedures acting on bare strings
Could you clarify? I am happy to consider any improvements.
Note also that pointers are currently used to allow changes to external strings etc., but I am considering adding support for ref pattern data so that GC-allocated strings can be handled seamlessly.
Oh yeah. That should definitely be changed.
@Henry So I'm trying out the library, and have a question. How would I represent the following regex string?
Hello, my name is (.*?) and I am a programmer\.
Goodbye $1
(Note the use of regex groups.)
Adding ref support isn't essential, as the underlying string can be passed to the Assign functions/operators, which hold a pointer to it. This is generally OK: you would not expect the pattern to keep the data alive by holding a ref, because if the data were not also held outside the pattern there would be no way to access what the match produced. However, for consistency, and to avoid memory errors caused by the Assign functions being used incorrectly, I will add ref support.
Note that I am not an expert in using SNOBOL4 patterns, just interested in the flexibility of the method. I would need to spend a little time creating the equivalent of your regex. You may find the tests at the bottom of the nimbol.nim file useful.
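In classic SNOBOL4 this kind of regex is usually written as a literal prefix, then ARB (which matches the shortest possible span first and grows it on backtracking, like `.*?`) with its match assigned to a variable, then a literal suffix. The following plain-Nim sketch hand-codes that idea using only std/strutils and std/options. It does not use Nimbol's pattern objects. `captureName` and `goodbye` are hypothetical helpers, and the sketch assumes the greeting must start at the beginning of the subject.

```nim
import std/[options, strutils]

const
  prefix = "Hello, my name is "
  suffix = " and I am a programmer."   # the regex's `\.` is a literal period

proc captureName(subject: string): Option[string] =
  ## Models `prefix ARB . NAME suffix`: after the literal prefix, let the
  ## unknown middle grow one character at a time (lazily, like `.*?`)
  ## until the literal suffix matches at the current position.
  result = none(string)
  if not subject.startsWith(prefix):
    return
  let start = prefix.len
  for stop in start .. subject.len:
    if subject.continuesWith(suffix, stop):
      return some(subject[start ..< stop])

proc goodbye(subject: string): string =
  ## The "Goodbye $1" replacement; empty string if the pattern fails.
  let name = captureName(subject)
  if name.isSome: "Goodbye " & name.get else: ""
```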
Oh yeah. That should definitely be changed.
Now references to strings (string), cursor locations (Natural) and files (File) are directly supported.
Correct: Setcur retrieves the current cursor position and stores it in its argument. I added ref Natural support today.
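The sketch below illustrates why `ref` targets are convenient. Suppose a pattern element holds a `ref string` or `ref Natural`. Then the GC keeps the target alive for as long as the pattern refers to it, whereas a raw pointer to a caller's variable would dangle once that variable went out of scope. `AssignTarget`, `CursorTarget` and `record` are hypothetical names, not Nimbol's types. They only model the idea of writing match results into caller-owned data.

```nim
## Hypothetical pattern-side targets, not Nimbol's API: results are written
## through GC-managed refs instead of raw pointers.

type
  AssignTarget = object
    dest: ref string    # caller-owned string that receives the matched text
  CursorTarget = object
    dest: ref Natural   # caller-owned slot for a cursor position (like Setcur)

proc record(t: AssignTarget, subject: string, first, last: int) =
  ## Store subject[first ..< last] after a successful match.
  t.dest[] = subject[first ..< last]

proc record(t: CursorTarget, cursor: Natural) =
  ## Store the cursor position reached at this point in the pattern.
  t.dest[] = cursor
```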
To move the cursor you can use the Len, Tab and Rtab functions.
I added another test for Tab to demonstrate that it sets the cursor position relative to the start of the subject string:
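import nimbol
let subject1 = "indiana"
# "indi" leaves the cursor at 4; Tab(6) then matches "an" up to absolute
# position 6, and the final "a" matches the last character.
let p2 = "indi" & Tab(6) & "a"
assert match(subject1, p2)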
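To make the cursor semantics concrete, here is a plain-Nim model of the classic SNOBOL4 primitives LEN, TAB and RTAB. Each one takes the current cursor and returns the new cursor, or -1 on failure. These helper procs are hypothetical and are not how Nimbol implements its pattern objects. The sketch also assumes the example pattern is anchored at position 0.

```nim
import std/strutils

proc lenStep(subject: string, cursor, n: int): int =
  ## LEN(n): match exactly n characters starting at the cursor.
  if n >= 0 and cursor + n <= subject.len: cursor + n else: -1

proc tabStep(subject: string, cursor, n: int): int =
  ## TAB(n): advance to absolute position n, counted from the start of the
  ## subject. Fails if the cursor is already past n or n is beyond the end.
  if cursor <= n and n <= subject.len: n else: -1

proc rtabStep(subject: string, cursor, n: int): int =
  ## RTAB(n): advance to the position n characters before the end.
  let target = subject.len - n
  if n >= 0 and cursor <= target: target else: -1

proc litStep(subject: string, cursor: int, lit: string): int =
  ## Match a literal string at the cursor.
  if cursor >= 0 and subject.continuesWith(lit, cursor): cursor + lit.len else: -1

proc matchesIndiExample(subject: string): bool =
  ## Models "indi" & Tab(6) & "a", anchored at position 0.
  var c = litStep(subject, 0, "indi")   # cursor -> 4
  if c < 0: return false
  c = tabStep(subject, c, 6)            # consumes "an", cursor -> 6
  if c < 0: return false
  litStep(subject, c, "a") >= 0         # final "a" at index 6
```

Because TAB counts from the start of the subject, it fails rather than moving backwards when the cursor is already past its argument.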