Ward [2022-12-14T03:42:34+01:00]

TinyRE is a Nim wrapper for a tiny regex engine: the compiled binary is under 10 KB and the source is under 1K lines of code. It guarantees that matching time scales linearly (O(n)) with the length of the input string.

Features:

- Unicode support.
- Case-insensitive matching.
- Global matching.
- Most common regex syntax, including:
  - Greedy and non-greedy quantifiers: *, +, ?, *?, +?, ??
  - Character sets: [xyz], [^xyz]
  - Metacharacters: \s, \S, \w, \W, \d, \D, \n, \r, \t, etc.
  - ASCII or Unicode code points: \x00, \u0000, \U00000000
  - Beginning and end assertions: ^, $
  - Repetition operators: {n}, {n,m}, {n,}
  - Groups and non-capturing groups: (...), (?:...)
  - Start-of-word and end-of-word assertions: \<, \>

Examples:

import tinyre

doAssert match("abc123", re"\d+") == @["123"]
doAssert bounds("abc123", re"\d+") == @[3..5]
doAssert contains("abc123", re"\d+") == true
doAssert startsWith("abc123", re"[a-z]+") == true
doAssert endsWith("abc123", re"\d+") == true
doAssert split("abc123", re"\d+") == @["abc", ""]
doAssert replacef("abc123", re"([a-z]+)(\d+)", "$2$1") == "123abc"

# reG for global matching
doAssert match("abc123", reG".") == @["a", "b", "c", "1", "2", "3"]

# reI for case insensitive matching
doAssert match("abc123", reI"ABC") == @["abc"]

# reU for utf8 matching
doAssert match("中文", reU"..") == @["中文"]

slangmgh [2022-12-14T07:17:39+01:00]

I like it, thank you.

Araq [2022-12-14T08:20:59+01:00]

The standard re/nre modules should link to these awesome alternatives. There is also nim-regex.

Yardanico [2022-12-14T08:24:43+01:00]

Nice wrapper! Are you planning to rewrite the underlying C code in Nim in the future? That way it could, e.g., work at compile time or with the JS backend.

Ward [2022-12-14T11:34:33+01:00]

I added benchmark results on GitHub.

# small string: "abc123def".contains("\d+")
# large string: 6.71 MB text file
#   email: [\w\.+-]+@[\w\.-]+\.[\w\.-]+
#   uri: [\w]+://[^/\s?#]+[^\s?#]+(?:\?[^\s#]*)?(?:#[^\s]*)?
#   ipv4: (?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9])\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9])
# compile options: -d:release -d:danger --opt:speed -d:lto

name ............................... min time      avg time    std dv   runs
tinyre (small string) .............. 0.366 ms      0.379 ms    ±0.018  x1000
std/re (small string) .............. 5.862 ms      6.218 ms    ±0.171   x797
nim-regex (small string) .......... 16.132 ms     17.067 ms    ±0.580   x288
tinyre (large string, email) ..... 140.684 ms    151.663 ms    ±8.625    x33
std/re (large string, email) ...... 44.793 ms     48.884 ms    ±2.716   x102
nim-regex (large string, email) .... 3.680 ms      3.921 ms    ±0.132  x1000
tinyre (large string, uri) ....... 127.465 ms    131.721 ms    ±2.110    x38
std/re (large string, uri) ........ 40.380 ms     42.812 ms    ±1.175   x117
nim-regex (large string, uri) ..... 21.400 ms     22.205 ms    ±0.344   x225
tinyre (large string, ipv4) ...... 182.995 ms    186.441 ms    ±1.057    x27
std/re (large string, ipv4) ........ 4.854 ms      5.965 ms    ±0.903   x838
nim-regex (large string, ipv4) ..... 7.569 ms      7.849 ms    ±0.159   x635
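For readers who want to try a comparison locally, here is a minimal timing sketch. It is not the harness that produced the numbers above: it assumes Nim's `std/re` (PCRE-based, so the PCRE shared library must be installed), reuses the email pattern from the benchmark, and runs on a short hypothetical sample string instead of the 6.71 MB file. Compiling with the same flags (e.g. `-d:release -d:danger --opt:speed`) keeps the settings comparable; testing another engine means swapping out the `findAll` loop.

```nim
import std/[re, monotimes, times]

# Hypothetical sample input; the benchmark above used a 6.71 MB text file.
let text = "contact: alice@example.com, bob.smith@mail.example.org; no email here"
let email = re"[\w\.+-]+@[\w\.-]+\.[\w\.-]+"

const runs = 1000
var count = 0
let t0 = getMonoTime()
for i in 1..runs:
  count = 0
  # Iterate over every non-overlapping match and count them.
  for m in text.findAll(email):
    inc count
let elapsed = getMonoTime() - t0

echo "matches: ", count, ", avg: ", elapsed.inMicroseconds.float / runs.float, " us"
```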

Porting to pure Nim is possible, but I'm not sure it would still be "tiny". https://github.com/nitely/nim-regex is already a pure Nim regex engine and works well at compile time. For the JS backend, using https://nim-lang.org/docs/jsre.html seems more reasonable.

DeletedUser [2023-03-15T17:19:13+01:00]

I just started with Nim and I'm trying to do the things I did in Python. Can I get capture groups with TinyRE? For example:

import tinyre

let ppl = "olle=7, pelle=12, lisa=21, ringhals=42"
let myMatches = match(ppl, reG"(\w+)=(\d+)")

echo myMatches.groups   # hypothetical: the kind of API I'm looking for

# desired output:
# @[["olle", 7], ["pelle", 12], ["lisa", 21], ["ringhals", 42]]
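The thread does not show whether TinyRE exposes capture groups, so the following is only a sketch of one way to collect them with Nim's standard `std/re` module (a PCRE wrapper, so the PCRE shared library must be installed), not TinyRE's API. It repeatedly calls `findBounds`, which fills the `captures` array with the groups of the next match and returns that match's bounds, then resumes searching just after the match. The captured values are strings; `parseInt` from `std/strutils` converts them if integers are needed.

```nim
import std/re

let ppl = "olle=7, pelle=12, lisa=21, ringhals=42"
let pattern = re"(\w+)=(\d+)"

# Each entry holds the two capture groups of one match: [name, number].
var groups: seq[array[2, string]]
var captures: array[2, string]
var start = 0
while true:
  # findBounds returns (-1, 0) when there is no further match.
  let (first, last) = ppl.findBounds(pattern, captures, start)
  if first < 0:
    break
  groups.add captures   # arrays are value types, so this stores a copy
  start = last + 1      # continue scanning after the end of this match

echo groups
# @[["olle", "7"], ["pelle", "12"], ["lisa", "21"], ["ringhals", "42"]]
```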

Mirror of forum.nim-lang.org

9723 :: TinyRE - Tiny Regex Engine for Nim
