nimforum mirror - What would be the best way to get parsing and validation of parameters (option:value)

void09 [2023-01-11T03:02:48+01:00]

I need to parse and validate user-supplied parameters for video encoders, in the form "--option1:value --option2 --option3:value". The set of options is different for each encoder we have defined. Option names must come from a predefined set, while values must fall within a certain range for integers, match one of a fixed set of values for strings, or be a valid file path.

https://nim-lang.org/docs/parseopt.html only does the parsing, which is not that hard and not that useful to me without validation, as I'd need to extract the data from its object after parsing anyway. The other command-line libraries in nimble that I looked at all offer parsing, but no validation.
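For reference, the parsing half really is small. The following is a minimal hand-rolled sketch, not parseopt: the proc name `parseParams` is made up for illustration, and it assumes options are separated by whitespace, so values containing spaces are not supported.

```nim
import std/[strutils, tables]

proc parseParams(line: string): Table[string, string] =
  ## Splits "--name:value --flag" into a name -> value table.
  ## Options given without a value map to the empty string.
  for token in line.splitWhitespace():
    if not token.startsWith("--"):
      raise newException(ValueError, "expected --option, got: " & token)
    # Split on the first ':' only, so values may themselves contain ':'.
    let parts = token[2 .. ^1].split(':', maxsplit = 1)
    let name = parts[0]
    if name.len == 0:
      raise newException(ValueError, "missing option name in: " & token)
    if name in result:
      raise newException(ValueError, "duplicate option: " & name)
    result[name] = if parts.len == 2: parts[1] else: ""

let params = parseParams("--option1:value --option2 --option3:7")
echo params.len
```

The resulting table is only raw strings; every check on names and values still has to happen afterwards, which is the actual question of this thread.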

https://github.com/captainbland/nim-validation - this is kind of nice, but only does validation on object fields, and I am not sure I want an object with tens or 100 fields. I would have to look into how to handle iterating over fields. Worth keeping in mind.

https://github.com/sealmove/binarylang - with the help of sealmove, I do have a working implementation using this. The downside is that binarylang is meant for binary streams, not string parsing, and its author believes this is not a proper context to use it in.

I am curious how you would implement this without using 100-line-long case statements for each option.

A bonus feature would be the ability to "inherit" these value constraints from an encoder type and extend them, like with classes, so that I don't have to repeat 100 lines for each slightly different version or a fork that changed an option and so on.

Another nice thing to have would be runtime loading of these values, so they don't need to be known at compile time, but that is not strictly necessary.

Araq [2023-01-11T08:49:18+01:00]

The default design is always procedural programming:

import std / [strformat, strutils, tables]

proc error(msg: string) = echo msg

proc wantNumberBetween(field, s: string; a, b: int) =
  try:
    let x = parseInt(s)
    if x < a or x > b:
      error(&"{field} of value {s} must be in range {a}..{b}")
  except ValueError:
    error(&"{field} takes a number")

template intField(name: string; a, b: int) {.dirty.} =
  if name in fields:
    wantNumberBetween name, fields[name], a, b
    del fields, name # mark as processed

proc validateDecoderA(fields: var Table[string, string]) =
  intField "abc", 1, 3
  intField "xyz", 5, 6

proc validateDecoderB(fields: var Table[string, string]) =
  # "inheritance" is simply done with a function call
  validateDecoderA(fields)
  intField "B specific field", 12, 16

proc noRemainingFields(fields: Table[string, string]) =
  for k, v in pairs(fields):
    error &"unknown field: {k} of value {v}"

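proc validate(decoder: string; fields: Table[string, string]) =
  var fullCopy = fields
  case decoder
  of "A": validateDecoderA fullCopy
  of "B": validateDecoderB fullCopy
  else:
    # without this branch every field of an unknown decoder is reported as unknown
    error &"unknown decoder: {decoder}"
    return
  noRemainingFields fullCopy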

validate "A", {"abc": "34", "unknown": "abc"}.toTable

That's the entire logic but you will have "hundreds" of entries like intField "abc", 1, 2. But any solution requires these and by using Nim code you can compress the descriptions in ways that are usually outside the realm of custom text formats that you interpret at runtime.
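The code above only covers integer ranges. A hedged sketch of the other two value kinds from the question (a fixed set of strings, and an existing file path) could look like the following. It uses plain procs taking the table explicitly instead of dirty templates, collects errors in a seq instead of echoing them, and the option names "preset" and "stats" are made up.

```nim
import std/[os, strutils, tables]

var errors: seq[string] # collected rather than echoed, so callers can inspect them

proc error(msg: string) = errors.add msg

proc strField(fields: var Table[string, string]; name: string;
              allowed: openArray[string]) =
  ## The value must be one of `allowed`.
  if name in fields:
    if fields[name] notin allowed:
      error(name & " of value " & fields[name] &
            " must be one of: " & allowed.join(", "))
    fields.del(name) # mark as processed

proc pathField(fields: var Table[string, string]; name: string) =
  ## The value must name an existing file.
  if name in fields:
    if not fileExists(fields[name]):
      error(name & ": no such file: " & fields[name])
    fields.del(name) # mark as processed

proc validateEncoderX(fields: var Table[string, string]) =
  # hypothetical encoder with two hypothetical options
  fields.strField("preset", ["fast", "medium", "slow"])
  fields.pathField("stats")
```

Whatever is left in the table after `validateEncoderX` returns is an unknown option, exactly as with `noRemainingFields` above.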

xigoi [2023-01-11T09:03:13+01:00]

I'm not sure what exactly you're asking, but maybe cligen would work?

void09 [2023-01-11T18:30:50+01:00]

Hm no, in this case I simply want to ensure that the parameters I get, from any external source, are valid in their context. They won't be used as parameters of a Nim program, but to eventually execute an external binary. So cligen is not of any help.

Araq's way, "default design procedural programming", is not what I had in mind. I believe that if Nim has the features needed to abstract these concepts into a more declarative syntax, then they should be used. That is much more readable and easier to reason about. After all, this is a rather general use case that I can see being needed (to validate CLI params of a Nim program, or of an external program you want to launch), so I am surprised I couldn't find something readily available for the task.

Still, expanding on Araq's suggestion, using templates and macros, I believe a small DSL could be built to define such constructs. Something like:


enumType fields:
  "param1" {validExpression} defaultValue
  "param2" {validExpression} defaultValue

and some associated procs that would generate a Table of the key-value pairs (if valid) from the input string, and maybe the associated enum types for them, if needed or designed like that. It could also have an "inherit" keyword, so as to copy the list of statements from another previously defined structure and expand or modify it.
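A declarative description does not necessarily need a macro DSL. As a rough sketch under that assumption, the constraints can be plain data: a table from option name to a rule, with "inherit" implemented as copy-and-override. All names here (`Rule`, `Spec`, `extend`, and the options "crf", "preset", "stats") are hypothetical, and default values are left out.

```nim
import std/[os, strutils, tables]

type
  RuleKind = enum rkInt, rkChoice, rkPath
  Rule = object
    case kind: RuleKind
    of rkInt:
      lo, hi: int
    of rkChoice:
      choices: seq[string]
    of rkPath:
      discard
  Spec = Table[string, Rule]

proc intRule(lo, hi: int): Rule = Rule(kind: rkInt, lo: lo, hi: hi)
proc choiceRule(choices: varargs[string]): Rule =
  Rule(kind: rkChoice, choices: @choices)
proc pathRule(): Rule = Rule(kind: rkPath)

proc extend(base: Spec; overrides: openArray[(string, Rule)]): Spec =
  ## "Inheritance": copy the base spec, then add or replace entries.
  result = base
  for pair in overrides: result[pair[0]] = pair[1]

proc validate(spec: Spec; fields: Table[string, string]): seq[string] =
  ## Returns the error messages; an empty seq means the fields are valid.
  for name, value in fields:
    if name notin spec:
      result.add "unknown option: " & name
      continue
    let rule = spec[name]
    case rule.kind
    of rkInt:
      try:
        let n = parseInt(value)
        if n < rule.lo or n > rule.hi:
          result.add name & " must be in range " & $rule.lo & ".." & $rule.hi
      except ValueError:
        result.add name & " takes a number"
    of rkChoice:
      if value notin rule.choices:
        result.add name & " must be one of: " & rule.choices.join(", ")
    of rkPath:
      if not fileExists(value):
        result.add name & ": no such file: " & value

let encoderA: Spec = {"crf": intRule(0, 51),
                      "preset": choiceRule("fast", "slow")}.toTable
let encoderB = encoderA.extend({"crf": intRule(0, 63), "stats": pathRule()})
```

Because a `Spec` is an ordinary runtime value, it could also be filled from a file at startup instead of being written in source, which would cover the runtime-loading wish.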

Araq [2023-01-11T21:13:31+01:00]

Just use my solution and get on with more interesting things in life. I know how to design and use DSLs and gave you a good solution for your problem under the assumption that you would not only be the user of the DSL, but also its implementer.

jackhftang [2023-01-12T14:18:02+01:00]

"I am curious how you would implement this without using 100-line-long case statements for each option."

I would say the keyword is parser combinator (PC). You can write a customized parser for each option and then combine them with combinators. You can achieve all 3 bonus features together this way. The resulting PC code is usually dense, yet easy to read and maintain.

All you need is to grasp the concept of PC. It is not hard, but not short enough to explain here. There are many tutorials on PC on the Internet; even though they are written for other languages, the concept and usage are pretty much transferable.

I found honeycomb, whose interface looks okay, though I have to admit that I have never used it =] One possible issue you may encounter is that you have to write a custom chain+map for parsers of different generic types for each arity, e.g. chainMap3<T1,T2,T3,T>(p1: Parser<T1>, p2: Parser<T2>, p3: Parser<T3>, f: proc(t1:T1, t2:T2, t3:T3): T): Parser<T>, or a macro covering all generics and arities.
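To give a flavour of the concept, here is a deliberately tiny parser combinator sketch in plain Nim. It is not honeycomb's API; every name in it (`ParseResult`, `lit`, `number`, `keepRight`, `orElse`, `validate`) is invented for this illustration, and it reports no error messages at all.

```nim
import std/strutils

type
  ParseResult[T] = object
    ok: bool
    value: T
    rest: string # unconsumed input
  # A parser is just a function from input to a result.
  Parser[T] = proc(input: string): ParseResult[T]

proc success[T](value: T; rest: string): ParseResult[T] =
  ParseResult[T](ok: true, value: value, rest: rest)

proc failure[T](): ParseResult[T] =
  ParseResult[T](ok: false)

proc lit(s: string): Parser[string] =
  ## Matches the exact string `s`.
  result = proc(input: string): ParseResult[string] =
    if input.startsWith(s): success(s, input[s.len .. ^1])
    else: failure[string]()

proc number(): Parser[int] =
  ## Matches one or more decimal digits (no overflow handling).
  result = proc(input: string): ParseResult[int] =
    var i = 0
    while i < input.len and input[i] in Digits: inc i
    if i == 0: failure[int]()
    else: success(parseInt(input[0 ..< i]), input[i .. ^1])

proc keepRight[A, B](a: Parser[A]; b: Parser[B]): Parser[B] =
  ## Sequence: run `a`, then `b` on what is left; keep the value of `b`.
  result = proc(input: string): ParseResult[B] =
    let ra = a(input)
    if ra.ok: b(ra.rest) else: failure[B]()

proc orElse[T](a, b: Parser[T]): Parser[T] =
  ## Choice: try `a`; if it fails, try `b` on the same input.
  result = proc(input: string): ParseResult[T] =
    let ra = a(input)
    if ra.ok: ra else: b(input)

proc validate[T](p: Parser[T]; pred: proc(v: T): bool): Parser[T] =
  ## Succeeds only if `p` succeeds and its value satisfies `pred`.
  result = proc(input: string): ParseResult[T] =
    let r = p(input)
    if r.ok and pred(r.value): r else: failure[T]()

# Two option parsers built purely by combining the pieces above.
let fpsOpt = lit("--fps:").keepRight(number()).validate(
  proc(n: int): bool = n in (30 .. 180))
let fmtOpt = lit("--format:").keepRight(orElse(lit("mp4"), lit("ogv")))
```

Calling `fpsOpt("--fps:60")` yields a result with `ok` set and `value` equal to 60, while `fpsOpt("--fps:10")` fails the range check. Each option becomes one line of composition, which is where the density of PC code comes from.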

jackhftang [2023-01-13T14:30:20+01:00]

I realized that the implementation I had in mind may not be what you would end up with.

Since I have not coded in Nim for some time, I decided to do it as an exercise... but it took longer than I thought T^T...

Anyway, the following program covers:

string option with choices

int option range validation

multi-string option with choices (not demanded)

composition of options (not exactly inheritance)

runtime parsing

default value for form "--option"

reading the error messages takes a bit of skill (more on this below)

First, have a taste of parser combinators. It looks like a lot of code, but much of it is there only because honeycomb is missing the combinators I want.

import honeycomb
import std/json
import std/sets
import std/sugar
import std/sequtils
import std/strutils
import std/strformat

# -------------------------------------------------------------
# general combinators

proc succeedWith[T](x: T): Parser[T] =
  # common combinator missing in honeycomb?
  # nop[void]().result(x) ?
  createParser(T): succeed(input, x, input)

proc failWith[T](msg: openArray[string]): Parser[T] =
  # common combinator missing in honeycomb?
  let expected = @msg
  createParser(T): fail(input, expected, input)

proc choice[T](ps: openArray[Parser[T]], desc: string = "no options"): Parser[T] =
  # wrap oneOf(varargs) into choices(openArray)
  if len(ps) == 0: return failWith[T]([desc])
  result = ps[0] # start from the first parser; the loop below adds the rest
  for i in 1 .. ps.high: result = result | ps[i]

proc branch[T1, T2, T](
  ps: openArray[(Parser[T1], Parser[T2])],
  f: proc(t1: T1, t2: T2): T): Parser[T] =
  runnableExamples:
    # use like if(...) ... elseif(...) ... elseif(...) ...,
    # if the first parser match, go to the second parser and do not backtrack other branches.
    let oct = c('0'..'7')
    let bin = c('0'..'1')
    let p = branch([
      (s("0b"), bin.atLeast(1)),
      (s("0"), oct.atLeast(1)),
      (s(""), digit.atLeast(1)),
    ], proc(base: string, ds: seq[char]): int =
      let b = case base:
        of "0b": 2
        of "0": 8
        else: 10
      for d in ds: result = result * b + ord(d) - ord('0')
    )
    let r = p.parse("0bFFFF")
    assert r.kind == failure
  
  let copy = @ps
  createParser(T):
    var expects: seq[string]
    for (cond, body) in copy:
      let r1 = cond.parse(input)
      case r1.kind:
      of failure:
        expects.add r1.expected
      of success:
        let r2 = body.parse(r1.tail)
        case r2.kind:
        of failure: return fail(input, r2.expected, input)
        of success: return succeed(input, f(r1.value, r2.value), r2.tail)
    fail(input, expects, input)

proc sepBy[T, T2](p1: Parser[T], p2: Parser[T2]): Parser[seq[T]] =
  ## [<p1> (<p2> <p1>)*]
  
  runnableExamples:
    let p = wordParser().sepBy(c(','))
    let r = p.parse("abc,xyz")
    assert r.kind == success
    assert r.value == @["abc", "xyz"]
    assert r.tail == ""
  
  runnableExamples:
    # this will fail because of the trailing space
    let p = wordParser().sepBy(c(' '))
    let r = p.parse("abc xyz ")
    assert r.kind == failure
  
  createParser(seq[T]):
    var res: seq[T]
    var r1: ParseResult[T]
    var r2: ParseResult[T2]
    
    r1 = p1.parse(input)
    case r1.kind:
    of failure:
      return succeed(input, res, input)
    of success:
      res.add r1.value
      
      # alternatively parse p2, p1, p2, p1...
      while true:
        r2 = p2.parse(r1.tail)
        case r2.kind:
        of failure:
          return succeed(input, res, r1.tail)
        of success:
          r1 = p1.parse(r2.tail)
          case r1.kind:
          of failure:
            # a success of p2 follow by fail of p1 result in whole fail
            # return fail(input, r1.expected, input)
            var expects = @[fmt"parse failure near `{r2.tail}`"]
            expects.add r1.expected
            return fail(input, expects, input)
          of success:
            res.add r1.value

proc wordParser(): Parser[string] =
  runnableExamples:
    let r = wordParser().parse("abc xyz")
    assert r.kind == success
    assert r.value == "abc"
    assert r.tail == " xyz"
  alphanumeric.atLeast(1).map(cs => cs.join(""))

proc intParser(): Parser[int] =
  # todo: this accept leading zeros e.g. 007, which is not a strictly correct grammar of integer
  digit.atLeast(1).map(cs => parseInt(cs.join("")))

# -------------------------------------------------------------
# application specific patterns

type OptionParser = Parser[JsonNode]

let dash = s("--")
let wsp = regex(r"\s*")
let sep = c(':') | c('=') # for fun

proc strOpt(name: string, opts: openArray[string]): OptionParser =
  ## match --<name>:<option>
  let copy = opts.toHashSet
  let option = wordParser().validate(w => w in copy,
      fmt"invalid option for {name}")
  let full = dash >> s(name) >> sep >> option
  full.map(s => %*{name: s})

proc strOpt(name: string, default: string, opts: openArray[string]): OptionParser =
  ## match --<name>
  ## match --<name>:<option>
  let copy = opts.toHashSet
  let option = wordParser().validate(w => w in copy, fmt"invalid option for {name}")
  let full = dash >> s(name) >> branch([
    (sep, option),
    # last case always success with default value
    (nop[char](), succeedWith(default))
  ], (_, s) => s)
  full.map(s => %*{name: s})

proc mltOpt(name: string, opts: openArray[string]): OptionParser =
  ## match --<name>:<opt1>
  ## match --<name>:<opt1>,<opt2>
  let copy = opts.toHashSet
  let option = wordParser().validate(w => w in copy, fmt"invalid option for {name}")
  let comma = c(',')
  let full = dash >> s(name) >> sep >> option.sepBy(comma)
  full.map(s => %*{name: s})

proc intOpt(name: string, rng: HSlice[int, int]): OptionParser =
  ## match --<name>:<num>
  let full = dash >> s(name) >> sep >> intParser().validate(n => n in rng, fmt"expect {name} to be in range {rng}")
  full.map(n => %*{name: n})

proc flgOpt(name: string, default: bool): OptionParser =
  ## match --<name>
  ## match --<name>:false
  ## match --<name>:true
  ## match --<name>:0
  ## match --<name>:1
  let boolParser = oneOf(
    s("true").result(true),
    s("1").result(true),
    s("false").result(false),
    s("0").result(false)
  ).desc("expect flag in one of the following form: 0, 1, true, false")
  let full = dash >> s(name) >> branch([
    (sep, boolParser),
    # last case always success with default value
    (nop[char](), succeedWith(default))
  ], (_, b) => b)
  full.map(b => %*{name: b})

proc mergeJson(js: seq[JsonNode]): JsonNode =
  result = newJObject()
  for j in js:
    for k, v in j:
      if k in result and v.kind == JArray:
        # append the elements, not the array itself, to avoid nested arrays
        for item in v: result[k].add item
      else:
        result[k] = v

proc optionLineParser(opts: openArray[OptionParser]): OptionParser =
  ## <wsp> [ <opt> ( <wsp1> (<opt> | <eol>) )* ]
  
  let opt = choice(opts)
  let eol = eof.result(newJObject())
  
  proc check(js: seq[JsonNode]): bool =
    result = true
    var ks: HashSet[string]
    for j in js:
      for k,v in j:
        if k in ks and v.kind != JArray:
          return false
        ks.incl k
  
  wsp >> (opt|eol).sepBy(whitespace).validate(check, "duplicated option").map(mergeJson) << eof

proc mergeOpt(opts: varargs[seq[OptionParser]]): OptionParser =
  var lis: seq[OptionParser]
  for opt in opts: lis.add opt
  optionLineParser(lis)

# -------------------------------------------------------------
# application

let commonOpt = @[
  strOpt("command", ["copy", "edit"]),
  strOpt("format", default="ogv", ["mp4", "ogv", "avi"]),
  flgOpt("verbose", false),
  intOpt("fps", 30..180),
  mltOpt("feature", ["aa", "bb", "cc"]),
]

let encoder1SpecificOpt = @[
  strOpt("option1", ["a1", "a2"]),
  intOpt("quality", 1..3),
]

let encoder2SpecificOpt = @[
  strOpt("option2", ["b1", "b2"]),
  intOpt("quality", 1..10),
]

let encoder1OptionParser = wsp >> mergeOpt(commonOpt, encoder1SpecificOpt) << eof
let encoder2OptionParser = wsp >> mergeOpt(commonOpt, encoder2SpecificOpt) << eof

echo encoder1OptionParser.parse("")
echo encoder1OptionParser.parse(" ")
echo encoder1OptionParser.parse("--format")
echo encoder1OptionParser.parse("--format:mp4")
echo encoder1OptionParser.parse("--format=mp4")
echo encoder1OptionParser.parse("--format:mp4 --quality:1 --feature=aa,cc")
echo encoder2OptionParser.parse("--format:mp4 --quality:1 --feature=aa,cc --verbose")
echo encoder1OptionParser.parse("--format:mp4 --quality:1 --feature=aa,cc --verbose:1")
echo encoder1OptionParser.parse("--format:mp4 --quality:10") # quality out of range for encoder1
echo encoder2OptionParser.parse("--format:mp4 --quality:10") # quality within range for encoder2
echo encoder1OptionParser.parse("--format:mp4 --quality:1 --format:ogv") # duplicated option
echo encoder2OptionParser.parse("--format:mp4 --quality:1 --option2=a1") # invalid option for encoder2
echo encoder2OptionParser.parse("--format:mp4 --quality:1 --option2=b1") # valid option for encoder2

#[
(kind: success, value: {}, tail: "", fromInput: "")
(kind: success, value: {}, tail: "", fromInput: " ")
(kind: success, value: {"format":"ogv"}, tail: "", fromInput: "--format")
(kind: success, value: {"format":"mp4"}, tail: "", fromInput: "--format:mp4")
(kind: success, value: {"format":"mp4"}, tail: "", fromInput: "--format=mp4")
(kind: success, value: {"format":"mp4","quality":1,"feature":["aa","cc"]}, tail: "", fromInput: "--format:mp4 --quality:1 --feature=aa,cc")
(kind: success, value: {"format":"mp4","quality":1,"feature":["aa","cc"],"verbose":false}, tail: "", fromInput: "--format:mp4 --quality:1 --feature=aa,cc --verbose")
(kind: success, value: {"format":"mp4","quality":1,"feature":["aa","cc"],"verbose":true}, tail: "", fromInput: "--format:mp4 --quality:1 --feature=aa,cc --verbose:1")
(kind: failure, expected: @["parse failure near `--quality:10`", "\'format\'", "\'format\'", "\'verbose\'", "\'fps\'", "\'feature\'", "\'option1\'", "expect quality to be in range 1 .. 3", "EOF"], tail: "--format:mp4 --quality:10", fromInput: "--format:mp4 --quality:10")
(kind: success, value: {"format":"mp4","quality":10}, tail: "", fromInput: "--format:mp4 --quality:10")
(kind: failure, expected: @["duplicated option"], tail: "--format:mp4 --quality:1 --format:ogv", fromInput: "--format:mp4 --quality:1 --format:ogv")
(kind: failure, expected: @["parse failure near `--option2=a1`", "\'format\'", "\'format\'", "\'verbose\'", "\'fps\'", "\'feature\'", "invalid option for option2", "\'quality\'", "EOF"], tail: "--format:mp4 --quality:1 --option2=a1", fromInput: "--format:mp4 --quality:1 --option2=a1")
(kind: success, value: {"format":"mp4","quality":1,"option2":"b1"}, tail: "", fromInput: "--format:mp4 --quality:1 --option2=b1")
]#

(A bit off topic) After using honeycomb, my impression is that the library has some fundamental problems. First, it defines type Parser[T] = proc(input: string): ParseResult[T]. The input string drops all contextual information such as the line position, so I cannot generate more contextual error messages from the input. It also forces direct manipulation of strings, which makes it easy to accidentally create unnecessary strings. Second, it forces the user to eagerly generate error strings for failures. Backtracking is very common, so many error strings are created and then thrown away. A more decent implementation would look like this:

(Sorry, this is in TypeScript; I copied it from elsewhere.)


export interface Parser<T> {
  run: (ctx: ParseContext) => ParseResult<T>;
}
export class ParseContext {
  ix = 0;
  line = 0;
  column = 0;
  constructor(public tokens: string) {}
}
export interface ParseResult<T> {
  ctx: ParseContext;
  ok: boolean;
  value?: T;
  // called only when an error message is actually needed
  genErrorTree?: () => ParseErrorTree;
}
export interface ParseErrorTree {
  name: string;
  error: string;
  children: ParseErrorTree[];
}
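For readers who prefer Nim, the same two ideas (a context that tracks position instead of slicing the string, and error messages that are built only on demand) might be sketched as follows. This is an illustration under my own assumptions, not an existing library: `Source`, `advance`, `genError` and `lit` are made-up names, and the error is a plain string rather than a tree.

```nim
import std/strutils

type
  Source = ref object
    text: string
  ParseContext = object
    src: Source # shared by reference, so moving forward never copies the text
    ix, line, column: int
  ParseResult[T] = object
    ctx: ParseContext
    ok: bool
    value: T
    genError: proc(): string # nil on success; called only if a message is wanted
  Parser[T] = proc(ctx: ParseContext): ParseResult[T]

proc advance(ctx: ParseContext; n: int): ParseContext =
  ## Moves `n` characters forward, keeping line and column up to date.
  result = ctx
  for _ in 0 ..< n:
    if result.ix >= result.src.text.len: break
    if result.src.text[result.ix] == '\n':
      inc result.line
      result.column = 0
    else:
      inc result.column
    inc result.ix

proc lit(s: string): Parser[string] =
  ## Matches the exact string `s` at the current position.
  result = proc(ctx: ParseContext): ParseResult[string] =
    if ctx.src.text.continuesWith(s, ctx.ix):
      return ParseResult[string](ctx: ctx.advance(s.len), ok: true, value: s)
    # On failure only a closure is stored; the string is built if it is called.
    let gen = proc(): string =
      "line " & $(ctx.line + 1) & ", column " & $(ctx.column + 1) &
        ": expected '" & s & "'"
    return ParseResult[string](ctx: ctx, ok: false, genError: gen)
```

A combinator that backtracks can simply drop a failed result without ever calling `genError`, so no message is formatted for branches that are discarded.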

Though parser combinators are generally slower than hand-written procedural parsers, the gap can be narrowed, given more effort, by writing the low-level parsers by hand. A properly implemented parser is just a static graph of closures that can be called many times without changes. A just-in-time compiler likes to inline such closures (not directly relevant to Nim unless you compile to JS, or to LLVM and run on an LLVM JIT), which brings the performance closer to the procedural version. Given the flexibility of PC, IMO, it is worth the penalty.

In closing, if you do need very strict validation, you will need the full power of a parser. It seems everyone has their own flavour of parsing. Anyway, honeycomb, IMO, is not 'real world' enough. I also checked another library, combparser, which has similar problems. The above code works, more or less, but needs more polishing to fit your case. If I were you, I would roll yet another parser combinator library. I am not trying to persuade you to go down this path; it is simply that I know how I would do it ideally (ideal in my mind). Anyway, I guess I should stop here. Good luck with your projects.
