nimforum mirror - Proposed table changes

Prestige (orginal) [2021-06-12T22:28:05+02:00] view original

I'd like to gather opinions on changes I want to propose for the tables module.

Please leave any thoughts or suggestions you may have.

getOrDefault

Example:

import tables

let table = {1: "one", 2: "two"}.toTable

proc createAString(s: string): string =
  echo "some expensive operation: " & s
  return s

# createAString doesn't need to be invoked here.
let someVal = table.getOrDefault(1, createAString("ONE!"))
echo someVal

# This works just fine using a template
template foo(a, b: untyped): untyped =
  if true: a else: b

# Only the first createAString is invoked.
echo foo(createAString("a"), createAString("b"))

table.getOrDefault(1, createAString("ONE!")) will ALWAYS invoke createAString, even if not needed. It would be useful to have this only execute when necessary.

get


proc `[]`[A, B](t: Table[A, B]; key: A): B

Retrieves the value at t[key].

If key is not in t, the KeyError exception is raised.

It would be useful to have a get proc that returned an Option instead of throwing a KeyError.

I'm aware I could check if the key exists with hasKey, but that requires two lookups in the table.

Stefan_Salewski (orginal) [2021-06-13T06:31:38+02:00] view original

It would be useful to have a get proc that returned an Option instead

Can you tell us how big the overhead of Options is, for code size and performance?

As Options are not a core element of the Nim language, I generally avoided their use in the last seven years. I used a special marker instead, as in

const: RRbNil = (nil, nil, false)


    popom = parents.getOrDefault(pom, RRbNil)
    if popom != RRbNil:

timothee (orginal) [2021-06-13T11:52:08+02:00] view original

see https://github.com/nim-lang/Nim/pull/18255 for 1st part

see https://github.com/nim-lang/Nim/pull/18253 for 2nd part, in which I'm proposing getPtr instead of Option (see rationale)

cblake (orginal) [2021-06-13T12:44:49+02:00] view original

Another way to address both in one addition is to have a new 2-body template (I called this editOrInit in adix/lptabz.nim) that just has a user-given body for both the found & missing branches with no exceptions. Example use might be inc.

On the one hand, users do have to know the do: notation for the 2nd body at call sites. OTOH, this is both general & more backward compatible than a proc -> template switch (where the symbol no longer has a run-time address). There could be a non-var overload for just the getOrFail case. { Besides exceptions, Option[T], and a more vanilla return code people might come up with some other failure protocol. }

Zoom (orginal) [2021-06-13T14:38:00+02:00] view original

It would be really nice to provide an analogue of the Entry type with its API:

https://doc.rust-lang.org/std/collections/hash_map/enum.Entry.html

It's extremely ergonomic. It would provide a bunch of very useful operations while avoiding double lookup or copying. Separating this would allow all the functions working on entries to be completely generic and reusable for other collections.

cblake (orginal) [2021-06-13T15:07:23+02:00] view original

There is an entry type in tables impl (KeyValuePair), but it is unexposed/unexported. What my template does is let the user manipulate a Nim ptr to their own V value type..and they presumably understand their own types. { We can hope, anyway! :-) } Maybe not the safest thing ever, but pretty close to the Rust mutable ref for a val. This API is also for the performance minded who probably trade-off safety for speed on occasion.

There may be a safer way still to not use a ptr s at all and just let the user access the value field as a symbolic (var-qualified) expression in either body clause. I suspect that would require a macro, but have not really tried to do it. This may be the best choice of all -- both safe & flexible - if someone wants to have a crack at it.

Sixte (orginal) [2021-06-13T23:11:05+02:00] view original

@Prestige

table.getOrDefault(1, createAString("ONE!")) will ALWAYS invoke createAString, even if not needed. It would be useful to have this only execute when necessary.

You want call-by-name. Alternatively, you could wrap it into a closure. But then, the callee has to know that it has to call the passed closure value.

Generally, you can't reproduce CBN with call-by-value languages. It is always a hack.

shirleyquirk (orginal) [2021-06-13T23:50:49+02:00] view original

you hit the nail on the head, but we do have (pseudo-) lazy-evaluation in templates, which work well enough that I wouldn't call them a hack.

xigoi (orginal) [2021-06-13T23:52:16+02:00] view original

We have templates for this. See sequtils.newSeqWith.

alexeypetrushin (orginal) [2021-06-14T02:27:23+02:00] view original

As Options are not a core element of the Nim language, I generally avoided their use

In my opinion, the nice and clean way to handle missing values is in TypeScript and Kotlin (and proabably other languages I don't know) with const v: string | null with total null-safety and autocasting support from the compiler.

Untill there's no such feature in Nim, Option looks like the second best thing to use.

const
  RRbNil = (nil, nil, false)

Looks ugly, same as let vInt: int, such naming is bad practice, especially in typed language.

getPtr

Simple things should be simple. For almost a year using Nim, I never used ptr. Nim is already complicated enough, I think pointers should be avoided and used for very special cases. Such trivial case as getting a possibly missing value from collection is not advanced use and shouldn't require usage of advanced concepts like ptr.

timothee (orginal) [2021-06-14T10:35:32+02:00] view original

Another way to address both is to have a new 2-body template (I called this editOrInit in adix/lptabz.nim)

there's already tables.withValue which I think is what you're describing but it's less flexible than the proposed getPtr from https://github.com/nim-lang/Nim/pull/18253, eg see https://github.com/nim-lang/Nim/pull/18253#issuecomment-860483037, or at least it's cumbersome to use with the same effect:

when true:
  # with withValue:
  let val2 = block:
    var tmp: ptr string
    t.withValue("foo1", val): tmp = val
    tmp
  echo val2[]
  
  # with getPtr:
  let val3 = t.getPtr("foo1")

(note that withValue also returns a ptr, and, although it's by default scoped inside the template body, it still has same caveats as getPtr if the template body involves table mutations)

cblake (orginal) [2021-06-14T10:48:53+02:00] view original

I said:

to have a new 2-body template

If you click on my use case then you might see I was proposing a less cumbersome to use template that satisfies the laziness ideas (although you may have the sketch of an idea to impl it on top of the existing withValue outside of the tables module).

Sixte (orginal) [2021-06-14T10:59:43+02:00] view original

Rust can do it because it can expose interior pointers

Rust can especially do it because of FnOnce() , a function with state. or_insert_with gets the value to be inserted as FnOnce() and therefore the computation can be delayed. This is the (CBN-related) alternative that I mentioned above.

The Rust Checker is not crucial here.

arnetheduck (orginal) [2021-06-25T08:51:23+02:00] view original

tables.withValue is a bit of an abomination - it requires a mutable table even for read-only cases _and it unnecessarily exposes a ptr with all its issues (lifetime etc) - if anything, it should be deprecated, not used as a rationale for introducing more unsafe constructs.

+1 on a view-type based solution - in general, we have many use cases for for exception-free get functions - one unfortunate thing about nim collections is that they're a bit all over the place in terms of panics and catchable errors, with [] sometimes doing one, and sometimes the other.

One interesting way to view Option itself is a collection of 0 or 1 items, a seq is a special case of Table[int, V] etc - from that follows that many of these collection items could share a lot of interface properties such as panic behaviour, iteration etc - it's similar to https://github.com/nim-lang/Nim/issues/16492 - there exists a logically sound way to join these types and operations into an internally consistent world, but unfortunately the various collection types in Nim were developed in isolation without regard for one another - when adding things to the existing ones, it would be good to first give a thought to what a consistent world would look like for more than just tables in isolation.

cblake (orginal) [2021-07-27T12:49:48+02:00] view original

Maybe it notation (from sortedByIt fame) is also an abomination (or maybe only in combination with do), but you can do a pointer free, view type free, classic Nim (template impl at lptabz.nim:587):

import adix/lptabz
var t = initLPTab[string, int]()
for line in stdin.lines:
  t.edOrInIt line:  # edit clause
    it.inc; echo "had ", line
  do: # init clause
    it = 1; echo "new ", line

import adix/lptabz
var t = initLPTab[string, int]()
for line in stdin.lines:
  t.edOrInIt line,
             edit = (it.inc; echo "had ", line),
             init = (it = 1; echo "new ", line)

Note that some callers/call contexts might prefer to see the "init" stage appear first textually (since it does happen first temporally) and:

t.edOrInIt line,
           init = (it = 1; echo "new ", line),
           edit = (it.inc; echo "had ", line)

works just fine with the named parameter call.

The name could probably use some work. All the "its" in edIT, inIT, and the usual IT are a bit confusing/tongue twisting. upsertIt might be best of all since "upsert" has become a common term for this kind of operation, though "upPutIt" or "upSetIt" or some such might track the Nim stdlib naming conventions better.

All the above compile & work fine. The first, block call form could be both more clear in intent (named) and flexible in style (re-orderable) if my idea from the do notation thread were adopted:

import adix/lptabz
var t = initLPTab[string, int]()
for line in stdin.lines:
  t.edOrInIt line, edit.do:
    it.inc; echo "had ", line
  init.do:
    it = 1; echo "new ", line

cblake (orginal) [2021-08-03T16:49:25+02:00] view original

After posting this, I thought of a maybe better name: withIt that tracks more a sortedByIt convention applied to withValue instead of mgetOrPut style naming. The name is updated in the linked-to adix/lptabz.nim example impl & this post is to shrink confusion.

Varriount (orginal) [2021-08-05T01:09:53+02:00] view original

Personally, I don't mind the mechanic behind it, however I do dislike the name it - I'd much rather use this (or expr, if expr wasn't already used as a deprecated type name).

cblake (orginal) [2021-08-05T08:16:41+02:00] view original

@Varriount - it's just a matter of what you want to be consistent with. mapIt, filterIt, and sortedByIt seem really popular and filterIt at least goes back to 2011. This may be a naming convention too old to change, realistically. withItem/item (choosing consistency with iterator convention) is another idea you may prefer, though.

Nim is all about choice, though, especially when there is not an obvious best one. So, you could write a simple template withItem or withVal wrapper that renamed things. (I'm sure that's already occurred to you, but it may help other readers.) I'm also amenable to rename the thing in adix which isn't even yet impled outside of adix/lptabz.

Similarly, order of parameter passing edit/init vs init/edit is about choice of consistency - temporal vs "OrDefault" "fall back" style. It seems to me, there isn't really just one "correct answer". Maybe you've written/read 20,000 lines of OrDefault..maybe circumstances make inits more involved. Etc. named-do solves this problem simply for both template-implementers and client code. Nim is about choice generally, but especially so when "logically dominant choices" are elusive.

juancarlospaco (orginal) [2021-08-05T09:37:26+02:00] view original

{.this: self.} is a Pragma.

Mirror of forum.nim-lang.org

8110 :: Proposed table changes - get and getOrDefault

getOrDefault

get