I'd like to gather opinions on changes I want to propose for the tables module.
Please leave any thoughts or suggestions you may have.
Example:
https://play.nim-lang.org/#ix=3pJw
import tables
let table = {1: "one", 2: "two"}.toTable
proc createAString(s: string): string =
echo "some expensive operation: " & s
return s
# createAString doesn't need to be invoked here.
let someVal = table.getOrDefault(1, createAString("ONE!"))
echo someVal
# This works just fine using a template
template foo(a, b: untyped): untyped =
if true: a else: b
# Only the first createAString is invoked.
echo foo(createAString("a"), createAString("b"))
table.getOrDefault(1, createAString("ONE!")) will ALWAYS invoke createAString, even if not needed. It would be useful to have this only execute when necessary.
proc `[]`[A, B](t: Table[A, B]; key: A): B
Retrieves the value at t[key].
If key is not in t, the KeyError exception is raised.
It would be useful to have a get proc that returned an Option instead of throwing a KeyError.
I'm aware I could check if the key exists with hasKey, but that requires two lookups in the table.
It would be useful to have a get proc that returned an Option instead
Can you tell us how big the overhead of Options is, for code size and performance?
As Options are not a core element of the Nim language, I generally avoided their use in the last seven years. I used a special marker instead, as in
popom = parents.getOrDefault(pom, RRbNil)
if popom != RRbNil:
Another way to address both in one addition is to have a new 2-body template (I called this editOrInit in adix/lptabz.nim) that just has a user-given body for both the found & missing branches with no exceptions. Example use might be inc.
On the one hand, users do have to know the do: notation for the 2nd body at call sites. OTOH, this is both general & more backward compatible than a proc -> template switch (where the symbol no longer has a run-time address). There could be a non-var overload for just the getOrFail case. { Besides exceptions, Option[T], and a more vanilla return code people might come up with some other failure protocol. }
It would be really nice to provide an analogue of the Entry type with its API:
https://doc.rust-lang.org/std/collections/hash_map/enum.Entry.html
It's extremely ergonomic. It would provide a bunch of very useful operations while avoiding double lookup or copying. Separating this would allow all the functions working on entries to be completely generic and reusable for other collections.
There is an entry type in tables impl (KeyValuePair), but it is unexposed/unexported. What my template does is let the user manipulate a Nim ptr to their own V value type..and they presumably understand their own types. { We can hope, anyway! :-) } Maybe not the safest thing ever, but pretty close to the Rust mutable ref for a val. This API is also for the performance minded who probably trade-off safety for speed on occasion.
There may be a safer way still to not use a ptr s at all and just let the user access the value field as a symbolic (var-qualified) expression in either body clause. I suspect that would require a macro, but have not really tried to do it. This may be the best choice of all -- both safe & flexible - if someone wants to have a crack at it.
@Prestige
table.getOrDefault(1, createAString("ONE!")) will ALWAYS invoke createAString, even if not needed. It would be useful to have this only execute when necessary.
You want call-by-name. Alternatively, you could wrap it into a closure. But then, the callee has to know that it has to call the passed closure value.
Generally, you can't reproduce CBN with call-by-value languages. It is always a hack.
As Options are not a core element of the Nim language, I generally avoided their use
In my opinion, the nice and clean way to handle missing values is in TypeScript and Kotlin (and proabably other languages I don't know) with const v: string | null with total null-safety and autocasting support from the compiler.
Untill there's no such feature in Nim, Option looks like the second best thing to use.
const
RRbNil = (nil, nil, false)
Looks ugly, same as let vInt: int, such naming is bad practice, especially in typed language.
getPtr
Simple things should be simple. For almost a year using Nim, I never used ptr. Nim is already complicated enough, I think pointers should be avoided and used for very special cases. Such trivial case as getting a possibly missing value from collection is not advanced use and shouldn't require usage of advanced concepts like ptr.
Another way to address both is to have a new 2-body template (I called this editOrInit in adix/lptabz.nim)
there's already tables.withValue which I think is what you're describing but it's less flexible than the proposed getPtr from https://github.com/nim-lang/Nim/pull/18253, eg see https://github.com/nim-lang/Nim/pull/18253#issuecomment-860483037, or at least it's cumbersome to use with the same effect:
when true:
# with withValue:
let val2 = block:
var tmp: ptr string
t.withValue("foo1", val): tmp = val
tmp
echo val2[]
# with getPtr:
let val3 = t.getPtr("foo1")
(note that withValue also returns a ptr, and, although it's by default scoped inside the template body, it still has same caveats as getPtr if the template body involves table mutations)
I said:
to have a new 2-body template
If you click on my use case then you might see I was proposing a less cumbersome to use template that satisfies the laziness ideas (although you may have the sketch of an idea to impl it on top of the existing withValue outside of the tables module).
Rust can do it because it can expose interior pointers
Rust can especially do it because of FnOnce() , a function with state. or_insert_with gets the value to be inserted as FnOnce() and therefore the computation can be delayed. This is the (CBN-related) alternative that I mentioned above.
The Rust Checker is not crucial here.
tables.withValue is a bit of an abomination - it requires a mutable table even for read-only cases _and it unnecessarily exposes a ptr with all its issues (lifetime etc) - if anything, it should be deprecated, not used as a rationale for introducing more unsafe constructs.
+1 on a view-type based solution - in general, we have many use cases for for exception-free get functions - one unfortunate thing about nim collections is that they're a bit all over the place in terms of panics and catchable errors, with [] sometimes doing one, and sometimes the other.
One interesting way to view Option itself is a collection of 0 or 1 items, a seq is a special case of Table[int, V] etc - from that follows that many of these collection items could share a lot of interface properties such as panic behaviour, iteration etc - it's similar to https://github.com/nim-lang/Nim/issues/16492 - there exists a logically sound way to join these types and operations into an internally consistent world, but unfortunately the various collection types in Nim were developed in isolation without regard for one another - when adding things to the existing ones, it would be good to first give a thought to what a consistent world would look like for more than just tables in isolation.
Maybe it notation (from sortedByIt fame) is also an abomination (or maybe only in combination with do), but you can do a pointer free, view type free, classic Nim (template impl at lptabz.nim:587):
import adix/lptabz
var t = initLPTab[string, int]()
for line in stdin.lines:
t.edOrInIt line: # edit clause
it.inc; echo "had ", line
do: # init clause
it = 1; echo "new ", line
or
import adix/lptabz
var t = initLPTab[string, int]()
for line in stdin.lines:
t.edOrInIt line,
edit = (it.inc; echo "had ", line),
init = (it = 1; echo "new ", line)
Note that some callers/call contexts might prefer to see the "init" stage appear first textually (since it does happen first temporally) and:
t.edOrInIt line,
init = (it = 1; echo "new ", line),
edit = (it.inc; echo "had ", line)
works just fine with the named parameter call.
The name could probably use some work. All the "its" in edIT, inIT, and the usual IT are a bit confusing/tongue twisting. upsertIt might be best of all since "upsert" has become a common term for this kind of operation, though "upPutIt" or "upSetIt" or some such might track the Nim stdlib naming conventions better.
All the above compile & work fine. The first, block call form could be both more clear in intent (named) and flexible in style (re-orderable) if my idea from the do notation thread were adopted:
import adix/lptabz
var t = initLPTab[string, int]()
for line in stdin.lines:
t.edOrInIt line, edit.do:
it.inc; echo "had ", line
init.do:
it = 1; echo "new ", line
@Varriount - it's just a matter of what you want to be consistent with. mapIt, filterIt, and sortedByIt seem really popular and filterIt at least goes back to 2011. This may be a naming convention too old to change, realistically. withItem/item (choosing consistency with iterator convention) is another idea you may prefer, though.
Nim is all about choice, though, especially when there is not an obvious best one. So, you could write a simple template withItem or withVal wrapper that renamed things. (I'm sure that's already occurred to you, but it may help other readers.) I'm also amenable to rename the thing in adix which isn't even yet impled outside of adix/lptabz.
Similarly, order of parameter passing edit/init vs init/edit is about choice of consistency - temporal vs "OrDefault" "fall back" style. It seems to me, there isn't really just one "correct answer". Maybe you've written/read 20,000 lines of OrDefault..maybe circumstances make inits more involved. Etc. named-do solves this problem simply for both template-implementers and client code. Nim is about choice generally, but especially so when "logically dominant choices" are elusive.