nimforum mirror - std/tables [] vs. add()

domogled (orginal) [2020-11-09T12:14:26+01:00] view original

Hello,

function add from std/tables package is deprecated since nim version 1.4.

Add and [] are not identicaly.

Warning: Deprecated since v1.4; it was more confusing than useful, use `[]=`; add is deprecated [Deprecated]

var dict: Table[string, string] = initTable[string, string]()


dict["A"] = "a0"
dict["A"] = "a1"

dict.add("B", "b0")
dict.add("B", "b1")

echo $dict

add support duplicate keys, operator [] dont it. Will be provided compatibility, or duplicate keys are not correct supported behaviour?

thanks

Petr

PMunch (orginal) [2020-11-09T12:29:07+01:00] view original

Well, the problem was that people didn't realise that add added another key. There's also the issue that [] for a table only returns a single value, and it's not guaranteed which it will be, and there is no way apart from iterating all the keys to get all values for a key.

I'm not sure if any compatibility support is planned, as said this is a feature that was very rarely used.

cblake (orginal) [2020-11-09T12:36:49+01:00] view original

I retain the duplicate key/multitable/multiset feature in my lptabz in the adix package as mentioned at the end of this thread and also this one. That's the only compatibility support of which I am aware.

DeletedUser (orginal) [2020-11-09T14:39:53+01:00] view original

If you don't need indexing and just need to iterate over a collection of keys to values, then you can use a seq[(string, string)].

cblake (orginal) [2020-11-09T14:58:27+01:00] view original

Also, the {} literal syntax does not restrict to non-dups. { 1:1, 1:2, 1:3 } will probably always be equivalent to @[ (1,1), (1,2), (1,3) ].

To my knowledge there are no plans to make {} de-dup and I, for one, hope it stays that way. Such {} sugar is syntactic while de-dup is semantic. Of course, {...}.toTable does de-dup, but {...}.toSomethingElse need not. I went with {...}.toLPTabz(dups=true) with LPTabz, for example.

I actually think it is more consistent & informative to use {} and {}= for associative lookup and [] and []= for positional lookup (but maybe as a backward compat thing have either work when there is only one kind of index). The kind of seq @Hlaaftana mentions is a good example where both positional and keyed access make sense. Others disagree with me, though, though, more or less on the grounds of it being "too different from other prog.langs/their personal experience/personal mental model" aka "not invented by Leibniz". I think "just one indexing syntax" is an unfortunate artifact of 1980s era C++/Python languages having "rigid syntax/flexible semantics". Ah well.

Araq (orginal) [2020-11-09T17:27:45+01:00] view original

Don't worry. :-) The {:} syntax is here to stay and won't change, hardly read any complaints about it either.

cumulonimbus (orginal) [2020-11-09T17:36:51+01:00] view original

I think "just one indexing syntax" is an unfortunate artifact of 1980s era C++/Python languages having "rigid syntax/flexible semantics".

Alternatively, it is possibly an artifact of Perl having array, hash and scalar context for variables, which -- after a while -- was mostly natural, but was very confusing. I'm not saying it can't be done properly, but I vaguely remember the Perl contexts being mentioned as a "horror story" on the Python development usenet in the early '90s.

cblake (orginal) [2020-11-09T18:38:15+01:00] view original

Cool, @Araq. :-)

@cumulonimbus. That had occurred to me, too, I may have discussed it before, and you may be right, but Python had OpOverloading in like 1989 before Perl a couple years earlier had even gotten popular enough to have many haters or be a real "entry point" from Python (or anything, really). I suspect as a matter of history that Python "mostly" copied "operator overloading" ideas of C++ that were in that late 80s zeitgeist.

Anyway, all I mentioned works fine in Nim already (and I do it in my adix library and the stdlib json does some). It's just not the standard convention/notation. Dual indexing, like dup key capability, is common, though. seq as in your example, tables with any order, trees, etc. In the simplest cases, a positional access path is also faster than associative access path, [i] near in RAM to [i+1] often holds, initializers like [] & {:} match, and arguments are also typed differently..position always being some kind of ordinal while keys are usually more general. Anyway, I think there are at least 3 or 4 reasons that {} for association and [] for position would be nicer in Nim, but I guess I'm almost a lone voice in the wilderness on this one. @Araq is the only one who has ever backed me up on it while I think all others have pushed back. Ah well!

From talking about this exact topic with over 500 developers over several decades, I think people are largely unaware and "inverse spoiled" by personal histories of not having "at hand" libraries of efficient dual indexing data structures..They'll just have (or learn about) an only keyed BST or just unordered tables. Only your (poorly scaling) seq thing will be "easy"/have a lot of example code, be retained from school, etc.

Most people honestly seem just as unaware that many search trees can be positional, keyed, or both as they are unaware that you can put duplicate keys in hash tables (with search performance degradation proportional to degeneracy). They might be more unaware, on average. I think it is this lack of awareness/availability of APIs in stdlibs that generates this lack of notation.

All that said, find, get, set, put are also short words that can take the place of operators. Separating "find" from "edits" also has advantages and the indexing notation blocks that, at least in its a[i]=val form. So, there are also more "small API surface" arguments for not using the notation.

disruptek (orginal) [2020-11-09T18:44:40+01:00] view original

Hey, I used add all the time and it's super annoying that I have to adopt a third-party table implementation just to keep the semantics.

It's documented -- that's how I learned of it -- and I was just reading a thread about HttpHeaders from like 10 years ago where @dom96 was told in like 3 different comments that duplicates are possible. Yet he complained recently that it was a surprise to him.

C'mon, RTFM just once. This is a nice feature you killed out of ignorance and FUD.

disruptek (orginal) [2020-11-09T18:47:24+01:00] view original

Also consider some code I wrote in the compiler recently:

if p.file notin c.filenames:
  `[]=` c.filenames, p.file:  # too bad add() was deprecated, huh?
    var itIsKnown: bool
    fileInfoIdx(ir.sh.config, AbsoluteFile ir.sh.strings[p.file], itIsKnown)

This is uglier than it needed to be because people refused to read the manual.

cblake (orginal) [2020-11-09T22:58:26+01:00] view original

That RFC design is just tuned/optimized for another point in the design space somewhere between seq values and the current deprecated approach. It's not exactly worse, but it is also not exactly "better" or "preferable" either (except perhaps super-duper contingently upon circumstance).

In particular, it wastes a lot of space with that int serial number/counter. So it is not "as efficient" (or "as fast" if the extra space causes different cache behavior). At a bare minimum, the serial/count number in that approach should be a generic type so that one could have as small as a 1-byte counter/serial number instead of 8-bytes, although object/struct padding could still cause waste. Also, the wrapper is so simple and the want (evidently!!) so rare ;-), that it could be argued it need not even be in the stdlib at all. Much as people can just do Table[A,seq[B]] they could do the RFC wrapper or publish a nimble package for eventual inclusion in fusion if it becomes needed enough.

But it is also not so hard to flesh out the API with delIndex or replaceIndex the way duplicates are deprecatedly-done now making, using, as now, no extra memory and extra time only related to collision cluster size (meaning no extra time for non-dup-key users). It's just a couple/few missing proc s and leveraging maybe some separation of search & mutation.

Anyway, it's always a judgement call what to include if people don't read docs, consequently don't understand APIs, and then misuse them. One person's convenience is often another's footgun.

mildred (orginal) [2020-11-22T00:01:29+01:00] view original

When replacing Table[string,string] with Table[string,seq[string]] how do you efficiently add an item to the table?

I came up with:

result[key] = concat(result.getOrDefault(key), @[value])

But it requires a lookup, a concat and a store. Is there a better way? I tried result[key].add(value) but it obviously does not work

inventormatt (orginal) [2020-11-22T00:14:26+01:00] view original

You can try something like this https://play.nim-lang.org/#ix=2F0R which should give you the functionality you are looking for.

cumulonimbus (orginal) [2020-11-22T11:00:44+01:00] view original

The Python way is collections.defaultdict, which you provide a default value when creating the table (would be @[] in this case), and then you can just do table[x].add y.

It's convenient to have in the stdlib, but I don't think it's worth considering outside of a bigger tables revision (e.g. adding MultiTable as discussed).

Araq (orginal) [2020-11-22T12:28:21+01:00] view original

Use mgetOrPut (the name is a bit weird):

import tables

proc add*(t: var Table[string, seq[string]]; key, value: string) =
  t.mgetOrPut(key, @[]).add value

cblake (orginal) [2020-11-22T13:16:03+01:00] view original

To explain the name weirdness a bit, it was meant to be similar to/consistent with "containsOrIncl", "hasKeyOrPut"-type language in other places. Sometimes explanations help people remember. :-)

mildred (orginal) [2020-11-22T14:31:52+01:00] view original

I didn't think mgetOrPut().add() would work, I was under the false impression that it was necessary to assign back the seq in the table after the add operation. I probably programmed too much in Go lately.

Thank you, it does the job perfectly.

cblake (orginal) [2020-11-22T14:46:49+01:00] view original

adix/lptabz also provides a two clause template editOrInit in case the value type of the table is giant or one prefers that "if-else" branch kind of visual structure.

Note, though, that in both cases (stdlib & adix), key,value pairs are packed into one big linear memory area, not indirected through pointers. This makes giant value types risky in terms of CPU cache friendliness/performance. string is indirect, though. So, this does not matter for your specific use case. The k,v is just 8*2 = 16 Bytes.

cblake (orginal) [2020-11-22T15:17:19+01:00] view original

Stdlib criticisms come up a lot, but it bears mention, re-mention, and re-re-mention that Nim programs need not be as dependent upon its stdlib as many other languages by design. Go has no generics (yet?) except weird special case things and Python needs C extensions to be at all performant which are a different skill set (though Cython makes it much easier).

Nim is more like C++ where performance/feature-sensitive people can/often do just replace/avoid the C++STL. The stdlib is just one pool of implementations..granted likely the first one new Nim programmers see.

With data structures, there is never a perfect one for all possible use cases..Use cases must be restricted first and there are many ways that can happen. Restrictions that specialize then afford optimization opportunity.

So, if an "early lesson" in Nim is that maybe you have to hunt down a package for a DS or roll your own, that is not actually such a terrible outcome. Heck, in Ancient Times, you could just "cp stdlib/tables.nim myproj/" and start hacking away.

Mirror of forum.nim-lang.org

7055 :: std/tables [] vs. add()