nimforum mirror - Concatenation of seq and array?

Stefan_Salewski (orginal) [2021-04-20T20:54:02+02:00] view original

In some code ported from Ruby I have


grep "& @" ~/Router/router.nim
      lcuts = v.vertex.neighbors - (lcuts & @[u.vertex, w.vertex])

First I wrote lcuts & [u.vertex, w.vertex] but without the to seq operator it does not compile. I wondered why concatenation of an seq and an array is not supported. Of course typing the single @ in from of the array is not too much work, but the question is if that is optimized or if a whole seq is allocated. So I did a test:

proc main =
  
  var s: seq[int]
  for i in 0 .. 1e5.int:
     #s.add(i)
    s = s & @[i]
  
  echo s[3] + s[^1]

main()

While the program with add() takes milliseconds, it takes a few seconds with the seq. I would say that this is a serious performance trap not only for beginners. So maybe we should add concatenation of seq and array for std lib? Tested with option -d:danger of course.

shirleyquirk (orginal) [2021-04-20T21:43:21+02:00] view original

It might be easy to dismiss this as a straw man, with the spurious creation of @[i] every loop but there's a more realistic gotcha here.

between

for i in 1..1e5:
  result &= i

and

for i in 1e5:
  result = result & i

The first appends to result, and is fast, the second performs many allocations and is slow.

I suppose the second could be auto optimized into the first, maybe even with a term rewriting macro but I smell edge cases here I don't know

Araq (orginal) [2021-04-20T22:20:37+02:00] view original

I would say that this is a serious performance trap not only for beginners.

The following programming languages have "performance traps" where you can't just ignore basic algorithmetic complexity analysis results from computer science:

Rust

Ruby

Delphi

Java

Ruby

Python

Julia

Erlang

Haskell

Zoom (orginal) [2021-04-21T02:10:11+02:00] view original

Again, this is an issue of sticking to the concept* of concatenation while the types you have at hand aren't a good fit for it due to ambiguity (contrary to a linked list, for example).

Either you have a growable sequence or an immutable one, you need to iterate the items you want to grow it with or the items of both containers respectfully to make a new one. You can have some sugar over it, but with sequences it all boils down to iteration (or moving chunks of memory), it's the nature of the type.

What you want in your test example is some form of extend which grows a mutable container (your var seq) with the elements of an iterator (can we have this today using some iterator concept?), or extendFromArray, as a specialization of the same operation.

*in a common sense

PMunch (orginal) [2021-04-21T11:05:16+02:00] view original

I seem to remember you yourself saying that Nim shouldn't copy bad design just because everyone else does it that way. Obviously all languages have some performance traps, but pointing that out as a response to a concrete example just seems a bit childish.

This might be hard to fix though, as the sequence is generated just to be immediately destroyed again, and the & operator returns a new sequence without modifying the original. Of course Nim could potentially see that the original is about to get destroyed and then do some trickery.

Araq (orginal) [2021-04-21T11:10:05+02:00] view original

Foundations of math are not "bad design". What is childish is to write code like s = s & 'x' (i.e. not knowing about .add or &=) after years of programming, in Nim and other languages.

Zoom (orginal) [2021-04-21T11:35:37+02:00] view original

Perhaps the compiler could track situations like


var x
x = x & y

where T of y is T of x[T] or y is a view of a container [T] and replace the second line with appropriate x.add(y) or x.extend(y), but do we really want this complex behaviour which encourages ambiguous expressions in the code? I don't think so.

PMunch (orginal) [2021-04-21T11:36:29+02:00] view original

Of course writing s = s & 'x' is a silly mistake, but if you look at his original code snippet lcuts = v.vertex.neighbors - (lcuts & @[u.vertex, w.vertex]) it's not as obvious. And expanding that to more efficient code

lcuts.add u.vertex
lcuts.add w.vertex
lcuts = v.vertex.neighbors - lcuts

suddenly becomes a question of performance vs. readability.

Zoom (orginal) [2021-04-21T11:45:10+02:00] view original

For readability we need extend which would work with iterators and expand lcuts.extend([u.vertex, w.vertex]) to a loop of


for x in [u.vertex, w.vertex]:
  lcuts.add(x)

which any decent compiler should unroll and optimize the array away.

Stefan_Salewski (orginal) [2021-04-21T11:54:35+02:00] view original

but if you look at his original code

Thanks for your moral support :-)

With replies like from Miram and Araq sometimes one may be tempted to stop writing in this forum, maybe subscribing to stackoverflow instead.

Araq (orginal) [2021-04-22T08:17:05+02:00] view original

It's ok, I'm sorry too, it's just me having had a bad day and misunderstanding everybody.

holgerschurig (orginal) [2021-04-22T18:00:56+02:00] view original

It is 100% true that no compiler can prevent many (most?) performance mistakes

For C++ there is a tool called "clazy" that build upon LLVM and does AST matching and can point out some easy performance optimizations.

I personally am happy that this is in some extra tool, not in the compiler itself. That clang is also usable as a library, not only as a command-line utility here is definitely an enabler.

haoliang (orginal) [2022-02-21T22:01:38+01:00] view original

although this thread is old, i still wanna add another use case: foldl

let
  numbers = @[0, 8, 1, 5]
  digits = foldl(numbers, a & (chr(b + ord('0'))), "")
assert digits == "0815"

DeletedUser (orginal) [2022-02-21T22:20:05+01:00] view original

Can't we just make the LHS of & a sink parameter?

sls1005 (orginal) [2022-02-22T12:26:30+01:00] view original

proc `&`[T](s: sink seq[T], a: openArray[T]): seq[T] =
  for t in a:
    s.add t
  return s

DeletedUser (orginal) [2022-02-22T12:48:46+01:00] view original

I don't get why it would be inefficient. What is the alternative to just using add?

sls1005 (orginal) [2022-02-22T13:12:13+01:00] view original

If the length of the array is many times the length of the seq, it has to reallocate again and again. In such a case, newSeqOfCap(len(s) + len(a)) will be more efficient.

DeletedUser (orginal) [2022-02-22T14:07:26+01:00] view original

Nim grows the cap exponentially AFAIK. And you can just do s.add(a) which would only reallocate once, the same thing as creating a new seq, given that it requires a regrow.

sls1005 (orginal) [2022-02-22T14:16:32+01:00] view original

You're right. The sink version should work.

Mirror of forum.nim-lang.org

7832 :: Concatenation of seq and array?