nimforum mirror - Move semantics and manual memory management

BigEpsilon [2017-08-18T17:26:44+02:00]

Hi,

I started looking into the Nim language recently after hearing about it on a forum. I find it to be a fun language because of its syntax and metaprogramming capabilities. To test the language, I implemented what is called an isolation forest ( http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.IsolationForest.html ). Along the way I ran into some difficulties, and based on them I have some questions:

1- How to return a seq from a function without doing an unnecessary copy:

proc foo(): seq[int] =
    var arr = @[1, 2, 3]
    echo cast[int](arr[1].addr)
    arr

var arr = foo()
echo cast[int](arr[1].addr)

The two echo calls print different addresses.

2- Related to the first question, is there a plan to add a move operator to the language? I think such an operator is necessary since copying a seq is a deep copy by default, which may harm execution speed significantly in some cases. For example, while building the trees of my isolation forest, I have to push some arrays that represent random indices into a double-ended queue. After pushing them they are no longer needed. I didn't find a way to move them instead of making a copy.

The same problem occurs when building an array of arrays (but in this case it can be done with an ugly shallowCopy call).

3- What is the state of using Nim without the GC? I know the GC in Nim is thread-local and can be tweaked, but I work in computer vision, where every millisecond counts. If the GC has any impact on performance then I need to avoid it. Is there a way, for example, to know where the GC is used (as in the D language) and to see which modules can be used without it? Are there any plans to offer more tools for manual memory management, like smart pointers? Are destructors now stable and usable? And does anybody here have experience using Nim without the GC?

Excuse my bad English, as I'm not a native speaker, and thanks in advance.

Parashurama [2017-08-19T16:32:41+02:00]

You can simulate move semantics by using shallowCopy.

proc worker(): seq[int] =
    var arr = @[1, 2, 3]
    # do some heavy work here

    # option 1: a deep copy is made here
    # return arr

    # option 2: move arr to result without copying
    shallowCopy(result, arr)
    # an explicit return is not necessary at the end of a proc (implicit)

    # you can also use the result variable from the beginning to avoid copies.

If you don't like the shallowCopy syntax, you can use a template to emulate shallow assignment semantics.

template `:=`[T](x: var seq[T]; v: seq[T]) =
    shallowCopy(x, v)

# used like this
var s0 = @[1,2,3]
let s1 = s0 # no copy is made since s1 cannot be modified
var s2 = s0 # a copy is made here (unless s0 was marked as shallow)
var s3: seq[int]
s3 := s0 # the custom template uses shallowCopy

There is also the swap proc, which swaps two variables without any deep copy.
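
As a rough illustration of the swap approach, the sketch below hands a seq's buffer over to a slot of an outer seq without a deep copy. The names `indices` and `queue` are made up for this example, and it assumes a reasonably recent Nim compiler; the address comparison is only there to show that the buffer is reused.

```nim
# Hypothetical example: hand a seq over to a container slot via swap.
var indices = @[10, 20, 30]          # the seq we no longer need afterwards
let before = cast[int](addr indices[0])

var queue: seq[seq[int]]
queue.setLen(1)                      # make an empty slot
swap(queue[0], indices)              # exchange contents, no deep copy

let after = cast[int](addr queue[0][0])
echo before == after                 # true when the buffer was handed over rather than copied
echo indices.len                     # the source now holds the slot's old (empty) content
```

Note that after the swap the source variable holds whatever was in the slot before, so it should be treated as empty rather than reused as if it still had the data.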

You can also mess with assignment semantics for types:

https://nim-lang.org/docs/manual.html#type-bound-operations-operator

mashingan [2017-08-19T16:35:32+02:00]

@BigEpsilon

CMIIW; as I don't really know Nim's internals, I'll try to answer to the extent of what I know:

A seq is already a ref type, and how its content is handled would depend on the kind of the element type. In your example you used literal values; maybe if you used some ref types, it would pass the memory instead of copying it. Also, as @Parashurama mentioned, using the result variable from the beginning won't copy the seq:

proc foo(): seq[int] =
  result = newSeq[int]()
  result.add 1
  result.add 2
  result.add 3
  echo cast[int](result[0].addr)

var arr = foo()
echo cast[int](arr[0].addr)

I read somewhere that once memory regions are implemented, it would be possible to have move and lent semantics. Using a ptr to an array/seq certainly won't copy the seq.

Using Nim without the GC is the same as using C with nicer syntax. You'll have to allocate and deallocate memory manually. Look for the related procs in the system module. The GC manual explains the minutiae of GC usage and how to tweak it.
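
To give a feel for what "C with nicer syntax" means here, the following is a minimal sketch of manual allocation using `alloc0` and `dealloc` from the system module. It assumes a recent Nim compiler that provides `UncheckedArray`; the buffer size and variable names are arbitrary choices for the example.

```nim
# Minimal sketch of manual memory management (no GC-managed seq involved).
const n = 4

# Allocate zeroed memory for n ints and view it as an unchecked array.
let buf = cast[ptr UncheckedArray[int]](alloc0(n * sizeof(int)))

for i in 0 ..< n:
  buf[i] = i * i                     # no bounds checking: staying in range is on us

var total = 0
for i in 0 ..< n:
  total += buf[i]
echo total

dealloc(buf)                         # must be freed by hand, exactly once
```

Nothing tracks the lifetime of `buf`, so leaks, double frees and out-of-bounds writes are all possible, just as in C.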

BigEpsilon [2017-08-19T20:24:53+02:00]

Hi, Thanks both for your responses.

I didn't know that I could treat result as a normal variable. The template := will also be very helpful for me.

The problem remains when using containers like a deque, where the implementation will make copies. For this reason, I tried to play with the assignment operator. However, when implementing the operator, I get a runtime error:

type
    container[T] = object
        val: seq[T]

proc `=`[T](d: var container[T], s: container[T]) =
    echo "= container called"
    echo s
    echo d
    shallowCopy(d.val, s.val)

var c = container[int](val: @[1, 2, 3])

which outputs:

= container called
(val: @[1, 2, 3])
(val: nil)
Traceback (most recent call last)
main2.nim(11) main2
main2.nim(9) =
gc.nim(287) unsureAsgnRef
gc.nim(196) incRef
SIGSEGV: Illegal storage access. (Attempt to read from nil?)

Any idea how I can solve this error?

Parashurama [2017-08-19T20:50:16+02:00]

I can reproduce the issue with the latest devel. This is likely a bug in the default GC (refc).

You can use --gc:markAndSweep to try an alternative garbage collector.

See https://nim-lang.org/docs/nimc.html#compiler-usage-command-line-switches for a list of potential arguments for the compiler.

Jehan [2017-08-19T22:08:40+02:00]

This is the result of copying from arr to result, not the actual return (return x does an implicit result = x). The following code avoids it:

proc foo(): seq[int] =
  result = @[1, 2, 3]
  echo cast[int](result[0].addr)

var arr = foo()
echo cast[int](arr[0].addr)

You can also avoid copying by using let instead of var (the copying is done to avoid aliasing). Passing a value to a procedure will also not copy it.

There is a shallowCopy that avoids the copying. Note that shallowCopy is unsafe when the target is a global variable or managed heap location and the source is a constant. You can avoid the unsafety via using a version where the right-hand side must be mutable, e.g.:

proc `<-`[T](lhs: var T, rhs: var T) {.noSideEffect, magic: "ShallowCopy".}

If throughput is your concern, then manual memory management won't help you much per se. The primary cost of the GC in Nim is the allocation/deallocation overhead (plus the write barrier, but you incur that only if you write a reference to a heap location or global variable), and you incur that in C/C++ also, unless you use custom allocation schemes. I know that Araq is also working on a region-based collector, which might alleviate the overhead (enabled via --gc:stack, not sure how mature it is). Note that RAII in particular may not help you much; reference counting as in std::shared_ptr has more overhead than Nim's GC, and something like std::unique_ptr is either not memory-safe or incurs significant overhead (C++ chose the memory-unsafe option). In order to have memory-safe move semantics without overhead, you need linear or affine types, which would be a significant increase in language complexity.

BigEpsilon [2017-08-20T12:26:29+02:00]

Thank you all for your responses. It is pleasant to see such a helpful community around the language!

@Parashurama: indeed the program does not crash with the markAndSweep GC. I posted an issue on GitHub.

@Jehan: Thanks for your comment on the usability of the GC. As I said in my first message, I work on computer vision (mainly on Android and iOS platforms). My dream is to be able to use something other than C++ for that. I know a GC can be used in this domain because some of the best-known decoders (barcodes, QR codes, etc.) on Android are written in Java (but they are definitely slower than some C++ alternatives, like our in-house decoder). Once Nim stabilizes, I think this could be a niche where Nim succeeds, provided we can make good wrappers for the best-known CV libraries like OpenCV.

(By the way, I started looking at a way to port the OpenCV wrapper generator for Python to Nim. Using Nim for OpenCV could remove the need to port to C++ after prototyping in Python and make the experience far more enjoyable.)

BigEpsilon [2017-08-20T20:11:07+02:00]

Just an update on the question of pushing seqs into a deque without a copy.

I tried this:

import deques

proc `<-`[T](lhs: var T, rhs: var T) {.noSideEffect, magic: "ShallowCopy".}

type
    container[T] = object
        val: seq[T]

proc `=`[T](d: var container[T], s: container[T]) =
    echo "copy"
    echo s.val
    d.val <- s.val

var c = container[int](val: @[1, 2, 3])
echo repr(c.val)
echo cast[int](c.val[1].addr)
var deque = initDeque[container[int]]()
deque.addFirst(c)
var c2 = deque.popFirst()
echo repr(c2.val)
echo cast[int](c2.val[1].addr)

But it does not work. I looked at the implementation of deque but I don't see where the problem comes from.

However I found an easier solution:

import deques

type
    container[T] = ref object
        val: seq[T]

var c = container[int](val: @[1, 2, 3])
echo repr(c.val)
echo cast[int](c.val[1].addr)
var deque = initDeque[container[int]]()
deque.addFirst(c)
var c2 = deque.popFirst()
echo repr(c2.val)
echo cast[int](c2.val[1].addr)

Which works as I want.

Jehan [2017-08-20T20:22:29+02:00]

Objects (without ref) are value types, and those semantics also attach to their components. If you want to have shallow copying for objects, the easiest way is to use the {.shallow.} pragma.

Example:

import deques

type
    container[T] = object {.shallow.}
        val: seq[T]

var c = container[int](val: @[1, 2, 3])
echo repr(c.val)
echo cast[int](c.val[0].addr)
var deque = initDeque[container[int]]()
deque.addFirst(c)
var c2 = deque.popFirst()
echo repr(c2.val)
echo cast[int](c2.val[0].addr)

Note also that a[1] accesses the second element of a seq. If you want the first element, use a[0].

mashingan [2017-08-21T01:49:01+02:00]

CMIIW, object is for value types while ref object is for reference types.

If the object will be used in many places and its construction is quite costly, it's better to use a reference type.

BigEpsilon [2017-08-27T15:33:48+02:00]

Thanks all for your replies.

@Jehan: your responses are always very informative, thank you! It is now clear to me that the {.shallow.} pragma and the shallow and shallowCopy calls can do the job of move (in most cases), and that move is more related to how C++ works.

Just as a side note, I didn't succeed at making my Nim code as fast as my Rust version, so I just replaced seqs with wrapped C++ vectors (with some moves in that case :) ) in the hot spots, and now the Nim version is slightly faster than the Rust version.

mratsim [2017-09-08T10:05:03+02:00]

As always, the Nim Manual is a treasure trove. I just discovered this move optimization.

With the {call} constraint you can have two procs: one for when there may be multiple references, and one for when the compiler knows the reference is unique!

proc `[]=`*(t: var Table, key: string, val: string) =
  ## puts a (key, value)-pair into `t`. The semantics of string require
  ## a copy here:
  let idx = findInsertionPosition(key)
  t[idx].key = key
  t[idx].val = val

proc `[]=`*(t: var Table, key: string{call}, val: string{call}) =
  ## puts a (key, value)-pair into `t`. Optimized version that knows that
  ## the strings are unique and thus don't need to be copied:
  let idx = findInsertionPosition(key)
  shallowCopy t[idx].key, key
  shallowCopy t[idx].val, val

var t: Table
# overloading resolution ensures that the optimized []= is called here:
t[f()] = g()

Udiknedormin [2017-09-08T13:00:29+02:00]

Nim's "implicit const reference" is a bit similar to C++'s "implicit return move", I guess. While Nim's one is nice (I miss you, Ada...), it can be a bit misleading as there seems to be no natural way to get mutable argument with no var (i.e. move it, not mut-ref a variable external to the routine).

I consider Nim's "moving" at least a bit unreliable. Maybe my luck to find random compiler bugs is infinite but I've already encountered many segfaults thanks to shallowCopy not working properly. :( I guess it isn't that surprising, I use Nim for all kinds of weird things (with lots of meta-magic).

By the way: nice to see a fellow Rustacean here. :)

Araq [2017-09-08T14:51:04+02:00]

shallowCopy does work properly, it's just that you don't understand it. :P

Udiknedormin [2017-09-08T21:49:58+02:00]

@Araq I'm sorry to hear you don't know your own language's only compiler's problems as well as a person who doesn't even use this language as her primary one. ^^" Or maybe it's the manual which can't describe all the tiny details (which would be a bit surprising as it even describes things that did not make it into compiler yet).

By the way: I don't think being harsh to people will bring you a larger community. ^^" Especially posts suggesting everything's trivial (depends on a person AND problem, I'd say), one-liner (that's obviously false) and any weakness of the language is weakness of one's mind (yeah, still remembering our little chat about slicing :-P).

mashingan [2017-09-12T06:04:46+02:00]

Hmm, shallowCopy is not a move, right?

IMO, using ptr would be better for fine-grained manual memory management.

Varriount [2017-09-12T19:37:55+02:00]

Copying semantics in Nim are fairly straightforward:

Object types, sequences, and strings copy on assignment

References do not copy on assignment
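
The two rules above can be checked with a small sketch. The type names `Box` and `BoxRef` are invented for this example; the behaviour shown relies only on ordinary assignment.

```nim
type
  Box = object            # value type: assignment copies the object and its seq
    val: seq[int]
  BoxRef = ref object     # reference type: assignment copies only the reference
    val: seq[int]

var a = Box(val: @[1, 2, 3])
var b = a                 # b gets its own copy of the data
b.val[0] = 99
echo a.val[0]             # a is unaffected by the write through b

var r1 = BoxRef(val: @[1, 2, 3])
var r2 = r1               # r2 refers to the same object as r1
r2.val[0] = 99
echo r1.val[0]            # the write through r2 is visible through r1
```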

shallowCopy and shallow sidestep the fact that strings and sequences copy on assignment. If you use these procedures, any affected variables (which in the case of shallowCopy means both arguments) must not be modified.

Udiknedormin: shallow and shallowCopy should not be considered equivalent to C++ move operations. If they are used as such, data corruption will result.

Araq [2017-09-12T20:27:15+02:00]

> @Araq I'm sorry to hear you don't know your own language's only compiler's problems as well as a person who doesn't even use this language as her primary one. ^^" Or maybe it's the manual which can't describe all the tiny details (which would be a bit surprising as it even describes things that did not make it into compiler yet).
> By the way: I don't think being harsh for people will bring you larger community. ^^" Especially posts suggesting everything's trivial (depends on a person AND problem, I'd say), one-liner (that's obviously false) and any weakness of the language is weakness of one's mind (yeah, still remembering out little chat about slicing ).

You haven't reported any bugs. (No, vague ramblings on this forum do not count as bug reports!)

Your reply is much more abusive than mine. In fact, I might delete your impertinences later.

You already know everything ('"import foo" is always a design mistake'), making discussions with you futile.

Udiknedormin [2017-09-12T23:24:00+02:00]

@Varriount

Yes, it is not. It works decently when combined with {call}, as mentioned by mratsim, but for my use case it wasn't enough as I needed to move an existing variable, not a routine call result. I tried a few solutions but none was satisfying and some were not working at all. I didn't find any clue in the manual and as the language author himself only answered with an equivalent of "you're doing it wrong" (which, as far as I observed his previous posts, often translates roughly to: "It's either impossible in Nim or I have no idea how to implement it so just go home and go around the problem"), I guess it's not generically doable without some kind of ugly magic I would prefer to avoid. :-/ Or maybe I'm missing something.

I can't say it's that much surprising, actually. Nim is garbage-collected so as a rule it doesn't follow single ownership and as a consequence, moving anything other than a direct routine call result is, at least theoretically, unsafe. It doesn't make it impossible but certainly not a natural thing to do (here I implicitly admit that I use Nim's features in ways I don't think are considered standard). It seems to me the reason for destructors being not-so-simple and not-so-natural is pretty much the same thing: both moving and destructors need deterministic lifetime. However, I do admit that if simplicity is one of the main goals of the language, it's probably still a good choice, especially considering Nim's GC's speed which is quite impressive. What puzzles me is why these two approaches, single-ownership-by-default and multiple-ownership-by-default, seem to mix. Of course they can be mixed to some degree and I'm not sure what the final goal is (it seems to me that the goal is changing, as I find the proposal of removing method multiple dispatch and introducing concept vt-ptrs quite a big change) so I can't tell for sure, but it seems to me that it's hardly possible to achieve all the goals (at least if I'm right about what the goals are).

By the way, it's a bit funny that I encountered an issue with single-ownership when writing some code for nimpylib which mocks python, a typical shared-ownership-by-default language. ^^" I didn't expect that.

@Araq

Please forgive me the sarcastic-serious mix in this reply. I was trying to organize it by topic so the style is not that consistent. Before reading, consider the fact that if I was just to be mean to you, I would be too lazy to write such an essay. ^^" So here is my reply:

That is quite right, I did not. But as I recall, you always just tell me (and some other people doing weird things) that everything works just fine. Back a year ago, when I found the first bug, in distinct types and then in static[T], I had the feeling, based on the comments on others' problems, you would do so. Sadly, I cannot say I am glad to be right in this case...

By the way: it is not like I am always sure something cannot be done. I always hope it is easily doable and it is just me who forgot about something. More than that: sometimes I think something should be easily doable when writing code for reply and then it turns out something does not work the way it seems supposed to work. It happens in many (I think all) languages, it is just I encounter it more often when writing Nim code. I also encountered some writing, say, C++, Fortran or Rust but most of the times I just found the answers (or a bug report) already there so no need to write it.

You said you read my comments but at the same time you call them ramblings and nowhere-near-bug-reports. If you really read them, I take it you also read the code samples I attach sometimes (which, in my opinion, could be often almost copy-pasted as bug reports). Yet you do not comment why these samples (or my assumptions) are wrong. Why is it so?

It is quite reassuring that truth about you being aggressive and over-optimistic (at the very least) feels abusive to you as it means there are many people willing to help bright projects even when they encounter such manners and bad treating. Makes me believe in mankind again.

If you like it that way, go ahead and delete my comments (or any other comments), it is your language/forum/whatsoever anyway, why even bother mentioning it? Not like it would confirm my claim of you being closed for any comments other than "thank you for this amazing language". Which is a bit funny as well... isn't using the language a thanks in itself? I had quite a lot of fun with it myself but the more I love a language, the more I criticize it (just like I used to criticize Ada or Scala before I found a new lovelang). So I guess you should actually be happy that I criticize your language this much as it implies it means quite a lot to me. ^^"

I commented on how you treated me as I was having many issues with many languages and the top-users (not to mention authors) were always helpful, which I considered one of the greatest advantages of communities. And it was so even for languages with much better manuals and docs than Nim. I think you will admit that this is not Nim's greatest advantage (and it does not have to be). Even if we are here for fun, nobody likes to hear just how stupid they are instead of an answer to the actual problem they encountered.

I already know that it is you who knows everything already (like non-copying slices not being possible, because if you had failed to understand or implement them clearly implies they are not possible at all; or maybe you just don't like the idea, which is perfectly ok, but then why not just say so), you really do not have to constantly remind me that you have monopoly on that.

Apropos import foo: I guess I will not find it now but I am pretty sure you yourself encouraged users to use from foo import ... often (it seems to me that you even said import foo was an unlucky choice but I am not sure on that), so I was pretty sure you would agree on that. Also, I did not just say that I think so, I gave an example (there are many more). So I do not think summing the comment up as me just being cocky is fair enough. Also, there are many topics (I think most of the threads on this forum) I would not comment on as I do not know much in this topic.

Anyway... I know we were talking about moving but I think moving from discussion to garbage-storm (because by "you just don't get it" you ended a discussion with ad-personam and now it is just me trying to persuade you I am not your enemy) is not the kind of moving we were supposed to discuss so please excuse me for stopping it here. You can send me an e-mail if you really want to continue.

Mirror of forum.nim-lang.org

3111 :: Move semantics and manual memory management