rku [2015-05-05T10:14:37+02:00]

Proposition to support constructors

Implementing this proposition would improve Nim in multiple ways. It gives a transparent way to initialize objects without having to explicitly look up documentation. This is even more important with good editor support that provides code completion with documentation, and it would further boost productivity in Nim. A familiar concept would enhance Nim's OOP capabilities while being completely optional. Wrapping C++ libraries would also be easier and fit the original API more closely, as we would be able to use only the object type to initialize the object. No more awkward and confusing newFoo and initBar.

Proposed syntax

I am proposing a fully backwards-compatible syntax that builds on Nim's current capabilities. At the same time it would be intuitive and obvious to people with general programming knowledge. The syntax consists of 3 key parts:

Call syntax for object initialization

Ref-counted object initialization from non-ref object type

Auto-dereferencing in one more context

Call syntax for object initialization

var o = Foo(p1, p2, p3)

As illustrated in the example, the type name Foo is used as a proc call. When this form is used, the compiler should implicitly allocate the object (stack or GC), initialize its fields to default values and invoke the constructor procedure, which would be defined as proc init(self: var Foo, p1: int, p2: float, p3: bool). When constructing a ref type, the compiler should dereference the object before passing it to the init proc. This is required so we can have a single init proc for both stack-local and garbage-collected objects. Furthermore, we can then call the parent init proc manually at the appropriate place in the child init proc.

If call syntax is used with a type name and no init proc with suitable parameters exists, the compiler should yield an error. If the old object initialization syntax with colons is used, no init proc should be called.

If the object constructor is called with no additional arguments (Foo()), there are three cases. If proc init(self: var Foo) exists, it should be called implicitly. If at least one proc named init with first argument self: var Foo exists, but none that takes solely self: var Foo, an error should be raised. If no proc named init with first argument self: var Foo exists, the object should be allocated and its fields initialized to their default values (current behavior).

The compiler should prove that not nil fields are initialized in the init proc. However, this may be optional, in which case not nil field usage would be restricted in objects initialized using a constructor.

Full example:

type Foo = object of RootObj
    n: int
type Bar = object of Foo

proc init(self: var Foo, n: int) =
    self.n = n
    echo "Foo"

proc init(self: var Bar) =
    init(Foo(self), 0)  # parent init takes ``n``, so it must be passed here
    echo "Bar"

var o = Bar()           # Calls ``init(self: var Bar)``
echo "---"
var r = ref Bar()       # Calls ``init(self: var Bar)``, this will be covered in next paragraph
var old1 = Foo(n: 2)    # Does not call init proc, sets n=2
var old2 = Bar(n: 3)    # Does not call init proc, sets n=3
var err = Foo()         # Raises error, no suitable constructor found, because ``init(self: var Foo, ...)`` is defined

Prints:


Foo
Bar
---
Foo
Bar

Ref-counted object initialization from non-ref object type

Nim actually supports this feature already, we just need a tiny tweak. Consider this:


var o = (ref Foo)()

Foo is non-ref object type. It would be much more intuitive to simply type this:

var o = ref Foo()

Now we have a clear distinction between stack and GC objects. Furthermore, we can avoid multiple type definitions (Foo + FooRef) for the same type. Best of all, there is no need whatsoever for a new keyword like new, which would seem out of place in Nim.

Auto-dereferencing in one more context

As noted this is already available as experimental feature with {.experimental.} pragma.

Now that we can use proc init(self: var Foo) for constructors of both ref and non-ref types we need same thing for calling procs/methods on objects too. We should be able to do this:

proc talk(self: var Foo) =
   echo "Hello"
var o = Foo()
var r = ref Foo()
o.talk()    # Prints "Hello"
r.talk()    # Prints "Hello", notice r has no deref operator []

I sat down and put some time into writing up what I think we collectively imagine as a suitable constructor implementation. This is a continuation of the discussion here: http://forum.nim-lang.org/t/703 So please comment/discuss. I intend to keep the first post updated with the details that we decide need changing, while keeping a list of modifications appended to this post. I know introducing a new feature requires an RFC now. Well, I'm not capable of putting together an RFC like the ones from the IETF; this is the best I could put together for now. If it's somehow lacking, please excuse me and point out the problems so they can be corrected.

EDIT1: Added note about auto-dereferencing being available as experimental feature.

Arrrrrrrrr [2015-05-05T13:13:56+02:00]

var r = ref Foo()
o.talk()    # Prints "Hello"
r.talk()    # Prints "Hello", notice r has no deref operator []

It is possible using experimental. At first I liked it more, but now I'm unsure. If it does the autodereferencing, then I cannot write nil-safe procs like this:

type MyObject = ref object
  len: int

proc size(o: MyObject): int =
  return if o == nil: 0 else: o.len

var o: MyObject
echo o.size

rku [2015-05-05T13:39:06+02:00]

Good point. It turns out that at the moment auto-dereferencing a nil value causes an immediate crash. I was hoping for an exception at least. Apparently this feature is rightly experimental. Your example is a little bit off, though, because the size proc expects a ref type too, which works just fine with autoderef. Hell breaks loose only if it expected a non-ref type. So if you want nil-safe procs you can easily have them: all they have to do is accept a ref type. Therefore I think it is not really a problem, since what you want is possible either way. An exception for nil deref would be desirable though.

Jehan [2015-05-05T14:15:13+02:00]

Two notes:

First, it's not backwards compatible, since T(x) is already used for explicit conversion of x to type T. T() is the empty object constructor for T with no init being called.

Second, I reiterate my concern that C++-style constructors are basically broken. Disambiguation by overloading is a painful hack, because overloading exists to have procedures with similar behavior share the same name; furthermore, overloading is insufficient to distinguish between constructors with identical type signatures. Name-based constructors are generally superior.
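To make the point about identical type signatures concrete, here is a minimal sketch of name-based constructors in present-day Nim. The Matrix type, its fields and the proc bodies are made up for illustration; only the names zeroMatrix and identityMatrix are borrowed from later in this thread. Both procs take a single int, so overloading alone could not tell them apart:

```nim
type Matrix = object
  n: int              # side length (hypothetical field)
  cells: seq[float]   # row-major storage (hypothetical field)

proc zeroMatrix(n: int): Matrix =
  # newSeq zero-initializes its elements
  Matrix(n: n, cells: newSeq[float](n * n))

proc identityMatrix(n: int): Matrix =
  # same signature as zeroMatrix, told apart by name only
  result = zeroMatrix(n)
  for i in 0 .. n-1:
    result.cells[i * n + i] = 1.0

let zero = zeroMatrix(2)
let eye = identityMatrix(2)
echo zero.cells   # @[0.0, 0.0, 0.0, 0.0]
echo eye.cells    # @[1.0, 0.0, 0.0, 1.0]
```

Because the name carries the intent, the call site needs no overload resolution to pick the right initialization.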

HOLYCOWBATMAN [2015-05-05T14:41:03+02:00]

Jehan nailed it.

Right now with the current syntax it's easy to see what's happening and we have fine control over it; with your proposal, not so much.

The current system is pretty good IMO, it does not need changing. The only feature missing is a way to disable the default type constructor from outside the module while still exporting the type (which is coming eventually, if I remember correctly from IRC).

Sixte [2015-05-05T15:09:47+02:00]

IMHO, it is a very good design decision to make ref types explicit, which is something Nim already does.

So, declare simply:

type
  RFoo = ref Foo
  Foo  = object
    p1, p2, p3: int

var vfoo = RFoo(p1: 0, p2: 1, p3: 2)
#  var vfoo = RFoo(0, 1, 2) # does not work in current Nim...
#  could it be done with a macro?

Automatic Dereferencing? I think that the "var" keyword in proc signatures should be extended with a "varc" keyword and only in this case automatic referencing (at caller side) should be performed. (Rust is the other extreme - tons of annotations have to be made).

proc talk(self: var Foo) =
 echo "Hello" # we definitely expect a variable here, no reference counting for the  garbage collector!
proc talk(self: RFoo) =
 echo "Hello" # we get a reference and could bind it to something else (ref. counting involved)

The r[].size makes explicit that you simply pass the address of the value to the callee (a var proc is used; in Rust terms, the caller doesn't "own" the object). proc talk var and proc talk ref are two different functions, as they should be.

Jehan [2015-05-05T15:34:53+02:00]

Here is a simple example of how named constructors could work:

import macros

macro make(e: untyped): auto =
  var tp, call: NimNode
  let sym = genSym(nskVar)
  case e.kind
  of nnkCall:
    case e[0].kind
    of nnkDotExpr:
      tp = e[0][0]
      call = newCall(e[0][1])
      add(call, sym)
      for i in 1..len(e)-1:
        add(call, e[i])
    of nnkIdent:
      tp = e[0]
      call = newCall(!"init", sym)
    else:
      error("not a constructor call")
  of nnkDotExpr:
    tp = e[0]
    call = newCall(e[1], sym)
  else:
    error("not a constructor call")
  expectKind(tp, nnkIdent)
  result = quote do:
    var `sym` = `tp`()
    `call`
    `sym`

import strutils

type Obj = ref object
  x: int

proc init(ob: Obj) =
  ob.x = 1

proc init(ob: Obj, z: int) =
  ob.x = z

proc fromString(ob: Obj, s: string) =
  ob.x = s.parseInt

proc fromMin(ob: Obj, a, b: int) =
  ob.x = min(a, b)

proc `$`(ob: Obj): string = "Obj(x: " & $ob.x & ")"

var a1 = make Obj.init
var a2 = make Obj.init(2)
var a3 = make Obj.init()
var a4 = make Obj.fromString("99")
var a5 = make Obj.fromMin(314, 2718)
var a6 = make Obj()

echo a1, " ", a2, " ", a3, " ", a4, " ", a5, " ", a6

rku [2015-05-05T16:05:54+02:00]

@Jehan you are wrong on backwards compatibility. It is fully backwards-compatible. T(x) is indeed already used as type conversion, and that does not impact this proposal in any way. It is even used to call the right parent init proc. While T() would call init implicitly, it would do so only if any init with first argument var T existed. No such proc means no construction going on.

@HOLYCOWBATMAN you are wrong too, actually. With my proposal everyone still has fine control over things. No one has to use this, but they can if they wish. This is not changing the system, this is building on top of it.

From my experience in Python, a single constructor works just fine. However, it gets clumsy when one constructor does 10 different things. Opting for a single constructor is far from ideal. Besides, from my C++ experience the only problem I had in practice with multiple constructors was one constructor being able to call another sibling constructor to avoid code duplication. However, in Nim it would work just fine.

Now about zeroMatrix and identityMatrix. While it is not possible to have two constructors with the same arguments, it does not really make sense to have such a thing anyway. Then comes a compromise: for example, procs zeroMatrix and identityMatrix that pretty much act as named constructors. Imagine the constructor as general object setup. In this specific case:

type Matrix = object
    m: seq[int]

proc init(self: var Matrix) =
    self.m = @[]

Or maybe, just maybe, consider having two types inheriting matrix and performing different initialization. Then code quality goes up because now tools can recognize what kind of matrix it is, either ZeroMatrix or IdentityMatrix, not THE MATRIX one and only.

Perelandric [2015-05-05T17:07:04+02:00]

Given these types and init procs...


type Foo = ref object of RootObj
    n: int
type Bar = ref object of Foo
    next: Foo

proc init(self: var Foo, n: int) =
    self.n = n

proc init(self: var Bar, next: Foo) =
    init(Foo(self), 0)
    self.next = next

And given that f is an instance of Foo, does this code invoke the constructor or would it be an explicit type conversion?


var x = Bar(f)
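For reference, a sketch of how Nim treats these forms today, without the proposal. It assumes the two types above but leaves out the init procs, and the variable names are illustrative only. T() builds a default object, while T(x) applied to an object of a related type is a checked conversion of the existing object, not a construction:

```nim
type
  Foo = ref object of RootObj
    n: int
  Bar = ref object of Foo
    next: Foo

let empty = Bar()          # T(): default construction, no init proc runs
let f: Foo = Bar(n: 1)     # static type Foo, runtime type Bar
let x = Bar(f)             # T(x): checked conversion of the existing object

echo empty.n               # 0
echo x.n                   # 1
echo Foo(x) == f           # true: x and f refer to the same object
```

Under the proposal the last form would also match an init(self: var Bar, next: Foo) constructor, which is the ambiguity the question points at.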

Sixte [2015-05-05T17:08:12+02:00]

@ Jehan:

I tried to compile your example, but got a compiler error:

"Error: undeclared identifier: 'untyped' "

What did I do wrong?

Sixte [2015-05-05T17:49:42+02:00]

Couldn't render post #7354.

Jehan [2015-05-05T18:03:52+02:00]

Sixte: does the macro reflect the content of an object here? What else should stand for len? I think it stands for a list...

Macros work on ASTs (abstract syntax trees). len(node) gives you the number of children that a node has, and node[i] produces the i-th child of a node.
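A tiny sketch of len(node) and node[i] in action, assuming a Nim version that accepts untyped macro parameters. The macro names childCount and firstChild are made up for this illustration. A call such as foo(1, 2, 3) is a call node whose children are the callee followed by the arguments:

```nim
import macros

macro childCount(e: untyped): untyped =
  ## Expands to the number of children of the argument's AST node.
  result = newLit(len(e))

macro firstChild(e: untyped): untyped =
  ## Expands to the source text of the node's first child.
  result = newLit(repr(e[0]))

# foo does not need to exist: the argument is never type-checked.
echo childCount(foo(1, 2, 3))   # 4 (foo, 1, 2, 3)
echo firstChild(foo(1, 2, 3))   # foo
```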

Sixte: So, it is principally possible too to build a macro "multi-var" like ...

In principle, yes, though the syntax would be a bit of a pain to handle. But the following works:

import macros

macro `..=`*(lhs, rhs: untyped): expr =
  # Check that the lhs is a tuple of identifiers.
  expectKind(lhs, nnkPar)
  for i in 0..len(lhs)-1:
    expectKind(lhs[i], nnkIdent)
  # Result is a statement list starting with an
  # assignment to a tmp variable of rhs.
  let t = genSym()
  result = newStmtList(quote do:
    let `t` = `rhs`)
  # assign each component to the corresponding
  # variable.
  for i in 0..len(lhs)-1:
    let v = lhs[i]
    # skip assignments to _.
    if $v.toStrLit != "_":
      result.add(quote do:
        `v` = `t`[`i`])

var x, y: int
(x, y) ..= (1, 2)
echo x, y
(x, _) ..= (3, 4)
echo x, y

Note that you can already do stuff like:

let (x, y) = (1, 2)
let (z, _) = (3, 4)

Sixte [2015-05-05T18:46:46+02:00]

Hm, it seems difficult to avoid the '(' ... ')' stuff (and now we have some characters more to type...)

Anyway, I'll try to do some macro programming - very helpful. And this is not my thread. I'll start a separate thread with some questions in addition.

rku [2015-05-06T08:58:59+02:00]

T(x) remains ambiguous. T(x) can mean calling a constructor with one argument or an explicit type conversion.

Oh, so that's what you meant. Sorry, I misunderstood before.

Likewise, if you want T() to default to calling the argument-less constructor, then it becomes impossible to create a new object by calling T(). Stuff like this is why C++ and friends prefix object construction with new.

Good point. I don't think anyone wants that. I can't quite think of a viable solution to that. If we resorted to introducing keywords like that, then all of this could be done via a macro. That kind of defeats the purpose of any change in the core language.

First of all, that forces you to use dynamic dispatch and pay for the overhead of that.

Usually there is no overhead if the correct proc can be figured out at compile time. Then it's simply a call, with no overhead except a little more work for the compiler at compile time. The cost comes with procs that are called indirectly (virtual in C++). I assume it works similarly in this case too.
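A small sketch of the difference being discussed, assuming a Nim version that supports the {.base.} pragma; the Shape and Circle types and the proc names are hypothetical. Overloaded procs are resolved from the static type at compile time, while methods dispatch on the runtime type:

```nim
type
  Shape = ref object of RootObj
  Circle = ref object of Shape

# Plain procs: the overload is picked at compile time from the static type.
proc nameStatic(s: Shape): string = "shape"
proc nameStatic(c: Circle): string = "circle"

# Methods: the implementation is picked at run time from the dynamic type.
method nameDynamic(s: Shape): string {.base.} = "shape"
method nameDynamic(c: Circle): string = "circle"

let s: Shape = Circle()   # static type Shape, runtime type Circle
echo nameStatic(s)        # shape
echo nameDynamic(s)       # circle
```

So distinguishing matrix kinds by subtype costs nothing extra only while the concrete type is known statically; once a value is held as the base type, a method call is needed to recover the specific behavior.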

rku [2015-05-07T11:42:38+02:00]

I got intrigued by Jehan's named constructors idea. However make "keyword" is verbose and unnecessary. I figured proc could be rewritten a little bit to behave like comfortable constructors. Core features:

Macro is applied as pragma, looks almost as if it is core compiler feature.

Single proc defined for constructing both ref and non-ref type versions.

Constructor can be generic (or is template correct term?).

Custom name for this/self object defined in constructor (like self in python).

Macro basically expands

proc init(self: var Foo) {.ctor.} =
         echo "ctor"
         result.a = 123

into:

proc init[TConstructorType: Foo|(ref Foo)](TSelf: type TConstructorType): TConstructorType =
         result = TConstructorType()
         var self = result
         echo "ctor"
         result.a = 123

Still can't figure out one bug: non-ref types have to use result to set up the object instead of self, because result = T(); var self = result; makes a copy of result into self where we need a reference/alias. If anyone has ideas how to solve this, please speak up!

Code:

import macros

type
    Foo = object
        a: int
    FooRef = ref Foo


macro ctor(prc: typed{nkProcDef}): auto {.immediate.} =
    if prc[3][1][1].kind != nnkVarTy:
        error("Constructor must have var type as first parameter")
    if prc[3][0].kind != nnkEmpty:
        error("Constructor must not have return type")
    
    
    var type_identifier = prc[3][1][1][0]
    # echo repr(type_identifier)
    #repr(get_type(prc[3][1][1][0]))
    # if get_type(prc[3][1][1][0]).typekind == ntyRef:
    #     echo "ref"
    
    var self = prc[3][1][0]
    prc[3][1][0] = new_ident_node("TConstructorType")
    
    # GenericParams
    #   IdentDefs
    #     Ident !"T"
    #     Infix
    #       Ident !"|"
    #       Ident !"Foo"
    #       Par
    #         RefTy
    #           Ident !"Foo"
    #     Empty
    if prc[2].kind == nnkEmpty:
        prc[2] = new_nim_node(nnkGenericParams)
    
    prc[2].add(
        new_nim_node(nnkIdentDefs).add(
            new_ident_node("TConstructorType"),
            new_nim_node(nnkInfix).add(
                new_ident_node("|"),
                type_identifier,
                new_nim_node(nnkPar).add(
                    new_nim_node(nnkRefTy).add(
                        type_identifier
                    )
                )
            ),
            new_empty_node()
        )
    )
    prc[3][0] = new_ident_node("TConstructorType")                   # return type
    prc[3][1] = new_nim_node(nnkIdentDefs).add(
        new_ident_node("TSelf"),
        new_nim_node(nnkCommand).add(
            new_ident_node("type"),
            new_ident_node("TConstructorType")
        ),
        new_empty_node()
    )
    
    # Allocate type
    # Asgn
    #   Ident !"result"
    #   Call
    #     Ident !"TConstructorType"
    prc[6].insert(0,
        new_nim_node(nnkAsgn).add(
            new_ident_node("result"),
            new_call(
                new_ident_node("TConstructorType")
            )
        )
    )
    # Assign result to own-named `self` instance
    # VarSection
    #   IdentDefs
    #     Ident !"self"
    #     Empty
    #     Ident !"result"
    prc[6].insert(1,
        new_nim_node(nnkVarSection).add(
            new_nim_node(nnkIdentDefs).add(
                self,
                new_empty_node(),
                new_ident_node("result")
            )
        )
    )
    echo tree_repr(prc)
    return prc

# dump_tree:
#     proc init[TConstructorType: Foo|(ref Foo)](TSelf: type TConstructorType): TConstructorType =
#         result = TConstructorType()
#         var self = result
#         echo "ctor"
#         result.a = 123

proc init[T](self: var Foo, n: T) {.ctor.} =
    result.a = int(n)
    echo("Passed to ctor: ", $n)


var f1 = Foo.init(123)
var f2 = FooRef.init(321.2)
var f3 = (ref Foo).init(444)
echo repr(f1)
echo repr(f2)
echo repr(f3)

filwit [2015-05-07T14:32:25+02:00]

rku: we need a reference/alias. If anyone has ideas how to solve this please speak up!

You can use template self: Foo = result instead. Although it might be better to do something like:

proc new(f:var Foo, a, b:int) {.ctor.} =
  f.a = a
  f.b = b

# turns into:

proc new(f:type Foo, a, b:int): Foo =
  proc construct(f:var Foo, a, b:int) {.inline.} =
    f.a = a
    f.b = b
  result = Foo()
  result.construct(a, b)

That way using result inside the original ctor code is illegal.
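To illustrate the copy problem and the template fix, here is a hand-written sketch of what a generated constructor could look like. It is not the output of either macro in this thread; the proc names init and initCopy and the field values are chosen for the example only:

```nim
type Foo = object
  a, b: int

proc init(T: typedesc[Foo], a, b: int): Foo =
  # The template is an alias: every use of `self` expands to `result`.
  template self: untyped = result
  self.a = a
  self.b = b

proc initCopy(T: typedesc[Foo], a: int): Foo =
  # For a non-ref object this copies `result`, so the write below is lost.
  var self = result
  self.a = a

let made = Foo.init(1, 2)
echo made.a, " ", made.b   # 1 2
echo Foo.initCopy(5).a     # 0
```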

Jehan [2015-05-07T15:13:59+02:00]

rku: However make "keyword" is verbose and unnecessary.

The verbosity is on purpose. The make pseudo-keyword I used serves as a visual marker to distinguish object construction from a procedure call. Technically, proc is also verbose and unnecessary, but it helps when reading the code. Code is written once and read and modified hundreds of times and should be optimized towards the latter. Line noise is the wrong optimization goal for code readability.

Also, my suggestion intentionally preserves the original procedure. E.g. one can do:

var x = make T.init
x.doSomething
x.init                    # reset the state of x without creating a new object

Finally, the make operator also works for non-proc callables.

E.g.

let curriedInit = ...
let t = make T.curriedInit

or:

type Generator = ref object
  original: int

type T = ref object
  value: int

proc `()`(x: Generator, y: T) = y.value = x.original

var gen = Generator(original: 314)
var u = make T.gen
echo repr(u)

And, of course, methods.

Conceptually, make is an operator that takes a type and a call and composes the allocation of the type with the semantics of the call. It's not just an alternate notation.

Sixte [2015-05-07T16:29:43+02:00]

@Jehan

Is it possible in principle to pass a macro a list of identifiers?

E.g. "make" a,b,c "keyword" ... and a list of expressions follows?

"make" stands for a macro, "keyword" is an additional literal, used by and within the macro

Or is this blocked by the parser?

Jehan [2015-05-07T18:04:16+02:00]

The argument of a macro must itself be a syntactically valid Nim expression. However, there are some options to have "keywords" in the middle of other stuff. Example:

import pegs, strutils

template loop(body: untyped) =
  template until(cond: untyped) =
    if cond: break
  while true:
    body

proc main() =
  var s: string
  loop:
    s = stdin.readLine
    until s =~ peg"[0-9]+"
  echo s.parseInt

main()

or something LINQ-like:

let q =
  query do:
    for item in collection:
      where item.weight <= 12
      select (item, item.weight)

Parsing that grammar would be challenging with the current macro library (need some AST matching functionality for ease of use), though of course simple list comprehension like queries can be done without this:

template enumerate(s: untyped): auto =
  block:
    iterator temp(): auto = s
    var result = newSeq[type(temp())]()
    for item in temp():
      add(result, item)
    result

const n = 20

let triangles = enumerate do:
  for x in 1..n:
    for y in x..n:
      for z in y..n:
        if x*x + y*y == z*z:
          yield (a: x, b: y, c: z)

let even10 = enumerate do:
  for x in 1..10:
    if x mod 2 == 0:
      yield x

echo triangles
echo even10

Mirror of forum.nim-lang.org

1190 :: [RFC] Constructors proposition
