Implementing this proposal would improve Nim in multiple ways. It gives a transparent way to initialize objects without having to explicitly look up documentation. This is even more important with good editor support that provides code completion with documentation, and it would further boost productivity in Nim. A familiar concept would enhance Nim's OOP capabilities while being completely optional. Wrapping C++ libraries would also be easier and fit the original API more closely, as we would be able to use just the object type to initialize the object. No more awkward and confusing newFoo and initBar.
I am proposing a fully backwards-compatible syntax that builds on Nim's current capabilities. At the same time it would be intuitive and obvious to people with general programming knowledge. The syntax consists of 3 key parts:
var o = Foo(p1, p2, p3)
As illustrated in the example, the type name Foo is used as a proc call. When this form is used, the compiler should implicitly allocate the object (stack or GC), initialize its fields to default values and invoke the constructor procedure, which would be defined as proc init(self: var Foo, p1: int, p2: float, p3: bool). When constructing a ref type, the compiler should dereference the object before passing it to the init proc. This is required so we can have a single init proc for both stack-local and garbage-collected objects. Furthermore, we can then call the parent init proc manually at the appropriate place in the child init proc.
If call syntax is used with a type name and no init proc with suitable parameters exists, the compiler should yield an error. If the old object initialization syntax with colons is used, no init proc should be called. If the object constructor is called with no additional arguments (Foo()), then the init proc should be implicitly called if one exists (proc init(self: var Foo)). If at least one proc named init with first argument self: var Foo exists but there is no proc taking solely the self: var Foo argument, an error should be thrown. If no proc named init with first argument self: var Foo exists, the object should be allocated and its fields initialized to their default values (current behavior).
The compiler should prove that not nil fields are initialized in the init proc. However, this may be optional, in which case not nil field usage would be restricted in objects initialized using a constructor.
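For comparison, here is a minimal sketch of what the proposed implicit steps look like when written out by hand in Nim as it exists today: allocate, then pass the object (dereferenced, for a ref) to a single init proc. The type Foo and its field n are only illustrative, and nothing here uses the proposed Foo(...) call syntax.

```nim
type Foo = object
  n: int

# One init proc that mutates an existing object in place.
proc init(self: var Foo, n: int) =
  self.n = n

# Stack-allocated object: pass the variable directly.
var s: Foo
init(s, 1)

# GC-allocated object: allocate it, then dereference with [] so the
# same init proc receives a `var Foo`.
var r: ref Foo
new(r)
init(r[], 2)

echo s.n, " ", r.n
```

The proposal amounts to having the compiler generate the allocation and the `[]` dereference shown above.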
Full example:
type Foo = object of RootObj
  n: int
type Bar = object of Foo
proc init(self: var Foo, n: int) =
  self.n = n
  echo "Foo"
proc init(self: var Bar) =
  init(Foo(self), 0)
  echo "Bar"
var o = Bar() # Calls ``init(self: var Bar)``
echo "---"
var r = ref Bar() # Calls ``init(self: var Bar)``, this will be covered in next paragraph
var old1 = Foo(n: 2) # Does not call init proc, sets n=2
var old3 = Bar(n: 3) # Does not call init proc, sets n=3
var err = Foo() # Raises error, no suitable constructor found, because ``init(self: var Foo, ...)`` is defined
Prints:
Foo
Bar
---
Foo
Bar
Nim actually supports this feature already, we just need a tiny tweak. Consider this:
var o = (ref Foo)()
Foo is non-ref object type. It would be much more intuitive to simply type this:
var o = ref Foo()
Now we have a clear distinction of what sets stack and GC objects apart. Furthermore, we can avoid multiple type definitions (Foo + FooRef) for the same type. Best of all, there is no need whatsoever for a new keyword like new, which would seem out of place in Nim.
As noted this is already available as experimental feature with {.experimental.} pragma.
Now that we can use proc init(self: var Foo) for constructors of both ref and non-ref types we need same thing for calling procs/methods on objects too. We should be able to do this:
proc talk(self: var Foo) =
  echo "Hello"
var o = Foo()
var r = ref Foo()
o.talk() # Prints "Hello"
r.talk() # Prints "Hello", notice r has no deref operator []
I sat down and put some time into writing up what I think we collectively imagine as a suitable constructor implementation. This is a continuation of the discussion here: http://forum.nim-lang.org/t/703 So please comment/discuss. I intend to keep the first post updated with the details that we decide need changing, while keeping a list of modifications appended to this post. I know introducing a new feature requires an RFC now. Well, I'm not capable of putting together an RFC like the ones from the IETF; this is the best I could put together for now. If it's somehow lacking, please excuse me and point out the problems so they can be corrected.
EDIT1: Added note about auto-dereferencing being available as experimental feature.
var r = ref Foo()
o.talk() # Prints "Hello"
r.talk() # Prints "Hello", notice r has no deref operator []
It is possible using {.experimental.}. At first I liked it more, but now I'm unsure. If it does the auto-dereferencing, then I cannot write nil-safe procs like this:
type MyObject = ref object
  len: int
proc size(o: MyObject): int =
  return if o == nil: 0 else: o.len
var o: MyObject
echo o.size
Two notes:
First, it's not backwards compatible, since T(x) is already used for explicit conversion of x to type T. T() is the empty object constructor for T with no init being called.
Second, I reiterate my concern that C++-style constructors are basically broken. Disambiguation by overloading is a painful hack, because overloading exists to have procedures with similar behavior share the same name; furthermore, overloading is insufficient to distinguish between constructors with identical type signatures. Name-based constructors are generally superior.
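To make the second point concrete, here is a small sketch of two name-based constructors that share the same type signature, something overloading alone cannot express. The Matrix layout (rows, cols, a flat data seq) is an assumption made for this illustration only.

```nim
type Matrix = object
  rows, cols: int
  data: seq[float]   # row-major storage, assumed for this sketch

# Both constructors take a single int, so only the name tells them apart.
proc zeroMatrix(n: int): Matrix =
  Matrix(rows: n, cols: n, data: newSeq[float](n * n))

proc identityMatrix(n: int): Matrix =
  result = zeroMatrix(n)
  for i in 0 ..< n:
    result.data[i * n + i] = 1.0

echo zeroMatrix(2).data
echo identityMatrix(2).data
```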
Jehan nailed it.
Right now, with the current syntax, it's easy to see what's happening and we have fine control over it; with your proposal, not so much.
The current system is pretty good IMO; it does not need changing. The only feature missing is a way to disable the default type constructor from outside the module while still exporting the type (which is coming eventually, if I remember correctly from IRC).
IMHO, it is a very good design decision to make ref types explicit, which is something Nim already does.
So, declare simply:
type
  RFoo = ref Foo
  Foo = object
    p1, p2, p3: int
var vfoo = RFoo(p1: 0, p2: 1, p3: 2)
# var vfoo = RFoo(0, 1, 2) # does not work in current Nim...
# could it be done with a macro?
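On the positional form: a macro is probably not needed for this particular case. A minimal sketch, assuming a hypothetical helper named newRFoo (not something from the standard library or from this thread), wraps the field-by-field constructor in an ordinary proc:

```nim
type
  RFoo = ref Foo
  Foo = object
    p1, p2, p3: int

# Hypothetical helper: forwards positional arguments to the
# built-in named-field constructor.
proc newRFoo(p1, p2, p3: int): RFoo =
  RFoo(p1: p1, p2: p2, p3: p3)

var vfoo = newRFoo(0, 1, 2)
echo vfoo.p1, " ", vfoo.p2, " ", vfoo.p3
```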
Automatic Dereferencing? I think that the "var" keyword in proc signatures should be extended with a "varc" keyword and only in this case automatic referencing (at caller side) should be performed. (Rust is the other extreme - tons of annotations have to be made).
proc talk(self: var Foo) =
  echo "Hello" # we definitely expect a variable here, no reference counting for the garbage collector!
proc talk(self: RFoo) =
  echo "Hello" # we get a reference and could bind it to something else (ref. counting involved)
The r[].size makes explicit that you simply pass the address of the value to the callee (proc var is used; in Rust, the caller doesn't "own" the object). proc talk var and proc talk ref are two different functions (as they should be).
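A short sketch of the two-overload situation described above. To make the chosen overload observable, talk returns a string here instead of echoing, which is a change made only for this illustration.

```nim
type
  Foo = object
    n: int
  RFoo = ref Foo

proc talk(self: var Foo): string = "var Foo"
proc talk(self: RFoo): string = "RFoo"

var o = Foo()
var r = RFoo()

echo o.talk()    # the `var Foo` overload
echo r.talk()    # the `RFoo` overload, exact match on the ref type
echo r[].talk()  # explicit dereference selects the `var Foo` overload
```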
Here is a simple example of how named constructors could work:
import macros

macro make(e: untyped): auto =
  var tp, call: NimNode
  let sym = genSym(nskVar)
  case e.kind
  of nnkCall:
    case e[0].kind
    of nnkDotExpr:
      # make T.ctor(args): call ctor(sym, args)
      tp = e[0][0]
      call = newCall(e[0][1])
      add(call, sym)
      for i in 1..len(e)-1:
        add(call, e[i])
    of nnkIdent:
      # make T(): call init(sym)
      tp = e[0]
      call = newCall(newIdentNode("init"), sym)
    else:
      error("not a constructor call")
  of nnkDotExpr:
    # make T.ctor: call ctor(sym)
    tp = e[0]
    call = newCall(e[1], sym)
  else:
    error("not a constructor call")
  expectKind(tp, nnkIdent)
  result = quote do:
    var `sym` = `tp`()
    `call`
    `sym`

import strutils

type Obj = ref object
  x: int

proc init(ob: Obj) =
  ob.x = 1

proc init(ob: Obj, z: int) =
  ob.x = z

proc fromString(ob: Obj, s: string) =
  ob.x = s.parseInt

proc fromMin(ob: Obj, a, b: int) =
  ob.x = min(a, b)

proc `$`(ob: Obj): string = "Obj(x: " & $ob.x & ")"

var a1 = make Obj.init
var a2 = make Obj.init(2)
var a3 = make Obj.init()
var a4 = make Obj.fromString("99")
var a5 = make Obj.fromMin(314, 2718)
var a6 = make Obj()
echo a1, " ", a2, " ", a3, " ", a4, " ", a5, " ", a6
@Jehan you are wrong on backwards compatibility. It is fully backwards-compatible. T(x) is indeed already used as a type conversion, and that does not impact this proposal in any way. It is even used to call the right parent init proc. While T() would call init implicitly, it would do so only if an init with first argument var T existed. No such proc means no construction going on.
@HOLYCOWBATMAN you are wrong too, actually. With my proposal everyone still has fine control over things. No one has to use this, but they can if they wish. This is not changing the system, it is building on top of it.
From my experience in Python a single constructor works just fine. However, it gets clumsy when one constructor does 10 different things. Opting for a single constructor is far from ideal. Besides, from my C++ experience the only problem I had in practice with multiple constructors was getting one constructor to call another sibling constructor to avoid code duplication. However, in Nim it would work just fine.
Now about zeroMatrix and identityMatrix. While it is not possible to have two constructors with the same arguments, it does not really make sense to have such a thing anyway. Then comes a compromise: for example, procs zeroMatrix and identityMatrix that pretty much act as named constructors. Think of the constructor as general object setup. In this specific case:
type Matrix = object
  m: seq[int]
proc init(self: var Matrix) =
  self.m = @[]
Or maybe, just maybe, consider having two types inheriting matrix and performing different initialization. Then code quality goes up because now tools can recognize what kind of matrix it is, either ZeroMatrix or IdentityMatrix, not THE MATRIX one and only.
Given these types and init procs...
type Foo = ref object of RootObj
  n: int
type Bar = ref object of Foo
  next: Foo
proc init(self: var Foo, n: int) =
  self.n = n
proc init(self: var Bar, next: Foo) =
  init(Foo(self), 0)
  self.next = next
And given that f is an instance of Foo, does this code invoke the constructor or would it be an explicit type conversion?
var x = Bar(f)
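For reference, this is how the expression behaves in Nim today, where no init proc is involved at all. In this sketch f is assumed to refer to an object whose runtime type is Bar; the conversion then yields the same reference rather than a new object.

```nim
type
  Foo = ref object of RootObj
    n: int
  Bar = ref object of Foo
    next: Foo

let b = Bar(n: 5)
let f: Foo = b    # static type Foo, runtime type Bar

# With current semantics this is a checked type conversion (a downcast).
# It does not allocate and does not run any init proc.
let x = Bar(f)
echo x == b       # compares references
echo x.n
```

Under the proposal, the same spelling would also match init(self: var Bar, next: Foo), which is exactly the ambiguity being asked about.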
@ Jehan:
I tried to compile your example, but got a compiler error:
"Error: undeclared identifier: 'untyped' "
What did I do wrong?
Sixte: does the macro reflect the content of an object here? What else should stand for len? I think it stands for a list...
Macros work on ASTs (abstract syntax trees). len(node) gives you the number of children that a node has, and node[i] produces the i-th child of a node.
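A tiny sketch of len(node) and node[i] in action. The macro names childCount and firstChild are made up for this illustration; f, a and b are never declared, which is fine because an untyped argument is only parsed, not resolved.

```nim
import macros

# Number of children of the node passed in.
macro childCount(e: untyped): untyped =
  newLit(len(e))

# Source text of the first child, e[0].
macro firstChild(e: untyped): untyped =
  newLit(repr(e[0]))

# A call node f(a, b) has the callee plus its two arguments as children.
const count = childCount(f(a, b))
const head = firstChild(f(a, b))
echo count, " ", head
```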
Sixte: So, it is principally possible too to build a macro "multi-var" like ...
In principle, yes, though the syntax would be a bit of a pain to handle. But the following works:
import macros

macro `..=`*(lhs, rhs: untyped): untyped =
  # Check that the lhs is a tuple of identifiers.
  # (Newer compilers parse (x, y) as nnkTupleConstr rather than nnkPar.)
  expectKind(lhs, {nnkPar, nnkTupleConstr})
  for i in 0..len(lhs)-1:
    expectKind(lhs[i], nnkIdent)
  # Result is a statement list starting with an
  # assignment to a tmp variable of rhs.
  let t = genSym()
  result = newStmtList(quote do:
    let `t` = `rhs`)
  # assign each component to the corresponding
  # variable.
  for i in 0..len(lhs)-1:
    let v = lhs[i]
    # skip assignments to _.
    if $v.toStrLit != "_":
      result.add(quote do:
        `v` = `t`[`i`])

var x, y: int
(x, y) ..= (1, 2)
echo x, y
(x, _) ..= (3, 4)
echo x, y
Note that you can already do stuff like:
let (x, y) = (1, 2)
let (z, _) = (3, 4)
Hm, it seems difficult to avoid the '(' ... ')' stuff (and now we have some characters more to type...)
Anyway, I'll try to do some macro programming - very helpful. And this is not my thread. I'll start a separate thread with some questions in addition.
T(x) remains ambiguous. ``T(x)`` can mean calling a constructor with one argument or an explicit type conversion.
Oh, so that's what you meant. Sorry, I misunderstood before.
Likewise, if you want T() to default to calling the argument-less constructor, then it becomes impossible to create a new object by calling T(). Stuff like this is why C++ and friends prefix object construction with new.
Good point. I don't think anyone wants that. I can't quite think of a viable solution to that. If we resorted to introducing keywords like that, then all of this could be done via a macro. That kind of defeats the purpose of any change in the core language.
First of all, that forces you to use dynamic dispatch and pay for the overhead of that.
Usually there is no overhead if the correct proc can be figured out at compile time. Then it's simply a call, with no overhead except a little more work for the compiler at compile time. The cost comes with procs that are called indirectly (virtual in C++). I assume it works similarly in this case too.
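The distinction can be sketched in Nim with proc (resolved at compile time from the static type) versus method (dispatched at run time on the actual type). Shape, Circle, kindStatic and kindDynamic are names invented for this example.

```nim
type
  Shape = ref object of RootObj
  Circle = ref object of Shape

# Overloaded procs: the overload is chosen from the static type.
proc kindStatic(s: Shape): string = "Shape"
proc kindStatic(c: Circle): string = "Circle"

# Methods: the implementation is chosen from the runtime type.
method kindDynamic(s: Shape): string {.base.} = "Shape"
method kindDynamic(c: Circle): string = "Circle"

let c = Circle()
let s: Shape = c   # same object, viewed through the base type

echo kindStatic(c)   # static type Circle
echo kindStatic(s)   # static type Shape, no runtime lookup
echo kindDynamic(s)  # runtime type Circle
```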
Macro basically expands
proc init(self: var Foo) {.ctor.} =
  echo "ctor"
  result.a = 123
into:
proc init[TConstructorType: Foo|(ref Foo)](TSelf: type TConstructorType): TConstructorType =
  result = TConstructorType()
  var self = result
  echo "ctor"
  result.a = 123
Still can't figure out one bug: non-ref types have to use result to set up the object instead of self, because result = T(); var self = result; makes a copy of result into self where we need a reference/alias. If anyone has ideas how to solve this, please speak up!
Code:
import macros

type
  Foo = object
    a: int
  FooRef = ref Foo

macro ctor(prc: typed{nkProcDef}): auto {.immediate.} =
  if prc[3][1][1].kind != nnkVarTy:
    error("Constructor must have var type as first parameter")
  if prc[3][0].kind != nnkEmpty:
    error("Constructor must not have return type")
  var type_identifier = prc[3][1][1][0]
  # echo repr(type_identifier)
  #repr(get_type(prc[3][1][1][0]))
  # if get_type(prc[3][1][1][0]).typekind == ntyRef:
  #   echo "ref"
  var self = prc[3][1][0]
  prc[3][1][0] = new_ident_node("TConstructorType")
  # GenericParams
  #   IdentDefs
  #     Ident !"T"
  #     Infix
  #       Ident !"|"
  #       Ident !"Foo"
  #       Par
  #         RefTy
  #           Ident !"Foo"
  #     Empty
  if prc[2].kind == nnkEmpty:
    prc[2] = new_nim_node(nnkGenericParams)
  prc[2].add(
    new_nim_node(nnkIdentDefs).add(
      new_ident_node("TConstructorType"),
      new_nim_node(nnkInfix).add(
        new_ident_node("|"),
        type_identifier,
        new_nim_node(nnkPar).add(
          new_nim_node(nnkRefTy).add(
            type_identifier
          )
        )
      ),
      new_empty_node()
    )
  )
  prc[3][0] = new_ident_node("TConstructorType") # return type
  prc[3][1] = new_nim_node(nnkIdentDefs).add(
    new_ident_node("TSelf"),
    new_nim_node(nnkCommand).add(
      new_ident_node("type"),
      new_ident_node("TConstructorType")
    ),
    new_empty_node()
  )
  # Allocate type
  # Asgn
  #   Ident !"result"
  #   Call
  #     Ident !"TConstructorType"
  prc[6].insert(0,
    new_nim_node(nnkAsgn).add(
      new_ident_node("result"),
      new_call(
        new_ident_node("TConstructorType")
      )
    )
  )
  # Assign result to own-named `self` instance
  # VarSection
  #   IdentDefs
  #     Ident !"self"
  #     Empty
  #     Ident !"result"
  prc[6].insert(1,
    new_nim_node(nnkVarSection).add(
      new_nim_node(nnkIdentDefs).add(
        self,
        new_empty_node(),
        new_ident_node("result")
      )
    )
  )
  echo tree_repr(prc)
  return prc

# dump_tree:
#   proc init[TConstructorType: Foo|(ref Foo)](TSelf: type TConstructorType): TConstructorType =
#     result = TConstructorType()
#     var self = result
#     echo "ctor"
#     result.a = 123

proc init[T](self: var Foo, n: T) {.ctor.} =
  result.a = int(n)
  echo("Passed to ctor: ", $n)

var f1 = Foo.init(123)
var f2 = FooRef.init(321.2)
var f3 = (ref Foo).init(444)
echo repr(f1)
echo repr(f2)
echo repr(f3)
rku: we need a reference/alias. If anyone has ideas how to solve this please speak up!
You can use template self: Foo = result instead. Although it might be better to do something like:
proc new(f:var Foo, a, b:int) {.ctor.} =
  f.a = a
  f.b = b
# turns into:
proc new(f:type Foo, a, b:int): Foo =
  proc construct(f:var Foo, a, b:int) {.inline.} =
    f.a = a
    f.b = b
  result = Foo()
  result.construct(a, b)
That way using result inside the original ctor code is illegal.
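The copy problem and the template-alias workaround can also be seen in isolation, outside of any macro. This is only a sketch: makeCopy and makeAlias are hypothetical names, and the alias template is declared untyped here so that its expansion is plainly the result variable itself.

```nim
type Foo = object
  a: int

proc makeCopy(): Foo =
  result = Foo()
  var self = result   # value type: `self` is a separate copy
  self.a = 123        # changes the copy, `result` keeps a == 0

proc makeAlias(): Foo =
  result = Foo()
  # `self` expands to `result` at each use, so no copy is made.
  template self: untyped = result
  self.a = 123

echo makeCopy().a
echo makeAlias().a
```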
rku: However make "keyword" is verbose and unnecessary.
The verbosity is on purpose. The make pseudo-keyword I used serves as a visual marker to distinguish object construction from a procedure call. Technically, proc is also verbose and unnecessary, but it helps when reading the code. Code is written once and read and modified hundreds of times and should be optimized towards the latter. Line noise is the wrong optimization goal for code readability.
Also, my suggestion intentionally preserves the original procedure. E.g. one can do:
var x = make T.init
x.doSomething
x.init # reset the state of x without creating a new object
Finally, the make operator also works for non-proc callables.
E.g.
let curriedInit = ...
let t = make T.curriedInit
or:
type Generator = ref object
  original: int
type T = ref object
  value: int
proc `()`(x: Generator, y: T) = y.value = x.original
var gen = Generator(original: 314)
var u = make T.gen
echo repr(u)
And, of course, methods.
Conceptually, make is an operator that takes a type and a call and composes the allocation of the type with the semantics of the call. It's not just an alternate notation.
@Jehan
Is it principally possible to pass a macro a list of identifiers?
E.g. "make" a,b,c "keyword" ... and a list of expressions follows?
"make" stands for a macro, "keyword" is an additional literal, used by and within the macro
Or is this blocked by the parser?
The argument of a macro must itself be a syntactically valid Nim expression. However, there are some options to have "keywords" in the middle of other stuff. Example:
import pegs, strutils

template loop(body: untyped) =
  template until(cond: untyped) =
    if cond: break
  while true:
    body

proc main() =
  var s: string
  loop:
    s = stdin.readLine
    until s =~ peg"[0-9]+"
  echo s.parseInt

main()
or something LINQ-like:
let q =
  query do:
    for item in collection:
      where item.weight <= 12
      select (item, item.weight)
Parsing that grammar would be challenging with the current macro library (need some AST matching functionality for ease of use), though of course simple list comprehension like queries can be done without this:
template enumerate(s: untyped): auto =
  block:
    iterator temp(): auto = s
    var result = newSeq[type(temp())]()
    for item in temp():
      add(result, item)
    result

const n = 20
let triangles = enumerate do:
  for x in 1..n:
    for y in x..n:
      for z in y..n:
        if x*x + y*y == z*z:
          yield (a: x, b: y, c: z)
let even10 = enumerate do:
  for x in 1..10:
    if x mod 2 == 0:
      yield x
echo triangles
echo even10