Is there any reason for the inconsistency in naming? If not, has any thought been given to changing the names to make them more uniform?
(Of course, this hardly matters at all and is not really a problem, but it would feel nicer if the names were consistent.)
newT is for functions that return references, initT for functions that return values.
Note how there's both a newTable() and an initTable().
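To make the newT/initT distinction concrete, here is a small sketch using the two table constructors mentioned above. It only assumes the standard tables module; the variable names are mine.

```nim
import std/tables

# initTable returns a Table, a value type: assignment copies the contents.
var t1 = initTable[string, int]()
t1["a"] = 1
var t2 = t1
t2["a"] = 2
echo t1["a"]   # 1, t1 is unaffected by the change to t2

# newTable returns a TableRef: assignment copies only the reference.
let r1 = newTable[string, int]()
r1["a"] = 1
let r2 = r1
r2["a"] = 2
echo r1["a"]   # 2, r1 and r2 point at the same table
```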
def: But seqs have value semantics, not ref semantics.
They can have either.
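var a = @[1]
shallow a   # mark a so that assigning it shares the data instead of copying
var b = a   # b is intended to alias a rather than receive a copy
inc a[0]
echo a
echo b      # shows the same contents as a when the shallow copy took effect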
Thanks, Jehan. I didn't know about the reference naming convention or about shallow.
Also, thanks Araq, I had figured it was probably historical accident for newSeq, but it's good to hear it straight from the horse's mouth.
newString and newSeq predate this naming convention.
Aren't there any plans to change these (or at least provide initSeq and initString aliases for them and mark the original ones as deprecated)?
I guess that would break pretty much all Nim code out there, though fixing it shouldn't be that hard...
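Roughly what I have in mind, as a sketch only: initSeq is a hypothetical name that does not exist in the standard library, and oldSeq is a stand-in for an existing name that would be kept as a deprecated forwarder. The deprecated pragma makes the compiler warn at each call site without breaking the build.

```nim
# Hypothetical new name (not in the standard library).
proc initSeq[T](len = 0): seq[T] =
  newSeq[T](len)

# Stand-in for an old name that forwards to the new one and warns when used.
proc oldSeq[T](len = 0): seq[T] {.deprecated: "use initSeq instead".} =
  initSeq[T](len)

let s = initSeq[int](3)
echo s   # @[0, 0, 0]
```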
@novist
Your macro stuff still distinguishes between ref and object and is just as "confusing".
[Note: This post (and this thread) is about consistent naming schemes for already-existing, explicitly-invoked creation functions in the Nim library. I consider this a separate issue from that of automatic constructors and new constructor-specific keywords, which is discussed in this other thread: http://forum.nim-lang.org/t/703/5 ]
When I look through the system, tables and sets modules, it seems to me that part of the source of confusion is that there are actually 8 distinct examples of functionality, covered by just 2 naming convention prefixes, new and init (and additionally, as noted, newSeq and newString predate the naming convention):
- Allocate a new object of type T on the heap, zero it, and assign a ref to this object into a var parameter that will modify a pre-declared variable.
- Allocate a new object of type T (specified by typedesc) on the heap, zero it, and return a ref to this object.
- Return a container instance "by value" (ie, with value semantics) that has been pre-sized to a certain size (and the entries have been zeroed), eg, newSeq[T](len).
- Return a container instance "by value" (ie, with value semantics) that has been pre-sized to a certain size (but the entries are uninitialised), eg, creation of strings using newString(len).
- Allocate a new object on the heap and initialise it with the supplied information and return a ref to it, eg, newException[](...).
- Allocate a new container instance that is initialised as empty and return a ref to it, eg, newTable.
- Create a new container instance that is initialised as empty and return it "by value", eg, initTable, initSet.
- Initialise a new object of type T that has been passed in as a var parameter, eg, init(var HashSet).
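For reference, a few of the forms listed above look like this in use. This is a sketch written against the current system and sets modules as I understand them; the variable names are only for illustration.

```nim
import std/sets

# Allocate on the heap and assign the ref into a pre-declared variable.
var r1: ref int
new(r1)
r1[] = 1

# Allocate on the heap by type and return the ref.
let r2 = new(int)
r2[] = 2

# Pre-sized, zeroed seq: the var-parameter form and the returning form.
var s1: seq[int]
newSeq(s1, 3)
let s2 = newSeq[int](3)

# Heap-allocated exception initialised from the supplied arguments.
let e = newException(ValueError, "bad input")

# Initialise a pre-declared HashSet passed as a var parameter.
var hs: HashSet[int]
init(hs)
hs.incl 5
```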
These 8 examples of functionality can be partitioned into 4 general groups:
- G1: Allocate a new object on the heap (by type), default-initialise it, and return a ref.
- G2: Allocate a new object on the heap, initialise it with the supplied arguments, and return a ref.
- G3: Create a new object, initialise it with the supplied arguments and return it by value.
- G4: Initialise the object in a pre-declared variable that is passed as the first argument (whether the variable is a ref or not).
I think the confusion might be decreased if there were 4 (rather than 2) prefixes used for these 4 groups of functionality. To pick 4 arbitrary but commonly-used prefixes...
- new: Allocate a new object on the heap (by type), default-initialise it and return a ref (G1).
- For example: var f: ref Foo = new(Foo)
- newX: Allocate a new object on the heap, initialise it with the supplied arguments and return a ref (G2).
- For example: var f: ref Foo = newFoo(a, b)
- createX: Create a new object, initialise it with the supplied arguments and return it by value (G3).
- For example: var f: Foo = createFoo(a, b)
- init (or initX if more information is necessary): Initialise the object in a pre-declared variable that is passed as the first argument (whether the variable is a ref or not) (G4).
- For example (not a ref): var f: Foo; init(f, a, b) or var f: Foo; f.init(a, b).
- Alternately (a ref): var f: ref Foo; init(f, a, b) or var f: ref Foo; f.init(a, b)
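Put together, the four prefixes could look like this for a user-defined type. Foo, newFoo, createFoo and both init overloads are made up for illustration; only new comes from the system module.

```nim
type Foo = object
  a, b: int

# newX: heap-allocate, initialise from the arguments, return a ref (G2).
proc newFoo(a, b: int): ref Foo =
  new(result)
  result.a = a
  result.b = b

# createX: build the object and return it by value (G3).
proc createFoo(a, b: int): Foo =
  Foo(a: a, b: b)

# init: initialise a pre-declared variable passed as the first argument (G4).
proc init(f: var Foo; a, b: int) =
  f = Foo(a: a, b: b)

# The same calling convention when the variable is a ref.
proc init(f: var ref Foo; a, b: int) =
  new(f)
  f[] = Foo(a: a, b: b)

var f1: ref Foo = new(Foo)        # G1: default-initialised on the heap
var f2: ref Foo = newFoo(1, 2)
var f3: Foo = createFoo(3, 4)
var f4: Foo
f4.init(5, 6)
var f5: ref Foo
f5.init(7, 8)
```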
These prefixes should be de-coupled from container-specific allocation behaviour such as "alloc but don't initialise" and "alloc a capacity but initialise as empty" that is provided for strings. These could be controlled by another set of suffixes like NoInit and OfCap. (It might make sense to provide corresponding procs for seqs too.)
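The two container-specific behaviours differ in the length of what you get back, which is why I would rather express them as suffixes. A quick sketch with the existing procs (newSeqOfCap is, as far as I can tell, a later addition to the system module that covers the seq case):

```nim
let s1 = newString(4)          # length 4, contents not meant to be relied on
let s2 = newStringOfCap(4)     # length 0, capacity reserved
let q1 = newSeq[int](4)        # length 4, entries zeroed
let q2 = newSeqOfCap[int](4)   # length 0, capacity reserved

echo s1.len, " ", s2.len, " ", q1.len, " ", q2.len   # 4 0 4 0
```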
Looking through the docs for the system module, the following name changes would be applied:
1. new[T](a: var ref T)
- -> initNew[T](a: var ref T)
2. new[](T: typedesc): ref T:type
- -> new[](T: typedesc): ref T:type (No change)
3. new[T](a: var ref T; finalizer: proc (x: ref T))
- -> initNew[T](a: var ref T; finalizer: proc (x: ref T))
4. newSeq[T](s: var seq[T]; len: int)
- -> init[T](s: var seq[T]; len = 0)
5. newSeq[T](len = 0): seq[T]
- -> createSeq[T](len = 0): seq[T]
6. newString(len: int): string
- -> createStringNoInit(len: int): string
7. newStringOfCap(cap: int): string
- -> createStringOfCap(cap: int): string
8. newException[](exceptn: typedesc; message: string): expr
- -> newException[](exceptn: typedesc; message: string): expr (No change)
Likewise, looking through the tables module:
9. newTable[A, B](initialSize = 64): TableRef[A, B]
- -> newTable[A, B](initialSize = 64): TableRef[A, B] (No change)
- (Note: It remains as newTable rather than createTable, because the type TableRef that is returned "by value" is transparently a ref to a Table.)
10. initTable[A, B](initialSize = 64): Table[A, B]
- -> createTable[A, B](initialSize = 64): Table[A, B]
And finally, looking through the sets module:
11. init[A](s: var HashSet[A]; initialSize = 64)
- -> init[A](s: var HashSet[A]; initialSize = 64) (No change)
12. initSet[A](initialSize = 64): HashSet[A]
- -> createSet[A](initialSize = 64): HashSet[A]
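Several of the by-value renames above could be tried out today as thin wrappers, without touching the library. In this sketch every name beginning with create is hypothetical; the procs being wrapped are the existing ones.

```nim
import std/tables

# Hypothetical create* names forwarding to the existing constructors.
proc createSeq[T](len = 0): seq[T] =
  newSeq[T](len)

proc createStringNoInit(len: int): string =
  newString(len)

proc createStringOfCap(cap: int): string =
  newStringOfCap(cap)

proc createTable[A, B](initialSize = 64): Table[A, B] =
  initTable[A, B](initialSize)

var t = createTable[string, int]()
t["x"] = 1
```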
(Whew, that ended up longer than expected...)
I would say that this makes the language easier to learn, because it defines standard, consistent "verbs" for the various procs in the standard library. Programmers can rely on these verbs when encountering new modules, and they will ultimately become idiomatic, just like len to obtain the length of a collection.
As I see it, this result is a combination of 3 specific effects:
- There are recognisable idiomatic verbs shared consistently between different types, to operate upon those types in the same way. This is just like using len for lengths (Why len and not sometimes length or size or getLength? For consistency!), or add for appending elements/chars to seqs/strings, or read & write for getting/putting bytes/chars/etc from files/buffers/etc.
- When you see create vs new, you can guess what it will be returning, even if you're unfamiliar with this particular proc. Likewise, if you want to return a ref rather than a value, you can guess that the proc you want will begin with new rather than create.
- Finally, it makes the calling conventions unambiguous: When you see create or new, you know it will return the new value to you; when you see init, you know you supply the uninitialised variable as the first argument.
Yes, there's always learning, but there's learning idioms & recognisable patterns vs learning inconsistencies (because "that's just the way it is") and not being able to rely upon your intuition when there's an unknown proc (or you're trying to guess which proc you should use).
As to the open category of constructors, I don't see any problems extending this naming scheme beyond data-structures to include file-constructors.
1. The open/close idiom, to obtain & release I/O resources (files, file descriptors, sockets) is so familiar that I think it makes sense to retain open & close as keyword components in proc names -- much like new is the idiomatic keyword to allocate an object of type T on the heap.
2. I suspect that renaming the familiar open(filename, ...): File to something like openFile would cause more confusion than it would solve. That said, perhaps openFile would make sense as an alias that adheres to the naming scheme, for the form of open that doesn't take a var File parameter.
- So open(filename: string; mode: FileMode = fmRead; bufSize: int = -1): File would gain an alias openFile(filename: string; mode: FileMode = fmRead; bufSize: int = -1): File.
- This is analogous to newTable that allocates you a new Table.
- And to open a socket, it would be openSocket, etc.
3. For the two forms of open that take a var File parameter, I would suggest that their names should begin with init for consistency with all the other init-an-uninitialized-variable procs.
- So open(f: var File; filename: string; mode: FileMode = fmRead; bufSize: int = -1): bool should become initOpen(f: var File; filename: string; mode: FileMode = fmRead; bufSize: int = -1): bool.
- No need for File in the name, because it's already known unambiguously from the first parameter.
- This is similar to my previous suggestion of initNew[T](a: var ref T) from new[T](a: var ref T). init for calling convention + open/new for the constructor operation.
- An argument could also be made for just init rather than initOpen, by analogy with init[T](s: var seq[T]; len = 0). But I think that the analogy of new -> initNew is a better analogy for open.
4. I searched through the system module docs, for more occurrences of the string : var, but I didn't see any more constructor categories... If there are any that I've missed, please point them out.
5. I did think that setShallow[T](s: var seq[T]) would make more sense than shallow[T](s: var seq[T]) (because then you have setLen, setShallow, etc.), but that's tangential.
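For the file case, the two aliases suggested above could be sketched as plain forwarders to the existing forms of open. openFile and initOpen are the hypothetical names from this post, and the temporary file name is made up for the demo.

```nim
import std/os

# Hypothetical alias for the form of open that returns the File.
proc openFile(filename: string; mode: FileMode = fmRead;
              bufSize: int = -1): File =
  open(filename, mode, bufSize)

# Hypothetical alias for the form of open that fills in a var File.
proc initOpen(f: var File; filename: string; mode: FileMode = fmRead;
              bufSize: int = -1): bool =
  open(f, filename, mode, bufSize)

# Write a scratch file so both forms have something to read.
let path = getTempDir() / "naming_demo.txt"
writeFile(path, "hello\n")

let f1 = openFile(path)
let line1 = f1.readLine()
f1.close()

var f2: File
let ok = initOpen(f2, path)
let line2 = if ok: f2.readLine() else: ""
if ok: f2.close()

removeFile(path)
```

The calling conventions stay visible in the names: openFile hands you the File (and raises on failure, like the open it wraps), while initOpen takes the uninitialised variable first and reports success as a bool.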