This idea has been nagging at the back of my mind. I was thinking of writing up an RFC, but I don't want to waste my time if this has already been considered before, or if it presents more problems than benefits.
The idea: rather than requiring an API to have an explicit out parameter in a proc (e.g. for performance benefits), the special result variable could serve as an out parameter at the call site for every proc. At the call site, the result variable can be provided via a keyword argument. There is some ambiguity here: do we overwrite the user's result variable, or "append" to it (e.g. if it is a container, to re-use pre-existing allocated memory)?
Append is the most useful convention here (IMO). The append coding convention would require implementations to rely on default initialization (T.default) instead of the explicit initialization of result that I see in some Nim code in the wild.
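For example, a proc written in the append convention looks like this (a minimal sketch; evens is a made-up name, not from any library):

```nim
# Append convention: never assign to `result` wholesale; rely on it starting
# as default(seq[int]) and only ever `add` to it.
proc evens(upto: int): seq[int] =
  for i in 0 .. upto:
    if i mod 2 == 0:
      result.add(i)

echo evens(6)   # @[0, 2, 4, 6]
```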
Using pixie as an example:
Currently:
var p = parsePath(segA)
p.addPath(parsePath(segB)) # temp Path constructed from parsePath(segB)
With this idea:
var p2 = parsePath(segA)
parsePath(segB, result = p2) # no temp path & parsePath now does NOT return a Path
In the "Currently" code, p.addPath performs .add calls on the underlying seq storing path commands, see here
RVO might optimize the temporary away in the "Currently" code, making it effectively equivalent to the "With this idea" code, but I think the intent is clearer with this language design idea.
The implementation of parsePath would have to change from:
proc parsePath*(path: string): Path {.raises: [PixieError].} =
  ## Converts a SVG style path string into seq of commands.
  result = newPath()
  ...
to:
proc parsePath*(path: string): Path {.raises: [PixieError].} =
  ## Converts a SVG style path string into seq of commands.
  # assume result is initialized
  ...
Note that in this case, proc newPath*(): Path = Path() (see), i.e. Path.default
Potential problems or reasons to be against this idea:
What are your thoughts? Has this been thought of before?
Change the behavior of the libraries you're using without modifying their code?
Yes, literally this. This is the same in concept as sugar.dup, but in the inverse situation. Of course, sugar.dup is not as brittle. Well, it is brittle in the sense that not everyone uses this API design pattern.
Also, you are kinda forced to use the result variable, even if you didn't before
This is an implementation detail. result doesn't need to be explicitly written to in the proc implementation: return statements can be rewritten to assign to (or mutate) result, followed by an empty return statement. In fact, I'm sure this feature could be written as a macro, but support at the language level feels more appropriate for this.
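The rewrite is mechanical. For instance (hypothetical procs, just to show the two forms are equivalent):

```nim
# Original style: explicit return expression
proc doubledA(x: int): int =
  return x * 2

# After the mechanical rewrite: assign to `result`, then a bare return
proc doubledB(x: int): int =
  result = x * 2
  return

echo doubledA(21) == doubledB(21)   # true
```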
If only there was a way to know if a particular proc really supports "append" semantics... Like, what if there could be a variant of a proc with the same name, but that accepts an additional var parameter and doesn't return anything?..
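Something like this, presumably (repeated is a made-up example, not a stdlib proc):

```nim
# The appending variant: same name, an extra `var` parameter, no return value.
proc repeated(s: string; n: int; dest: var string) =
  for _ in 1 .. n:
    dest.add(s)

# The returning variant can then be a one-liner on top of the appending one.
proc repeated(s: string; n: int): string =
  repeated(s, n, result)

echo repeated("ab", 3)   # ababab
```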
Sure, but in the same mindset as what you said ("That will only work for procs that happen to use the result variable in the way you described."), your suggestion requires library authors to write code a certain way as well. Why not have a standard convention baked into the language? If you go off the path of this convention, mark your proc as "overwrite" or "append", depending on which is preferred.
OK, there is added complexity, but when writing a proc, when do you realistically assign result to a non-default value? My guess is you only do that in a "factory-style" proc, i.e. one that performs only trivial computation, where initialization is the bulk of the "work" in the proc. This feature would really be used with "large", container-like objects.
A lot of code in the wild could theoretically support "append" semantics in the way I have described or by simply providing the appropriate overloads as you suggest, but library authors probably don't because it is more intuitive to write code with a return value. I'd rather not have to fork and/or submit a PR to all the libraries I care to use, but I guess I'm forced to do so, either that or just write everything myself with my own coding standards :)
Obviously this suggestion is not a need; it is a practical convenience that I thought played nicely with current language design w.r.t. the bound result variable in proc implementations. I don't have anything quantitative on this w.r.t. performance anyway, so my guess is: it's probably not faster when optimizations are turned on.
Lesson learnt for me, though: I'll use out parameters more frequently in Nim, or maybe I'll write a macro for fun that does this. My mind is still adjusting from modern C++ guidelines, e.g. https://abseil.io/tips/176
your suggestion requires library authors to write code a certain way as well
No, it doesn't require them to do so. But if the temporary allocation and shallow-copying of the contents of e.g. the result sequence is a performance concern, the author should provide an overload that appends the result to an existing sequence. If the author is ignorant of this (quite obvious) problem, it is unreasonable to assume that the internals of the library are written with performance in mind. The real solution is to improve the library or use a different one, not to monkey-patch it by introducing an ugly feature to the language for very marginal gains.
My mind is still adjusting from modern C++ guidelines
Ah, that explains why I immediately felt the vibes of std::shared_from_this and similar abominations...
Obviously this suggestion is not a need, it is a practical convenience
I don't have anything quantitive on this anyway w.r.t performance, so my guess is: it's probably not faster when optimizations are turned on.
If performance is not what you're worried about, what is the benefit at all? Can you explain in more detail? Because in your parsePath example, addPath looks more convenient and clear than injecting something via result.
Giving out parameters append semantics is completely wrong, you want var parameters.
This is what I meant; not clear, my bad. I understand that out used to be a keyword in Nim, which, from my understanding, has been removed from the language.
If performance is not what you're worried about, what is the benefit at all? Can you explain in more detail?
Performance is what I care about. My intuition tells me performance will be better in debug builds. But what I was stating earlier was that I don't have anything quantitative, and in optimized builds I would expect this "performance benefit" to be close to 0 if the optimizer does its job w.r.t. RVO; but this is an assumption.
The real solution is to improve the library or use a different one
The point is to let the compiler generate these overload variants. Why write the same code twice? That's the idea behind sugar.dup, but using a more intuitive syntax with return types.
Take readFile. Currently, to read and concatenate N files you need to:
proc concatFiles(paths: openArray[string]): string =
  for f in paths:
    result &= readFile(f) # two separate allocations in debug build
vs
proc concatFiles(paths: openArray[string]): string =
  for f in paths:
    readFile(f, result = result) # no temporary allocations
Yes, I could patch the stdlib, or simply copy and paste the implementation to support this overloaded variant.
The first variant requires RVO to be optimal (which may or may not be applied; let's pray to the compiler, or manually check its output). The second works even in a debug build. This is actually discussed in Araq's blog, https://nim-lang.org/araq/destructors.html ("Return values are harmful"), but that was 8 years ago, so I'm not sure it still applies.
Clearly this idea of mine is still in the "ruminating phase" of determining whether the positives outweigh the negatives.
I don't see how the compiler can generate the overload automatically in a way that actually avoids temporary allocations.
Let's go with the readFile example. Here is the simplified implementation:
proc readFile(filename: string): string =
  var f: File
  if open(f, filename):
    let len = getFileSize(f)
    if len > 0:
      result = newString(len)
      discard readBuffer(f, addr(result[0]), len)
What should the generated version that accepts result look like, and how does the compiler get there?
If you think a routine requires a var x overload for optimization purposes, write an overload and make a PR. It's a trivial change. The number of places where it measurably matters and such a version doesn't exist yet is small, at least in the stdlib.
I don't think the problem of designing performant and convenient APIs should be solved by adding dubious or, at least, uncommon language features. It's a hack where none is required.
From my point of view, the opposite situation is true: the stdlib lacks routines that are guaranteed to elide copies of the input used to construct the returned value. View types are underused, for one.
In many places there's only var versions and no returning ones, which makes the calling code messier with multiple var declarations used exactly twice: to fill and then to read once, instead of straight clean function chaining.
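The thread's own open(f, filename) call is already an instance of this pattern: a var declared solely to be filled once and then read once (self-contained sketch; example.txt is created here just so it runs):

```nim
# `open` fills a `var File` and returns a bool, so the File variable
# exists only to be filled by `open` and then read once afterwards.
writeFile("example.txt", "hello")   # set up a file for the demo

var f: File
var firstLine = ""
if open(f, "example.txt"):
  firstLine = readLine(f)   # fill ...
  close(f)
echo firstLine               # ... then read once: hello
```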
The compiler's optimizer (neither Nim's nor C's) does not help.
That is very unfortunate. Is this a problem with the character of the Nim-generated C code that prevents optimizations? Could moving to an intermediate representation in Nimony potentially improve this situation?
What should the generated version that accepts result look like, and how does the compiler get there?
Rewrite the proc to support the append-only convention.
proc readFile(filename: string): string =
  var f: File
  if open(f, filename):
    let
      len = getFileSize(f)
      currLen = result.len
    if len > 0:
      result.setLen(currLen + len)
      discard readBuffer(f, addr(result[currLen]), len)
Yeah, OK, now I'm basically writing readFile(filename: string, result: var string), or more accurately readFile(filename: string, result: out string), with a different syntax.
In many places there's only var versions and no returning ones, which makes the calling code messier with multiple var declarations used exactly twice: to fill and then to read once, instead of straight clean function chaining.
That's the problem which sugar.dup solves
That is very unfortunate. Is this the problem of the character of the Nim-generated C code, that prevents optimizations?
Yeah, it is surprising, since C++ compilers perform this optimization all the time; or maybe it doesn't work as well as I think it does.
Could moving to an intermediate representation in Nimony potentially improve this situation?
I never did formalize the optimization successfully and haven't seen it done by anybody else, so no.
However, a more modern stdlib could use the pattern proc process(s: sink string): string = result = ensureMove(s); ..., which accomplishes the same. (And this is why your "we must use more openArray[char]" isn't much of an optimization at all.)
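Spelled out, the sink + ensureMove pattern looks roughly like this (assuming Nim 2.x, where ensureMove is available; shout is a made-up proc):

```nim
# `sink` lets the caller hand over ownership of `s`, and `ensureMove`
# guarantees the string buffer is moved into `result` rather than copied.
proc shout(s: sink string): string =
  result = ensureMove(s)
  result.add("!")

echo shout("hello")   # hello!
```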
Yeah it is surprising, since C++ compilers perform this optimization all the time, or maybe it doesn't work as well as I think it does.
No idea what you are talking about, no C++ compiler performs this optimization at all...
How would you use sugar.dup for this? I thought it was for situations where you want to use a proc that modifies its first argument as a pure function, like so:
echo "std::cout".dup(removePrefix("std::"))
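For reference, that dup call expands to approximately this (sketch):

```nim
import std/strutils

# What `"std::cout".dup(removePrefix("std::"))` desugars to, roughly:
# dup copies the value into a temporary, applies the mutating call to the
# temporary, and the temporary becomes the value of the whole expression.
var tmp = "std::cout"
tmp.removePrefix("std::")
echo tmp   # cout
```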