Hi,
I'm thinking/dreaming about improving the collections in Nim, or create a separate library. This post is not a proposal for an actual solution, just some play with the code. Please give me some feedback how to continue.
Background: I've done most of the exercises on https://www.4clojure.com/ , if you are unfamiliar with Clojure or with this site, that's a couple of weeks amazing experiment. The code below is a copy from https://clojuredocs.org/clojure.core/lazy-seq
So the following code creates the lazy list of all the primes, then we print out the first 25 of them. The code is divided into to parts, building the general functions (see their comments for examples), and implementing the actual lazy list with the function sieve.
import future
# (f, x0) -> x0; f(x0); f(f(x0)); ...
proc myIterate[T](f: (proc(anything: T): T), x0: T): (iterator(): T) =
iterator t(): T {.closure.}=
var x = x0
while true:
yield x
x = f(x)
result = t
# (1;2;3;4;5, x>3) -> 4;5
proc myFilter[T](it: (iterator(): T), f: (proc(anything :T):bool) ): (iterator(): T) =
iterator t2(): T {.closure.}=
for x in it():
if f(x):
yield x
result = t2
# (1;2;3;4;5, 3) -> 1;2;3
# (1;2, 3) -> 1;2
proc myTake[T](it: (iterator(): T), n: int): (iterator(): T) =
iterator t(): T {.closure.}=
var s = 0
while s < n:
s += 1
var r = it()
if finished(it):
break
yield r
result = t
proc myInc(x: int): int=
x+1
# (1;2;3) -> @[1,2,3]
proc mySeq[T](it: (iterator(): T)): seq[T] =
result = newSeq[T]()
for x in it():
result.add(x)
# I might have found a compiler bug, I need this line to demonstrate:
let vv = myFilter(myIterate(myInc, 0), (x:int) => x>15)
# (2;3;4;5;6;...) -> 2; and items from sieve(3;5;7;9;11;13;15;...)
# (3;5;7;9;11;...) -> 3; and items from sieve(5;7;11;13;...)
proc sieve(it: (iterator(): int)): (iterator(): int)=
iterator t(): int {.closure.}=
var first = it()
yield first
var rest = sieve(myFilter(it, (x:int) => x mod first != 0))
for x in rest():
yield x
result = t
# print out the first 25 primes
echo sieve(myInc.myIterate(2)).myTake(25).mySeq()
# result: @[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
Some remarks:
I should have used a template for generating these functions, I'll try to do that, now it is easier to read this way.
All the calculation is done on stack (checked this with valgrind), there is no heap usage. This is good in general, but in this extreme case we could have stack overflow.
The above code is slow, it takes 10 seconds to generate 20000 primes. I know that this is an overusing of iterators. The slowness is not coming from printing, and probably not from the divisions (I'll check this). I don't know whether this can be improved. Clojure is slow in general when using lazy lists. Update: I've did some benchmark tests, and it is not that slow. The Eratosthenes sieve finds the first 20k primes in 0.002 sec, but this algorithm is different, it calculates modulus all the time, the equivalent algorithm without iterators runs in 2.2 secs, so I guess the 5 times overhead is understandable because of the function calls.
I might have found a compiler bug. If I remove the unused variable "vv" AND rename "t2" to "t" in myFilter, then it doesn't compile. If I modify only one of these (i.e. remove the unused variable "vv" OR rename "t2" to "t" in myFilter), then it still compiles and runs fine. I'll open a github issue if you don't have an easy explanation for this.
The function names mimic Clojure's, but sometimes I changed their parameters' order. This felt better this way in Nim, because we could write things like list.myTake(10) to keep only the first 10 elements.
Thanks, Peter
This is looking very interesting. I think the bid advantage of such a design is that it avoids the necessity to implement something like e.g. map for each collection individually. An iterator can serve as a common denominator for all collections. The only important thing is that a collection can be "converted to" and "build from" an iterator. I fancy something like this:
someCollection.toIterator # i.e. 'items'
.map(...) # or filter, flatMap, take, drop, whatever
.build(someOtherCollection)
In this case the whole map/filter/... shebang only has to be implemented for iterator.
Hi bluenote,
There are other advantages. You can apply filter, map, partition-by to a sequence as well, but with iterators you don't have to allocate memory for the inner steps, all the values are calculated on demand. I've heard about a big data project in python, where iterators are used all the time not to copy 8GB memory at each step. Back to Nim: you can also postpone the calculation until it is really needed (or sometimes they are not needed at all).
There is one obvious slowdown with map: if you use map a sequence, then you know how large sequence you have to allocate for the result. This information is lost when you use an iterator. Rust uses a function called size_hint() to make this work (see https://doc.rust-lang.org/std/iter/trait.Iterator.html ).
Thanks, Peter