I want to implement a pythonic thing:
def myiter(anotheriter):
    for i in range(0, 3):
        for j in anotheriter:
            yield j

d = {1: 4, 2: 5, 3: 6}
for i in myiter(d.values()):
    print(i)
for i in myiter(d.keys()):
    print(i)
Is there a way to do the same in Nim?
import tables

iterator myiter(anotheriter: iterator): int =
  for i in 0..<3:
    for j in anotheriter:
      yield j

let d = {1: 4, 2: 5, 3: 6}.toTable
for i in myiter(d.values()): # test.nim(8, 18) Error: attempting to call undeclared routine: 'values'
  echo(i)
for i in myiter(d.keys()):
  echo(i)
Yes, use a .closure iterator.
http://nim-lang.org/docs/manual.html#iterators-and-the-for-statement-first-class-iterators
An inline iterator is an iterator that's always inlined by the compiler leading to zero overhead for the abstraction, but may result in a heavy increase in code size. Inline iterators are second class citizens; They can be passed as parameters only to other inlining code facilities like templates, macros and other inline iterators.
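For concreteness, here is a minimal sketch of that suggestion (not part of the original answer; the dvalues wrapper is just an illustrative name):
import tables

let d = {1: 4, 2: 5, 3: 6}.toTable

# inline iterator that takes a closure iterator as a first-class parameter
iterator myiter(anotheriter: iterator(): int): int =
  for i in 0..<3:
    for j in anotheriter():
      yield j

# hand-written closure wrapper around the inline tables.values iterator
iterator dvalues(): int {.closure.} =
  for v in d.values():
    yield v

# note: this prints each value only once, because the closure keeps its
# state between calls and is exhausted after the first pass (see below)
for i in myiter(dvalues):
  echo i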
I want to pass one inline iterator (tables.values) to another inline iterator (myiter). Does using .closure in this case mean that I need to create a closure wrapper for every iterator in system.nim?
I just want to avoid the copying overhead for iterable data structures like tables:
import tables, sequtils

let d = {1: 4, 2: 5, 3: 6}.toTable

# current implementation:
iterator myiter[T](anotheriter: varargs[T]): T =
  for i in 0..<3:
    for j in anotheriter:
      yield j

for i in myiter(toSeq(d.values())): echo i

# optimized variant doesn't compile:
iterator myiteropt[T](anotheriter: iterator): T =
  for i in 0..<3:
    for j in anotheriter:
      yield j

for i in myiteropt(d.values()): echo i
@def
That is some decent work there. Would it be possible to have an automatic conversion of inline iterators to closure iterators? Honestly, I think something like this should at some point replace sequtils entirely.
@def this is it, thanks! Looks like this should be inside sequtils.
Would it be possible to have an automatic conversion of inline iterators to closure iterators?
Nim lets you write pattern-based optimizations, so someone skilled could implement this. :)
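For reference, "pattern-based optimization" here means a term-rewriting template; below is a minimal sketch in the spirit of the manual's optMul example (illustrative only, not the inline-to-closure conversion itself):
# term-rewriting template: when the compiler sees `a * 2` for an int,
# it may rewrite the expression to `a + a`
template optMul{`*`(a, 2)}(a: int): int =
  a + a

let x = 3
echo x * 2  # candidate for rewriting to x + x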
Update: But there is an issue.
#!/usr/bin/env python
def myiter(anotheriter):
    for i in range(0, 3):
        for j in anotheriter:
            yield j

d = {1: 4, 2: 5, 3: 6}
# output: [4, 5, 6, 4, 5, 6, 4, 5, 6]
print([x for x in myiter(d.values())])
import future, tables

let d = {1: 4, 2: 5, 3: 6}.toTable

template toClosure*(i): auto =
  iterator j: type(i) {.closure.} =
    for x in i:
      yield x
  j

iterator myiteropt(anotheriter: iterator): auto =
  for i in 0..<3:
    for j in anotheriter():
      yield j

# output: @[4, 5, 6]
echo lc[x | (x <- myiteropt(toClosure(d.values()))), int]
# expected @[4, 5, 6, 4, 5, 6, 4, 5, 6]
Not sure if this is a bug, but:
template toClosure(i): auto =
  iterator j: type(i) {.closure.} =
    for x in i:
      yield x
  j

iterator myiteropt(anotheriter: iterator): auto =
  for i in 0..<3:
    for j in anotheriter():
      yield j

proc x[T](a: openArray[T]) =
  for i in myiteropt(toClosure(a.items())): # Error: illegal capture 'a'
    echo i

x([1,2,3])

# this compiles
# for i in myiteropt(toClosure([1,2,3].items())):
#   echo i
But there is an issue.
I know you've seen this fixed in my repo, but for everyone else, it works if you deepCopy the iterator:
import future, tables

let d = {1: 4, 2: 5, 3: 6}.toTable

template toClosure*(i): auto =
  iterator j: type(i) {.closure.} =
    for x in i:
      yield x
  j

iterator myiteropt(anotheriter: iterator): auto =
  for i in 0..<3:
    var iter: type(anotheriter)
    iter.deepCopy(anotheriter)
    for j in iter():
      yield j

echo lc[x | (x <- myiteropt(toClosure(d.values()))), int]
Not sure if this is a bug
It works with a seq; openArray parameters can't be captured:
template toClosure(i): auto =
  iterator j: type(i) {.closure.} =
    for x in i:
      yield x
  j

iterator myiteropt(anotheriter: iterator): auto =
  for i in 0..<3:
    for j in anotheriter():
      yield j

proc x[T](a: seq[T]) =
  for i in myiteropt(toClosure(a.items())):
    echo i

x(@[1,2,3])
I decided to use a closure iterator as an optimization, to avoid allocating a temporary array. But in my tests the closure iterator is the slowest possible solution; it is even slower than converting the iterator to a sequence.
# nim c -d:release "test.nim" && ./test 20000
import os, strutils, times

proc printf(formatstr: cstring) {.importc: "printf", varargs,
                                  header: "<stdio.h>".}

template bench(descr, body) =
  block:
    let timeStart = cpuTime()
    body
    printf("%-30s %.3f\n", descr, cpuTime() - timeStart)

iterator testIter(): int =
  for i in 1..parseInt(paramStr(1)): yield i

template toClosure*(i): auto =
  iterator j: type(i) {.closure.} =
    for x in i:
      yield x
  j

iterator combinations1[T](x: varargs[T]): (T, T) =
  for i in x:
    for j in x:
      yield (i, j)

iterator combinations2[T](it: iterator(): T): (T, T) =
  var iter1: type(it)
  iter1.deepCopy(it)
  for i in iter1():
    var iter2: type(it)
    iter2.deepCopy(it)
    for j in iter2():
      yield (i, j)

template combinations3(it: expr, body: stmt) {.immediate.} =
  for i in it:
    for j in it:
      let u {.inject.} = i
      let v {.inject.} = j
      body

bench("convert to seq"):
  var res = 0
  var s: seq[int] = @[]
  for i in testIter():
    s.add(i)
  for u, v in combinations1(s):
    res += u * v
  echo res

bench("closure iterator"):
  var res = 0
  for u, v in combinations2(toClosure(testIter())):
    res += u * v
  echo res

bench("template"):
  var res = 0
  combinations3(testIter()):
    res += u * v
  echo res
Results for N=20000, gcc optimization level -O2:
40004000100000000
convert to seq 0.367
40004000100000000
closure iterator 2.049
40004000100000000
template 0.304
Results for N=20000, gcc optimization level -O3 (the template version is twice as slow!):
40004000100000000
convert to seq 0.366
40004000100000000
closure iterator 2.054
40004000100000000
template 0.610
Clang results are also interesting (-O2 or -O3):
40004000100000000
convert to seq 0.934
40004000100000000
closure iterator 2.359
40004000100000000
template 0.002
So my questions are:
IIRC, when you use a closure you basically throw the stack frame onto the heap; I think Nim stores a closure as a tuple of pointers to the proc and to the environment, respectively. Then you deepCopy that environment for each iteration of the inner loop, which isn't exactly great if you want to run fast. It might be a lot closer if you could reset the inner iterator directly, but I am not sure whether that is possible in Nim without going into unsafe territory. The third example is literally translated into something like:
for i in testIter():
  for j in testIter():
    var u {.inject.} = i
    var v {.inject.} = j
    res += u * v
This could bloat the code if the iterators are complex, and it isn't as powerful as first-class iterators, but it is unsurprisingly a lot faster.
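As an aside, the tuple-of-pointers representation mentioned above can be made visible with system.rawProc and system.rawEnv; here is a tiny sketch (makeCounter is just an illustrative closure, not anything from this thread):
# a closure that captures `n`, so it needs a heap-allocated environment
proc makeCounter(): proc (): int =
  var n = 0
  result = proc (): int =
    inc n
    n

let c = makeCounter()
# the two raw pointers behind the closure: the proc and its environment
echo cast[uint](rawProc(c)), " ", cast[uint](rawEnv(c))
echo c(), " ", c()  # 1 2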
I feel like it should be possible to solve this with a macro, but I am not entirely sure how and only tried for a couple of minutes. My first idea was some kind of partial macro application: use a macro which creates a macro, which can then be called with a second iterator and a code block. But I am not sure how to add the first iterator call in there; just throwing it into the second macro's AST doesn't work, because then the macro tries to use the for loops instead of creating them. Grabbing the nodes from the call site and capturing them in the inner macro with something like newNimNode(nnkForStmt).add ident"i", iteratorCall... would be ideal, but for some reason everything started breaking when I tried. But then, I am awful with macros, so it probably is possible.
Edit: Fun fact, if you create a macro within a macro, the inner one can capture the result variable and break things.
import macros

macro outer(name, first: untyped): stmt =
  var
    loops = newNimNode(nnkForStmt).add(
      newNimNode(nnkAccQuoted).add(ident"arg1"),
      first,
      newNimNode(nnkForStmt).add(
        newNimNode(nnkAccQuoted).add(ident"arg2"),
        newNimNode(nnkAccQuoted).add(ident"second"),
        newStmtList().add(
          newNimNode(nnkAccQuoted).add(ident"exp")
        )
      )
    )
    quoted = newCall(
      ident"quote",
      newNimNode(nnkDo).add(
        newEmptyNode(),
        newEmptyNode(),
        newEmptyNode(),
        newEmptyNode(),
        newEmptyNode(),
        newEmptyNode(),
        newStmtList(loops)
      )
    )
  result = newStmtList().add(
    newNimNode(nnkMacroDef).add(
      name,
      newEmptyNode(),
      newEmptyNode(),
      newNimNode(nnkFormalParams).add(
        ident"untyped",
        newNimNode(nnkIdentDefs).add(
          ident"second",
          ident"arg1",
          ident"arg2",
          ident"exp",
          ident"untyped",
          newEmptyNode()
        )
      ),
      newEmptyNode(),
      newEmptyNode(),
      newStmtList(newNimNode(nnkReturnStmt).add(quoted))
    )
  )

outer(myIt, countdown(2, 0))

myIt(countup(0, 2), x, y):
  echo x, " ", y
# results in
# StmtList
#   ForStmt
#     StmtList
#       Ident !"x"
#     Call
#       Sym "countdown"
#       IntLit 2
#       IntLit 0
#     ForStmt
#       StmtList
#         Ident !"y"
#       StmtList
#         Call
#           Sym "countup"
#           IntLit 0
#           IntLit 2
#       StmtList
#         StmtList
#           Command
#             Sym "echo"
#             Ident !"x"
#             StrLit
#             Ident !"y"
This kinda works but is ugly. Is there a better way to use these types of macros?
Thanks for your response.
IIUC this macro also expands at the call site and behaves like an inline iterator. I'm working on graph algorithms, and some of my iterators are really big. But I'm not sure whether it is possible to create a fast non-inline iterator. Probably it should be a proc with a callback argument.
Example of big iterator: https://gist.github.com/scriptum/34933a248dc22d0dd516#file-digraph-nim-L158
This is a depth-first search algorithm, and I cannot make it work with an iterator as the source of nodes. Fortunately it is not used frequently, so inlining shouldn't hurt much.
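A rough sketch of the "proc with callback argument" alternative mentioned above, with made-up names (forEachNode, and plain ints instead of real graph nodes):
# callback-based traversal: instead of yielding, the proc calls a
# user-supplied closure for every visited item, so its body is not
# inlined at every call site
proc forEachNode(n: int, visit: proc (node: int)) =
  for i in 1..n:
    visit(i)

forEachNode(3, proc (node: int) = echo node)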
# I want a flexible interface for DFS traversal that accepts an iterator, an array or a single node
for u, v in graph.dfs("source_node"): # ok
  ...
for u, v in graph.dfs(["source_node1", "source_node2"]): # ok
  ...

var t = {"source_node1": 1, "source_node2": 2}.toTable
# no way to do this!
for u, v in graph.dfs(t.keys()):
  ...
for u, v in graph.dfs(t.values()):
  ...
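One possible direction, combining the toClosure template from earlier in the thread with an overload that takes a closure iterator of source nodes; Graph and dfs below are bare stand-ins, not the real digraph.nim API:
import tables

template toClosure(i): auto =
  iterator j: type(i) {.closure.} =
    for x in i:
      yield x
  j

type Graph = object  # placeholder; the real type lives in digraph.nim

# stand-in dfs overload taking a closure iterator of source nodes
iterator dfs(g: Graph, sources: iterator(): string): (string, string) =
  for s in sources():
    yield (s, s)  # a real implementation would traverse the graph from s

var t = {"source_node1": 1, "source_node2": 2}.toTable
let g = Graph()
for u, v in g.dfs(toClosure(t.keys())):
  echo u, " ", v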
Currently I prefer Python + networkx for small graphs because it is flexible enough with iterators.