Recently I encountered a bottleneck while using parseEnum for some field inside a large CSV file.
parseEnum represented about 60% of the runtime, most of it due to repeated string allocation in a loop.
# strutils module.
proc parseEnum*[T: enum](s: string): T =
## Parses an enum ``T``.
##
## Raises ``ValueError`` for an invalid value in `s`. The comparison is
## done in a style insensitive way.
for e in low(T)..high(T):
if cmpIgnoreStyle(s, $e) == 0:
return e
raise newException(ValueError, "invalid enum value: " & s)
Since 'MyEnum' had many values, rather than hand-rolling a custom parseEnum proc, I wrote it using macros to autogenerate the case switch.
import macros
macro case_list(typ: typedesc[enum]; has_error: static[bool]): untyped =
result = newStmtList()
let cas = newNimNode(nnkCaseStmt)
for node in getType(getTypeInst(typ)[1]):
if node.kind == nnkEmpty:
cas.add ident"s"
else:
cas.add newNimNode(nnkOfBranch).add( newStrLitNode($node),
newAssignment(ident"result", node)
)
if has_error:
cas.add newNimNode(nnkElse).add(
newNimNode(nnkRaiseStmt).add(
newCall(ident("newException"), ident("ValueError"),
infix(newStrLitNode("invalid enum value: "), "&", ident("s"))
)
)
)
else:
cas.add newNimNode(nnkElse).add(
newAssignment(ident"result", ident("default"))
)
proc parseEnum[T: enum](s: string): T =
case_list(T, true)
proc parseEnum[T: enum](s: string; default: T): T =
case_list(T, false)
It would be possible to replace the parseEnum proc in strutils by those above, but it would then import the macros module even if those function are not used. It also substantially slowdown nim compilation to c.
I think this is the right approach but would probably need to be implemented as compiler magic.
I have some idea how to do this, but would prefer an official okay to avoid wasting time.
Ops, I will be affected as well. In my case I will have lots of non consecutive enums hence it will be even slower.
@Araq, would you accept the fix with fields() and fieldPairs() magic iterators to start working on enums. I think it makes a lot of sense for non-ordinal enums to be able to iterate ids directly, not only low(T)..high(T) hitting lots of values that do not belong to the enum.
fields() # will iterate ints
fieldPairs() # will iterate (string, int) or (cstring, int) or (string{lit}, int)
The parseEnum implementation will become
proc parseEnum*[T: enum](s: string): T =
## Parses an enum ``T``.
##
## Raises ``ValueError`` for an invalid value in `s`. The comparison is
## done in a style insensitive way.
for name, id in fieldPairs(T):
if cmpIgnoreStyle(s, name) == 0:
return id
raise newException(ValueError, "invalid enum value: " & s)
and no string allocations.
@def: ha I knew I had already seen some code that did this, but I couldn't remember where.
@cdome: That's a good idea. I have some half working version using RTTI hack but it isn't all that fast.
proc parseEnum[T: enum](s: string): T =
let typ = cast[PNimType](getTypeInfo(T(0)))
let n = typ.node
var values = n.sons
for i in 0..<n.len:
#cmpIgnoreStyle(cstring, cstring)
if cmpIgnoreStyle(s, values[i].name) == 0:
return T(values[i].offset)
I think thats how $enum string are generated from.
from memory look for reprEnum proc
would you accept the fix with fields() and fieldPairs() magic iterators to start working on enums. I think it makes a lot of sense for non-ordinal enums to be able to iterate ids directly, not only low(T)..high(T) hitting lots of values that do not belong to the enum.
Not sure I like it, with fieldPairs it's still O(N) where the string case is O(1).
Hi Araq, could please elaborate "string case is O(1)" a bit more?
O(1) means hashtables to me, hence I presume something along these lines (not a final version):
import tables
import hashes
proc genTableForEnum[T: enum](): Table[string, T] =
result = initTable[string, T](rightSize(ord(high(T)) - ord(low(T)) + 1))
for i in low(T)..high(T):
# TODO: skip holes in enum
result[$i] = i
proc parseEnum*[T: enum](s: string): T =
proc hash(s: string): Hash =
# BUG: you can't mixin new Hash function into table
result = hashIgnoreStyle(s)
const enumTable = genTableForEnum[T]()
result = enumTable[s]
This one looks really efficient except hash table is bigger than it could have been. I couldn't make tables module to work because I couldn't replace a standard hash with hashIgnoreStyle. I guess tables needs "mixin hash" is several places. Module gentabs did work, but it is deprecated.
Let me know your thoughts.
Would you accept the pull request with compile time hashtable implementation as in example I have provided? Of course in the finished state with handling of all edge cases, tests cases and stuff.
IMO, Parashurama's macro and internal Nim's string case transformation to hash table in order to handle cmpIgnoreCase would require extra allocation of temporary string.
Ideal implementation would have hash and == inside parseEnum, but mixin stops working in this way. Looks like yet another bug
import tables
import hashes
import strutils
proc genTableForEnum[T: enum](): Table[string, T] =
result = initTable[string, T](rightSize(ord(high(T)) - ord(low(T)) + 1))
for i in items(T):
# TODO: ignore holes
result[$i] = i
proc hash(s: string): Hash {.inline.} =
hashIgnoreStyle(s)
proc `==`(s1, s2: string): bool {.inline.} =
cmpIgnoreStyle(s1, s2) == 0
proc parseEnum*[T: enum](s: string): T =
const enumTable = genTableForEnum[T]()
enumTable[s]