I'd be interested to know if this is any faster against your input:
import strutils
let inFile = open("input.txt", fmRead)
let outFile = open("output.txt", fmWrite)
var ln: TaintedString = ""
var parts: seq[string]
while inFile.readLine(ln):
  parts = ln.split('\t')
  for idx, val in pairs(parts):
    parts[idx] &= "_mark"
  outFile.writeLine(parts.join(", "))
outFile.close()
inFile.close()
Ok, I've just tested a few different versions. Here are my results, against a 23KB (2000 line) file.
main.nim - uses a loop and a split
import strutils
let inFile = open("input.txt", fmRead)
let outFile = open("output.txt", fmWrite)
var ln: TaintedString = ""
var parts: seq[string]
while inFile.readLine(ln):
  parts = ln.split('\t')
  for idx, val in pairs(parts):
    parts[idx] &= "_mark"
  outFile.writeLine(parts.join(", "))
outFile.close()
inFile.close()
main_sequtils.nim - uses sequtils
import strutils, sequtils
let inFile = open("input.txt", fmRead)
let outFile = open("output.txt", fmWrite)
var ln: TaintedString = ""
while inFile.readLine(ln):
  outFile.writeLine(map(ln.split('\t'), proc(x: string): string = x & "_mark").join(", "))
outFile.close()
inFile.close()
main_lines.nim - uses the lines() proc
import strutils
let
  inFile = open("input.txt", fmRead)
  outFile = open("output.txt", fmWrite)
var parts: seq[string]
for ln in inFile.lines:
  parts = ln.split('\t')
  for idx, val in pairs(parts):
    parts[idx] &= "_mark"
  outFile.writeLine(parts.join(", "))
outFile.close()
inFile.close()
main_streams.nim - uses the streams module
import strutils, streams
let
  inFile = newFileStream("input.txt", fmRead)
  outFile = open("output.txt", fmWrite)
var
  parts: seq[string]
  ln = ""
while inFile.readLine(ln):
  parts = ln.split('\t')
  for idx, val in pairs(parts):
    parts[idx] &= "_mark"
  outFile.writeLine(parts.join(", "))
outFile.close()
inFile.close()
main_streams_sequtils.nim - uses the streams module, with sequtils
import strutils, streams, sequtils
let
  inFile = newFileStream("input.txt", fmRead)
  outFile = open("output.txt", fmWrite)
var ln = ""
while inFile.readLine(ln):
  outFile.writeLine(map(ln.split('\t'), proc(x: string): string = x & "_mark").join(", "))
outFile.close()
inFile.close()
My results are as shown below, using Python 3.5.1 and Nim 0.13.0:
My input data is very simple, of the form:
need more
data to
process for
testing purposes
test test
test 1234
testing 5678
hello 8910
another test
only short
Does your file have many columns? Perhaps that impacts the results?
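One way to check is to generate a wider, tab-separated input and re-run the same programs against it. A rough sketch (the file name and the row/column counts are just placeholders):

import strutils

# Hypothetical generator for a wider test file, to see whether column
# count changes the relative timings.
const
  rows = 2000
  cols = 50
let f = open("input_wide.txt", fmWrite)
for r in 1 .. rows:
  var fields: seq[string] = @[]
  for c in 1 .. cols:
    fields.add("field" & $c)
  f.writeLine(fields.join("\t"))
f.close()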
@nimer
Do you really expect that the few Nim developers can provide an optimized and tuned proc for all possible use cases?
I can't remember ever needing a very fast split by string! And speed is one optimization goal; small and simple code is another, and the two can conflict. Maybe you can provide a fast and elegant split function yourself; I assume the developers would accept it. If you have no idea how to do it, you could look at C solutions, perhaps the C code in the Python library, but mind the copyright.
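To illustrate the kind of approach such a contribution might take (this is only a sketch, not the stdlib implementation), splitting on a substring can be done with repeated find() calls and a moving start offset, so the input is scanned once:

import strutils

# Sketch of split-by-substring using find() with a moving start offset.
# splitOnSub is a hypothetical name, not a stdlib proc; assumes sep != "".
proc splitOnSub(s, sep: string): seq[string] =
  result = @[]
  var start = 0
  while true:
    let idx = s.find(sep, start)       # search from `start` onwards only
    if idx < 0:
      result.add(s.substr(start))      # remainder after the last separator
      break
    result.add(s.substr(start, idx - 1))
    start = idx + sep.len

echo splitOnSub("a--b--c", "--")       # prints @["a", "b", "c"]

A real patch would also need benchmarks and attention to allocations, but the structure is no more complicated than this.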
I thought it should be close to C's speed since Nim compiles the source to C code.
That does not in any way follow, since most of the time is spent executing code that isn't in your source.
The split() function in Nim sucks! It's particularly slow when splitting by a string (not a char), several times slower.
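If you want to quantify that on your own data, a throwaway micro-benchmark along these lines is enough (the test line and iteration count are placeholders):

import strutils, times

# Compare splitting on a char vs. on a one-character string.
let line = "aaa\tbbb\tccc\tddd\teee"
const iterations = 1_000_000

var t0 = cpuTime()
for i in 1 .. iterations:
  discard line.split('\t')            # split on a char
echo "split(char):   ", cpuTime() - t0

t0 = cpuTime()
for i in 1 .. iterations:
  discard line.split("\t")            # split on a one-char string
echo "split(string): ", cpuTime() - t0

Remember to compile with nim c -d:release before trusting any numbers; debug builds can be several times slower across the board.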
It's an open source project. If you think it sucks, make it not suck.
I can't believe Nim has such awful performance.
Nim doesn't have awful performance.