I'd be interested to know if this is any faster against your input:
import strutils
let inFile = open("input.txt", fmRead)
let outFile = open("output.txt", fmWrite)
var ln: TaintedString = ""
var parts: seq[string]
while inFile.readLine(ln):
  parts = ln.split('\t')
  for idx, val in pairs(parts):
    parts[idx] &= "_mark"
  outFile.writeLine(parts.join(", "))
outFile.close()
inFile.close()
Ok, I've just tested a few different versions. Here are my results, against a 23KB (2000 line) file.
main.nim - uses a loop and a split
import strutils
let inFile = open("input.txt", fmRead)
let outFile = open("output.txt", fmWrite)
var ln: TaintedString = ""
var parts: seq[string]
while inFile.readLine(ln):
  parts = ln.split('\t')
  for idx, val in pairs(parts):
    parts[idx] &= "_mark"
  outFile.writeLine(parts.join(", "))
outFile.close()
inFile.close()
main_sequtils.nim - uses sequtils
import strutils, sequtils
let inFile = open("input.txt", fmRead)
let outFile = open("output.txt", fmWrite)
var ln: TaintedString = ""
while inFile.readLine(ln):
  outFile.writeLine(map(ln.split('\t'), proc(x: string): string = x & "_mark").join(", "))
outFile.close()
inFile.close()
main_lines.nim - uses the lines() proc
import strutils
let
  inFile = open("input.txt", fmRead)
  outFile = open("output.txt", fmWrite)
var parts: seq[string]
for ln in inFile.lines:
  parts = ln.split('\t')
  for idx, val in pairs(parts):
    parts[idx] &= "_mark"
  outFile.writeLine(parts.join(", "))
outFile.close()
inFile.close()
main_streams.nim - uses the streams module
import strutils, streams
let
  inFile = newFileStream("input.txt", fmRead)
  outFile = open("output.txt", fmWrite)
var
  parts: seq[string]
  ln = ""
while inFile.readLine(ln):
  parts = ln.split('\t')
  for idx, val in pairs(parts):
    parts[idx] &= "_mark"
  outFile.writeLine(parts.join(", "))
outFile.close()
inFile.close()
main_streams_sequtils.nim - uses the streams module, with sequtils
import strutils, streams, sequtils
let
  inFile = newFileStream("input.txt", fmRead)
  outFile = open("output.txt", fmWrite)
var ln = ""
while inFile.readLine(ln):
  outFile.writeLine(map(ln.split('\t'), proc(x: string): string = x & "_mark").join(", "))
outFile.close()
inFile.close()
My results are as shown below, using Python 3.5.1 and Nim 0.13.0:
My input data is very simple, of the form:
need more
data to
process for
testing purposes
test test
test 1234
testing 5678
hello 8910
another test
only short
Does your file have many columns? Perhaps that impacts the results?
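One way to check is to generate a wider, tab-separated input and re-run the same programs against it. A rough sketch (the file name and the row/column counts are just placeholders):

import strutils

# Hypothetical generator for a wider test file, to see whether column
# count changes the relative timings.
const
  rows = 2000
  cols = 50
let f = open("input_wide.txt", fmWrite)
for r in 1 .. rows:
  var fields: seq[string] = @[]
  for c in 1 .. cols:
    fields.add("field" & $c)
  f.writeLine(fields.join("\t"))
f.close()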
@nimer
Do you really expect that the few Nim developers can provide an optimized and tuned proc for all possible use cases?
I can't remember ever needing a very fast split by string! And speed is one optimization goal; small and simple code is another, and the two can conflict. Maybe you can provide a fast and elegant split function yourself; I assume the developers would accept it. If you have no idea how to do it, you could look at C solutions, perhaps the C code in the Python library, but mind the copyright.
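To illustrate the kind of approach such a contribution might take (this is only a sketch, not the stdlib implementation), splitting on a substring can be done with repeated find() calls and a moving start offset, so the input is scanned once:

import strutils

# Sketch of split-by-substring using find() with a moving start offset.
# splitOnSub is a hypothetical name, not a stdlib proc; assumes sep != "".
proc splitOnSub(s, sep: string): seq[string] =
  result = @[]
  var start = 0
  while true:
    let idx = s.find(sep, start)       # search from `start` onwards only
    if idx < 0:
      result.add(s.substr(start))      # remainder after the last separator
      break
    result.add(s.substr(start, idx - 1))
    start = idx + sep.len

echo splitOnSub("a--b--c", "--")       # prints @["a", "b", "c"]

A real patch would also need benchmarks and attention to allocations, but the structure is no more complicated than this.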
I thought it should be close to C's speed since Nim compiles the source to C code.
That does not in any way follow, since most of the time is spent executing code that isn't in your source.
The split() function in Nim sucks! It's particularly slow when splitting by a string (not a char), several times slower.
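If you want to quantify that on your own data, a throwaway micro-benchmark along these lines is enough (the test line and iteration count are placeholders):

import strutils, times

# Compare splitting on a char vs. on a one-character string.
let line = "aaa\tbbb\tccc\tddd\teee"
const iterations = 1_000_000

var t0 = cpuTime()
for i in 1 .. iterations:
  discard line.split('\t')            # split on a char
echo "split(char):   ", cpuTime() - t0

t0 = cpuTime()
for i in 1 .. iterations:
  discard line.split("\t")            # split on a one-char string
echo "split(string): ", cpuTime() - t0

Remember to compile with nim c -d:release before trusting any numbers; debug builds can be several times slower across the board.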
It's an open source project. If you think it sucks, make it not suck.
I can't believe Nim has such awful performance.
Nim doesn't have awful performance.