Hi guys! I am completely new here, so hey!
I am a data scientist and a Python and R developer, working a little in C, and I have only recently come across Nim. What really interested me in it is its performance. The problem is, I don't see this performance. I am sure I am doing something wrong, so please have a look at the scripts below and let me know what it is.
I am comparing the corresponding Python and Nim scripts, which (in a long loop) search for a substring within a string. Here's the Python script:
import datetime
import time
N = 100000000
full_string = """I think this very thing is not like the thing that I think it is
I would be happy to eat a banana collected from an apple tree, though I am not really sure
it would taste like a banana - rather like an apple. What's the point in eating a banana that
tastes like an apple? Or, for that matter, eating an apple that tastes like a banana?
Some day I will explain you everything in detail. Today, gotta go since I have some performance
comparisons to do. Bye!
"""
sub_string = "gotta go since"
def find_string_in_py(string, substring):
    return substring in string

def repeat(full_string, substring, n_iter):
    start_time = time.time()
    for i in range(n_iter):
        find_string_in_py(full_string, substring)
    elapsed_time = time.time() - start_time
    print(str(datetime.timedelta(seconds=round(elapsed_time))))

repeat(full_string, sub_string, N)
And this is the corresponding (is it?) Nim script:
import strutils, times
const full_string = """I think this very thing is not like the thing that I think it is
I would be happy to eat a banana collected from an apple tree, though I am not really sure
it would taste like a banana - rather like an apple. What's the point in eating a banana that
tastes like an apple? Or, for that matter, eating an apple that tastes like a banana?
Some day I will explain you everything in detail. Today, gotta go since I have some performance
comparisons to do. Bye!
"""
const sub_string = "gotta go since"
const N = 100000000
proc find_string_in_nim(fullstring: string, substring: string): bool =
  result = substring in fullstring

proc repeat(full_string: string, substring: string, n_iter: int) =
  let time = cpuTime()
  for i in 0 ..< n_iter:
    discard find_string_in_nim(full_string, substring)
  echo "Time taken: ", cpuTime() - time

repeat(full_string, sub_string, N)
I am compiling the nim script using the simplest command:
nim c myscript.nim
The results are astonishing, with Python requiring 18 seconds and Nim... 323 seconds. (I did the same in C, and it took even less time than Python, but let's not worry about that for the moment.) I really think I must have done something wrong, and I would be obliged for your help in pointing out what it was. Something with the loop, maybe? The idea is to run the find_string_in_nim() function N times, that's it. The results are discarded.
(Please note that I am not asking why Nim is so much slower than Python, but rather what I did wrong!)
Thanks!
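As an aside, the Python side can be timed more robustly with the standard timeit module, which handles loop overhead and repetition. A minimal sketch, using shortened stand-ins for the strings above:

```python
import timeit

# Shortened stand-ins for the strings above (assumption: any text containing
# the substring behaves the same for this measurement).
full_string = ("Some day I will explain you everything in detail. "
               "Today, gotta go since I have some performance comparisons to do.")
sub_string = "gotta go since"

# timeit runs the statement many times; keeping the best of several repeats
# smooths out scheduler noise compared to a single wall-clock reading.
best = min(timeit.repeat("sub_string in full_string",
                         globals=globals(), number=1_000_000, repeat=3))
print(f"best of 3 runs of 1,000,000 searches: {best:.3f} s")
```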
Thanks. Yup, the tutorial might mention this, but the first official Nim tutorial from here says: "By default, the Nim compiler generates a large number of runtime checks aiming for your debugging pleasure. With -d:release some checks are turned off and optimizations are turned on." Well, I see that optimizations are turned on, but would you say this means: don't bother compiling without this flag, because you will get dramatically poor performance? It does not say so, so I did not use it.
Anyway, on my machine (an HP G7, WSL 1 under Windows, 8 cores, 32 GB of RAM), I still get 29 seconds for nim c -d:release only_nim.nim and 17.6 seconds for nim c -d:danger only_nim.nim, the former being quite a bit worse than Python and the latter slightly better.
Given Python's 18 seconds, I would say this is not very promising. After reading some performance comparisons, I was hoping for something like 2-4x better performance.
Then I changed the substring to be searched to "gotta go sincedd" (not to be found), and here Python performed slightly better (16 against 18 seconds) than Nim with the -d:danger flag.
Tomorrow I will be able to check it on another machine as well, but this is not promising. Any ideas what might be happening?
Stories like this have come up a few times before on this forum, where someone compared the performance of Python and Nim on string operations.
When you're doing string operations in Python, the vast majority of the work is done by well-optimised routines written in C, so the performance will be similar.
Nim starts to shine once you want to write even more optimised (maybe with SIMD) or custom routines, because you then don't need to switch programming languages or write wrappers.
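As a quick illustration of that point, here is a minimal sketch (with made-up test data) comparing Python's C-backed `in` operator against an equivalent pure-Python search; the gap shows how much of the work is happening outside the interpreter:

```python
import time

# Made-up test data: a 1 MB periodic haystack and a needle that is absent,
# forcing both searches to scan the whole string.
text = "abcdefghij" * 100_000
needle = "ghijax"

def naive_contains(haystack: str, sub: str) -> bool:
    """Pure-Python substring search, for contrast with the C-backed `in`."""
    m = len(sub)
    for i in range(len(haystack) - m + 1):
        if haystack[i:i + m] == sub:
            return True
    return False

t0 = time.perf_counter()
builtin_hit = needle in text             # delegates to optimized C code
t1 = time.perf_counter()
pure_hit = naive_contains(text, needle)  # interpreted bytecode, position by position
t2 = time.perf_counter()

assert builtin_hit == pure_hit
print(f"built-in `in`: {t1 - t0:.4f} s, pure Python: {t2 - t1:.4f} s")
```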
Sorry, I did not find anything similar to my comparison (I just searched; I did not read through the forum).
On the one hand, it makes sense. On the other hand, I was expecting much better performance anyway. I just checked on a ZBook (again WSL 1, 8 cores, 32 GB RAM), and again Python was slightly better.
I understand this is due to Python using some really optimized algorithms for string operations (bravo, Python!). On the other hand, C is about 4 times faster than Python at the same task (though it's possible that it's just quicker at looping, and we know how efficient looping can be in C and how inefficient it is in Python).
But does this all mean that to make Nim shine, it's not enough to write code in a standard way, and I should instead go deep into the language and use advanced optimization tools, even for something as simple as a substring-in-string search? Optimization tools exist for Python too, and sometimes using them yields tens or hundreds of times better performance. I was hoping to use Nim in order to be sure I would get better performance. Here, I prototyped some simple functions in Nim and Python, optimizing in neither, and both languages still have tools for further optimization. So can I be sure I would really do better with Nim? Most comparisons I saw were very promising, but the first thing I check myself turns out like this...
This version runs faster for me:
import times
const full_string = """I think this very thing is not like the thing that I think it is
I would be happy to eat a banana collected from an apple tree, though I am not really sure
it would taste like a banana - rather like an apple. What's the point in eating a banana that
tastes like an apple? Or, for that matter, eating an apple that tastes like a banana?
Some day I will explain you everything in detail. Today, gotta go since I have some performance
comparisons to do. Bye!
"""
const sub_string = "gotta go since"
const N = 100000000
proc strstr(haystack, needle: cstring): cstring {.importc, header: "<string.h>".}

proc contains(haystack, needle: string): bool =
  strstr(cstring(haystack), cstring(needle)) != nil

proc find_string_in_nim(fullstring: string, substring: string): bool =
  result = substring in fullstring

proc repeat(full_string: string, substring: string, n_iter: int) =
  let time = cpuTime()
  for i in 0 ..< n_iter:
    discard find_string_in_nim(full_string, substring)
  echo "Time taken: ", cpuTime() - time

repeat(full_string, sub_string, N)
Sorry, this is what you get on Ubuntu after sudo apt-get install nim. If that is severely outdated, that's bad - but again, it's what comes with it. I tried to install the newest version from the Nim web page, but I failed, and rather than spend more time on that, I installed it via apt-get.
Sorry that my question disappointed you so much. I don't know Nim and just want to learn it and understand what's happening. I don't care if my Python code spends almost all its time in C; what counts for me is that it's quick. I know Python uses C, but should I compare Nim against a Python that doesn't use C? That would be crazy, I think. Python is Python, whether it's using C or not.
If Nim is not for such poor developers as I am (as I mentioned, I am a data scientist - not a programmer), then OK, it's not for me; it's too difficult. Maybe I should stay with Python and C.
With the same definitions of full_string and sub_string (in both languages), this Python 3.8 code:
from time import time

def count(full, sub):
    t0 = time()
    start = n = 0
    while True:
        if (spot := full.find(sub, start)) >= 0:
            start = spot + 1
            n += 1
        else:
            break
    print("Time taken: ", time() - t0, " matches: ", n)

count(full_string * 50000, sub_string)
runs almost exactly 10x slower (not a mere 2-4x as you were expecting) on a Linux machine than this Nim (-d:danger, --passC:-flto, etc.):
import times, strutils

proc count(full, sub: string) =
  let t0 = epochTime()
  var start, n: int  # auto-inits to 0
  while true:
    if (let spot = full.find(sub, start); spot) >= 0:
      start = spot + 1
      n += 1
    else:
      break
  echo "Time taken: ", epochTime() - t0, " matches: ", n

count(repeat(full_string, 50000), sub_string)
The difference from your benchmark attempt is that here we pump up the data scale by 50,000 back-to-back copies, searching a big string (in small hops).
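For what it's worth, the match count from the loops above can be cross-checked with Python's built-in str.count (also C-backed). A minimal sketch, using a shortened stand-in for full_string and assuming the substring never straddles two adjacent copies:

```python
# Shortened stand-in for the full_string used in the thread.
full_string = ("Some day I will explain you everything in detail. "
               "Today, gotta go since I have some performance comparisons to do. Bye!\n")
sub_string = "gotta go since"

big = full_string * 50_000    # 50,000 back-to-back copies, as in the benchmark
n = big.count(sub_string)     # counts non-overlapping occurrences
print(n)                      # one match per copy -> 50000
```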
Anyway, most things depend a lot on, well, a lot of things. Nim responds well to effort applied towards optimization. (Even Python can respond well via Cython/Pythran/etc./etc. - How well just depends).
The subculture of "write my microbenchmark in languages X,Y,Z and draw conclusions" mostly leads to misguided conclusions. What is measured is mostly developer effort/skill/lang familiarity (often limited by other arbitrary constraints like "Python without Cython" that would not apply in a real world scenario).
I ran both on a pretty old laptop, Nim compiled with just -d:release:
Python (v3.8.5): 75 s; Nim (v1.44): 58.3 s