Hi Nim community,
I'm working on a design for a problem I have: I need to call a Python function, which itself performs IO which can take a few seconds, but the requests to do this need to be concurrent.
The design I came up with creates x number of threads in advance (in a threadpool), with more threads to be added in a worst-case scenario (if all threads are busy). Each thread waits for a request by listening on its dedicated channel to the main thread. On receiving a request it calls the Python function and returns the result over the channel to the main thread. The main thread keeps track of which requests were sent to which thread.
This is simple enough, but I was wondering if there's a better way to do it, e.g. async. I'm not sure that async would play well with Python calls via nimpy though. So I'm interested in feedback and alternative suggestions on how to handle the problem.
Thanks, Jason
Calling Python from Nim via nimpy still goes through CPython, so it does not change Python's multithreading implementation: you inherit the same limitations.
IMO, I would try to remove the Python requirement (if you can).
Otherwise, it depends on how big your data is and where the bottleneck is:
What you are describing is called the load-balancing pattern in the ZeroMQ world.
If I were you and wanted a quick working (and still extensible) prototype, I would use ZeroMQ, as it is fast, mature, proven technology with many language bindings. A typical model runs clients, a broker, and workers in separate processes. It is possible with threads, but processes provide independent life cycles, so workers can come and go. Here is a toy example in Python to see if it suits your taste. Note that you should follow the Paranoid Pirate Pattern to write a reliable and robust version.
worker.py
import zmq
import sys

workerId = f"worker{sys.argv[1] if len(sys.argv) > 1 else ''}".encode()
context = zmq.Context()
socket = context.socket(zmq.DEALER)
socket.setsockopt(zmq.IDENTITY, workerId)
socket.connect("ipc://backend")
socket.send_multipart([b"PING"])
while True:
    # first frame is the empty delimiter the broker prepends
    [empty, clientId, x] = socket.recv_multipart()
    print(f"worker receive request {x} from client {clientId}")
    # do something here
    y = 2 * int(x)
    socket.send_multipart([b"RESULT", clientId, f"{y}".encode()])
client.py
import zmq
import time

context = zmq.Context()
client = context.socket(zmq.REQ)
client.connect("ipc://frontend")
for x in range(10):
    print(f"client send {x}")
    client.send(f"{x}".encode())
    y = client.recv()
    print(f"client receive {x} -> {int(y)}")
    time.sleep(1)
broker.py
import zmq

context = zmq.Context()
backend = context.socket(zmq.ROUTER)
backend.bind("ipc://backend")
frontend = context.socket(zmq.ROUTER)
frontend.bind("ipc://frontend")
workers = []
poller = zmq.Poller()
poller.register(backend, zmq.POLLIN)
poller.register(frontend, zmq.POLLIN)
while True:
    sockets = dict(poller.poll())
    if backend in sockets:
        raw = backend.recv_multipart()
        workerId = raw[0]
        command = raw[1]
        if command == b"PING":
            print(f"new worker {workerId}")
            workers.append(workerId)
        elif command == b"RESULT":
            [clientId, result] = raw[2:]
            print(f"worker {workerId} reply {result} to client {clientId}")
            frontend.send_multipart([clientId, b"", result])
            # the worker is free again, put it back in the pool
            workers.append(workerId)
    if frontend in sockets:
        raw = frontend.recv_multipart()
        clientId = raw[0]
        request = raw[2]
        assert len(workers) > 0, "no workers"
        workerId = workers.pop()
        print(f"client {clientId} request {request} to worker {workerId}")
        backend.send_multipart([workerId, b"", clientId, request])
Then start them in the following order:
python broker.py
python worker.py 0
python worker.py 1
python client.py
The model above is distributed and scalable. As for latency, I am not sure how 'low' you want, but since you can accept something done in Python and copying over a channel, I think ZeroMQ over ipc (Unix sockets) should be fine for you.
Thanks for the advice so far, I should note that the Python/NimPy call is called from a Nim function that must perform work both before and after the Python call. So offloading directly to Python isn't feasible.
For the time being I'd like to keep the Python code, as it's from a library I don't want to rewrite.
I'm thinking of going with the design I outlined, which uses Nim threads and channels.
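For illustration, here is a minimal Python analogue of that design (Python to match the other examples in this thread): `queue.Queue` objects stand in for Nim channels, and `slow_call` is a hypothetical placeholder for the slow Python function.

```python
import queue
import threading

def slow_call(x):
    # placeholder for the real Python function that performs slow IO
    return x * 2

def worker(requests, responses):
    # each worker blocks on its dedicated request queue (the "channel")
    while True:
        req_id, payload = requests.get()
        if req_id is None:  # sentinel: shut the worker down
            break
        responses.put((req_id, slow_call(payload)))

NUM_WORKERS = 4
responses = queue.Queue()                       # shared channel back to main
channels = [queue.Queue() for _ in range(NUM_WORKERS)]
threads = [threading.Thread(target=worker, args=(ch, responses))
           for ch in channels]
for t in threads:
    t.start()

# the main thread dispatches requests round-robin and tracks request ids
for i in range(8):
    channels[i % NUM_WORKERS].put((i, i))

results = dict(responses.get() for _ in range(8))
for ch in channels:
    ch.put((None, None))                        # stop the workers
for t in threads:
    t.join()
print(results)
```

The request id returned alongside each result is what lets the main thread match replies to outstanding requests, mirroring the bookkeeping described above.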
Using multiple Nim threads will not make the Python interpreter multithreaded -> nimpy still has the same limitation: Python functions must be called from the main thread that instantiated the interpreter.
In a nutshell, multithreading addresses CPU bottlenecks and async addresses IO bottlenecks, so reading your question I think it's important to determine where your bottleneck is before choosing a solution.
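For instance, if the bottleneck is IO, the blocking call can be driven from an async layer via a thread-pool executor. A minimal Python sketch, where `blocking_io` is a placeholder for the slow call:

```python
import asyncio
import time

def blocking_io(x):
    # stands in for a blocking Python call that takes a while
    time.sleep(0.2)
    return x * 2

async def main():
    loop = asyncio.get_running_loop()
    start = time.monotonic()
    # run the blocking calls concurrently in the default thread pool
    results = await asyncio.gather(
        *(loop.run_in_executor(None, blocking_io, x) for x in range(5)))
    elapsed = time.monotonic() - start
    return results, elapsed

results, elapsed = asyncio.run(main())
print(results, f"{elapsed:.2f}s")
```

Five 0.2-second calls complete in roughly 0.2 seconds rather than 1 second, because the blocking work overlaps in the executor while the event loop stays free.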
Are you saying that each Nim thread shares a single Python process when calling with Nimpy?
In that case I'd rather rewrite the applicable parts of the Python library in Nim.
Are you saying that each Nim thread shares a single Python process when calling with Nimpy?
Not exactly. Nimpy has the same limitation as CPython and therefore has to go through the GIL for multithreading. A great example of how to use the GIL from Nimpy is here: https://thefriendlyghost.nl/unleashedgil/
If having a lock is okay for your use case then it's a potential solution. Since your issue seems to be IO bound, my gut feeling is that multithreading may not be the best solution for you; so I strongly advise measuring the added latency of a Python + Nimpy + GIL multithreaded solution versus an async one.
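One point worth noting: CPython releases the GIL during blocking IO, so IO-bound threads do overlap even though only one thread runs Python bytecode at a time. A small sketch illustrating this, with `time.sleep` standing in for the real IO:

```python
import threading
import time

def io_task():
    # time.sleep releases the GIL, so these calls overlap like real blocking IO
    time.sleep(0.3)

start = time.monotonic()
threads = [threading.Thread(target=io_task) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.monotonic() - start
print(f"4 IO-bound threads finished in {elapsed:.2f}s")
```

Run serially these four tasks would take about 1.2 seconds; with the GIL released during the sleeps they finish in roughly 0.3 seconds, which is why measuring the actual bottleneck matters before ruling threads out.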