I am developing a website that does some computations on user inputs, which can take a long time depending on the input. So I want my server to just give up if it takes too long.
None of my functions access global data, except for reading global let vars. So I thought I could just run the computation in a separate thread, and if that thread is still running after a given time, I just kill it.
I already have that implemented with the following helper function (stripped down a bit, with some irrelevant parts removed):
import std/os  # for sleep
import std/private/threadtypes
import mummy, mummy/routers

proc cancel(thread: SysThread) {.importc: "pthread_cancel", header: "<pthread.h>".}

var router*: Router

const timeoutTime = 4_000

proc post*(url: string, cb: proc(req: string): string) =
  type Args = tuple
    req: Request
    cb: proc(req: string): string

  proc threadCbHandler(args: Args) {.thread, nimcall.} =
    {.gcsafe.}:
      args.req.respond(200, body = args.cb(args.req.body))

  router.post(url) do (request: Request):
    {.gcsafe.}:
      var thread: Thread[Args]
      createThread(thread, threadCbHandler, (request, cb))
      sleep timeoutTime
      if thread.running:
        cancel thread.handle
        request.respond(422)
This solution seems to work fine, but every now and then my server stops responding and I have to restart it. So I thought maybe this helper is the cause. Could it leak memory?
thanks in advance,
chol
Keep in mind that cancelling a thread like this won't release memory. Nim, or you, could theoretically set up a thread cancellation clean-up handler. But AFAIK this is not implemented anywhere at the moment.
Another issue, and what I suspect is the cause of your issues, is the fact that you access global memory without locks and share data between threads without any concern about where it's freed. This likely causes some illegal state that leaves your program spinning.
In general, cooperative cancellation is easier to debug (you pass a shouldAbort: ptr Atomic[bool] as a parameter that you check regularly), yet still incredibly hard: https://gist.github.com/Matthias247/ffc0f189742abf6aa41a226fe07398a8
And preemptive cancellation, via pthread_cancel or signals, requires you to install signal handlers on your threads. Also, signals are OS-specific.
I suggest taking a different approach tbh. Could the work being done have a timeout it checks as it tries to make progress, and if the timeout is exceeded that causes it to just raise an exception or something?
There are multiple algorithms and most of them are quite complex. So I try to avoid having to place time checks everywhere.
Another issue, and what I suspect is the cause for you issues, is the fact that you access global memory without locks and share data between threads without any concerns about where it's freed. This likely causes some illegal state which leaves your program spinning.
The global data that I access is never written to (except once at program startup). I would have made them const if they werent recursive data types. So I dont think I could run into race conditions (?) and memory-leaks shouldnt be a problem either, since that data should live the whole program livespan anyways. Or did I missunderstood your concern and there are some other potential issues?
You could always set the cancellation type to asynchronous, to just have it terminate anywhere. But you'd still need cleanup handlers. I still agree though that handling this with an abort mechanism is much better.
I will look into that.
If that's too complicated I might take a completely different approach and compile the whole computation logic into a separate program that my server runs for every request, so I can safely just kill that process.