In a long-running application I periodically call (Linux) executables and collect their return code and output. Sometimes waitForExit raises an exception with the message "Invalid argument", but stdout can still be collected afterwards.
I tried to reproduce the crash. The application below crashes after a few seconds. With fewer threads, it takes longer to crash.
Any tips? Did I find a bug?
Nim Compiler Version 2.0.0 [Linux: amd64]
Compiled at 2023-08-01
Copyright (c) 2006-2023 by Andreas Rumpf
git hash: a488067a4130f029000be4550a0fb1b39e0e9e7c
active boot switches: -d:release
sleep202308173453.nim
import os, strutils
let p = paramStr(1).parseInt
# echo paramStr(1)
sleep(p)
quit 0
waitForExiterror202308172930.nim
import osproc, random, os
randomize()
proc th(foo: bool) {.thread.} =
  while true:
    var ra = rand(1000)
    var rb = rand(1000)
    var rc = rand(100)
    let cmd = getAppDir() / "sleep202308173453 " & $ra
    var pr = startProcess(
      command = cmd,
      options = {poEvalCommand}
    )
    sleep(rc)
    try:
      let exitCode = pr.waitForExit(rb)
      var output = ""
      for line in pr.lines():
        output.add line & "\n"
    except:
      echo "CRASH"
      echo getCurrentExceptionMsg()
      echo "ra:", ra
      echo "rb:", rb
      echo "rc:", rc
      quit()
    pr.close()
var threads: array[127, Thread[bool]]
for idx in 0 .. threads.len-1:
  createThread(threads[idx], th, true)
while true:
  sleep(1000)
I'd guess you're hitting some kind of system limit. I have this status line script: https://pastebin.com/4Sz0wr21. Sometimes IO operations just fail, though in my case the error is less ambiguous: the exception message is "OS error: Bad file descriptor". It shows up in the log file about once every 24 hours, but it can fail after a couple of seconds or minutes if I lower the sleep amount.
So I catch the exception and ignore it.
proc exec_cmd(c: Command) {.raises: [IOError, OSError], thread.} =
  while true:
    try:
      let
        (stdout, _) = execCmdEx(c.cmd)
        parsed = stdout.strip().split('\n', 1)
      c.display = parsed[0]
      if parsed.len > 1 and parsed[1].isColor(): c.color = Color(parsed[1])
    except IOError, OSError:
      dumplog($getTime() & " " & c.cmd & " Failed! MSG: " & getCurrentException().msg)
    sleep(c.interval)
In your case, you should probably restart the process when the exception occurs.
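A rough sketch of what "restart on exception" could look like, assuming the failure surfaces as an OSError from waitForExit. The helper name runWithRetry and the maxAttempts parameter are made up for illustration and are not part of osproc:

```nim
import std/osproc

proc runWithRetry(cmd: string, timeoutMs: int, maxAttempts = 3): int =
  ## Hypothetical helper: starts `cmd` through the shell and waits for it.
  ## If waitForExit raises OSError, the command is started again,
  ## up to `maxAttempts` times. Returns -1 if every attempt failed.
  result = -1
  for attempt in 1 .. maxAttempts:
    let pr = startProcess(cmd, options = {poEvalCommand})
    try:
      # Leaves the proc on success; the finally branch still runs.
      return pr.waitForExit(timeoutMs)
    except OSError:
      echo "attempt ", attempt, " failed: ", getCurrentExceptionMsg()
    finally:
      # Release the pipes and handles of this attempt.
      pr.close()

when isMainModule:
  echo "exit code: ", runWithRetry("sleep 0.1", 2000)
```

Note that this sketch does not kill a child that is still running after a failed wait, so a real version would probably want to handle that before retrying.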
pr.peekExitCode indeed returns the correct exit code (if the process was not killed by a signal).
Then maybe I could even get by without restarting the application.
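Something along these lines is what I have in mind; just a sketch, not tested against the crash itself. waitOrPeek is a name I made up, and it assumes the error arrives as an OSError:

```nim
import std/osproc

proc waitOrPeek(pr: Process, timeoutMs: int): int =
  ## Hypothetical helper: waits for the process as usual, but if
  ## waitForExit raises OSError, asks peekExitCode instead.
  ## peekExitCode returns -1 while the process is still running.
  try:
    result = pr.waitForExit(timeoutMs)
  except OSError:
    result = pr.peekExitCode()

when isMainModule:
  let pr = startProcess("sleep 0.1", options = {poEvalCommand})
  echo "exit code: ", waitOrPeek(pr, 2000)
  pr.close()
```

If the fallback returns -1, the process has not exited yet, so the caller would still have to decide whether to wait again or kill it.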