I'm seeing some strange C compilation speed issues when compiling Nim code (devel, Ubuntu 24.04, gcc 13.3.0).
Here is an example that generates a lot of C compilation targets:
# test.nim
import std/uri
import chronos
import chronicles
import websock/websock

proc main() {.async.} =
  let ws = WebSocket.connect(parseUri("ws://localhost:8080"))

waitFor main()
I've instrumented the C compilation stage to get the total compilation time and extract the commands like this:
diff --git a/compiler/extccomp.nim b/compiler/extccomp.nim
index e6e35f462..f274ea407 100644
--- a/compiler/extccomp.nim
+++ b/compiler/extccomp.nim
@@ -993,7 +993,13 @@ proc callCCompiler*(conf: ConfigRef) =
       script.add("\n")
 
   if optCompileOnly notin conf.globalOptions:
+    for cmd in cmds:
+      echo cmd
+    let tStart = getTime()
     execCmdsInParallel(conf, cmds, prettyCb)
+    let tEnd = getTime()
+    let dt = 1e-3*(tEnd - tStart).inMilliseconds.float
+    stderr.writeLine ">>> compiled in ", dt.formatFloat(ffDecimal, 3)
   if optNoLinking notin conf.globalOptions:
     # call the linker:
     var objfiles = ""
Then I'm running the compiler
nim c -f test.nim > cmds.txt
and then the commands via
# run_cmds.nim
import osproc, times, strformat

var cmds: seq[string]
for line in lines("cmds.txt"):
  cmds.add(line)

let tStart = getTime()
let ret = execProcesses(
  cmds, {poStdErrToStdOut, poUsePath, poParentStreams}, n=countProcessors())
if ret == 0:
  let tEnd = getTime()
  let dt = 1e-3*((tEnd - tStart).inMilliseconds).float
  echo ">>> compiled in {dt:.3f} s".fmt
else:
  echo ">>> failed"
The compiler reports around 1.7 seconds on my machine, while run_cmds.nim reports only 0.67 seconds for the same commands.
What could be the problem? Does the compiler set some environment variables that affect gcc?
Added a bit more instrumentation with a simple myExecProcesses.
diff --git a/compiler/extccomp.nim b/compiler/extccomp.nim
index e6e35f462..575fca10e 100644
--- a/compiler/extccomp.nim
+++ b/compiler/extccomp.nim
@@ -968,6 +968,34 @@ proc preventLinkCmdMaxCmdLen(conf: ConfigRef, linkCmd: string) =
   else:
     execLinkCmd(conf, linkCmd)
 
+proc myExecProcesses(
+    cmds: openArray[string],
+    options={poStdErrToStdOut, poParentStreams},
+    n=countProcessors()): int =
+  assert n > 0
+  result = 0
+
+  var queue = newSeq[Process]()
+
+  proc poll(q: var seq[Process], maxLen: int): int =
+    assert maxLen >= 0
+    result = 0
+    while q.len > maxLen:
+      var i = 0
+      while i < q.len:
+        if q[i].running:
+          i += 1
+          continue
+        let ret = q[i].peekExitCode
+        doAssert ret >= 0
+        result = max(result, ret)
+        q.del(i)
+
+  for cmd in cmds:
+    result = max(result, poll(queue, n - 1))
+    queue.add startProcess(cmd, options=options + {poEvalCommand})
+  result = max(result, poll(queue, 0))
+
 proc callCCompiler*(conf: ConfigRef) =
   var
     linkCmd: string = ""
@@ -993,7 +1021,34 @@ proc callCCompiler*(conf: ConfigRef) =
       script.add("\n")
 
   if optCompileOnly notin conf.globalOptions:
+    for cmd in cmds:
+      echo cmd
+
+    block:
+      let tStart = getTime()
+      discard myExecProcesses(
+        cmds,
+        {poStdErrToStdOut, poUsePath, poParentStreams},
+        n=countProcessors())
+      let tEnd = getTime()
+      let dt = 1e-3*(tEnd - tStart).inMilliseconds.float
+      stderr.writeLine ">>> compiled in ", dt.formatFloat(ffDecimal, 3)
+
+    block:
+      let tStart = getTime()
+      discard execProcesses(
+        cmds,
+        {poStdErrToStdOut, poUsePath, poParentStreams},
+        n=countProcessors())
+      let tEnd = getTime()
+      let dt = 1e-3*(tEnd - tStart).inMilliseconds.float
+      stderr.writeLine ">>> compiled in ", dt.formatFloat(ffDecimal, 3)
+
+    let tStart = getTime()
     execCmdsInParallel(conf, cmds, prettyCb)
+    let tEnd = getTime()
+    let dt = 1e-3*(tEnd - tStart).inMilliseconds.float
+    stderr.writeLine ">>> compiled in ", dt.formatFloat(ffDecimal, 3)
   if optNoLinking notin conf.globalOptions:
     # call the linker:
     var objfiles = ""
The reported compile times are identical for all three variants, but the same commands executed in parallel outside of the nim executable finish faster.
Another piece of info: the difference isn't noticeable with a few large C files, but it is noticeable with many small ones. Any ideas what is going on?
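Since the gap shows up with many small files, one way to narrow it down is to take gcc out of the picture and time only the process launches. The following is a minimal sketch, not part of the original experiments: timeLaunches is a hypothetical helper, and "true" is just an assumed no-op command available on the PATH. If the time per launch differs between a small program like this and the same code run inside the compiler, the overhead is in process creation rather than in gcc.

```nim
import std/[osproc, monotimes, times]

proc timeLaunches(n: int): float =
  ## Hypothetical helper: runs `n` no-op shell commands in parallel through
  ## execProcesses and returns the elapsed wall-clock time in seconds.
  var cmds = newSeq[string](n)
  for c in cmds.mitems:
    c = "true"  # assumed no-op command; any trivial command would do
  let t0 = getMonoTime()
  let rc = execProcesses(
    cmds, {poStdErrToStdOut, poUsePath, poParentStreams}, n = countProcessors())
  doAssert rc == 0
  result = (getMonoTime() - t0).inMicroseconds.float * 1e-6

when isMainModule:
  let n = 200
  let secs = timeLaunches(n)
  echo "launched ", n, " processes in ", secs, " s"
```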
Ok, this one
diff --git a/compiler/extccomp.nim b/compiler/extccomp.nim
index e6e35f462..8232b1609 100644
--- a/compiler/extccomp.nim
+++ b/compiler/extccomp.nim
@@ -993,7 +993,29 @@ proc callCCompiler*(conf: ConfigRef) =
       script.add("\n")
 
   if optCompileOnly notin conf.globalOptions:
+    for cmd in cmds:
+      echo cmd
+
+    block:
+      let tStart = getTime()
+      let p = startProcess(
+        "xargs -I{} -P$1 sh -c '{}'" % [$countProcessors()],
+        options={poEvalCommand})
+      let inp = p.inputStream
+      for cmd in cmds:
+        inp.writeLine(cmd)
+      inp.flush()
+      inp.close()
+      let ret = p.waitForExit()
+      let tEnd = getTime()
+      let dt = 1e-3*(tEnd - tStart).inMilliseconds.float
+      stderr.writeLine ">>> compiled in ", dt.formatFloat(ffDecimal, 3)
+
+    let tStart = getTime()
     execCmdsInParallel(conf, cmds, prettyCb)
+    let tEnd = getTime()
+    let dt = 1e-3*(tEnd - tStart).inMilliseconds.float
+    stderr.writeLine ">>> compiled in ", dt.formatFloat(ffDecimal, 3)
   if optNoLinking notin conf.globalOptions:
     # call the linker:
     var objfiles = ""
actually shows different compilation times: the xargs run is consistent with executing the commands outside of nim. So it looks like there is some issue with process handling in the version of Nim used to build the compiler.
Figured out that the problem lies in using fork instead of posix_spawn; this change fixes things:
diff --git a/lib/pure/osproc.nim b/lib/pure/osproc.nim
index e7f82face..bb16a917e 100644
--- a/lib/pure/osproc.nim
+++ b/lib/pure/osproc.nim
@@ -944,8 +944,9 @@ elif not defined(useNimRtl):
       pStdin, pStdout, pStderr, pErrorPipe: array[0..1, cint]
       options: set[ProcessOption]
 
-  const useProcessAuxSpawn = declared(posix_spawn) and not defined(useFork) and
-                             not defined(useClone) and not defined(linux)
+  # const useProcessAuxSpawn = declared(posix_spawn) and not defined(useFork) and
+  #                            not defined(useClone) and not defined(linux)
+  const useProcessAuxSpawn = true
   when useProcessAuxSpawn:
     proc startProcessAuxSpawn(data: StartProcessData): Pid {.
       raises: [OSError], tags: [ExecIOEffect, ReadEnvEffect, ReadDirEffect, RootEffect], gcsafe.}
But why is fork used on Linux in the first place? startProcess calls exec right after creating the child process anyway.
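A possible explanation for the slowdown, which I have not verified against the compiler itself, is that fork has to duplicate the parent's page tables, so each launch gets more expensive as the parent's memory grows, and the compiler holds a sizeable heap by the time it reaches the C stage. The sketch below tries to reproduce that effect in isolation. spawnMany, the 256 MiB ballast and the "true" command are all assumptions made for illustration; on Linux with an unpatched osproc, startProcess goes through the fork path shown in the diff above.

```nim
import std/[osproc, monotimes, times]

proc spawnMany(n: int): float =
  ## Hypothetical helper: starts `n` short-lived processes one after another
  ## and returns the elapsed wall-clock time in seconds.
  let t0 = getMonoTime()
  for _ in 0 ..< n:
    let p = startProcess("true", options = {poUsePath})
    discard p.waitForExit()
    p.close()  # release the pipes and the process handle
  result = (getMonoTime() - t0).inMicroseconds.float * 1e-6

when isMainModule:
  let n = 200
  let small = spawnMany(n)

  # Assumed stand-in for the compiler's heap: allocate 256 MiB and touch
  # one byte per 4 KiB so the memory is actually resident.
  var ballast = newSeq[byte](256 * 1024 * 1024)
  for i in countup(0, ballast.high, 4096):
    ballast[i] = 1

  let large = spawnMany(n)
  echo "small parent: ", small, " s"
  echo "large parent: ", large, " s (ballast bytes: ", ballast.len, ")"
```

If the second number is clearly larger with the stock osproc and the two converge once useProcessAuxSpawn is forced to true, that would point at fork's per-launch cost rather than at anything gcc-specific.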