It seems that a lot of the profilers people use for their Nim executables aren't available for Macs, so what do you Mac users use? I've tried Nim's nimprof, but what I'm interested in is the time spent in each proc, not the number of times a proc was called.
I guess I could do something manually: record the time on entry to and exit from each proc, within the proc itself, add the difference to a global var, and then echo the totals for each proc at the end?
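A minimal sketch of that manual approach, assuming a single proc of interest. The names `work` and `totalInWork` are made up for illustration; only `getMonoTime` and `Duration` come from the standard library.

```nim
import std/[monotimes, times]

# Hypothetical global accumulator; one such variable per timed proc.
var totalInWork: Duration

proc work(n: int): int =
  let start = getMonoTime()              # time on entry
  for i in 1 .. n:
    result += i
  totalInWork += getMonoTime() - start   # time on exit, added to the global

discard work(1_000_000)
echo "time spent in work: ", totalInWork
```

Note that an early `return` or an exception would skip the last line of the proc, so this only counts calls that run to the end.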
Thanks in advance :)
I use Apple Instruments in Time Profiler mode. Compile with --debugger:native to see the Nim code in the profiler instead of the generated C code.
See example investigations:
I have written a command line sampling profiler for Nim that works on Windows, Linux and ... with some effort, Mac.
https://github.com/treeform/hottie
To use it on Mac you would have to generate a self-signed certificate and code sign the hottie executable, which is very annoying.
I have written a command line sampling profile
+1, that seems to be exactly what I want. Maybe it would be worth mentioning it in the docs for std/nimprof.
Since I'm on a Mac, I wanted to try writing a simple profiler myself that just spits out times.
E.g. if I have this example code (just a quick example that does nothing):
import std/sequtils

proc A(x: seq[int]) =
  for _ in 0 .. x.high:
    var y = x              # normal copy

proc B(x: seq[int]) =
  for _ in 0 .. x.high:
    var y = deepCopy(x)    # deep copy

var sq = toSeq(1..100)
A(sq)
B(sq)
... and I want to know how much time is spent in each proc, and also how much time deepcopy takes compared with a normal copy. For that I can use a template (which Elegantbeef showed me) and put a callTime "foo": block around whatever I want to time:
import std/[strformat, monotimes, times, tables, algorithm, sequtils]

# name -> (number of calls, total time)
var callTimes = initOrderedTable[string, (int, Duration)]()

template callTime(name: static string, body: untyped) =
  let start = getMonoTime()
  body
  let delta = getMonoTime() - start
  if name notin callTimes:
    callTimes[name] = (1, delta)
  else:
    callTimes[name][0] += 1
    callTimes[name][1] += delta

proc A(x: seq[int]) =
  callTime "proc A":
    for _ in 0 .. x.high:
      callTime "copy":
        var y = x

proc B(x: seq[int]) =
  callTime "proc B":
    for _ in 0 .. x.high:
      callTime "deepcopy":
        var y = deepCopy(x)

var sq = toSeq(1..100)
A(sq)
B(sq)

echo "\nSorted by number of calls (#calls, time):"
callTimes.sort(proc(x, y: (string, (int, Duration))): int = cmp(x[1][0], y[1][0]), Descending)
for k, v in callTimes:
  echo fmt"{k:<10}: {v}"

echo "\nSorted by total time in call (#calls, time):"
callTimes.sort(proc(x, y: (string, (int, Duration))): int = cmp(x[1][1], y[1][1]), Descending)
for k, v in callTimes:
  echo fmt"{k:<10}: {v}"

echo "\nSorted by total time divided by number of calls (time):"
callTimes.sort(proc(x, y: (string, (int, Duration))): int = cmp(x[1][1] div x[1][0], y[1][1] div y[1][0]), Descending)
for k, v in callTimes:
  echo fmt"{k:<10}: {v[1] div v[0]}"
When I compile with e.g. --gc:orc --deepcopy:on -d:release, I get output like this:
Sorted by number of calls (#calls, time):
copy      : (100, 3 microseconds and 827 nanoseconds)
deepcopy  : (100, 114 microseconds and 100 nanoseconds)
proc A    : (1, 18 microseconds and 351 nanoseconds)
proc B    : (1, 128 microseconds and 35 nanoseconds)

Sorted by total time in call (#calls, time):
proc B    : (1, 128 microseconds and 35 nanoseconds)
deepcopy  : (100, 114 microseconds and 100 nanoseconds)
proc A    : (1, 18 microseconds and 351 nanoseconds)
copy      : (100, 3 microseconds and 827 nanoseconds)

Sorted by total time divided by number of calls (time):
proc B    : 128 microseconds and 35 nanoseconds
proc A    : 18 microseconds and 351 nanoseconds
deepcopy  : 1 microsecond and 141 nanoseconds
copy      : 38 nanoseconds
... so now I know that most of the time is spent in proc B and that deepcopy is about 30x slower than a normal copy (1141 ns vs 38 ns per call).
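One thing to watch with a template like this: the bookkeeping runs after the body, so a `return` (or an exception) inside the timed block would skip it. A possible variant, as a sketch only, moves the bookkeeping into a `finally` branch. The names `callTimeSafe` and `firstNegative` are hypothetical and only there for illustration.

```nim
import std/[monotimes, times, tables]

# name -> (number of calls, total time)
var callTimes = initOrderedTable[string, (int, Duration)]()

template callTimeSafe(name: static string, body: untyped) =
  ## Hypothetical variant of callTime: the update sits in `finally`,
  ## so it also runs when the body returns early or raises.
  let start = getMonoTime()
  try:
    body
  finally:
    let delta = getMonoTime() - start
    if name notin callTimes:
      callTimes[name] = (1, delta)
    else:
      callTimes[name][0] += 1
      callTimes[name][1] += delta

proc firstNegative(xs: seq[int]): int =
  ## Returns the index of the first negative value, or -1 if there is none.
  callTimeSafe "firstNegative":
    for i, x in xs:
      if x < 0:
        return i          # early exit, still recorded
    return -1

echo firstNegative(@[3, 1, -4, 1, -5])
echo callTimes["firstNegative"]
```

Variables declared inside the timed block are now scoped to the `try`, which differs slightly from the original template.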
To flesh out @evoalg's idea, here is a simple tool that reports the number of calls to each function and the time spent in it.
You only need to add the {.meter.} pragma to each proc of interest:
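To give a rough idea of how a pragma like that can work, here is a minimal sketch of a proc-rewriting macro. This is not the code from the linked tracer; the names `timed`, `record` and `timings` are made up for this example, and it only tracks call counts and wall-clock time.

```nim
import std/[macros, monotimes, times, tables]

# Hypothetical registry: proc name -> (number of calls, total time).
var timings = initOrderedTable[string, (int, Duration)]()

proc record(name: string, delta: Duration) =
  ## Adds one call and its elapsed time to the registry.
  var entry = timings.getOrDefault(name, (0, DurationZero))
  entry[0] += 1
  entry[1] += delta
  timings[name] = entry

macro timed(procDef: untyped): untyped =
  ## Hypothetical pragma: rewrites the proc so its body runs inside a
  ## timing block that is recorded on every exit path.
  procDef.expectKind({nnkProcDef, nnkFuncDef})
  let nameStr = newLit($procDef.name)
  let body = procDef.body
  procDef.body = quote do:
    let start = getMonoTime()
    try:
      `body`
    finally:
      record(`nameStr`, getMonoTime() - start)
  result = procDef

proc work(n: int): int {.timed.} =
  for i in 1 .. n:
    result += i

discard work(1000)
discard work(10)
for name, entry in timings:
  echo name, ": ", entry[0], " calls, ", entry[1]
```

Because the key is just the proc name, overloads and same-named procs from different modules end up in one entry in this sketch.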
https://github.com/mratsim/constantine/blob/master/metering/tracer.nim
This gives a report like the following (note that there are proc name collisions):
|--------------------------------------------------|--------------|--------------------|---------------|-----------------|--------------------------|--------------------------|
| Procedures | # of Calls | Throughput (ops/s) | Time (µs) | Avg Time (µs) | CPU cycles (in billions) | Avg cycles (in billions) |
| UseAssembly | | | | | indicative only | indicative only |
|--------------------------------------------------|--------------|--------------------|---------------|-----------------|--------------------------|--------------------------|
|`+=`* | 11473| inf| 0.000| 0.000|
|`-=`* | 18603| 2067000000000.000| 0.009| 0.000|
|double* | 7212| 2404000000000.000| 0.003| 0.000|
|sum* | 21058| 7019333333333.333| 0.003| 0.000|
|diff* | 8884| 2961333333333.333| 0.003| 0.000|
|diff* | 10| inf| 0.000| 0.000|
|double* | 4186| inf| 0.000| 0.000|
|prod* | 14486| 1609555555555.555| 0.009| 0.000|
|square* | 16| inf| 0.000| 0.000|
|neg* | 2093| inf| 0.000| 0.000|
|neg* | 2050| inf| 0.000| 0.000|
|div2* | 512| inf| 0.000| 0.000|
|`*=`* | 5584| 620444444444.444| 0.009| 0.000|
|square* | 1116| inf| 0.000| 0.000|
|square_repeated* | 126| 1235294117.647| 0.102| 0.001|
|finalExpEasy* | 1| 5555555.556| 0.180| 0.180|
|cyclotomic_inv* | 5| 1000000000.000| 0.005| 0.001|
|cyclotomic_inv* | 1| inf| 0.000| 0.000|
|cyclotomic_square* | 6| 70588235.294| 0.085| 0.014|
|cyclotomic_square* | 309| 70499657.769| 4.383| 0.014|
|cycl_sqr_repeated* | 25| 5556790.398| 4.499| 0.180|
|millerLoopGenericBLS12* | 1| 279251.606| 3.581| 3.581|
|finalExpHard_BLS12* | 1| 178475.817| 5.603| 5.603|
|pairing_bls12* | 1| 105196.718| 9.506| 9.506|
|--------------------------------------------------|--------------|--------------------|---------------|-----------------|--------------------------|--------------------------|
I already tried to install valgrind on my Mac using brew, but it says:
valgrind: Linux is required for this software.
Error: valgrind: An unsatisfied requirement failed this build.
... and I also tried to install someone's workaround from their Git repo, but it said my macOS version is too recent and isn't supported.