I'm thinking of setting up a GitHub Actions bot to track metrics for each commit/merge. Basically, it should run the same benchmark code twice: once with a Nim compiler built without the commit and once with a Nim compiler built with the commit. Finally, it comments on that PR so that we have long-term results which we can look up at any time and find performance regressions more easily. I have an unfinished prototype here: https://github.com/nim-lang/Nim/pull/19941
Any ideas about how to make benchmarks more reliable or which metrics to collect? For a start, I propose to collect compilation time and memory usage for each commit/merge.
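To make the two proposed metrics concrete, here is a minimal sketch of measuring wall-clock time and peak memory for a single command. It is not taken from the prototype PR: the helper name `measure` is hypothetical, the command in the example is a stand-in for a real compiler invocation, and the approach assumes a Unix runner because it relies on `os.wait4`.

```python
import os
import subprocess
import sys
import time


def measure(cmd):
    """Run cmd once and return (wall_seconds, peak_rss, exit_code).

    Unix-only: os.wait4 returns the resource usage of the reaped child.
    ru_maxrss is reported in KiB on Linux and in bytes on macOS.
    """
    start = time.perf_counter()
    proc = subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    _, status, usage = os.wait4(proc.pid, 0)
    elapsed = time.perf_counter() - start
    # The child was reaped by wait4, so record its exit code on the Popen object.
    proc.returncode = os.waitstatus_to_exitcode(status)
    return elapsed, usage.ru_maxrss, proc.returncode


if __name__ == "__main__":
    # Stand-in command (assumption); the bot would pass a Nim compiler invocation here.
    seconds, peak_rss, code = measure([sys.executable, "-c", "pass"])
    print(f"exit={code} wall={seconds:.3f}s peak_rss={peak_rss}")
```

A single run like this is noisy, so the numbers are only useful once they are repeated and aggregated.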
There are many different metrics you could measure, but a good start is probably the compiler bootstrapping time.
The most important thing though is to use dedicated hardware to get consistent results. Doubt GitHub Actions will be very consistent here.
As a workaround for now, we can run the benchmark on the same machine with both compilers: one built without the latest commit and one built with it.
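One way this same-machine comparison could look is sketched below. It alternates the two commands within each round, so that slow drift in machine load affects both sides roughly equally, and compares medians rather than single runs. The function names, the `rounds` parameter, and the stand-in commands are assumptions for illustration, not part of the prototype.

```python
import statistics
import subprocess
import sys
import time


def time_once(cmd):
    """Return the wall-clock seconds taken by one run of cmd."""
    start = time.perf_counter()
    subprocess.run(
        cmd, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return time.perf_counter() - start


def compare(base_cmd, head_cmd, rounds=5):
    """Alternate base and head runs, then compare the median timings."""
    base, head = [], []
    for _ in range(rounds):
        base.append(time_once(base_cmd))
        head.append(time_once(head_cmd))
    base_median = statistics.median(base)
    head_median = statistics.median(head)
    return {
        "base_median": base_median,
        "head_median": head_median,
        # ratio > 1 means the head (with-commit) command was slower.
        "ratio": head_median / base_median,
    }


if __name__ == "__main__":
    # Stand-in commands (assumption); the bot would pass the two compiler
    # invocations here, one per compiler build.
    placeholder = [sys.executable, "-c", "pass"]
    print(compare(placeholder, placeholder, rounds=3))
```

Reporting the ratio instead of absolute times makes results from different runners somewhat more comparable, although it does not remove noise from a shared CI machine.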