When building something with Nim, we're looking at several minutes of compile time for many of our projects. In large part, this is due to the design of the compiler and language, which requires a global view of the application: the compiler can't compile modules separately but must instead process all modules together. There is some work underway on incremental compilation, which may or may not end up faster, but a large chunk of the slowness turns out to be simply the compiler itself, which has plenty of room for improvement. Ever wondered what it's doing?
Here's the top of a VTune hotspot profile covering the first minute or so of compiling Nimbus:
Function / Call Stack                           CPU Time  Instructions Retired
interiorAllocatedPtr__NuzKjA4SX9afyji9cHHIuKpQ     13.2%        18,482,825,586
rawAlloc__mE4QEVyMvGRVliDWDngZCQ                    5.0%        10,616,810,721
newObj                                              4.9%        10,749,863,778
collectZCT__EN6T32AMm3va9bsrdxtF0cg                 4.1%         7,704,340,832
doOperation__sl6eqhLncFedgwzv6TlMVw                 3.9%         6,381,249,734
markStackAndRegisters__U6T7JWtDLrWhtmhXSoy9a6g      3.7%         5,376,577,880
[vmlinux]                                           2.3%         2,390,688,855
transform__FpyLDebN7eBB2pkKKmjXJg                   2.0%         3,547,992,357
rawDealloc__K7uQ6aTKvW6OnOV8EMoNNQ                  2.0%         3,797,918,125
rawExecute__hemGrN9b53Mp9aLYjv1tCS5g                1.9%         4,101,776,876
copyTree__Dsjo9bte8vGxzhtcSrsTyiQ_2                 1.7%         2,467,009,537
__memset_avx2_unaligned_erms                        1.7%         2,964,855,898
genLiteral__PEuKCZcy9a56kIfBOLoHU5Q                 1.6%         1,891,781,214
matchesAux__jWX5qJnM9cS16h0kw9aDyhrg                1.5%         1,881,562,474
isOnStack__plOlFsQAAvcYd3nF5LfWzw                   1.5%         3,010,897,164
skipTypes__zsqmUNR5OZrTUna0Y9bdu9bg                 1.4%         1,918,790,960
__memmove_avx_unaligned_erms                        1.3%         2,029,660,892
unsureAsgnRef                                       1.3%         2,783,168,218
toCChar__JTr4d3QfIoJwmoCY9bN9adqQ                   1.2%         2,631,730,402
collectCTBody__XHio9cMpnLoH7GyCj1Z9besg_2           1.1%         1,876,488,115
addChar                                             1.1%         2,173,179,662
Marker_tyRef__fKfcLzXYiz5jNu3NH3Tv8Q                1.1%         1,995,347,114
By and large, the biggest share of the compiler's time goes to allocating memory and garbage-collecting it: the various garbage collection functions add up to roughly 30%, with another 10% spent allocating. This profile is from Nim 1.2.12 compiling nimbus_beacon_node.
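As a rough cross-check of those totals, here is the arithmetic over the table above in Python. Which rows count as garbage collection and which as allocation is an assumption made for this sketch (the profiler doesn't label them); `rawDealloc`, `memset` and `memmove` are deliberately left out of both buckets, even though some of that time is probably memory-management related too.

```python
# CPU-time shares (percent) copied from the VTune hotspot table,
# with the mangled suffixes dropped for readability.
CPU_TIME = {
    "interiorAllocatedPtr": 13.2,
    "rawAlloc": 5.0,
    "newObj": 4.9,
    "collectZCT": 4.1,
    "doOperation": 3.9,
    "markStackAndRegisters": 3.7,
    "[vmlinux]": 2.3,
    "transform": 2.0,
    "rawDealloc": 2.0,
    "rawExecute": 1.9,
    "copyTree": 1.7,
    "__memset_avx2_unaligned_erms": 1.7,
    "genLiteral": 1.6,
    "matchesAux": 1.5,
    "isOnStack": 1.5,
    "skipTypes": 1.4,
    "__memmove_avx_unaligned_erms": 1.3,
    "unsureAsgnRef": 1.3,
    "toCChar": 1.2,
    "collectCTBody": 1.1,
    "addChar": 1.1,
    "Marker_tyRef": 1.1,
}

# Assumed grouping of rows into "garbage collection" and "allocation".
GC_FUNCS = {
    "interiorAllocatedPtr",
    "collectZCT",
    "doOperation",
    "markStackAndRegisters",
    "isOnStack",
    "unsureAsgnRef",
    "collectCTBody",
    "Marker_tyRef",
}
ALLOC_FUNCS = {"rawAlloc", "newObj"}


def share(funcs):
    """Sum the CPU-time percentages of the given rows, rounded to 0.1%."""
    return round(sum(CPU_TIME[f] for f in funcs), 1)


gc_share = share(GC_FUNCS)
alloc_share = share(ALLOC_FUNCS)
print(f"GC: {gc_share}%  allocation: {alloc_share}%")
# prints: GC: 29.9%  allocation: 9.9%
```

Folding `rawDealloc` into the GC bucket instead would raise that figure to 31.9%, which is still "roughly 30%".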
The first function that does something other than memory allocation or deallocation (setting aside the 2.3% of kernel time under [vmlinux]) is transform at 2%, which is the hash function that generates symbol names. Notably, very little useful "compile" work shows up near the top of the profile at all, so if you're looking for low-hanging fruit to help optimize Nim, memory usage and allocation are a good place to start.
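To see why naming shows up in a profile at all: the generated symbols in the table carry hash-looking suffixes (such as `__FpyLDebN7eBB2pkKKmjXJg`), so every name costs a hash computation. The Python sketch below only illustrates that idea. The signature string, MD5 via `hashlib`, and the base-62 alphabet are all assumptions, and `mangle` is a hypothetical helper, not a description of how the Nim compiler actually builds its names.

```python
import hashlib
import string

# Hypothetical alphabet that keeps the suffix a valid C identifier fragment.
ALPHABET = string.digits + string.ascii_letters  # 62 symbols


def encode_base62(n: int) -> str:
    """Render a non-negative integer using ALPHABET, most significant digit first."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, len(ALPHABET))
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def mangle(name: str, signature: str) -> str:
    """Hypothetical mangling: name + '__' + base-62 MD5 digest of the signature.

    Every call hashes its whole input, so a compiler that names many
    symbols this way pays for one hash per generated name.
    """
    digest = hashlib.md5(signature.encode("utf-8")).digest()
    return f"{name}__{encode_base62(int.from_bytes(digest, 'big'))}"


print(mangle("foo", "example.foo(int, string): bool"))
```

The output has the same shape as the names in the profile: `foo__` followed by at most 22 alphanumeric characters, since a 128-bit digest needs no more than 22 base-62 digits.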
Want to repro? It's trivial: clone the nimbus-eth2 repo and run a hotspot analysis with VTune. The profiler is free and easy to use; just start compiling and attach to the process. It's slow enough that you don't have to worry about it finishing :)
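If you'd rather script the attach step than click through the GUI, something along these lines should work on Linux. It assumes the compiler process is named `nim` and that VTune's command-line tool `vtune` is on your PATH; `newest_pid` and `vtune_hotspots_cmd` are hypothetical helpers, and treating the highest PID as the newest process is a heuristic that can fail after PID wrap-around.

```python
import os
from typing import List, Optional


def newest_pid(comm: str, proc_root: str = "/proc") -> Optional[int]:
    """Return the highest PID whose <proc_root>/<pid>/comm equals `comm` (Linux only).

    The highest PID stands in for "most recently started", which is only a
    heuristic. Returns None when nothing matches or procfs is unavailable.
    """
    try:
        entries = os.listdir(proc_root)
    except FileNotFoundError:  # not Linux, or no procfs mounted
        return None
    best = None
    for entry in entries:
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                name = f.read().strip()
        except OSError:  # process exited or is not readable
            continue
        if name == comm and (best is None or int(entry) > best):
            best = int(entry)
    return best


def vtune_hotspots_cmd(pid: int, seconds: int, result_dir: str) -> List[str]:
    """Build a VTune CLI invocation that attaches a hotspots collection to `pid`."""
    return [
        "vtune", "-collect", "hotspots",
        "-target-pid", str(pid),
        "-duration", str(seconds),
        "-result-dir", result_dir,
    ]
```

With the build running in another terminal (for example `make nimbus_beacon_node`, assuming that target name from the repo's Makefile), check that `newest_pid("nim")` returned a PID, pass `vtune_hotspots_cmd(pid, 60, "nim-hotspots")` to `subprocess.run`, and then open the result directory in the VTune GUI or summarize it with `vtune -report hotspots -result-dir nim-hotspots`.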