What do you use for monitoring your Nim applications? How do you get crash dumps and read them? What rules do you need to stick to in order to catch bugs in production?
Does try/except save you? Share your experience.
Monitoring: anything that works with C should also work with Nim, so it comes down to personal preference. :) I like to use DTrace, Valgrind and the SIGINFO signal. That's all I need.
Crashdumps: again, anything that supports C should work. But, in my opinion, a much better option is to catch exceptions and print something useful for the user or the developer.
Try/except: hamburger code (wrapping everything in a single try/except) works pretty well to catch any exception, but I think it is better to catch each exception type separately. That produces more useful messages, especially if you distribute your programs only as binaries.
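For illustration, a minimal sketch of catching each exception type separately (the `loadConfig` proc and the file name are made up for the example):

```nim
proc loadConfig(path: string): string =
  ## Hypothetical helper that can fail in more than one way.
  readFile(path)

when isMainModule:
  try:
    echo loadConfig("settings.cfg")
  except IOError as e:
    # A specific, user-friendly message instead of a bare traceback.
    echo "Can't read the configuration file: ", e.msg
  except OSError as e:
    echo "Operating system error while loading the configuration: ", e.msg
  except CatchableError as e:
    # Last-resort handler for anything unexpected.
    echo "Unexpected error: ", e.msg
```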
Personally I prefer to have most high-level procedures decorated with the {.raises: [].} pragma and to allow exceptions only in low-level code, which is also decorated with a raises pragma. Something like this: https://github.com/thindil/nish/blob/trunk/src/nish.nim
Debugging with gcc + Valgrind.
For exception handling I simply annotate the highest-level function of whatever API I'm building with `{.raises: [].}` and the compiler will kindly tell you which exceptions are not handled.
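A small sketch of the idea (the proc names are invented for the example): low-level code is allowed to raise and declares it, while the top-level proc carries `{.raises: [].}`, so the compiler refuses to build until every possible exception is handled:

```nim
import std/strutils

proc parsePort(s: string): int {.raises: [ValueError].} =
  ## Low-level code: allowed to raise, and says so in its signature.
  result = parseInt(s)
  if result notin 1 .. 65535:
    raise newException(ValueError, "port out of range: " & s)

proc run(portStr: string) {.raises: [].} =
  ## High-level entry point: the empty raises list makes the compiler
  ## reject this proc if any exception could escape unhandled.
  try:
    echo "listening on port ", parsePort(portStr)
  except ValueError as e:
    echo "invalid port: ", e.msg

when isMainModule:
  run("8080")
```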
If you are interested in monitoring a long-running service, there is a new kid in town: https://grafana.com/oss/alloy-opentelemetry-collector/
I have not yet tried it out, but I plan to. Maybe there is someone who has already kicked the tires, and can share first impressions with us?
Nim programs do not tend to die slowly. If there are no bugs, they live forever. But the more usual story (at least for me) is that they suffer a sudden death, leaving no trace. Therefore I consider testing much more important than monitoring. Try to simulate the most common workflows under heavy load and throw some random monkeys (fuzzing) into the game. Watch out for the program going into an infinite loop, ending up in a segmentation fault, or accessing a variable from multiple threads without synchronization. By avoiding just these three conditions, silent deaths become rare and monitoring then makes more sense.
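To illustrate the "random monkeys" part, a minimal fuzz-style loop; `parseRecord` is just a stand-in for whatever input-handling code you want to hammer:

```nim
import std/[random, strutils]

proc parseRecord(data: string) =
  ## Placeholder for the real input-handling code under test.
  discard data.splitLines()

when isMainModule:
  randomize()
  for i in 1 .. 100_000:
    # Build a random byte string and feed it to the parser.
    var input = newString(rand(1 .. 256))
    for j in 0 ..< input.len:
      input[j] = chr(rand(0 .. 255))
    try:
      parseRecord(input)
    except CatchableError as e:
      echo "iteration ", i, " raised ", e.name, ": ", e.msg
```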
For crash dumps, https://github.com/status-im/nim-libbacktrace allows you to collect stack traces without the significant runtime overhead that Nim has when --stacktrace is enabled. In particular, --stacktrace causes a lot of overhead for all code, not just at the collection point, while the library makes this essentially zero-cost performance-wise (it reads line information from the debug information generated by gcc).
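If I read the nim-libbacktrace README correctly, wiring it up looks roughly like this; treat the exact flags as an assumption and check the project's documentation:

```nim
# Compile with native debug info and the stack-trace override, roughly:
#   nim c --debugger:native -d:nimStackTraceOverride --import:libbacktrace app.nim
# (flags recalled from the nim-libbacktrace README; verify them there)

proc boom() =
  raise newException(ValueError, "something went wrong")

when isMainModule:
  try:
    boom()
  except CatchableError as e:
    # getStackTrace is the regular stdlib call; with the override in place
    # it is backed by libbacktrace instead of Nim's own bookkeeping.
    echo e.msg
    echo e.getStackTrace()
```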
Regarding metrics/OpenTelemetry, we instrument our code with https://github.com/status-im/nim-metrics - the outcome can be seen here, for example: https://metrics.status.im/d/pgeNfj2Wz23/nimbus-fleet-testnets?orgId=1&from=now-12h&to=now&var-instance=geth-09.ih-eu-mda1.nimbus.holesky&var-container=beacon-node-holesky-libp2p&refresh=15m. As you can see, we can also keep track of the memory usage of specific Nim types over time - this is invaluable for tracking down memory leaks (which usually happen when some "root object" holds on to references and prevents the GC from collecting them).
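For flavor, basic instrumentation with nim-metrics looks roughly like this, going by the project's README (the metric names are made up; collection is only enabled when you compile with -d:metrics):

```nim
import metrics  # the nim-metrics package

declareCounter requests_total, "total number of handled requests"
declareGauge queue_length, "current length of the work queue"

proc handleRequest() =
  requests_total.inc()
  queue_length.set(42)

when isMainModule:
  handleRequest()
```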
Regarding exception handling, Nim exceptions are fine if they're never raised (raising one is very expensive) and never caught (it's very hard to remember to catch them in the right places - code that tries to do this usually ends up having lots of bugs). For production and library code we avoid them and use Result instead - see https://status-im.github.io/nim-style-guide/errors.html. In scripts and short-running restartable processes, where showing a call stack to the user is acceptable UX, you might make a different tradeoff.
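A small sketch of the Result style, assuming the nim-results package (the proc is invented; see the style guide linked above for the real conventions):

```nim
import results  # the nim-results package

type PortResult = Result[int, string]

proc parsePort(s: string): PortResult =
  ## Errors are values: nothing is raised, the caller must inspect the result.
  var port = 0
  for c in s:
    if c notin {'0' .. '9'}:
      return PortResult.err("not a number: " & s)
    port = port * 10 + ord(c) - ord('0')
  if port notin 1 .. 65535:
    return PortResult.err("port out of range: " & s)
  PortResult.ok(port)

when isMainModule:
  let port = parsePort("8080")
  if port.isOk:
    echo "using port ", port.get()
  else:
    echo "bad configuration: ", port.error
```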
For performance tuning, VTune, gcc and all the other tools built for C continue to work. Sometimes it's useful to pass --lineDir:off so that these tools show the generated C code instead of the Nim code, which helps when you're hunting for mysterious sources of excessive memory allocations, stack usage or copying (Nim introduces a lot of temporaries to express its semantics in C).
A bit of an out-of-the-box approach: it may be worth considering a language-agnostic setup, where analytics, error handling, monitoring, logging, etc. are done the same way for all languages. That's a huge reduction in complexity.
So the question itself, "how do you monitor Java", may be the wrong one; it may be more optimal to look at your services as universal computing nodes and handle all nodes the same way, no matter which language each one uses, be it Java, Node.JS or Nim.
P.S.
Also, the same applies to web servers. Basically you can implement a web server in any language as, say, a Node.JS/C++/whatever front server managing N workers started as child processes, sending them Requests on STDIN and reading Responses from STDOUT, in JSON or any EfficientBinaryFormat. Again - a huge reduction in complexity. As for performance, in 95% of cases there will be no difference, or even an improvement.
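A toy sketch of the worker side of such a protocol in Nim, one JSON request per line on STDIN and one JSON response per line on STDOUT (the "id"/"method"/"result" field names are made up for this sketch):

```nim
import std/json

# Toy worker: read one JSON request per line from stdin, answer on stdout.
for line in stdin.lines:
  if line.len == 0:
    continue
  try:
    let request = parseJson(line)
    let id = request{"id"}.getInt(0)            # 0 if the field is missing
    let meth = request{"method"}.getStr("unknown")
    echo $(%*{"id": id, "result": "handled " & meth})
  except CatchableError as e:
    echo $(%*{"error": e.msg})
```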