FastRPC is a lightweight, high-performance RPC library for Nim, designed for developers who want simple, fast, and reliable remote procedure calls without heavy dependencies.
FastRPC offers a clean API, efficient message serialization, and minimal overhead. It's multi-threaded but could be adapted to async pretty easily.
Originally it was a fork of nim-json-rpc, but I ended up rewriting almost the entire thing, targeting embedded devices. It was later phased out in favor of a much more complicated system using flatbuffers, which wasn't much (if any) faster and was much harder to work with — but luckily it was developed as an open source project.
It needed some cleanup before release to the public, so it lay fallow until I needed it for a new project.
BTW, here's what GPT5 says about the latency numbers I'm getting on my M3 (plenty good enough for my uses):
> **✅ Bottom Line**
>
> You’re in Cap’n Proto territory already.
>
> Cap’n Proto might edge you out on absolute lowest latency (especially p50 < 20 µs), but your new numbers are excellent and probably good enough for most workloads.
>
> If your use case isn’t ultra low-latency critical (like HFT or real-time control systems), there’s little practical gain in switching — you’re already at the point where application logic dominates latency, not transport.
| Metric | Your RPC Layer | Cap'n Proto (typical) |
|---|---|---|
| Average (p50) | ~46.8 µs | 10–30 µs |
| Standard Deviation | ~13 µs | ~5–10 µs |
| Max Observed | ~365 µs | <100 µs (usually) |
| Variance | 0.000176 ms² | Very low |
| Total Calls Tested | 10,000 | Varies (benchmarks) |
BTW, I'd like to move it to CBOR one day. Maybe morph it to support CoAP.
Note that the main performance increase is due to avoiding JsonNode and serializing directly into value objects. Allocations kill performance!
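The difference can be sketched in Python (illustrative only, not FastRPC's actual Nim code): decoding into a generic node tree allocates a heap object per value, while unpacking a known wire shape writes straight into final values.

```python
import json
import struct

# Generic-tree decoding: every value becomes its own heap object,
# analogous to a tree of JsonNode refs in Nim.
tree = json.loads('{"id": 7, "temp": 21}')   # dict plus boxed ints

# Direct decoding of a known shape: the parser fills final values
# with no intermediate tree at all (here, two little-endian int16s).
wire = struct.pack("<hh", 7, 21)
dev_id, temp = struct.unpack("<hh", wire)
assert (dev_id, temp) == (7, 21)
```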
> You could also have switched to my packedjson module
For sure - though the bigger change was re-implementing the nim-json-rpc router logic so it could pack and unpack directly from a stream/string rather than JsonNode trees or similar. That, and making a non-async TCP and UDP server. I probably wouldn't do that again.
Supporting packedjson would be trivial, and the performance would likely still be pretty decent - maybe 10x slower than the direct approach.
But I agree: why does everything have to be in JSON? It sucks.
Especially for embedded, where you want an int16 to be an int16, not actually a float where you end up with a 53-bit integer... or something.
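The 53-bit limit comes from IEEE-754 doubles, which many JSON decoders use for every number. A quick Python check (illustrative):

```python
import json

# IEEE-754 doubles have a 53-bit significand, so decoders that map
# every JSON number to a double silently corrupt larger integers.
big = 2**53 + 1
assert float(big) == float(2**53)   # the +1 vanishes as a double

# A decoder that keeps integers as integers (Python's json does,
# as do typical CBOR/MsgPack libraries) round-trips it exactly.
assert json.loads(json.dumps(big)) == big
```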
MsgPack and/or CBOR are pretty much ideal serialization specs, as I've found. The CBOR specification adds the last little bit MsgPack needed to become a universal TLV format.
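To illustrate the TLV shape, here's a hand-rolled sketch of CBOR's unsigned-integer encoding (not a real CBOR library): the initial byte carries a 3-bit major type and 5-bit additional info, and small integers need no extra bytes at all.

```python
import struct

# Minimal CBOR unsigned-integer encoder (major type 0), covering
# only the widths this sketch needs.
def cbor_uint(n: int) -> bytes:
    if n < 24:
        return bytes([n])                      # value fits in the initial byte
    if n < 0x100:
        return b"\x18" + bytes([n])            # additional info 24: uint8 follows
    if n < 0x10000:
        return b"\x19" + struct.pack(">H", n)  # additional info 25: uint16 follows
    raise NotImplementedError("wider integers omitted in this sketch")

assert cbor_uint(10) == b"\x0a"              # 1 byte on the wire
assert cbor_uint(1000) == b"\x19\x03\xe8"    # 3 bytes; JSON's "1000" takes 4
```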
std/json is indeed not a hard one to beat, but packedjson still creates a dynamic, heap-based JSON-like structure in memory, which in and of itself is an inefficient approach no matter how efficiently you do it. If a JSON file contains lots of numbers, for example, you'll end up with lots of little JsonNode instances that are not only exploded and fragmented in memory but also cause significant GC/ref-counting overhead.
Libraries like `nim-json-serialization` instead create object instances directly from the underlying stream of bytes, which fundamentally makes them more efficient for known types/JSON shapes. At that point, experience shows it doesn't greatly matter whether it's JSON, CBOR, protobuf, or MsgPack for decoding performance - the significant differences between them relate to their encoded size and, in some cases, their inherent security properties, not the parsing step.
> if a JSON file contains lots of numbers for example, you'll end up with lots of little JsonNode instances that not only is exploded and fragmented in memory but also causes significant GC/ref-counting overhead.
That's not how packedjson works, it does a single allocation for a full JSON tree, not individual nodes...
> heap-based json-like structure in memory which in and of itself is an inefficient approach no matter how efficiently you do it - if a JSON file contains lots of numbers for example, you'll end up with lots of little JsonNode instances that not only is exploded and fragmented in memory but also causes significant GC/ref-counting overhead.
Yep that’s the real performance killer.
It’s sort of similar to the N+1 issue with SQL ORMs. A single allocation is almost negligible, but N+1 allocations are terrible.
Though JSON will be noticeably slower due to the complexity of parsing numbers from text to binary, base64-encoding byte streams, handling value escaping, and so on.
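The byte-stream point is easy to quantify (Python sketch; the CBOR framing here is hand-rolled for illustration):

```python
import base64
import json

payload = bytes(range(48))   # 48 raw bytes to ship

# JSON has no byte-string type, so binary data is base64-encoded
# (~4/3 growth) and then wrapped in a quoted, escapable string.
as_json = json.dumps({"data": base64.b64encode(payload).decode()})

# CBOR carries bytes directly: here, major type 2 with a uint8
# length (initial byte 0x58), then the raw payload.
as_cbor = b"\x58" + bytes([len(payload)]) + payload

assert len(base64.b64encode(payload)) == 64   # 48 bytes -> 64 chars, pre-quoting
assert len(as_cbor) == 2 + 48                 # 2-byte header + raw bytes
```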
> That's not how packedjson works, it does a single allocation for a full JSON tree, not individual nodes...
Nice! I figured it used iterators or something.
A single allocation per request isn't terrible performance-wise; the network syscall overhead usually still dominates in that case. Still, it leaves you at least 10x slower than a native method call.
Could be handy to adapt packedjson to MsgPack/CBOR. There are cases where handling an unknown payload is useful.
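A TLV format makes unknown payloads cheap to dispatch on: peek at the initial byte's major type. A hand-rolled Python sketch (not packedjson or FastRPC code):

```python
# CBOR's initial byte packs a 3-bit major type in its top bits, so
# an unknown payload can be classified without any schema.
MAJOR_TYPES = {0: "uint", 1: "negint", 2: "bytes", 3: "text",
               4: "array", 5: "map", 6: "tag", 7: "simple/float"}

def peek_type(payload: bytes) -> str:
    return MAJOR_TYPES[payload[0] >> 5]

assert peek_type(b"\x19\x03\xe8") == "uint"   # uint16-encoded integer
assert peek_type(b"\xa1") == "map"            # one-entry map header
```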
CBOR can do sub-CBOR streams naturally, though, so it's not too bad to eagerly serialize! FastRPC uses that trick for streaming responses.
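One way to frame an eagerly serialized sub-message is CBOR tag 24 ("encoded CBOR data item"), which wraps a pre-encoded payload in a byte string so it can be spliced into the outer message verbatim. A hand-rolled sketch (FastRPC's actual framing isn't shown in this thread):

```python
# Tag 24 is encoded as 0xd8 0x18 (major type 6, uint8 tag number),
# followed by a byte string holding the already-encoded item.
def wrap_embedded(encoded_item: bytes) -> bytes:
    assert len(encoded_item) < 24                # short byte strings only, for the sketch
    header = bytes([0x40 | len(encoded_item)])   # major type 2, length in initial byte
    return b"\xd8\x18" + header + encoded_item

inner = b"\x19\x03\xe8"                          # a pre-encoded CBOR uint (1000)
assert wrap_embedded(inner) == b"\xd8\x18\x43\x19\x03\xe8"
```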