Hello folks,
Another success story for Nim:
@Vindaar developed a Nim-to-CUDA runtime compilation pipeline here: https://github.com/mratsim/constantine/pull/487
This allows us to write CUDA kernels in Nim that are specialized at runtime.
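To illustrate what "specialized at runtime" can mean, here is a minimal sketch that is not Constantine's actual mechanism. A value known only at runtime (here a hypothetical 32-bit modulus) is baked into CUDA C source as a literal constant before that source would be handed to a runtime GPU compiler such as NVRTC (that step is omitted). The proc `specializeAddModKernel` and the `addMod` kernel are made up for illustration.

```nim
import std/strutils

proc specializeAddModKernel(modulus: uint32): string =
  ## Hypothetical helper (not part of Constantine): returns CUDA C source for an
  ## element-wise modular addition kernel whose modulus is fixed at generation time.
  ## Assumes inputs a[i], b[i] are already reduced, i.e. < modulus.
  ## Compiling and launching the source (e.g. via NVRTC) is out of scope here.
  const kernelTemplate = """
__global__ void addMod(unsigned int* out, const unsigned int* a,
                       const unsigned int* b, int n) {
  const unsigned int M = $1u;  // runtime value baked in as a constant
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    // 64-bit sum avoids overflow for any 32-bit modulus
    unsigned long long s = (unsigned long long)a[i] + b[i];
    out[i] = (unsigned int)(s >= M ? s - M : s);
  }
}
"""
  # strutils `%` replaces $1 with the decimal modulus
  result = kernelTemplate % [$modulus]
```

Because `M` becomes a literal in the generated source, the GPU compiler sees it as a compile-time constant and can optimize around it, which a kernel taking the modulus as a runtime argument would not allow.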
Of particular interest is our use of an intermediate GPU AST: https://github.com/mratsim/constantine/blob/0900bc0/constantine/math_compiler/experimental/nim_ast_to_cuda_ast.nim#L11-L38
```nim
import std / [macros, strutils, sequtils, options, sugar, tables, strformat]

type
  GpuNodeKind = enum
    gpuVoid          # Just an empty statement. Useful to not emit anything
    gpuProc          # Function definition (both device and global)
    gpuCall          # Function call
    gpuTemplateCall  # Call to a Nim template
    gpuIf            # If statement
    gpuFor           # For loop
    gpuWhile         # While loop
    gpuBinOp         # Binary operation
    gpuVar           # Variable declaration
    gpuAssign        # Assignment
    gpuIdent         # Identifier
    gpuLit           # Literal value
    gpuArrayLit      # Literal array constructor `[1, 2, 3]`
    gpuPrefix        # Prefix e.g. `-`
    gpuBlock         # Block of statements
    gpuReturn        # Return statement
    gpuDot           # Member access (a.b)
    gpuIndex         # Array indexing (a[b])
    gpuTypeDef       # Type definition
    gpuObjConstr     # Object (struct) constructor
    gpuInlineAsm     # Inline assembly (PTX)
    gpuAddr          # Address of an expression
    gpuDeref         # Dereferences an expression
    gpuCast          # Cast expression
    gpuComment       # Just a comment
    gpuConstexpr     # A `constexpr`, i.e. compile time constant (Nim `const`)
```
This AST currently maps to CUDA, but could eventually target AMD, Apple Metal, OpenCL, Vulkan, ...
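To make the idea of an intermediate GPU AST concrete, here is a heavily reduced, hypothetical sketch, not Constantine's implementation: a Nim variant object over a handful of node kinds mirroring `gpuIdent`, `gpuLit`, `gpuBinOp`, `gpuAssign` and `gpuBlock`, plus a `toCuda` proc that prints CUDA C. The names `MiniNode`, `gIdent`, `gLit`, `gBinOp`, `gAssign`, `gBlock` and `toCuda` are made up for illustration. Under this design, another backend (OpenCL, Metal, ...) would be a second printer over the same tree.

```nim
import std/strutils

type
  # Hypothetical reduced subset of the GpuNodeKind enum from the post
  MiniKind = enum
    mkIdent, mkLit, mkBinOp, mkAssign, mkBlock

  MiniNode = ref object
    case kind: MiniKind
    of mkIdent, mkLit:
      val: string              # identifier name or literal text
    of mkBinOp:
      op: string               # operator, e.g. "+" or "*"
      lhs, rhs: MiniNode
    of mkAssign:
      dst, src: MiniNode
    of mkBlock:
      stmts: seq[MiniNode]

proc gIdent(s: string): MiniNode = MiniNode(kind: mkIdent, val: s)
proc gLit(s: string): MiniNode = MiniNode(kind: mkLit, val: s)
proc gBinOp(op: string, a, b: MiniNode): MiniNode =
  MiniNode(kind: mkBinOp, op: op, lhs: a, rhs: b)
proc gAssign(dst, src: MiniNode): MiniNode =
  MiniNode(kind: mkAssign, dst: dst, src: src)
proc gBlock(stmts: seq[MiniNode]): MiniNode =
  MiniNode(kind: mkBlock, stmts: stmts)

proc toCuda(n: MiniNode, indent = 0): string =
  ## CUDA C printer; expressions are fully parenthesized to sidestep precedence.
  let pad = spaces(indent)
  case n.kind
  of mkIdent, mkLit:
    result = n.val
  of mkBinOp:
    result = "(" & toCuda(n.lhs) & " " & n.op & " " & toCuda(n.rhs) & ")"
  of mkAssign:
    result = pad & toCuda(n.dst) & " = " & toCuda(n.src) & ";"
  of mkBlock:
    var body: seq[string]
    for s in n.stmts:
      body.add toCuda(s, indent + 2)   # nested statements are indented
    result = pad & "{\n" & body.join("\n") & "\n" & pad & "}"
```

For example, `toCuda(gBlock(@[gAssign(gIdent("y"), gBinOp("+", gIdent("a"), gLit("1")))]))` yields a braced block containing the statement `y = (a + 1);`.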
In practice, with very little code, we got a 5.3x improvement over a popular open-source GPU-accelerated cryptography library on a typical cryptography bottleneck (Poseidon2 Merkle trees, for the connoisseurs).
Note that besides runtime CUDA compilation, Constantine supports direct PTX (Nvidia assembly) emission inlined in LLVM IR.
Direct PTX was key for the team behind the popular DeepSeek R1 to accelerate their LLM (https://github.com/deepseek-ai/DeepGEMM).
Constantine also supports AMDGPU code generation, as mentioned here: https://forum.nim-lang.org/t/12184
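For readers unfamiliar with inline assembly in LLVM IR, here is a minimal, hypothetical sketch of what "PTX inlined in LLVM IR" looks like textually. It is only an illustration of the IR syntax, not how Constantine generates code. The Nim proc `inlinePtxBinOp` is made up; it emits an LLVM IR function whose body is a single PTX instruction wrapped in an `asm` call, using the NVPTX `r` (32-bit register) constraint.

```nim
proc inlinePtxBinOp(fnName, ptxOp: string): string =
  ## Hypothetical helper: emits textual LLVM IR for `i32 fnName(i32 a, i32 b)`
  ## whose body is one inline PTX instruction, e.g. ptxOp = "add.u32".
  ## In the asm string, $0 is the output and $1, $2 are the inputs;
  ## "=r,r,r" asks for 32-bit PTX registers for all three operands.
  result = "define i32 @" & fnName & "(i32 %a, i32 %b) {\n" &
           "entry:\n" &
           "  %r = call i32 asm \"" & ptxOp & " $0, $1, $2;\", \"=r,r,r\"(i32 %a, i32 %b)\n" &
           "  ret i32 %r\n" &
           "}\n"
```

The `$N` placeholders are substituted by LLVM with the registers it allocates, so the PTX instruction is emitted as-is while register allocation stays with the compiler.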
The work was initially sponsored by myself and then continued as part of Lita (https://www.lita.foundation/), where we develop an
to accelerate cryptographic workloads such as the ones introduced in Google Wallet last week: https://blog.google/products/google-pay/google-wallet-age-identity-verifications/