nimforum mirror - Nim AST -> GPU AST -> Cuda Runtime compilation

mratsim (original) [2025-05-06T11:05:58+02:00]

Hello folks,

Another success story to add to Nim's list:

Twitter: https://x.com/0xLita/status/1919404858316325166

Writeup: https://www.lita.foundation/blog/nvrtc-cuda-poc-building-a-gpu-prover-with-runtime-compilation

@Vindaar developed a Nim-to-Cuda runtime compilation pipeline here: https://github.com/mratsim/constantine/pull/487

This allows us to write Cuda kernels in Nim that are specialized at runtime.
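To give a feel for what "specialized at runtime" means, here is a minimal sketch. It is not the pipeline from the PR (which goes through Nim macros and a GPU AST); it only uses plain string templating, and the kernel, the `@NUM_ROUNDS@` placeholder and the proc name are made up for illustration. The idea is that a value known only at runtime gets baked into the Cuda source as a constant before that source is handed to the runtime compiler (NVRTC).

```nim
import std/strutils

# Hypothetical Cuda kernel template. `@NUM_ROUNDS@` is a placeholder
# invented for this sketch, it is not a Cuda or NVRTC feature.
const kernelTemplate = """
extern "C" __global__ void mix(unsigned int* data, int n) {
  const int NUM_ROUNDS = @NUM_ROUNDS@;
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    unsigned int x = data[i];
    for (int r = 0; r < NUM_ROUNDS; ++r) {
      x = x * 1664525u + 1013904223u;
    }
    data[i] = x;
  }
}
"""

proc specializeKernel(numRounds: int): string =
  ## Returns Cuda source with the runtime value `numRounds` baked in
  ## as a constant, so the GPU compiler can unroll and fold around it.
  kernelTemplate.replace("@NUM_ROUNDS@", $numRounds)

when isMainModule:
  # In a real pipeline this string would be passed to NVRTC;
  # here we only print it.
  echo specializeKernel(8)
```

Because the constant is visible to the GPU compiler, it can unroll the loop and fold constants for that exact configuration, which is the point of compiling at runtime rather than ahead of time.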

Of particular interest is that we use an intermediary GPU AST: https://github.com/mratsim/constantine/blob/0900bc0/constantine/math_compiler/experimental/nim_ast_to_cuda_ast.nim#L11-L38

    import
      std / [macros, strutils, sequtils, options, sugar, tables, strformat]

    type
      GpuNodeKind = enum
        gpuVoid          # Just an empty statement. Useful to not emit anything
        gpuProc          # Function definition (both device and global)
        gpuCall          # Function call
        gpuTemplateCall  # Call to a Nim template
        gpuIf            # If statement
        gpuFor           # For loop
        gpuWhile         # While loop
        gpuBinOp         # Binary operation
        gpuVar           # Variable declaration
        gpuAssign        # Assignment
        gpuIdent         # Identifier
        gpuLit           # Literal value
        gpuArrayLit      # Literal array constructor `[1, 2, 3]`
        gpuPrefix        # Prefix e.g. `-`
        gpuBlock         # Block of statements
        gpuReturn        # Return statement
        gpuDot           # Member access (a.b)
        gpuIndex         # Array indexing (a[b])
        gpuTypeDef       # Type definition
        gpuObjConstr     # Object (struct) constructor
        gpuInlineAsm     # Inline assembly (PTX)
        gpuAddr          # Address of an expression
        gpuDeref         # Dereferences an expression
        gpuCast          # Cast expression
        gpuComment       # Just a comment
        gpuConstexpr     # A `constexpr`, i.e. compile time constant (Nim `const`)

The GPU AST currently maps to Cuda, but can ultimately map to AMD, Apple Metal, OpenCL, Vulkan, ...
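To illustrate why an intermediary AST makes retargeting easy, here is a toy sketch of the "GPU AST -> Cuda source" step. The node type, field names and `toCuda` proc below are hypothetical and cover only a handful of node kinds; the real definitions live in the file linked above and differ from this. A backend is essentially one recursive proc over the tree, so targeting another API would mean writing another such proc.

```nim
import std/strutils

type
  NodeKind = enum   # hypothetical subset, loosely mirroring the kinds above
    nkIdent, nkLit, nkBinOp, nkAssign, nkReturn, nkBlock

  Node = ref object
    case kind: NodeKind
    of nkIdent: name: string
    of nkLit: value: string
    of nkBinOp:
      op: string
      lhs, rhs: Node
    of nkAssign:
      target, source: Node
    of nkReturn:
      retVal: Node
    of nkBlock:
      stmts: seq[Node]

proc gIdent(name: string): Node = Node(kind: nkIdent, name: name)
proc gLit(value: string): Node = Node(kind: nkLit, value: value)
proc binOp(op: string, lhs, rhs: Node): Node =
  Node(kind: nkBinOp, op: op, lhs: lhs, rhs: rhs)

proc toCuda(n: Node, indent = 0): string =
  ## Recursively renders a node as Cuda C source text.
  ## Statements are indented by `indent` levels, expressions are not.
  let pad = repeat("  ", indent)
  case n.kind
  of nkIdent: result = n.name
  of nkLit: result = n.value
  of nkBinOp:
    result = "(" & toCuda(n.lhs) & " " & n.op & " " & toCuda(n.rhs) & ")"
  of nkAssign:
    result = pad & toCuda(n.target) & " = " & toCuda(n.source) & ";"
  of nkReturn:
    result = pad & "return " & toCuda(n.retVal) & ";"
  of nkBlock:
    var lines: seq[string]
    for s in n.stmts:
      lines.add toCuda(s, indent)
    result = lines.join("\n")

# Tree for: acc = acc + a * 2; return acc;
let body = Node(kind: nkBlock, stmts: @[
  Node(kind: nkAssign, target: gIdent("acc"),
       source: binOp("+", gIdent("acc"), binOp("*", gIdent("a"), gLit("2")))),
  Node(kind: nkReturn, retVal: gIdent("acc"))
])

when isMainModule:
  echo toCuda(body, 1)
```

In the actual pipeline the tree is built from Nim code by macros rather than by hand, and the emitted Cuda source is then compiled at runtime.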

In practice, with very short code, we got a 5.3x improvement over a popular open-source GPU-accelerated cryptography library on a typical cryptography bottleneck (Poseidon2 Merkle trees, for the connoisseurs).

Note that besides runtime Cuda compilation, Constantine supports direct PTX emission (Nvidia assembly) inlined in LLVM IR:

https://github.com/mratsim/constantine/blob/v0.2.0/constantine/platforms/llvm/asm_nvidia.nim#L325-L359

Hand-written PTX of this kind was key for the popular DeepSeek R1 team in accelerating their LLM (https://github.com/deepseek-ai/DeepGEMM).

Constantine also supports AMDGPU code generation, as mentioned here: https://forum.nim-lang.org/t/12184

The work was initially sponsored by myself, and then continued as part of Lita (https://www.lita.foundation/), where we develop:

- an LLVM-based compiler stack
- a cryptographic proof generation backend

to accelerate cryptographic workloads such as the one introduced in Google Wallet last week: https://blog.google/products/google-pay/google-wallet-age-identity-verifications/

