In my latest PR I managed to JIT compile from LLVM IR to AMD GPU:
https://github.com/mratsim/constantine/pull/453
Now the question becomes: can we do Nim -> AMD GPU (or Nvidia, as the AMD toolchain supports both)?
As we have LLVM-IR => AMD (and LLVM-IR => Nvidia) paths, we could use those as NLVM backends.
Alternatively, all GPU languages are C-like (AMD HIP, Nvidia CUDA, OpenCL) and WebGPU is C/Rust/OCaml/Nim-like, so it should be possible to add a Nim backend targeting those languages and then use runtime compilation:
See https://llvm.org/docs/AMDGPUUsage.html#processors
I develop on RDNA3, but anything from the Radeon HD 7790 (2014) to today should work.
Awesome, does that mean that I can write something like the following and have it run on the GPU?
proc reductionShader(env: GlEnvironment, barrier: BarrierHandle,
    buffers: Locker[tuple[input: seq[int32], output: Atomic[int32]]],
    smem: ptr seq[int32], n: uint) {.gcsafe.} =
  let localIdx = env.gl_LocalInvocationID.x
  let localSize = env.gl_WorkGroupSize.x
  let gridSize = localSize * 2 * env.gl_NumWorkGroups.x
  var globalIdx = env.gl_WorkGroupID.x * localSize * 2 + localIdx

  var sum: int32 = 0
  while globalIdx < n:
    # echo "ThreadId ", localIdx, " indices: ", globalIdx, " + ", globalIdx + localSize
    unprotected buffers as b:
      sum = sum + b.input[globalIdx] + b.input[globalIdx + localSize]
    globalIdx = globalIdx + gridSize
  smem[localIdx] = sum
  wait barrier

  var stride = localSize div 2
  while stride > 0:
    if localIdx < stride:
      # echo "Final reduction ", localIdx, " + ", localIdx + stride
      smem[localIdx] += smem[localIdx + stride]
    wait barrier # was memoryBarrierShared
    stride = stride div 2

  if localIdx == 0:
    unprotected buffers as b:
      atomicInc b.output, smem[0]
This year I read through CUDA by Example and decided that I wanted to try getting all of its examples working in Nim.
I started working on a library named Hippo that adds templates and macros for programming CUDA C or HIP in Nim. I got the basics working with multiple targets: CUDA on Nvidia, HIP -> ROCm, HIP -> CUDA, and CPU-only with HIP-CPU (handy for debugging). https://github.com/monofuel/hippo
I recently got a Nim PR merged adding backends for nvcc and hipcc, now available in Nim >= 2.1.9. Both CUDA C and HIP require using Nim's C++ backend.
It still needs a lot more work, but I'm amazed that I've made at least this much progress. Here's an example of a julia set generator using Hippo: https://github.com/monofuel/hippo/blob/master/tests/hip/julia.nim
My workflow has been to do the exercises from the book in CUDA C, port them to HIP (usually as easy as running hipify), and then rewrite them with Nim + Hippo. There is room for improvement to make the library more Nim-y, but things are at least working.
@planetis, sorry I missed your question. At the moment, no, because I write LLVM IR directly. However, using the technique from the following PR it should be possible: https://github.com/mratsim/constantine/pull/487. Note that it is CUDA-focused, but it should be straightforward to adapt to AMD.
@monofuel, I've seen your nvcc/hipcc PR and hippo, great work!