Laguna-Dense CUDA · RFT arm = DPO

Initialized from EvanOLeary/laguna-xs2-dense-k8-cuda-sft-v2 (SFT-extended) and trained with DPO on SakanaAI CUDA Engineer Archive traces. For each task, the verified-correct, fastest kernel is preferred over an incorrect or slow one, using the archive's Correct and CUDA_Speedup_Native fields. β = 0.1; the reference policy is the frozen SFT-extended model. The preference pairs exploit the archive's evolutionary refinement ordering. Compared against offline GRPO on KernelBench-Lite L1 (K=4).
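
The pairing rule and objective described above can be sketched roughly as follows. This is a minimal illustration, not the training code from the repository: `Correct` and `CUDA_Speedup_Native` are the fields named in this card, while `Task_ID`, `CUDA_Code`, the helper names `build_pairs` and `dpo_loss`, and the tie-breaking rule for the rejected kernel (first incorrect kernel, else the slowest correct one) are assumptions made for the example. The loss is the standard DPO objective with β = 0.1, taking summed sequence log-probabilities from the policy and the frozen reference.

```python
import math
from collections import defaultdict

BETA = 0.1  # value stated in this card


def build_pairs(rows):
    """Group archive rows by task and emit at most one preference pair per task.

    Assumed row keys: Task_ID, CUDA_Code, Correct, CUDA_Speedup_Native.
    """
    by_task = defaultdict(list)
    for row in rows:
        by_task[row["Task_ID"]].append(row)

    pairs = []
    for task_id, kernels in by_task.items():
        correct = [k for k in kernels if k["Correct"]]
        incorrect = [k for k in kernels if not k["Correct"]]
        if not correct:
            continue  # no verified kernel to prefer for this task

        # Chosen: the verified-correct kernel with the highest speedup.
        chosen = max(correct, key=lambda k: k["CUDA_Speedup_Native"])

        # Rejected (assumed rule): an incorrect kernel if one exists,
        # otherwise the slowest correct kernel.
        if incorrect:
            rejected = incorrect[0]
        else:
            rejected = min(correct, key=lambda k: k["CUDA_Speedup_Native"])
            if rejected is chosen:
                continue  # only one distinct candidate, no pair possible

        pairs.append({
            "task": task_id,
            "chosen": chosen["CUDA_Code"],
            "rejected": rejected["CUDA_Code"],
        })
    return pairs


def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=BETA):
    """Standard DPO loss for one pair, given summed sequence log-probs."""
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    # -log(sigmoid(margin)) written as a numerically stable softplus(-margin).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
```

When the policy and the reference assign identical log-probabilities, the margin is zero and the loss is log 2; it falls as the policy shifts probability toward the chosen kernel relative to the reference.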

https://github.com/Tyronita/laguna-dense-cuda-kernels

Safetensors · Model size: 3B params · Tensor type: BF16

Model tree for EvanOLeary/laguna-xs2-dense-k8-cuda-dpo