Autoresearch for GPU kernels. Give it any PyTorch model, go to sleep, wake up to optimized Triton kernels.
Production-grade autoresearch, ideal for GPU kernels, ML model development, feature engineering, prompt engineering, and other optimizable code.
Learn Triton by building FlashAttention from scratch — V2 kernels, persistent threads, mask DSL, profiling toolkit, bilingual docs
Automatic Triton kernel generation and optimization for Intel GPU, powered by Claude Code.
Noeris: autonomous kernel-fusion discovery plus Triton autotuning for LLM kernels, with deeper fusion of Gemma layers (A100/H100 wins).
Domain-specific fine-tuned code model for AMD ROCm GPU kernel optimization. SFT + GRPO trained on MI300X; reports a 14% gain versus hand-tuned CUDA. 🤗 HF Space: https://huggingface.co/spaces/XMRTDAO/rocm-kernel-tuner
Skill pack for custom PyTorch MPS kernels on Apple Silicon (examples, tests, and optimization patterns).
Optimize PyTorch GPU kernels by autonomously profiling, extracting, and improving Triton or CUDA C++ code for better performance and efficiency.
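Several of the projects above center on the same core loop: benchmark a kernel under a grid of candidate configurations (block sizes, warp counts) and keep the fastest, the way Triton's `@triton.autotune` decorator does. A minimal, CPU-only sketch of that search, using a hypothetical blocked-sum function as a stand-in for a real GPU kernel (the `blocked_sum`/`autotune` names are illustrative, not from any of the listed repos):

```python
import time

import numpy as np


def blocked_sum(x, block_size):
    """Stand-in 'kernel': reduce x in tiles of block_size elements."""
    return sum(float(x[i:i + block_size].sum())
               for i in range(0, len(x), block_size))


def autotune(kernel, x, candidates, repeats=3):
    """Time each candidate config and return the fastest one.

    This is the simplified essence of an autotuner: run the kernel a few
    times per configuration, keep the best observed latency, and pick the
    winning configuration for subsequent launches.
    """
    best_cfg, best_time = None, float("inf")
    for cfg in candidates:
        times = []
        for _ in range(repeats):
            start = time.perf_counter()
            kernel(x, cfg)
            times.append(time.perf_counter() - start)
        if min(times) < best_time:
            best_cfg, best_time = cfg, min(times)
    return best_cfg


x = np.random.rand(1 << 16).astype(np.float32)
best = autotune(blocked_sum, x, candidates=[64, 256, 1024, 4096])
print("best block size:", best)
```

Real autotuners (Triton's included) add caching of the winning config per input shape and prune the search space with heuristics, but the benchmark-and-select loop is the same.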