Autoresearch for GPU kernels. Give it any PyTorch model, go to sleep, wake up to optimized Triton kernels.
Production-grade autoresearch, ideal for GPU kernels, ML model development, feature engineering, prompt engineering, and other optimizable code.
Learn Triton by building FlashAttention from scratch — V2 kernels, persistent threads, mask DSL, profiling toolkit, bilingual docs
Automatic Triton kernel generation and optimization for Intel GPU, powered by Claude Code.
Noeris: autonomous kernel-fusion discovery plus Triton autotuning for LLM kernels, with deeper fusion of Gemma layers (A100/H100 wins).
Domain-specific fine-tuned code model for AMD ROCm GPU kernel optimization. SFT + GRPO trained on MI300X; reports a 14% gain versus hand-tuned CUDA. 🤗 HF Space: https://huggingface.co/spaces/XMRTDAO/rocm-kernel-tuner
Skill pack for custom PyTorch MPS kernels on Apple Silicon (examples, tests, and optimization patterns).
Optimize PyTorch GPU kernels by autonomously profiling, extracting, and improving Triton or CUDA C++ code for better performance and efficiency.
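Several of the projects above center on the same core loop: benchmark a kernel under a grid of candidate configurations (block sizes, warp counts) and keep the fastest, the way Triton's `@triton.autotune` decorator does. A minimal, CPU-only sketch of that search, using a hypothetical blocked-sum function as a stand-in for a real GPU kernel (the `blocked_sum`/`autotune` names are illustrative, not from any of the listed repos):

```python
import time

import numpy as np


def blocked_sum(x, block_size):
    """Stand-in 'kernel': reduce x in tiles of block_size elements."""
    return sum(float(x[i:i + block_size].sum())
               for i in range(0, len(x), block_size))


def autotune(kernel, x, candidates, repeats=3):
    """Time each candidate config and return the fastest one.

    This is the simplified essence of an autotuner: run the kernel a few
    times per configuration, keep the best observed latency, and pick the
    winning configuration for subsequent launches.
    """
    best_cfg, best_time = None, float("inf")
    for cfg in candidates:
        times = []
        for _ in range(repeats):
            start = time.perf_counter()
            kernel(x, cfg)
            times.append(time.perf_counter() - start)
        if min(times) < best_time:
            best_cfg, best_time = cfg, min(times)
    return best_cfg


x = np.random.rand(1 << 16).astype(np.float32)
best = autotune(blocked_sum, x, candidates=[64, 256, 1024, 4096])
print("best block size:", best)
```

Real autotuners (Triton's included) add caching of the winning config per input shape and prune the search space with heuristics, but the benchmark-and-select loop is the same.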