llama-cpp
Here are 330 public repositories matching this topic...
Local AI anywhere, for everyone — LLM inference, chat UI, voice, agents, workflows, RAG, and image generation. No cloud, no subscriptions.
Updated May 14, 2026 · Python
Run larger LLMs with longer contexts on Apple Silicon by using differentiated precision for KV cache quantization. KVSplit enables 8-bit keys & 4-bit values, reducing memory by 59% with <1% quality loss. Includes benchmarking, visualization, and one-command setup. Optimized for M1/M2/M3 Macs with Metal support.
Updated May 21, 2025 · Python
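The "59%" figure in the KVSplit blurb above can be reproduced from llama.cpp's block-quantization formats: q8_0 stores 32 int8 values plus one fp16 scale (34 bytes per 32 elements, i.e. 8.5 bits/element) and q4_0 stores 32 4-bit values plus one fp16 scale (18 bytes per 32 elements, i.e. 4.5 bits/element), against an fp16 baseline of 16 bits/element. A quick sketch of the arithmetic:

```python
# Where "reducing memory by 59%" comes from, assuming llama.cpp's
# q8_0 / q4_0 block formats and an fp16 KV-cache baseline.

FP16_BITS = 16.0
Q8_0_BITS = 34 * 8 / 32   # 8.5 bits/element (32 int8 + fp16 scale)
Q4_0_BITS = 18 * 8 / 32   # 4.5 bits/element (32 x 4-bit + fp16 scale)

# Keys quantized to 8-bit, values to 4-bit; K and V are equal-sized.
mixed_bits = (Q8_0_BITS + Q4_0_BITS) / 2
reduction = 1 - mixed_bits / FP16_BITS
print(f"{reduction:.1%}")  # → 59.4%
```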
This repo showcases how to run a model locally and offline, free of OpenAI dependencies.
Updated Jul 12, 2024 · Python
A custom ComfyUI node for MiniCPM vision-language models, supporting v4, v4.5, and v4 GGUF formats, enabling high-quality image captioning and visual analysis.
Updated Aug 28, 2025 · Python
LLaMA Server combines the power of LLaMA C++ with the beauty of Chatbot UI.
Updated Jun 10, 2023 · Python
DocMind AI is a powerful, open-source Streamlit application leveraging LlamaIndex, LangGraph, and local Large Language Models (LLMs) via Ollama, LMStudio, llama.cpp, or vLLM for advanced document analysis. Analyze, summarize, and extract insights from a wide array of file formats, securely and privately, all offline.
Updated May 13, 2026 · Python
📚 Local PDF-Integrated Chat Bot: Secure Conversations and Document Assistance with LLM-Powered Privacy
Updated Mar 24, 2025 · Python
llm-inference is a platform for publishing and managing LLM inference, providing a wide range of out-of-the-box features for model deployment, such as a UI, RESTful API, auto-scaling, computing-resource management, monitoring, and more.
Updated May 17, 2024 · Python
BabyAGI-🦙: Enhanced for Llama models (running 100% local) and persistent memory, with smart internet search based on BabyCatAGI and document embedding in langchain based on privateGPT
Updated Jun 4, 2023 · Python
Demos of Google's Gemma models running locally on NVIDIA Jetson Orin Nano, from the Tokyo Dev Day (Gemma 2) to the latest Gemma 4 VLA agent with voice + vision.
Updated Apr 17, 2026 · Python
◉ Universal Intelligence: AI made simple.
Updated Apr 16, 2026 · Python
OpenVitamin is a local-first AI execution platform that unifies Agents, Workflows, and multi-model inference into a single programmable system — designed for building real, production-grade AI applications.
Updated Apr 14, 2026 · Python
Configs, launchers, benchmarks, and tooling for running Qwen3.5 GGUF models locally with llama.cpp on a 16GB NVIDIA GPU
Updated Apr 18, 2026 · Python
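A launcher like the one described above typically reduces to a single `llama-server` invocation. The model filename and context size below are illustrative, not taken from the repo; `-m`, `-c`, `-ngl`, `--host`, and `--port` are standard llama.cpp server flags:

```shell
# Hypothetical launch of a quantized Qwen GGUF model on a 16 GB GPU.
# -c sets the context window; -ngl 99 offloads all layers to the GPU.
llama-server \
  -m ./models/qwen-q4_k_m.gguf \
  -c 16384 \
  -ngl 99 \
  --host 127.0.0.1 --port 8080
```

The server then exposes an OpenAI-compatible HTTP API on the given host and port.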
Strix Halo local LLM guide: 63-97 t/s direct MoE on Ryzen AI MAX+ 395 / 128GB unified memory. Setup, model choices, benchmarks, and raw evidence.
Updated May 10, 2026 · Python
Local character AI chatbot with Chroma vector-store memory, plus scripts to process documents for Chroma.
Updated Oct 7, 2024 · Python
Lightweight Modular AI Routing Engine for Local LLMs — Run specialised experts efficiently on consumer GPUs using smart Mixture-of-Experts routing.
Updated Apr 6, 2026 · Python
Local diagnostic CLI for NVIDIA DGX Spark (GB10). Detects power caps, unified memory pressure, thermal risk, Docker/runtime issues, and validates vLLM/Ollama/llama.cpp/SGLang recipes.
Updated Apr 24, 2026 · Python