Tracked Repositories

129 open-source AI inference repositories across 61 organizations.

HuggingFace

4 repos·225.7k ★·88 commits this week

huggingface/transformers

HuggingFace Transformers: state-of-the-art NLP/ML model library

Python

160.8k

huggingface/diffusers

HuggingFace Diffusers — diffusion model inference & training (Stable Diffusion, Flux, etc.)

Python

33.7k

huggingface/candle

Minimalist Rust ML framework for inference — targets browser WASM and GPU, zero Python dependency

Rust

20.3k

huggingface/text-generation-inference (archived)

HuggingFace TGI — LLM serving (archived March 2026, read-only)

Python

10.9k

TensorFlow

3 repos·204.5k ★·305 commits this week

tensorflow/tensorflow

Industry-standard deep learning framework with XLA compilation backend

C++

195.2k

305 commits this week

tensorflow/serving

TensorFlow Serving — high-performance gRPC/REST serving for TF models (multi-version, canary, batching)

C++

6.4k

tensorflow/tflite-micro

TensorFlow Lite for microcontrollers and embedded devices

C++

2.9k

Ollama

1 repo·171.9k ★·12 commits this week

ollama/ollama

User-friendly local LLM runner built on llama.cpp

171.9k

ggml-org

2 repos·162.1k ★·196 commits this week

ggml-org/llama.cpp

High-performance LLM inference in C/C++ (CPU + GPU)

C++

112.1k

119 commits this week

ggml-org/whisper.cpp

High-performance Whisper speech recognition in C/C++

C++

50.0k

Open WebUI

1 repo·138.1k ★

open-webui/open-webui

Self-hosted ChatGPT alternative with built-in RAG, offline-capable

Python

138.1k

Meta / PyTorch

3 repos·109.1k ★·557 commits this week

pytorch/pytorch

Widely used ML framework; torch.compile and AOTInductor optimize models for production inference

Python

100.1k

440 commits this week

pytorch/executorch

PyTorch's portable execution framework for on-device inference

Python

4.6k

117 commits this week

pytorch/serve (archived)

TorchServe — production PyTorch model serving (archived August 2025)

Java

4.4k

DeepSeek AI

1 repo·103.6k ★

deepseek-ai/DeepSeek-V3

Reference inference code for DeepSeek-V3 (671B-parameter MoE), with FP8 and BF16 weight support

Python

103.6k

vLLM Project

2 repos·80.7k ★·225 commits this week

vllm-project/vllm

Widely adopted open-source LLM serving engine; PagedAttention, continuous batching

Python

80.7k

207 commits this week

vllm-project/vllm-gaudi

vLLM community plugin for Intel Gaudi accelerators

Python

Google AI Edge

12 repos·77.6k ★·238 commits this week

google-ai-edge/mediapipe

Cross-platform ML pipeline framework (vision, audio, NLP)

C++

35.3k

google-ai-edge/gallery

App showcasing on-device generative AI models

Kotlin

23.2k

google-ai-edge/LiteRT-LM

LiteRT for language model inference

C++

5.1k

google-gemma/cookbook

Official Gemma model cookbook — recipes, fine-tuning, deployment guides

Jupyter Notebook

3.6k

google-ai-edge/mediapipe-samples

Sample apps using MediaPipe

Jupyter Notebook

2.7k

google-ai-edge/LiteRT

Google's Lite Runtime (successor to TensorFlow Lite)

C++

2.4k

google/XNNPACK

Highly optimized neural network operators library (ARM, x86, WASM)

2.3k

google-ai-edge/model-explorer

Model visualization and exploration tool

JavaScript

1.5k

google-ai-edge/litert-torch

Converts PyTorch models to LiteRT for on-device inference

Jupyter Notebook

1.0k

google-ai-edge/litert-samples

Sample code for LiteRT

Kotlin

314

google-ai-edge/ai-edge-quantizer

Quantization tooling for AI Edge models

Python

133

google-ai-edge/models-samples

Sample models for AI Edge

Jupyter Notebook

Nomic AI

1 repo·77.4k ★

nomic-ai/gpt4all

Desktop AI app + SDK for running LLMs locally

C++

77.4k

Apple / ML-Explore

10 repos·57.9k ★·22 commits this week

ml-explore/mlx

Array framework for ML on Apple silicon, with Python and C++ APIs

C++

26.3k

ml-explore/mlx-examples

Example models and applications using MLX

Python

8.6k

maderix/ANE

Reverse-engineered Apple Neural Engine (ANE) — hardware ops, memory layout, firmware interactions

Objective-C

6.7k

ml-explore/mlx-lm

LLM inference and fine-tuning with MLX

Python

5.4k

apple/coremltools

Tools for converting & running models with Core ML

Python

5.3k

ml-explore/mlx-swift-examples

Example apps using MLX Swift

Swift

2.6k

ml-explore/mlx-swift

Swift bindings for MLX

C++

1.8k

ml-explore/mlx-swift-lm

LLM inference in Swift via MLX

Swift

517

ml-explore/mlx-data

Efficient data loading for MLX

C++

473

ml-explore/mlx-c

C bindings for MLX

C++

205

BerriAI

1 repo·47.8k ★·116 commits this week

BerriAI/litellm

Unified OpenAI-compatible proxy for 100+ LLM providers (vLLM, Ollama, Bedrock, Azure, etc.)

Python

47.8k

116 commits this week

Oobabooga

1 repo·47.2k ★·55 commits this week

oobabooga/text-generation-webui

Gradio web UI for LLMs, multi-backend (llama.cpp, ExLlamaV2, transformers)

Python

47.2k

Mudler (LocalAI)

1 repo·46.4k ★·56 commits this week

mudler/LocalAI

Free, open-source OpenAI drop-in replacement; runs locally, no GPU required

46.4k

Exo Explore

1 repo·44.8k ★·7 commits this week

exo-explore/exo

Run LLMs distributed across heterogeneous devices (Mac, iPhone, etc.)

Python

44.8k

Ray Project

1 repo·42.6k ★·95 commits this week

ray-project/ray

Distributed AI compute engine; Ray Serve handles online and async batch inference

Python

42.6k

DeepSpeed AI

1 repo·42.4k ★·11 commits this week

deepspeedai/DeepSpeed

Microsoft DeepSpeed — distributed training and inference (ZeRO, MII, FastGen)

Python

42.4k

Microsoft / ONNX

2 repos·41.4k ★·70 commits this week

onnx/onnx

Open Neural Network Exchange format specification

Python

20.9k

microsoft/onnxruntime

Microsoft's cross-platform, high-performance ONNX inference engine

C++

20.6k

LM-Sys

1 repo·39.5k ★

lm-sys/FastChat

LLM serving framework and home of Chatbot Arena

Python

39.5k

JAX (Google DeepMind)

1 repo·35.7k ★·145 commits this week

jax-ml/jax

Composable NumPy transformations (JIT, grad, vmap) compiled via XLA to GPUs and TPUs — primary DeepMind research/production runtime

Python

35.7k

145 commits this week

Miscellaneous

4 repos·31.8k ★·29 commits this week

mlc-ai/mlc-llm

MLC's universal LLM deployment engine (multi-backend)

Python

22.7k

tile-ai/tilelang

Tile-based domain-specific language and compiler for high-performance GPU/CPU kernels

Python

6.3k

Anemll/Anemll

Open-source project for porting LLMs to the Apple Neural Engine (ANE)

Python

1.6k

waybarrios/vllm-mlx

vLLM-style inference on Apple silicon via MLX

Python

1.2k

SGLang

1 repo·28.1k ★·345 commits this week

sgl-project/sglang

High-throughput LLM/VLM serving with RadixAttention and structured generation

Python

28.1k

345 commits this week

Tencent

2 repos·27.9k ★·15 commits this week

Tencent/ncnn

High-performance neural network inference for mobile (Android/iOS)

C++

23.3k

Tencent/TNN

Tencent Neural Network — mobile and edge inference

C++

4.6k

NVIDIA

3 repos·27.1k ★·150 commits this week

NVIDIA/TensorRT-LLM

NVIDIA's optimized LLM inference library (GPU)

Python

13.7k

148 commits this week

NVIDIA/TensorRT

NVIDIA's high-performance deep learning inference SDK (GPU)

C++

13.0k

NVIDIA/TensorRT-Edge-LLM

C++ LLM/VLM inference runtime for Jetson and NVIDIA edge devices

Python

408

Mozilla AI

1 repo·24.5k ★·2 commits this week

mozilla-ai/llamafile

Single-file LLM executables via Cosmopolitan Libc; zero install, all platforms

C++

24.5k

Triton Language (OpenAI)

1 repo·19.2k ★·25 commits this week

triton-lang/triton

Python-like GPU kernel language and compiler, used by vLLM, FlashAttention, and PyTorch Inductor

MLIR

19.2k

MLC AI

1 repo·18.0k ★·1 commit this week

mlc-ai/web-llm

High-performance LLM inference in web browsers via WebGPU

TypeScript

18.0k

KVCache AI

1 repo·17.2k ★·3 commits this week

kvcache-ai/ktransformers

CPU-GPU hybrid inference; runs DeepSeek 671B on 14GB VRAM + 382GB DRAM, with reported large speedups over llama.cpp

Python

17.2k

Alibaba

1 repo·15.2k ★·13 commits this week

alibaba/MNN

Alibaba's neural network inference framework for mobile & edge

C++

15.2k

jundot

1 repo·14.8k ★·48 commits this week

jundot/omlx

LLM inference server with continuous batching & SSD caching for Apple Silicon — managed from the macOS menu bar

Python

14.8k

Apache

2 repos·13.8k ★·10 commits this week

apache/tvm

Apache TVM ML compiler; auto-tunes models for a wide range of hardware targets

Python

13.4k

apache/tvm-ffi

Apache TVM Foreign Function Interface for deep learning compilation

C++

397

Blaizzy (Community MLX)

5 repos·13.1k ★·40 commits this week

Blaizzy/mlx-audio

Audio models (TTS, ASR) with MLX

Python

7.1k

Blaizzy/mlx-vlm

Vision-language models on Apple silicon via MLX

Python

4.8k

Blaizzy/mlx-audio-swift

Swift audio inference using MLX

Swift

629

Blaizzy/mlx-embeddings

Text embedding models with MLX

Python

385

Blaizzy/mlx-video

Video model inference with MLX

Python

230

K2 / Next-gen ASR

1 repo·12.4k ★·5 commits this week

k2-fsa/sherpa-onnx

ONNX-based runtime for ASR, TTS, VAD, and keyword spotting

C++

12.4k

OpenVINO Toolkit / Intel

3 repos·11.9k ★·87 commits this week

openvinotoolkit/openvino

Intel's toolkit for optimizing & deploying deep learning on Intel hardware

C++

10.3k

openvinotoolkit/nncf

Neural Network Compression Framework — quantization, pruning, sparsity for OpenVINO

Python

1.2k

openvinotoolkit/openvino.genai

OpenVINO GenAI: generative AI layer with speculative decoding and KV-cache optimization

C++

509

RunAnywhere

2 repos·11.9k ★

RunanywhereAI/runanywhere-sdks

RunAnywhere SDKs for on-device inference deployment

C++

10.3k

RunanywhereAI/RCLI

RunAnywhere CLI tool

C++

1.5k

Intel

2 repos·11.5k ★·4 commits this week

intel/ipex-llm (archived)

Intel IPEX-LLM — local LLM acceleration on Intel hardware (archived Jan 2026, read-only)

Python

8.8k

intel/neural-compressor

SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity

Python

2.6k

Mistral AI

1 repo·10.8k ★

mistralai/mistral-inference (archived)

Official minimal inference library for all Mistral models (7B, Mixtral, Pixtral)

Jupyter Notebook

10.8k

Triton Inference Server

1 repo·10.7k ★·6 commits this week

triton-inference-server/server

NVIDIA Triton — production multi-model inference server (HTTP/gRPC, multi-backend)

Python

10.7k

Dusty-NV (NVIDIA Jetson)

1 repo·8.9k ★

dusty-nv/jetson-inference

DNN inference library & tutorials for NVIDIA Jetson

C++

8.9k

BentoML

1 repo·8.7k ★

bentoml/BentoML

Unified serving framework: real-time APIs, task queues, batching, multi-model chains

Python

8.7k

Nexa AI

1 repo·8.1k ★

NexaAI/nexa-sdk

Unified SDK for running LLMs and multimodal models locally

Kotlin

8.1k

InternLM / Shanghai AI Lab

1 repo·7.9k ★·10 commits this week

InternLM/lmdeploy

High-throughput LLM serving with TurboMind engine (C++/CUDA)

Python

7.9k

PaddlePaddle (Baidu)

1 repo·7.3k ★

PaddlePaddle/Paddle-Lite

Lightweight inference engine for mobile & embedded from PaddlePaddle

C++

7.3k

AI Dynamo (NVIDIA)

1 repo·6.9k ★·158 commits this week

ai-dynamo/dynamo

Datacenter-scale distributed inference serving framework (Rust + Python, disaggregated prefill/decode, engine-agnostic)

Rust

6.9k

158 commits this week

ArgMax

4 repos·6.5k ★

argmaxinc/WhisperKit

On-device Whisper inference for Apple platforms (Swift)

Swift

6.1k

argmaxinc/whisperkittools

Python tooling for WhisperKit model optimization

Python

244

argmaxinc/OpenBench

On-device AI benchmarking framework

Jupyter Notebook

argmaxinc/argmax-sdk-swift-playground

Swift playground for ArgMax SDK

Swift

Cactus Compute

5 repos·5.5k ★·1 commit this week

cactus-compute/cactus

Cactus's core on-device inference framework

5.2k

cactus-compute/cactus-react-native

React Native bindings for Cactus

C++

172

cactus-compute/cactus-flutter

Flutter bindings for Cactus

C++

cactus-compute/cactus-kotlin

Kotlin/Android bindings for Cactus

Kotlin

cactus-compute/demo-cactus-chat

Demo chat app using Cactus

TypeScript

Osaurus

1 repo·5.5k ★·95 commits this week

osaurus-ai/osaurus

Native macOS AI agent harness in Swift — any model, persistent memory, autonomous execution, MCP server, MLX + Apple Neural Engine, fully offline

Swift

5.5k

turboderp (ExLlamaV2)

1 repo·4.5k ★

turboderp-org/exllamav2

High-performance EXL2-quantized inference for consumer NVIDIA GPUs

Python

4.5k

OpenNMT

1 repo·4.5k ★·5 commits this week

OpenNMT/CTranslate2

Fast C++ inference for Transformer models; INT8/INT16 CPU quantization, multi-platform

C++

4.5k

OpenXLA

1 repo·4.3k ★·249 commits this week

openxla/xla

Compiler for JAX, TensorFlow, and PyTorch, targeting GPU, TPU, and CPU from a unified IR

C++

4.3k

249 commits this week

Luminal AI

1 repo·2.8k ★·17 commits this week

luminal-ai/luminal

Rust-based deep learning compiler with a small static graph IR for fast, portable inference (CUDA, Metal, CPU)

Rust

2.8k

Liquid AI

5 repos·2.8k ★·6 commits this week

Liquid4All/cookbook

Examples, tutorials and apps for Liquid AI LFM + LEAP SDK

Jupyter Notebook

2.0k

Liquid4All/liquid-audio

Speech-to-Speech audio models by Liquid AI

Python

511

Liquid4All/leap-finetune

Minimal fine-tuning repo for LFM2, fully open-source

Python

173

Liquid4All/LeapSDK-Examples

Example apps for LeapSDK

Kotlin

Liquid4All/docs

Liquid AI documentation

MDX

Fluid Inference

3 repos·2.2k ★·16 commits this week

FluidInference/FluidAudio

On-device audio inference framework

Swift

2.1k

FluidInference/mobius

Fluid Inference core runtime

Python

FluidInference/text-processing-rs

Rust text processing library for inference

Rust

Try Mirai

2 repos·1.7k ★·28 commits this week

trymirai/uzu

Mirai's on-device inference runtime

Rust

1.6k

trymirai/lalamo

Mirai's LLaMA-based on-device model

Python

UbiquitousLearning

1 repo·1.5k ★

UbiquitousLearning/mllm

Multimodal LLM inference framework for mobile & edge

C++

1.5k

Qualcomm

2 repos·1.5k ★·35 commits this week

qualcomm/ai-hub-models

State-of-the-art ML models optimized for Qualcomm Snapdragon NPU/DSP/QNN deployment

Python

1.0k

qualcomm/ai-hub-apps

Sample apps and tutorials for deploying models on Qualcomm hardware (TFLite, ONNX, QNN)

Python

412

ARM Software

1 repo·1.3k ★

ARM-software/armnn

ARM Neural Network SDK for ARM & Mali devices

C++

1.3k

AMD ROCm

4 repos·1.1k ★·101 commits this week

ROCm/aiter

AI Tensor Engine for ROCm — centralized repo for high-perf AI operators on AMD Instinct GPUs

Python

440

ROCm/AMDMIGraphX

AMD's graph inference engine for MI-series GPUs

C++

304

ROCm/flash-attention

ROCm fork of FlashAttention with Composable Kernel (CK) and Triton backends

Python

231

ROCm/ATOM

AiTer Optimized Model — lightweight vLLM-like server built on AITER kernels for ROCm

Python

NimbleEdge

2 repos·535 ★

NimbleEdge/deliteAI

NimbleEdge's deliteAI on-device inference framework

C++

533

NimbleEdge/executorch

NimbleEdge fork of ExecuTorch with edge optimizations

Python

Picovoice

1 repo·312 ★

Picovoice/picollm

Picovoice's on-device LLM inference engine

Python

312

Zetic AI

5 repos·63 ★

zetic-ai/ZETIC_Melange_apps

MLange sample applications

Swift

zetic-ai/zetic_mlange_ext

MLange extension library

C++

zetic-ai/ZeticMLangeiOS

iOS framework for MLange

Swift

zetic-ai/ZETIC_MLange_document

MLange SDK documentation

Python

zetic-ai/ZeticMLangeExtiOS

iOS extension framework for MLange

Swift
