Tracked Repositories

129 open-source AI inference repositories across 61 organizations.


HuggingFace

4 repos·225.7k·88 commits this week

HuggingFace Transformers — state-of-the-art NLP/ML model library

160.8k
57

HuggingFace Diffusers — diffusion model inference & training (Stable Diffusion, Flux, etc.)

33.7k
22

Minimalist Rust ML framework for inference — targets browser WASM and GPU, zero Python dependency

20.3k
9

HuggingFace TGI — LLM serving (archived March 2026, read-only)

10.9k

TensorFlow

3 repos·204.5k·305 commits this week

Industry-standard deep learning framework with XLA compilation backend

195.2k
305

TensorFlow Serving — high-performance gRPC/REST serving for TF models (multi-version, canary, batching)

6.4k

TensorFlow Lite for microcontrollers and embedded devices

2.9k

Ollama

1 repo·171.9k·12 commits this week

User-friendly local LLM runner built on llama.cpp

171.9k
12

ggml-org

2 repos·162.1k·196 commits this week

High-performance LLM inference in C/C++ (CPU + GPU)

112.1k
119

High-performance Whisper speech recognition in C/C++

50.0k
77

Open WebUI

1 repo·138.1k

Self-hosted ChatGPT alternative with built-in RAG, offline-capable

138.1k

Meta / PyTorch

3 repos·109.1k·557 commits this week

Primary ML framework; torch.compile + AOTInductor for production inference optimization

100.1k
440

PyTorch's portable execution framework for on-device inference

4.6k
117

TorchServe — production PyTorch model serving (archived August 2025)

4.4k

DeepSeek AI

1 repo·103.6k

Reference inference code for DeepSeek-V3 (671B MoE); includes FP8 training framework

103.6k

vLLM Project

2 repos·80.7k·225 commits this week

Most widely adopted open-source LLM serving engine; PagedAttention, continuous batching

80.7k
207

vLLM community plugin for Intel Gaudi accelerators

39
18

Google AI Edge

12 repos·77.6k·238 commits this week

Cross-platform ML pipeline framework (vision, audio, NLP)

35.3k
26

AI Edge model gallery

23.2k
17

LiteRT for language model inference

5.1k
50

Official Gemma model cookbook — recipes, fine-tuning, deployment guides

3.6k
3

Sample apps using MediaPipe

2.7k

Google's Lite Runtime (successor to TensorFlow Lite)

2.4k
59

Highly optimized neural network operators library (ARM, x86, WASM)

2.3k
40

Model visualization and exploration tool

1.5k
1

LiteRT integration with PyTorch

1.0k
5

Sample code for LiteRT

314
28

Quantization tooling for AI Edge models

133
9

Sample models for AI Edge

22

Nomic AI

1 repo·77.4k

Desktop AI app + SDK for running LLMs locally

77.4k

Apple / ML-Explore

10 repos·57.9k·22 commits this week

Array framework for ML on Apple silicon (Python)

26.3k
12

Example models and applications using MLX

8.6k

Reverse-engineered Apple Neural Engine (ANE) — hardware ops, memory layout, firmware interactions

6.7k

LLM inference and fine-tuning with MLX

5.4k

Tools for converting & running models with Core ML

5.3k
3

Example apps using MLX Swift

2.6k

Swift bindings for MLX

1.8k

LLM inference in Swift via MLX

517
7

Efficient data loading for MLX

473

C bindings for MLX

205

BerriAI

1 repo·47.8k·116 commits this week

Unified OpenAI-compatible proxy for 100+ LLM providers (vLLM, Ollama, Bedrock, Azure, etc.)

47.8k
116

Oobabooga

1 repo·47.2k·55 commits this week

Gradio web UI for LLMs — multi-backend (llama.cpp, ExLlamaV2, transformers)

47.2k
55

Mudler (LocalAI)

1 repo·46.4k·56 commits this week

Free, open-source OpenAI drop-in replacement — runs locally, no GPU required

46.4k
56

Exo Explore

1 repo·44.8k·7 commits this week

Run LLMs distributed across heterogeneous devices (Mac, iPhone, etc.)

44.8k
7

Ray Project

1 repo·42.6k·95 commits this week

Distributed AI compute engine; Ray Serve handles online and async batch inference

42.6k
95

DeepSpeed AI

1 repo·42.4k·11 commits this week

Microsoft DeepSpeed — distributed training and inference (ZeRO, MII, FastGen)

42.4k
11

Microsoft / ONNX

2 repos·41.4k·70 commits this week

Open Neural Network Exchange format specification

20.9k
21

Microsoft's cross-platform, high-performance ONNX inference engine

20.6k
49

LM-Sys

1 repo·39.5k

LLM serving framework and home of Chatbot Arena

39.5k

JAX (Google DeepMind)

1 repo·35.7k·145 commits this week

Composable NumPy transformations (JIT, grad, vmap) compiled via XLA to GPUs and TPUs — primary DeepMind research/production runtime

35.7k
145

Miscellaneous

4 repos·31.8k·29 commits this week

MLC's universal LLM deployment engine (multi-backend)

22.7k

Tile-based ML language and compiler

6.3k
23

Community on-device LLM project

1.6k

vLLM-style inference on Apple silicon via MLX

1.2k
6

SGLang

1 repo·28.1k·345 commits this week

High-throughput LLM/VLM serving with RadixAttention and structured generation

28.1k
345

Tencent

2 repos·27.9k·15 commits this week

High-performance neural network inference for mobile (Android/iOS)

23.3k
15

Tencent Neural Network — mobile and edge inference

4.6k

NVIDIA

3 repos·27.1k·150 commits this week

NVIDIA's optimized LLM inference library (GPU)

13.7k
148

NVIDIA's high-performance deep learning inference SDK (GPU)

13.0k

C++ LLM/VLM inference runtime for Jetson and NVIDIA edge devices

408
2

Mozilla AI

1 repo·24.5k·2 commits this week

Single-file LLM executables via Cosmopolitan Libc — zero install, all platforms

24.5k
2

Triton Language (OpenAI)

1 repo·19.2k·25 commits this week

Python-like GPU kernel language used by vLLM, FlashAttention, and PyTorch Inductor

19.2k
25

MLC AI

1 repo·18.0k·1 commit this week

High-performance LLM inference in web browsers via WebGPU

18.0k
1

KVCache AI

1 repo·17.2k·3 commits this week

CPU-GPU hybrid inference; runs DeepSeek 671B on 14GB VRAM + 382GB DRAM with massive speedup over llama.cpp

17.2k
3

Alibaba

1 repo·15.2k·13 commits this week

Alibaba's neural network inference framework for mobile & edge

15.2k
13

jundot

1 repo·14.8k·48 commits this week

LLM inference server with continuous batching & SSD caching for Apple Silicon — managed from the macOS menu bar

14.8k
48

Apache

2 repos·13.8k·10 commits this week

Apache TVM ML compiler — auto-tunes models for any hardware target

13.4k
8

Apache TVM Foreign Function Interface for deep learning compilation

397
2

Blaizzy (Community MLX)

5 repos·13.1k·40 commits this week

Audio models (TTS, ASR) with MLX

7.1k
24

Vision-language models on Apple silicon via MLX

4.8k
14

Swift audio inference using MLX

629
2

Text embedding models with MLX

385

Video model inference with MLX

230

K2 / Next-gen ASR

1 repo·12.4k·5 commits this week

ONNX-based runtime for ASR, TTS, VAD, and keyword spotting

12.4k
5

OpenVINO Toolkit / Intel

3 repos·11.9k·87 commits this week

Intel's toolkit for optimizing & deploying deep learning on Intel hardware

10.3k
60

Neural Network Compression Framework — quantization, pruning, sparsity for OpenVINO

1.2k
8

OpenVINO GenAI — generative AI layer with speculative decoding & KV-cache opt

509
19

RunAnywhere

2 repos·11.9k

RunAnywhere SDKs for on-device inference deployment

10.3k

RunAnywhere CLI tool

1.5k

Intel

2 repos·11.5k·4 commits this week

Intel IPEX-LLM — local LLM acceleration on Intel hardware (archived Jan 2026, read-only)

8.8k

SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity

2.6k
4

Mistral AI

1 repo·10.8k

Official minimal inference library for all Mistral models (7B, Mixtral, Pixtral)

10.8k

Triton Inference Server

1 repo·10.7k·6 commits this week

NVIDIA Triton — production multi-model inference server (HTTP/gRPC, multi-backend)

10.7k
6

Dusty-NV (NVIDIA Jetson)

1 repo·8.9k

DNN inference library & tutorials for NVIDIA Jetson

8.9k

BentoML

1 repo·8.7k

Unified serving framework: real-time APIs, task queues, batching, multi-model chains

8.7k

Nexa AI

1 repo·8.1k

Unified SDK for running LLMs and multimodal models locally

8.1k

InternLM / Shanghai AI Lab

1 repo·7.9k·10 commits this week

High-throughput LLM serving with TurboMind engine (C++/CUDA)

7.9k
10

PaddlePaddle (Baidu)

1 repo·7.3k

Lightweight inference engine for mobile & embedded from PaddlePaddle

7.3k

AI Dynamo (NVIDIA)

1 repo·6.9k·158 commits this week

Datacenter-scale distributed inference serving framework (Rust + Python, disaggregated prefill/decode, engine-agnostic)

6.9k
158

ArgMax

4 repos·6.5k

On-device Whisper inference for Apple platforms (Swift)

6.1k

Python tooling for WhisperKit model optimization

244

On-device AI benchmarking framework

88

Swift playground for ArgMax SDK

21

Cactus Compute

5 repos·5.5k·1 commit this week

Cactus core edge inference framework

5.2k
1

React Native bindings for Cactus

172

Flutter bindings for Cactus

71

Kotlin/Android bindings for Cactus

71

Demo chat app using Cactus

28

Osaurus

1 repo·5.5k·95 commits this week

Native macOS AI agent harness in Swift — any model, persistent memory, autonomous execution, MCP server, MLX + Apple Neural Engine, fully offline

5.5k
95

TurboDeRP (ExLlamaV2)

1 repo·4.5k

High-performance EXL2-quantized inference for consumer NVIDIA GPUs

4.5k

OpenNMT

1 repo·4.5k·5 commits this week

Fast C++ inference for Transformer models; INT8/INT16 CPU quantization, multi-platform

4.5k
5

OpenXLA

1 repo·4.3k·249 commits this week

Compiler for JAX, TF, PyTorch targeting GPU, TPU, and CPU from a unified IR

4.3k
249

Luminal AI

1 repo·2.8k·17 commits this week

Rust-based deep learning compiler with a small static graph IR for fast, portable inference (CUDA, Metal, CPU)

2.8k
17

Liquid AI

5 repos·2.8k·6 commits this week

Examples, tutorials and apps for Liquid AI LFM + LEAP SDK

2.0k
2

Speech-to-Speech audio models by Liquid AI

511

Minimal fine-tuning repo for LFM2, fully open-source

173
1

Example apps for LeapSDK

66
1

Liquid AI documentation

25
2

Fluid Inference

3 repos·2.2k·16 commits this week

On-device audio inference framework

2.1k
15

Fluid Inference core runtime

69
1

Rust text processing library for inference

33

Try Mirai

2 repos·1.7k·28 commits this week

Mirai's on-device inference runtime

1.6k
20

Mirai's LLaMA-based on-device model

77
8

UbiquitousLearning

1 repo·1.5k

Multimodal LLM inference framework for mobile & edge

1.5k

Qualcomm

2 repos·1.5k·35 commits this week

State-of-the-art ML models optimized for Qualcomm Snapdragon NPU/DSP/QNN deployment

1.0k
32

Sample apps and tutorials for deploying models on Qualcomm hardware (TFLite, ONNX, QNN)

412
3

ARM Software

1 repo·1.3k

ARM Neural Network SDK for ARM & Mali devices

1.3k

AMD ROCm

4 repos·1.1k·101 commits this week

AI Tensor Engine for ROCm — centralized repo for high-perf AI operators on AMD Instinct GPUs

440
53

AMD's graph inference engine for MI-series GPUs

304
15

ROCm fork of FlashAttention with Composable Kernel (CK) and Triton backends

231

AiTer Optimized Model — lightweight vLLM-like server built on AITER kernels for ROCm

91
33

NimbleEdge

2 repos·535

NimbleEdge's deliteAI on-device inference framework

533

NimbleEdge fork of ExecuTorch with edge optimizations

2

Picovoice

1 repo·312

Picovoice's on-device LLM inference engine

312

Zetic AI

5 repos·63

MLange sample applications

58

MLange extension library

4

iOS framework for MLange

1

MLange SDK documentation

0

iOS extension framework for MLange

0