Powered by RunAnywhere — On-device AI infrastructure

Tracked Repositories

117 repositories across 44 organizations.

huggingface — 3 repos

huggingface/transformers · Python · 42 commits
HuggingFace Transformers — state-of-the-art NLP/ML model library
159.1k stars · DeepWiki

huggingface/diffusers · Python · 22 commits
HuggingFace Diffusers — diffusion model inference & training (Stable Diffusion, Flux, etc.)
33.3k stars · DeepWiki

HuggingFace TGI — LLM serving (archived March 2026, read-only)
10.8k stars · DeepWiki

ollama — 1 repo

ollama/ollama · Go · 30 commits
User-friendly local LLM runner built on llama.cpp
168.3k stars · DeepWiki

ggml — 2 repos

ggml-org/llama.cpp · C++ · 93 commits
High-performance LLM inference in C/C++ (CPU + GPU)
102.8k stars · DeepWiki

High-performance Whisper speech recognition in C/C++
48.4k stars · DeepWiki

open-webui — 1 repo

Self-hosted ChatGPT alternative with built-in RAG, offline-capable
130.9k stars · DeepWiki

nomic-ai — 1 repo

Desktop AI app + SDK for running LLMs locally
77.3k stars · DeepWiki

vllm-project — 2 repos

vllm-project/vllm · Python · 159 commits
Most widely adopted open-source LLM serving engine; PagedAttention, continuous batching
75.9k stars · DeepWiki

vLLM community plugin for Intel Gaudi accelerators
38 stars · DeepWiki

google — 12 repos

Cross-platform ML pipeline framework (vision, audio, NLP)
34.6k stars · DeepWiki

google-ai-edge/gallery · Kotlin · 13 commits
AI Edge model gallery
19.9k stars · DeepWiki

LiteRT for language model inference
3.2k stars · DeepWiki

google-ai-edge/mediapipe-samples · Jupyter Notebook · 4 commits
Sample apps using MediaPipe
2.6k stars · DeepWiki

google/XNNPACK · C · 71 commits
Highly optimized neural network operators library (ARM, x86, WASM)
2.3k stars · DeepWiki

Google's Lite Runtime (successor to TensorFlow Lite)
2.1k stars · DeepWiki

Model visualization and exploration tool
1.4k stars · DeepWiki

google-ai-edge/litert-torch · Jupyter Notebook · 2 commits
LiteRT integration with PyTorch
987 stars · DeepWiki

Sample code for LiteRT
266 stars · DeepWiki

Quantization tooling for AI Edge models
114 stars · DeepWiki

Sample models for AI Edge
19 stars · DeepWiki

AI Edge APIs — upstream repo deleted (404), local copy retained

apple — 9 repos

ml-explore/mlx · C++ · 8 commits
Array framework for ML on Apple silicon (Python API)
25.3k stars · DeepWiki

Example models and applications using MLX
8.5k stars · DeepWiki

Tools for converting & running models with Core ML
5.2k stars · DeepWiki

ml-explore/mlx-lm · Python · 8 commits
LLM inference and fine-tuning with MLX
4.6k stars · DeepWiki

Example apps using MLX Swift
2.5k stars · DeepWiki

Swift bindings for MLX
1.7k stars · DeepWiki

Efficient data loading for MLX
468 stars · DeepWiki

LLM inference in Swift via MLX
365 stars · DeepWiki

ml-explore/mlx-c · C++ · 3 commits
C bindings for MLX
192 stars · DeepWiki

oobabooga — 1 repo

Gradio web UI for LLMs — multi-backend (llama.cpp, ExLlamaV2, transformers)
46.4k stars · DeepWiki

mudler — 1 repo

mudler/LocalAI · Go · 58 commits
Free, open-source OpenAI drop-in replacement — runs locally, no GPU required
45.1k stars · DeepWiki

exo-explore — 1 repo

exo-explore/exo · Python · 9 commits
Run LLMs distributed across heterogeneous devices (Mac, iPhone, etc.)
43.5k stars · DeepWiki

deepspeedai — 1 repo

deepspeedai/DeepSpeed · Python · 2 commits
Microsoft DeepSpeed — distributed training and inference (ZeRO, MII, FastGen)
42.0k stars · DeepWiki

microsoft — 2 repos

onnx/onnx · Python · 10 commits
Open Neural Network Exchange format specification
20.6k stars · DeepWiki

Microsoft's cross-platform, high-performance ONNX inference engine
19.8k stars · DeepWiki

lm-sys — 1 repo

LLM serving framework and home of Chatbot Arena
39.4k stars · DeepWiki

miscellaneous — 5 repos

mlc-ai/mlc-llm · Python · 4 commits
MLC's universal LLM deployment engine (multi-backend)
22.4k stars · DeepWiki

tile-ai/tilelang · Python · 15 commits
Tile-based ML language and compiler
5.5k stars · DeepWiki

Community on-device LLM project
1.6k stars · DeepWiki

vLLM-style inference on Apple silicon via MLX
781 stars · DeepWiki

Internal fork of OpenEvolve

tencent — 2 repos

Tencent/ncnn · C++ · 8 commits
High-performance neural network inference for mobile (Android/iOS)
23.1k stars · DeepWiki

Tencent Neural Network — mobile and edge inference
4.6k stars · DeepWiki

nvidia — 2 repos

NVIDIA/TensorRT-LLM · Python · 118 commits
NVIDIA's optimized LLM inference library (GPU)
13.3k stars · DeepWiki

NVIDIA's high-performance deep learning inference SDK (GPU)
12.9k stars · DeepWiki

sgl-project — 1 repo

sgl-project/sglang · Python · 246 commits
High-throughput LLM/VLM serving with RadixAttention and structured generation
25.6k stars · DeepWiki

mozilla-ai — 1 repo

Single-file LLM executables via Cosmopolitan Libc — zero install, all platforms
24.1k stars · DeepWiki

mlc-ai — 1 repo

mlc-ai/web-llm · TypeScript · 3 commits
High-performance LLM inference in web browsers via WebGPU
17.7k stars · DeepWiki

alibaba — 1 repo

alibaba/MNN · C++ · 10 commits
Alibaba's neural network inference framework for mobile & edge
14.8k stars · DeepWiki

apache — 2 repos

apache/tvm · Python · 33 commits
Apache TVM ML compiler — auto-tunes models for any hardware target
13.3k stars · DeepWiki

apache/tvm-ffi · C++ · 6 commits
Apache TVM Foreign Function Interface for deep learning compilation
375 stars · DeepWiki

blaizzy — 5 repos

Blaizzy/mlx-audio · Python · 3 commits
Audio models (TTS, ASR) with MLX
6.6k stars · DeepWiki

Blaizzy/mlx-vlm · Python · 21 commits
Vision-language models on Apple silicon via MLX
4.2k stars · DeepWiki

Swift audio inference using MLX
552 stars · DeepWiki

Text embedding models with MLX
346 stars · DeepWiki

Video model inference with MLX
186 stars · DeepWiki

k2-fsa — 1 repo

k2-fsa/sherpa-onnx · C++ · 24 commits
ONNX-based runtime for ASR, TTS, VAD, and keyword spotting
11.5k stars · DeepWiki

triton-inference-server — 1 repo

NVIDIA Triton — production multi-model inference server (HTTP/gRPC, multi-backend)
10.5k stars · DeepWiki

openvinotoolkit — 2 repos

Intel's toolkit for optimizing & deploying deep learning on Intel hardware
10.0k stars · DeepWiki

OpenVINO GenAI — generative AI layer with speculative decoding & KV-cache optimization
482 stars · DeepWiki

dusty-nv — 1 repo

DNN inference library & tutorials for NVIDIA Jetson
8.8k stars · DeepWiki

intel — 1 repo

intel/ipex-llm (archived) · Python
Intel IPEX-LLM — local LLM acceleration on Intel hardware (archived Jan 2026, read-only)
8.8k stars · DeepWiki

nexa-ai — 1 repo

Unified SDK for running LLMs and multimodal models locally
7.9k stars · DeepWiki

internlm — 1 repo

InternLM/lmdeploy · Python · 12 commits
High-throughput LLM serving with TurboMind engine (C++/CUDA)
7.8k stars · DeepWiki

paddlepaddle — 1 repo

Lightweight inference engine for mobile & embedded from PaddlePaddle
7.2k stars · DeepWiki

argmax — 4 repos

On-device Whisper inference for Apple platforms (Swift)
6.0k stars · DeepWiki

Python tooling for WhisperKit model optimization
241 stars · DeepWiki

argmaxinc/OpenBench · Jupyter Notebook · 2 commits
On-device AI benchmarking framework
83 stars · DeepWiki

Swift playground for ArgMax SDK
19 stars · DeepWiki

cactus-compute — 13 repos

Cactus core edge inference framework
4.6k stars · DeepWiki

React Native bindings for Cactus
153 stars · DeepWiki

Flutter bindings for Cactus
69 stars · DeepWiki

Kotlin/Android bindings for Cactus
66 stars · DeepWiki

Function-calling Gemma hackathon project
38 stars · DeepWiki

Demo chat app using Cactus
28 stars · DeepWiki

Org profile and community health files

Distributed LLM inference via Cactus

Fast model transfer protocol

Fine-grained model transfer protocol

Homebrew tap for Cactus

Open-source claw project

Open-source code project by Cactus

meta — 1 repo

pytorch/executorch · Python · 61 commits
PyTorch's portable execution framework for on-device inference
4.5k stars · DeepWiki

turboderp-org — 1 repo

High-performance EXL2-quantized inference for consumer NVIDIA GPUs
4.5k stars · DeepWiki

tensorflow — 1 repo

TensorFlow Lite for microcontrollers and embedded devices
2.8k stars · DeepWiki

fluid-inference — 3 repos

On-device audio inference framework
1.8k stars · DeepWiki

FluidInference/mobius · Python · 2 commits
Fluid Inference core runtime
60 stars · DeepWiki

Rust text processing library for inference
24 stars · DeepWiki

try-mirai — 3 repos

trymirai/uzu · Rust · 23 commits
Mirai's on-device inference runtime
1.5k stars · DeepWiki

trymirai/lalamo · Python · 3 commits
Mirai's LLaMA-based on-device model
71 stars · DeepWiki

Swift SDK for Uzu
53 stars · DeepWiki

ubiquitouslearning — 1 repo

Multimodal LLM inference framework for mobile & edge
1.5k stars · DeepWiki

arm-software — 1 repo

ARM Neural Network SDK for ARM & Mali devices
1.3k stars · DeepWiki

nimbleedge — 2 repos

NimbleEdge's deliteAI on-device inference framework
532 stars · DeepWiki

NimbleEdge fork of ExecuTorch with edge optimizations

picovoice — 1 repo

Picovoice's on-device LLM inference engine
309 stars · DeepWiki

rocm — 1 repo

ROCm/AMDMIGraphX · C++ · 15 commits
AMD's graph inference engine for MI-series GPUs
289 stars · DeepWiki

zetic-ai — 18 repos

MLange sample applications
53 stars · DeepWiki

MLange extension library

iOS sample app for MLange

Swift LLM template for iOS

Flutter SDK for MLange

Android sample app for MLange

Kotlin LLM template for Android

React Native LLM template

Org profile and community health files

zetic-ai/ai-edge-torch · Jupyter Notebook
Zetic fork of Google AI Edge Torch

Convert ONNX models to PyTorch

MLange SDK documentation

iOS framework for MLange

iOS extension framework for MLange

Flutter LLM template

Whisper sample app using MLange

YOLOv8 demo using MLange

Simplified YOLOv8n demo using MLange