Newsletter

Weekly automated briefings on the state of inference.

2026-W27·18 min read·Latest

Hy3 Sends MoE Serving The Bill

3,162 commits · 1,168 issues · 2,767 PRs · 105 releases

2026-W26·15 min read

KV Cache Eats The Scheduler

3,764 commits · 1,182 issues · 3,191 PRs · 96 releases

2026-W25·17 min read

Qualcomm Swallows Modular As Inference Splinters

3,155 commits · 1,029 issues · 2,841 PRs · 112 releases

2026-W24·15 min read

SGLang Drags DFlash Into Serving

3,893 commits · 925 issues · 2,147 PRs · 76 releases

2026-W23·16 min read

Gemma 4 Exposes Inference’s Memory Wall

3,825 commits · 1,275 issues · 3,188 PRs · 132 releases

2026-W22·18 min read

vLLM Ignites the KV-Cache War

3,448 commits · 1,167 issues · 3,283 PRs · 139 releases

2026-W21·18 min read

Qwen3.7-Max Forces Runtimes Into Session Mode

3,164 commits · 1,045 issues · 2,779 PRs · 94 releases

2026-W20·16 min read

llama.cpp Shoves MTP Into the Mainstream

3,702 commits · 1,340 issues · 3,109 PRs · 95 releases

2026-W19·19 min read

DeepSeek V4 Drags Every Runtime

3,961 commits · 1,385 issues · 3,190 PRs · 99 releases

2026-W18·27 min read

Google Bets LiteRT-LM Owns Edge LLMs

5,247 commits · 1,900 issues · 4,147 PRs · 150 releases

2026-W17·19 min read

DeepSeek V4 Sets Off a Stackwide Sprint

4,069 commits · 1,381 issues · 3,134 PRs · 118 releases

2026-W16·20 min read

Inference Layers Collapse Into One

3,741 commits · 1,437 issues · 2,899 PRs · 107 releases

2026-W15·20 min read

Local Runtimes Turn Into Serving Platforms

3,731 commits · 1,535 issues · 2,941 PRs · 114 releases

2026-W14·17 min read

Gemma 4 Ignites the KV-Cache Wars

1,816 commits · 996 issues · 1,330 PRs · 101 releases

2026-W13·18 min read

KV Cache Wars Go Local

1,956 commits · 870 issues · 1,714 PRs · 92 releases