Cloudflare Infire: Rust inference engine — 7% faster than vLLM, 82% less CPU
Source: blog post · Strength: strong
Cloudflare built Infire, an LLM inference engine written in Rust. Results: 7% faster than vLLM 0.10.0, and only 25% CPU load vs vLLM's >140% (an 82% reduction). It cuts CPU overhead largely via compiled CUDA graphs, which batch many kernel launches into a single replayable unit. Infire powers Llama 3.1 8B on Cloudflare's edge network. Real production evidence that Rust is viable for AI infrastructure.
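The CUDA-graph trick is the interesting mechanism here: instead of the CPU issuing one launch call per kernel per step, the launch sequence is captured once and replayed with a single call. A minimal sketch of the idea (not Infire's actual code; the `step` kernel and counts are illustrative), assuming the standard CUDA runtime graph APIs:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Stand-in for one small kernel in a model's forward pass.
__global__ void step(float* x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = x[i] * 1.0001f + 1.0f;
}

int main() {
    const int n = 1 << 20;
    float* d;
    cudaMalloc(&d, n * sizeof(float));
    cudaStream_t s;
    cudaStreamCreate(&s);

    // Capture a sequence of 32 kernel launches into a graph, once.
    cudaGraph_t graph;
    cudaStreamBeginCapture(s, cudaStreamCaptureModeGlobal);
    for (int k = 0; k < 32; ++k)
        step<<<(n + 255) / 256, 256, 0, s>>>(d, n);
    cudaStreamEndCapture(s, &graph);

    cudaGraphExec_t exec;
    cudaGraphInstantiateWithFlags(&exec, graph, 0);

    // Replay: one CPU-side call per iteration instead of 32
    // individual launch calls -- this is where CPU overhead drops.
    for (int iter = 0; iter < 100; ++iter)
        cudaGraphLaunch(exec, s);
    cudaStreamSynchronize(s);

    cudaGraphExecDestroy(exec);
    cudaGraphDestroy(graph);
    cudaStreamDestroy(s);
    cudaFree(d);
    printf("done\n");
    return 0;
}
```

The win scales with how many small kernels a forward pass issues: graph replay amortizes launch overhead that would otherwise keep a CPU core busy per request.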
Published June 1, 2025
Added March 21, 2026