Blog
Insights on AI infrastructure, model training, and production systems.
February 12, 2026
A deep dive into quantization, pruning, and knowledge distillation for model compression, plus FlashAttention for efficient inference.
February 4, 2026
A deep dive into the cost curve, caching, and tradeoffs behind S3-first vector search at scale.
January 26, 2026
Exploring batching strategies, KV cache management, and speculative decoding for production LLM serving.