June 10, 2026 · 2 min read
Deploying LLMs in Production: An Infrastructure Playbook
A practical walkthrough of the infrastructure decisions behind serving large language models reliably, from GPU selection to batching, autoscaling, and observability.
LLM · Infrastructure · Inference

May 22, 2026 · 1 min read
Cutting GPU Inference Costs Without Hurting Latency
Quantization, batching, and right-sizing strategies that reduced our inference bill by 60% while keeping p99 latency flat.
Inference · Optimization · Cost

April 30, 2026 · 2 min read
Building an MLOps Platform on Kubernetes from Scratch
The core building blocks of a production MLOps platform: model registry, CI/CD for models, and safe rollouts with canaries and shadow deployments.
MLOps · Kubernetes · Infrastructure