Blog

Notes and deep dives on AI infrastructure, LLM deployment, MLOps, and building reliable production ML systems.

3 articles

June 10, 2026 · 2 min read

Deploying LLMs in Production: An Infrastructure Playbook

A practical walkthrough of the infrastructure decisions behind serving large language models reliably — from GPU selection to batching, autoscaling, and observability.

LLM · Infrastructure · Inference

May 22, 2026 · 1 min read

Cutting GPU Inference Costs Without Hurting Latency

Quantization, batching, and right-sizing strategies that reduced our inference bill by 60% while keeping p99 latency flat.

Inference · Optimization · Cost

April 30, 2026 · 2 min read

Building an MLOps Platform on Kubernetes from Scratch

The core building blocks of a production MLOps platform — model registry, CI/CD for models, and safe rollouts with canaries and shadow deployments.

MLOps · Kubernetes · Infrastructure