Projects

Selected work across AI infrastructure, inference platforms, and MLOps tooling. Most projects are open source, with code available on GitHub.

LLM Inference Gateway

A high-throughput inference gateway for serving open-weight LLMs with token streaming, request batching, and per-tenant rate limiting. Built to run on Kubernetes with autoscaling backed by GPU node pools.

  • Python
  • vLLM
  • FastAPI
  • Kubernetes
  • Triton
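
As an illustration of per-tenant rate limiting, here is a minimal token-bucket sketch. It is not taken from the gateway's code: the `TenantRateLimiter` class, its parameters, and the injectable `clock` are hypothetical names chosen for this example, and a real deployment would likely keep bucket state in a shared store rather than in process memory.

```python
import time
from typing import Callable, Dict, Tuple


class TenantRateLimiter:
    """Token-bucket limiter with one bucket per tenant (hypothetical helper)."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.clock = clock        # injectable so the limiter can be tested
        # tenant_id -> (tokens remaining, time of last update)
        self._buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, tenant_id: str, cost: float = 1.0) -> bool:
        """Return True and spend `cost` tokens if the tenant has enough."""
        now = self.clock()
        # A tenant seen for the first time starts with a full bucket.
        tokens, last = self._buckets.get(tenant_id, (self.capacity, now))
        # Refill in proportion to elapsed time, capped at capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        allowed = tokens >= cost
        if allowed:
            tokens -= cost
        self._buckets[tenant_id] = (tokens, now)
        return allowed
```

Each tenant draws from its own bucket, so a burst from one tenant does not consume another tenant's budget. Passing a larger `cost` (for example, a requested token count) would let the same structure limit by generated tokens rather than by request count.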

MLOps Platform Blueprint

Reference architecture and Terraform modules for an end-to-end MLOps platform on AWS, covering a feature store, a model registry, CI/CD for models, and automated rollout with shadow deployments.

  • Terraform
  • AWS
  • SageMaker
  • MLflow
  • GitHub Actions

GPU Cost Observability

A monitoring stack that attributes GPU utilization and cloud spend to individual models and teams, with Grafana dashboards and Prometheus exporters for inference workloads.

  • Go
  • Prometheus
  • Grafana
  • DCGM
  • Kubernetes
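
One simple way to attribute spend is to split a billing period's GPU cost in proportion to the GPU time each model consumed. The sketch below assumes that approach; the function name `attribute_cost` and the `(model, gpu_seconds)` sample shape are hypothetical and not taken from the project, which may weight by utilization or use a different scheme.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def attribute_cost(samples: Iterable[Tuple[str, float]],
                   total_cost: float) -> Dict[str, float]:
    """Split total_cost across models in proportion to GPU-seconds used."""
    usage: Dict[str, float] = defaultdict(float)
    for model, gpu_seconds in samples:
        usage[model] += gpu_seconds

    total_seconds = sum(usage.values())
    if total_seconds == 0:
        # No recorded usage: nothing to attribute.
        return {model: 0.0 for model in usage}

    return {model: total_cost * seconds / total_seconds
            for model, seconds in usage.items()}
```

The same function works for teams instead of models if the samples are keyed by team label.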

Vision Pipeline Toolkit

A modular toolkit for building real-time computer vision pipelines with hardware-accelerated decoding, model ensembling, and ONNX/TensorRT export for edge deployment.

  • Python
  • PyTorch
  • TensorRT
  • ONNX
  • OpenCV

Embeddings Search Service

A semantic search microservice with pluggable vector stores, hybrid retrieval, and a thin caching layer to keep tail latencies predictable under load.

  • Python
  • Qdrant
  • FastAPI
  • Redis
  • Docker
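
Hybrid retrieval needs some way to merge a keyword ranking with a vector-similarity ranking. The description above does not say which method the service uses; the sketch below assumes reciprocal rank fusion (RRF), a common choice, with the conventional constant `k = 60`. The function name and the document IDs in the usage note are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]],
                           k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores the sum of 1 / (k + rank) over the lists it
    appears in, with ranks starting at 1.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

For example, fusing a keyword ranking `["d1", "d2", "d3"]` with a vector ranking `["d2", "d4", "d1"]` puts `d2` first, since it ranks near the top of both lists. RRF uses only rank positions, so it avoids having to normalize scores from different retrievers onto a common scale.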

Model Deployment CLI

A developer-friendly CLI that packages, validates, and rolls out models to staging and production with reproducible builds and automatic canary checks.

  • TypeScript
  • Node.js
  • Docker
  • Helm