// category

AI

4 articles on AI.

How to Build a Production-Ready Multi-LLM System: A 2026 Architecture Guide

An in-depth architecture guide to multi-LLM systems (model routing, fallbacks, cost instrumentation, and caching) from someone who runs these in production and cut a client's model bill by 40–60%.

Jun 15, 2026

AI 6 min read

RAG Explained: Building Retrieval-Augmented Generation with LangChain

A practical LangChain RAG tutorial that goes past the demo — chunking strategy, embedding choice, hybrid search, evaluation, and the source-citation grounding that keeps a chatbot from making things up.

May 28, 2026

AI 7 min read

FastAPI for AI Apps: Serving LLMs in Production Without the 2am Pages

How to serve LLMs in production with FastAPI — async streaming endpoints, auth, rate limiting, caching, and observability. The production scaffolding I rebuilt one too many times, explained.

May 2, 2026

AI 7 min read

Choosing a Vector Database in 2026: pgvector vs Pinecone vs Chroma

A practical vector database comparison for RAG — pgvector vs Pinecone vs Chroma on cost, scale, ops, and filtering. Which one I default to, when I switch, and the decision rule I use on client builds.

Apr 20, 2026