// category
AI
4 articles on AI.
How to Build a Production-Ready Multi-LLM System: A 2026 Architecture Guide
A deep architecture guide to multi-LLM systems (model routing, fallbacks, cost instrumentation, and caching) from someone who runs these in production and cut a client's model bill by 40–60%.
RAG Explained: Building Retrieval-Augmented Generation with LangChain
A practical LangChain RAG tutorial that goes past the demo — chunking strategy, embedding choice, hybrid search, evaluation, and the source-citation grounding that keeps a chatbot from making things up.
FastAPI for AI Apps: Serving LLMs in Production Without the 2am Pages
How to serve LLMs in production with FastAPI — async streaming endpoints, auth, rate limiting, caching, and observability. The production scaffolding I rebuilt one too many times, explained.
Choosing a Vector Database in 2026: pgvector vs Pinecone vs Chroma
A practical vector database comparison for RAG — pgvector vs Pinecone vs Chroma on cost, scale, ops, and filtering. Which one I default to, when I switch, and the decision rule I use on client builds.