Question 1

Why does retrieval usually go wrong in RAG demos?

Accepted Answer

Because the chunking and retrieval strategy is almost always an afterthought. Most demos chunk naively, embed everything the same way, and retrieve purely on cosine similarity. Real RAG requires thinking about how your data is structured, what your users’ queries actually look like, and what “correct retrieval” means for your specific use case.
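To make the chunking point concrete, here is a minimal sketch contrasting naive fixed-size chunking with a structure-aware alternative. The function names and the `max_chars` parameter are hypothetical, chosen for illustration, and packing by paragraph is only one of several reasonable strategies.

```python
def chunk_fixed(text: str, size: int) -> list[str]:
    """Naive chunking: cut every `size` characters, ignoring structure."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def chunk_by_paragraph(text: str, max_chars: int) -> list[str]:
    """Structure-aware chunking: pack whole paragraphs up to `max_chars`.

    A single paragraph longer than `max_chars` is kept whole rather than split.
    """
    chunks: list[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue  # skip empty paragraphs produced by extra blank lines
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            chunks.append(current)  # close the current chunk at a paragraph boundary
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


doc = (
    "Refund policy: refunds are issued within 30 days.\n\n"
    "Shipping: orders ship in 2 business days."
)
fixed = chunk_fixed(doc, 40)             # cuts through the middle of a sentence
by_para = chunk_by_paragraph(doc, 60)    # keeps each policy statement intact
```

With the fixed-size version, the refund sentence is split across two chunks, so neither chunk fully answers a refund question. The paragraph version keeps each statement in one retrievable unit.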

Question 2

What data formats can you work with?

Accepted Answer

PDFs, Word docs, Notion exports, web scrapes, CSVs, SQL tables, and plain text, along with most other common formats. Whatever format your data is in, there’s usually a path to getting it into a retrieval pipeline.
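As one small example of that path, structured data such as a CSV usually needs to be turned into text before it can be embedded. The sketch below uses only Python's standard `csv` module. The `csv_to_records` name and the "column: value" layout are assumptions for illustration, not a fixed convention.

```python
import csv
import io


def csv_to_records(raw_csv: str) -> list[str]:
    """Turn each CSV row into one 'column: value' text record for embedding."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    return ["; ".join(f"{key}: {value}" for key, value in row.items()) for row in reader]


raw = "sku,name,price\nA1,Kettle,29.99\nB2,Toaster,45.00\n"
records = csv_to_records(raw)
```

Keeping the column names inside each record gives the embedding model some context for otherwise bare values like `29.99`.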

Question 3

Will the chatbot hallucinate answers?

Accepted Answer

It’s grounded in your documents, so answers come from retrieved passages rather than from nothing, which greatly reduces hallucination. I also build in source citation tracking so every response can be traced back to the document it came from, plus fallback handling for questions your data doesn’t cover.
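Citation tracking and fallback handling can be sketched roughly as follows. Everything here is a hypothetical illustration: the `Passage` type, the `min_score` threshold, the fallback wording, and the `generate` callable (a stand-in for whatever LLM call a real pipeline would use) are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Passage:
    text: str
    source: str   # document the passage came from
    score: float  # retrieval similarity, assumed here to lie in [0, 1]


FALLBACK = "I couldn't find that in the provided documents."


def answer_with_sources(
    question: str,
    passages: list[Passage],
    generate: Callable[[str, str], str],
    min_score: float = 0.5,
) -> dict:
    """Answer only from sufficiently relevant passages and report their sources."""
    relevant = [p for p in passages if p.score >= min_score]
    if not relevant:
        # Nothing relevant was retrieved: decline instead of letting the model guess.
        return {"answer": FALLBACK, "sources": []}
    context = "\n\n".join(p.text for p in relevant)
    sources = sorted({p.source for p in relevant})
    return {"answer": generate(question, context), "sources": sources}


def stub_generate(question: str, context: str) -> str:
    """Stand-in for an LLM call; simply echoes the context it was given."""
    return f"Based on the documents: {context}"


passages = [
    Passage("Refunds are issued within 30 days.", "policy.pdf", 0.82),
    Passage("Our office is closed on Sundays.", "hours.txt", 0.12),
]
covered = answer_with_sources("What is the refund window?", passages, stub_generate)
uncovered = answer_with_sources("Do you ship to Mars?", passages, stub_generate, min_score=0.9)
```

Filtering before generation means the low-scoring passage never reaches the model, and the returned `sources` list reflects only the documents that actually informed the answer.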

RAG & Chatbots
