r1: Retrieval
You start the RAG track here. Until now the model has answered from what it absorbed in training, which is fine for “what is the capital of Portugal” and useless for “which of our destinations suits this traveller”. Retrieval-Augmented Generation fixes that. You embed your own documents, embed the question, rank your documents by how close they are, and hand the closest few to the model. The model then answers from data you chose.
The new idea is the ranking. Everything around it you have already met: the blurbs are
data, and you expose the search as a tool(), the same shape as f4.
Quick path
In a hurry? These three steps are the whole challenge. Everything below is the why and the how.
- Run npm run r1 and read the retrieval block: it returns Bali, Cancún and the Swiss Alps for “warm beach on a budget”, because search ignores the query and returns the first three.
- Edit start/agent.ts, the body of search: embed the query (TODO 1), score every destination by cosine similarity (TODO 2), sort and take the top k (TODO 3).
- Done when the retrieval block lists three real warm-beach destinations in descending score order, and TripMate recommends one of them.
A destination’s blurb becomes a vector. The query becomes a vector in the same space. “Close together” means “about the same thing”, and cosine similarity is how you measure it. Rank by that, keep the top few, and you have retrieval.
The top-K cutoff is the part people underestimate. The model can only recommend what you put in front of it. Retrieve the wrong three and a confident, well-written, wrong answer is exactly what you get.
Mental model
Each blurb is embedded once at startup. The query is embedded per call. Similarity ranks them; the top K is all the model ever sees.
The mechanic, in another domain
Forget the catalogue. Say you want the FAQ entry that best answers a question. Embed the answers once, embed the question, score each by cosine similarity, and keep the top few.
embedMany embeds the documents in one batch, embed embeds a single query, and cosineSimilarity (all from ai) scores two vectors. The query and the docs must use the same embedding model, or the scores are noise. That embed → score → sort → slice is retrieval; below you write it for TripMate’s catalogue.
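A minimal, dependency-free sketch of that embed → score → sort → slice pipeline. The three-number vectors are made up for illustration and stand in for what embedMany and embed would return; cosine is a hand-written stand-in for cosineSimilarity, and the FAQ entries and rankFaq are hypothetical names.

```typescript
// Toy FAQ entries with hand-made 3-d "embeddings". Assumption: real vectors
// come from an embedding model and have hundreds of dimensions.
type Entry = { answer: string; vector: number[] };

const FAQ: Entry[] = [
  { answer: "Refunds take 5 working days.", vector: [0.9, 0.1, 0.0] },
  { answer: "Reset your password from the login page.", vector: [0.0, 0.9, 0.2] },
  { answer: "We ship to the EU and the UK.", vector: [0.1, 0.0, 0.9] },
];

// Cosine similarity: dot product divided by the product of the two lengths.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every entry against the query vector, sort highest-first, keep the top k.
function rankFaq(queryVector: number[], k: number) {
  return FAQ
    .map((entry) => ({ answer: entry.answer, score: cosine(queryVector, entry.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Pretend this vector is the embedding of "how do I get my money back?".
const topAnswers = rankFaq([0.8, 0.2, 0.1], 2);
console.log(topAnswers.map((t) => `${t.score.toFixed(2)} ${t.answer}`));
```

With these made-up vectors the refunds entry ranks first and the shipping entry is cut, because the query vector points mostly along the same axis as the refunds vector.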
The setup
Open start/agent.ts. The DESTINATIONS catalogue, indexDestinations (it embeds every blurb once at startup into docEmbeddings), and the searchDestinations tool that wraps search are all provided. The blank is the body of search: it returns the first three and ignores the query. The three ranking lines (TODO 1, 2, 3) are yours.
Run it
You get two blocks. The retrieval block prints the “top” three for “warm beach on a
budget”, and the Swiss Alps is in there, because nothing compares the query
to the blurbs yet. The agent block then recommends from that broken set. Fix search and
both blocks come good at once.
The catalogue’s embeddings are pre-generated and committed (embeddings.embeddinggemma.json in this folder), so the first run is instant. Delete that file to regenerate them (a few seconds, once, while Ollama loads embeddinggemma); a different embedder writes its own. See TROUBLESHOOTING.md.
Build it
- Run it and read the gap. Run npm run r1. The scores are all 0.000 and the order is catalogue order, so “warm beach on a budget” returns the Swiss Alps. That is what “ignores the query” looks like.
- Embed the query (TODO 1). Put the query into the same vector space as the blurbs with one embed({ model: embeddingModel, value: query }) call, and read the embedding off the result. Use the same embeddingModel the blurbs used: embeddings are only comparable within one model.
- Score every destination (TODO 2). Map over DESTINATIONS and attach a score to each: its cosine similarity to the query vector. docEmbeddings is parallel to DESTINATIONS by index, so DESTINATIONS[i]’s vector is docEmbeddings[i], and cosineSimilarity (from ai) takes the two vectors.
- Rank and cut (TODO 3). Sort the scored list highest-first and return the top k. The rest never reaches the model; that cutoff is the whole game.
- Run it again. The retrieval block should now read something like 0.55 Bali, 0.48 Cancún, 0.48 Lisbon: real warm-beach options in descending order. TripMate should recommend one of them and say why. The Swiss Alps drops out because skiing and fondue are not close to “warm beach”.
- Poke the cutoff. Change search(query, 3) to 1, then to 6. At k = 1 the model sees a single option and has no choice; at k = 6 it sees most of the catalogue, including things that do not fit, and the recommendation gets vaguer. Retrieval quality is mostly about giving the model enough and no more.
- Check you’ve got it. You should be able to say, in one sentence, why the same query returned the Swiss Alps before and Bali after, and point at the line that made the difference. Look at the trace too: you will see ai.embed spans for the query alongside the ai.generateText and searchDestinations tool spans.
Stuck? finish/agent.ts is the canonical version. Read it after you’ve had a real go.
- Mixing embedding models. Vectors are only comparable if they came from the same model. Embed your docs with one and your query with another and the scores are noise.
- Trusting the top result. The closest match is still only the closest of what you have. If nothing fits, retrieval hands over the least bad option and the model presents it confidently. Watch for that when you poke the query.
- Embedding too much at once. One blurb per vector is fine here. When a document is long, a single vector becomes a blurry average and specific questions stop matching. That is exactly what r2 is about.
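To make the “least bad option” pitfall concrete, here is a small sketch with made-up scores. topK and the Scored type are hypothetical helpers, not part of the challenge code; the point is only that sorting and slicing always returns something, so the score is worth reading alongside the name.

```typescript
type Scored = { name: string; score: number };

// Sort highest-first and keep the top k. Note that this never returns
// "nothing fits": it returns the best of whatever it was given.
function topK(scored: Scored[], k: number): Scored[] {
  return [...scored].sort((a, b) => b.score - a.score).slice(0, k);
}

// Made-up scores for a query the catalogue has no good answer to.
const poorFit: Scored[] = [
  { name: "Bali", score: 0.12 },
  { name: "Swiss Alps", score: 0.19 },
  { name: "Lisbon", score: 0.15 },
];

const best = topK(poorFit, 1)[0];
console.log(`${best.score.toFixed(2)} ${best.name}`); // prints "0.19 Swiss Alps"
```

A top score this far below what a good match produces is the cue that the winner is only the closest of a poor set.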
A couple of things worth knowing
Embeddings, in one paragraph
An embedding model turns text into a list of numbers (a vector) so that text about similar things lands in similar places. “Warm beach on a budget” and “Caribbean white-sand beaches, great-value resorts” end up close; “snow-sure skiing, fondue” ends up far. Cosine similarity is the cosine of the angle between two vectors, so it ignores length and asks “do these point the same way”. You never read the numbers yourself; you only ever compare them.
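A short sketch of the “ignores length” claim, using made-up two-number vectors. cosineSim is a hand-written stand-in for a library helper such as cosineSimilarity, and the vector names are only labels for illustration.

```typescript
// Length of a vector.
const norm = (v: number[]): number => Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));

// Cosine similarity: dot product divided by the product of the two lengths.
function cosineSim(a: number[], b: number[]): number {
  const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
  return dot / (norm(a) * norm(b));
}

const beach = [3, 4];         // made-up vector
const longerBeach = [30, 40]; // same direction, ten times the length
const ski = [-4, 3];          // at a right angle to `beach`

console.log(cosineSim(beach, longerBeach)); // 1: same direction, length ignored
console.log(cosineSim(beach, ski));         // 0: nothing in common
```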
Why retrieval is a tool
Nothing new happened at the agent level. You defined a tool() with a Zod schema, gave
it a clear description, and added it to the agent. The model decided to call it. The only
difference from f4 is that the tool’s execute does a similarity search instead of
returning a mock. That is the whole trick: RAG is retrieval inside the tool loop, not a
separate architecture. If the model never calls the tool, the problem is almost always
the description or the instructions, not the embeddings.
Index once, query many
indexDestinations runs once at startup and embeds all eight blurbs. search only embeds
the query. That split matters: embedding documents is the slow, expensive part, so you do
it ahead of time and cache the vectors. Here that cache is a JSON file in the challenge
folder, pre-generated and committed so the first run skips the embed; in production it is a
vector database. The shape is identical, only the store changes.
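The index-once, query-many split can be sketched without any model at all. Everything here is hypothetical: toyEmbed is a letter-counting stand-in for a real (async, expensive) embedding call, and the in-memory docVectors array plays the role of the JSON cache or vector database.

```typescript
// Hypothetical toy embedder: counts the letters a, b and c. The counter
// lets us see how often the "expensive" call actually runs.
let embedCalls = 0;
function toyEmbed(text: string): number[] {
  embedCalls++;
  const lower = text.toLowerCase();
  return ["a", "b", "c"].map((ch) => lower.split(ch).length - 1);
}

const DOCS = ["abba cab", "cc cab", "banana"];

// Index once: embed every document up front and keep the vectors.
const docVectors = DOCS.map((doc) => toyEmbed(doc));
const callsAfterIndexing = embedCalls;

// Query many: each search embeds only the query and reuses docVectors.
// A plain dot product keeps the sketch short; the challenge uses cosine similarity.
function searchDocs(query: string): number[] {
  const q = toyEmbed(query);
  return docVectors.map((v) => v.reduce((sum, x, i) => sum + x * q[i], 0));
}

searchDocs("a");
searchDocs("b");
console.log(callsAfterIndexing, embedCalls); // prints "3 5"
```

Three calls index the documents and each later search adds exactly one, however many documents there are.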
Next up is r2, where the documents get long. One embedding for a whole multi-paragraph guide is too blunt: you chunk the guide into passages, embed those, and retrieve the paragraph that answers the question.