r1: Retrieval

You start the RAG track here. Until now the model has answered from what it absorbed in training, which is fine for “what is the capital of Portugal” and useless for “which of our destinations suits this traveller”. Retrieval-Augmented Generation fixes that. You embed your own documents, embed the question, rank your documents by how close they are, and hand the closest few to the model. The model then answers from data you chose.

The new idea is the ranking. Everything around it you have already met: the blurbs are data, and you expose the search as a tool(), the same shape as f4.

Quick path

In a hurry? These three steps are the whole challenge. Everything below is the why and the how.

Run npm run r1 and read the retrieval block: it returns Bali, Cancún and the Swiss Alps for “warm beach on a budget”, because search ignores the query and returns the first three.
Edit start/agent.ts, the body of search: embed the query (TODO 1), score every destination by cosine similarity (TODO 2), sort and take the top k (TODO 3).
Done when the retrieval block lists three real warm-beach destinations in descending score order, and TripMate recommends one of them.

A destination’s blurb becomes a vector. The query becomes a vector in the same space. “Close together” means “about the same thing”, and cosine similarity is how you measure it. Rank by that, keep the top few, and you have retrieval.

The top-K cutoff is the part people underestimate. The model can only recommend what you put in front of it. Retrieve the wrong three and a confident, well-written, wrong answer is exactly what you get.

Mental model

query --embed--> [ vector ]
                     |  cosine similarity vs every doc vector
                     v
docs --embed--> [ vectors ] --rank--> top K --> tool result --> model answers

Each blurb is embedded once at startup. The query is embedded per call. Similarity ranks them; the top K is all the model ever sees.

The mechanic, in another domain

Forget the catalogue. Say you want the FAQ entry that best answers a question. Embed the answers once, embed the question, score each by cosine similarity, and keep the top few:

import { embed, embedMany, cosineSimilarity } from 'ai';

// index once: embed every FAQ answer
const { embeddings: docs } = await embedMany({ model: embeddingModel, values: FAQS.map((f) => f.answer) });

// per query: embed it, score every doc, take the top k
const { embedding: q } = await embed({ model: embeddingModel, value: question });
const top = FAQS
  .map((f, i) => ({ ...f, score: cosineSimilarity(q, docs[i]) }))
  .sort((a, b) => b.score - a.score)
  .slice(0, k);

embedMany embeds the documents in one batch, embed embeds a single query, and cosineSimilarity (all from ai) scores two vectors. The query and the docs must use the same embedding model, or the scores are noise. That embed → score → sort → slice is retrieval; below you write it for TripMate’s catalogue.
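To watch that loop run without loading a model, here is a self-contained sketch. A hypothetical `toyEmbed` (word counts over a tiny fixed vocabulary) stands in for `embed`/`embedMany`, and a hand-written `cosine` stands in for `cosineSimilarity`. The FAQ entries and vocabulary are invented for illustration. A real embedding model places paraphrases close together even when they share no words; this toy only rewards shared words. The embed → score → sort → slice shape around it is the same.

```typescript
// Hypothetical stand-in for an embedding model: counts vocabulary words.
// Only the ranking loop is meant to match the real thing.
const VOCAB = ["refund", "order", "password", "reset", "shipping", "days", "account"];

function toyEmbed(text: string): number[] {
  const words = text.toLowerCase().split(/\W+/);
  return VOCAB.map((v) => words.filter((w) => w === v).length);
}

// Cosine similarity: dot product divided by the product of the lengths.
// Returns 0 when either vector is all zeros (no vocabulary words at all).
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

interface Faq { id: string; answer: string }

// Invented FAQ entries, for illustration only.
const FAQS: Faq[] = [
  { id: "refunds", answer: "A refund for an order is issued within 5 days." },
  { id: "password", answer: "Use the reset link to reset your account password." },
  { id: "shipping", answer: "Shipping takes 3 to 7 days depending on the order." },
];

// Index once: embed every answer up front (embedMany in the real code).
const docs = FAQS.map((f) => toyEmbed(f.answer));

// Per query: embed it, score every doc, sort highest-first, keep the top k.
function retrieve(question: string, k: number) {
  const q = toyEmbed(question);
  return FAQS
    .map((f, i) => ({ ...f, score: cosine(q, docs[i]) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

console.log(retrieve("How do I reset my password?", 2));
console.log(retrieve("How many days does shipping take?", 1));
```

The second result for the password question scores 0: with k = 2 the loop still hands over two entries even when only one is relevant, which is the "least bad option" problem from the Traps section in miniature.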

The setup

Open start/agent.ts. The DESTINATIONS catalogue, indexDestinations (it embeds every blurb once at startup into docEmbeddings), and the searchDestinations tool that wraps search are all provided. The blank is the body of search: it returns the first three and ignores the query. The three ranking lines (TODO 1, 2, 3) are yours.

Run it

npm run r1

You get two blocks. The retrieval block prints the “top” three for “warm beach on a budget”, and the Swiss Alps is in there, because nothing compares the query to the blurbs yet. The agent block then recommends from that broken set. Fix search and both blocks come good at once.

The catalogue’s embeddings are pre-generated and committed (embeddings.embeddinggemma.json in this folder), so the first run is instant. Delete that file to regenerate them (a few seconds, once, while Ollama loads embeddinggemma); switching to a different embedding model writes its own cache file instead of reusing this one. See TROUBLESHOOTING.md.

Build it

Run it and read the gap. Run npm run r1. The scores are all 0.000 and the order is catalogue order, so “warm beach on a budget” returns the Swiss Alps. That is what “ignores the query” looks like.
Embed the query (TODO 1). Put the query into the same vector space as the blurbs with one embed({ model: embeddingModel, value: query }) call, and read the embedding off the result. Use the same embeddingModel the blurbs used: embeddings are only comparable within one model.
Score every destination (TODO 2). Map over DESTINATIONS and attach a score to each: its cosine similarity to the query vector. docEmbeddings is parallel to DESTINATIONS by index, so DESTINATIONS[i]’s vector is docEmbeddings[i], and cosineSimilarity (from ai) takes the two vectors.
Rank and cut (TODO 3). Sort the scored list highest-first and return the top k. The rest never reaches the model; that cutoff is the whole game.
Run it again. The retrieval block should now read something like 0.55 Bali, 0.48 Cancún, 0.48 Lisbon, real warm-beach options in descending order, and TripMate should recommend one of them and say why. The Swiss Alps drops out because skiing and fondue are not close to “warm beach”.
Poke the cutoff. Change search(query, 3) to 1, then to 6. At k = 1 the model sees a single option and has no choice; at k = 6 it sees six of the eight destinations, including ones that do not fit, and the recommendation gets vaguer. Retrieval quality is mostly about giving the model enough and no more.
Check you’ve got it. You should be able to say, in one sentence, why the same query returned the Swiss Alps before and Bali after, and point at the line that made the difference. Look at the trace too: you will see ai.embed spans for the query alongside the ai.generateText and searchDestinations tool spans.
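If you want to check the shape of TODO 2 and TODO 3 against something other than the answer file, here is a sketch that takes the query vector as already computed (producing it is TODO 1, the embed call). Everything in it is invented: a five-item catalogue and three-dimensional vectors on made-up axes (warm, beach, budget), plus a query vector standing in for “warm beach on a budget”. Real embeddinggemma vectors have far more dimensions and come from indexDestinations. `rankDestinations` and the hand-written `cosine` are hypothetical names; in start/agent.ts you would use cosineSimilarity from ai.

```typescript
interface Destination { name: string; blurb: string }

// Invented sample catalogue, not the challenge's DESTINATIONS.
const DESTINATIONS: Destination[] = [
  { name: "Bali", blurb: "Warm beaches, rice terraces, cheap stays" },
  { name: "Cancún", blurb: "Caribbean white-sand beaches, resorts" },
  { name: "Swiss Alps", blurb: "Snow-sure skiing, fondue" },
  { name: "Lisbon", blurb: "Sunny city by the sea, good value" },
  { name: "Reykjavik", blurb: "Northern lights, hot springs" },
];

// Parallel by index: docEmbeddings[i] is the vector for DESTINATIONS[i].
// Invented axes: [warm, beach, budget].
const docEmbeddings: number[][] = [
  [0.9, 0.9, 0.8],
  [0.9, 0.9, 0.5],
  [0.0, 0.0, 0.2],
  [0.7, 0.5, 0.7],
  [0.0, 0.1, 0.3],
];

// Stand-in for cosineSimilarity from ai.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// TODO 2: attach a score to every destination; TODO 3: sort and cut.
function rankDestinations(queryVec: number[], k: number) {
  return DESTINATIONS
    .map((d, i) => ({ ...d, score: cosine(queryVec, docEmbeddings[i]) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

// Invented query vector for "warm beach on a budget".
const warmBeachBudget = [1, 1, 1];

for (const k of [1, 3, 5]) {
  const top = rankDestinations(warmBeachBudget, k);
  console.log(k, top.map((d) => `${d.score.toFixed(3)} ${d.name}`));
}
```

Changing only k is the same experiment as the "poke the cutoff" step: the scores never change, only how many of them the model gets to see.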

Stuck? finish/agent.ts is the canonical version. Read it after you’ve had a real go.

Traps

Mixing embedding models. Vectors are only comparable if they came from the same model. Embed your docs with one and your query with another and the scores are noise.
Trusting the top result. The closest match is still only the closest of what you have. If nothing fits, retrieval hands over the least bad option and the model presents it confidently. Watch for that when you poke the query.
Embedding too much at once. One blurb per vector is fine here. When a document is long, a single vector becomes a blurry average and specific questions stop matching. That is exactly what r2 is about.
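One common way to soften the "trusting the top result" trap, not part of this challenge, is a minimum-score cutoff, so retrieval can return fewer than k results, or none, and the model can say that nothing fits. The sketch assumes a list that has already been scored and sorted. `MIN_SCORE` and `applyCutoff` are hypothetical, and the scores are illustrative (the first three echo the example run above; the Swiss Alps value is invented). Cosine scores are not calibrated across embedding models, so any real threshold has to be chosen by looking at your own model's scores.

```typescript
interface Scored { name: string; score: number }

// Hypothetical threshold; a useful value depends on the embedding model.
const MIN_SCORE = 0.4;

// Keep at most k results, dropping any below the threshold.
// The input is assumed to be sorted highest-first already.
function applyCutoff(ranked: Scored[], k: number, minScore: number = MIN_SCORE): Scored[] {
  return ranked.filter((r) => r.score >= minScore).slice(0, k);
}

// Illustrative scores, already sorted.
const ranked: Scored[] = [
  { name: "Bali", score: 0.55 },
  { name: "Cancún", score: 0.48 },
  { name: "Lisbon", score: 0.48 },
  { name: "Swiss Alps", score: 0.21 },
];

console.log(applyCutoff(ranked, 5).map((r) => r.name));
// A query where nothing clears the bar yields an empty list, which a tool
// could report to the model instead of handing over the least bad option.
console.log(applyCutoff([{ name: "Swiss Alps", score: 0.12 }], 3));
```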

A couple of things worth knowing

Embeddings, in one paragraph

An embedding model turns text into a list of numbers (a vector) so that text about similar things lands in similar places. “Warm beach on a budget” and “Caribbean white-sand beaches, great-value resorts” end up close; “snow-sure skiing, fondue” ends up far. Cosine similarity is the cosine of the angle between two vectors, so it ignores length and asks “do these point the same way”. You never read the numbers yourself; you only ever compare them.
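Written out, cosine similarity is the dot product divided by the product of the two lengths: cos θ = (a · b) / (|a| |b|). A hand-rolled version makes the “ignores length” point concrete: multiplying a vector by 10 changes its length but not its direction, so the score does not move. In the challenge you call cosineSimilarity from ai; this `cosine` only illustrates the formula. The length check is a reminder of the mixing-models trap, since vectors from different models often do not even have the same number of dimensions.

```typescript
// cos θ = (a · b) / (|a| |b|)
// This sketch does not guard against all-zero vectors (that gives 0 / 0 = NaN).
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const v = [1, 2, 3];
console.log(cosine(v, v));                    // same direction: 1, up to floating-point rounding
console.log(cosine(v, v.map((x) => x * 10))); // same direction, 10x longer: still about 1
console.log(cosine([1, 0], [0, 1]));          // perpendicular: 0
console.log(cosine([1, 0], [-1, 0]));         // opposite directions: -1
```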

Why retrieval is a tool

Nothing new happened at the agent level. searchDestinations is an ordinary tool(): a Zod schema, a clear description, added to the agent like any other, and the model decides when to call it. The only difference from f4 is that the tool’s execute does a similarity search instead of returning a mock. That is the whole trick: RAG is retrieval inside the tool loop, not a separate architecture. If the model never calls the tool, the problem is almost always the description or the instructions, not the embeddings.

Index once, query many

indexDestinations runs once at startup and embeds all eight blurbs. search only embeds the query. That split matters: embedding documents is the slow, expensive part, so you do it ahead of time and cache the vectors. Here that cache is a JSON file in the challenge folder, pre-generated and committed so the first run skips the embed; in production it is a vector database. The shape is identical, only the store changes.
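The split can be sketched with a fake embedder that counts its calls: building the index embeds each document once, and every search costs exactly one more embed. The `Map` keyed by model id plays the role of the per-model JSON file (such as embeddings.embeddinggemma.json), so vectors from different models are never mixed. `createIndex`, `fakeEmbed` and this cache layout are hypothetical, not the challenge's code, and the sketch assumes the document list does not change while the cache is in use.

```typescript
type EmbedFn = (text: string) => number[];

// Stands in for the on-disk cache: one entry per embedding model id.
const vectorCache = new Map<string, number[][]>();

let embedCalls = 0;

// Hypothetical embedder: a deterministic toy vector. The values carry no
// meaning; only the call count matters for this sketch.
const fakeEmbed: EmbedFn = (text) => {
  embedCalls++;
  return [text.length, (text.match(/a/g) || []).length, 1];
};

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function createIndex(modelId: string, docs: string[], embedFn: EmbedFn) {
  // Index once: reuse this model's cached vectors if they exist.
  let vectors = vectorCache.get(modelId);
  if (!vectors) {
    vectors = docs.map((d) => embedFn(d));
    vectorCache.set(modelId, vectors);
  }
  const docVectors = vectors;

  // Query many: each search embeds only the query.
  return function search(query: string, k: number): string[] {
    const q = embedFn(query);
    return docs
      .map((doc, i) => ({ doc, score: cosine(q, docVectors[i]) }))
      .sort((a, b) => b.score - a.score)
      .slice(0, k)
      .map((r) => r.doc);
  };
}

const catalogue = ["Bali beaches", "Swiss Alps skiing", "Lisbon old town"];
const search = createIndex("toy-model", catalogue, fakeEmbed); // 3 doc embeds
search("beach", 1);                                            // +1
search("snow", 1);                                             // +1
console.log(embedCalls); // 5

const searchAgain = createIndex("toy-model", catalogue, fakeEmbed); // cache hit: no doc embeds
searchAgain("city", 1);                                             // +1
console.log(embedCalls); // 6
```

Swapping the `Map` for a file on disk or a vector database changes where the vectors live, not the shape of the code.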

Next up is r2, where the documents get long. One embedding for a whole multi-paragraph guide is too blunt: you chunk the guide into passages, embed those, and retrieve the paragraph that answers the question.