
f6: Descriptions are the interface

In f4 the agent had one tool. Real agents have several, and the model must pick the right one per question. It picks on the one thing it can read about each tool: its description, which in Pydantic AI is the function’s docstring. It never sees your function body. So the description is the interface it routes on, and writing a good one is the whole job here.
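To make "it never sees your function body" concrete, here is a minimal standard-library sketch of the kind of summary that can be built from a function without reading its body: its name, its docstring, and its parameter names. This is not Pydantic AI's actual implementation. `describe_tool`, `lookup_price`, and `undescribed` are hypothetical names used only for illustration.

```python
import inspect


def describe_tool(fn) -> dict:
    """Collect what can be learned about fn without reading its body."""
    signature = inspect.signature(fn)
    return {
        "name": fn.__name__,
        # inspect.getdoc returns None when there is no docstring.
        "description": inspect.getdoc(fn) or "",
        "parameters": list(signature.parameters),
    }


# Hypothetical tool, for illustration only.
def lookup_price(item: str) -> dict:
    """Look up the current price of an item. Use for cost questions."""
    secret_rate = 1.2  # body detail: never part of the summary
    return {"item": item, "price": 10 * secret_rate}


# Hypothetical tool with no docstring, like a blank TODO.
def undescribed(item: str) -> dict:
    return {"item": item}


summary = describe_tool(lookup_price)
blank = describe_tool(undescribed)
print(summary)
print(blank)
```

The second summary has an empty description, so only the name and parameter list are left to route on.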

In a hurry? These three steps are the whole challenge. Everything below is the why and the how.

  1. Run make f6 with QUERY_TO_RUN = 1 (a packing question). get_weather has no description, so routing flounders.
  2. Edit start/agent.py: TODO 1 (write get_weather’s docstring so query 1 lands on it) and TODO 2 (write get_flights’s so query 2 does). Then run queries 1, 2, and 3 and check that each routes to the right tool, or to none.
  3. Done when query 1 reaches get_weather, query 2 reaches get_flights, query 3 calls no tool, you wrote zero routing logic, and you can say why a vague description breaks routing.

prompt  ->  model reads each tool's DESCRIPTION  ->  best match, or none

A clear description owns a kind of question; a vague one owns nothing. There is no if.

A docs assistant with two tools, each scoped by its docstring:

@agent.tool_plain
async def search_docs(query: str) -> dict:
    """Search the API reference. Use for how-to, syntax, and which-function questions."""
    ...

@agent.tool_plain
async def run_snippet(code: str) -> dict:
    """Run a code snippet and return its output. Use for what-does-this-print and debugging questions."""
    ...

“How do I format a date?” routes to search_docs; “why does this print None?” to run_snippet. You wrote no if: each docstring claims a kind of question and the model matches it. Change search_docs to "Gets data." and the same question has nothing to match, so it mis-routes. A description is a claim about which questions a tool answers. Below you write those claims for TripMate.

Open start/agent.py. get_current_time is described for you, as the shape to copy:

"""Get the current local time and part of day for a city. Use for timing and meal questions."""

That is the shape: what it returns, then when to use it. get_weather and get_flights have blank docstrings.
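The two-part shape can be checked mechanically before you run anything. The sketch below is a crude heuristic, not part of the challenge code: `description_problems` is a hypothetical helper, and it treats the "Use for ..." wording from the example as the marker for the second part.

```python
from typing import Optional


def description_problems(doc: Optional[str]) -> list:
    """Return reasons a tool description is unlikely to route well."""
    text = (doc or "").strip()
    if not text:
        return ["blank: the model has nothing to match against"]
    problems = []
    # Part 1 of the shape: a sentence saying what the tool returns.
    sentences = [s for s in text.split(".") if s.strip()]
    if len(sentences) < 2:
        problems.append("one sentence only: say what it returns and when to use it")
    # Part 2 of the shape: an explicit "Use for ..." clause, as in the example.
    if "use for" not in text.lower():
        problems.append('no "Use for ..." clause naming the questions it answers')
    return problems


good = "Get the current local time and part of day for a city. Use for timing and meal questions."
print(description_problems(good))
print(description_problems("Gets data."))
print(description_problems(""))
```

Passing this check does not guarantee clean routing. It only catches descriptions that are blank or missing one of the two parts.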

| Tool             | Your description has to win…      |
|------------------|-----------------------------------|
| get_weather      | query 1, the packing question     |
| get_flights      | query 2, the flight-price question |
| get_current_time | done (the timing question)        |

  1. Run it blank. With QUERY_TO_RUN = 1, run make f6. With get_weather undescribed, the packing question has nothing to land on; watch it misfire.
  2. Write get_weather (TODO 1). In your own words, in its docstring, name what it returns and the questions it answers. Predict where query 1 will land, then run it. If it lands elsewhere, your wording overlaps another tool; tighten it.
  3. Write get_flights (TODO 2). Set QUERY_TO_RUN = 2, write it, predict, run. Expect get_flights.
  4. Run query 3. Set QUERY_TO_RUN = 3 (“capital of Portugal?”). Expect no tool: the model already knows this.
  5. Break it on purpose. Set get_weather’s docstring to "Gets data.", re-run query 1. On Gemini routing clearly breaks; on the granite default it is muddier (small models lean on the tool name and prompt too). Only the words changed.
  6. Check you’ve got it. Query 1 to get_weather, query 2 to get_flights, query 3 to none, and you can say why you wrote no routing code.
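The final check in the steps above can be written down as a small table of expected routes. This is a minimal sketch under assumptions: `EXPECTED`, `check_routes`, and `observed` are hypothetical names, and the observed values are filled in by hand from what each run printed, since this page does not show how the challenge reports tool calls.

```python
from typing import Optional

# Expected route per query number, taken from the checklist.
# None means the model should answer without calling a tool.
EXPECTED = {
    1: "get_weather",
    2: "get_flights",
    3: None,
}


def check_routes(observed: dict) -> list:
    """Compare observed tool calls with EXPECTED and list every mismatch."""
    failures = []
    for query, want in EXPECTED.items():
        if query not in observed:
            failures.append(f"query {query}: not run yet")
            continue
        got: Optional[str] = observed[query]
        if got != want:
            failures.append(f"query {query}: expected {want}, got {got}")
    return failures


# Example values only: here query 2 landed on the wrong tool.
observed = {1: "get_weather", 2: "get_current_time", 3: None}
failures = check_routes(observed)
print(failures)
```

An empty list means all three queries routed as intended. The table holds expectations only; the routing itself still comes from the descriptions.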

Stuck? finish/agent.py routes cleanly. Read it after you write your own; your wording will read differently, and that is fine.

  • Wrong tool fires. Two descriptions overlap; tighten each until it owns one kind of question.
  • Query 3 still calls a tool. Small models lean toward using an available tool. Re-run a few times; the trend is no tool call.
  • Vague description still routes on granite. Expected. The contrast is sharp on Gemini; swap shared/model.py to see it.
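For the first bullet, one rough way to spot overlapping descriptions is to list the content words two of them share. This is a lexical proxy only: the model matches on meaning, so an empty result does not prove the descriptions are distinct. `STOPWORDS`, `content_words`, `overlaps`, and the `find_examples` tool are hypothetical and exist only for this sketch.

```python
import re
from itertools import combinations

# Small hand-picked stopword list; an assumption, not a standard.
STOPWORDS = {
    "a", "an", "and", "the", "for", "of", "to", "use", "its", "it",
    "question", "questions", "get", "gets", "return", "returns",
}


def content_words(description: str) -> set:
    """Lowercase words in a description, minus stopwords."""
    words = re.findall(r"[a-z]+", description.lower())
    return {w for w in words if w not in STOPWORDS}


def overlaps(descriptions: dict) -> dict:
    """Map each pair of tool names to the content words they share."""
    result = {}
    for (name_a, doc_a), (name_b, doc_b) in combinations(descriptions.items(), 2):
        shared = content_words(doc_a) & content_words(doc_b)
        if shared:
            result[(name_a, name_b)] = shared
    return result


docs = {
    "search_docs": "Search the API reference. Use for how-to, syntax, and which-function questions.",
    "run_snippet": "Run a code snippet and return its output. Use for what-does-this-print and debugging questions.",
}
print(overlaps(docs))

# Adding a hypothetical third tool whose wording crowds the other two.
docs["find_examples"] = "Search the reference for code examples."
crowded = overlaps(docs)
print(crowded)
```

Each shared word is a candidate to reword so that one tool clearly owns that kind of question.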

How is this different from the agentic challenge (p5)?

Here the model picks one tool for one question: routing. In p5 (agentic) it calls several tools for one request, in order, feeding each result into the next. Both run on descriptions like the ones you just wrote.

You can now call an agent, read its loop and cost, get typed output, give it a tool, and shape which tool it reaches for by what you write. That is the augmented LLM.

One foundation left. f7 is testing: prove this guardrail blocks the wrong question and lets the right one through, using a fake model so the check is instant, deterministic, and never makes a real call.