p2: Routing

A chain (p1) runs every step in order. Routing does the opposite: it picks one path. A small classifier call labels the input, then your code branches to the component built for that kind of work. The model reads meaning, your code picks the path, and you write no keyword matching.

The classifier is only as good as its prompt, so that prompt is the thing you write here.

Quick path

In a hurry? These three steps are the whole challenge. Everything below is the why and the how.

1. Run make p2 and watch every query land on one generalist, with no visibility into where the work goes.
2. Edit start/agent.py: write the triage prompt (TODO 1), then classify (TODO 2), branch by label (TODO 3), and dispatch, deleting the generalist lines (TODO 4).
3. Done when each query prints a [route] line and reaches a different specialist: weather to the packing expert, flights to the booking expert, food to the general advisor.

Mental model

query  ->  classify (in code)  ->  weather | booking | general  ->  one answer

The model labels the query; your code reads the label and picks the one branch. No if statement ever inspects keywords.

This is the same idea you met in f6, one layer up. There the model routed inside a single call, choosing which tool to invoke from the tools’ descriptions. Here you do the routing, in code: the classifier returns a label, and your lookup chooses which agent runs next. That decision is a line you can read, log, and test.

The mechanic, in another domain

Forget travel. A support desk routes tickets to the right team with a classifier whose whole brain is its prompt:

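from typing import Literal

from pydantic import BaseModel, Field
from pydantic_ai import Agent  # assumes the workshop uses Pydantic AI's Agent


class TicketTriage(BaseModel):
    # The only labels the classifier is allowed to return.
    category: Literal["billing", "technical", "account"]
    reasoning: str = Field(description="one short clause on why")


# `model` is the model object configured elsewhere in the workshop code.
triage_tickets = Agent(
    model,
    output_type=TicketTriage,
    instructions=(
        "Label a support ticket as exactly one category. "
        "billing: charges, refunds, invoices, payment methods. "
        "technical: errors, crashes, things not working. "
        "account: login, passwords, profile and settings. "
        "Pick the single best fit."
    ),
)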

Two things make this work. output_type returns the label typed, so the branch is an ordinary table lookup, not guesswork. And the prompt names each category in its own words, so the model can tell them apart; a blurry prompt (“sort the ticket”) gives coin-flip labels. Below, you write TripMate’s version of that prompt.
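To see why a closed label set turns the branch into a plain lookup, here is a minimal sketch that uses only the standard library. It is a stand-in for the validation that output_type performs, not the library's own mechanism. The names Category, parse_category, and handlers are hypothetical, and the handlers are simple functions rather than real agents.

```python
from typing import Literal, get_args

# The closed set of labels, mirroring the support-desk schema.
Category = Literal["billing", "technical", "account"]
ALLOWED = get_args(Category)  # the three label strings as a tuple


def parse_category(raw: str) -> str:
    """Accept only labels in the closed set; reject anything else loudly."""
    if raw not in ALLOWED:
        raise ValueError(f"unexpected label {raw!r}; expected one of {ALLOWED}")
    return raw


# Hypothetical handlers standing in for the three teams.
handlers = {
    "billing": lambda ticket: f"billing team: {ticket}",
    "technical": lambda ticket: f"technical team: {ticket}",
    "account": lambda ticket: f"account team: {ticket}",
}

label = parse_category("billing")           # a validated label
reply = handlers[label]("I was charged twice")  # the branch is one lookup
print(reply)
```

Because every label that survives validation is guaranteed to be a key in the table, the lookup needs no fallback logic and no model call.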

The setup

Open start/agent.py. The Triage schema (weather | booking | general), three specialist agents, and the routing table are all provided. The one blank is triage’s instructions, the prompt that defines the three categories. That is TODO 1.

specialists = {"weather": weather_expert, "booking": booking_expert, "general": general_expert}

The table keys and the Literal match, so once the classifier returns a label, dispatch is one lookup.
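If you later add a category, the table and the Literal can drift apart. One way to catch that early is a small startup check, sketched below under assumptions: the alias name Category and the helper check_table are hypothetical, and placeholder strings stand in for the three specialist agents.

```python
from typing import Literal, get_args

# Assumed name for the label type; the workshop schema may call it something else.
Category = Literal["weather", "booking", "general"]

# Placeholder strings stand in for the three specialist agents.
specialists = {
    "weather": "weather_expert",
    "booking": "booking_expert",
    "general": "general_expert",
}


def check_table(table: dict, literal_type) -> None:
    """Fail fast if the table keys and the Literal values have drifted apart."""
    labels = set(get_args(literal_type))
    keys = set(table)
    if labels != keys:
        raise RuntimeError(
            f"missing: {sorted(labels - keys)}, extra: {sorted(keys - labels)}"
        )


check_table(specialists, Category)  # passes silently when they agree
```

Running the check once at import time means a mismatch shows up as a clear error at startup instead of a KeyError in the middle of a query.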

Build it

1. Run it and watch the generalist cope. Run make p2. Every query goes to one general_expert. It manages, but one prompt cannot specialise, and you cannot see where the work goes.
2. Write the triage prompt (TODO 1). Fill triage’s instructions so it labels each query weather, booking, or general. Name each category in your own words, like the support-desk example, and tell it to pick the single best fit.
3. Classify (TODO 2). Call triage on the query, read category and reasoning off .output (the f3 move), and print a [route] line so you can see the label your branch will use.
4. Branch and dispatch (TODO 3, 4). Look the specialist up in specialists by category (plain code, no model call), run it on the query, print the answer, and delete the two cycle-1 generalist lines. Each query now takes exactly one route.
5. Find the edges (poke it). Add a query of your own to QUERIES and predict its route. Then try a blurry one like "What should I pack?" with no mention of weather; on a small model it sometimes lands in general. Read the reasoning to see why. Tightening the triage prompt is how you fix it.
6. Check you’ve got it. Point at the branch line and say why it is code, not a model call, and how this differs from the in-call tool choice of f6. The trace shows two spans per query: the classify call, then one specialist call. You never see two specialists run for the same query.
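The classify, branch, and dispatch steps have the shape sketched below, shown in the support-desk domain so the TripMate version stays yours to write. Everything here is a stand-in: StubAgent, Result, and the canned labels are hypothetical and make no model call, and the run_sync method name is an assumption, so use whichever run call the workshop code uses.

```python
from dataclasses import dataclass


@dataclass
class Triage:
    category: str
    reasoning: str


@dataclass
class Result:
    output: object  # mirrors the `.output` attribute the steps read from


class StubAgent:
    """Hypothetical stand-in for an agent: canned output, no model call."""

    def __init__(self, fn):
        self._fn = fn

    def run_sync(self, prompt: str) -> Result:  # method name is an assumption
        return Result(output=self._fn(prompt))


# Canned labels play the role of the classifier's judgement.
CANNED = {
    "I was charged twice": Triage("billing", "duplicate charge"),
    "The app crashes on launch": Triage("technical", "crash report"),
}
triage = StubAgent(lambda q: CANNED[q])
teams = {
    "billing": StubAgent(lambda q: f"[billing] handling: {q}"),
    "technical": StubAgent(lambda q: f"[technical] handling: {q}"),
    "account": StubAgent(lambda q: f"[account] handling: {q}"),
}


def handle(query: str) -> str:
    decision = triage.run_sync(query).output   # classify: one call
    print(f"[route] {decision.category}: {decision.reasoning}")
    agent = teams[decision.category]           # branch: plain lookup, no model
    return agent.run_sync(query).output        # dispatch: exactly one specialist


for q in CANNED:
    print(handle(q))
```

Each query makes two calls, the classifier and one specialist, which matches the two spans per query you should see in the trace.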

Stuck? finish/agent.py is one version of the triage prompt. Read it after you write your own; yours will read differently, and that is fine.

Traps

Branching with a model. Once the classifier returns a label, the branch is a lookup. If you ask a model “which agent should handle this?”, you have rebuilt the classifier. Keep the branch in code.
Fuzzy categories. Routing is only as reliable as the labels are distinct. Overlapping categories (“weather” vs “general”) coin-flip on a small model; make each one a clear, separate job.
No default. Inputs that fit nothing should land somewhere safe. general is the catch-all here.
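The catch-all can also be enforced at the lookup, as a second line of defence. With a typed label an unknown value should not reach the table, so the sketch below matters mainly when a label arrives from somewhere unvalidated. The helper name choose is hypothetical, and placeholder strings stand in for the agents.

```python
# Placeholder strings stand in for the three specialist agents.
specialists = {
    "weather": "weather_expert",
    "booking": "booking_expert",
    "general": "general_expert",
}


def choose(label: str, table: dict, default: str = "general"):
    """Return the entry for `label`, or the catch-all entry if the label is unknown."""
    return table.get(label, table[default])


print(choose("booking", specialists))  # booking_expert
print(choose("visa", specialists))     # general_expert
```

The first defence is still the prompt: describing general as the home for anything that fits neither of the other two gives the classifier a safe label to pick.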

A couple of things worth knowing

Why classify with a separate call at all?

You could give one agent every instruction and hope it adopts the right voice per question. On a small model it blurs the personas together and you lose the ability to tune one case without disturbing the others.

A separate classifier gives you a typed label you can branch on, log, and test, and it lets each specialist stay small and focused. Anthropic gives this “separation of concerns” as the reason to route.
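Because the label is a plain value, you can check a triage prompt against a handful of queries whose route you already know. The sketch below assumes a classify callable that returns the label string. The function score, the sample queries, and the fake classifier are all hypothetical, and the third canned label imitates the blurry packing query drifting to general.

```python
def score(classify, cases):
    """Return the (query, expected, got) triples the classifier got wrong."""
    misses = []
    for query, expected in cases:
        got = classify(query)
        if got != expected:
            misses.append((query, expected, got))
    return misses


# Queries paired with the route you expect (illustrative examples).
CASES = [
    ("Will it rain in Lisbon next week?", "weather"),
    ("Find me a flight to Tokyo", "booking"),
    ("What should I pack?", "weather"),
]

# Canned labels stand in for a real classifier call.
FAKE_LABELS = {
    "Will it rain in Lisbon next week?": "weather",
    "Find me a flight to Tokyo": "booking",
    "What should I pack?": "general",
}


def fake_classify(query: str) -> str:
    return FAKE_LABELS[query]


misses = score(fake_classify, CASES)
for query, expected, got in misses:
    print(f"MISS {query!r}: expected {expected}, got {got}")
```

Swapping fake_classify for a function that calls the real triage agent turns this into a quick regression check to rerun each time you tighten the prompt.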

Routing (you decide) vs an agent (the model decides)

Routing is a workflow: you fix the branches in advance and the model only labels the input. That fits when the categories are known and stable. When you cannot predict them, or one request needs several at once in an order you cannot fix, a single agent with all the tools deciding for itself fits better. That is the agentic challenge (p5). Routing trades the agent’s flexibility for a control flow you can see.

Next up is p3, where instead of picking one path you run several at once and combine the results.