
p1: Prompt chaining

You start the Patterns track here. In Foundations each challenge was one model call. From now on you compose several of them, in code you write. This one teaches the simplest composition: a chain. You break a task into small steps, run them in order, and put a gate between them so a bad result never flows downstream.
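A gated chain fits in a few lines of plain Python. The sketch below is an illustration, not the workshop's code. `run_chain`, `outline`, `expand` and `must_mention_budget` are hypothetical names, and the two steps are deterministic stand-ins for model calls so the sketch runs offline. Any step can carry a gate. The first gate that rejects its step's output stops the chain before the next step runs.

```python
from typing import Callable, Optional

Step = Callable[[str], str]              # transforms the text (a model call in real use)
Gate = Callable[[str], Optional[str]]    # None to pass, or a reason to stop


def run_chain(text: str, stages: list[tuple[Step, Optional[Gate]]]) -> tuple[str, Optional[str]]:
    """Run each step in order; stop at the first gate that rejects its output."""
    for step, gate in stages:
        text = step(text)
        if gate is not None:
            reason = gate(text)
            if reason is not None:
                return text, reason      # the bad result never reaches the next step
    return text, None


# Hypothetical stand-ins for model calls, so the sketch runs offline.
def outline(topic: str) -> str:
    return f"outline: {topic}"


def expand(outline_text: str) -> str:
    return outline_text.replace("outline:", "pitch:", 1)


def must_mention_budget(text: str) -> Optional[str]:
    return None if "£500" in text else "no budget"


stages = [(outline, must_mention_budget), (expand, None)]
print(run_chain("Lisbon on £500", stages))   # passes the gate, reaches expand
print(run_chain("Lisbon", stages))           # stopped at the gate, expand never runs
```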

In a hurry? These three steps are the whole challenge. Everything below is the why and the how.

  1. Run make p1 and read the draft. It usually skips the £500 budget or the call to action, and nothing reviews or repairs it.
  2. Edit start/agent.py: do TODO 1 (call the reviewer), TODO 2 (the plain-code gate plus the missing list), TODO 3 (the editor’s instructions plus its run), and TODO 4 (re-review the rewrite).
  3. Done when a failing draft is repaired and the final re-review reports both checks true.

The task is to produce a trip pitch that mentions the traveller’s £500 budget and ends with a call to action. Asking one prompt for all of that at once is hopeful: you cannot tell whether the model did it. A chain makes the work easier to debug. Draft the pitch, judge the draft, fix only what the judge flagged, then check the fix.

The gate is what matters. The reviewer returns a typed verdict, so the decision to ship or to fix is an ordinary if. That is what makes a workflow a workflow: you hold the control flow, in code you can read and test.

draft  ->  [ review: typed verdict ]  ->  gate  --pass-->  ship
                                            \--fail-->  edit only what failed  ->  ship

The reviewer returns a typed verdict, so a plain if in your code decides whether to ship or to fix.
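The diagram translates almost line for line into plain Python. Everything here is a hypothetical stand-in, not the workshop's code. `Verdict`, `write`, `review`, `edit` and `pitch_chain` are illustrative names, and the three "model calls" are deterministic string functions. The `trace` list records which steps ran. That is roughly what the run spans show in a real trace.

```python
from dataclasses import dataclass


@dataclass
class Verdict:                       # typed verdict, like the reviewer's output_type
    mentions_budget: bool
    has_call_to_action: bool


# Hypothetical stubs standing in for the writer, reviewer and editor model calls.
def write(brief: str) -> str:
    return f"Three sunny days in {brief}."


def review(pitch: str) -> Verdict:
    return Verdict(mentions_budget="£500" in pitch, has_call_to_action="Book" in pitch)


def edit(pitch: str, missing: list[str]) -> str:
    fixes = {"mention the £500 budget": " All for under £500.",
             "end with a call to action": " Book today!"}
    return pitch + "".join(fixes[m] for m in missing)


def pitch_chain(brief: str, trace: list[str]) -> tuple[str, Verdict]:
    trace.append("writer")
    draft = write(brief)
    trace.append("reviewer")
    verdict = review(draft)
    # The gate: an ordinary if over typed fields, not a model call.
    missing = []
    if not verdict.mentions_budget:
        missing.append("mention the £500 budget")
    if not verdict.has_call_to_action:
        missing.append("end with a call to action")
    if not missing:
        return draft, verdict                 # pass: ship the draft as is
    trace.append("editor")
    improved = edit(draft, missing)           # fix only what failed
    trace.append("final reviewer")
    return improved, review(improved)         # re-review: did the repair work?


trace: list[str] = []
print(pitch_chain("Lisbon", trace))
print(trace)
```

When the draft already passes, the trace stops after the reviewer, because the repair path never runs.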

Forget travel. Say you gate a pull-request description: it must summarise the change and give a test plan. A reviewer returns a typed verdict, so the decision is a plain if, not more prose:

from pydantic import BaseModel, Field
from pydantic_ai import Agent


class PRCheck(BaseModel):
    has_summary: bool = Field(description="true only if the description summarises the change")
    has_test_plan: bool = Field(description="true only if it says how the change was tested")


# `model` is whichever model the workshop configures.
reviewer = Agent(model, output_type=PRCheck, instructions="...")
editor = Agent(model, instructions="...")


async def gate_pr(pr: str) -> str:
    # The gate is plain code, because the verdict is typed:
    verdict = (await reviewer.run(pr)).output
    missing = []
    if not verdict.has_summary:
        missing.append("add a one-line summary")
    if not verdict.has_test_plan:
        missing.append("add a test plan")
    if not missing:
        return pr                                      # pass: ship it
    fixed = await editor.run(f"Fix only: {'; '.join(missing)}\n\n{pr}")
    return fixed.output                                # fail: ship only the repair

Three moves. The reviewer’s output_type turns its judgement into booleans (the f3 trick), so the gate is an ordinary if. The missing list holds only the failed checks, so the editor repairs those and leaves the rest alone. And the reviewer is a separate call from the writer, so its verdict is independent of the thing it judges. Below you write that gate and the editor’s brief for TripMate.
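Because the gate is plain code, you can pull it out and unit test it with no model at all. This is a minimal sketch that uses a stdlib `dataclass` in place of the pydantic `PRCheck`; attribute access reads the same either way. `missing_points` and `repair_prompt` are hypothetical helper names, not part of the workshop files. `repair_prompt` returns `None` when the gate passes. Otherwise it returns the exact prompt the editor would receive.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PRCheck:                      # stdlib stand-in for the pydantic PRCheck above
    has_summary: bool
    has_test_plan: bool


def missing_points(verdict: PRCheck) -> list[str]:
    """Name only the failed checks, in words the editor can act on."""
    missing = []
    if not verdict.has_summary:
        missing.append("add a one-line summary")
    if not verdict.has_test_plan:
        missing.append("add a test plan")
    return missing


def repair_prompt(verdict: PRCheck, pr: str) -> Optional[str]:
    """None means the gate passed; otherwise, the prompt to send the editor."""
    missing = missing_points(verdict)
    if not missing:
        return None
    return f"Fix only: {'; '.join(missing)}\n\n{pr}"


# No model needed: the gate is tested like any other function.
assert repair_prompt(PRCheck(True, True), "Adds retries.") is None
assert repair_prompt(PRCheck(True, False), "Adds retries.") == "Fix only: add a test plan\n\nAdds retries."
```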

Open start/agent.py. The writer and the reviewer are provided. The writer is deliberately never told about the £500 budget or the call to action, so the gate has something real to catch. The reviewer returns a typed QualityCheck verdict with three fields: mentions_budget, has_call_to_action and notes. What is left blank: the gate, the editor’s instructions, and the calls that wire them together.

make p1

The writer produces a vivid Lisbon pitch. Read it back: it often skips the £500 budget, and it often does not ask you to book anything. Nothing checks its work yet, so you have a nice paragraph and no way to repair it.

The rest of the file fixes that dead end.

  1. Run it and read the draft. Run make p1 and look at what comes back. The writer was never asked for the budget or a call to action, so it usually drops one or both. There is no review, no gate, no repair path.

  2. Call the reviewer and read the verdict (TODO 1). Run the reviewer on the draft and read its typed .output, the same call-an-agent, read-.output move you used in f3. Print the two booleans and the notes, so you can see what the gate will act on.

  3. Write the gate (TODO 2). This is the heart of the pattern, and it is plain code, not a model call. Read the two booleans on verdict. If both pass, print [gate] passed and return, and the draft ships as is. Otherwise build a missing list naming only the failed checks, in words the editor can act on (the budget mention, the call to action), and drop the ones that passed. That missing list is the whole point of the gate: it carries forward what still needs fixing and nothing else.

  4. Write the editor’s instructions and call it (TODO 3). First fill in the empty instructions on the editor agent: keep its brief tight so the edit stays faithful, fix only the missing points, preserve the draft’s voice, keep the length, return only the rewritten pitch. Then call editor.run(...) with a prompt that hands it two things, the missing points to fix and the draft to rewrite, and read .output into improved. Sending only the failed points, not “make it better”, is what keeps the edit small and faithful.

  5. Re-review the edit (TODO 4). Run the same review call you made in step 2, this time on improved instead of draft, and read its .output into a final verdict you can print. Without this you only know you attempted a repair, not whether it worked.

  6. Try to skip the chain (poke it). Add "mention the £500 budget and end with a call to action" to the writer’s instructions, so the draft attempts everything in one shot. Run it a few times. A small model still drops one requirement now and then. The gate turns “usually” into “every time”, and it is the part a single prompt cannot give you.

  7. Check you’ve got it. You should be able to point at the gate line and say why it is code and not a model call, and say in one sentence how a workflow differs from an agent. Scroll up to the trace too: you will see separate agent run spans in sequence (writer, reviewer, maybe editor, then final reviewer), each with its own token usage. When the gate passes first time, the editor and final-review spans are absent, because the repair path never ran.
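Step 6’s claim can be made concrete with a toy simulation, though it does not measure a real model. The writer is a hypothetical stand-in that drops each requirement with a fixed 10% chance. The reviewer and editor are assumed to be perfect, which real ones are not. Under those assumptions, some one-shot pitches ship with misses and no chained pitch does. With a real editor, the final re-review is what tells you whether the repair held.

```python
import random
from dataclasses import dataclass


@dataclass
class Verdict:
    mentions_budget: bool
    has_call_to_action: bool


def flaky_writer(rng: random.Random, drop_rate: float = 0.1) -> str:
    """Toy writer that was told both requirements but drops each now and then."""
    pitch = "Three sunny days in Lisbon."
    if rng.random() >= drop_rate:
        pitch += " All for under £500."
    if rng.random() >= drop_rate:
        pitch += " Book today!"
    return pitch


def review(pitch: str) -> Verdict:               # assumed perfectly accurate
    return Verdict("£500" in pitch, pitch.endswith("Book today!"))


def edit(pitch: str, verdict: Verdict) -> str:   # assumed to fix exactly what failed
    if not verdict.mentions_budget:
        pitch = pitch.replace("Lisbon.", "Lisbon. All for under £500.", 1)
    if not verdict.has_call_to_action:
        pitch += " Book today!"
    return pitch


def ok(pitch: str) -> bool:
    verdict = review(pitch)
    return verdict.mentions_budget and verdict.has_call_to_action


def chained(rng: random.Random) -> str:
    draft = flaky_writer(rng)
    verdict = review(draft)
    if verdict.mentions_budget and verdict.has_call_to_action:
        return draft                              # gate passed
    return edit(draft, verdict)                   # repair only what failed


rng = random.Random(0)
runs = 1000
one_shot_misses = sum(not ok(flaky_writer(rng)) for _ in range(runs))
chained_misses = sum(not ok(chained(rng)) for _ in range(runs))
print(f"one-shot misses: {one_shot_misses}/{runs}, chained misses: {chained_misses}/{runs}")
```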

Stuck? finish/agent.py is the canonical version. Read it after you’ve had a real go.

  • A gate that is not plain code. If you find yourself asking a model “should I rewrite this?”, stop. The reviewer already returned booleans; the decision is an if.
  • Feeding the editor everything. Send only the failed points, not “make it better”. A specific instruction produces a faithful edit; a vague one rewrites the voice away.
  • Over-checking. Two concrete, checkable criteria beat ten fuzzy ones. The reviewer is only as reliable as its criteria are concrete.
  • No exit. This chain runs each step once. If you ever loop a fix-and-recheck (the p4 evaluator-optimizer), bound the loop so it cannot run forever.
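If you later add a fix-and-recheck loop, a cap on rounds is the simplest exit. This is a minimal sketch, not the p4 code. `fix_until_pass` is an illustrative name, and `review` (returns the list of failed points) and `edit` are hypothetical callables. If the list is still non-empty at the end, the repair did not converge and the draft should not ship silently.

```python
from typing import Callable


def fix_until_pass(draft: str,
                   review: Callable[[str], list[str]],
                   edit: Callable[[str, list[str]], str],
                   max_rounds: int = 3) -> tuple[str, list[str]]:
    """Repair and re-review until nothing is missing, but never more than max_rounds edits."""
    missing = review(draft)
    for _ in range(max_rounds):
        if not missing:
            break
        draft = edit(draft, missing)      # fix only the failed points
        missing = review(draft)           # re-check the fix
    return draft, missing                 # non-empty: escalate instead of shipping


# Hypothetical stubs: a reviewer with one check, and an editor that never fixes anything.
def review_budget(text: str) -> list[str]:
    return [] if "£500" in text else ["mention the £500 budget"]


edits: list[list[str]] = []


def stubborn_edit(text: str, missing: list[str]) -> str:
    edits.append(missing)
    return text


print(fix_until_pass("Lisbon.", review_budget, stubborn_edit))   # gives up after 3 edits
```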
Workflows vs agents

This is the distinction the rest of the workshop turns on, from Anthropic’s “Building effective agents”.

A workflow orchestrates model calls through code paths you write. You decide the order and the branches. That is this challenge.

An agent is a model that directs its own process and tool use in a loop. The model decides the order. That is the agentic challenge later in the workshop.

Workflows give you predictability and a place to put checks; agents give you flexibility when you cannot predict the steps. You are learning to choose between them.
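The difference comes down to who picks the next step. In the toy below, `workflow` runs an order fixed in code. `agent` instead asks `choose`, a hypothetical rule-based stand-in for the model, which tool to call next until it returns `None`, with a step cap as the exit. The tool names and the rule are illustrative assumptions. The point is that the agent’s path changes with the input and the workflow’s never does.

```python
from typing import Callable, Optional

Tools = dict[str, Callable[[str], str]]

# Hypothetical tools: each one just tags the text with its name.
tools: Tools = {name: (lambda text, name=name: f"{text} <{name}>")
                for name in ["draft", "review", "edit", "search", "answer"]}


def workflow(text: str, tools: Tools) -> tuple[str, list[str]]:
    """You decide the order in code: the same path for every input."""
    order = ["draft", "review", "edit"]
    for name in order:
        text = tools[name](text)
    return text, order


def agent(text: str, tools: Tools,
          choose: Callable[[str], Optional[str]],
          max_steps: int = 5) -> tuple[str, list[str]]:
    """The model decides the order; `choose` stands in for it and returns None when done."""
    used: list[str] = []
    for _ in range(max_steps):
        name = choose(text)
        if name is None:
            break
        text = tools[name](text)
        used.append(name)
    return text, used


def choose(text: str) -> Optional[str]:
    """Rule-based stand-in for a model: comparisons need two searches, other questions one."""
    if "<answer>" in text:
        return None
    needed = 2 if text.startswith("compare") else 1
    return "search" if text.count("<search>") < needed else "answer"


print(workflow("Lisbon", tools)[1])                          # always draft, review, edit
print(agent("Lisbon weather", tools, choose)[1])             # search, answer
print(agent("compare Lisbon and Porto", tools, choose)[1])   # search, search, answer
```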

Why split the writer and the reviewer at all?

You could ask one agent to “write a pitch, on budget, with a call to action, and tell me if you succeeded”. On a large model that often works. It degrades on a small one, and it hides the failure: the same call that writes the pitch also grades it, so a miss and a false “looks good” arrive together.

Splitting the roles gives each call one job and gives you an independent verdict you can gate on. Anthropic’s post argues along the same lines: chaining makes each call an easier task, and routing gives “separation of concerns”.

When is a chain the wrong tool?

When you cannot predict the steps. A chain is a fixed path: draft, review, edit, in that order, always. If the work needs a different number or order of steps depending on the input (search this, then maybe search that, then maybe call a tool), a fixed chain fights you and an agent fits better. The agentic challenge is that case.

Next up is p2, where the gate becomes a fork. You classify the input first, then send it down a different path depending on what it is.