Resilience: failures as data
Self-serve track. Not part of the live 90-minute block; do it any time after Foundations. Run with make resilience (reference: make solution-resilience).
Every tool so far has worked. Real APIs go down, and users type places that do not exist. This challenge is about what the agent does when a tool fails.
Quick path
In a hurry? These three steps are the whole challenge. Everything below is the why and the how.
- Run make resilience and watch the run crash on Atlantis, because get_weather raises for an unknown city.
- Edit start/agent.py: do TODO 1 (replace the raise in get_weather with a {"error": ...} return carrying a recovery hint) and TODO 2 (add a recovery clause to the instructions).
- Done when the run no longer crashes, the model names the Atlantis failure, and it offers a real alternative instead of stopping.
You ask TripMate to plan a weekend in Atlantis, which no tool can find, and you make that failure into something the model can handle well.
The one idea here: return failures as data, do not raise them. A returned
{"error": "..."} is a normal tool result the model reads and acts on, and the message is
one you wrote, so you can include recovery guidance. A raised exception is different: it
escapes the tool, agent.run(...) propagates it, and your program stops.
The mechanic, in another domain
Forget travel. Say a charge_card tool can fail. The instinct is to raise an exception.
Pydantic AI does not swallow a raised exception: it escapes the tool, agent.run(...) propagates it, and your program stops. Return the failure as data instead, with a message and recovery guidance you control.
The returned version keeps the agent alive and gives the model something to act on, and the wording is yours. That wording is the whole lesson, so it is yours to write, not to copy. Below you do this to TripMate’s get_weather.
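The contrast can be sketched in plain Python. Everything here is hypothetical: the CardDeclined class, the two function names, the card_ok flag, and the message wording are illustrations, not part of the workshop code. In a real agent these functions would be registered as tools; that wiring is omitted so the sketch runs on its own.

```python
class CardDeclined(Exception):
    """Hypothetical error type, used only in this sketch."""


def charge_card_raising(amount_cents: int, card_ok: bool) -> dict:
    """The instinct: raise on failure, so the exception escapes the tool."""
    if not card_ok:
        raise CardDeclined("card declined")
    return {"status": "charged", "amount_cents": amount_cents}


def charge_card_returning(amount_cents: int, card_ok: bool) -> dict:
    """The alternative: return the failure as an ordinary result."""
    if not card_ok:
        # The message is yours: say what failed and what to do next.
        return {
            "error": (
                "The card was declined. Do not retry the charge; "
                "ask the user for another payment method."
            )
        }
    return {"status": "charged", "amount_cents": amount_cents}


# The returning version hands back a dict the caller can read and act on.
result = charge_card_returning(500, card_ok=False)
print(result["error"])
```

The success path is identical in both versions; only the failure path changes, from an exception the caller must catch to a value the model can read.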
The setup
Open start/agent.py. get_weather raises for any city it does not know, and the prompt asks about Atlantis, which no tool can find. TODO 1 is yours: replace the raise with a return {"error": ...} carrying a recovery hint. TODO 2: add a clause to the agent’s instructions telling TripMate what to do with an error field.
Run it
Build it
- Run it and watch it crash. Run make resilience. get_weather raises for Atlantis, the exception is not caught, and the run falls over: you see “TripMate crashed on a tool error”. One unknown city took the whole plan down, and you had no say in what happened next.
- Return the error as data (TODO 1). In get_weather, replace the raise with a return {"error": ...} whose message names what failed and tells the model what to do next (suggest a real city, or continue without the weather). The exact wording is yours to write, and it is the whole point: the model can only act on what your message says. The TODO comment in start/agent.py marks the spot.
- Tell the agent what to do with an error field (TODO 2). Add a clause to the instructions, in your own words, telling TripMate what to do when a tool returns an error field: acknowledge the failure plainly, then suggest a real alternative or continue with what worked. Write the sentence yourself; there is a marked spot in the instructions string.
  Run again. The run no longer crashes. The model gets a message you wrote, names the failure (“no weather data for Atlantis, it is not a real destination”), and offers a real alternative. Both tools fail for Atlantis, so there is nothing to confabulate around: the only honest response is to surface the failure.
- Run the bare-vs-hinted poke. Temporarily change your error message to just "No weather data." Predict how TripMate recovers, then run. With nothing to act on, the recovery goes thin: it apologises and stalls. Put your hinted message back and run again. Same failure, two recoveries. The model can only act on what your message says, which is the whole reason you return the error instead of raising.
- Verify what you’ve got. The first run crashes on the raised error. After your changes, get_weather returns {"error": ...} with a recovery hint, the instructions carry your recovery clause, and the agent names the Atlantis failure and offers a real alternative without crashing. You should be able to say why returning an error beats raising one, and when ModelRetry is the right call instead.
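One way to check TODO 1 without running the whole agent is to call the tool function directly and confirm it returns a dict instead of raising. The sketch below is only a stand-in: the KNOWN_WEATHER table, the return shape for a known city, and the hint wording are assumptions, and the real get_weather in start/agent.py may look different. Treat the hint text as a placeholder and write your own.

```python
# Hypothetical stand-in data; the workshop's real weather source may differ.
KNOWN_WEATHER = {"Lisbon": "sunny, 24C", "Oslo": "rain, 11C"}


def get_weather(city: str) -> dict:
    """Return a forecast for a known city, or an error dict with a recovery hint."""
    if city not in KNOWN_WEATHER:
        # Failure as data: name what failed, then say what to do next.
        return {
            "error": (
                f"No weather data for {city!r}. "
                "Suggest a real city, or continue the plan without the weather."
            )
        }
    return {"city": city, "forecast": KNOWN_WEATHER[city]}


# Direct call: an unknown city yields a readable result, not an exception.
result = get_weather("Atlantis")
assert "error" in result
print(result["error"])
```

Swapping the hinted message for a bare "No weather data." in this function is the same poke as in the steps above: the function still returns, but the model has less to act on.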
- The recovery is thin even after you return the error. Put more in the error message. The model can only act on what the message says; a bare "error: failed" gives it nothing to work with.
- A small model still pushes ahead and invents. granite4.1:3b sometimes does, if any tool succeeded and gave it material. Both tools fail for Atlantis, so there’s nothing to confabulate around: the only honest response is to surface the failure.
- Reaching for ModelRetry on a permanent failure. It retries the same doomed call until the budget runs out, then Pydantic AI raises UnexpectedModelBehavior. Use it only when another attempt could actually work; return errors as data for failures that never will.
A couple of things worth knowing
What about ModelRetry?
Pydantic AI has ModelRetry: raise it from a tool and the model gets your message and
tries the call again. It is the right tool for a transient or fixable failure: a
validation miss the model can correct, a rate limit worth retrying, “you passed an unknown
airport code, try the IATA code.” It is the wrong tool for a permanent failure like
Atlantis: the model retries the same doomed call until it exhausts the retry budget, then
Pydantic AI raises UnexpectedModelBehavior and the run crashes anyway. For a failure
that will never succeed, return it as data so the model adapts instead of retrying. Reach
for ModelRetry only when another attempt could actually work.
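The split between the two failure kinds can be sketched as a single tool body. The lookup_airport function, the AIRPORTS table, and the messages are hypothetical; only ModelRetry itself comes from Pydantic AI. The try/except import is there so the sketch also runs without the library installed, and the agent wiring that would register this function as a tool is left out.

```python
try:
    from pydantic_ai import ModelRetry  # the real exception, if installed
except ImportError:
    class ModelRetry(Exception):
        """Stand-in so this sketch runs without Pydantic AI installed."""


# Hypothetical lookup table; not part of the workshop code.
AIRPORTS = {"LIS": "Lisbon", "OSL": "Oslo"}


def lookup_airport(code: str) -> dict:
    """Retry on input the model can fix; return data on a permanent failure."""
    # Fixable: the argument is not shaped like an IATA code, so another
    # attempt with a corrected argument could succeed.
    if not (len(code) == 3 and code.isalpha() and code.isupper()):
        raise ModelRetry(
            f"{code!r} is not an airport code; try the three-letter IATA code."
        )
    # Permanent (in this sketch): a well-formed code with no matching airport.
    # Retrying the same call cannot help, so report it as data.
    if code not in AIRPORTS:
        return {
            "error": f"No airport with code {code}. Ask the user for a different airport."
        }
    return {"code": code, "city": AIRPORTS[code]}
```

Where exactly the line between fixable and permanent falls is a judgment call per tool; the point of the sketch is only that the two cases take different exits.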
Why make both tools fail for Atlantis?
If the flight lookup had succeeded for Atlantis, the model would have a real price to build a pitch around and would bury the weather failure under a confident itinerary, especially a small model. Making every tool fail for the fake destination removes the material to confabulate with, so the only honest response is to surface the failure.
These three self-serve tracks sit outside the numbered path. When you’re done here, head back to the main tracks, foundations f1–f5 and patterns p1–p7.