TripMate: the Pydantic AI path

Everyone starts Foundations f1–f7 · the augmented-LLM primitives

Build a trip-planning agent, one idea at a time, in Python with Pydantic AI. This is the Python path of the two-path ai-workshop; the TypeScript sibling (Vercel AI SDK) lives at ../vercel-ai-sdk and hits the same checkpoints.

First five minutes

You need uv and Ollama running locally.

cd pydantic-ai
uv sync --extra notebook: venv and deps, including Jupyter for the notebooks
ollama pull granite4.1:3b: the default local model (skip if you’ll use Gemini) — and ollama pull embeddinggemma if you’ll do the RAG track (needed even on Gemini)
make verify: don’t go further until you see every check pass (make verify checks that embeddinggemma is pulled when Ollama is up; that check is for the RAG track only)
make f1: your first agent call

Foundations, Patterns, and the RAG track are Jupyter notebooks: you read, run, and edit cells in place. Open them however you like, make lab launches Jupyter, or open the .ipynb files in VS Code, Cursor, or Colab. The RAG track also runs as scripts (make r1, make r2). Each notebook is self-contained and needs no API key.

What you’re building

You: Plan a weekend in Lisbon for me

TripMate:
  [calls lookup_traveler]                  -> Jag, London, hiking + food, £500
  [calls get_weather("Lisbon")]            -> 22°C, sunny, wildfire advisory
  [calls get_flights("London", "Lisbon")]  -> TAP £142

  Hi Jag. Lisbon for £142 on TAP, 22°C and sunny. Heads up: smoke advisory
  mid-afternoon, so plan hikes for the morning.

By the end you have built that agent, streamed it, given it tools from a server you did not write, put it behind a web UI, and made it survive a tool failure.

The model

Ollama (granite4.1:3b) is the default and needs no API key. For Gemini, set GOOGLE_GENERATIVE_AI_API_KEY in your environment and it switches automatically.

Already have a key for another provider (OpenAI, Anthropic, Mistral, …)? Set it in your environment and replace the model assignment in shared/model.py with a provider:model string, e.g. model = "anthropic:claude-sonnet-4-6"; the models list has them all.

The notebooks inline that Ollama-or-Gemini switch in their setup cell, so each one is self-contained (amend that cell to use your provider there). The scripts (the app, mcp, make verify) read it from shared/model.py.

Tracing

Every challenge from f2 on is traced. The best way to read a run is autotel-devtools, a local browser trace viewer. Start it in one terminal, then run challenges in another:

make devtools          # terminal 1: autotel-devtools on http://127.0.0.1:4446

Then run a challenge from f2 on as a script, make f2, and open http://127.0.0.1:4446: you get the run as a tree of spans you can click into, with the model, the prompt and response, the timings, and the token usage. enable_tracing() finds the viewer when that port holds autotel-devtools, and stays console-only when it does not. (In the Jupyter notebooks the same span tree prints inline in the cell output.)

The same spans also print to your console as a tree, via logfire configured with send_to_logfire=False, so you can read a run with no viewer open. The TypeScript path does the same through autotel, so a run looks the same whichever language you are in.

logfire is a fully OTLP-compliant OpenTelemetry SDK, so the spans can go anywhere else too: set OTEL_EXPORTER_OTLP_ENDPOINT (or WORKSHOP_OTLP_ENDPOINT) to another OTLP/HTTP viewer (otel-tui, Jaeger, Grafana Tempo) and they render there instead.

Challenges

A common Foundations trunk, then the tracks you choose between, following Anthropic’s “Building effective agents”. Everyone does Foundations (the augmented LLM), then picks one track for the room session (Patterns, RAG, or Full-Stack). The others are take-home. Jump straight to a track if you already know the basics.

The workshop ends with a 20-minute discussion (see DISCUSSION.md): wherever you got to, the closing questions are the same — was working with the model what you expected, what surprised you, what did you learn. You can stop after Foundations and still contribute fully.

Foundations, Patterns, and RAG are notebooks: open the .ipynb in each challenge folder and run the cells top to bottom, editing the TODO cells as you go. make lab opens Jupyter; the reference solution sits in a collapsed “Solution” block in each notebook.

Foundations (f1–f7): the augmented LLM

#	Challenge	Goal	Notebook
f1	Hello + the two inputs	Call an agent; shape it with `instructions` vs prompt; stream the reply	`foundations/f1_hello/`
f2	See the loop + tokens	Inspect `all_messages()` and `usage`, read the trace	`foundations/f2_inspect/`
f3	Structured output	Typed `Recommendation` via `output_type=`, no parsing	`foundations/f3_structured_output/`
f4	Tools	Write a tool the model calls; using it takes another round-trip	`foundations/f4_tools/`
f5	Guardrails	A cheap check runs first and refuses off-topic or unsafe requests	`foundations/f5_guardrails/`
f6	Descriptions (authoring lab)	A docstring routes the model to the right tool	`foundations/f6_descriptions/`
f7	Testing	Prove the gate’s branches with `TestModel` + `override`, no real model call	`foundations/f7_testing/`

Don’t just complete Foundations: experiment. Rerun each challenge with different instructions and watch what changes. How short can a prompt get before the agent loses the plot? Find where you have to spell things out, and where the model works it out on its own.

Then your choice of track. (The Discussion closes the workshop at the end.)

Patterns track (p1–p7): you orchestrate, then the model does

#	Challenge	Goal	Notebook
p1	Prompt chaining	Draft, check with a code gate, then fix only what failed	`patterns/p1_chaining/`
p2	Routing	Classify the input, then branch in code to a specialist	`patterns/p2_routing/`
p3	Parallelization	Fan out independent reviewers with `asyncio.gather`, then aggregate	`patterns/p3_parallelization/`
p4	Evaluator-optimizer	Score and improve in a loop until a bar or a cap	`patterns/p4_evaluator/`
p5	Agentic	Tools chain in order; the traveller arrives via typed `deps`	`patterns/p5_agentic/`
p6	Delegation	An orchestrator agent whose tools are other agents (`ctx.usage`)	`patterns/p6_delegation/`
p7	Conversation	A multi-turn chat loop that streams and remembers (`message_history`)	`patterns/p7_conversation/`

RAG track (r1–r2): ground the model in your own data

Needs a local embedding model: ollama pull embeddinggemma (separate from the chat model, and needed even if you chat on Gemini). Retrieval uses Pydantic AI’s Embedder.

#	Challenge	Goal	Notebook
r1	Retrieval	Embed your docs with `Embedder`, rank by cosine similarity, expose search as a tool	`rag/r1_retrieval/`
r2	Chunking	Split long documents into passages so a specific question matches a specific paragraph	`rag/r2_chunking/`

Full-stack track: the agent behind a web UI (assumes some frontend comfort)

You start from a production-shaped template (a FastAPI backend running a Pydantic AI agent, bridged to a React useChat UI) and build on it. Clone it with the workshop CLI, then add your own tool and watch its tool-call card render. See app/fullstack/ for the full lesson.

npx @jagreehal/ai-workshop fullstack-pydantic
cd fullstack-pydantic && npm install && uv sync --extra dev && npm run dev

Two self-serve tracks go further, top-level siblings of patterns/rag: resilience is a notebook (resilience/); mcp runs a separate server, so it stays a script (make mcp-server then make mcp).

What’s in a challenge

Foundations, Patterns, and the resilience track each ship as a folder with a notebook plus a runnable script pair:

foundations/f1_hello/
├── README.md          # canonical lesson (the site + GitHub doc)
├── start/agent.py     # canonical starter: `make f1`
├── finish/agent.py    # canonical solution: `make solution-f1`
└── f1_hello.ipynb     # generated from the three above (`make notebooks`); run cells in Jupyter

The README + start + finish are the source of truth (same as the TypeScript path). The notebook is generated from them with make notebooks. Prefer the terminal? make f1 runs the starter and make solution-f1 the reference. Prefer cells? Open the notebook (make lab) and work top to bottom; it has the same starter and a collapsed Solution block.

The full-stack track (app/fullstack/) is a separate template you clone with the workshop CLI, not a notebook. mcp stays a script because it runs a separate server process (mcp/mcp_server.py).

The mental model

If you get lost, ask four questions:

What does the model know right now?
What tool is available to help it?
What shape of input does that tool expect?
What does the tool return back into the loop?

Everything in the workshop is a variation on that loop: the model generates, decides when it needs a tool, you run the tool, the result goes back, and it repeats until the model can answer.

Tech stack

Pydantic AI (pydantic-ai): Agent, the class used in every challenge
Pydantic: typed output_type models and tool signatures
Ollama or Google: the model, switched in shared/model.py
logfire: OpenTelemetry tracing, printed to the console (OTLP viewer optional)
MCP (mcp, pydantic_ai.mcp): the mcp self-serve track
FastAPI + VercelAIAdapter (pydantic_ai.ui.vercel_ai) + React (@ai-sdk/react useChat): the full-stack track’s web UI

Two paths, same outcomes

If you also know the TypeScript path: the concepts line up one-to-one, the idioms differ. Agent(instructions=...) ↔ new ToolLoopAgent({ instructions }), @agent.tool_plain ↔ tool({ description, inputSchema, execute }), output_type= ↔ Output.object, deps/RunContext ↔ a closure, run_stream ↔ .stream(), MCPToolset ↔ createMCPClient. See CLAUDE.md for the full table.

License

Code (starter and solution files, scripts) — MIT; the LICENSE file ships alongside it.
Lessons (this and the challenge READMEs, diagrams) — CC BY-NC 4.0: share and adapt with attribution, not commercially. Build on the code freely; don’t resell the lessons.