p7: Conversation that streams
Every run so far was one shot: one input, one answer, done. A real assistant holds a conversation. Two things make it feel like one, and they pull in different directions.
Quick path
In a hurry? These three steps are the whole challenge. Everything below is the why and the how.
- Run printf 'My name is Jag.\nWhat is my name?\nquit\n' | make p7 and watch it stream a reply but forget your name on the next turn, because no history is carried.
- Edit start/agent.py: do the TODO (wire memory using the chef-example history pattern: declare history, pass it into run_stream, then refresh it from result.all_messages() after each turn).
- Done when the same piped script answers “Jag” on the second turn.
The first is streaming, which you already know from f1: agent.run_stream(...) and an
async for over result.stream_text(delta=True), the reply printing as the model writes
it. It is wired here for you; in a chat loop it is what keeps each turn feeling live
rather than frozen.
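To make the delta idea concrete, here is a minimal sketch that uses a stand-in async generator rather than a real model. fake_stream_text and print_streamed are hypothetical names that only imitate the shape of result.stream_text(delta=True); this is not pydantic-ai code. With delta=True each item is only the newly arrived piece, so printing with end="" rebuilds the reply in place.

```python
import asyncio
from collections.abc import AsyncIterator


async def fake_stream_text(reply: str, delta: bool = True) -> AsyncIterator[str]:
    """Hypothetical stand-in for a streamed reply (not the pydantic-ai API).

    delta=True yields only the newly arrived piece each time;
    delta=False yields the whole text received so far.
    """
    so_far = ""
    for word in reply.split(" "):
        piece = word if not so_far else " " + word
        so_far += piece
        await asyncio.sleep(0)  # hand control back, as a network wait would
        yield piece if delta else so_far


async def print_streamed(reply: str) -> str:
    """Print pieces as they arrive and return the assembled text."""
    parts: list[str] = []
    async for piece in fake_stream_text(reply, delta=True):
        print(piece, end="", flush=True)  # no newline: the reply grows in place
        parts.append(piece)
    print()  # finish the line once the stream ends
    return "".join(parts)


assembled = asyncio.run(print_streamed("Nice to meet you, Jag."))
```

Joining the deltas gives back the full reply, which is why a loop that prints each piece as it arrives ends up showing the same text a non-streamed run would return all at once.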
The second is memory, and that is the part you wire. The agent remembers earlier turns because you feed the whole conversation back in each time. Without it, every turn starts from nothing. Streaming alone does not make a conversation; memory does.
Mental model
History grows every turn and the model re-reads all of it, which is why memory, not streaming, is what makes a conversation.
Open start/agent.py. The agent is plain, with no tools, so the lesson
stays on delivery and memory. Here is the loop you will work in:
The loop reads input with input("You: ") and caps the turns at MAX_TURNS. A while True
chat loop has no guaranteed end: fed piped input it can hang, and a stuck condition
can spin forever. A bounded for loop always terminates, even unattended. Catching
EOFError lets piped input end the chat cleanly when it runs out.
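The starter file is not reproduced here, so what follows is only a sketch of a loop with that shape. The value of MAX_TURNS, the quit check, and the reply and read_line parameters are assumptions for illustration; the real loop calls agent.run_stream where this one calls reply. The scripted helper is hypothetical too: it replays lines and then raises EOFError, the way input() does when piped input runs out.

```python
from collections.abc import Callable

MAX_TURNS = 10  # assumed value; the starter defines its own cap


def chat_loop(
    reply: Callable[[str], str],
    read_line: Callable[[str], str] = input,
    max_turns: int = MAX_TURNS,
) -> int:
    """Run a bounded chat loop and return the number of turns answered."""
    turns = 0
    for _turn in range(max_turns):  # bounded: cannot run past max_turns
        try:
            user = read_line("You: ")
        except EOFError:  # piped input ran out: stop cleanly
            break
        if user.strip().lower() == "quit":  # assumed exit word
            break
        print(reply(user))
        turns += 1
    return turns


def scripted(lines: list[str]) -> Callable[[str], str]:
    """Hypothetical reader that replays lines, then raises EOFError like input()."""
    remaining = iter(lines)

    def read(_prompt: str) -> str:
        try:
            return next(remaining)
        except StopIteration:
            raise EOFError from None

    return read


# Two lines and no "quit": the loop ends on EOFError instead of hanging.
answered = chat_loop(lambda text: f"echo: {text}", scripted(["hello", "again"]))
```

Passing the reader in as a parameter keeps the sketch testable without a terminal; by default it falls back to the built-in input.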
Right now each turn calls agent.run_stream(user). One prompt in, one reply out, nothing
carried over. You close that gap next.
The mechanic, on a throwaway chat
Memory in pydantic-ai is a list of ModelMessages you feed back in each turn. Here it is on a
throwaway chef bot that has nothing to do with TripMate, in three moves:
Take the three moves into TripMate’s streaming loop: pass message_history=history into
run_stream, and after each turn refresh history = result.all_messages(). The third
move is the one that creates the memory: drop it and the agent forgets every earlier turn.
(The chef uses plain run; TripMate streams with run_stream, but the memory wiring is
identical.)
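The three moves can be sketched without a model at all. FakeAgent and FakeResult below are hypothetical stand-ins, not pydantic-ai classes: they borrow only the names run, message_history and all_messages() from the text above, and they use plain strings where the real library uses ModelMessage objects. The stand-in can answer only from the messages it is handed, which is the property that matters here.

```python
from dataclasses import dataclass, field
from typing import Optional

NAME_PREFIX = "user: My name is "


@dataclass
class FakeResult:
    """Hypothetical stand-in for a run result."""

    output: str
    messages: list[str] = field(default_factory=list)

    def all_messages(self) -> list[str]:
        # The full conversation so far, this turn included.
        return list(self.messages)


class FakeAgent:
    """Hypothetical stand-in agent: it knows only what the messages contain."""

    def run(self, prompt: str, message_history: Optional[list[str]] = None) -> FakeResult:
        seen = list(message_history or []) + [f"user: {prompt}"]
        name = None
        for message in seen:  # look for an introduction anywhere in what was sent
            if message.startswith(NAME_PREFIX):
                name = message[len(NAME_PREFIX):].rstrip(".")
        if prompt == "What is my name?":
            answer = name if name else "I don't know."
        else:
            answer = "Noted."
        return FakeResult(answer, seen + [f"assistant: {answer}"])


def converse(prompts: list[str], keep_history: bool) -> list[str]:
    agent = FakeAgent()
    history: list[str] = []  # move 1: declare the running history
    answers = []
    for prompt in prompts:
        result = agent.run(prompt, message_history=history)  # move 2: feed it in
        if keep_history:
            history = result.all_messages()  # move 3: refresh; this is the memory
        answers.append(result.output)
    return answers


script = ["My name is Jag.", "What is my name?"]
with_memory = converse(script, keep_history=True)
without_memory = converse(script, keep_history=False)
print(with_memory, without_memory)
```

With the refresh in place the second answer is the name; with it skipped, history stays empty and the second turn arrives with nothing to go on.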
Run it
You can chat interactively, or pipe a script in to test:
Tell TripMate your name, then ask it back. It streams a fluent reply, then on the next turn it does not know your name, because each turn is sent on its own with no history. The streaming works; the memory is not there yet.
Build it
- Run it and watch it forget. Pipe the script above, or run make p7 and chat. Give your name, ask for it back, and confirm the second answer cannot recall it.
- Declare a running history (TODO). Add a history list above the loop, the conversation the agent will see, the same list[ModelMessage] as the chef example above (import ModelMessage from pydantic_ai.messages).
- Feed the history in, then refresh it (TODO). The two remaining moves: pass message_history=history into the run_stream(...) call, and after each turn reassign history = result.all_messages(). That last line is the memory. Run the same script again, and now it answers “Jag”, because the history goes back in on every turn. Stuck? finish/agent.py is the canonical version; read it after you’ve had a real go.
- Watch the cost grow (poke). Keep chatting for several turns, then look at result.usage() on the later turns. The input tokens climb every turn, because the whole history is re-sent each time. Memory is not free: long conversations get expensive, which is why production apps eventually summarise or trim old turns.
- Check you’ve got it. You should be able to point at the one line that is the memory (history = result.all_messages()), say why the loop is bounded rather than while True, and show the input-token count climbing across turns.
- Forgetting to refresh the history. If you pass message_history in but never reassign history = result.all_messages(), the agent never sees its own previous answers, and it repeats itself or loses the thread. That refresh line is the memory.
- while True. Always bound the loop. Unbounded input loops hang when piped and can spin forever on a bad condition; the starter uses for _turn in range(MAX_TURNS) instead.
- Unbounded growth. Re-sending the full history every turn costs more tokens each time. Fine for a workshop chat; for a long-running assistant, trim or summarise.
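A back-of-the-envelope model shows why the cost climbs. The sketch assumes every prompt is 20 tokens and every reply is 30; both numbers are made up, real turns vary, and the usage reported on the result is the source of truth. Each turn sends the whole history plus the new prompt, then both the prompt and the reply join the history.

```python
PROMPT_TOKENS = 20  # assumed size of each user prompt
REPLY_TOKENS = 30   # assumed size of each model reply


def input_tokens_per_turn(turns: int) -> list[int]:
    """Input tokens sent on each turn when the full history is re-sent."""
    history = 0
    sent = []
    for _ in range(turns):
        sent.append(history + PROMPT_TOKENS)     # whole history plus the new prompt
        history += PROMPT_TOKENS + REPLY_TOKENS  # prompt and reply both join the history
    return sent


per_turn = input_tokens_per_turn(5)
total = sum(per_turn)
print(per_turn)  # [20, 70, 120, 170, 220]
print(total)     # 600
```

Under these assumptions the per-turn input grows by a fixed step, so the running total grows roughly with the square of the number of turns. That is the pressure that pushes long-running assistants toward trimming or summarising.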
A couple of things worth knowing
Why all_messages(), not a list of strings?
result.all_messages() returns the structured conversation, including any tool calls and
results, in the form pydantic-ai expects back as message_history. That means later turns
see exactly what happened, tools included, with no manual bookkeeping.
Feeding it straight back is both simpler and more faithful than rebuilding the history from strings yourself. The plain agent here has no tools, but the same line carries them when it does.
Could I keep only the last few messages?
Yes, and that is how real apps keep the cost down. Trim history to the latest handful
before each run, but hold on to anything you cannot afford to lose, like the user’s name. A
tiny summary string you update when they introduce themselves, then prepend to each run,
keeps durable facts cheap.
Raw transcript memory is easy; durable memory needs a little design.
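One way to sketch that trimming, with plain strings standing in for messages. trim_history, update_summary, build_context and KEEP_LAST are hypothetical names for illustration, and the name detection is deliberately naive. With real pydantic-ai messages a trim would likely also need to keep each tool call together with its result, which this sketch ignores.

```python
KEEP_LAST = 4  # assumed window size, in messages


def trim_history(history: list[str], keep_last: int = KEEP_LAST) -> list[str]:
    """Return only the most recent keep_last messages."""
    # history[-0:] would return everything, so handle zero explicitly.
    return history[-keep_last:] if keep_last > 0 else []


def update_summary(summary: str, user_text: str) -> str:
    """Record the user's name when they introduce themselves (naive match)."""
    prefix = "My name is "
    if user_text.startswith(prefix):
        name = user_text[len(prefix):].rstrip(".")
        return f"The user's name is {name}."
    return summary


def build_context(summary: str, history: list[str]) -> list[str]:
    """Durable summary first, then the recent window."""
    head = [f"summary: {summary}"] if summary else []
    return head + trim_history(history)


summary = ""
history: list[str] = []
for text in ["My name is Jag.", "Plan a trip.", "Somewhere warm.", "In May."]:
    summary = update_summary(summary, text)
    history += [f"user: {text}", "assistant: ok"]

context = build_context(summary, history)
print(context)
```

After four exchanges the introduction has fallen out of the four-message window, but the name still reaches the model through the one-line summary at the front.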
Why catch EOFError?
When you pipe a script in (printf ... | make p7), input() raises EOFError once the
piped lines run out. Catching it ends the loop cleanly instead of crashing, which is what
lets the same file work both interactively and in an automated test. Forgetting it is the
usual cause of a traceback at the end of a piped run.
That is the last of the patterns track. Next is the discussion that closes the workshop: compare what you saw across the workflows you orchestrated (p1 to p4) and the agent patterns the model orchestrated (p5 to p7), and decide where each fits in real work.