p7: Conversation that streams

Every run so far was one shot: one input, one answer, done. A real assistant holds a conversation. Two things make it feel like one, and they pull in different directions.

Quick path

In a hurry? These three steps are the whole challenge. Everything below is the why and the how.

Run printf 'My name is Jag.\nWhat is my name?\nquit\n' | make p7 and watch it stream a reply but forget your name on the next turn, because no history is carried.
Edit start/agent.py: do the TODO (wire memory using the chef-example history pattern: declare history, pass it into run_stream, then refresh it from result.all_messages() after each turn).
Done when the same piped script answers “Jag” on the second turn.

The first is streaming, which you already know from f1: agent.run_stream(...) and an async for over result.stream_text(delta=True), the reply printing as the model writes it. It is wired here for you; in a chat loop it is what keeps each turn feeling live rather than frozen.

The second is memory, and that is the part you wire. The agent remembers earlier turns because you feed the whole conversation back in each time. Without it, every turn starts from nothing. Streaming alone does not make a conversation; memory does.

Mental model

turn 1:  [you]  ->  model  ->  [reply]
turn 2:  [you][reply][you]  ->  model  ->  [reply]

History grows every turn and the model re-reads all of it, which is why memory, not streaming, is what makes a conversation.

Open start/agent.py. The agent is plain, with no tools, so the lesson stays on delivery and memory. Here is the loop you will work in:

MAX_TURNS = 50  # a bounded loop, so it always ends even when piped or unattended

for _turn in range(MAX_TURNS):
    try:
        user = input("You: ").strip()
    except EOFError:
        break  # piped input ran out
    if not user or user.lower() in ("quit", "q", "exit"):
        break

    print("TripMate: ", end="", flush=True)
    async with agent.run_stream(user) as result:
        async for chunk in result.stream_text(delta=True):
            print(chunk, end="", flush=True)
    print("\n")

The loop reads input with input("You: ") and caps the turns at MAX_TURNS. A while True chat loop has no guaranteed end: fed piped input it can hang, and a stuck condition can spin it forever. A bounded for loop always terminates, even unattended. Catching EOFError lets piped input end the chat cleanly when it runs out.
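The two exits can be tried without a model at all. This is a minimal sketch, not the starter's code: make_reader and collect_turns are hypothetical helpers, and the reader stands in for input() by raising EOFError when its lines run out.

```python
import itertools
from typing import Callable, Iterable

MAX_TURNS = 50  # same cap as the starter loop


def make_reader(lines: Iterable[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for input(): raises EOFError when the lines run out."""
    source = iter(lines)

    def read_line(_prompt: str) -> str:
        try:
            return next(source)
        except StopIteration:
            raise EOFError from None

    return read_line


def collect_turns(read_line: Callable[[str], str], max_turns: int = MAX_TURNS) -> list[str]:
    """Run the bounded loop and return the user turns it accepted."""
    turns = []
    for _turn in range(max_turns):
        try:
            user = read_line("You: ").strip()
        except EOFError:
            break  # piped input ran out
        if not user or user.lower() in ("quit", "q", "exit"):
            break
        turns.append(user)
    return turns


# A script that quits: the loop stops at "quit" and never reads the last line.
finite = collect_turns(make_reader(["My name is Jag.", "What is my name?", "quit", "ignored"]))
# A source that never ends: only the cap stops the loop.
endless = collect_turns(make_reader(itertools.repeat("hello")))

print(finite)        # ['My name is Jag.', 'What is my name?']
print(len(endless))  # 50
```

The second call is the case a while True loop cannot survive: the input never ends and nothing says quit, so the cap is the only thing that stops it.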

Right now each turn calls agent.run_stream(user). One prompt in, one reply out, nothing carried over. You close that gap next.

The mechanic, on a throwaway chat

Memory in pydantic-ai is a list of ModelMessages you feed back in each turn. Here it is in three moves, on a throwaway chef bot that has nothing to do with TripMate:

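from pydantic_ai import Agent
from pydantic_ai.messages import ModelMessage

# assumes `model` is already defined, and that this runs inside an async function
chef = Agent(model, instructions="You are a friendly chef. Keep replies short.")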
history: list[ModelMessage] = []                       # 1. the running conversation

# turn 1
result = await chef.run("I'm vegetarian.", message_history=history)   # 2. send the WHOLE history
print(result.output)
history = result.all_messages()                        # 3. refresh it, THIS is the memory

# turn 2, remembers turn 1, because history goes back in
result = await chef.run("Suggest a dinner for me.", message_history=history)
print(result.output)                                   # uses "vegetarian" from turn 1

Take the three moves into TripMate’s streaming loop: pass message_history=history into run_stream, and after each turn refresh history = result.all_messages(). The third move is the one that creates the memory: drop it and the agent forgets everything that was said. (The chef uses plain run; TripMate streams with run_stream, but the memory wiring is identical.)
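To see the three moves inside a streaming loop without calling a model, here is a sketch built on a fake agent. FakeAgent and FakeResult are hypothetical stand-ins written for this example, not pydantic-ai classes. They only copy the shape of the calls used in this challenge (run_stream as an async context manager, stream_text(delta=True), all_messages()), and the "model" can only answer from the messages it is handed.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import Optional

Message = tuple[str, str]  # (role, text): a stand-in for pydantic-ai's ModelMessage


class FakeResult:
    """Hypothetical stand-in for the object that run_stream yields."""

    def __init__(self, messages: list[Message], reply: str) -> None:
        self._messages = messages
        self._reply = reply

    async def stream_text(self, delta: bool = True):
        # Yield the reply one word at a time, like streamed deltas.
        for word in self._reply.split():
            yield word + " "

    def all_messages(self) -> list[Message]:
        return list(self._messages)


class FakeAgent:
    """Hypothetical agent: it only knows what is in the messages it receives."""

    @asynccontextmanager
    async def run_stream(self, prompt: str, message_history: Optional[list[Message]] = None):
        messages = list(message_history or []) + [("user", prompt)]
        known = any("my name is jag" in text.lower() for role, text in messages if role == "user")
        if "what is my name" in prompt.lower():
            reply = "Your name is Jag." if known else "I do not know your name."
        else:
            reply = "Nice to meet you."
        yield FakeResult(messages + [("assistant", reply)], reply)


async def chat(agent: FakeAgent, lines: list[str], remember: bool) -> list[str]:
    history: list[Message] = []                       # move 1: the running conversation
    replies = []
    for user in lines:
        chunks = []
        async with agent.run_stream(user, message_history=history) as result:  # move 2
            async for chunk in result.stream_text(delta=True):
                chunks.append(chunk)
        replies.append("".join(chunks).strip())
        if remember:
            history = result.all_messages()           # move 3: the refresh
    return replies


script = ["My name is Jag.", "What is my name?"]
print(asyncio.run(chat(FakeAgent(), script, remember=False)))
print(asyncio.run(chat(FakeAgent(), script, remember=True)))
```

With remember=False the history is passed in but never refreshed, so the second turn arrives with an empty list and the name is gone. Flipping the flag changes only the refresh line, and the second answer changes with it.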

Run it

You can chat interactively, or pipe a script in to test:

printf 'My name is Jag.\nWhat is my name?\nquit\n' | make p7

Tell TripMate your name, then ask it back. It streams a fluent reply, then on the next turn it does not know your name, because each turn is sent on its own with no history. The streaming works; the memory is not there yet.

Build it

Run it and watch it forget. Pipe the script above, or run make p7 and chat. Give your name, ask for it back, and confirm the second answer cannot recall it.
Declare a running history (TODO). Add a history list above the loop: this is the conversation the agent will see. It is the same list[ModelMessage] as in the chef example above (import ModelMessage from pydantic_ai.messages).
Feed the history in, then refresh it (TODO). The two remaining moves: pass message_history=history into the run_stream(...) call, and after each turn reassign history = result.all_messages(). That last line is the memory. Run the same script again, and now it answers “Jag”, because the history goes back in on every turn.

Stuck? finish/agent.py is the canonical version; read it after you’ve had a real go.
Watch the cost grow (poke). Keep chatting for several turns, then look at result.usage on the later turns. The input tokens climb every turn, because the whole history is re-sent each time. Memory is not free: long conversations get expensive, which is why production apps eventually summarise or trim old turns.
Check you’ve got it. You should be able to point at the one line that is the memory (history = result.all_messages()), say why the loop is bounded rather than while True, and show the input-token count climbing across turns.
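The climb in input tokens can be estimated before you run anything. The sketch below is a rough model under a loud assumption: it counts whitespace-separated words as a proxy for tokens, which is not how a real tokenizer counts, and input_sizes is a hypothetical helper. It shows only the shape of the growth: each turn's input is everything sent so far, plus every reply so far, plus the new message.

```python
def input_sizes(turns: list[tuple[str, str]]) -> list[int]:
    """Words sent as input on each turn when the full history is re-sent.

    Each item in `turns` is (user_message, reply). Words are a crude
    stand-in for tokens; real counts come from the model provider.
    """
    sizes = []
    carried = 0  # words already in the history before this turn
    for user, reply in turns:
        sent = carried + len(user.split())
        sizes.append(sent)
        carried = sent + len(reply.split())  # the reply joins the history too
    return sizes


conversation = [
    ("My name is Jag.", "Nice to meet you, Jag!"),
    ("What is my name?", "Your name is Jag."),
    ("Thanks!", "You're welcome."),
]

print(input_sizes(conversation))  # [4, 13, 18]
```

The third message is a single word, yet its turn sends the most input, because the earlier turns ride along with it.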

Traps

Forgetting to refresh the history. If you pass message_history in but never reassign history = result.all_messages(), the list stays empty, so the agent never sees the earlier turns, its own answers included, and it repeats itself or loses the thread. That refresh line is the memory.
while True. Always bound the loop. Unbounded input loops hang when piped and can spin forever on a bad condition; the starter uses for _turn in range(MAX_TURNS) instead.
Unbounded growth. Re-sending the full history every turn costs more tokens each time. Fine for a workshop chat; for a long-running assistant, trim or summarise.

A couple of things worth knowing

Why all_messages(), not a list of strings?

result.all_messages() returns the structured conversation, including any tool calls and results, in the form pydantic-ai expects back as message_history. That means later turns see exactly what happened, tools included, with no manual bookkeeping.

Feeding it straight back is both simpler and more faithful than rebuilding the history from strings yourself. The plain agent here has no tools, but the same line carries them when it does.

Could I keep only the last few messages?

Yes, and that is how real apps keep the cost down. Trim history to the latest handful of messages before each run, but hold on to anything you cannot afford to lose, like the user’s name. A short summary string, updated when they introduce themselves and prepended to each run, keeps durable facts cheap.

Raw transcript memory is easy; durable memory needs a little design.
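One way to picture trimming with pinned facts, as a sketch only: trim_history and with_facts are hypothetical helpers, and plain strings stand in for the ModelMessage objects a real history holds. Where the facts go (here, at the front of the user message) is one design choice among several.

```python
def trim_history(history: list[str], keep_last: int) -> list[str]:
    """Keep only the newest `keep_last` messages."""
    if keep_last <= 0:
        return []  # guard: history[-0:] would return the whole list
    return history[-keep_last:]


def with_facts(user: str, facts: list[str]) -> str:
    """Prepend the durable facts so they survive trimming."""
    if not facts:
        return user
    return "Known facts: " + " ".join(facts) + "\n" + user


history = [f"message {i}" for i in range(1, 11)]  # ten placeholder messages
facts = ["The user's name is Jag."]               # updated when they introduce themselves

print(trim_history(history, 4))
# ['message 7', 'message 8', 'message 9', 'message 10']
print(with_facts("What is my name?", facts))
```

The name was said in a message that trimming has already dropped, but it still reaches the model because it travels in the facts list instead of the transcript.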

Why catch EOFError?

When you pipe a script in (printf ... | make p7), input() raises EOFError once the piped lines run out. Catching it ends the loop cleanly instead of crashing, which is what lets the same file work both interactively and in an automated test. Forgetting it is the usual cause of a traceback at the end of a piped run.
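The behaviour is easy to reproduce without a pipe. In this sketch, drain_stdin is a hypothetical helper that temporarily swaps sys.stdin for an in-memory buffer, which behaves like piped input: input() returns each line, then raises EOFError once nothing is left.

```python
import io
import sys


def drain_stdin(piped_text: str, max_reads: int = 50) -> tuple[list[str], bool]:
    """Read lines with input() from a fake pipe; report whether EOFError ended it."""
    real_stdin = sys.stdin
    sys.stdin = io.StringIO(piped_text)  # stands in for `printf ... |`
    lines = []
    hit_eof = False
    try:
        for _ in range(max_reads):  # bounded, like the chat loop
            try:
                lines.append(input())
            except EOFError:
                hit_eof = True  # the piped lines ran out
                break
    finally:
        sys.stdin = real_stdin  # always restore the real stdin
    return lines, hit_eof


lines, hit_eof = drain_stdin("My name is Jag.\nWhat is my name?\n")
print(lines, hit_eof)  # ['My name is Jag.', 'What is my name?'] True
```

Without the except branch, the third call to input() would end the run with a traceback instead of a clean break.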

That is the last of the patterns track. Next is the discussion that closes the workshop: compare what you saw across the workflows you orchestrated (p1 to p4) and the agent patterns the model orchestrated (p5 to p7), and decide where each fits in real work.